Optical Flow Based Detection and Tracking of Moving Objects For Autonomous Vehicles

Abstract—Accurate velocity estimation of surrounding moving objects and their trajectories are critical elements of perception systems in Automated/Autonomous Vehicles (AVs) with a direct impact on their safety. These are non-trivial problems due to the diverse types and sizes of such objects and their dynamic and random behaviour. Recent point cloud based solutions often use Iterative Closest Point (ICP) techniques, which are known to have certain limitations. For example, their computational costs are high due to their iterative nature, and their estimation error often deteriorates as the relative velocities of the target objects increase (>2 m/sec). Motivated by such shortcomings, this paper first proposes a novel Detection and Tracking of Moving Objects (DATMO) technique for AVs based on optical flow, which is proven to be computationally efficient and highly accurate for such problems. This is achieved by representing the driving scenario as a vector field and applying vector calculus theories to ensure spatiotemporal continuity. We also report the results of a comprehensive performance evaluation of the proposed DATMO technique, carried out in this study using synthetic and real-world data. The results of this study demonstrate the superiority of the proposed technique, compared to the DATMO techniques in the literature, in terms of estimation accuracy and processing time over a wide range of relative velocities of moving objects. Finally, we evaluate and discuss the sensitivity of the estimation error of the proposed DATMO technique to various system and environmental parameters, as well as to the relative velocities of the moving objects.

Index Terms—Autonomous vehicles, optical flow, LiDAR, point cloud, DATMO, MODT, state estimation.

I. INTRODUCTION

ACCURATE, reliable and fast perception of the surrounding environment is one of the most important technical challenges in the safe deployment of Autonomous/Automated Vehicle (AV) technologies. This problem includes the detection of surrounding objects and estimation of their states, i.e., their position and velocity. A particularly important element of this problem is associated with the Detection and Tracking of Moving Objects (DATMO), a.k.a. Moving Object Detection and Tracking (MODT) in some studies of the literature [1].

There is a wide range of DATMO techniques in the literature tailored for AVs that use camera perception sensors [2]–[4], LiDAR perception sensors [5]–[7] and Radar perception sensors [8]. LiDAR perception sensors are particularly popular for AVs as they inherently provide a wide field of view (FOV) and robust point clouds that can be used for highly accurate range estimations. Therefore, in this study, we focus on developing a DATMO technique primarily tailored for LiDAR perception sensors. However, we believe such a technique can be easily adapted to any type of perception sensor, such as depth cameras, that provides a point cloud output.

When it comes to designing a DATMO technique, minimising its processing cost/time and estimation error are two significant challenges. If an AV uses a LiDAR perception sensor, the velocity of the surrounding moving objects can be calculated by corresponding two consecutive point cloud scans. In traditional approaches, an object detection function is used as the first processing stage to identify objects in two consecutive LiDAR scans. Then, a tracking algorithm is applied to compute the velocity of the objects of interest [9]. Although these techniques are computationally efficient, their accuracy depends on the accuracy of the underlying object detection algorithms. While enhancing various elements of these techniques, for example, by utilising the geometric models of the moving objects in the detection process, can improve the performance of this category of DATMO techniques, such techniques do not perform well in many scenarios [21]. For example, if a vehicle travels at a speed of 90 km/hr on a highway, the lateral velocity estimation error of such techniques can exceed 2 m/sec. This is not regarded as an acceptable input to the planning modules that determine cut-in/cut-out intentions of vulnerable road users [22].

The problem can be exacerbated because the geometric models of the moving objects can vary significantly across road users. This has a direct impact on the estimation errors of the aforementioned DATMO techniques. To address these problems, a different category of DATMO techniques has emerged in the literature [23]–[26]. These techniques often use point cloud registration algorithms such as Iterative Closest Point (ICP) [22] and track all moving points in the point cloud [23]; hence, they are more accurate in velocity estimation. However, these DATMO techniques are computationally expensive because of their iterative nature [28]. Furthermore, the performance of the underlying point cloud registration methods can deteriorate when the deviation between two consecutive point cloud scans increases. For instance, if the relative speed of the Ego Vehicle (EV) and a target object of interest is high (e.g., 12 km/hr), the dislocation of the consecutive LiDAR scans is usually large, which can result in a large error in ICP-based point cloud registration algorithms [21]. This can result in poor performance of the latter category of DATMO techniques.

Motivated by the above challenges, in this paper, a novel DATMO technique is proposed for AVs that use LiDAR

1 Mehrdad Dianati holds part-time professorial posts at the School of Electronics, Electrical Engineering and Computer Science (EEECS), Queen's University of Belfast and WMG at the University of Warwick. Other authors are with WMG, University of Warwick, e-mail: {mreza.alipour, m.dianati, sajjad.mozaffari, r.woodman}@warwick.ac.uk, [email protected]
# Corresponding author: [email protected]
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS 2
TABLE I
A BRIEF REVIEW OF THE LIDAR-BASED DATMO METHODS IN THE LITERATURE.
L: LEARNING-BASED, ICP: ITERATIVE CLOSEST POINT METHOD, G: GRID-BASED REPRESENTATION

Model-Free (Point-Based): [7]g [19]g [20] [21]icp [22]icp [23]icp [24]icp [25]l [26] [27]gicp+l
  Advantages: tracking all scanned points; DATMO performance does not rely on geometric shape
  Drawbacks: high computational cost; the higher the relative velocity of moving objects, the lower the tracking performance
perception sensors. The proposed technique is inspired by the optical flow algorithm [29]. In our approach, the 3D LiDAR scans are initially converted to 2.5D motion grids (Fig. 3), inspired by [19]; then, the 2D velocity of each cell in the grid is estimated by comparing two consecutive LiDAR scans. In the next step, a series of grid mask filters, such as temporal and rigid-body continuity filters, are applied to eliminate false positive detections. The LiDAR points are classified based on their velocity vectors, and each class is associated with a moving object. Finally, a Kalman Filter is used to track the velocity and position of the detected moving objects, considering their dynamic model. The main contributions of this paper are summarized as follows:

• Adopting the optical flow technique to process the 3D point cloud data instead of complex ICP algorithms. This is used to generate a grid-based velocity vector field representing a dynamic driving environment.

• Introducing a two-layer filter applied to the velocity vector field to eliminate false positives and erroneous vectors. These filters are designed based on the spatial continuity of the vector field (rigid-body assumption) and temporal propagation to improve the estimation performance.

• Introducing a novel error model for velocity estimation as a function of the configuration set of target vehicles (TVs) w.r.t. the ego vehicle (EV). This offers further insights to be incorporated into downstream modules such as motion planning/prediction in the autonomous vehicle framework.

The performance of the proposed technique, compared to ICP-based methods such as those in [21], [27], and to model-free [14] and model-based [18] indirect tracking methods, is evaluated in two steps. First, the compared DATMO techniques are evaluated on a synthetic dataset generated in the MATLAB scenario designer, where various driving contexts are considered. In the next step, the KITTI tracking dataset from real driving scenarios [30] is used. Each of the synthetic and KITTI datasets serves a different purpose in our study. The synthetic dataset enables generating a wide range of driving scenarios and various target vehicles (shape, velocity, dimension, etc.), which is not practically feasible in real-world data collection campaigns. The flexibility of this type of dataset becomes even more important when it comes to analysing error sensitivity to different factors that are easy to change or sweep in synthetically generated scenarios. On the other hand, testing the proposed DATMO technique with data collected by contemporary sensors in real-world conditions helped us validate its performance in the real world. Comparing the estimation error distributions shows that the proposed DATMO outperforms the state-of-the-art in both speed and yaw angle estimation. Moreover, the computational cost (without parallel calculations) shows improvements of about 10%, whereas parallelising the proposed method is straightforward and could improve this metric even more significantly. The proposed error sensitivity analysis also revealed a meaningful correlation between the configuration of the TV and the estimation error, from which researchers could benefit to develop motion planning/prediction algorithms.

The rest of the paper is organized as follows. An overview of the existing related work in the literature is given in Section II. The system model and problem formulation are given in Section III. The proposed DATMO method is explained in Section IV. The performance evaluation methodology and results are discussed in Sections V and VI. Finally, the key findings and conclusions of the study are given in Section VII.

II. RELATED WORKS

In order to review the available point cloud based DATMO/MODT approaches in the literature, in this paper, they are categorized into two main classes: 1) detection-based methods
Fig. 1. High-level schematic system diagram of optical flow based DATMO for AVs. ω, vx, and vy are all n × m matrices. The same colour code (blocks and signals) is used where blocks are expanded and explained in different sections.
also known as traditional methods [31]; 2) direct tracking methods, which are further divided into model-based and point-based approaches.

A. Detection-Based Tracking

The detection-based or indirect algorithms track abstracted objects, patterns, bounding boxes, or clusters [10], [11] by applying different filters such as variants of the Kalman filter or particle filter. Therefore, the tracking performance of these methods relies on both the classification algorithms (or pattern recognition) and the filter structure [9]. A great number of studies have aimed to improve the object tracking task by developing enhanced classification/clustering steps (before tracking) using learning-based [12], [13] or geometric model estimation [1] algorithms, but all are still classified under the first category of DATMO methods. Other studies in this category, such as [14], focus on reducing the computational complexity by applying the classification and subsequent tracking only to the moving points. In a detection-based approach the detection and tracking steps are independent, and various sensor data can be used in the detection algorithm without changing the tracking part.

B. Direct Tracking

In the direct tracking methods, the sensor model and/or the object's geometric model is used to estimate corresponding points in space without prior detection. These methods can be further divided into model-based and model-free (point-based) approaches.

In model-based direct tracking DATMO algorithms, prior knowledge about the geometric shape and dynamic model of the moving vehicles is used to track the states and the geometric shape of the objects [15], [16] without detecting the objects first [9], [18]. Tracking the geometry helps to predict the dynamic properties with higher precision and to discard tracked objects with implausible shapes or geometry changes. However, the tracking accuracy declines for moving objects with different shapes and geometries, such as cyclists or pedestrians.

The second subclass of the direct tracking approach (point-based tracking) is geometric-model-free: every point is tracked in consecutive LiDAR scans. These scanned point clouds can be used directly or represented in the form of a 2D/3D grid space before being used in grid-based tracking algorithms [19]. The key advantage of point-based DATMO stems from the fact that no assumption is made about the geometric shape of the object, and the objects are classified/detected by tracking the corresponding scanned points on them. However, tracking all points makes the computation expensive and limits the method in terms of the maximum number of moving objects in a scene [20]. To overcome this challenge, before tracking, the scanned points are divided into static and moving categories by generating a static obstacle map (SOM) [21], or objects of interest are filtered with the help of deep learning methods [27].

Point cloud registration (PCR) algorithms are widely used in model-free DATMO methods. After clustering the point clouds in consecutive scans, corresponding clusters are detected and PCR algorithms such as iterative closest point (ICP) [32] are applied to each set (two clusters from the same object at different time steps) to calculate precise relative motion [21]–[26]. Although a low standard deviation of error has been reported for tracking velocity (0.4 m/sec) and orientation (1.81 deg) of moving objects [21], these methods suffer from a number of considerable drawbacks. First of all, the computational time is not deterministic and depends on the number of moving objects. Secondly, the performance of the ICP algorithm highly depends on the initial conditions, and it deteriorates when the relative velocity of the moving objects (with respect to the EV) increases. Finally, because the PCR algorithms are based on an iterative optimization process, parallelizing these algorithms is not simple and straightforward. The various methods reviewed in this section are summarized along with their advantages/disadvantages in Table I.
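To make the registration step concrete, the following is a minimal 2D point-to-point ICP sketch; it is our own illustration, not the implementation used in the cited works. Each iteration re-matches nearest neighbours (the expensive part) and then solves the optimal rigid transform in closed form, which shows both why the runtime grows with the iteration count and why a poor initial alignment can lead the matching astray:

```python
import numpy as np

def icp_2d(src, dst, iters=20):
    """Minimal point-to-point ICP in 2D (illustrative sketch). Each
    iteration matches every source point to its nearest destination
    point, then solves the best rigid transform in closed form
    (Kabsch/Procrustes)."""
    R, t = np.eye(2), np.zeros(2)
    cur = src.copy()
    for _ in range(iters):
        # brute-force nearest-neighbour correspondences: O(N*M) per iteration
        d = np.linalg.norm(cur[:, None, :] - dst[None, :, :], axis=2)
        matched = dst[np.argmin(d, axis=1)]
        # closed-form rigid alignment of cur onto matched
        mc, mm = cur.mean(axis=0), matched.mean(axis=0)
        H = (cur - mc).T @ (matched - mm)
        U, _, Vt = np.linalg.svd(H)
        Ri = Vt.T @ U.T
        if np.linalg.det(Ri) < 0:  # guard against reflections
            Vt[-1] *= -1
            Ri = Vt.T @ U.T
        ti = mm - Ri @ mc
        cur = cur @ Ri.T + ti
        R, t = Ri @ R, Ri @ t + ti  # accumulate the total transform
    return R, t
```

With a small initial misalignment the nearest-neighbour matches are correct from the first iteration and the method converges immediately; with a large inter-scan displacement the initial matches are wrong, which is the failure mode discussed above.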
Fig. 6. Expanded schematic system diagram of optical flow based DATMO for AVs. NOTE: in the system diagrams used in this paper, the signals are expanded (rectangles with dashed-line frames) to illustrate the data carried between processing blocks. MMR stands for memory.
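The pipeline estimates a per-cell velocity by comparing two consecutive motion grids. As a rough illustration of the idea only, the following numpy sketch recovers an integer displacement field by brute-force block matching; the actual system uses the Farneback optical flow algorithm, and the grid contents and sizes here are made up:

```python
import numpy as np

def dense_flow(prev_grid, next_grid, cell=8, search=3):
    """Toy dense flow: for each cell x cell block of prev_grid, find the
    integer (dy, dx) shift within +/-search that best matches next_grid
    (sum of squared differences). Returns one displacement per cell."""
    h, w = prev_grid.shape
    flow = np.zeros((h // cell, w // cell, 2))
    for i in range(h // cell):
        for j in range(w // cell):
            y, x = i * cell, j * cell
            block = prev_grid[y:y + cell, x:x + cell]
            best_err, best_d = np.inf, (0.0, 0.0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + cell > h or xx + cell > w:
                        continue  # candidate window falls off the grid
                    cand = next_grid[yy:yy + cell, xx:xx + cell]
                    err = np.sum((block - cand) ** 2)
                    if err < best_err:
                        best_err, best_d = err, (dy, dx)
            flow[i, j] = best_d
    return flow
```

Dividing the per-cell displacement by the scan period would turn this field into the grid of velocity vectors that the mask filters and clustering stages then operate on.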
Ẋn = f(Xn, U),  U = [v, ω]ᵀ:

  ẋn = vn cos θn − v + ln ω
  ẏn = vn sin θn − ln ω
  θ̇n = ωn − ω                    (7)
  v̇n = ka
  ω̇n = kα

Fig. 7. Tracking process system diagram.
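The cluster-to-track assignment used by the tracking process relies on a 4D feature vector [xm, ym, λ1, λ2] (mean position plus the eigenvalues of the cluster covariance) and a distance threshold γ. A minimal sketch of that idea, with illustrative names and threshold values:

```python
import numpy as np

def cluster_feature(points):
    """4D feature [x_mean, y_mean, lam1, lam2]: mean position plus the
    eigenvalues of the cluster's 2D covariance. The eigenvalues describe
    the cluster's shape independently of its orientation."""
    mean = points.mean(axis=0)
    lam = np.sort(np.linalg.eigvalsh(np.cov(points.T)))[::-1]
    return np.concatenate([mean, lam])

def assign_to_track(points, track_features, gamma):
    """Nearest-track assignment: the cluster joins the closest predicted
    track in feature space if the distance is below the threshold gamma;
    otherwise it should seed a new track (returns None)."""
    f = cluster_feature(points)
    d = np.linalg.norm(np.asarray(track_features) - f, axis=1)
    j = int(np.argmin(d))
    return j if d[j] < gamma else None
```

Because λ1 and λ2 are eigenvalues of the covariance matrix, rotating a cluster changes its mean position but leaves the two shape components unchanged.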
1) Measurements in EKF and Updating Tracks: In the Kalman filter structure, three measurements are used for each moving object, i.e., two linear velocities and one angular velocity in the z direction. As illustrated in Fig. 7, these measurements are calculated by clustering the masked velocity vector fields {v̄x, v̄y, ω̄} provided by the optical flow and taking the mean value of each cluster. In the proposed approach, Euclidean distance is utilized for clustering the vectors, and the mean position and velocity are fed into the EKF algorithm to estimate the state vector of each moving object using the motion dynamics of Eq. 7.

All clustered points should be either assigned to an existing track or initialised as a new track. Similar to [21], in our approach, the clusters are assigned to the predicted tracks via GNN (global nearest neighbour). Each cluster is assigned to at most one track based on a 4D feature vector [xm, ym, λ1, λ2] containing the mean position and shape of the cluster (independent of its orientation) in the motion plane. The two components of the feature vector describing the shape of a cluster are the eigenvalues of the covariance matrix of the points in the cluster (λ1, λ2). A cluster is assigned to a track if the Euclidean distance between their feature vectors is less than a threshold γ.

The final step in managing the tracks is confirming and/or deleting tracks. Each of these two procedures is governed by a 2D integer vector. A track is confirmed when M1 measurements/detections are assigned to it in the last N1 updates (M1 < N1). Similarly, a confirmed track is deleted if, in the last N2 consecutive updates, no measurement is assigned to it M2 times (M2 < N2). It should be noted that the coordinate system used in this section is attached to the EV with the configuration shown in Fig. 8.

The interaction between the different processes is depicted in the assembled system diagram of Fig. 6. This system diagram is a detailed version of Fig. 1.

V. PERFORMANCE EVALUATION

An experimental test is designed to evaluate and verify the performance of the designed DATMO algorithm. Two main objectives are targeted in this section: comparing the proposed DATMO with state-of-the-art (SOTA) methods, and obtaining an error model for the estimation accuracy.

First and foremost, we statistically compare the performance of the DATMO method with the SOTA geometric model-free approach (GMFA) developed in [21], which has been proven to be more efficient than the geometric model-based tracking (MBT) method proposed in [37]. The proposed method is further compared against SOTA model-free [14] and model-based [18] direct tracking methods. The GMFA algorithm is regenerated and evaluated, while the quantitative performance of the other methods is obtained from [14] and [18]; therefore, the latter experiments are consistent with those specified in these studies.

In addition, we investigate how the state estimation error changes as a function of the driving environment, i.e., the configuration of the EV and TV. Regarding these objectives, the experimental evaluation is conducted in two steps. Initially, synthetic data is generated to evaluate the algorithm in various custom situations; in the next step, the algorithm is tested on the real-world KITTI dataset.

Fig. 8. Target vehicle's (TV) configuration (position, orientation, and velocity) relative to the ego vehicle (EV)

A. Datasets

1) Synthetic Data Generation and Simulation: In order to evaluate the proposed method for estimating the state of the target vehicles in diverse possible configurations, generating a synthetic dataset is essential. In addition, the estimation error is calculated more accurately in simulation than with real-world datasets such as KITTI, for which the ground truth of the objects' velocity is not provided directly. In this study, the TV's configuration space is defined by three variables (Fig. 8): distance to the EV (ln), relative orientation (βn), and relative velocity (∆υn = υn − υ). The aim is to design scenarios covering all possible configurations, both to investigate the meaningful relations between the estimation error and these three variables and to assess the estimation accuracy. The flexibility in changing different configurations provided by synthetic datasets is another reason that justifies utilising this type of dataset.

The driving scenario designer toolbox in MATLAB is used to generate synthetic scenarios and add a LiDAR sensor to collect point cloud data. As illustrated in Fig. 9-(b), three different types of TVs are simulated in the synthetic scenarios: sedan, van, and cyclist. Moreover, to account fully for the effect of the EV's motion, the EV's path is given nonzero curvature to avoid a zero yaw rate. The LiDAR sensor parameters are adjusted to match the sensor used to collect the KITTI dataset. The point cloud data from a simulated scene is plotted in Fig. 9-(c).

In order to cover all possible cases of the n-th target vehicle's configuration ({ln, βn, ∆υn}), each scenario contains a target vehicle (sedan, van, or cyclist) moving on the same multi-lane road in which the EV moves in one of the lanes (with a speed of 20 m/s). TVs move with 10 different speeds (10 to 40 with a
Fig. 9. Synthetic primary scenario generated in the MATLAB scenario designer: trajectory design parameters for different TVs (a), a 3D meshed object in the scene (b), and bird's eye view of the scanned point cloud excluding ground points in the EV coordinate system (c)
Fig. 10. Proposed DATMO algorithm results for the KITTI dataset: tracking IDs and velocity vectors plotted on the front camera image (left), and bird's eye view of the scanned point cloud excluding ground points (right)
step of 2 m/s) in a lane and drive in two modes: first, keeping the same lane; and second, overtaking back and forth between two lanes with trajectories defined by the two parameters s and n = w/2 shown in Fig. 9-(a). In the case of changing lanes/overtaking, two values of 2 and 4 seconds are used for s (assuming constant speed). Finally, the lateral offset of the TV's start lane from the EV's lane varies from -80 m to 80 m (with a step of 1 m). The cyclists' trajectories include only lane-keeping, i.e., without any lateral motion. There is only one TV in each generated scenario to prevent occlusion; although three TVs are depicted in Fig. 9, that figure combines three scenarios to be more informative.

We refer to all the synthetic scenarios described above as primary scenarios. To further compare the performance metrics against the model-free [14] and model-based [18] indirect tracking methods, the simulation scenarios designed in [14] are replicated (secondary scenarios). In these scenarios the TV moves with a speed of 6 m/sec along i) straight right-angled, ii) right turn, and iii) circular paths (see [14] for details).

2) KITTI Dataset: The final evaluation is conducted using the real-world KITTI tracking dataset for the multi-object tracking task. Besides the ground truth labels, only the LiDAR data from this dataset is used in the current study for the estimation task; however, the colour images of each frame are also used to plot the velocity vectors in image coordinates (Fig. 10-left) using the velo-to-cam transformation matrix. Moreover, since there is no ground truth label for the velocity of objects in the driving environment, it is obtained by tracking the centre of the 3D bounding boxes. In the KITTI dataset, the bounding box coordinates are provided in the camera frame whereas the estimated velocity values are obtained in the LiDAR coordinate system. Therefore, the calculated velocities
Pr = TP/(TP + FP)
Re = TP/(TP + FN)        (9)
The last metric to quantify the estimation performance is
the time each algorithm takes to process an instance of the
LiDAR scan to detect and estimate the state of the moving
objects.
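The detection metrics of Eq. (9) are straightforward to compute from the confusion counts; a one-line helper for reference:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Eq. (9): precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)
```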
C. Results
The evaluation results are divided into three sections. Firstly, a stochastic comparative analysis is conducted with the model-free direct tracking methods in [21] (GMFA) and [27]. Secondly, simulation results compare the proposed method with the model-free and model-based indirect tracking methods developed

Fig. 12. Comparative error distribution of the proposed and GMFA [21] algorithms for the primary synthetic and KITTI datasets
TABLE II
EXPERIMENTAL EVALUATION RESULTS OF THE PROPOSED METHOD AND GMFA [21] FOR BOTH SYNTHETIC SIMULATION AND KITTI DATASET
TABLE III
COMPARING TO OTHER MODEL-FREE AND LEARNING-BASED MOTION ESTIMATION METHODS

Ref              | Speed Error [m/sec] | time [ms] | Training needed
Wang et al. [38] | 1.69                | 80        | Yes
Liu et al. [39]  | 4.37                | -         | Yes
Li et al. [27]   | 0.42                | 240       | Partially
Proposed         | 0.44                | 142       | No

TABLE IV
COMPARING THE MEAN AND MAX ESTIMATION ERRORS AGAINST MODEL-FREE [14] AND MODEL-BASED [18] INDIRECT TRACKING METHODS. RESULTS OBTAINED FROM THE SECONDARY SCENARIOS IN [14].

Secondary scenario | Method            | Speed [m/sec] mean | max  | Direction [deg] mean | max
i                  | Wang et al. [14]  | 0.25 | 0.43 | 0.52 | 1.49
i                  | Zhang et al. [18] | 0.38 | 0.59 | 1.23 | 1.93
i                  | Proposed          | 0.09 | 0.31 | 0.18 | 0.51
ii                 | Wang et al. [14]  | 0.40 | 0.70 | 0.80 | 2.53
ii                 | Zhang et al. [18] | 0.52 | 1.00 | 1.53 | 4.02
ii                 | Proposed          | 0.21 | 0.47 | 0.83 | 1.70
iii                | Wang et al. [14]  | 0.29 | 0.40 | 1.65 | 2.50
iii                | Zhang et al. [18] | 0.43 | 0.84 | 2.25 | 5.09
iii                | Proposed          | 0.19 | 0.44 | 0.41 | 1.82

The error distribution of the two methods is illustrated for the KITTI and synthetic datasets separately in the left and right columns of Fig. 12, respectively. Furthermore, a normal distribution function has been fitted to each distribution, with the standard deviation value printed in the top left for both methods using the same colour codes. Similar to [21], the standard deviation values are used to compare the accuracy of the DATMO methods.

Finally, the detection and estimation results of the two methods on the two datasets are summarized in Table II. The precision and recall metrics evaluate moving object detection, while the standard deviation and time columns report the state estimation accuracy and computational cost, respectively. The results are further reported for two different ranges of relative velocity (∥∆υ∥ ≤ 1 and ∥∆υ∥ > 1 m/s), because the GMFA method of [21] is developed for detecting and tracking moving objects with "low relative speed". Therefore, in order to check whether the GMFA method has been replicated properly, the estimation errors for low relative speeds (∥∆υ∥ ≤ 1 m/s) should be less than those reported in [21]. It should be noted that, since there is no exact velocity ground truth label for the KITTI tracking dataset, the calculated error for this dataset, even for low speeds, is not directly comparable with the values reported in [21]; we therefore use the replicated GMFA algorithm only to compare the final estimation error with the proposed approach's performance. Moreover, the processing time reported in this table is the average time the computing unit (Intel Core(TM) i7-7600 CPU @ 2.80GHz) needs for each cycle, excluding the first step of each sequence, which needs extra initialization time. The breakdown of computational complexity for the different processes within the framework is given in Table VI. Since we believe that the core optical flow process in the proposed method can be parallelized using off-the-shelf tools, this process was implemented on both CPU and GPU (GeForce RTX 2080 Ti) for the simulation scenarios. The results indicate an 80% improvement in processing time for the GPU compared to the CPU.

The estimation error and computational complexity comparison with other model-free and learning-based motion estimation methods is summarised in Table III. The performance metrics of the other methods and the evaluation conditions are adopted from [27]. The results are based on the KITTI tracking dataset, sequences 0000, 0005, and 0010; objects within a radius of 50 m are considered. The results in Table III include other learning-based motion estimation methods as additional references. The comparison suggests that although our method's performance in terms of speed accuracy is comparable to that of [27], there is a significant improvement in computational complexity. This is attributed to the iterative nature of the ICP method used in [27]. Moreover, all the other methods in this table are data-driven and will need retraining for different situations or sensor configurations; otherwise, their performance may decline. Our method, in contrast, is deterministic and does not need training.

2) Comparison with Indirect Tracking Methods: Table IV
only one moving object, and the GMFA is based on point cloud registration, for which the processing time depends on the number of detected moving point clusters, the computational effort is more consistent and lower than for the KITTI scenarios, in which there is more than one moving object in most sequences. All computations in this study were done on a CPU without parallel processing, whereas the proposed method, being based on an optical flow algorithm, has the potential to be implemented on a GPU to accelerate the computations. This is another advantage of this approach over point cloud registration-based methods such as GMFA, which rely on optimization and are therefore more difficult to parallelize. Contemporary GPUs dedicate hardware to accelerating optical flow algorithms, making them up to 10 times faster [40]. Therefore, using parallel computation will accelerate the processing even more for the proposed approach.

Overall, the comparison results indicate that the proposed method's performance in state estimation and computational cost is comparable with the state-of-the-art method (GMFA). As the last part of analysing the results, the error sensitivity of the proposed method is considered. The estimation error sensitivity to the configuration of the TV, illustrated in Fig. 13, shows that the error magnitude is more sensitive to the orientation of the TV when the target vehicle is located at farther distances (ln > 45 m). The way the error value changes with respect to the orientation of the TV (βn) is also interesting. The error increases at three specific orientations: βn = 0, 90, 180 deg. Regarding Fig. 8, the first (βn = 0 deg) and last (βn = 180 deg) orientations correspond to configurations in which the TV faces, or backs onto, the EV, whereas the second orientation (βn = 90 deg) is the case in which the TV's side is toward the EV, i.e., the LiDAR sensor location. One possible reason for this correlation is that in these configurations the scanned point cloud is no longer scattered in 3D space but lies mostly on a 2D plane. For instance, at βn = 90 deg most of the scanned points are from the side of the vehicle. However, to elaborate further on the error model and consider all involved factors, more research is required in future studies.

VII. CONCLUSION

In this study, a novel DATMO technique was proposed using the Farneback optical flow algorithm. This study revealed the promising potential of this approach in terms of accuracy and processing costs. Similar to traditional GMFA techniques, the optical-flow-based technique proposed and studied in this paper demonstrated good resilience against variations of object sizes in driving scenes. Analysis of the error sensitivity to the configuration of the target vehicle revealed meaningful correlations which could be used in future for error modelling. Our results showed that the error values increase when the TV moves in radial (βn = 0, 180 deg) and tangential (βn = 90 deg) directions at distances farther than 50 m. It should be noted that small objects such as pedestrians were not covered in our study. Further studies could explore estimating the state of pedestrians by reducing the grid size and implementing the algorithm using parallel computing to calculate optical flow.

ACKNOWLEDGMENT

This research is sponsored by the Centre for Doctoral Training to Advance the Deployment of Future Mobility Technologies (CDT FMT) at the University of Warwick.

Fig. 14. Velocity of points on moving rigid body (vehicle)

APPENDIX
DERIVING ANGULAR VELOCITY FROM VELOCITY VECTOR FIELD

Assuming rigid body motion, the angular velocity can be obtained from the velocity vector field. If {î, ĵ, k̂} are the unit vectors in {x, y, z}, respectively, and considering the notation used in Fig. 14, the angular velocity for planar motion is derived as follows:

v = vc + ω × r
  = vc + ω × (R − Rc)
  = (vc − ω × Rc) + ω × R

Rewriting this equation by substituting R = xî + yĵ and Vc = vc − ω × Rc = Vcx î + Vcy ĵ:

v = Vc − ωy î + ωx ĵ
  = (Vcx − ωy) î + (Vcy + ωx) ĵ

By applying the curl operator to both sides, the angular velocity is obtained from the curl of the vector field v. It should be noted that the rigid body assumption makes the curl independent of the position and linear velocity of the centre c:

∇ × v = [∂(Vcy + ωx)/∂x − ∂(Vcx − ωy)/∂y] k̂
      = 2ω k̂
⇒ ω = 0.5 (∇ × v)

REFERENCES

[1] M. Sualeh and G.-W. Kim, "Dynamic multi-lidar based multiple object detection and tracking," Sensors, vol. 19, no. 6, p. 1474, 2019.
[2] M. Y. Abbass, K.-C. Kwon, N. Kim, S. A. Abdelwahab, F. E. A. El-Samie, and A. A. Khalaf, "A survey on online learning for visual tracking," The Visual Computer, vol. 37, no. 5, pp. 993–1014, 2021.
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS 13
[3] C. Premachandra, S. Ueda, and Y. Suzuki, "Detection and tracking of moving objects at road intersections using a 360-degree camera for driver assistance and automated driving," IEEE Access, vol. 8, pp. 135652–135660, 2020.
[4] S. Sivaraman and M. M. Trivedi, "Looking at vehicles on the road: A survey of vision-based vehicle detection, tracking, and behavior analysis," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 4, pp. 1773–1795, 2013.
[5] M. Kusenbach, M. Himmelsbach, and H.-J. Wuensche, "A new geometric 3d lidar feature for model creation and classification of moving objects," in 2016 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2016, pp. 272–278.
[6] A. Börcs, B. Nagy, and C. Benedek, "Instant object detection in lidar point clouds," IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 7, pp. 992–996, 2017.
[7] S. Steyer, G. Tanzmeister, and D. Wollherr, "Grid-based environment estimation using evidential mapping and particle tracking," IEEE Transactions on Intelligent Vehicles, vol. 3, no. 3, pp. 384–396, 2018.
[8] M. C. Hutchison, J. A. Pautler, and M. A. Smith, "Traffic light signal system using radar-based target detection and tracking," U.S. Patent 7,821,422, Oct. 26, 2010.
[9] Y. Ye, L. Fu, and B. Li, "Object detection and tracking using multi-layer laser for autonomous urban driving," in 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2016, pp. 259–264.
[10] L. Spinello, R. Triebel, and R. Siegwart, "Multiclass multimodal detection and tracking in urban environments," The International Journal of Robotics Research, vol. 29, no. 12, pp. 1498–1515, 2010.
[11] B. Douillard, D. Fox, F. Ramos et al., "Laser and vision based outdoor object mapping," in Robotics: Science and Systems, vol. 8, 2008.
[12] M. Himmelsbach, A. Mueller, T. Lüttel, and H.-J. Wünsche, "Lidar-based 3d object perception," in Proceedings of 1st International Workshop on Cognition for Technical Systems, vol. 1, 2008.
[13] S. Shi, C. Guo, L. Jiang, Z. Wang, J. Shi, X. Wang, and H. Li, "Pv-rcnn: Point-voxel feature set abstraction for 3d object detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10529–10538.
[14] H. Wang and B. Liu, "Detection and tracking dynamic vehicles for autonomous driving based on 2-d point scans," IEEE Systems Journal, 2022.
[15] A. Petrovskaya and S. Thrun, "Model based vehicle detection and tracking for autonomous urban driving," Autonomous Robots, vol. 26, no. 2, pp. 123–139, 2009.
[16] J. An and E. Kim, "Novel vehicle bounding box tracking using a low-end 3d laser scanner," IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 6, pp. 3403–3419, 2020.
[17] S. Steyer, C. Lenk, D. Kellner, G. Tanzmeister, and D. Wollherr, "Grid-based object tracking with nonlinear dynamic state and shape estimation," IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 7, pp. 2874–2893, 2019.
[18] X. Zhang, W. Xu, C. Dong, and J. M. Dolan, "Efficient l-shape fitting for vehicle detection using laser scanners," in 2017 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2017, pp. 54–59.
[19] A. Asvadi, P. Peixoto, and U. Nunes, "Detection and tracking of moving objects using 2.5d motion grids," in 2015 IEEE 18th International Conference on Intelligent Transportation Systems. IEEE, 2015, pp. 788–793.
[20] R. Kaestner, J. Maye, Y. Pilat, and R. Siegwart, "Generative object detection and tracking in 3d range data," in 2012 IEEE International Conference on Robotics and Automation. IEEE, 2012, pp. 3075–3081.
[21] H. Lee, J. Yoon, Y. Jeong, and K. Yi, "Moving object detection and tracking based on interaction of static obstacle map and geometric model-free approach for urban autonomous driving," IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 6, pp. 3275–3284, 2020.
[22] H. Lee, H. Lee, D. Shin, and K. Yi, "Moving objects tracking based on geometric model-free approach with particle filter using automotive lidar," IEEE Transactions on Intelligent Transportation Systems, 2022.
[23] F. Moosmann and C. Stiller, "Joint self-localization and tracking of generic objects in 3d range data," in 2013 IEEE International Conference on Robotics and Automation. IEEE, 2013, pp. 1146–1152.
[24] A. Dewan, T. Caselitz, G. D. Tipaldi, and W. Burgard, "Motion-based detection and tracking in 3d lidar scans," in 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 4508–4513.
[25] J. Groß, A. Ošep, and B. Leibe, "Alignnet-3d: Fast point cloud registration of partially observed objects," in 2019 International Conference on 3D Vision (3DV). IEEE, 2019, pp. 623–632.
[26] J. Kim, H. Lee, and K. Yi, "Online static probability map and odometry estimation using automotive lidar for urban autonomous driving," in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC). IEEE, 2021, pp. 2674–2681.
[27] J. Li, X. Huang, and J. Zhan, "High-precision motion detection and tracking based on point cloud registration and radius search," IEEE Transactions on Intelligent Transportation Systems, 2023.
[28] E. Arnold, S. Mozaffari, and M. Dianati, "Fast and robust registration of partially overlapping point clouds," IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 1502–1509, 2021.
[29] G. Farnebäck, "Two-frame motion estimation based on polynomial expansion," in Scandinavian Conference on Image Analysis. Springer, 2003, pp. 363–370.
[30] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," in Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[31] A. Petrovskaya, M. Perrollaz, L. Oliveira, L. Spinello, R. Triebel, A. Makris, J.-D. Yoder, C. Laugier, U. Nunes, and P. Bessière, "Awareness of road scene participants for autonomous driving," Handbook of Intelligent Vehicles, pp. 1383–1432, 2012.
[32] P. J. Besl and N. D. McKay, "Method for registration of 3-d shapes," in Sensor Fusion IV: Control Paradigms and Data Structures, vol. 1611. SPIE, 1992, pp. 586–606.
[33] D. Fortun, P. Bouthemy, and C. Kervrann, "Optical flow modeling and computation: A survey," Computer Vision and Image Understanding, vol. 134, pp. 1–21, 2015.
[34] J. Tanaś and A. Kotyra, "Comparison of optical flow algorithms performance on flame image sequences," in Photonics Applications in Astronomy, Communications, Industry, and High Energy Physics Experiments 2017, vol. 10445. SPIE, 2017, pp. 243–249.
[35] K. M. Urwin, Advanced Calculus and Vector Field Theory. Elsevier, 2014.
[36] J. Casey, "A treatment of rigid body dynamics," Journal of Applied Mechanics, vol. 50, pp. 905–907, 1983.
[37] H. Cho, Y.-W. Seo, B. V. Kumar, and R. R. Rajkumar, "A multi-sensor fusion system for moving object detection and tracking in urban driving environments," in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 1836–1843.
[38] Q. Wang, J. Chen, J. Deng, X. Zhang, and K. Zhang, "Simultaneous pose estimation and velocity estimation of an ego vehicle and moving obstacles using lidar information only," IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 8, pp. 12121–12132, 2021.
[39] X. Liu, C. R. Qi, and L. J. Guibas, "Flownet3d: Learning scene flow in 3d point clouds," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 529–537.
[40] A. Medhekar, V. Chiluka, and A. Patait, "Accelerate OpenCV: Optical flow algorithms with NVIDIA Turing GPUs," https://developer.nvidia.com/blog/opencv-optical-flow-algorithms-with-nvidia-turing-gpus/, accessed: 2019-12-05.

Mohammadreza Alipour Sormoli received the M.Sc. degree from the Amirkabir University of Technology (Tehran Polytechnic) in 2017. He worked as a research assistant at Koc University and is currently working toward the PhD degree in autonomous driving technology at the University of Warwick (WMG). His research interests include robotics, mechatronics, control and dynamics of autonomous systems.