
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3439702

Multi-Object Tracking Algorithm for Unmanned Vehicle Autonomous Driving Scene Based on Online Spatiotemporal Feature Correlation

Haijun Li¹, Zhuye Xu²,³, Changxi Ma¹, Xiao Tang²

¹School of Transportation, Lanzhou Jiaotong University, Lanzhou 730070, China
²School of New Energy and Power Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
³National Engineering Research Center of Highway Maintenance Technology, Changsha University of Science & Technology, Changsha 410114, Hunan, China

Corresponding author: Zhuye Xu (e-mail: [email protected]).


This work was supported in part by the Natural Science Foundation of Gansu Province (Grant No.22JR5RA343); Open Fund of National
Engineering Research Center of Highway Maintenance Technology (Grant No.kfj220108); Gansu Provincial Science and Technology Plan
(Key R&D Plan-Industrial) (Grant No.22YF7GA142).

ABSTRACT In autonomous driving of unmanned vehicles, it is difficult for a multi-object tracking algorithm to design an accurate object feature model and data association algorithm. To address this problem, a multi-object tracking algorithm based on online spatiotemporal feature correlation for unmanned vehicle autonomous driving scenes (MOTA-BOSFCFUVADS) is proposed. Firstly, the algorithm performs object detection on the training samples, calibrates the coordinates of the detection results in the time dimension and in the space dimension, eliminates detection results whose confidence is below the set value, and eliminates overlapping bounding boxes through non-maximum suppression. Secondly, a Kalman filter predicts the position of each tracked object in the current frame. Feature models of the object are then built in the time dimension and the space dimension respectively, and the temporal and spatial feature models are fused to obtain the spatiotemporal feature model of the tracked object. Finally, the spatiotemporal feature response of the object in the current frame is detected online and correlated with the spatiotemporal feature model of the tracked object, the fused similarity metric matching matrix is calculated, and the Hungarian algorithm is used to solve for the optimal association pairs between the historical trajectories of the objects and the detection responses, after which the parameters of the spatiotemporal feature model are updated. In addition, we use the MOT2015 database to test the effectiveness of the algorithm. The results show that the proposed algorithm has better tracking performance than the two other algorithms compared, and can effectively track multiple objects continuously in time and space.

KEYWORDS Autonomous driving of unmanned vehicles, Multi-object tracking, Spatiotemporal feature model, Data association

I. INTRODUCTION
In recent years, with the rapid development of artificial intelligence[1], many countries have begun to vigorously develop and build smart cities and intelligent transportation, resulting in a substantial increase in the amount of public transportation surveillance video data[2]. It is essential to analyze and use traffic video data effectively to extract valuable information that helps optimize urban traffic and improve public safety[3]. As a key algorithm of artificial intelligence, the multi-object tracking algorithm is at the core of research in image and video processing. It has been widely used in scenarios such as intelligent transportation[4], video behavior analysis[5], aerospace[6], intelligent robots[7], and automatic driving[8]. In the application scenarios of the multi-object tracking algorithm, the trajectories of the tracked objects change frequently. Furthermore, new objects may enter the scene at any time, existing objects may leave it, the similarity between tracked targets is very high, and occlusion occurs between targets during tracking. Therefore, the data association between the object detections of the current frame in the video surveillance data and the existing object motion trajectories has become the focus and difficulty of the multi-object tracking algorithm.
The multi-object tracking algorithm can be divided into the multi-object tracking algorithm based on offline data

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4


association and the multi-object tracking algorithm based on online data association, according to the data association method. Multi-object tracking algorithms based on offline data association perform global optimization over all target detection responses in all frames, and can effectively deal with problems such as object loss caused by occlusion, false detections by the detector, and false identity switches. The other class, tracking based on online data association, has become the mainstream research direction of multi-object tracking[9].
The multi-object tracking algorithm based on online data association mainly adopts the tracking-by-detection framework. Its basic idea is to use a target detector to detect the targets of interest in each frame of the video. Each detected target is then characterized by a feature model, the feature similarity between the target feature model and the detection features is calculated, and the final cost matrix is solved according to the similarity matrix. This determines the optimal association between the historical trajectory of each target and the detection responses of the current frame, so as to complete the multi-object tracking task.
In the process of online data association to determine the target trajectory, many scholars combine the target detection response with filtering technology: Kalman filtering[10] is used to solve the problem of object position prediction and estimation and to assist in screening the correct detection response as the associated object, so as to achieve detection and tracking. Kalman filtering combines previous states with observed data to generate the optimal estimate of the current state of the system. In target tracking scenarios, Kalman filtering can be used to predict the next position of the target and adjust this prediction based on new observation data, thereby tracking the target more accurately. The Hungarian algorithm is mainly used to find the optimal association between the historical trajectory of each tracked target and the detection responses, thereby improving tracking accuracy while preserving the overall performance of the algorithm. Wang[11] et al. designed an object detection and tracking method that combines Kalman filtering and the Hungarian algorithm. Firstly, they detect moving objects and label them adaptively by differencing adjacent frames. Then an independent Kalman filter is assigned to each labeled object to predict the future position of the target. On this basis, applying the Hungarian algorithm to solve the data association problem improves both the accuracy of target location prediction and the efficiency of target detection and data association. Weng[12] et al. directly combined a three-dimensional Kalman filter with the Hungarian algorithm for object state estimation and data association. By utilizing the advantages of Kalman filters in state estimation and the efficiency of the Hungarian algorithm in data association, the accuracy and efficiency of multi-object tracking were greatly improved. To comprehensively evaluate the effectiveness of 3D multi-target tracking, they also proposed a new 3D multi-target tracking evaluation tool that can more accurately measure the performance improvement brought by the combination of the 3D Kalman filter and the Hungarian algorithm. On the basis of Kalman filtering, Zhao[13] et al. combined optical flow histogram features and Local Binary Patterns (LBP) to complete multi-object tracking tasks accurately and in real time. Using the optical flow histogram, Mohamed[14] et al. established a correction model of the detector and the tracker in the framework of particle filtering, regarded each detection area as a sample, and used the frame-by-frame data association between the detection response and the tracker to realize online multi-level detection. In addition, Avidan[15] proposed a tracking algorithm based on the Support Vector Machine (SVM), which applies the SVM classifier to an optical flow tracker: a pre-trained SVM detects the vehicles in the video sequence, while the optical flow equation is used to calculate the offset between two adjacent frames, achieving continuous tracking of multiple vehicles. Naiel[16] et al. proposed an online real-time multi-object tracking algorithm. This method presents a collaborative model between a pre-trained object detector and a number of single-object online trackers within the particle filtering framework, and effectively updates the appearance model of each tracker. Yang[17-19] et al. proposed an exchange object context model, which makes full use of the context information of objects between two adjacent frames and then uses a novel color histogram descriptor to calculate the similarity and background smoothness between objects in two adjacent frames, in order to associate the objects detected in the two frames. Bae[20] et al. proposed a multi-object tracking method based on trajectory confidence coefficients and online discriminative appearance learning, and established object trajectory segments and trajectory confidence coefficients. According to the value of the confidence coefficient, an object trajectory segment is associated hierarchically either with detection responses or with other trajectory segments, which can effectively reduce the tracking error caused by complex situations such as background changes and frequent occlusions.
The above literature calculates similarity based on manually designed visual features and motion features obtained by filtering algorithms, and then associates newly detected objects with existing trajectories in the time dimension. Although this approach is effective to a degree, it does not consider the spatial characteristics of the tracked target, so it cannot directly obtain the trajectory and number of targets. Spatial information is nevertheless very important. Spatial information can represent the


orientation of a tracked object in the physical world, defining attributes such as the position and size of the target. On this basis, adding temporal information makes it possible to better understand and predict the behavior of the target. Considering the temporal and spatial characteristics of tracked targets together, rather than relying solely on visual features and filtering algorithms, can therefore improve the effectiveness of multi-target tracking. By associating the temporal and spatial features of the tracked target, eliminating redundant tracked targets, and obtaining motion trajectories over a period of time, we can more accurately obtain the number of targets and track their movements, thus achieving multi-target tracking.
Section 2 introduces the proposed method in detail and explains it through a flow chart. Section 3 describes the source of the experimental data and analyses the results. Section 4 discusses the results further, and Section 5 concludes the paper with a summary of the method.

II. MOTA-BOSFCFUVADS
The core problem of the multi-object tracking algorithm is to design an efficient and simple target feature model and data association algorithm. A feature model is established for each target of interest in each frame, with the target features used as the similarity metric. Then, according to the data association algorithm, the same target in two adjacent frames is matched.

A. THE PROCESS OF ALGORITHM MOTA-BOSFCFUVADS
Figure 1 shows the flow of the multi-object tracking algorithm for unmanned vehicle autonomous driving scenarios based on online spatiotemporal feature association. The whole process mainly includes the data processing stage, the object feature model establishment stage, and the spatiotemporal data association stage. In the data processing stage, object detection is performed on the training samples, and the coordinates of the detection results are calibrated in the time dimension and in the space dimension. Detection results with confidence below the set value are eliminated, and overlapping bounding boxes are eliminated through non-maximum suppression. In the object feature model establishment stage, the position of each tracked object in the current frame is predicted by Kalman filtering, and the target feature model is constructed in the temporal and spatial dimensions. In the spatiotemporal data association stage, the feature response of the tracked object in the current frame is first detected online and its spatiotemporal features are extracted. The spatiotemporal feature response is then correlated with the object feature model established in the previous stage, the fused similarity metric matching matrix is calculated, and the Hungarian algorithm is used to optimize this matrix to obtain the optimal spatiotemporal data association result. When the optimal spatiotemporal data association result is less than the set threshold, the best tracking result is output. When it is greater than the set threshold, the parameters of the object spatiotemporal feature model are adjusted and updated. Thus, the object feature model can be optimally associated with the spatiotemporal features of the current frame.
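
The data processing stage described above (confidence culling followed by non-maximum suppression) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the box layout `(x1, y1, x2, y2, confidence)`, the function names `iou` and `filter_detections`, and both threshold values are assumptions chosen for the example, and greedy suppression by intersection-over-union is one common way to realize non-maximum suppression.

```python
from typing import List, Tuple

# Hypothetical box layout: (x1, y1, x2, y2, confidence)
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes: List[Box], min_conf: float = 0.5,
                      iou_thresh: float = 0.5) -> List[Box]:
    """Drop low-confidence boxes, then apply greedy non-maximum suppression."""
    # Step 1: confidence culling (the threshold value is an assumption).
    candidates = sorted((b for b in boxes if b[4] >= min_conf),
                        key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    # Step 2: keep a box only if it does not overlap a higher-scoring kept box.
    for box in candidates:
        if all(iou(box, k) <= iou_thresh for k in kept):
            kept.append(box)
    return kept


detections = [
    (10, 10, 50, 50, 0.9),
    (12, 12, 52, 52, 0.8),      # overlaps the first box heavily
    (100, 100, 140, 140, 0.7),
    (200, 200, 240, 240, 0.2),  # below the confidence threshold
]
kept = filter_detections(detections)
```

In this example the second box is suppressed by the first and the fourth is culled for low confidence, so two detections remain for the later stages.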
[Figure 1: flow chart. Start; training sample; data processing stage (object detection, invalid target culling, time and space calibration); object feature model establishment stage (Kalman filter predicts location, construct the object spatiotemporal feature model); spatiotemporal data association stage (online detection of the object sample, spatiotemporal feature extraction, data association, Hungarian algorithm optimization); threshold comparison with Yes and No branches; output tracking result; End]
FIGURE 1. The flow chart of MOTA-BOSFCFUVADS
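
The Kalman prediction step of the feature model stage, together with the circular search area that the paper places around the predicted position, can be sketched as follows. This is a hedged illustration under stated assumptions: the constant-velocity motion model, the noise values `q` and `r`, the names `AxisKalman` and `in_search_area`, the `unit_radius` scale factor, and the floor of one frame on the radius are all hypothetical choices for the example, since the paper only states that the radius is proportional to the number of frames in which the target has gone unassociated.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate axis (assumed model)."""
    pos: float
    vel: float = 0.0
    q: float = 0.01  # process noise variance (assumed value)
    r: float = 1.0   # measurement noise variance (assumed value)
    P: List[List[float]] = field(
        default_factory=lambda: [[1.0, 0.0], [0.0, 1.0]])

    def predict(self, dt: float = 1.0) -> float:
        """Propagate the state: pos += dt * vel, P = F P F^T + Q."""
        (a, b), (c, d) = self.P
        self.pos += dt * self.vel
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.pos

    def update(self, z: float) -> None:
        """Correct the state with a position measurement z."""
        (a, b), (c, d) = self.P
        s = a + self.r            # innovation variance
        k0, k1 = a / s, c / s     # Kalman gain for position and velocity
        y = z - self.pos          # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


def in_search_area(center: Tuple[float, float],
                   detection: Tuple[float, float],
                   unit_radius: float,
                   unassociated_frames: int) -> bool:
    """Circular gate: radius grows with the number of unassociated frames."""
    radius = unit_radius * max(unassociated_frames, 1)  # floor is an assumption
    dist = math.hypot(detection[0] - center[0], detection[1] - center[1])
    return dist < radius


# One filter per axis; the target moves 2 units per frame along x.
kx, ky = AxisKalman(pos=0.0, vel=2.0), AxisKalman(pos=0.0)
first_prediction = (kx.predict(), ky.predict())
kx.update(2.0)
ky.update(0.0)
second_prediction = (kx.predict(), ky.predict())
```

A detection is only considered for association with a track when `in_search_area` returns true for the predicted position, which keeps distant detections out of the similarity computation.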


B. DATA PROCESSING
During driving, the unmanned vehicle needs to detect tracked objects in advance within a safe range, so as to provide a basis for subsequent decision-making. Therefore, a circle is drawn with the unmanned vehicle as its center and the safety distance as its radius, as shown in Figure 2. The blue area in the figure represents the range beyond the safe distance. If an object falls in the blue area, it is eliminated directly. If an object falls within the safe distance, it is retained.

FIGURE 2. Unmanned vehicles detect and track object

When the unmanned vehicle detects the tracking target within a safe distance, it is assumed that the coordinates of the tracking target detected by the unmanned vehicle at time $t_1$ are $P_{t_1}$, the coordinates of the tracking target obtained at time $t_2$ are $P_{t_2}$, and the coordinates of the tracking target measured at time $t$ are $P_t$, with $t_1 < t < t_2$. In this study, we assume that the tracking target is at a safe distance at time $t_1$ and at a non-safe distance at time $t_2$, and that $t$ corresponds to the critical value of the safe distance. Therefore, we need to calculate the coordinates at time $t$ from $P_{t_1}$ and $P_{t_2}$. Through linear interpolation between $P_{t_1}$ and $P_{t_2}$, the coordinates of the tracking target at time $t$ can be obtained, as shown in formula (1):

$$P_t = P_{t_1} + \frac{t - t_1}{t_2 - t_1}\,(P_{t_2} - P_{t_1}) \tag{1}$$

Although the coordinate systems at these times are different, the origins of the coordinate systems can be regarded as the same location. The coordinates of the space-transformed tracking target at time $t$ can then be expressed by equations (2) and (3):

( ) (2)

( ) (3)

These equations involve the azimuth angle of the tracking target, the driving angle of the unmanned vehicle, and the straight-line distance of the unmanned vehicle from the tracking target, which is itself a calculated quantity. In addition, when we calculate the coordinates $P_t$ according to equation (1), we obtain the horizontal and vertical coordinates of $P_t$ respectively.

C. OBJECT SPATIOTEMPORAL FEATURE MODEL
After the tracking object is calibrated in the time and space dimensions, we use a Kalman filter to predict the spatial position of the object in each frame. The spatial position in the object feature model is transformed into a search area near the predicted position, in order to obtain the probability of the true detection response. Similar to Figure 2, we set the search area as a circle whose center is the position predicted by the Kalman filter and whose radius is proportional to the number of frames in which the object is in an unassociated state. Suppose the center position and radius of the search area are $(x_c, y_c)$ and $r$ respectively, and the coordinates of the center of the rectangular area of the detection response $d$ of the tracking object in the current frame are $(x_d, y_d)$. Then the condition for $d$ to be in the search area is:

$$\sqrt{(x_d - x_c)^2 + (y_d - y_c)^2} < r \tag{4}$$

Here $r$ is proportional to $N$, where $N$ is the number of frames in which the target is in an unassociated state, and $d$ is the detection response of the tracking object in the current frame. After the search area is determined, the spatial and temporal features of the object rectangular area in the training samples are extracted, and then the spatial and temporal feature models are constructed.
For spatial features, according to the rectangular image of the tracking object in the training sample, the discriminative projection vector is obtained in the tracking object area through the Incremental Linear Discriminative Analysis (ILDA) algorithm, and the vector is then weighted with the color histogram of the area to obtain the spatial feature. Subsequently, the spatial features of the object rectangular image in consecutive frames are collected, and the mean value of these features is used as the object spatial feature model, denoted as $S_i$.
For the temporal features, we use the Histograms of Optical Flow (HOF), which describe the magnitude and direction of the movement speed of the tracking object. Firstly, the optical flow of the tracking object area is calculated by the Horn algorithm[21], so as to obtain the optical flow magnitude and direction of each pixel in the area. Then the direction range is divided equally into direction intervals, and the cumulative optical flow magnitude over the entire area is counted in each direction interval. Finally, the resulting histogram is used to represent the HOF feature model of the target, denoted as $H_i$.
Finally, the spatial feature model $S_i$ of the rectangular image of the tracking target and the HOF feature model $H_i$ are fused to obtain the spatiotemporal feature model of the tracking target, denoted as $M_i$. For the tracking object $i$ in frame $t$, its spatiotemporal feature model can be expressed as $M_i^t = \{S_i^t, H_i^t\}$. Since the HOF feature represents the magnitude and direction of the moving speed of the tracking object, it provides additional motion information for the spatiotemporal feature model, which is beneficial for tracking multiple objects that vary greatly in space.

D. SPATIOTEMPORAL DATA ASSOCIATION


In the data association stage, we adopt the spatiotemporal data association framework. Firstly, the feature response of the tracking object in the current frame is detected online, and its spatiotemporal features are extracted. Then the detection response is correlated with the object feature model established in the previous stage: the spatial feature similarity and the temporal feature similarity between the detection response and the object feature model are calculated respectively, and the fused similarity metric matching matrix is then calculated.
The spatial feature similarity between the detection response and the object feature model is calculated as follows. As stated above, for the tracking object $i$ in frame $t$, its spatiotemporal feature model can be expressed as $M_i^t = \{S_i^t, H_i^t\}$. We assume that the spatiotemporal feature response of the tracking object in the current frame is $D^t = \{S_d^t, H_d^t\}$, where $S_d^t$ are the spatial features and $H_d^t$ are the HOF features. The similarity between the detection response and the tracking object spatial feature model is then calculated as shown in equation (5):

$$\Lambda_S(D^t, M_i^t) = \frac{S_d^t \cdot S_i^t}{\|S_d^t\|\,\|S_i^t\|} \tag{5}$$

The temporal feature similarity between the detection response and the object feature model is calculated as follows. The Bhattacharyya distance is used to calculate the HOF feature similarity, as shown in equation (6):

$$\Lambda_H(D^t, M_i^t) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{d_B(H_d^t, H_i^t)^2}{2\sigma^2}\right) \tag{6}$$

where $\sigma$ is the standard deviation of the Gaussian function and $d_B(H_d^t, H_i^t)$ represents the HOF feature Bhattacharyya distance between the detection response and the tracking object $i$. After finding the temporal and spatial feature similarity between the detection response and the object feature model, the similarity measurement matrix of the spatial and HOF features can be obtained. Then the Hungarian algorithm is used to optimize the similarity metric matching matrix to obtain the optimal spatiotemporal data association result. When the optimal spatiotemporal data association result is less than the set threshold, the best tracking result is output. When it is greater than the set threshold, the parameters of the object spatiotemporal feature model are adjusted and updated. Thus, the object feature model can be optimally associated with the spatiotemporal features of the current frame.

III. EXPERIMENT ANALYSIS
In this paper, the MOT2015 test platform is used to test the effectiveness of the multi-object tracking algorithm for unmanned vehicle autonomous driving scenarios based on online spatiotemporal feature correlation[22-23]. The simulation environment is as follows: CPU Intel(R) Core(TM) i5-2430M 2.40 GHz, RAM 8GB, MATLAB R2014a and 64 bit Windows10 SP1. The performance metrics we verified are: Multiple Object Tracking Precision (MOTP), Mostly Tracked (MT), Multiple Object Tracking Accuracy (MOTA), False Positives (FP), False Negatives (FN) and Identity Switches (IDS). Among them, FP stands for the false positive rate, which counts instances where a negative sample is incorrectly predicted to be positive. FN stands for the false negative rate, which counts instances in which a positive example is incorrectly predicted as negative. MOTP is used to evaluate the average overlap rate between the estimated trajectory and the real trajectory. MT represents the proportion of tracked objects that have been tracked for more than half of the given sequence; such an object is called "Mostly Tracked". MT is used to evaluate the integrity of the tracking trajectory and can reflect the ability of target re-detection. IDS is used to evaluate the performance of a tracker in tracking multiple targets. If the tracker mistakenly switches the ID of one target to another in consecutive frames, an "Identity Switch" occurs, indicating confusion in the tracker's handling of multi-target tracking tasks. MOTA is a cumulative accuracy measure that integrates false positives, false negatives, and identity switches:

$$\mathrm{MOTA} = 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDS}_t\right)}{\sum_t \mathrm{GT}_t} \tag{7}$$

where $t$ is the index number of the video frame and $\mathrm{GT}_t$ is the number of object ground truths. To verify the performance of the proposed algorithm, comparative experiments are required. Because the Nicolai and Bae algorithms both use deep learning, perform well on target tracking tasks with similar characteristics, and share similar characteristics with our proposed algorithm, we chose Nicolai, Bae, and our algorithm for comparative analysis, using the above performance indicators as the basis for comparison. The results are shown in Table I. The best result observed for each performance indicator is marked with an asterisk.

TABLE I
COMPARISON OF MULTI-OBJECT TRACKING INDICATORS OF DIFFERENT ALGORITHMS

Indicator   Nicolai   Bae      MOTA-BOSFCFUVADS
MOTP        70.1%     74.1%*   73.3%
MOTA        27.7%     26.3%    27.9%*
MT          18.6%     15.1%    19.0%*
FP          21.1%     23.3%    19.8%*
FN          61.7%     66.2%    59.9%*
IDS         609*      651      611

It can be seen from Table I that Nicolai has the lowest IDS, because Nicolai obtains the target appearance information through the joint deep network and the object motion information through Kalman filtering, which significantly reduces the number of object identity jumps. Due to the integration of temporal and spatial feature models, our algorithm has increased complexity compared to the Nicolai algorithm, decreased sensitivity to specific


noise, and more identity switches, which raises IDS. Moreover, although our algorithm uses the Hungarian algorithm to solve data association, it has certain shortcomings in motion prediction, which may lead to errors in matching the identity of the same object between consecutive frames, resulting in an increase in IDS. Bae has the best MOTP, because Bae can effectively reduce the tracking error caused by complex situations such as background changes and frequent occlusions by establishing target trajectory segments and trajectory confidence coefficients. By contrast, because our algorithm has shortcomings in processing specific features in object detection and data association, complex background changes and frequent occlusion may lead to tracking errors, leaving room for improvement in MOTP. Our algorithm integrates temporal and spatial feature models on the basis of the above algorithms, improving tracking accuracy and thus achieving the best MOTA, MT, FP, and FN.

IV. RESULT
Our algorithm first obtains detailed information about the target by constructing a spatiotemporal feature model, and characterizes the dynamic behavior of the target by integrating spatial and temporal information. In the spatial feature model, we consider spatial characteristics such as the position, size, and shape of the target. In the temporal feature model, we capture the dynamic changes of the target, including velocity and direction. This combination of spatial and temporal characteristics helps to improve our understanding of the target and our ability to predict its behavior. Our algorithm also tracks spatiotemporal characteristics online. This means that the algorithm updates and adjusts the tracking model in real time to adapt to changes in the target, thereby improving tracking accuracy and stability. In object detection, the detection response of the current frame is obtained by comparing it with historical trajectories. In addition, to find the optimal association between the historical trajectory of the tracked object and the detection response, we used the Hungarian algorithm. This can greatly reduce the computational cost of data association between detection responses and historical trajectories. By comprehensively considering spatial and temporal characteristics, our algorithm achieves effective tracking of targets, which is validated by the evaluation indicators MOTA, MT, FP, and FN. In terms of tracking accuracy, this algorithm reaches a MOTA of 27.9%, compared with 27.7% and 26.3% for the other two algorithms.
In order to further verify the effectiveness of the algorithm in this paper, we selected 525 video images in ADL-Rundle-6 in the MOT2015 database. The 1st, 3rd, 4th, 5th, 6th, and 8th frames of the pedestrian video are verified, and the results are shown in Figure 3. It can be seen from Figure 3 that our algorithm can continuously track the motion trajectories of target 1 to target 7 with high accuracy. This also shows that the online spatiotemporal correlation algorithm proposed in this paper can effectively track multiple targets in time and space.
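
The association step discussed above (a spatial similarity, a HOF similarity based on the Bhattacharyya distance, and the Hungarian algorithm for the optimal track-to-detection assignment) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the paper does not specify how the two similarities are fused, so the product used in `associate` is an assumption, as are the `sqrt(1 - BC)` form of the Bhattacharyya distance, the value of `sigma`, and every function name. The `hungarian` function is a standard potentials-based formulation of the assignment algorithm that minimizes total cost, so similarities are negated before it is called.

```python
import math
from typing import List, Sequence, Tuple

Feature = Tuple[Sequence[float], Sequence[float]]  # (spatial vector, HOF histogram)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Spatial similarity: cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Histogram distance in the sqrt(1 - BC) form (an assumed convention)."""
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        return 1.0
    bc = sum(math.sqrt((x / sp) * (y / sq)) for x, y in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))


def hof_similarity(p: Sequence[float], q: Sequence[float],
                   sigma: float = 0.5) -> float:
    """Gaussian of the Bhattacharyya distance; sigma is an assumed value."""
    d = bhattacharyya_distance(p, q)
    return math.exp(-d * d / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)


def hungarian(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Minimum-cost assignment; requires len(cost) <= len(cost[0])."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row and column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:                         # augment along the found path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0]


def associate(tracks: List[Feature], detections: List[Feature]) -> List[Tuple[int, int]]:
    """Return (track index, detection index) pairs maximizing fused similarity."""
    if not tracks or not detections:
        return []
    cost = [[-cosine_similarity(t[0], d[0]) * hof_similarity(t[1], d[1])
             for d in detections] for t in tracks]
    if len(tracks) <= len(detections):
        return hungarian(cost)
    transposed = [list(col) for col in zip(*cost)]
    return [(r, c) for c, r in hungarian(transposed)]


tracks = [([1.0, 0.0], [1.0, 0.0, 0.0, 0.0]),
          ([0.0, 1.0], [0.0, 0.0, 1.0, 0.0])]
detections = [([0.1, 0.9], [0.0, 0.1, 0.9, 0.0]),   # resembles track 1
              ([0.9, 0.1], [0.9, 0.1, 0.0, 0.0])]   # resembles track 0
matches = sorted(associate(tracks, detections))
```

In this toy example the detections are listed in the opposite order to the tracks, and the assignment pairs track 0 with detection 1 and track 1 with detection 0. A threshold on the fused similarity of each matched pair could then decide between outputting the result and updating the model, as the paper describes.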

[Figure 3: tracking results for Frame 1, Frame 3, Frame 4, Frame 5, Frame 6, and Frame 8]
FIGURE 3. The tracking results of MOTA-BOSFCFUVADS


V. CONCLUSION
By constructing a spatiotemporal feature model of the tracked object and detecting the spatiotemporal feature response of the current frame, our method achieves online spatiotemporal feature correlation between the tracked object and the detection response. In addition, we use the Hungarian algorithm to solve the optimal correlation between the historical trajectory of the tracked object and the detection response, thereby improving tracking accuracy and directly obtaining the motion trajectory of the tracked object. Our method therefore provides an important basis for decision-making in the autonomous driving system of an unmanned vehicle.
However, our method also has some limitations. For example, confusion may arise when dealing with objects of similar appearance, especially in complex and dense scenes. Similarly, challenges may arise when dealing with occlusion or non-linear motion. To address these issues, our future research will combine deep learning and other visual technologies to improve the performance of the algorithm.
Compared with existing methods, our algorithm first models spatial and temporal information independently, making it more accurate in handling dynamic changes and motion patterns. Second, using global data association to track throughout the entire sequence not only improves tracking accuracy but also yields better results in complex environments. Finally, our method operates online, enabling it to process real-time video streams, which guarantees the real-time performance of autonomous driving systems. The main innovation of the proposed method therefore lies in constructing a data association method based on online spatial and temporal characteristics. This association method combines spatiotemporal characteristic models with global data association techniques, so that our algorithm performs well in unmanned fleet tracking scenarios and provides real-time technical support.
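The Hungarian-algorithm association between historical trajectories and detection responses mentioned in the conclusion can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cost here is the Euclidean distance between a track's last known position and a detection's position, whereas the paper associates on its spatiotemporal feature correlation. The names `hungarian` and `associate` and the `max_distance` gate are hypothetical.

```python
import math


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns a list whose i-th entry is the column assigned to row i.
    Standard O(n^2 * m) potentials formulation with 1-based internal indices.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (m + 1)          # column potentials
    p = [0] * (m + 1)            # p[j]: row matched to column j (0 = free)
    way = [0] * (m + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:       # reached a free column: path found
                break
        while j0 != 0:           # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def associate(track_positions, detection_positions, max_distance=5.0):
    """Match tracks to detections; returns sorted (track_index, detection_index) pairs.

    Pairs farther apart than max_distance (an assumed gate) are left unmatched.
    """
    if not track_positions or not detection_positions:
        return []
    cost = [[math.dist(t, d) for d in detection_positions] for t in track_positions]
    transposed = len(cost) > len(cost[0])
    if transposed:               # the solver above needs rows <= columns
        cost = [list(col) for col in zip(*cost)]
    matches = []
    for r, c in enumerate(hungarian(cost)):
        t, d = (c, r) if transposed else (r, c)
        if math.dist(track_positions[t], detection_positions[d]) <= max_distance:
            matches.append((t, d))
    return sorted(matches)


if __name__ == "__main__":
    tracks = [(0, 0), (10, 10)]
    detections = [(9, 11), (1, 0), (50, 50)]
    print(associate(tracks, detections))  # → [(0, 1), (1, 0)]
```

In the example, the detection at (50, 50) stays unmatched and would typically start a new track. Where SciPy is available, `scipy.optimize.linear_sum_assignment` solves the same assignment problem and could replace the hand-written solver.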

HAIJUN LI received the Ph.D. degree in transportation planning and management from Lanzhou Jiaotong University in 2018. He is currently a professor with Lanzhou Jiaotong University. His research interests include ITS, traffic safety, and hazardous materials transportation.

ZHUYE XU received the Ph.D. degree from the Lanzhou University of Technology in 2021. He is currently a teacher with Lanzhou Jiaotong University. His main research interest is medical image processing.

CHANGXI MA received the B.S. degree in traffic engineering from the Huazhong University of Science and Technology in 2002 and the Ph.D. degree in transportation planning and management from Lanzhou Jiaotong University in 2013. He is currently a professor with Lanzhou Jiaotong University. He is the author of three books and more than 100 articles. His research interests include ITS, traffic safety, and hazardous materials transportation.

XIAO TANG received the B.S. degree from Chengdu Technological University of Science and Technology in 2022. He is currently pursuing a master's degree in the School of New Energy and Power Engineering, Lanzhou Jiaotong University, China. His research interests include intelligent transportation and image processing.
