Multi-Object_Tracking_Algorithm_for_Unmanned_Vehic
This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3439702
ABSTRACT Multi-object tracking algorithms for unmanned vehicle autonomous driving have difficulty
designing an accurate object feature model and data association algorithm. To address this problem, a
multi-object tracking algorithm based on online spatiotemporal feature correlation for unmanned vehicle
autonomous driving scenes (MOTA-BOSFCFUVADS) is proposed. First, the algorithm performs object
detection on the training samples, calibrates the coordinates of the detection results in the time
dimension and in the space dimension, eliminates detection results whose confidence is less than the set
value, and eliminates overlapping bounding boxes in the detections through non-maximum suppression.
Second, a Kalman filter predicts the position of each tracked object in the current frame; feature models
of the object are then built in the time dimension and the space dimension respectively, and the temporal
feature model is fused with the spatial feature model to obtain the spatiotemporal feature model of the
tracked object. Finally, the spatiotemporal feature response of the object in the current frame is
detected online and correlated with the spatiotemporal feature model of the tracked object, the fused
similarity metric matching matrix is calculated, and the Hungarian algorithm is used to solve for the
optimal correlation pairs between the historical object trajectories and the detection responses, after
which the parameters of the object spatiotemporal feature model are updated. In addition, we use the
MOT2015 database to test the effectiveness of the algorithm. The results show that the proposed algorithm
achieves better MOTA, MT, FP, and FN than the other two algorithms compared, and can effectively track
multiple objects continuously in time and space.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3439702
association and the multi-object tracking algorithm based on online data association, according to the
data association method. Multi-object tracking algorithms based on offline data association perform
global optimization over all target detection responses in all frames, and can effectively deal with
problems such as object loss caused by occlusion, false detections, and false identity switches. The
other type, online data association, has become the mainstream research direction of multi-object
tracking algorithms [9].

The multi-object tracking algorithm based on online data association mainly adopts the
tracking-by-detection framework. Its basic idea is to use a target detector to detect the targets of
interest in each frame of the video. Each detected target is then characterized by a feature model; the
feature similarity between the target feature model and the detection features is calculated, the final
cost matrix is solved according to the similarity matrix, and the optimal association between the
historical trajectory of each target and the detection responses of the current frame is determined, so
as to complete the multi-object tracking task.

In the process of online data association to determine the target trajectory, many scholars combine the
target detection response with filtering technology: Kalman filtering [10] is used to predict and
estimate object positions and to help select the correct detection response as the associated object, so
as to achieve detection and tracking. Kalman filtering combines previous states with observed data to
generate the optimal estimate of the current state of the system. In target tracking scenarios, Kalman
filtering can be used to predict the next position of the target and adjust this prediction based on new
observation data, thereby tracking the target more accurately. The Hungarian algorithm is mainly used to
find the optimal correlation between the historical trajectory of each tracked target and the detection
responses while preserving the overall performance of the algorithm, thereby improving tracking
accuracy. Wang et al. [11] designed an object detection and tracking method that combines Kalman
filtering and the Hungarian algorithm. First, they detect moving objects and perform adaptive labeling by
using differencing between adjacent frames. Then an independent Kalman filter is assigned to each labeled
object to predict the future position of the target. On this basis, applying the Hungarian algorithm to
solve the data association problem improves both the accuracy of target location prediction and the
efficiency of target detection and data association. Weng et al. [12] directly combined a
three-dimensional Kalman filter with the Hungarian algorithm for object state estimation and data
association. By utilizing the advantages of Kalman filters in state estimation and the efficiency of the
Hungarian algorithm in data association, the accuracy and efficiency of multi-object tracking were
greatly improved. In order to comprehensively evaluate the effectiveness of 3D multi-target tracking,
they proposed a new 3D multi-target tracking evaluation tool that can more accurately measure the
performance improvement brought by the combination of the 3D Kalman filter and the Hungarian algorithm.
On the basis of Kalman filtering, Zhao et al. [13] combined optical flow histogram features and Local
Binary Patterns (LBP) to complete multi-object tracking tasks accurately and in real time. According to
the optical flow histogram, Mohamed et al. [14] established a correction model of the detector and the
tracker in the framework of particle filtering, regarded each detection area as a sample, and used the
frame-by-frame data correlation between the detection response and the tracker to realize online
multi-level detection. In addition, Avidan [15] proposed a tracking algorithm based on the support
vector machine, applying the Support Vector Machine (SVM) classifier to the optical flow tracker: the
pre-trained SVM detects the vehicles in the video sequence, while the optical flow equation is used to
calculate the offset between two adjacent frames to achieve continuous tracking of multiple vehicles.
Naiel et al. [16] proposed an online real-time multi-object tracking algorithm. This method presents a
collaborative model between a pre-trained object detector and a number of single-object online trackers
within the particle filtering framework, and effectively updates the appearance model of each tracker.
Yang et al. [17-19] proposed an exchange object context model, which made full use of the context
information of objects between two adjacent frames, and then used a novel color histogram descriptor to
calculate the similarity and background smoothness between objects in two adjacent frames in order to
correlate the objects detected between two adjacent frames. Bae et al. [20] proposed a multi-object
tracking method based on trajectory confidence coefficients and online discriminative appearance
learning, and established object trajectory segments and trajectory confidence coefficients. According
to the different confidence coefficient values, the object trajectory segments are associated
hierarchically with the detection responses or with other trajectory segments, which can effectively
reduce the tracking error caused by complex situations such as background changes and frequent
occlusions.

The literature above calculates similarity based on manually designed visual features and motion
features obtained by filtering algorithms, and then associates each new tracked object with existing
trajectories in the time dimension. Although this approach is effective to a degree, it does not consider
the spatial characteristics of the tracked target, so it cannot directly obtain the trajectory and
quantity of the targets. However, spatial information is a very important part. Spatial information can
represent the
orientation of a tracked object in the physical world, defining attributes such as the position and size
of the target. On this basis, combining time information makes it possible to better understand and
predict the behavior of the target. Considering the temporal and spatial characteristics of tracked
targets together, rather than relying solely on visual features and filtering algorithms, can therefore
improve the effectiveness of multi-target tracking. By associating the temporal and spatial features of
the tracked target, eliminating redundant tracked targets, and obtaining motion trajectories over a
period of time, we can more accurately obtain the number of targets and track their movements, thus
achieving multi-target tracking.

Section II introduces the proposed method in detail and explains it through a flow chart, Section III
describes the source of the experimental data and analyzes the results, and Section IV concludes the
paper with a summary of the method.

II. MOTA-BOSFCFUVADS
The core problem of the multi-object tracking algorithm is to design an efficient and simple target
feature model and data association algorithm. A feature model is established for the target of interest
in each frame, and the target features are used as metrics. Then, according to the data association
algorithm, the same target in two adjacent frames is matched.

A. THE PROCESS OF ALGORITHM MOTA-BOSFCFUVADS
Figure 1 shows the flow of the multi-object tracking algorithm for unmanned vehicle autonomous driving
scenarios based on online spatiotemporal feature association. The whole process mainly includes the data
processing stage, the object feature model establishment stage, and the spatiotemporal data association
stage. In the data processing stage, object detection is performed on the training samples, and the
coordinates of the detection results in the time dimension and in the space dimension are calibrated.
Detection results with confidence less than the set value are eliminated, and overlapping bounding boxes
in the detections are eliminated through non-maximum suppression. In the object feature model
establishment stage, the position of the tracked object in the current frame is predicted by Kalman
filtering, and the target feature model is constructed in the temporal and spatial dimensions. In the
spatiotemporal data association stage, the feature response of the tracked object in the current frame
is first detected online and its spatiotemporal features are extracted; the spatiotemporal feature
response is then correlated with the object feature model established in the previous stage. The fused
similarity metric matching matrix is then calculated, and the Hungarian algorithm is used to optimize
the similarity metric matching matrix to obtain the optimal spatiotemporal data association result. When
the optimal spatiotemporal data correlation result is less than the set threshold, the best tracking
result is output; when it is greater than the set threshold, the parameters of the object spatiotemporal
feature model are adjusted and updated. Thus, the object feature model can be optimally associated with
the spatiotemporal features of the current frame.
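
As a minimal sketch of the data processing stage, the following Python filters detections by confidence and then applies greedy non-maximum suppression. The box format `(x1, y1, x2, y2)`, the function names, and the default values of `min_conf` and `iou_thresh` are illustrative assumptions; the paper does not state its threshold values.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: Sequence[Box],
    scores: Sequence[float],
    min_conf: float = 0.5,    # assumed confidence threshold
    iou_thresh: float = 0.5,  # assumed overlap threshold for suppression
) -> List[int]:
    """Return indices of detections kept after confidence filtering and greedy NMS."""
    # Step 1: drop detections whose confidence is below the set value.
    candidates = [i for i, s in enumerate(scores) if s >= min_conf]
    # Step 2: visit the remaining detections from most to least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in kept):
            kept.append(i)
    return kept
```

The returned indices are ordered by decreasing confidence, so the first entry is always the strongest surviving detection.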
[Flow chart not reproduced. Recoverable labels: Start; Training sample; Object sample; Online detection; threshold decision (Yes/No); Output tracking result; End.]
FIGURE 1. The flow chart of MOTA-BOSFCFUVADS
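
The Kalman prediction used in the object feature model establishment stage can be illustrated with a small constant-velocity filter for a single coordinate. This is a sketch under stated assumptions, not the paper's implementation: the class name `Kalman1D`, the constant-velocity motion model, and the noise values `q` and `r` are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate; state = (position, velocity)."""

    x: float            # position estimate
    v: float = 0.0      # velocity estimate
    p_xx: float = 1.0   # covariance of position
    p_xv: float = 0.0   # covariance between position and velocity
    p_vv: float = 1.0   # covariance of velocity
    q: float = 0.01     # process noise added to the covariance diagonal (assumed)
    r: float = 1.0      # measurement noise variance (assumed)

    def predict(self, dt: float = 1.0) -> float:
        """Propagate the state one step and return the predicted position."""
        # State transition with F = [[1, dt], [0, 1]].
        self.x += dt * self.v
        # Covariance transition: P <- F P F^T + Q.
        p_xx = self.p_xx + dt * (2.0 * self.p_xv + dt * self.p_vv) + self.q
        p_xv = self.p_xv + dt * self.p_vv
        self.p_xx, self.p_xv = p_xx, p_xv
        self.p_vv += self.q
        return self.x

    def update(self, z: float) -> None:
        """Correct the prediction with a measured position z (H = [1, 0])."""
        s = self.p_xx + self.r   # innovation variance
        k_x = self.p_xx / s      # Kalman gain for position
        k_v = self.p_xv / s      # Kalman gain for velocity
        y = z - self.x           # innovation (measurement residual)
        self.x += k_x * y
        self.v += k_v * y
        # Covariance correction: P <- (I - K H) P.
        p_xx = (1.0 - k_x) * self.p_xx
        p_xv = (1.0 - k_x) * self.p_xv
        p_vv = self.p_vv - k_v * self.p_xv
        self.p_xx, self.p_xv, self.p_vv = p_xx, p_xv, p_vv
```

In a tracker, one such filter per coordinate (or a single filter with a larger state vector) would be kept for each tracked object: `predict()` gives the expected position in the current frame, and `update()` is called with the matched detection.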
√( - ) ( - ) (4)
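
The spatiotemporal data association stage can be sketched as follows. The code builds a fused similarity matrix between tracked objects and detections and solves the assignment with the Hungarian algorithm. Several details are assumptions made for illustration only: cosine similarity for the spatial features, a Gaussian kernel (normalization constant omitted) on a Bhattacharyya distance computed as `sqrt(1 - BC)` for the HOF histograms, an equally weighted sum as the fusion rule, the gating value `max_cost`, and all function and key names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Distance between two histograms with positive mass (normalized internally)."""
    sp, sq = sum(p), sum(q)
    bc = sum(math.sqrt((x / sp) * (y / sq)) for x, y in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))


def hof_similarity(p: Sequence[float], q: Sequence[float], sigma: float = 0.5) -> float:
    d = bhattacharyya_distance(p, q)
    return math.exp(-d * d / (2.0 * sigma * sigma))


def hungarian(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Minimum-cost assignment (Kuhn-Munkres with potentials); returns (row, col) pairs."""
    n, m = len(cost), len(cost[0])
    if n > m:  # the core loop needs rows <= cols, so solve the transposed problem
        transposed = [list(col) for col in zip(*cost)]
        return sorted((r, c) for c, r in hungarian(transposed))
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)  # p[j]: 1-based row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the augmenting path back to the virtual column 0
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0)


def associate(tracks: List[Dict], dets: List[Dict],
              weight: float = 0.5, max_cost: float = 0.5) -> List[Tuple[int, int]]:
    """Match tracks to detections; each item has 'spatial' and 'hof' feature lists."""
    if not tracks or not dets:
        return []
    cost = [[1.0 - (weight * cosine_similarity(t["spatial"], d["spatial"])
                    + (1.0 - weight) * hof_similarity(t["hof"], d["hof"]))
             for d in dets] for t in tracks]
    # Gate the optimal pairs: only sufficiently similar matches are accepted.
    return [(i, j) for i, j in hungarian(cost) if cost[i][j] <= max_cost]
```

Pairs rejected by the gate correspond to tracks or detections that stay unmatched in the current frame; how such cases trigger a model update is left open here.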
In the data association stage, we adopt the spatiotemporal data association framework. First, the
feature response of the tracked object in the current frame is detected online, and its spatiotemporal
features are extracted. Then the detection response is correlated with the object feature model
established in the previous stage: the spatial feature similarity and the temporal feature similarity
between the detection response and the object feature model are calculated respectively, and the fused
similarity metric matching matrix is then calculated.

The spatial feature similarity between the detection response and the object feature model is calculated
as follows. For tracking object $i$, the spatiotemporal feature model can be expressed as
$T_i = \{S_{T_i}, H_{T_i}\}$. We assume that the spatiotemporal feature response of the tracking object
in the current frame is $D_j = \{S_{D_j}, H_{D_j}\}$, where $S$ denotes the spatial features and $H$ the
HOF features. The similarity between the detection response and the spatial feature model of the
tracking object is then calculated as shown in equation (5):

$$\Lambda_S(T_i, D_j) = \frac{S_{T_i} \cdot S_{D_j}}{\|S_{T_i}\|\,\|S_{D_j}\|} \tag{5}$$

The temporal feature similarity between the detection response and the object feature model is
calculated as follows. The Bhattacharyya distance is used to calculate the HOF feature similarity, as
shown in equation (6):

$$\Lambda_H(T_i, D_j) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{d(H_{T_i}, H_{D_j})^2}{2\sigma^2}\right) \tag{6}$$

where $\sigma$ is the standard deviation of the Gaussian function and $d(H_{T_i}, H_{D_j})$ represents
the HOF feature Bhattacharyya distance between the detection response $D_j$ and the tracking object
$T_i$. After the temporal and spatial feature similarities between the detection response and the object
feature model are found, the similarity measurement matrix of the spatial and HOF features can be
obtained. Then the Hungarian algorithm is used to optimize the similarity metric matching matrix to
obtain the optimal spatiotemporal data association result. When the optimal spatiotemporal data
correlation result is less than the set threshold, the best tracking result is output; when it is
greater than the set threshold, the parameters of the object spatiotemporal feature model are adjusted
and updated. Thus, the object feature model can be optimally associated with the spatiotemporal features
of the current frame.

III. EXPERIMENT ANALYSIS
In this paper, the MOT2015 test platform is used to test the effectiveness of the multi-object tracking
algorithm for unmanned vehicle autonomous driving scenarios based on online spatiotemporal feature
correlation [22-23]. The experimental environment is as follows: CPU Intel(R) Core(TM) i5-2430M 2.40 GHz,
RAM 8GB, MATLAB R2014a and 64 bit Windows10 SP1. The performance metrics we verified are: Multiple
Object Tracking Precision (MOTP), Mostly Tracked (MT), Multiple Object Tracking Accuracy (MOTA), False
Positives (FP), False Negatives (FN) and Identity Switches (IDS). Among them, FP is the false positive
rate, based on the number of instances in which a negative sample is incorrectly predicted to be
positive. FN is the false negative rate, based on the number of instances in which a positive sample is
incorrectly predicted to be negative. MOTP is used to evaluate the average overlap rate between the
estimated trajectory and the real trajectory. MT represents the proportion of tracked objects that have
been tracked for more than half of the given sequence; such an object is called "Mostly Tracked". MT is
used to evaluate the integrity of the tracking trajectory, which can reflect the ability of target
re-detection. IDS is used to evaluate the performance of a tracker in tracking multiple targets: if the
tracker mistakenly switches the ID of one target to another in consecutive frames, an "Identity Switch"
occurs, indicating confusion in the tracker's handling of multi-target tracking tasks. MOTA is a
cumulative accuracy measure that integrates false positives, false negatives, and identity switches:

$$\mathrm{MOTA} = 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDS}_t\right)}{\sum_t \mathrm{GT}_t} \tag{7}$$

where $t$ is the index number of the video frame and $\mathrm{GT}_t$ is the number of object ground
truths in frame $t$. To verify the performance of the proposed algorithm, comparative experiments are
required. Because both the Nicolai and Bae algorithms use deep learning, perform well on target tracking
tasks, and share similar characteristics with our proposed algorithm, we chose Nicolai, Bae, and our
algorithm for comparative analysis, using the above performance indicators as the comparison basis. The
results are shown in Table 1. The best result for each performance indicator in Table 1 (bold in the
typeset table) is marked here with an asterisk.

TABLE I
COMPARISON OF MULTI-OBJECT TRACKING INDICATORS OF DIFFERENT ALGORITHMS

Indicator   Nicolai   Bae      MOTA-BOSFCFUVADS
MOTP        70.1%     74.1%*   73.3%
MOTA        27.7%     26.3%    27.9%*
MT          18.6%     15.1%    19.0%*
FP          21.1%     23.3%    19.8%*
FN          61.7%     66.2%    59.9%*
IDS         609*      651      611

It can be seen from Table 1 that Nicolai has the lowest IDS, because Nicolai obtains the target
appearance information through the joint deep network and the object motion information through Kalman
filtering, which significantly reduces the number of object identity jumps. Due to the integration of
temporal and spatial feature models, our algorithm has increased complexity compared to the Nicolai
algorithm, decreased sensitivity to specific
noise, and more identity switching, resulting in a slightly higher IDS (611 versus 609). Although our
algorithm uses the Hungarian algorithm to solve data association, it has certain shortcomings in motion
prediction, which may lead to errors in matching the identity of the same object in consecutive frames
and thus to an increase in IDS. Bae has the best MOTP, because Bae can effectively reduce the tracking
error caused by complex situations such as background changes and frequent occlusions by establishing
target trajectory segments and trajectory confidence coefficients. In contrast, because of the
shortcomings of our algorithm in processing specific features in object detection and data association,
complex background changes and frequent occlusions may lead to tracking errors, leaving room for
improvement in MOTP. Our algorithm integrates temporal and spatial feature models on the basis of the
above algorithms, improving tracking accuracy, and thus has the best MOTA, MT, FP, and FN.

IV. RESULT
Our algorithm first obtains detailed information about the target by constructing a spatiotemporal
feature model, and provides a deeper understanding of the dynamic behavior of the target by integrating
spatial and temporal information. In the spatial feature model, we consider spatial characteristics such
as the position, size, and shape of the target. In the temporal feature model, we capture the dynamic
changes of the target, including velocity and direction. This combination of spatial and temporal
characteristics helps to improve our understanding of the target and our ability to predict its
behavior. Our algorithm also tracks spatiotemporal characteristics online. This means that the algorithm
updates and adjusts the tracking model in real time to adapt to changes in the target, thereby improving
tracking accuracy and stability. In object detection, the detection response of the current frame is
obtained and compared with the historical trajectories. In addition, to find the optimal correlation
between the historical trajectory of the tracked object and the detection response, we used the
Hungarian algorithm. This can greatly reduce the computational cost of data association between
detection responses and historical trajectories. Our algorithm achieves effective tracking of targets by
comprehensively considering spatial and temporal characteristics, which has been validated by the
evaluation indicators MOTA, MT, FP, and FN. In terms of tracking accuracy, the algorithm improves on the
other two algorithms (MOTA of 27.9% versus 27.7% and 26.3%).

In order to further verify the effectiveness of the algorithm in this paper, we selected 525 video
images in ADL-Rundle-6 in the MOT2015 database. The 1st, 3rd, 4th, 5th, 6th, and 8th frames of the
pedestrian video are verified, and the results are shown in Figure 3. It can be seen from Figure 3 that
our algorithm can continuously track the motion trajectories of target 1 to target 7 with high accuracy.
It also shows that the online spatiotemporal correlation algorithm proposed in this paper can
effectively track multiple targets in time and space.
[Figure 3 panels (images not reproduced): Frame 1, Frame 3, Frame 4, Frame 5, Frame 6, Frame 8]
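
The MOTA indicator reported in Table I integrates false negatives, false positives, and identity switches relative to the number of ground-truth objects. As a small numerical sketch of that standard definition, the function below takes per-frame counts; the function name, the argument names, and the example numbers are illustrative assumptions and are not taken from the experiments.

```python
from typing import Sequence


def mota(fn: Sequence[int], fp: Sequence[int],
         ids: Sequence[int], gt: Sequence[int]) -> float:
    """MOTA = 1 - sum_t(FN_t + FP_t + IDS_t) / sum_t(GT_t), with one entry per frame."""
    total_gt = sum(gt)
    if total_gt == 0:
        raise ValueError("MOTA is undefined when there are no ground-truth objects")
    errors = sum(fn) + sum(fp) + sum(ids)
    return 1.0 - errors / total_gt


# Made-up example: three frames with five ground-truth objects each.
example = mota(fn=[1, 0, 2], fp=[0, 1, 0], ids=[0, 0, 1], gt=[5, 5, 5])
print(f"MOTA = {example:.3f}")
```

Because errors are summed over all frames before dividing, MOTA can become negative when a tracker makes more errors than there are ground-truth objects.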
[18] Zeng Chao, Ma Changxi, Wang Ke, Cui Zihao. Parking Occupancy Prediction Method Based on Multi Factors and Stacked GRU-LSTM. IEEE Access, 2022, 10: 47361-47370.
[19] Zeng Chao, Ma Changxi, Wang Ke, Cui Zihao. Predicting vacant parking space availability: A DWT-Bi-LSTM model. Physica A: Statistical Mechanics and its Applications, 2022, 599: 127498.
[20] Bae S H, Yoon K J. Robust Online Multi-object Tracking Based on Tracklet Confidence and Online Discriminative Appearance Learning. In: Computer Vision & Pattern Recognition. IEEE, 2014: 1218-1225.
[21] Erik S H, Ari L. Detection of Outliers in Reference Distributions: Performance of Horn's Algorithm. Clinical Chemistry, 2005(12): 2326-2332.
[22] Ma Changxi, Dai Guowen, Zhou Jibiao. Short-Term Traffic Flow Prediction for Urban Road Sections Based on Time Series Analysis and LSTM_BILSTM Method. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(6): 5615-5624.
[23] Ma Changxi, Wang Chao, Xu Xuecai. A Multi-Objective Robust Optimization Model for Customized Bus Routes. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(4): 2359-2370.