A New Framework of Moving Object Tracking
Abstract—Object Tracking (OT) on a moving camera, so-called Moving Object Tracking (MOT), is extremely important in Computer Vision. While conventional tracking methods based on a fixed camera can only track objects within the camera's range, a moving camera can overcome this limitation by following the objects. Moreover, a single tracker is widely used to track objects, but it is not effective on a moving camera because of challenges such as sudden movements, blurring and pose variation. This paper proposes a method that follows the tracking-by-detection approach: it integrates a single tracker with an object detection method. The proposed tracking system can track objects efficiently and effectively because the object detection method can be used to find the tracked object again when the single tracker loses track. Three main contributions are presented. First, the proposed Unified Visual-based MOT system performs Localization, 3D Environment Reconstruction and Tracking based on a Stereo Camera and an Inertial Measurement Unit (IMU). Second, it takes both camera motion and moving objects into account to improve the precision of localization and tracking. Third, the proposed tracking system is based on the integration of a single tracker (a Deep Particle Filter) and an object detector (YOLOv3). The overall system is tested on the KITTI 2012 dataset and achieves a good accuracy rate in real time.

Keywords—Moving object tracking; object detection; camera localization; 3D environment reconstruction; tracking by detection

I. INTRODUCTION

In Object Tracking, it is necessary to predict the position of the tracked object in the current frame and match it against previous frames to obtain its precise position. Many significant works have dealt with appearance changes over time using cues such as color histograms [1], HoG features [2], SIFT or SURF features [3], or texture features like LBP [4]. Single trackers are commonly built on popular filters such as the Correlation Filter, the Kalman Filter or the Particle Filter. Correlation filters [5][6][7] achieve high speed and accuracy. Kalman and Particle Filters are used because they can predict the position of an object and then match the predicted position with the previous one. The Kalman Filter [8]-[10] cannot deal with non-linearity in the measurements because it linearizes them with an approximation; the Particle Filter [11], [12] is used to overcome this drawback. Recently, deep neural networks have been applied to tracking problems. S. Chen and W. Liang [13] used a CNN to distinguish the background from objects and then track the objects according to their position. CNNs have also been integrated with correlation filters [14] or with particle filters [15], [16]. However, these approaches do not take into account the challenges of a moving camera. J. S. Lim and W. H. Kim [17] and Y. Chen et al. [18] tried to calculate the translation vector between two consecutive frames (or two frames from a stereo camera).

Based on data acquired by an IMU and a stereo camera, this paper proposes a solution that integrates a single tracker (a Deep Particle Filter) with an object detection method (YOLOv3 [19]); the object is tracked by its three-dimensional center. In traditional object tracking from a static camera, the two-dimensional position of the tracked object is enough, but in MOT its three-dimensional position must be considered, and challenges such as the vibration of the camera and the movement of the object must be taken into account. YOLOv3 is a suitable choice because it detects objects very quickly, its results can be used to make the single tracker more robust and, most importantly, it is suitable for real-time applications. In addition, in localization and three-dimensional environment reconstruction, the removal of moving objects is considered to increase the accuracy rate. To do so, the paper does not estimate the full 6 degrees of freedom of the robot pose directly; inspired by [20], it splits the motion into two separate transformations, a rotation and a translation. The rotation is calculated from the IMU and the translation is estimated from the stereo camera. The robot can localize itself based on these two transformations in real environments. The data observed by the stereo camera contains two kinds of objects: moving objects and static objects. If the feature points of moving objects are used to estimate the robot position and the 3D point cloud of the environment, the estimation error increases over time. Therefore, the paper eliminates the feature points of moving objects to increase the accuracy of localization and 3D environment reconstruction. Most published solutions have not yet considered the feature points of moving objects, but in the experimental results of this paper, removing moving features yields better accuracy than keeping them. To remove moving objects, the paper uses the background subtraction method with camera motion compensation proposed in [21], [22], whose advantage is the fast and accurate detection of moving objects.
Meanwhile, the research in [23] assumes that moving objects belong to movable categories that are likely to move now or in the near future, such as people, dogs, cats and cars. For instance, once a person is detected, whether walking or standing, it is treated as a potentially moving object and the features belonging to the image region where the person was detected are removed. The limitation of the method in [23] is that it cannot distinguish between objects that are actually moving and those that are static.

In the MOT problem, the paper uses a stereo camera and an IMU without GPS for the following reasons. The paper would like to test the power of the visual information acquired from the stereo camera in estimating the position of the robot, while the IMU data provides the rotation transformation of the robot motion. A stereo camera integrated with an IMU can work better than GPS in many environments, such as indoors, under radio interference or noisy GPS, and in cases where the only input is the visual information of the tracked object.

In Section II, the paper reviews previous work on visual tracking with both fixed and moving cameras. Section III describes the proposed methods: object localization, 3D environment reconstruction and the tracking algorithm based on a stereo camera and IMU. Section IV shows experimental results of localization and tracking. The paper discusses the pros and cons of the proposed methods in Section V. Conclusions and future work are presented in Section VI.

II. RELATED WORKS

A. Camera Localization

Robot localization is crucial for many high-level tasks such as object tracking, obstacle detection and avoidance, motion planning, autonomous navigation, local path planning and waypoint following. Over the years, many researchers have worked on robot localization and made significant contributions. David Nistér et al. [24] proposed a system for real-time ego-motion estimation with a single or stereo camera. Bernd Kitt et al. [25] proposed another visual odometry algorithm based on a RANSAC outlier rejection technique. Shaojie Shen et al. [20] used feature points from stereo images and IMU information to estimate the robot position. S. Prabu and G. Hu [12] proposed a vision-based localization algorithm which combines partial depth estimation and particle filter techniques. Yanqing Liu et al. [26] presented a robust stereo visual odometry using an improved RANSAC-based method (PASAC) that makes motion estimation much faster and more accurate than standard RANSAC. Yuquan Xu et al. [27] proposed a localization algorithm based on a three-dimensional point cloud map and a stereo camera. S. Hong et al. [28] proposed a real-time autonomous navigation system using only a stereo camera and a low-cost GPS. All the aforementioned works provide the fundamental background for the localization problem in this paper. Here, the paper proposes a novel method to localize a robot using a stereo camera and an IMU sensor; in particular, it takes moving objects into account to increase the accuracy rate.

B. Moving Object Tracking

A moving camera can overcome the disadvantages of a fixed camera. A fixed camera can only track objects within its range; once objects leave its field of view (FOV), it can no longer monitor them. To handle this, the camera should be mounted on a moving platform such as a robot, a drone or an autonomous car.

Y. Chen et al. [18] used features such as SIFT and SURF to match features between two consecutive frames, find the translation vector of the camera and use it to predict the position of objects in the frame. J. S. Lim and W. H. Kim [17] estimated motion by comparing 16x16 patches between two frames: each patch has a vector representing the dominant motion in that area, and after traversing all 16x16 patches of two consecutive frames, the vector with the highest frequency is selected as the camera motion vector. These frameworks partly alleviate the effects of fast motion, rotation and vibration of the camera.

There are also several ways to match objects between two images. Q. Zhao et al. [1] matched objects by comparing color histograms, but this easily fails when other regions have the same colors as the object. C. Ma et al. [14] applied a CNN to extract features and compared objects with a correlation filter. R. J. Mozhdehi and H. Medeiros [15] and T. Zhang et al. [16] inherited this framework and integrated it with a Particle Filter.

The tracking part of this paper inherits the Particle Filter and improves its prediction and measurement steps. First, the paper finds a translation vector using a feature matching algorithm, and then the position of the tracked object is resolved by applying a deep neural network in conjunction with a correlation filter.

Moreover, the paper inherits a deep CNN-based object detection algorithm, YOLOv3 [19], which is very fast and quite accurate. By combining these methods, the tracking part implements an algorithm called Tracking by Detection.

However, to track the object in the context of a moving camera and a moving object, the tracker has to track the object in the 3D environment (using the IMU and stereo cameras) so that the tracking system is realistic.

III. METHOD

The paper proposes a Unified Visual-Based MOT system that performs Camera Localization, 3D Environment Reconstruction and Object Tracking.

A. Camera Localization

Inspired by the method of [20], the significant improvement proposed here is in the feature detection stage, with the removal of moving feature points. In addition, there are some differences between [20] and this paper. Specifically, instead of using a built-in positioning system as in [20] to obtain the camera position as ground truth, the paper uses the GPS ground truth of the KITTI dataset.
To locate the camera, the paper estimates the camera motion at time t, which consists of the translation and rotation of the camera coordinate system between two consecutive frames, based on the stereo camera and the IMU sensor. The IMU data provides the rotation matrix for the rotation transformation. The feature points of the image are used to estimate the translation; these features include both moving and static features, and in this case the moving features are noise. Therefore, the paper removes the moving feature points to reduce the error in estimating the camera position. This is a new point in improving the robot localization process. The camera location estimation steps are shown in Fig. 1.

1) Camera model, feature detection and feature tracking: Both cameras in the system are calibrated using the Camera Calibration Toolbox. The cameras are organized into two systems that play different roles:

• Stereo Camera System (right and left cameras): used to estimate the 3D positions of features in the world coordinate system (WCS), to initialize the local map at the start, and to update the local map when its accumulated error becomes large enough (see Fig. 1 and 7).

• Monocular Camera System (left camera): used to estimate robot locations and to initialize and update local maps.

In the model, at each moment the paper obtains two images from the stereo camera (see Fig. 2). These two images are used for feature detection and for reconstructing the 3D positions of the features in the WCS. However, feature detection and 3D position reconstruction are not performed for every pair of successive images; they are performed in a fixed cycle corresponding to 25 consecutive frames (depending on the device) (see Fig. 2). This means that at the beginning, features are detected from the two stereo images and their 3D positions are reconstructed in the world coordinate system; after a cycle of 25 consecutive frames (including the frames used for feature detection), the calculation is performed again. Within a cycle, feature detection and 3D position estimation are not performed; instead, the features are tracked on the successive image frames until a new cycle begins. The purpose of this scheme is to reduce computational time while retaining the required accuracy.

In the feature detection stage, image features play an important role in locating the robot. SURF features (Speeded Up Robust Features) [29] are extracted from the pairs of images of the left and right cameras. The FLANN matching algorithm (Fast Library for Approximate Nearest Neighbors) [30] is used to match the features of the two images, and Lowe's outlier rejection method [31] is used to remove outliers. This outlier removal significantly improves the accuracy of localization.
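To make this stage concrete, the sketch below shows one common way to implement it with OpenCV (SURF lives in the opencv-contrib xfeatures2d module); the function name, Hessian threshold and ratio value are illustrative assumptions rather than parameters reported in the paper.

```python
import cv2

def match_stereo_features(img_left, img_right, ratio=0.7):
    """Detect SURF keypoints in a stereo pair, match them with FLANN,
    and keep only matches that pass Lowe's ratio test."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # requires opencv-contrib
    kp_l, des_l = surf.detectAndCompute(img_left, None)
    kp_r, des_r = surf.detectAndCompute(img_right, None)

    # FLANN with KD-trees is the usual configuration for float descriptors such as SURF.
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5),  # FLANN_INDEX_KDTREE
                                  dict(checks=50))
    knn = flann.knnMatch(des_l, des_r, k=2)

    # Lowe's ratio test: keep a match only if it is clearly better than the runner-up.
    good = []
    for pair in knn:
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])

    pts_l = [kp_l[m.queryIdx].pt for m in good]
    pts_r = [kp_r[m.trainIdx].pt for m in good]
    return pts_l, pts_r
```

The same routine can be reused for feature tracking between consecutive left-camera frames, which is how the matched pairs feeding the localization and tracking steps are obtained.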
Fig. 6. Illustration of the Positions of the Left Camera at Time t and the Error between g_it and the Observation Vector k_it. The Position with the Smallest Error (the Error is the Total Area of the Red Parallelograms) will be the Robot Position at Time t. Here, the Position r_t has the Smallest Error.
Assume that the camera motion between two consecutive images is small; formula (7) can then be approximated as:

\mathbf{r}_t^* = \arg\min_{\mathbf{r}_t} \sum_{i\in\mathcal{I}} \left\| \frac{\mathbf{r}_t - \mathbf{p}_i}{d_i} \times \mathbf{k}_{it} \right\|^2   (10)

where d_i = \|\mathbf{r}_t - \mathbf{p}_i\| \approx \|\mathbf{r}_{t-1} - \mathbf{p}_i\| are known quantities. By taking the derivative of formula (10) and setting it to zero, a linear system is obtained in which the optimal camera position \mathbf{r}_t is the unknown:

\left( \sum_{i\in\mathcal{I}} \frac{\mathbb{I}_3 - \mathbf{k}_{it}\mathbf{k}_{it}^T}{d_i} \right) \mathbf{r}_t = \sum_{i\in\mathcal{I}} \frac{\mathbb{I}_3 - \mathbf{k}_{it}\mathbf{k}_{it}^T}{d_i}\, \mathbf{p}_i   (11)

where \mathbf{r}_t is the 3D position of the camera at time t in the WCS, \mathbf{k}_{it} is the observation vector of the i-th feature point at time t in the WCS, \mathbf{p}_i is the 3D position of the i-th feature in the WCS, \mathcal{I} is the set of features observed in the image at time t, d_i = \|\mathbf{r}_{t-1} - \mathbf{p}_i\| is a known value, (\cdot)^T denotes the matrix transpose and \mathbb{I}_3 is the 3x3 identity matrix.

Equation (11) consists of three equations in the three unknowns of the camera's 3D position in the WCS, and this does not change regardless of the number of observed features; therefore, the camera position can be estimated efficiently in constant time. The observed features used to calculate the camera position are features that are not in moving regions. If features from moving regions were used, the error of the estimated camera position would increase: the camera position is estimated from the 3D feature points at time t-1 and the corresponding observation vectors at time t, and a feature from a moving region has different 3D positions in the same WCS at times t-1 and t, so its observation vector at time t will not match the true vector of the 3D feature point at time t-1 (i.e. the observation vector \mathbf{k}_{it} will not match the true vector \mathbf{g}_{it}), which increases the error of the camera location estimate. For a static feature, this error is zero or very small.

Equation (11) needs at least two features to compute the camera position \mathbf{r}_t, so an efficient 2-point RANSAC (Random Sample Consensus) can be applied for outlier rejection. This reduces the computational time compared to the traditional 3-point [35] and 5-point [36] algorithms.

As mentioned above, equation (11) is solved within a 2-point RANSAC loop, which includes the following steps. First, determine the number of iterations. Second, at each iteration, draw a random sample of two elements, i.e. two random points from the 3D feature point set. Then, the estimate obtained from this sample is evaluated by an error function. These steps are repeated, and after all iterations RANSAC converges to a good robot position, although it is not guaranteed to be the best one. This RANSAC scheme ensures fast processing, the ability to estimate a good enough model and the elimination of noise in the data set.

B. 3D Environment Reconstruction

In this section, the 3D environment reconstruction task is presented (see Fig. 7). The environmental map is a local map, defined as the set of currently tracked 3D features. The 3D points are computed in two different ways, one from the stereo camera and the other from the monocular camera. These 3D points are transformed from the CCS to the WCS of the robot at the start and are added to the local map. The 3D features added to the local map are static feature points, because they are used to estimate the robot position at different times; moving feature points would introduce errors in the estimated robot position.

Fig. 7. Diagram of Initializing and Updating Local Maps.

At the initial time t = 0, the robot position is initialized. The 3D positions of the feature points are estimated in the world coordinate system from the stereo camera and are used to initialize the local map.

At time t (t ≠ 0), given the robot position, the local map is updated by the following two systems:

1) Stereo camera system: After a given period of time, the system is restarted to update the local map. The 3D points are calculated by the stereo camera.

2) Monocular camera system: During feature tracking, some features are lost, and lost features are removed from the map. New features are added to the local map if the current number of features is smaller than the minimum allowable feature count. The 3D location \mathbf{p}_i of a new feature point is estimated from a set \tau of observations of the i-th feature at different camera positions and is given by:

\mathbf{p}_i^* = \arg\min_{\mathbf{p}_i} \sum_{t\in\tau} \left\| (\mathbf{p}_i - \mathbf{r}_t) \times \mathbf{k}_{it} \right\|^2   (12)

where \mathbf{r}_t is the 3D position of the camera at time t in the WCS, \mathbf{k}_{it} is the observation vector of the i-th feature point at time t, and \mathbf{p}_i is the 3D position of the i-th feature in the WCS.

Equation (12) is solved via the following linear system:

\left( \sum_{t\in\tau} (\mathbb{I}_3 - \mathbf{k}_{it}\mathbf{k}_{it}^T) \right) \mathbf{p}_i = \sum_{t\in\tau} (\mathbb{I}_3 - \mathbf{k}_{it}\mathbf{k}_{it}^T)\, \mathbf{r}_t   (13)

where \mathbf{r}_t is the 3D position of the camera at time t in the WCS, \mathbf{k}_{it} is the observation vector of the i-th feature point at time t in the WCS, \mathbf{p}_i is the 3D position of the i-th feature in the WCS, \tau is the set of times t at which the i-th feature is observed by the left camera, (\cdot)^T denotes the matrix transpose and \mathbb{I}_3 is the 3x3 identity matrix.

Equation (13) is solved by basic matrix algebra; here \tau is defined as the two consecutive times t-1 and t.
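As a concrete illustration of this step, the NumPy sketch below solves the linear system (13) for a feature position from its observation rays; the second function shows that the camera-position system (11) has exactly the same structure, with each term weighted by 1/d_i. The function and variable names (and the unit-vector normalization) are assumptions of the sketch, not code from the paper.

```python
import numpy as np

def triangulate_feature(camera_positions, observation_vectors):
    """Solve (13): (sum_t (I - k k^T)) p = sum_t (I - k k^T) r_t,
    where k is the unit observation vector of the feature at time t
    and r_t is the camera position at that time, both in the WCS."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for r_t, k in zip(camera_positions, observation_vectors):
        k = k / np.linalg.norm(k)           # ensure k is a unit vector
        M = np.eye(3) - np.outer(k, k)      # projector orthogonal to the ray
        A += M
        b += M @ r_t
    return np.linalg.solve(A, b)            # 3x3 system -> feature position p_i

def estimate_camera_position(feature_points, observation_vectors, r_prev):
    """Solve (11): same structure as (13), but each term is weighted by 1/d_i,
    with d_i approximated by ||r_{t-1} - p_i||."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p_i, k in zip(feature_points, observation_vectors):
        k = k / np.linalg.norm(k)
        d_i = np.linalg.norm(r_prev - p_i)
        M = (np.eye(3) - np.outer(k, k)) / d_i
        A += M
        b += M @ p_i
    return np.linalg.solve(A, b)            # camera position r_t
```

A 2-point RANSAC wrapper, as described above, would repeatedly call estimate_camera_position on random pairs of static features and keep the hypothesis with the largest inlier set.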
The 3D positions of the feature points calculated from the monocular camera are used to update the local map in the following two cases:

• The feature point already has a 3D position recovered from the stereo camera system: the 3D position from the monocular camera system is added to the local map.

• The feature point does not yet have a 3D position from the stereo camera system: the feature point is added to the local map if the current number of feature points in the local map is smaller than the minimum allowable count.

Failure Detection and Recovery

Because the 3D positions of the feature points in the local map are computed by two systems (feature tracking by the monocular camera, and feature detection and matching by the stereo camera), errors accumulate. The error of the local map at time t is calculated as:

\gamma = \frac{1}{|\mathcal{K}|} \sum_{k\in\mathcal{K}} \frac{\|\mathbf{p}_k^m - \mathbf{r}_t\|}{\|\mathbf{p}_k^s - \mathbf{r}_t\|}   (14)

where \mathbf{p}_k^s is the location of feature k obtained by stereo correspondence, \mathbf{p}_k^m is the location of feature k obtained by the monocular camera, \mathcal{K} is the set of 3D points used to calculate the error, and \mathbf{r}_t is the camera position at time t.

The system works well when \gamma \cong 1; otherwise it is in error. When the system is in error, all features from the monocular camera system are removed and the local map is restarted from the 3D positions of the feature points from the stereo camera, \mathbf{p}_k^s.

C. Tracking by Detection

The goal of the paper's tracking algorithm is to handle the challenges of a moving camera, namely the vibration of the camera and the motion of the tracked object. The essence of the algorithm is the integration of a single tracker and object detection into a method called tracking by detection. The single tracker gives the state of the tracked object at every time step; moreover, its 3D position, obtained by reconstructing the environment, provides the necessary information for the tracking process. The single tracker is integrated with an object detector, YOLOv3, because YOLOv3 detects objects very quickly and accurately: it helps the single tracker find the tracked object faster, and when the single tracker fails, YOLOv3 is a useful helper for recovering the object being tracked. Combined with 3D environment reconstruction, Tracking by Detection can also estimate the three-dimensional position of the tracked object.

This section describes the workflow of the tracking algorithm, which includes four steps: Initialization, Prediction, Matching and Resampling.

First, an object can be selected in a frame to track; alternatively, the system may be given an image of the object, which is then located before tracking starts.

After the bounding box of the tracked object is obtained at time t, in the prediction step at time (t+1) the particle filter generates particles, and each particle is guided by a motion vector of the tracked object. An object detector comes in handy in this step: it detects objects of the same kind as the tracked object, and some of the detected objects are added to the particle set. Object detection based on deep learning is robust to the vibration of the camera.

In the matching step, YOLOv3 detects a number of objects of the same type as the tracked object, and the detected object with the highest matching rate with the tracked object is selected as the current tracked object. If matching fails, each particle is matched against the previous object by an observation model in which a correlation filter is integrated with a deep neural network. After that, resampling can be performed if necessary.

The tracking pipeline of the paper is shown in Fig. 8.

1) Initialization: In the first frame, an object is chosen and expressed by four parameters, its location and size:

p(t) = [p_x(t), p_y(t), p_w(t), p_h(t)]   (15)

where p_x(t), p_y(t) is the position of the object at time t and p_w(t), p_h(t) are its width and height at time t.

F(t) and F(t-1) are the sets of SURF feature points of the object at times t and t-1 from the left or right camera:

F(t) = (f_t^0, f_t^1, \ldots, f_t^n)   (16)

F(t-1) = (f_{t-1}^0, f_{t-1}^1, \ldots, f_{t-1}^n)   (17)

Using a feature matching algorithm, feature points that are not matched are eliminated, and the remaining K pairs of feature points are treated as a set of K motion vectors:

f_v(t) = (f_t^0, f_t^1, \ldots, f_t^K)   (18)

where each f_t^i, i \in K, is a camera motion vector of one pair of feature points. This set is used to compute the camera's motion vector for the object at time t.

Fig. 8. The Tracking Pipeline.
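To illustrate how the matched pairs are turned into a motion vector, the following sketch (hypothetical helper functions, assuming NumPy arrays of matched keypoint coordinates) averages the per-pair displacements, which is the mean later written as (26), and applies it to the previous box state.

```python
import numpy as np

def motion_vector(matched_prev, matched_curr):
    """matched_prev, matched_curr: (K, 2) arrays of matched keypoint
    coordinates in frames t-1 and t. Returns the mean displacement,
    i.e. the average of the K per-pair motion vectors."""
    diffs = np.asarray(matched_curr, float) - np.asarray(matched_prev, float)
    return diffs.mean(axis=0)

def shift_box(box, fv):
    """Shift the previous box state [px, py, pw, ph] by the motion vector fv."""
    px, py, pw, ph = box
    return [px + fv[0], py + fv[1], pw, ph]
```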
2) Prediction: In this step, the state of the object is predicted in the next frame.

The particle filter generates particles around the previous object, and each particle has a weight that expresses its importance. For example, in Fig. 9 a motorbike is being tracked:

• The yellow box indicates the object being tracked.

• The green box indicates a particle around the object.

Each particle is guided by the camera motion vector, which is obtained by matching SURF features in two consecutive frames from the left or right camera; the stereo camera is used to obtain the two current frames instead of two consecutive frames as in [38]. After matching features, the predicted state of the tracked object is \hat{p}(t):

\hat{p}(t) = p(t-1) + f_v(t) + q(t)   (19)

where p(t-1) is the state of the tracked object in the previous frame (t-1), f_v(t) is the camera motion vector at time t computed by feature matching, and q(t) is added Gaussian noise.

After that, a number of particles are generated around \hat{p}(t); each particle is denoted \hat{p}_i(t), with

\hat{p}x_t^i = \frac{\hat{p}x(t)}{2} + \mathrm{random}(0,1) \cdot \frac{\hat{p}w(t)}{2}   (22)

\hat{p}y_t^i = \frac{\hat{p}y(t)}{2} + \mathrm{random}(0,1) \cdot \frac{\hat{p}h(t)}{2}   (23)

\hat{p}w_t^i = \hat{p}w(t) + \mathrm{random}(0,1) \cdot \mathrm{SCALE\_CONSTANT}   (24)

\hat{p}h_t^i = \hat{p}h(t) + \mathrm{random}(0,1) \cdot \mathrm{SCALE\_CONSTANT}   (25)

f_v(t) = \frac{1}{K} \sum_{i=1}^{K} f_t^i   (26)

YOLOv3 [19] is also used to detect objects in this step; after detection, objects are selected by their IoU with the tracked object, and an object is kept and added to the particle set if this value is higher than a threshold.

3) Matching: Each particle is compared with the previous object by an observation (matching) model; after the comparison, each particle has a weight.

The correlation filter presented in [5][6][7] is used in the observation model. The filter is learned at time (t-1) and then traverses the frame at time t: the filter is convolved with each region of the image from left to right and top to bottom, which yields a value measuring the correlation between the object at time (t-1) and the region at time t. The higher the value, the more likely the region is the state of the tracked object at time t.

The correlation filter is integrated with the particle filter because it then does not need to traverse the whole frame; it only needs to be convolved with each particle, which reduces the computational cost.

To use this correlation filter, it has to be learned in frame (t-1). It is denoted crf; the region of the tracked object at time (t-1) is r, with width M and height N, and r_{m,n} with m, n \in \{0,1,\ldots,M-1\} \times \{0,1,\ldots,N-1\} is the region of the tracked object at time (t-1) translated m pixels to the right and n pixels down. Each r_{m,n} corresponds to a value

g_{m,n} = e^{-\frac{(m - M/2)^2 + (n - N/2)^2}{2\sigma^2}}

which is a Gaussian value expressing how far a pixel is from the center. After the learned filter crf* is found, the weight (the correlation) of each particle is calculated by convolving crf* with the region of that particle.

In this step, a deep neural network, VGG-19, is applied [14][16]. The network is pre-trained with optimal parameters. Both r_{m,n} and the region of each particle are passed through this network to obtain three feature maps, conv-3, conv-4 and conv-5, and for each convolutional map a corresponding crf* is learned. This yields three weights c_1, c_2, c_3 for a particle, and the final weight of the particle is:

w_t^i = c_1 + c_2 + c_3   (28)

Finally, the weight of each particle is normalized:

w_t^i = \frac{w_t^i}{\sum_{i=0}^{N} w_t^i}   (29)
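A minimal sketch of the prediction and weight-normalization steps is given below, assuming NumPy. predict_state follows (19); generate_particles scatters particles around the predicted state in the spirit of (22)-(25), but the centred jitter and the SCALE_CONSTANT value are assumptions of the sketch, since the paper does not report its exact settings; normalize_weights implements (29).

```python
import numpy as np

SCALE_CONSTANT = 5.0  # illustrative value; the paper does not state its setting

def predict_state(prev_box, fv, noise_sigma=2.0):
    """Eq. (19): shift the previous state [px, py, pw, ph] by the camera
    motion vector fv and add Gaussian noise q(t) to the position."""
    px, py, pw, ph = prev_box
    q = np.random.normal(0.0, noise_sigma, size=2)
    return np.array([px + fv[0] + q[0], py + fv[1] + q[1], pw, ph])

def generate_particles(pred_box, n_particles=100):
    """Scatter particles around the predicted state: positions are jittered
    within the box extent, sizes by SCALE_CONSTANT (cf. (22)-(25))."""
    px, py, pw, ph = pred_box
    u = np.random.rand(n_particles, 4)       # random(0,1) draws
    return np.column_stack([
        px + (u[:, 0] - 0.5) * pw,            # assumption: jitter centred on the box
        py + (u[:, 1] - 0.5) * ph,
        pw + u[:, 2] * SCALE_CONSTANT,
        ph + u[:, 3] * SCALE_CONSTANT,
    ])

def normalize_weights(weights):
    """Eq. (29): scale the particle weights so they sum to one."""
    w = np.asarray(weights, dtype=float)
    return w / w.sum()
```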
location \mathbf{r}_t; GPS_t is the camera location from GPS at time t; GPS_{t,x}, GPS_{t,y} and GPS_{t,z} are the x, y and z coordinates of the camera location GPS_t; and N is the number of frames.

From the experimental results (see Table II), the estimated camera position is quite good, and when the removal of moving objects is applied, better results are achieved than in the opposite case.

B. Accuracy of the Tracked Object Center in 3D

The object center is estimated as the average of all 3D points of the object being considered. The following distance is used to evaluate the error (see Table III) between the tracked object center and its ground truth center:

\text{error-center} = \frac{1}{N} \sum_{t=1}^{N} \left[ (\mathbf{c}_{est,x}(t) - \mathbf{c}_{gt,x}(t))^2 + (\mathbf{c}_{est,y}(t) - \mathbf{c}_{gt,y}(t))^2 + (\mathbf{c}_{est,z}(t) - \mathbf{c}_{gt,z}(t))^2 \right]^{1/2}   (33)

where \mathbf{c}_{est}(t) is the center of the estimated 3D bounding box at time t; \mathbf{c}_{est,x}(t), \mathbf{c}_{est,y}(t) and \mathbf{c}_{est,z}(t) are its x, y and z coordinates; \mathbf{c}_{gt}(t) is the center of the 3D bounding box from the ground-truth data at time t; \mathbf{c}_{gt,x}(t), \mathbf{c}_{gt,y}(t) and \mathbf{c}_{gt,z}(t) are its x, y and z coordinates; and N is the number of frames.

In Section 4.B, a different dataset is used to evaluate the accuracy of the tracked object center because it provides ground truth data for the object's center.

From Table III, the center error when using YOLOv3 is lower than in the opposite case.

TABLE II. ERROR OF THE ESTIMATED CAMERA POSITION COMPARED TO GPS, WITHOUT AND WITH REMOVAL OF MOVING OBJECTS

Data | Frames | Distance (m) | Without removal of moving objects (m) | With removal of moving objects (m)
0091 | 150 | 97.5 | 0.8449 | 0.8245
0060 | 70 | 0 | 0.0197 | 0.0193
0095 | 150 | 137.86 | 1.1654 | 1.0322
0113 | 80 | 16.27 | 0.5542 | 0.5440
0106 | 174 | 83.61 | 0.5199 | 0.5105
0005 | 150 | 66.78 | 1.5397 | 0.6656
Average | | | 0.7740 | 0.5993

TABLE III. ERRORS BETWEEN THE TRACKED OBJECT CENTER AND ITS GROUND TRUTH CENTER IN 3D

Data | error-center with YOLOv3 (m) | error-center without YOLOv3 (m)
0005 | 0.5421 | 0.8608
0010 | 0.5470 | 1.7793
0011 | 0.4271 | 2.104
Average | 0.5054 | 1.5868

C. Accuracy of the Tracked Object Position based on the Object Center and IoU in 2D

The IoU metric is used to compare the predicted boxes with their ground truth boxes in 2D.

In Section 4.C, the following datasets are used because they contain objects that can be tracked consistently; the other sequences do not have a consistent object to follow.

In Table IV, the IoU between the predicted boxes and their ground truth boxes is reported. The table shows that the IoU of the proposed tracking method when using YOLOv3 is higher than in the opposite case.

In Table V, the Euclidean distance is used to estimate the errors between the ground truth centers and the predicted centers. The table shows that, when YOLOv3 is combined, the Euclidean distances of the proposed tracking method are smaller than in the opposite case.

D. Speed of the Tracking Algorithm

Speed is measured as the number of frames processed per second (FPS). In Table VI, the paper measured the time taken to process a frame and inverted it to obtain the FPS. The table shows that the FPS of the proposed tracking method when using YOLOv3 is more than three times higher than in the opposite case.

TABLE IV. ACCURACY OF THE ESTIMATED OBJECT POSITION WITH TRACKING BY DETECTION VERSUS TRACKING WITHOUT DETECTION (METRIC: 2D IOU; OBJECT DETECTION METHOD: YOLOV3)

Data | Average 2D IoU with object detection (YOLOv3) | Average 2D IoU without object detection
0000 | 0.66 | 0.29
0004 | 0.41 | 0.11
0005 | 0.74 | 0.6
0010 | 0.82 | 0.64
0011 | 0.78 | 0.48
0020 | 0.65 | 0.6
Average | 0.68 | 0.45
Outliers
0018 | 0.08 | 0.03

TABLE V. ACCURACY OF THE ESTIMATED OBJECT POSITION WITH TRACKING BY DETECTION VERSUS TRACKING WITHOUT DETECTION (METRIC: EUCLIDEAN DISTANCE; OBJECT DETECTION METHOD: YOLOV3)

Data | Average error of centers with YOLOv3 (pixel) | Average error of centers without YOLOv3 (pixel)
0000 | 16.14 | 234.3
0004 | 10.74 | 61.64
0005 | 3.66 | 1.47
0010 | 3.62 | 4.11
0011 | 5.57 | 14.06
0020 | 6.22 | 13.13
Average | 7.66 | 54.79
Outliers
0018 | 25.56 | 315.52
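For reference, the two evaluation metrics used in Tables III-V can be computed as in the sketch below (assuming NumPy and 2D boxes given as [x, y, w, h] with the top-left corner as origin; the paper does not state its exact box convention).

```python
import numpy as np

def iou_2d(box_a, box_b):
    """Intersection over Union of two 2D boxes given as [x, y, w, h]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def error_center(est_centers, gt_centers):
    """Eq. (33): mean Euclidean distance between estimated and
    ground-truth 3D centers over N frames."""
    est = np.asarray(est_centers, dtype=float)   # shape (N, 3)
    gt = np.asarray(gt_centers, dtype=float)
    return np.linalg.norm(est - gt, axis=1).mean()
```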
TABLE VI. SPEED OF THE PROPOSED TRACKING ALGORITHM WITH AND WITHOUT YOLOV3

Data | Average FPS with object detection (YOLOv3) (Hz) | Average FPS without object detection (Hz)
0000 | 11 | 4
0004 | 9 | 3
0005 | 12 | 3
0010 | 12 | 3
0011 | 12 | 3
0020 | 11 | 4
Average | 11 | 3
Outliers
0018 | 6 | 3

V. DISCUSSION

The experimental results show that the camera position is estimated quite well because the moving features are removed when estimating it. However, the moving-feature removal algorithm is still limited and should be improved in the future, even though it has already improved the accuracy of the camera position. Although an error remains between the estimated and ground truth camera locations, the method is useful in environments such as indoors or with noisy GPS, and in cases where the only input for the tracked object is an image. The experimental results confirm the important role of visual information in MOT.

The experimental results also show that the processing speed is suitable for real-time applications. The speed of the tracking algorithm increases when the Particle filter is integrated with YOLOv3, because the tracking algorithm can use either method to track the object, and when it uses YOLOv3 it runs very fast. Because of this, the speed increases significantly, reaching more than three times that of the conventional method. In terms of accuracy, the tracking-by-detection method is also more effective than the single-tracker method, based on the 2D IoU metric and the tracked object center.

VI. CONCLUSIONS AND FUTURE WORK

In this paper, a unified system consisting of robot localization, environment reconstruction and object tracking based on a stereo camera and IMU has been proposed.

In localization, the paper's contribution is a solution to estimate the camera position and the tracked object position based on a stereo camera and IMU with the removal of moving features. It has been shown that the accuracy of localization is improved and that the computational time is suitable for real-time applications.

In tracking, the paper's contributions are: (1) particles are guided by a motion vector calculated from pairs of SURF feature points, so the direction the object is heading is captured; (2) an observation (matching) model containing a correlation filter and a deep neural network (VGG-19) can deal with translations of the object; (3) tracking by detection with an object detection algorithm (YOLOv3) supports the single tracker and makes it more accurate, because it can supply more candidates for the particle filter and can also detect objects very quickly and accurately.

Although the framework works well on the KITTI dataset, both the localization and tracking algorithms still need to be improved.

The localization algorithm should be tested on a real robot; it can also pave the way for dynamic obstacle avoidance, and combining lidar with the stereo camera is a promising direction to explore.

In the tracking algorithm, the most time-consuming step is the matching step of the Particle Filter, because the Correlation Filter is trained at each frame, which increases the computational time. In the future, the Correlation Filter should be replaced by a pre-trained model, such as a Siamese network, that can compare the features of the target at consecutive times in real time.

ACKNOWLEDGMENT

This research is funded by Viet Nam National University Ho Chi Minh City (VNUHCM) under grant no. B2018-18-01.

Thanks to Erman Tjiputra, Director of AIOZ Pte Ltd, and CTO Quang D. Tran for their valuable support on the internship cooperation.

REFERENCES

[1] Q. Zhao, Z. Yang and H. Tao, "Differential earth mover's distance with its applications to visual tracking," in Proceedings of the IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 274-287, 2010.
[2] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 1, pp. 886-893, 2005.
[3] D. Ta, W. Chen, N. Gelfand and K. Pulli, "Surftrac: Efficient tracking and continuous object recognition using local feature descriptors," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2937-2944, 2009.
[4] D. A. Ross, J. Lim, R.-S. Lin and M.-H. Yang, "Incremental learning for robust visual tracking," in International Journal of Computer Vision, vol. 77, no. 1-3, pp. 125-141, 2008.
[5] D. S. Bolme, J. R. Beveridge, B. A. Draper and Y. M. Lui, "Visual Object Tracking using Adaptive Correlation Filters," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2544-2550, 2010.
[6] H. K. Galoogahi, T. Sim and S. Lucey, "Multi-Channel Correlation Filters," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3072-3079, 2013.
[7] J. F. Henriques, R. Caseiro, P. Martins and J. Batista, "High-Speed Tracking with Kernelized Correlation Filters," in Proceedings of the IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583-596, 2014.
[8] X. Li, K. Wang, W. Wang and Y. Li, "A multiple object tracking method using Kalman filter," in Proceedings of the 2010 IEEE International Conference on Information and Automation, pp. 1862-1866, 2010.
[9] P. Kalane, "Target Tracking Using Kalman Filter," in International Journal of Science & Technology (IJST), vol. 2, no. 2, Article ID IJST/0412/03, 2012.
[10] H. A. Patel and D. G. Thakore, "Moving Object Tracking Using Kalman Filter," in International Journal of Computer Science and Mobile Computing, vol. 2, no. 4, pp. 326-332, 2013.
[11] K. Nummiaro, E. Koller-Meier and L. V. Gool, "Object Tracking with an Adaptive Color-Based Particle Filter," in DAGM 2002: Pattern Recognition, Lecture Notes in Computer Science (LNCS, vol. 2449), pp. 353-360, 2002.
[12] S. Prabu and G. Hu, "Stereo Vision based Localization of a Robot using Partial Depth Estimation and Particle Filter," in Proceedings of the 19th World Congress, The International Federation of Automatic Control, vol. 47, no. 3, pp. 7272-7277, 2014.
[13] S. Chen and W. Liang, "Visual Tracking by Combining Deep Learned Image Representation with Particle Filter," in ICIC Express Letters, Part B: Applications, vol. 3, no. 1, pp. 1-6, 2012.
[14] C. Ma, J. B. Huang, X. Yang and M. H. Yang, "Robust Visual Tracking via Hierarchical Convolutional Features," in Proceedings of the IEEE Transactions on Pattern Analysis and Machine Intelligence (Early Access), pp. 1-1, 2018.
[15] R. J. Mozhdehi and H. Medeiros, "Deep Convolutional Particle Filter for Visual Tracking," in Proceedings of the IEEE International Conference on Image Processing (ICIP), pp. 3650-3654, 2017.
[16] T. Zhang, C. Xu and M. H. Yang, "Multi-task Correlation Particle Filter for Robust Object Tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4819-4827, 2017.
[17] J. S. Lim and W. H. Kim, "Detection and Tracking Multiple Pedestrians from a Moving Camera," in ISVC 2005: Advances in Visual Computing, LNCS 3804, pp. 527-532, 2005.
[18] Y. Chen, R. H. Zhang, L. Shang and E. Hu, "Object detection and tracking with active camera on motion vectors of feature points and particle filter," in The Review of Scientific Instruments, vol. 84, no. 6, 2013.
[19] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," in University of Washington, 2018.
[20] S. Shen, Y. Mulgaonkar, N. Michael and V. Kumar, "Vision-Based State Estimation for Autonomous Rotorcraft MAVs in Complex Environments," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 1758-1764, 2013.
[21] L. Gong, M. Yu and T. Gordon, "Online codebook modeling based background subtraction with a moving camera," in 2017 3rd International Conference on Frontiers of Signal Processing (ICFSP), pp. 136-140, 2017.
[22] S. Minaeian, J. Liu and Y. J. Son, "Effective and Efficient Detection of Moving Targets from a UAV's Camera," in IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 2, pp. 497-506, 2018.
[23] F. Zhong, S. Wang, Z. Zhang, C. Zhou and Y. Wang, "Detect-SLAM: Making Object Detection and SLAM Mutually Beneficial," in IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1001-1010, 2018.
[24] D. Nistér, O. Naroditsky and J. Bergen, "Visual Odometry," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), vol. 1, pp. I-I, 2004.
[25] B. Kitt, A. Geiger and H. Lategahn, "Visual Odometry based on Stereo Image sequences with RANSAC-based Outlier Rejection Scheme," in 2010 IEEE Intelligent Vehicles Symposium, pp. 486-492, 2010.
[26] Y. Liu, Y. Gu, J. Li and X. Zhang, "Robust Stereo Visual Odometry Using Improved RANSAC-Based Methods for Mobile Robot Localization," in Sensors 2017, vol. 17, no. 10, 2017.
[27] Y. Xu, V. John, S. Mita et al., "3D Point Cloud Map Based Vehicle Localization Using Stereo Camera," in 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 487-492, 2017.
[28] S. Hong, M. Li, M. Liao and P. v. Beek, "Real-time mobile robot navigation based on stereo vision and low-cost GPS," in Intelligent Robotics and Industrial Applications using Computer Vision 2017, pp. 10-15(6), 2017.
[29] H. Bay, A. Ess, T. Tuytelaars and L. Van Gool, "Speeded-Up Robust Features (SURF)," in Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346-359, 2008.
[30] M. Muja and D. G. Lowe, "Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration," in VISAPP International Conference on Computer Vision Theory and Applications, vol. 1, 2009.
[31] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," in International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[32] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI '81), pp. 674-679, 1981.
[33] S. Kim, K. Yun, K. Yi, S. Kim and J. Choi, "Detection of moving objects with a moving camera using non-panoramic background model," in Machine Vision and Applications, vol. 24, no. 5, pp. 1015-1028, 2013.
[34] R. I. Hartley and A. Zisserman, "Multiple View Geometry in Computer Vision," Cambridge University Press, 2004.
[35] R. Haralick, C. Lee, K. Ottenberg and M. Nolle, "Review and Analysis of Solutions of the Three Point Perspective Pose Estimation Problem," in International Journal of Computer Vision, vol. 13, no. 3, pp. 331-356, 1994.
[36] D. Nister, "An efficient solution to the five-point relative pose problem," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-195, 2003.
[37] A. Geiger, P. Lenz and R. Urtasun, "Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite," in Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[38] S. Minaeian, J. Liu and Y. J. Son, "Effective and Efficient Detection of Moving Targets from a UAV's Camera," in IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 2, pp. 497-506, 2018.
[39] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 2015.
[40] T. Y. Lin, P. Goyal, R. Girshick, K. He and P. Dollar, "Focal Loss for Dense Object Detection," in 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999-3007, 2017.
[41] W. Liu, D. Anguelov, D. Erhan et al., "SSD: Single Shot MultiBox Detector," in Computer Vision – ECCV 2016, pp. 21-37, 2016.
[42] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn and A. Zisserman, "The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results," 2007.
[43] C. C. Lin, "Detecting and Tracking Moving Objects from a Moving Platform," Georgia Institute of Technology, 2012.