robotics

This paper presents a deep learning framework for real-time person detection and tracking using a mobile robot equipped with a stereo camera. The system employs a head detector and a high-speed regression network tracker, along with a PID controller for smooth robot movement, to effectively follow a person in crowded environments. The proposed method has been tested in real-world scenarios, demonstrating its robustness and effectiveness in tracking individuals.


JOURNAL OF LATEX CLASS FILES, VOL. 6, NO. 1, JANUARY 2007

Deep learning framework for robot for person detection and tracking
Adarsh Ghimire, Xiaoxiong Zhang, Naoufel Werghi, Sajid Javed, Jorge Dias

Abstract—Robustly tracking a person of interest in a crowd with a robotic platform is one of the cornerstones of human-robot interaction. A robot platform, which is limited in computational power and must cope with rapid movements and occlusions of the target, requires an efficient and robust framework to perform tracking. This paper proposes a deep learning framework for tracking a person using a mobile robot with a stereo camera. The proposed system detects a person based on the head, then uses a low-cost, high-speed regression-network-based tracker to track the person of interest in real time. The visual servoing of the mobile robot is designed using a PID controller that uses the tracker output and the depth estimate of the person in subsequent frames, providing smooth and adaptive movement of the robot based on target movement. The proposed system has been tested in a real environment, demonstrating its effectiveness.

arXiv:2205.04213v1 [cs.RO] 19 Apr 2022

Keywords—Computer Vision, Robot, Unmanned Ground Vehicle, Deep learning, Tracker, Control.

1 INTRODUCTION

Traditional non-mobile intelligent monitoring and surveillance devices are limited by their stationary nature. For instance, a CCTV camera at an airport developed to track a specific person is limited by its field of view. To compensate for the limited field of view, some solutions employ an intelligent network of cameras to keep tracking the subject as it moves. However, these solutions incur a high computational cost and are still limited by their immobile nature when the subject exits the view of the system.

To avoid losing a subject from view, many studies have combined robotics and computer vision. This enables a robot with a camera to move along with the subject and keep it under continuous surveillance [1]–[5]. This type of autonomous robot is a plausible solution for surveillance tasks. It is also widely applicable in health care, entertainment, and industry [6]. For example, such robots can assist people with disabilities with household work, help warehouse workers, and serve nurses in hospitals by carrying medicines and equipment. The major challenges these robotic systems face while following a person in crowded environments are the requirements for real-time and robust tracking.

A person-following robot requires knowledge from multiple fields, such as person detection, movement tracking, and robot control systems. Most works in the literature employ a monocular camera with other sensors to obtain the depth information of the target. [5] developed a system that uses an online boosting algorithm with convolutional channel features on a monocular camera feed and depth information from a laser range finder to follow the person. [2] used an ultrasonic sensor to obtain the depth information of the target and used an extended Kalman filter in the tracking. Some recent works used stereo vision for simultaneous tracking and depth estimation. [3] used a HOG-based classifier on the stereo feed and an unscented Kalman filter to track the person, while [1] proposed a CNN tracker for simultaneously estimating the target and tracking the person. In addition, [4] utilized SVM and optical flow for tracking the target, while [6] proposed SVM with HOG features, a block matching algorithm for frames, and a Kalman filter.

In this paper, we present a complete deep learning framework that efficiently follows a person in a crowd in real time. The system uses a head detector to detect a person in the crowd, and then uses a high-speed regression network tracker [7] to track the subject at 150 FPS by incorporating temporal information. The robot motion is designed using two PID controllers based on the tracker output and the depth value. A Kobuki robot equipped with a stereo camera is used for the experiments.

The main contributions of this paper are:

1) A visual servoing algorithm for a two-wheel-driven robot
2) Appropriate depth estimation from a stereo camera
3) A complete robot system that can track a person in real time

2 METHOD

Figure 1 shows the block diagram of the proposed person tracking system. The components of the system are described in sub-sections 2.1, 2.2, 2.3, and 2.4.

[Figure 1 (block diagram). Blocks: Stereo Camera Feed, RGB feed, Head Detector, RE3 Tracker, Angular PID Control, Depth feed, Depth Estimator, Linear PID Control, Robot Control System, Move Base]
Fig. 1: System Block Diagram
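
As a rough illustration of the two control branches named in Fig. 1, the sketch below turns one tracker bounding box and one depth estimate into a (linear, angular) velocity pair using two textbook PID controllers. The `PID` class, the `velocity_command` helper, the gains, the target following distance, the bounding-box layout, and the sign convention are all assumptions made for this example; the paper does not specify them, and this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PID:
    """Textbook discrete PID controller (gains are illustrative assumptions)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        """Return the control output for the current error and time step."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no previous sample on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def velocity_command(
    bbox: Tuple[float, float, float, float],
    depth_m: float,
    frame_width: float,
    target_depth_m: float,
    angular_pid: PID,
    linear_pid: PID,
    dt: float,
) -> Tuple[float, float]:
    """Map a tracked box and a depth estimate to (linear, angular) velocities.

    Assumptions: bbox is (x_min, y_min, x_max, y_max) in pixels, and a
    positive angular velocity turns the robot to the left.
    """
    x_min, _, x_max, _ = bbox
    center_x = 0.5 * (x_min + x_max)
    # Horizontal offset normalized to [-1, 1]; positive when the person
    # appears to the right of the image center.
    offset = (center_x - frame_width / 2.0) / (frame_width / 2.0)
    # A person on the right calls for a right turn, hence the sign flip.
    angular = angular_pid.step(-offset, dt)
    # Positive error means the person is farther away than desired.
    linear = linear_pid.step(depth_m - target_depth_m, dt)
    return linear, angular
```

With proportional-only gains, a person to the right of center and beyond the target distance yields a forward velocity and a right turn; a centered person at the target distance yields zero for both.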

2.1 Head Detector: The head is the most distinctive part of a person. Thus, a fast head detector model [8] is used to detect people in the feed. The detector outputs bounding boxes and the corresponding head probabilities in the first frame. Among the detected heads, the algorithm selects the one with the highest probability and sends it to the tracker.
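
The selection step described above can be sketched as follows. This is a minimal example that assumes the detector returns parallel lists of boxes and probabilities; the function name `select_target_head` and the `(x_min, y_min, x_max, y_max)` box layout are hypothetical and not taken from the detector in [8].

```python
from typing import List, Optional, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def select_target_head(boxes: List[Box], scores: List[float]) -> Optional[Box]:
    """Return the box with the highest head probability, or None if empty."""
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    if not boxes:
        return None  # no head detected in this frame
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return boxes[best_index]
```

The returned box would then be used to initialize the tracker on the first frame.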
2.2 RE3 Tracker: The tracker [7] first initializes itself with features inside the bounding box given by the detector, then tracks and updates those features in subsequent frames. The tracker outputs the bounding box of the object in every frame, which the robot control system uses to generate the corresponding motion control signals.

[Fig. 2, panels (a) and (b): (a) Robot following the person A; (b) Person A is partially occluded]

2.3 Depth Estimator: The initial depth value of the object in the current frame is estimated as the median of the depth values inside the bounding box. This counters the high variance of the depth values given by the stereo camera. In addition, fast changes in object movement would otherwise result in transient, abrupt robot movements. To address this issue, the final depth value is adjusted according to previous estimates using an exponentially weighted moving average.
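
A minimal sketch of this two-stage estimate is shown below. It assumes the depth map is a row-major grid of distances, that non-finite or non-positive values mark invalid stereo matches, and that the smoothing factor `alpha` is a free parameter; the class name `DepthEstimator` and all of these choices are illustrative, since the paper does not state them. In practice the depth map would likely be a NumPy array, but plain sequences keep the example dependency-free.

```python
import math
from statistics import median
from typing import Optional, Sequence, Tuple


class DepthEstimator:
    """Median over the box, then an exponentially weighted moving average."""

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight of the newest measurement (assumed value)
        self.smoothed: Optional[float] = None

    def update(
        self,
        depth_map: Sequence[Sequence[float]],
        bbox: Tuple[int, int, int, int],
    ) -> Optional[float]:
        """Return the smoothed depth for bbox = (x_min, y_min, x_max, y_max)."""
        x_min, y_min, x_max, y_max = bbox
        # Collect valid depth values inside the box (assumption: values that
        # are non-finite or <= 0 are invalid stereo matches).
        values = [
            d
            for row in depth_map[y_min:y_max]
            for d in row[x_min:x_max]
            if math.isfinite(d) and d > 0.0
        ]
        if not values:
            return self.smoothed  # keep the previous estimate
        raw = median(values)  # robust to outliers inside the box
        if self.smoothed is None:
            self.smoothed = raw  # first frame: nothing to average with
        else:
            self.smoothed = self.alpha * raw + (1.0 - self.alpha) * self.smoothed
        return self.smoothed
```

A smaller `alpha` gives smoother but slower depth updates; a value of 1.0 disables the smoothing and returns the per-frame median.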
2.4 Robot Control System: To smooth the movement of the robot, two PID controllers are used. The first PID controller controls the angular movement of the robot based on the person’s horizontal movement in the camera frame, as reported by the tracker. The second PID controller controls the linear movement of the robot based on the person’s movement towards or away from the robot, as estimated by the depth estimator.

[Fig. 2, panels (c) and (d): (c) Person A is fully occluded; (d) Robot following after occlusion]
Fig. 2: Person Following Robot in the crowd of two people

3 RESULTS

Figure 2 shows four tracking scenarios in which the robot follows person A (dressed in white) in the presence of person B (dressed in black). The full demo video is linked in footnote 1.

1. Video Link

4 CONCLUSION

This paper described a fast and efficient person-following robot system that uses a real-time recurrent regression network tracker. The proposed system performed well in crowded indoor and outdoor environments. Possible future work includes incorporating a more robust and efficient tracker and more advanced control for the robot.

ACKNOWLEDGMENT

This work acknowledges the support provided by the Khalifa University of Science and Technology under award No. RC1-2018-KUCARS.

REFERENCES

[1] B. X. Chen, R. Sahdev, and J. Tsotsos, “Integrating stereo vision with a cnn tracker for a person-following robot,” 2017, pp. 300–313.
[2] M. Wang, D. Su, L. Shi, Y. Liu, and J. V. Miro, “Real-time 3d human tracking for mobile robots with multisensors,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 5081–5087.
[3] Y. Sun, L. Sun, and J. Liu, “Real-time and fast rgb-d based people detection and tracking for service robots,” in 2016 12th World Congress on Intelligent Control and Automation (WCICA), 2016, pp. 1514–1519.
[4] E. Chen, “Folo: A vision-based human-following robot,” in Proceedings of the 2018 3rd International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE), Atlantis Press, pp. 224–232.
[5] K. Koide and J. Miura, “Convolutional channel features-based person identification for person following robots,” in Intelligent Autonomous Systems 15, Springer International Publishing, 2019, pp. 186–198.
[6] T.-H. Tsai and C.-H. Yao, “A robust tracking algorithm for a human-following mobile robot,” IET Image Processing, vol. 15, no. 3, pp. 786–796, 2021.
[7] D. Gordon, A. Farhadi, and D. Fox, “Re3: Real-time recurrent regression networks for object tracking,” CoRR, vol. abs/1705.06368, 2017. [Online]. Available: http://arxiv.org/abs/1705.06368.
[8] X. Zhang, S. Javed, A. Obeid, J. Dias, and N. Werghi, “Gender recognition on rgb-d image,” in 2020 IEEE