DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments
Chao Yu1, Zuxin Liu2, Xin-Jun Liu*1, Fugui Xie1, Yi Yang3, Qi Wei3, Qiao Fei*3

Research supported by the National Natural Science Foundation of China under Grants 91648116 and 51425501.
1 Department of Mechanical Engineering, Tsinghua University, Beijing, China
2 School of Instrumentation Science and Opto-electronics Engineering, Beihang University, Beijing, China
3 Department of Electronic Engineering, Tsinghua University, Beijing, China
* Corresponding authors: [email protected] and [email protected]

Abstract— Simultaneous Localization and Mapping (SLAM) is considered to be a fundamental capability for intelligent mobile robots. Over the past decades, many impressive SLAM systems have been developed and have achieved good performance under certain circumstances. However, some problems are still not well solved, for example, how to tackle moving objects in dynamic environments and how to make robots truly understand their surroundings and accomplish advanced tasks. In this paper, a robust semantic visual SLAM towards dynamic environments named DS-SLAM is proposed. Five threads run in parallel in DS-SLAM: tracking, semantic segmentation, local mapping, loop closing and dense semantic map creation. DS-SLAM combines a semantic segmentation network with a moving consistency check method to reduce the impact of dynamic objects, and thus the localization accuracy is greatly improved in dynamic environments. Meanwhile, a dense semantic octo-tree map is produced, which can be employed for high-level tasks. We conduct experiments both on the TUM RGB-D dataset and in a real-world environment. The results demonstrate that the absolute trajectory accuracy of DS-SLAM can be improved by one order of magnitude compared with ORB-SLAM2. It is one of the state-of-the-art SLAM systems in high-dynamic environments.

I. INTRODUCTION

In the past years, visual SLAM has been extensively investigated, because images store a wealth of information and can be employed for other vision-based applications, like semantic segmentation and object detection. The framework of a modern visual SLAM system is quite mature and often consists of several essential parts, such as a feature-extraction front-end, a state-estimation back-end, loop closure detection and so forth. Besides, some advanced SLAM algorithms have already attained satisfactory performance, such as ORB-SLAM2 [1] and LSD-SLAM [2].

However, some issues remain unsolved; for example, most of the existing algorithms are fragile. On the one hand, they have difficulty in handling all kinds of environments, such as extraordinarily dynamic or rough environments. On the other hand, their map models are often based on geometric information, like landmark-based maps and point cloud maps, so they do not provide any high-level understanding of the surroundings. According to [3], SLAM has entered the robust-perception age, and more research is needed to achieve genuinely robust perception and navigation for autonomous robots.

Typical SLAM methods only provide a map with geometric entities (points, planes, etc.) without distinguishing their semantic attributes. However, semantic information is needed for a robot to understand the scene surrounding it. With the development of deep learning, some networks can achieve good performance in semantic segmentation [4]. Therefore, combining these networks with SLAM can produce a semantic map and thus improve the perception level of robots [5, 6].

The robustness of a SLAM system in dynamic environments is also a challenge. While modern SLAM systems have been successfully demonstrated mostly in specific circumstances, unexpected changes of the surroundings can corrupt the quality of the state estimation and even lead to system failure. For example, the presence of dynamics in the environment, like walking people, might deceive feature association in a vision-based SLAM system. Although some progress has been made to reduce the impact of environmental changes, for instance in laser-based SLAM or by combining an inertial measurement unit with visual SLAM to improve robustness [7], the problem is still not well solved in pure vision-based SLAM.

In this paper, we focus on reducing the impact of dynamic objects in vision-based SLAM by combining a semantic segmentation network with an optical flow method, and meanwhile providing a semantic presentation of the octo-tree map [8], which can be employed for high-level tasks for robots. The overview of DS-SLAM is shown in Fig. 1. The main contributions of this paper include:

1. A complete semantic SLAM system in dynamic environments (DS-SLAM) is proposed based on ORB-SLAM2 [1], which can reduce the influence of dynamic objects on pose estimation. The effectiveness of the system is evaluated on the TUM RGB-D dataset [9]. The results indicate that DS-SLAM outperforms ORB-SLAM2 significantly regarding accuracy and robustness in dynamic environments. The system is also integrated with Robot Operating System (ROS) [10], and its performance is verified by testing DS-SLAM on a robot in a real environment.

2. We put a real-time semantic segmentation network in an independent thread and combine semantic segmentation with a moving consistency check method to filter out dynamic portions of the scene, like walking people. Thus the performance of the localization module and the mapping module is improved in terms of robustness and accuracy in dynamic scenarios.
Figure 1. The overview of DS-SLAM. The raw RGB image is used for semantic segmentation and for the moving consistency check simultaneously. Outliers are then removed and the pose is estimated. The semantic octo-tree map is built in an independent thread based on the pose, depth image, and semantic segmentation results.

Figure 2. The framework of DS-SLAM. The local mapping and loop closing threads are the same as in ORB-SLAM2. The former processes new keyframes and performs local bundle adjustment to achieve an optimal reconstruction in the surroundings of the camera pose, while the latter searches for loops and performs a graph optimization if a loop is detected.

3. DS-SLAM creates a separate thread to build a dense semantic 3D octo-tree map [8]. The dense semantic 3D octo-tree map adopts the log-odds score method to filter out unstable voxels and to update these voxels' semantic meaning.

The rest of this paper is structured as follows. Section II provides an overview of current work on semantic SLAM and SLAM in dynamic environments. Section III presents the framework of the whole SLAM system in detail, that is, how dynamic objects are detected and how the semantic map is produced. Subsequently, Section IV provides qualitative and quantitative results of the performance of DS-SLAM, both on the TUM RGB-D dataset [9] and in a real-world environment, to demonstrate the effectiveness and accuracy of the system. Finally, a brief conclusion and discussion are given in Section V.

II. RELATED WORKS

A. Semantic SLAM

Since purely geometry-based maps cannot provide conceptual knowledge of the surroundings to facilitate complex tasks, associating semantic concepts with geometric entities in the environment has become a popular research domain recently. In many previous works, the semantic maps consisted of two parts: a geometric part and a semantic part. Afterwards, a group of approaches trained the object recognition subsystem in advance and attached semantic information to the recognized object models. Vasudevan et al. [11] apply a simple naive Bayesian classifier to identify scene categories and give a hierarchical probabilistic representation of space based on pre-trained objects. More recently, an efficient description of 3D models was proposed by Sengupta et al. [12], which embedded an octo-tree into a hierarchical robust Markov Random Field to provide each voxel in the map with a semantic label. With the advent of deep learning, the results of semantic segmentation have been
vastly improved. Sünderhauf et al. [13] apply a convolutional neural network to 3D point segmentation, correlate semantic labels with objects based on the nearest-neighbor method, and then add or update the target object's point cloud information and subordinate type confidence value in the map.

However, their work only focuses on semantic mapping and object recognition, and the semantic information is not well used in other parts of the system. Bao et al. [14] first try to estimate camera poses, points, and objects jointly, utilizing both geometric and semantic attributes in the scene, which significantly improves the object recognition accuracy. In this paper, semantic information is not only used to generate an octo-tree-map-based representation of the environment but also utilized to filter outliers during tracking in a dynamic environment. To perform semantic segmentation accurately, we use a semantic segmentation network following the state-of-the-art line of work.

B. SLAM in dynamic environments

A basic assumption in most current SLAM approaches is that the environment is static. However, active objects, like humans, exist in many real-world scenes. Therefore, most state-of-the-art approaches that were initially designed for SLAM in static environments are not capable of handling severe dynamic scenarios.

To address this issue, it is necessary to identify the dynamic objects from the static parts and then discard them before pose estimation. In dense SLAM, many moving-target detection methods are based on the optical flow technique. Optical flow is generated when movements exist in the image, so the static background and moving targets can be distinguished by computing the inconsistency of the optical flow. For example, Fang et al. [15] use an optimum-estimation and uniform sampling method to detect dynamic objects. Although it is time-efficient compared with other derivative optical flow methods, its accuracy is lower, and the computational load is still large, especially when sampling densely in a big image. Wang et al. [16] detect dynamic objects by clustering the image based on point trajectories and exclude them from the energy function minimization. Their method is robust, but cannot be performed in real time.

III. SYSTEM INTRODUCTION

In this section, DS-SLAM is introduced in detail. The section covers five aspects. First, the architecture of the DS-SLAM framework is presented. Second, we give a brief introduction to the real-time semantic segmentation method adopted in DS-SLAM. Then the algorithm used for checking the moving consistency of the feature points is introduced. Subsequently, the outlier rejection method, which combines the semantic segmentation and moving consistency check results to filter out dynamic objects, is demonstrated. Finally, we present the method used to build the semantic octo-tree map.

A. Framework of DS-SLAM

In real-world applications, accurate pose estimation and reliability in harsh environments are critical factors when evaluating autonomous robots. ORB-SLAM2 has excellent performance in most practical situations. Therefore, ORB-SLAM2 is adopted in DS-SLAM to provide a global feature-based SLAM solution that enables us to detect dynamic objects and produce the semantic octo-tree map. The overview of DS-SLAM is shown in Fig. 1.

Five threads run in parallel in DS-SLAM: tracking, semantic segmentation, local mapping, loop closing and dense map creation. The framework of the system is shown in Fig. 2. The raw RGB images captured by Kinect2 are processed in the tracking thread and the semantic segmentation thread simultaneously. The tracking thread first extracts ORB feature points, then roughly checks the moving consistency of the feature points and saves the potential outliers. The tracking thread then waits for the image with pixel-wise semantic labels predicted by the semantic segmentation thread. After the segmentation result arrives, the ORB feature-point outliers located on moving objects are discarded based on the segmentation result and the potential outliers detected before. Then, the transformation matrix is calculated by matching the remaining stable feature points.
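As an illustration only, the hand-off between the tracking and segmentation threads can be pictured as a small producer/consumer sketch; the queues and the injected helper functions (segment_fn, extract_fn, check_fn, reject_fn, estimate_fn) are hypothetical stand-ins for the corresponding DS-SLAM modules, not the actual ORB-SLAM2-based implementation.

```python
import queue
import threading

# Hand-off queues between the tracking thread and the segmentation thread.
frame_queue = queue.Queue(maxsize=1)   # raw RGB frames handed to the segmentation thread
label_queue = queue.Queue(maxsize=1)   # pixel-wise label images handed back to tracking

def segmentation_worker(segment_fn):
    """Consume raw frames and produce pixel-wise label images (SegNet, see III.B)."""
    while True:
        frame = frame_queue.get()
        label_queue.put(segment_fn(frame))

# A worker would be started once, e.g.:
# threading.Thread(target=segmentation_worker, args=(segment_fn,), daemon=True).start()

def track_frame(frame, extract_fn, check_fn, reject_fn, estimate_fn):
    """One tracking-thread iteration: extract ORB features, run the rough moving
    consistency check while segmentation runs in parallel, then reject outliers
    and estimate the pose from the remaining stable points."""
    frame_queue.put(frame)                       # kick off segmentation in parallel
    keypoints = extract_fn(frame)                # ORB feature extraction
    moving = check_fn(keypoints)                 # rough moving consistency check
    labels = label_queue.get()                   # block until the semantic labels arrive
    stable = reject_fn(keypoints, moving, labels)
    return estimate_fn(stable)                   # match stable points and compute the pose
```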
B. Semantic Segmentation

This system is designed for real-world applications, so there should be a balance between real-time performance and accuracy. DS-SLAM adopts SegNet [4] to provide pixel-wise semantic segmentation based on Caffe [17] in real time. The SegNet model trained on the PASCAL VOC dataset [18] can segment 20 classes in total. In real applications, people are the most likely dynamic objects, so we assume that feature points located on people are the most likely to be outliers.
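For illustration, a minimal sketch of running a SegNet-style Caffe model to obtain per-pixel labels follows. The prototxt/caffemodel file names, the 'data' blob name and the class index are placeholders, and preprocessing such as mean subtraction is omitted; this is not the exact model or pipeline used by DS-SLAM.

```python
import caffe
import numpy as np

# Placeholder model files; an actual SegNet deployment defines its own prototxt and weights.
net = caffe.Net('segnet_deploy.prototxt', 'segnet.caffemodel', caffe.TEST)

def segment(bgr_image):
    """Return an HxW array of per-pixel class indices (e.g., the PASCAL VOC classes)."""
    h, w = net.blobs['data'].data.shape[2:]                 # input size expected by the model
    img = caffe.io.resize_image(bgr_image.astype(np.float32), (h, w))
    img = img.transpose(2, 0, 1)[np.newaxis, ...]           # HWC -> NCHW
    net.blobs['data'].data[...] = img                       # (mean subtraction omitted here)
    out = net.forward()
    scores = list(out.values())[0][0]                        # C x H x W class scores
    return scores.argmax(axis=0)                             # one label per pixel

# person_mask = (segment(frame) == PERSON_CLASS_ID)   # class index depends on the trained model
```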
C. Moving Consistency Check

Since motion segmentation is time-consuming and the semantic segmentation results can be obtained from another thread, we only need to determine whether the key points within the segmented regions are moving. If some points within a segmented object are determined to be dynamic, then the whole object can be regarded as a dynamic object. The idea of moving-point detection in this paper is straightforward. The first step is to calculate the optical flow pyramid to obtain the matched feature points in the current frame. Second, if a matched pair is too close to the edge of the image, or if the pixel difference of the 3 × 3 image block centered on the matched pair is too large, the pair is discarded. The third step is to find the fundamental matrix using RANSAC with the most inliers. The epipolar line in the current frame is then calculated using the fundamental matrix. Finally, we determine whether the distance from a matched point to its corresponding epipolar line is less than a certain threshold. If the distance is greater than the threshold, the matched point is determined to be moving.

The fundamental matrix maps feature points in the last frame to the corresponding search domain in the current frame, that is, to the epipolar line. Let p_1, p_2 denote the matched points in the last frame and the current frame respectively, and let P_1, P_2 be their homogeneous coordinate forms:

P_1 = [u_1, v_1, 1], \quad P_2 = [u_2, v_2, 1], \quad p_1 = [u_1, v_1], \quad p_2 = [u_2, v_2]   (1)

where u, v are pixel coordinates in the image frame. Then the epipolar line, denoted as I_1, can be computed by the following equation:
I_1 = [X, Y, Z]^T = F P_1 = F [u_1, v_1, 1]^T   (2)

where X, Y, Z denote the components of the line vector and F denotes the fundamental matrix. The distance from the matched point to its corresponding epipolar line is then determined as follows:

D = \frac{|P_2^T F P_1|}{\sqrt{\|X\|^2 + \|Y\|^2}}   (3)

where D represents the distance. The algorithm to check moving consistency and determine the dynamic points is shown in Algorithm 1, where ε is a preset distance threshold.
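For illustration, a minimal OpenCV-based sketch of this check follows. It assumes grayscale frames and pre-extracted keypoints; the border margin, the distance threshold, and the omission of the 3 × 3 patch test are simplifications rather than the authors' exact Algorithm 1.

```python
import cv2
import numpy as np

def moving_consistency_check(prev_gray, curr_gray, prev_pts, dist_thresh=1.0):
    """Flag matched points whose epipolar distance exceeds the threshold.

    prev_pts: Nx2 float32 array of keypoint locations in the previous frame.
    Returns the matched current-frame points and a boolean mask of moving points.
    """
    # Step 1: track the points into the current frame with an optical flow pyramid.
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                   prev_pts.reshape(-1, 1, 2), None)
    curr_pts = curr_pts.reshape(-1, 2)
    ok = status.ravel() == 1
    p1, p2 = prev_pts[ok], curr_pts[ok]

    # Step 2 (simplified): drop pairs too close to the image border; the paper also
    # rejects pairs whose 3x3 patch difference is too large.
    h, w = curr_gray.shape
    margin = 3
    keep = (p2[:, 0] > margin) & (p2[:, 0] < w - margin) & \
           (p2[:, 1] > margin) & (p2[:, 1] < h - margin)
    p1, p2 = p1[keep], p2[keep]

    # Step 3: fundamental matrix from the matches with RANSAC.
    F, _ = cv2.findFundamentalMat(p1, p2, cv2.FM_RANSAC)

    # Steps 4-5: epipolar line l = F * P1, distance D = |P2^T F P1| / sqrt(X^2 + Y^2).
    P1 = np.hstack([p1, np.ones((len(p1), 1))])           # homogeneous coordinates
    P2 = np.hstack([p2, np.ones((len(p2), 1))])
    lines = (F @ P1.T).T                                   # one epipolar line per point
    num = np.abs(np.sum(P2 * lines, axis=1))               # |P2^T F P1|
    den = np.sqrt(lines[:, 0] ** 2 + lines[:, 1] ** 2)
    dist = num / den

    moving = dist > dist_thresh                            # epsilon in Algorithm 1
    return p2, moving
```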
D. Outliers Rejection

Because of the flexible deformation and complex motion of moving objects, such as humans, it is difficult for the moving consistency check method to extract the complete contours of dynamic regions, and the time cost of extracting the whole contours is very high. In DS-SLAM, since a semantic segmentation network has been employed, the full outlines of objects can be obtained easily. Our idea is to combine the semantic information and the moving consistency check results to establish a two-level semantic knowledge base: the object is either moving or not moving. If a certain number of dynamic points produced by the moving consistency check fall within the contour of a segmented object, then this object is determined to be moving. If the segmented object is determined to be moving, then all the feature points located within the object's contour are removed. In this way, outliers can be eliminated precisely. Besides, the impact of wrong segmentation can also be reduced to a certain extent.

Furthermore, the time during which the tracking thread waits for the semantic segmentation result from the other thread can be fully utilized: the moving consistency check is performed during this waiting period. In the experimental results section, Table IV gives the time consumption in DS-SLAM. It can be seen that the time used in the tracking thread is approximately equal to that of the semantic segmentation thread.

Because human activities interfere seriously with robots' localization in most real application scenarios, and because the categories that semantic segmentation networks can identify are limited, we take the human as a typical representative of dynamic objects in the rest of this paper. In principle, DS-SLAM applies to any recognized and segmented dynamic objects.

After the semantic segmentation result comes out, if no people are detected, then all the ORB features are matched with the last frame directly to predict the pose. Otherwise, we determine whether the people are moving by using the moving consistency check results. If the people are determined to be static, the pose is predicted directly; otherwise, all the ORB feature points that fall within the outline of the people are removed before matching. In this way, the impact of dynamic objects can be reduced remarkably.
segmentation thread. Let Zt denote the observation result of a voxel n at time
t , and its log odds score from the beginning to time t is
Because human activities interfere with robots’
localization seriously in most real application scenarios, and
L(n | Z1:t ) ,Then at time t  1 , the log odds score of voxel n
the categories that semantic segmentation networks can can be calculated as follows:
identify are limited, we would take the human as a typical
L(n | z1:t 1 )  L(n | z1:t -1 )  L(n | zt ) (6)
representative of dynamic objects in the rest of this paper. In
theory DS-SLAM applies to any multiple recognized and where L(n | Zt ) is equal to  if the voxel n is observed
segmented dynamic objects. to be occupied at time t , otherwise 0. The increment  is a
pre-defined value. This formula represents that the log odds
This formula means that the log-odds score of a voxel increases when the voxel is repeatedly observed to be occupied, and decreases otherwise. The occupancy probability p of a voxel can be calculated by the inverse logit transform. Only when the occupancy probability p is greater than a pre-defined threshold is the voxel regarded as occupied and visualized in the octo-tree map. In other words, voxels that have been observed to be occupied many times are considered stable occupied voxels. With this method, we can handle the map-building problem in dynamic environments well.
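A minimal sketch of this voxel update is given below, under assumed values for the increment τ, the decrement, and the occupancy threshold; Eq. (6) itself only adds τ for occupied observations, while the decrement here follows the "otherwise decrease" description in the text.

```python
import math
from collections import defaultdict

TAU = 0.85           # assumed log-odds increment for an "occupied" observation
DECREMENT = 0.4      # assumed decrease for a "free" observation (Eq. (6) itself adds 0;
                     # the text states the score otherwise decreases)
OCC_THRESHOLD = 0.7  # assumed occupancy probability required to visualize a voxel

log_odds = defaultdict(float)   # voxel key -> accumulated log-odds score L(n | Z_1:t)

def update_voxel(voxel, observed_occupied):
    """Accumulate the per-observation log-odds term of Eq. (6)."""
    log_odds[voxel] += TAU if observed_occupied else -DECREMENT

def is_stable_occupied(voxel):
    """Invert the log-odds with Eq. (5) and compare against the threshold."""
    p = math.exp(log_odds[voxel]) / (math.exp(log_odds[voxel]) + 1.0)
    return p > OCC_THRESHOLD
```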
IV. EXPERIMENTAL RESULTS

In this section, experimental results are presented to demonstrate the effectiveness of DS-SLAM. We test the time required for each module and evaluate the performance of DS-SLAM in dynamic environments on the public TUM RGB-D dataset. The sequences in the TUM RGB-D dataset containing dynamic objects are challenging because the moving people in the observations can corrupt the robustness and accuracy of a SLAM algorithm. In extreme cases, the moving people occupy more than half of the image. We also integrate DS-SLAM with ROS and qualitatively demonstrate it on a physical robot in typical dynamic environments to evaluate its real-time performance and the dense semantic 3D octo-tree map creation module. All the experiments are performed on a computer with an Intel i7 CPU, a P4000 GPU, and 32 GB memory. The physical robot is a TurtleBot2, and image sequences are captured by a Kinect V2 camera.

A. Evaluation using TUM RGB-D dataset

The TUM RGB-D dataset [9] provides several sequences in dynamic environments, such as walking, sitting and desk, with accurate ground truth obtained from an external motion capture system. The walking sequences are mainly used for our experiments because some people walk back and forth in these sequences. The people in these sequences can be regarded as high-dynamic objects, and they are the most difficult to handle. The sitting sequences are also used, but they are considered low-dynamic sequences as the persons in them just move a little.

DS-SLAM adopts ORB-SLAM2 [1], which is recognized as one of the most outstanding and stable SLAM algorithms at present, as the global SLAM solution, so we make a comparison between the two. The metrics Absolute Trajectory Error (ATE) and Relative Pose Error (RPE) are used for quantitative evaluation. ATE stands for the global consistency of the trajectory, while RPE measures the translational and rotational drift [9].
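As a point of reference for how the ATE RMSE values in the tables below are computed, a simplified sketch is given here; it assumes the estimated and ground-truth trajectories have already been time-associated and aligned, whereas the TUM benchmark tools additionally perform a least-squares alignment step.

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """Root-mean-square of the translational differences between an estimated
    trajectory and the ground truth (both Nx3, already associated and aligned)."""
    errors = np.linalg.norm(est_xyz - gt_xyz, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))
```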
The quantitative comparison results are shown in Table I - Table III, where xyz, static, rpy and half in the first column stand for four types of camera ego-motion [9]; for example, xyz means that the camera moves along the x-y-z axes. We present the values of RMSE, Mean Error, Median Error and Standard Deviation (S.D.), among which RMSE and S.D. are of most concern because they better indicate the robustness and stability of the system. We also show the improvement of DS-SLAM compared with the original ORB-SLAM2. The improvement values in the tables are calculated as follows:

\eta = \frac{o - r}{o} \times 100\%   (7)

where \eta represents the improvement, o represents the value for ORB-SLAM2 and r represents the value for DS-SLAM.
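As a quick worked example of Eq. (7), taking the fr3_walking_static ATE RMSE values from Table III:

```python
def improvement(orb_slam2_value, ds_slam_value):
    """Eq. (7): relative improvement of DS-SLAM over ORB-SLAM2, in percent."""
    return (orb_slam2_value - ds_slam_value) / orb_slam2_value * 100.0

# fr3_walking_static ATE RMSE from Table III: 0.3900 (ORB-SLAM2) vs 0.0081 (DS-SLAM)
print(improvement(0.3900, 0.0081))   # about 97.9, consistent with the 97.91% reported
```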
As we can see from Table I - Table III, DS-SLAM improves the performance in most high-dynamic sequences by an order of magnitude. In terms of ATE, the RMSE and S.D. improvement values can reach up to 97.91% and 97.94% respectively. The results indicate that DS-SLAM can significantly improve the robustness and stability of a SLAM system in high-dynamic environments. However, in low-dynamic sequences, for example the fr3_sitting_static sequence, the improvements are not obvious. We think the reason is that the original ORB-SLAM2 can already handle low-dynamic scenarios easily and achieve good performance, so the room for improvement is limited.

Fig. 3 and Fig. 4 show selected ATE and RPE plots from ORB-SLAM2 and DS-SLAM on the high-dynamic fr3_walking_xyz sequence respectively. As we can see, the errors are significantly reduced in DS-SLAM.

Figure 3. ATE and RPE from ORB-SLAM2

Figure 4. ATE and RPE from DS-SLAM

For practical applications, real-time performance is a crucial indicator for evaluating a SLAM system. We test the time required for the major modules; the results are shown in Table IV. The average time in the main thread to process each frame is 59.4 ms, including semantic segmentation, visual odometry estimation, pose graph optimization and dense 3D semantic octo-tree map creation. Compared with previous non-real-time methods for filtering out dynamic objects, such as [19], DS-SLAM better satisfies real-time requirements.
TABLE I. RESULTS OF METRIC ROTATIONAL DRIFT (RPE)

Sequences | ORB-SLAM2 (RMSE, Mean, Median, S.D.) | DS-SLAM (RMSE, Mean, Median, S.D.) | Improvements (RMSE, Mean, Median, S.D.)
fr3_walking_xyz 7.7432 5.8765 4.5340 4.9895 0.8266 0.5836 0.4192 0.5826 89.32% 90.07% 90.75% 88.32%
fr3_walking_static 3.8958 1.6845 0.3571 3.5095 0.2690 0.2416 0.2259 0.1182 93.09% 85.66% 36.75% 96.63%
fr3_walking_rpy 8.0802 5.4558 2.7828 5.9499 3.0042 1.9187 0.9902 2.3065 62.82% 64.83% 64.42% 61.23%
fr3_walking_half 7.3744 4.5917 1.8143 5.7558 0.8142 0.7033 0.6217 0.4101 88.96% 84.68% 65.73% 92.87%
fr3_sitting_static 0.2881 0.2598 0.2493 0.1244 0.2735 0.2450 0.2351 0.1215 5.07% 5.70% 5.68% 2.36%

TABLE II. RESULTS OF METRIC TRANSLATIONAL DRIFT (RPE)

Sequences | ORB-SLAM2 (RMSE, Mean, Median, S.D.) | DS-SLAM (RMSE, Mean, Median, S.D.) | Improvements (RMSE, Mean, Median, S.D.)
fr3_walking_xyz 0.4124 0.3110 0.2465 0.2684 0.0333 0.0238 0.0181 0.0229 91.93% 92.34% 92.66% 91.48%
fr3_walking_static 0.2162 0.0905 0.0155 0.1962 0.0102 0.0091 0.0082 0.0048 95.27% 90.00% 47.07% 97.58%
fr3_walking_rpy 0.4249 0.2825 0.1487 0.3166 0.1503 0.0942 0.0457 0.1168 64.64% 66.66% 69.24% 63.10%
fr3_walking_half 0.3550 0.2161 0.0774 0.2810 0.0297 0.0256 0.0226 0.0152 91.62% 88.16% 70.74% 94.60%
fr3_sitting_static 0.0095 0.0083 0.0073 0.0046 0.0078 0.0068 0.0061 0.0038 17.61% 17.81% 17.01% 16.96%

TABLE III. RESULTS OF METRIC ABSOLUTE TRAJECTORY ERROR (ATE)

Sequences | ORB-SLAM2 (RMSE, Mean, Median, S.D.) | DS-SLAM (RMSE, Mean, Median, S.D.) | Improvements (RMSE, Mean, Median, S.D.)
fr3_walking_xyz 0.7521 0.6492 0.5857 0.3759 0.0247 0.0186 0.0151 0.0161 96.71% 97.13% 97.42% 95.71%
fr3_walking_static 0.3900 0.3554 0.3087 0.1602 0.0081 0.0073 0.0067 0.0036 97.91% 97.95% 97.82% 97.74%
fr3_walking_rpy 0.8705 0.7425 0.7059 0.4520 0.4442 0.3768 0.2835 0.2350 48.97% 49.26% 59.84% 48.02%
fr3_walking_half 0.4863 0.4272 0.3964 0.2290 0.0303 0.0258 0.0222 0.0159 93.76% 93.95% 94.40% 93.05%
fr3_sitting_static 0.0087 0.0076 0.0066 0.0043 0.0065 0.0055 0.0049 0.0033 25.94% 26.87% 26.29% 23.15%
TABLE IV. TIME EVALUATION

Module: ORB feature extraction | Moving consistency check | Semantic segmentation
Thread: Tracking | Tracking | Semantic segmentation
Time (ms): 9.375046 | 29.50869 | 37.57330

B. Evaluation in Real Environment

To demonstrate the robustness and real-time performance of DS-SLAM, we integrate it with ROS and conduct extensive experiments in a laboratory environment. Images are captured by a Kinect V2 camera at 960 × 540 resolution.

Figure 5. Experimental results in lab environment

Fig. 5 qualitatively demonstrates the results of outlier rejection. The sub-figures from the top row to the bottom row are the ORB feature extraction results, the optical-flow-based moving consistency check results, the semantic segmentation results, and the images after outlier removal, respectively. The green dots represent the locations of ORB feature points, and the short red lines represent the direction of the optical flow. As we can see, the person is determined to be moving by the moving consistency check, and the feature points that fall in the moving person area are then removed effectively. This figure is better viewed in color.

Fig. 6 shows the dense 3D semantic octo-tree map reconstruction result. The red voxels represent a sofa, and the blue voxels represent a monitor. Since the log-odds score method is used to filter out unstable voxels in the map, the reconstruction result is not affected by moving people. Besides, a 2D cost map is produced from the octo-tree map and can be employed for navigation.

Figure 6. Dense 3D semantic octo-tree map
V. CONCLUSIONS

In this paper, a complete real-time robust semantic SLAM system (DS-SLAM) is proposed, which can reduce the influence of dynamic objects on pose estimation and meanwhile provide a semantic presentation of the octo-tree map [8]. Five threads run in parallel in DS-SLAM: tracking, semantic segmentation, local mapping, loop closing and dense map creation. A real-time semantic segmentation network, SegNet, is combined with the moving consistency check to filter out dynamic portions of the scene, like walking people. The matched feature points within the detected dynamic regions are then removed, which improves robustness and accuracy in dynamic scenarios. Also, the dense semantic 3D octo-tree map adopts the log-odds score method to filter out unstable voxels, and it can be employed for navigation and complex robotic tasks. The effectiveness of the system against dynamic objects is tested on challenging dynamic sequences from the TUM RGB-D dataset. The results indicate that DS-SLAM outperforms ORB-SLAM2 significantly regarding accuracy and robustness in high-dynamic environments. The system is also integrated with Robot Operating System (ROS), and its performance is verified by testing DS-SLAM on a physical robot in a real environment.

However, DS-SLAM still has some limitations and ongoing work. For instance, the types of objects that can be recognized by the semantic segmentation network are restricted, which limits its scope of application. The octo-tree map also has to be re-built when a loop closure is detected. In the future, we will improve the real-time performance of DS-SLAM by optimizing the moving consistency check method. Moreover, the dense semantic octo-tree map will be adopted for mobile robots to accomplish high-level tasks.

ACKNOWLEDGMENT

This work was supported in part by the National Natural Science Foundation of China under Grants 91648116 and 51425501.

REFERENCES

[1] R. Mur-Artal and J. D. Tardós, "ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras," IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255-1262, 2017.
[2] J. Engel, T. Schöps, and D. Cremers, "LSD-SLAM: Large-scale direct monocular SLAM," in European Conference on Computer Vision, 2014, pp. 834-849.
[3] C. Cadena et al., "Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age," IEEE Transactions on Robotics, vol. 32, no. 6, pp. 1309-1332, 2016.
[4] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," arXiv preprint arXiv:1511.00561, 2015.
[5] R. F. Salas-Moreno, R. A. Newcombe, H. Strasdat, P. H. Kelly, and A. J. Davison, "SLAM++: Simultaneous localisation and mapping at the level of objects," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 1352-1359.
[6] S. Vasudevan, S. Gächter, V. Nguyen, and R. Siegwart, "Cognitive maps for mobile robots—an object based approach," Robotics and Autonomous Systems, vol. 55, no. 5, pp. 359-371, 2007.
[7] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale, "Keyframe-based visual-inertial odometry using nonlinear optimization," International Journal of Robotics Research, vol. 34, no. 3, pp. 314-334, Mar. 2015.
[8] A. Hornung, K. M. Wurm, M. Bennewitz, C. Stachniss, and W. Burgard, "OctoMap: An efficient probabilistic 3D mapping framework based on octrees," Autonomous Robots, vol. 34, no. 3, pp. 189-206, 2013.
[9] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, "A benchmark for the evaluation of RGB-D SLAM systems," pp. 573-580, 2012.
[10] M. Quigley et al., "ROS: An open-source Robot Operating System," in ICRA Workshop on Open Source Software, Kobe, Japan, 2009, vol. 3, no. 3.2, p. 5.
[11] S. Vasudevan and R. Siegwart, "Bayesian space conceptualization and place classification for semantic maps in mobile robotics," Robotics and Autonomous Systems, vol. 56, no. 6, pp. 522-537, 2008.
[12] S. Sengupta and P. Sturgess, "Semantic octree: Unifying recognition, reconstruction and representation via an octree constrained higher order MRF," in IEEE International Conference on Robotics and Automation (ICRA), 2015, pp. 1874-1879.
[13] N. Sünderhauf, T. T. Pham, Y. Latif, M. Milford, and I. Reid, "Meaningful maps - Object-oriented semantic mapping," arXiv preprint arXiv:1609.07849, 2016.
[14] S. Y. Bao, M. Bagra, C. Yu-Wei, and S. Savarese, "Semantic structure from motion with points, regions, and objects," pp. 2703-2710, 2012.
[15] Y. Fang and B. Dai, "An improved moving target detecting and tracking based on optical flow technique and Kalman filter," in 4th International Conference on Computer Science & Education (ICCSE), 2009, pp. 1197-1202.
[16] Y. Wang and S. Huang, "Towards dense moving object segmentation based robust dense RGB-D SLAM in dynamic scenarios," in 13th International Conference on Control Automation Robotics & Vision (ICARCV), 2014, pp. 1841-1846.
[17] Y. Jia et al., "Caffe: Convolutional architecture for fast feature embedding," pp. 675-678, 2014.
[18] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The PASCAL visual object classes (VOC) challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2010.
[19] Y. Sun, M. Liu, and M. Q.-H. Meng, "Improving RGB-D SLAM in dynamic environments: A motion removal approach," Robotics and Autonomous Systems, vol. 89, pp. 110-122, 2017.
