DROID-SLAM Supplemental
A Additional Results
Table 1: Absolute trajectory error (ATE, m) on EuRoC stereo sequences; dashes denote unavailable results.

Method            MH01   MH02   MH03   MH04   MH05   V101   V102   V103   V201   V202   V203   Avg
D3VO + DSO [6]    -      -      0.08   -      0.09   -      -      0.11   -      0.05   -      -
ORB-SLAM2 [4]     0.035  0.018  0.028  0.119  0.060  0.035  0.020  0.048  0.037  0.035  -      -
VINS-Fusion [5]   0.540  0.460  0.330  0.780  0.500  0.550  0.230  -      0.230  0.200  -      -
SVO [3]           0.040  0.070  0.270  0.170  0.120  0.040  0.040  0.070  0.050  0.090  0.790  0.159
ORB-SLAM3 [2]     0.029  0.019  0.024  0.085  0.052  0.035  0.025  0.061  0.041  0.028  0.521  0.084
Ours              0.015  0.013  0.035  0.048  0.040  0.037  0.011  0.020  0.018  0.015  0.017  0.024
We provide stereo results on the EuRoC dataset [1] in Tab. 1 using our network trained on synthetic, monocular video. In the stereo setting, the trajectory of the camera can be recovered at metric scale. Compared to ORB-SLAM3 [2], we reduce the average ATE by 71%.
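As a sanity check, the averages in Tab. 1 and the reported 71% reduction can be reproduced directly from the two complete per-sequence rows (a short script using the table values above; not part of the evaluation pipeline):

```python
# Per-sequence ATE (m) from Tab. 1 for the two rows without missing entries.
orb_slam3 = [0.029, 0.019, 0.024, 0.085, 0.052, 0.035, 0.025, 0.061, 0.041, 0.028, 0.521]
ours      = [0.015, 0.013, 0.035, 0.048, 0.040, 0.037, 0.011, 0.020, 0.018, 0.015, 0.017]

avg_orb = sum(orb_slam3) / len(orb_slam3)   # 0.084 after rounding
avg_ours = sum(ours) / len(ours)            # 0.024 after rounding
reduction = 1.0 - avg_ours / avg_orb        # relative improvement

print(round(avg_orb, 3), round(avg_ours, 3), round(100 * reduction))  # 0.084 0.024 71
```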
B Ablations
[Two plots of fraction of runs vs. ATE [m]. Left legend: Monocular (local), Monocular (full), Stereo (local), Stereo (full). Right legend: 1, 2, 3, 5, 8 keyframes.]
Figure 2: (Left) We show the performance of the system with different inputs (monocular vs. stereo) and whether global optimization is performed in addition to local BA (local vs. full). (Right) Tracking accuracy as a function of the number of keyframes. We use 5 keyframes (bold) in our experiments.
Ablations We ablate various design choices regarding our SLAM system and network architecture. Ablations are performed on our validation split of the TartanAir dataset. In Fig. 1 we show visualizations on the validation set of keyframe depth estimates alongside optical flow and associated confidence weights.
In Fig. 2 (left) we show how the system benefits from both stereo video and global optimization. Although our network is only trained on monocular video, it can readily leverage stereo frames if available. In Fig. 2 (right) we show how the number of keyframes affects odometry performance.
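For reference, the ATE plotted in these ablations is the RMSE of translational error after rigidly aligning the estimated trajectory to ground truth. A minimal sketch of that metric using standard Kabsch/Umeyama alignment without scale (an illustrative re-implementation, not the exact evaluation script):

```python
import numpy as np

def ate_rmse(gt, est):
    """RMSE of translational error after SE(3) alignment of the
    estimated trajectory to ground truth. gt, est: (N, 3) arrays."""
    mu_g, mu_e = gt.mean(axis=0), est.mean(axis=0)
    # Cross-covariance between centered estimate and ground truth.
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections in the recovered rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    err = est @ R.T + t - gt
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))
```

A trajectory that differs from ground truth only by a rigid transform has zero ATE under this definition, which is why monocular results additionally require a scale alignment step.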
In Fig. 3 we ablate components of the network architecture. Fig. 3 (left) shows the impact of using global context in the GRU through spatial pooling, while Fig. 3 (right) demonstrates the importance of
[Two plots of fraction of runs vs. ATE [m]. Left legend: No Global Pooling, Global Pooling. Right legend: RAFT + BA, Ours.]
Figure 3: (Left) Impact of global context in the update operator. (Right) Impact of using the bundle adjustment layer during training vs. training directly on optical flow, then applying BA at test time.
training with DBA as opposed to training on flow and applying BA at inference. We find that the SLAM system is unstable and prone to failure if the DBA is not used during training.
C Jacobians

Using the local parameterization, we compute the Jacobian of the 3D point transformation

X' = \mathrm{Exp}(\xi_j) \cdot G_j \cdot (\mathrm{Exp}(\xi_i) \cdot G_i)^{-1} \cdot X = \mathrm{Exp}(\xi_j) \cdot G_j \cdot G_i^{-1} \cdot \mathrm{Exp}(-\xi_i) \cdot X    (4)
using the adjoint operator to move the \xi_i term to the front of the expression

X' = \mathrm{Exp}(\xi_j) \cdot \mathrm{Exp}(-\mathrm{Adj}_{G_j G_i^{-1}} \xi_i) \cdot G_j \cdot G_i^{-1} \cdot X    (5)
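The adjoint identity used here, G \cdot \mathrm{Exp}(\xi) = \mathrm{Exp}(\mathrm{Adj}_G\,\xi) \cdot G, can be checked numerically. A small sketch using a truncated power series for the matrix exponential and a (translation, rotation) twist ordering (both conventions are assumptions of this sketch, not taken from the derivation above):

```python
import numpy as np

def hat(w):
    """so(3) hat operator: 3-vector -> skew-symmetric 3x3 matrix."""
    return np.array([[0., -w[2], w[1]],
                     [w[2], 0., -w[0]],
                     [-w[1], w[0], 0.]])

def twist(xi):
    """4x4 se(3) matrix for xi = (tau, phi), translation first."""
    T = np.zeros((4, 4))
    T[:3, :3] = hat(xi[3:])
    T[:3, 3] = xi[:3]
    return T

def mexp(A, terms=30):
    """Matrix exponential by truncated power series (fine for small A)."""
    out, P = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        P = P @ A / k
        out = out + P
    return out

def adjoint(G):
    """6x6 adjoint of G in SE(3) for (tau, phi) ordering."""
    R, t = G[:3, :3], G[:3, 3]
    A = np.zeros((6, 6))
    A[:3, :3] = R
    A[3:, 3:] = R
    A[:3, 3:] = hat(t) @ R
    return A

# Check G * Exp(xi) == Exp(Adj_G xi) * G on random elements.
rng = np.random.default_rng(1)
G = mexp(twist(rng.standard_normal(6) * 0.5))
xi = rng.standard_normal(6) * 0.5
lhs = G @ mexp(twist(xi))
rhs = mexp(twist(adjoint(G) @ xi)) @ G
print(np.allclose(lhs, rhs, atol=1e-8))  # True
```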
Evaluating the derivatives at \xi_i = \xi_j = 0 with X' = (x', y', z', 1)^T gives

\frac{\partial X'}{\partial \xi_j} = \begin{pmatrix} 1 & 0 & 0 & 0 & z' & -y' \\ 0 & 1 & 0 & -z' & 0 & x' \\ 0 & 0 & 1 & y' & -x' & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}    (6)

\frac{\partial X'}{\partial \xi_i} = -\frac{\partial X'}{\partial \xi_j} \, \mathrm{Adj}_{G_j G_i^{-1}}    (7)
Using the chain rule, we can compute the full Jacobians with respect to the variables

\frac{\partial p'}{\partial \xi_j} = \frac{\partial \Pi_c(X')}{\partial X'} \frac{\partial X'}{\partial \xi_j}, \qquad \frac{\partial p'}{\partial \xi_i} = \frac{\partial \Pi_c(X')}{\partial X'} \frac{\partial X'}{\partial \xi_i}    (8)
and with respect to the inverse depth

\frac{\partial p'}{\partial d} = \frac{\partial \Pi_c(X')}{\partial X'} \frac{\partial X'}{\partial X} \frac{\partial \Pi_c^{-1}(p, d)}{\partial d} = \frac{\partial \Pi_c(X')}{\partial X'} \begin{pmatrix} t_x \\ t_y \\ t_z \\ 1 \end{pmatrix}    (9)

where (t_x, t_y, t_z) is the translation vector of G_j \circ G_i^{-1}.
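The last factor in Eqn. 9 can be sanity-checked by finite differences, assuming the inverse projection parameterizes a pixel with inverse depth d as the homogeneous point \Pi_c^{-1}(p, d) = (\bar{x}, \bar{y}, 1, d)^T in normalized image coordinates (this parameterization is an assumption of the sketch):

```python
import numpy as np

# A 4x4 rigid transform standing in for G_j * G_i^{-1}; its translation
# is (tx, ty, tz). Identity rotation keeps the example short -- the last
# column, and hence the derivative, is (tx, ty, tz, 1) for any rotation.
t = np.array([0.3, -0.1, 0.8])
A = np.eye(4)
A[:3, 3] = t

def X_of_d(d, xb=0.2, yb=-0.4):
    """Transformed point X' = A * (x_bar, y_bar, 1, d)^T."""
    return A @ np.array([xb, yb, 1.0, d])

# Central finite difference of X' with respect to inverse depth d.
eps = 1e-6
num = (X_of_d(0.5 + eps) - X_of_d(0.5 - eps)) / (2 * eps)
print(np.allclose(num, np.array([t[0], t[1], t[2], 1.0])))  # True
```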
D Network Architecture
[Diagram: Conv7x7 (64), six ResBlocks (64–256), Conv3x3 (D).]
Figure 4: Architecture of the feature and context encoders. Both extract features at 1/8 the input image resolution using a set of 6 basic residual blocks followed by a final Conv3x3 (D). Instance normalization is used in the feature encoder; no normalization is used in the context encoder. The feature encoder outputs features with dimension D=128, while the context encoder outputs features with dimension D=256.
[Diagram: context, correlation, and flow feature branches (Conv7x7/Conv3x3 stacks, sigmoid on the confidence head) feeding a 3x3 ConvGRU (128).]
Figure 5: Architecture of the update operator. During each iteration, context, correlation, and flow features are injected into the GRU. The revision (r) and confidence weights (w) are predicted from the updated hidden state.
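The gating inside the 3x3 ConvGRU follows the standard convolutional GRU update (as in RAFT, on which this operator is based; the exact form here is an assumption). Writing $h_{t-1}$ for the hidden state and $x_t$ for the concatenated context, correlation, and flow features:

```latex
\begin{aligned}
z_t &= \sigma\!\left(\mathrm{Conv}_{3\times3}([h_{t-1},\, x_t];\, W_z)\right) \\
r_t &= \sigma\!\left(\mathrm{Conv}_{3\times3}([h_{t-1},\, x_t];\, W_r)\right) \\
\tilde{h}_t &= \tanh\!\left(\mathrm{Conv}_{3\times3}([r_t \odot h_{t-1},\, x_t];\, W_h)\right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

The revision $r$ and confidence $w$ are then decoded from $h_t$ by small convolutional heads.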
References
[1] M. Burri, J. Nikolic, P. Gohl, T. Schneider, J. Rehder, S. Omari, M. W. Achtelik, and R. Siegwart. The EuRoC micro aerial vehicle datasets. The International Journal of Robotics Research, 35(10):1157–1163, 2016.
[2] C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. Montiel, and J. D. Tardós. ORB-SLAM3: An accurate open-source library for visual, visual-inertial and multi-map SLAM. arXiv preprint arXiv:2007.11898, 2020.
[3] C. Forster, Z. Zhang, M. Gassner, M. Werlberger, and D. Scaramuzza. SVO: Semidirect visual odometry for monocular and multicamera systems. IEEE Transactions on Robotics, 33(2):249–265, 2016.
[4] R. Mur-Artal and J. D. Tardós. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics, 33(5):1255–1262, 2017.
[5] T. Qin and S. Shen. Online temporal calibration for monocular visual-inertial systems. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3662–3669. IEEE, 2018.
[6] N. Yang, L. v. Stumberg, R. Wang, and D. Cremers. D3VO: Deep depth, deep pose and deep uncertainty for monocular visual odometry. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1281–1292, 2020.