Contour Based Reconstruction of Underwater Structures Using Sonar, Visual, Inertial, and Depth Sensor
Fig. 2. Block diagram of the proposed system; in yellow, the sensor input with frequencies from the custom-made sensor suite (stereo camera at 15 fps, IMU at 100 Hz, depth at 1 Hz, Sonar at 100 Hz); in green, the components from OKVIS; in red and blue, the contributions from our previous works [8] and [9]; and in orange, the new contributions in this paper.

The sensor suite can be deployed by divers as well as mounted on a single or dual Diver Propulsion Vehicle (DPV) [19]. The hardware was designed with cave mapping as the target application. As such, the sonar scanning plane is parallel to the image plane; the sonar provides data at a maximum range of 6 m, scanning a full 360° in a plane with an angular resolution of 0.9°.
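For concreteness, a single return of such a mechanical scanning sonar, a range and a head angle in the scanning plane, can be mapped to a 3D point in the sonar frame S before being transformed into the world frame. The sketch below is a minimal illustration under the assumption that all beams lie in the plane z = 0 of S; the function name and conventions are ours, not taken from the implementation.

```python
import numpy as np

def sonar_return_to_point(r: float, theta: float) -> np.ndarray:
    """Map one sonar return (range r in meters, head angle theta in radians)
    to a 3D point in the sonar frame S. The beam is assumed to lie in the
    sonar's scanning plane (z = 0 in S) -- an illustrative convention."""
    return np.array([r * np.cos(theta), r * np.sin(theta), 0.0])

# Example: the head advances in 0.9-degree steps over a full 360-degree scan.
points_S = [sonar_return_to_point(4.2, np.deg2rad(0.9 * i)) for i in range(400)]
```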
B. Notations and States

The reference frames associated to each sensor and the world are denoted as C for Camera, I for IMU, S for Sonar, D for Depth, and W for World. Let us denote by ${}_X T_Y = [{}_X R_Y \,|\, {}_X \mathbf{p}_Y]$ the homogeneous transformation matrix between two arbitrary coordinate frames X and Y, where ${}_X R_Y$ represents the rotation matrix with corresponding quaternion ${}_X \mathbf{q}_Y$, and ${}_X \mathbf{p}_Y$ denotes the position vector.

The state of the robot R is denoted as $\mathbf{x}_R$:

$\mathbf{x}_R = [{}_W\mathbf{p}_I^T, {}_W\mathbf{q}_I^T, {}_W\mathbf{v}_I^T, \mathbf{b}_g^T, \mathbf{b}_a^T]^T$  (1)

It contains the position ${}_W\mathbf{p}_I$, the quaternion ${}_W\mathbf{q}_I$, and the linear velocity ${}_W\mathbf{v}_I$, all of them for the IMU reference frame I with respect to the world reference frame W. In addition, the gyroscope and accelerometer biases $\mathbf{b}_g$ and $\mathbf{b}_a$ are also estimated and stored in the state vector.

The corresponding error-state vector is defined in minimal coordinates, while the perturbation for the optimization problem defined next takes place in the tangent space:

$\delta\boldsymbol{\chi}_R = [\delta\mathbf{p}^T, \delta\mathbf{q}^T, \delta\mathbf{v}^T, \delta\mathbf{b}_g^T, \delta\mathbf{b}_a^T]^T$  (2)
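To make the relation between Eq. (1) and Eq. (2) concrete, the sketch below applies a minimal-coordinates perturbation to the state: position, velocity, and biases are perturbed additively, while the quaternion is updated on the manifold through the exponential map of the 3-vector $\delta\mathbf{q}$. This is a generic illustration of a tangent-space update (a right-perturbation convention is assumed here), not the exact parameterization used by OKVIS.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def apply_error_state(p, q_xyzw, v, b_g, b_a, delta):
    """Apply the 15-dim minimal-coordinates perturbation of Eq. (2) to the
    state of Eq. (1). Position, velocity, and biases update additively; the
    orientation updates on the manifold: q <- q * exp(delta_theta)."""
    d_p, d_theta, d_v, d_bg, d_ba = np.split(np.asarray(delta, float), 5)
    q_new = (Rotation.from_quat(q_xyzw) * Rotation.from_rotvec(d_theta)).as_quat()
    return p + d_p, q_new, v + d_v, b_g + d_bg, b_a + d_ba
```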
C. Tightly-coupled Non-Linear Optimization Problem

The cost function J(x) for the tightly-coupled non-linear optimization includes the reprojection error $e_r$, the IMU error $e_s$, the sonar error $e_t$, and the depth error $e_u$:

$J(\mathbf{x}) = \sum_{i=1}^{2}\sum_{k=1}^{K}\sum_{j \in \mathcal{J}(i,k)} {e_r^{i,j,k}}^T \mathbf{P}_r^k \, e_r^{i,j,k} + \sum_{k=1}^{K-1} {e_s^k}^T \mathbf{P}_s^k \, e_s^k + \sum_{k=1}^{K-1} {e_t^k}^T \mathbf{P}_t^k \, e_t^k + \sum_{k=1}^{K-1} {e_u^k}^T \mathbf{P}_u^k \, e_u^k$  (3)

with i denoting the camera index – i = 1 for the left and i = 2 for the right camera of a stereo pair – and j the index of a landmark observed in the k-th camera frame. $\mathbf{P}_r^k$, $\mathbf{P}_s^k$, $\mathbf{P}_t^k$, and $\mathbf{P}_u^k$ denote the information matrices of the visual landmark, IMU, sonar range, and depth measurements for the k-th frame, respectively.

The reprojection error $e_r$ describes the difference between a keypoint measurement in camera coordinate frame C and the corresponding landmark projection according to the stereo projection model. The IMU error term $e_s$ combines all accelerometer and gyroscope measurements by IMU pre-integration [34] between successive camera measurements, and represents the pose, speed, and bias error between the prediction based on the previous state and the current state. Both the reprojection error and the IMU error term follow the formulation by Leutenegger et al. [26].

The sonar range error $e_t$, introduced in our previous work [8], represents the difference between the 3D point that can be derived from the range measurement and a corresponding visual feature in 3D.

The depth error term $e_u$ can be calculated as the difference between the rig position along the z direction and the water depth measurement provided by a pressure sensor. Depth values are extracted along the gravity direction, which is aligned with the z axis of the world frame W – observable due to the tightly-coupled IMU integration. This term corrects the position of the robot along the z axis.

The Ceres Solver nonlinear optimization framework [35] optimizes J(x) to estimate the state of the robot in Eq. (1).
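In the actual system the minimization of Eq. (3) is delegated to Ceres; purely to show the structure of the cost, the sketch below evaluates J(x) given lists of (error, information matrix) pairs for the four term types. The function name and the concrete depth-error example, including its sign convention and noise level, are illustrative assumptions.

```python
import numpy as np

def evaluate_cost(reproj_terms, imu_terms, sonar_terms, depth_terms):
    """Evaluate Eq. (3): the sum of squared Mahalanobis norms e^T P e over
    all reprojection, IMU, sonar range, and depth error terms. Each argument
    is a list of (e, P) pairs: e an error vector, P its information matrix."""
    return sum(float(e.T @ P @ e)
               for terms in (reproj_terms, imu_terms, sonar_terms, depth_terms)
               for e, P in terms)

# Illustrative depth term: e_u is the rig z-position minus the measured depth;
# its information is the inverse variance of the pressure-sensor reading.
e_u = np.array([-2.07 - (-2.00)])      # 7 cm discrepancy along gravity
P_u = np.array([[1.0 / 0.01 ** 2]])    # assuming a 1 cm standard deviation
J = evaluate_cost([], [], [], [(e_u, P_u)])
```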
D. Feature Selection and 3D Reconstruction from Stereo Contour Matching

To ensure that the VIO system and the 3D reconstruction can run in real time in parallel, we replaced the OKVIS feature detection method with the one described in [36], which provides a short list of the most prominent features based on the corner response function in the images. This reduces the computation in the frontend tracking and, as shown in the results, retains the same accuracy with lower computational requirements.
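Reference [36] is the Shi–Tomasi "good features to track" corner response, which OpenCV exposes directly; the sketch below shows how such a short list of prominent corners can be obtained. The parameter values are illustrative assumptions, not the ones used in our system.

```python
import cv2

def detect_prominent_features(gray, max_features=100):
    """Shi-Tomasi corner detection [36]: keep a short list of the points
    with the strongest corner response. Parameter values are illustrative."""
    return cv2.goodFeaturesToTrack(
        gray,
        maxCorners=max_features,  # cap the list at the most prominent corners
        qualityLevel=0.01,        # min response, as a fraction of the best
        minDistance=10)           # pixels; enforces spatial spread
```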
Fig. 3. Image in a cave and the detected contours.

A real-time stereo contour matching algorithm is utilized, followed by an outlier rejection mechanism, to produce the point cloud on the contour created by the moving light; see Fig. 5(c) for an example of all the edge features detected. The approach of Weidner et al. [6] has been adapted for the contours from the intersection of the cone of light with the cave wall; see Fig. 3 for the extracted contours from an underwater cave. In particular, adaptively thresholding the images based on the light and dark areas ensures that the illuminated areas are clearly defined. In our current work, we also found that sampling from pixels with rich gradients, e.g., edges, provides better and denser point-cloud reconstructions. As such, both types of edges – the ones marking the boundaries between the light and dark areas and the ones from visible cave walls – are used to reconstruct the 3-D map of the cave.

The overview of the augmenting Stereo Contour Matching method in our tightly-coupled Sonar-Visual-Inertial-Depth optimization framework is as follows. For every frame in the local optimization window, a noisy edge map is created from the edges described above. This is followed by a filtering process that discards short contours by calculating their corresponding bounding boxes and keeping only the largest third percentile. This method retains the well-defined continuous contours of the surroundings while eliminating spurious false edges, thus allowing the pixels on them to be used as good features for the reconstruction. In a stereo frame, for every image point on the contour of the left image, a BRISK feature descriptor is calculated and matched against the right image by searching along the epipolar line. Then a sub-pixel accurate localization of the matching disparity is performed. Another layer of filtering is done based on the grouping of the edge detector, i.e., keeping only the consecutive points belonging to the same contour in a stereo pair. These stereo contour matched features, along with their depth estimates, are projected into 3-D and then projected back to check reprojection error consistency, resulting in a point cloud with very low reprojection error.
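A compressed sketch of these per-frame steps is given below for a rectified stereo pair, where the epipolar line of a left-image pixel is the corresponding right-image row. The thresholds, the percentile cutoff, and the use of OpenCV's BRISK and brute-force Hamming matcher are illustrative assumptions; the sub-pixel disparity refinement and the contour-grouping filter described above are omitted here for brevity.

```python
import cv2
import numpy as np

def stereo_contour_matches(left, right, row_tol=1.0):
    """Sketch of the stereo contour matching pipeline (grayscale inputs):
    adaptive thresholding -> contour extraction -> bounding-box filtering
    -> BRISK description of left contour points -> epipolar-row matching."""
    # 1. Edge/contour map separating illuminated and dark areas.
    binary = cv2.adaptiveThreshold(left, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 21, 2)
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return []
    # 2. Discard short contours: keep the largest third by bounding-box area.
    areas = np.array([w * h for (_, _, w, h) in map(cv2.boundingRect, contours)])
    cutoff = np.percentile(areas, 100 * 2 / 3)
    keep = [c for c, a in zip(contours, areas) if a >= cutoff]
    # 3. BRISK descriptors on the left-contour pixels.
    brisk = cv2.BRISK_create()
    kp_left = [cv2.KeyPoint(float(x), float(y), 7)
               for c in keep for [[x, y]] in c]
    kp_left, des_left = brisk.compute(left, kp_left)
    kp_right, des_right = brisk.detectAndCompute(right, None)
    if des_left is None or des_right is None:
        return []
    # 4. Hamming matching, then enforce the epipolar constraint (same row).
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_left,
                                                                     des_right)
    return [m for m in matches
            if abs(kp_left[m.queryIdx].pt[1] - kp_right[m.trainIdx].pt[1]) < row_tol]
```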
The reason for choosing stereo matched contour features, rather than tracking them with a semi-direct method or a contour tracking method [37], is to avoid spurious edge detections due to lighting variation in consecutive images, which could lead to erroneous estimation or even tracking failure. The performance of SVO [33], an open-source state-of-the-art semi-direct method, on underwater datasets [38], [29] validates this choice. In addition, though indirect feature extractors and descriptors are invariant to photometric variations to some extent, using a large number of features for tracking and then for reconstruction is unrealistic due to the computational complexity of maintaining them.
E. Local Bundle Adjustment (BA) for Contour Features

In the current optimization window, a local BA is performed for all newly detected stereo contour matched features and the keyframes they are observed in, to achieve an optimal reconstruction. A joint non-linear optimization is performed to refine the k-th keyframe pose ${}_W T_{C_i}^k$ and the homogeneous landmark j in world coordinates W, ${}_W \mathbf{l}^j = [l_x^j, l_y^j, l_z^j, l_w^j]$, minimizing the cost function:

$J(\mathbf{x}) = \sum_{j,k} \rho\left({e_{j,k}}^T \mathbf{P}_{j,k} \, e_{j,k}\right)$  (4)

Hereby $\mathbf{P}_{j,k}$ denotes the information matrix of the associated landmark measurement, and $\rho$ is the Huber loss function used to down-weigh outliers. The reprojection error $e_{j,k}$ for landmark j with matched keypoint measurement $\mathbf{z}_{j,k}$ in image coordinates of the respective camera i is defined as:

$e_{j,k} = \mathbf{z}_{j,k} - h_i({}_W T_{C_i}^k, {}_W \mathbf{l}^j)$  (5)

with camera projection model $h_i$. We used Levenberg-Marquardt to solve the local BA problem.
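To make the structure of Eqs. (4)–(5) concrete, the sketch below assembles the robustified reprojection residuals for a set of landmark observations and hands them to an off-the-shelf solver. The simple pinhole stand-in for $h_i$, the fixed keyframe poses (only landmarks are refined here), and the use of SciPy are illustrative assumptions; the actual implementation follows the OKVIS formulation, and SciPy pairs robust losses with a trust-region method rather than Levenberg-Marquardt.

```python
import numpy as np
from scipy.optimize import least_squares

def project(K, T_WC, l_W):
    """Simplified pinhole stand-in for h_i in Eq. (5): map a homogeneous
    world landmark l_W into pixel coordinates of a camera with pose T_WC
    (world-from-camera, 4x4)."""
    l_C = np.linalg.inv(T_WC) @ l_W        # landmark in the camera frame
    uvw = K @ (l_C[:3] / l_C[3])
    return uvw[:2] / uvw[2]

def ba_residuals(landmarks_flat, K, poses, observations):
    """Stack e_{j,k} = z_{j,k} - h(T_k, l_j) over all observations (j, k, z);
    poses are held fixed in this sketch, so only the landmarks are refined."""
    L = landmarks_flat.reshape(-1, 4)
    return np.concatenate([z - project(K, poses[k], L[j])
                           for j, k, z in observations])

# loss='huber' applies the rho of Eq. (4); SciPy's robust losses require its
# trust-region solver, whereas the paper uses Levenberg-Marquardt.
# result = least_squares(ba_residuals, initial_landmarks.ravel(),
#                        loss='huber', args=(K, poses, observations))
```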
IV. EXPERIMENTAL RESULTS

The experimental data were collected using a custom-made sensor suite [19] consisting of a stereo camera, an IMU, a depth sensor, and a mechanical scanning Sonar, as described in Section III-A. More specifically, two USB-3 uEye cameras in a stereo configuration provide data at 15 Hz; an IMAGENEX 831L mechanical scanning Sonar sensor acquires a full 360° scan every four seconds; a Bluerobotics Bar30 pressure sensor provides depth data at 1 Hz; a MicroStrain 3DM-GX4-15 IMU generates inertial data at 100 Hz; and an Intel NUC running Linux and ROS consolidates all the data. A video light is attached to the unit to provide artificial illumination of the scene. The Sonar is mounted on top of the main unit, which contains the remaining electronics. In Fig. 1 the unit can be seen deployed mounted on a dual Diver Propulsion Vehicle (DPV); please note that the system is neutrally buoyant and stable. The experiments were run on a computer with an Intel i7-7700 CPU @ 3.60 GHz and 32 GB RAM, running Ubuntu 16.04 and ROS Kinetic, and on an Intel NUC with the same configuration.

The data is from the ballroom at Ginnie Springs, FL, a cavern open to divers with no cave-diving training; it provides a safe locale to collect data in an underwater cave environment. After entering the cavern at a depth of seven meters, the sensor was taken down to fifteen meters, and then a closed-loop trajectory was traversed three times.

In the following, we present, first, preliminary experiments with DSO [23] showing the problem with photometric consistency. Second, as there is no ground truth available underwater, such as from a motion capture system, we qualitatively validate our approach using the information collected by the divers during the data collection procedure.

A. Comparison with DSO

DSO is one of the best performing state-of-the-art direct VO methods; it uses a sparse set of high intensity gradient pixels. Joshi et al. [29] show a few cases where DSO generates very good 3-D reconstructions in challenging underwater environments. Fig. 4 shows the result of DSO on the underwater cave dataset in two different runs, Fig. 4(a) and Fig. 4(b). DSO did not track for the full length of the cave; instead, it was able to keep track only for a small segment, due to the variation of the light, which violates the photometric consistency assumption of a direct method. Also, the initialization is critical, as it requires mainly translational movement and very small rotational change, due to the fact that DSO is a pure monocular visual SLAM system. We ran DSO from different starting points of the dataset to obtain a better initialization; the best run, shown in Fig. 4(b), eventually failed too due to the poor lighting conditions.

B. Odometry and 3D Cave-Wall Reconstruction

The length of the trajectory produced by our method is 87 meters, consistent with the measurements from the divers. Fig. 5 shows the whole trajectory with the different point clouds generated by the features used for tracking, the Sonar data, and the stereo contour matching.
Fig. 4. Partial trajectories generated by DSO: (a) incorrect odometry, with tracking failing after just a few seconds; (b) a longer trajectory obtained after starting at a place with better illumination, which also fails later on.
Keeping a small set of features only for tracking helps to run the proposed approach in real time, without any dropped sensor data, on the tested computers. As shown in the figure, the Sonar provides a set of sparse but robust points using range and head position information. Finally, the stereo contour matching generates a denser point cloud to represent the cave environment.

Fig. 6 highlights some specific sections of the cavern, with the image and the corresponding reconstruction – in gray, the points from the contours; in red, the points from the Sonar. As can be observed, our proposed method enhances the reconstruction with a dense point cloud; for example, rocks and valleys are clearly visible in Fig. 6.

V. DISCUSSION

The proposed system improves the point cloud reconstruction and is able to perform in real time even with the additional processing requirements. One of the lessons learned during the experimental activities is that the light placement also affects the quality of the reconstruction. In the next version of the sensor suite, we plan to mount the dive light in a fixed position so that the cone of light can be predicted according to the characteristics of the dive light. Furthermore, setting the maximum distance of the Sonar according to the target environment improves the range measurements.

While this work presents the first initiative towards real-time semi-dense reconstruction of challenging environments with lighting variations, there are several avenues for improvement. One future work of interest is to combine a direct method and an indirect method, similar to [33], but instead of relying on the direct method for tracking, we would rely on the robust Sonar-Visual-Inertial-Depth estimate. Thus we would achieve a denser 3-D reconstruction by jointly minimizing the reprojection and photometric error, followed by a robust tracking method. We also plan to acquire ground truth trajectories [39] by placing AprilTags along each trajectory for quantitative analysis. By deploying the sensor suite on a dual DPV, more accurate results are expected due to the greater stability – see Fig. 1 for preliminary tests.

REFERENCES

[1] S. Exley, Basic Cave Diving: A Blueprint for Survival. National Speleological Society Cave Diving Section, 1977.
[2] "Climate Change and Sea-Level Rise in Florida: An Update of 'The Effects of Climate Change on Florida's Ocean and Coastal Resources'," Florida Ocean and Coastal Council, Tallahassee, FL, Tech. Rep., 2010.
[3] Z. Xu, S. W. Bassett, B. Hu, and S. B. Dyer, "Long distance seawater intrusion through a karst conduit network in the Woodville Karst Plain, Florida," Scientific Reports, vol. 6, pp. 1–10, Aug 2016.
[4] A. Abbott, "Mexican skeleton gives clue to American ancestry," Nature News, May 2014.
[5] N. Kresic and A. Mikszewski, Hydrogeological Conceptual Site Models: Data Analysis and Visualization. CRC Press, 2013.
[6] N. Weidner, S. Rahman, A. Quattrini Li, and I. Rekleitis, "Underwater Cave Mapping using Stereo Vision," in Proc. ICRA, 2017, pp. 5709–5715.
[7] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale, "Keyframe-based visual-inertial odometry using nonlinear optimization," Int. J. Robot. Res., vol. 34, no. 3, pp. 314–334, 2015.
[8] S. Rahman, A. Quattrini Li, and I. Rekleitis, "Sonar Visual Inertial SLAM of underwater structures," in Proc. ICRA, 2018, pp. 5190–5196.
[9] ——, "SVIn2: An Underwater SLAM System using Sonar, Visual, Inertial, and Depth Sensor," in Proc. IROS, 2019.
[10] J. Engel, T. Schöps, and D. Cremers, "LSD-SLAM: Large-Scale Direct Monocular SLAM," in Proc. ECCV. Springer Int. Pub., 2014, vol. 8690, pp. 834–849.
[11] G. Dudek et al., "A visually guided swimming robot," in Proc. IROS, 2005, pp. 1749–1754.
[12] D. Meger, J. C. G. Higuera, A. Xu, P. Giguere, and G. Dudek, "Learning legged swimming gaits from experience," in Proc. ICRA, 2015, pp. 2332–2338.
[13] M. Gary, N. Fairfield, W. C. Stone, D. Wettergreen, G. Kantor, and J. M. Sharp Jr, "3D mapping and characterization of sistema Zacatón from DEPTHX (DEep Phreatic THermal eXplorer)," in Proc. of KARST: Sinkhole Conference ASCE, 2008.
[14] W. C. Stone, "Design and Deployment of a 3-D Autonomous Subterranean Submarine Exploration Vehicle," in Int. Symp. on Unmanned Untethered Submersible Technologies (UUST), no. 512, 2007.
[15] Stone Aerospace, "Digital Wall Mapper," http://stoneaerospace.com/digital-wall-mapper/, Apr. 2015.
[16] A. Mallios et al., "Toward autonomous exploration in confined underwater environments," J. Field Robot., vol. 33, pp. 994–1012, 2016.
[17] S.-F. Chen and J.-Z. Yu, "Underwater cave search and entry using a robotic fish with embedded vision," in Chinese Control Conference (CCC), 2014, pp. 8335–8340.
[18] K. Richmond, C. Flesher, L. Lindzey, N. Tanner, and W. C. Stone, "SUNFISH®: A human-portable exploration AUV for complex 3D environments," in MTS/IEEE OCEANS Charleston, 2018, pp. 1–9.
[19] S. Rahman, A. Quattrini Li, and I. Rekleitis, "A modular sensor suite for underwater reconstruction," in MTS/IEEE OCEANS Charleston, 2018, pp. 1–6.
[20] P. Corke, C. Detweiler, M. Dunbabin, M. Hamilton, D. Rus, and I. Vasilescu, "Experiments with underwater robot localization and tracking," in Proc. ICRA, 2007, pp. 4556–4561.
[21] G. Klein and D. Murray, "Parallel tracking and mapping for small AR workspaces," in IEEE and ACM Int. Symp. on Mixed and Augmented Reality, 2007, pp. 225–234.
[22] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, "ORB-SLAM: A Versatile and Accurate Monocular SLAM System," IEEE Trans. Robot., vol. 31, no. 5, pp. 1147–1163, 2015.
[23] J. Engel, V. Koltun, and D. Cremers, "Direct sparse odometry," IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 3, pp. 611–625, 2018.
[24] J. L. Schonberger and J.-M. Frahm, "Structure-from-motion revisited," in Proc. CVPR, 2016, pp. 4104–4113.
[25] A. I. Mourikis and S. I. Roumeliotis, "A multi-state constraint Kalman filter for vision-aided inertial navigation," in Proc. ICRA, 2007, pp. 3565–3572.
[26] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale, "Keyframe-based visual-inertial odometry using nonlinear optimization," Int. J. Robot. Res., vol. 34, no. 3, pp. 314–334, 2015.
[27] R. Mur-Artal and J. D. Tardós, "Visual-inertial monocular SLAM with map reuse," IEEE Robot. Autom. Lett., vol. 2, no. 2, pp. 796–803, 2017.
[28] T. Qin, P. Li, and S. Shen, "VINS-Mono: A robust and versatile monocular visual-inertial state estimator," IEEE Trans. Robot., vol. 34, no. 4, pp. 1004–1020, 2018.
[29] B. Joshi, S. Rahman, M. Kalaitzakis, B. Cain, J. Johnson, M. Xanthidis, N. Karapetyan, A. Hernandez, A. Quattrini Li, N. Vitzilaios, and I. Rekleitis, "Experimental Comparison of Open Source Visual-Inertial-Based State Estimation Algorithms in the Underwater Domain," in Proc. IROS, 2019.
[30] N. Snavely, S. M. Seitz, and R. Szeliski, "Photo tourism: exploring photo collections in 3D," in ACM Transactions on Graphics (TOG), vol. 25, no. 3, 2006, pp. 835–846.
[31] C. Wu, "Towards linear-time incremental structure from motion," in IEEE Int. Conf. on 3D Vision (3DV), 2013, pp. 127–134.
[32] P. Merrell, A. Akbarzadeh, L. Wang, P. Mordohai, J.-M. Frahm, R. Yang, D. Nistér, and M. Pollefeys, "Real-time visibility-based fusion of depth maps," in Proc. ICCV, 2007, pp. 1–8.
[33] C. Forster, Z. Zhang, M. Gassner, M. Werlberger, and D. Scaramuzza, "SVO: Semidirect Visual Odometry for Monocular and Multicamera Systems," IEEE Trans. Robot., vol. 33, no. 2, 2017.
[34] C. Forster et al., "On-Manifold Preintegration for Real-Time Visual-Inertial Odometry," IEEE Trans. Robot., vol. 33, no. 1, pp. 1–21, 2017.
[35] S. Agarwal, K. Mierle, and others, "Ceres Solver," http://ceres-solver.org, 2015.
[36] J. Shi et al., "Good features to track," in Proc. CVPR, 1994, pp. 593–600.
[37] J. J. Tarrio and S. Pedre, "Realtime edge based visual inertial odometry for MAV teleoperation in indoor environments," J. Intell. Robot. Syst., pp. 235–252, 2017.
[38] A. Quattrini Li, A. Coskun, S. M. Doherty, S. Ghasemlou, A. S. Jagtap, M. Modasshir, S. Rahman, A. Singh, M. Xanthidis, J. M. O'Kane, and I. Rekleitis, "Experimental comparison of open source vision based state estimation algorithms," in Proc. ISER, 2016.
[39] E. Westman and M. Kaess, "Underwater AprilTag SLAM and calibration for high precision robot localization," Carnegie Mellon University, Tech. Rep. CMU-RI-TR-18-43, 2018.