NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
Jiaming Sun1,2∗  Yiming Xie1∗  Linghao Chen1  Xiaowei Zhou1  Hujun Bao1†
1 Zhejiang University   2 SenseTime Research
arXiv:2104.00681v1 [cs.CV] 1 Apr 2021
Abstract
In this paper, we propose a novel framework for real-time monocular reconstruction named NeuralRecon that jointly reconstructs and fuses the 3D geometry directly in the volumetric TSDF representation. Given a sequence of monocular images and their corresponding camera poses estimated by a SLAM system, NeuralRecon incrementally reconstructs local geometry in a view-independent 3D volume instead of view-dependent depth maps. Specifically, it unprojects the image features to form a 3D feature volume and then uses sparse convolutions to process the feature volume and output a sparse TSDF volume. With a coarse-to-fine design, the predicted TSDF is gradually refined at each level. By directly reconstructing the implicit surface (TSDF), the network is able to learn the local smoothness and global shape priors of natural 3D surfaces. Different from depth-based methods that predict depth maps for each key frame separately, the surface geometry within a local fragment window is jointly predicted in NeuralRecon, and thus locally coherent geometry estimates can be produced. To make the current-fragment reconstruction globally consistent with the previously reconstructed fragments, a learning-based TSDF fusion module using a Gated Recurrent Unit (GRU) is proposed. The GRU fusion makes the current-fragment reconstruction conditioned on the previously reconstructed global volume, yielding a joint reconstruction and fusion approach. As a result, the reconstructed mesh is dense, accurate and globally coherent in scale. Furthermore, predicting the volumetric representation also removes the redundant computation of depth-based methods, which allows us to use a larger 3D CNN while maintaining real-time performance.

We validate our system on the ScanNet and 7-Scenes datasets. The experimental results show that NeuralRecon outperforms multiple state-of-the-art multi-view depth estimation methods and the volume-based reconstruction method Atlas [30] by a large margin, while achieving real-time performance at 33 key frames per second, ∼10× faster than Atlas. As shown in the supplementary video, our method is able to reconstruct large-scale 3D scenes from a video stream on a laptop GPU in real-time. To the best of our knowledge, this is the first learning-based system that is able to reconstruct dense and coherent 3D scene geometry in real-time.

2. Related Work

Multi-view Depth Estimation. The most related line of research is real-time methods for multi-view depth estimation. Before the age of deep learning, many renowned works in monocular 3D reconstruction [47, 21, 38, 34] achieved good performance with plane-sweeping stereo and depth filters under the assumption of photo-consistency. These methods estimate depth maps for individual frames, causing redundant computation. [46, 51] optimize this line of research towards low power consumption on mobile platforms. Learning-based methods for real-time multi-view depth estimation try to alleviate the photo-consistency assumption with a data-driven approach. Notably, MVDepthNet [48] and Neural RGB-D [24] use 2D CNNs to process the 2D depth cost volume constructed from multi-view image features. CNMNet [26] further leverages the planar structure of indoor scenes to constrain the surface normals computed from the predicted depth maps and obtain smooth depth estimates. These learning-based methods use 2D CNNs to process the depth cost volume to maintain a low computational cost for near real-time performance.

When the input images are high-resolution and offline computation is allowed, multi-view depth estimation is also known as the Multiple View Stereo (MVS) problem. PatchMatch-based methods [56, 37] have achieved impressive accuracy and are still the most popular methods applicable to high-resolution images. Learning-based approaches in MVS have recently dominated several benchmarks [2, 20] in terms of accuracy, but are limited to processing mid-resolution images due to GPU memory constraints. Different from the real-time methods, 3D cost volumes are constructed and 3D CNNs are used to process the cost volume, as proposed in MVSNet [53]. Some recent works [12, 4] improve this pipeline with a coarse-to-fine approach. A similar design can also be found in many learning-based SLAM systems [45, 57, 42, 44].

All the above-mentioned works adopt single-view depth maps as intermediate representations. SurfaceNet [15, 16] takes a different approach and uses a unified volumetric representation to predict volume occupancy. Recently, Atlas [30] also proposed a volumetric design and directly predicts TSDF and semantic labels with a 3D CNN. As an offline method, Atlas aggregates the image features of the entire sequence and then predicts the global TSDF volume only once with a decoder module. We further elaborate the relationship between the proposed method and Atlas in the supplementary material. The proposed method is also related to [5, 18] in terms of using recurrent networks for multi-view feature fusion. However, their recurrent fusion is applied only to global features and their focus is the reconstruction of single objects.

3D Surface Reconstruction. After depth maps are estimated and converted to point clouds, the remaining task for 3D reconstruction is to estimate the 3D surface position and produce the reconstructed mesh. In an offline MVS pipeline [37], Poisson reconstruction [19] and Delaunay triangulation [22] are often used for this purpose. Proposed by the seminal work KinectFusion [31], incremental volumetric TSDF fusion [7] is widely adopted in real-time reconstruction scenarios due to its simplicity and parallelization capability. [32, 10] improve KinectFusion by making it more scalable and robust. RoutedFusion [49, 50] changes the fusion operation from a simple linear addition into a data-dependent process.
Figure 2. NeuralRecon architecture. NeuralRecon predicts the TSDF with a three-level coarse-to-fine approach that gradually increases the density of the sparse voxels. Key-frame images in the local fragment are first passed through the image backbone to extract multi-level features. These image features are then back-projected along each ray and aggregated into a 3D feature volume F_t^l, where l represents the level index. At the first level (l = 1), a dense TSDF volume S_t^1 is predicted. At the second and third levels, the upsampled S_t^{l-1} from the previous level is concatenated with F_t^l and used as the input to the GRU Fusion and MLP modules. A feature volume defined in the world frame is maintained at each level as the global hidden state of the GRU. At the last level, the output S_t^l is used to replace the corresponding voxels in the global TSDF volume S_t^g, yielding the final reconstruction at time t.
Neural Implicit Representations. Recently, neural implicit representations [29, 33, 36, 17, 54, 25] have advanced significantly. Our work also learns a neural implicit representation by predicting SDF values with a neural network from the encoded image features, similar to PIFu [36]. The key difference is that we use sparse 3D convolution to predict a discrete TSDF volume, instead of querying an MLP with image features and 3D coordinates.
3. Methods

Given a sequence of monocular images {I_t} and camera pose trajectory {ξ_t} ∈ SE(3) provided by a SLAM system, the goal is to reconstruct dense 3D scene geometry accurately in real-time. We denote the global TSDF volume to reconstruct as S_t^g, where t represents the current time step. The system architecture is illustrated in Fig. 2.

3.1. Key Frame Selection

To achieve real-time 3D reconstruction that is suitable for interactive applications, the reconstruction process needs to be incremental and the input images should be processed sequentially in local fragments [40]. We seek to find a set of suitable key frames from the incoming image stream as input for the networks. To provide enough motion parallax while keeping multi-view co-visibility for reconstruction, the selected key frames should be neither too close to nor too far from each other. Following [13], a new incoming frame is selected as a key frame if its relative translation is greater than t_max and its relative rotation angle is greater than R_max. A window with N key frames is defined as a local fragment. After key frames are selected, a cubic-shaped fragment bounding volume (FBV) that encloses all the key-frame view frustums is computed with a fixed maximum depth range d_max in each view. Only the region within the FBV is considered during the reconstruction of each fragment.

3.2. Joint Fragment Reconstruction and Fusion

We propose to simultaneously reconstruct the TSDF volume of a local fragment S_t^l and fuse it with the global TSDF volume S_t^g with a learning-based approach. The joint reconstruction and fusion is carried out in the local coordinate system. The definition of the local and global coordinate systems, as well as the construction of the FBV, are illustrated in Fig. 1 of the supplementary material.

Image Feature Volume Construction. The N images in the local fragment are first passed through the image backbone to extract multi-level features. Similar to previous works on volumetric reconstruction [18, 15, 30], the extracted features are back-projected along each ray into the 3D feature volume. The image feature volume F_t^l is obtained by averaging the features from different views according to the visibility weight of each voxel. The visibility weight is defined as the number of views from which a voxel can be observed in the local fragment. A visualization of this unprojection process can be found in Fig. 3 (i).
Figure 3. 2D toy examples illustrating the unprojection, GRU fusion and the sparse TSDF representation. In panels (i) and (ii), the colored grids denote different features; in panel (iii), they denote different TSDF values. Best viewed in color.
Coarse-to-fine TSDF Reconstruction. We adopt a coarse-to-fine approach to gradually refine the predicted TSDF volume at each level. We use 3D sparse convolution to efficiently process the feature volume F_t^l. The sparse volumetric representation also naturally integrates with the coarse-to-fine design. Specifically, each voxel in the TSDF volume S_t^l contains two values, the occupancy score o and the SDF value x. At each level, both o and x are predicted by the MLP. The occupancy score represents the confidence of a voxel being within the TSDF truncation distance λ. A voxel whose occupancy score is lower than the sparsification threshold θ is defined as void space and will be sparsified. This sparse TSDF representation is visually illustrated in Fig. 3 (iii). After the sparsification, S_t^l is upsampled by 2× and concatenated with F_t^{l+1} as the input for the GRU Fusion module (introduced later) at the next level.
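A dense-tensor sketch of the per-level sparsification and upsampling described above. The real implementation keeps only the surviving voxels in a sparse voxel list (torchsparse), so the dense masking and the placeholder value assigned to void voxels below are simplifications.

```python
import torch
import torch.nn.functional as F

def sparsify_and_upsample(occupancy, sdf, feat_next, theta=0.5):
    """One coarse-to-fine transition (dense sketch of the sparse pipeline).

    occupancy: (1, 1, D, H, W) occupancy scores o predicted at level l
    sdf:       (1, 1, D, H, W) SDF values x predicted at level l
    feat_next: (1, C, 2D, 2H, 2W) image feature volume of level l + 1
    """
    keep = occupancy > theta                              # voxels below theta become void space
    sdf = torch.where(keep, sdf, torch.ones_like(sdf))    # placeholder; the real code simply drops them
    # 2x nearest-neighbour upsampling between levels (Sec. 3.3)
    up_sdf = F.interpolate(sdf, scale_factor=2, mode="nearest")
    up_keep = F.interpolate(keep.float(), scale_factor=2, mode="nearest").bool()
    # concatenate the upsampled prediction with the next-level feature volume
    next_input = torch.cat([feat_next, up_sdf], dim=1)
    return next_input, up_keep
```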
Instead of estimating single-view depth maps for each key frame, NeuralRecon jointly reconstructs the implicit surface within the bounding volume of the local fragment window. This design guides the network to learn the natural surface prior directly from the training data. As a result, the reconstructed surface is locally smooth and coherent in scale. Notably, this design also leads to less redundant computation compared to depth-based methods, since each area on the 3D surface is estimated only once during the fragment reconstruction.

GRU Fusion. To make the reconstruction consistent between fragments, we propose to condition the current-fragment reconstruction on the reconstructions of previous fragments. We use a 3D convolutional variant of the Gated Recurrent Unit (GRU) [6] for this purpose. As illustrated in Fig. 3 (ii), at each level the image feature volume F_t^l is first passed through 3D sparse convolution layers to extract 3D geometric features G_t^l. The hidden state H_{t-1}^l is extracted from the global hidden state H_{t-1}^g within the fragment bounding volume. The GRU fuses G_t^l with the hidden state H_{t-1}^l and produces the updated hidden state H_t^l, which is then passed through the MLP layers to predict the TSDF volume S_t^l at this level. The hidden state H_t^l is also written back to the global hidden state H_t^g by directly replacing the corresponding voxels. Formally, denoting z_t as the update gate, r_t as the reset gate, σ as the sigmoid function and W_* as the weights of the sparse convolutions, the GRU fuses G_t^l with the hidden state H_{t-1}^l through the following operations:

    z_t = σ(SparseConv([H_{t-1}^l, G_t^l], W_z))
    r_t = σ(SparseConv([H_{t-1}^l, G_t^l], W_r))
    H̃_t^l = tanh(SparseConv([r_t ⊙ H_{t-1}^l, G_t^l], W_h))
    H_t^l = (1 − z_t) ⊙ H_{t-1}^l + z_t ⊙ H̃_t^l

Intuitively, in the context of joint reconstruction and fusion of TSDF, the update gate z_t and the reset gate r_t in the GRU determine how much information from the previous reconstructions (i.e. the hidden state H_{t-1}^l) is fused with the current-fragment geometric features G_t^l, as well as how much information from the current fragment is fused into the hidden state H_t^l. As a data-driven approach, the GRU serves as a selective attention mechanism that replaces the linear running-average operation in conventional TSDF fusion [31]. By predicting S_t^l after the GRU, the MLP network can leverage the context information accumulated from past fragments to produce consistent surface geometry across local fragments. This is also conceptually analogous to the depth filter in non-learning-based 3D reconstruction pipelines [38, 34], where the current observation and the temporally fused depths are combined with a Bayesian filter. The effectiveness of joint reconstruction and fusion is validated in the ablation study.
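For illustration, the four update equations can be collected into a single ConvGRU cell. The sketch below uses dense 3D convolutions and standard PyTorch modules for readability; NeuralRecon applies the same gating with sparse convolutions over the occupied voxels, so the module layout here is illustrative rather than the released implementation.

```python
import torch
import torch.nn as nn

class ConvGRUFusion(nn.Module):
    """Fuses the current-fragment features G_t^l with the hidden state H_{t-1}^l."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv_z = nn.Conv3d(2 * channels, channels, kernel_size, padding=pad)
        self.conv_r = nn.Conv3d(2 * channels, channels, kernel_size, padding=pad)
        self.conv_h = nn.Conv3d(2 * channels, channels, kernel_size, padding=pad)

    def forward(self, h_prev, g_cur):
        x = torch.cat([h_prev, g_cur], dim=1)
        z = torch.sigmoid(self.conv_z(x))                     # update gate z_t
        r = torch.sigmoid(self.conv_r(x))                     # reset gate r_t
        h_tilde = torch.tanh(                                 # candidate hidden state
            self.conv_h(torch.cat([r * h_prev, g_cur], dim=1)))
        return (1 - z) * h_prev + z * h_tilde                 # updated hidden state H_t^l
```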
Integration to the Global TSDF Volume. At the last coarse-to-fine level, S_t^3 is predicted and further sparsified to S_t^l. Since the fusion between S_t^l and S_t^g has already been done in GRU Fusion, S_t^l is integrated into S_t^g by directly replacing the corresponding voxels after being transformed into the global coordinate frame. At each time step t, Marching Cubes is performed on S_t^g to reconstruct the mesh.
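A minimal sketch of this integration step, assuming for simplicity that the global volume is stored as a dense array indexed by integer voxel coordinates (the actual system keeps it sparse), with skimage's Marching Cubes standing in for the mesh extraction:

```python
import numpy as np
from skimage import measure

def integrate_fragment(global_tsdf, frag_coords, frag_tsdf):
    """Write the fragment prediction into the global volume by direct replacement.

    global_tsdf: (X, Y, Z) dense global TSDF volume (sketch; sparse in practice)
    frag_coords: (V, 3) integer voxel indices of the fragment in global coordinates
    frag_tsdf:   (V,) predicted TSDF values of those voxels
    """
    x, y, z = frag_coords.T
    global_tsdf[x, y, z] = frag_tsdf              # replacement, not a running average
    return global_tsdf

def extract_mesh(global_tsdf, voxel_size=0.04):
    # Marching Cubes on the zero level set; 0.04 m matches the finest voxel size
    verts, faces, normals, _ = measure.marching_cubes(global_tsdf, level=0.0)
    return verts * voxel_size, faces, normals
```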
Supervision. Following [9], two loss functions are used to supervise the network. The occupancy loss is defined as the binary cross-entropy (BCE) between the predicted occupancy values and the ground-truth occupancy values. The SDF loss is defined as the ℓ1 distance between the predicted SDF values and the ground-truth SDF values. We log-transform the SDF values of predictions and ground truth before applying the ℓ1 loss. The supervision is applied to all the coarse-to-fine levels.
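The two loss terms can be sketched as follows. The exact log-transform and the relative weighting of the two terms follow [9] and are not spelled out in this section, so the forms used below are assumptions.

```python
import torch
import torch.nn.functional as F

def log_transform(x, shift=1.0):
    # assumed form of the log-scaling applied to SDF values before the l1 loss
    return torch.sign(x) * torch.log(torch.abs(x) + shift)

def neural_recon_loss(occ_pred, sdf_pred, occ_gt, sdf_gt):
    """BCE on occupancy (already sigmoid-activated) plus l1 on log-transformed SDF."""
    occ_loss = F.binary_cross_entropy(occ_pred, occ_gt.float())
    sdf_loss = torch.abs(log_transform(sdf_pred) - log_transform(sdf_gt)).mean()
    return occ_loss + sdf_loss          # equal weighting assumed; applied at every level
```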
3.3. Implementation Details

We use torchsparse [43] as the implementation of 3D sparse convolution. The image backbone is a variant of MnasNet [41] and is initialized with weights pretrained on ImageNet. A Feature Pyramid Network [23] is used in the backbone to extract more representative multi-level features. The entire network is trained end-to-end with randomly initialized weights, except for the image backbone. The occupancy score o is predicted with a Sigmoid layer. The voxel size at the last level is 4 cm and the TSDF truncation distance λ is set to 12 cm. d_max is set to 3 m. R_max and t_max are set to 15° and 0.1 m, respectively. θ is set to 0.5. Nearest-neighbor interpolation is used for the upsampling between coarse-to-fine levels.
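For reference, the hyper-parameters stated above, together with the fragment length used in the final model (row (v) of Tab. 5), collected in one place; the key names are illustrative only.

```python
# Hyper-parameters reported in Sec. 3.1, Sec. 3.3 and Tab. 5 (key names are ours).
CONFIG = {
    "keyframes_per_fragment": 9,        # N, the best setting in the ablation study
    "keyframe_translation_m": 0.1,      # t_max, relative translation threshold
    "keyframe_rotation_deg": 15.0,      # R_max, relative rotation threshold
    "max_depth_range_m": 3.0,           # d_max, used to build the fragment bounding volume
    "voxel_size_last_level_m": 0.04,    # 4 cm voxels at the finest level
    "tsdf_truncation_m": 0.12,          # lambda
    "sparsification_threshold": 0.5,    # theta, on the occupancy score
}
```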
frames used for evaluation are sampled from the video se-
quence with an interval of 10 frames for both depth-based
4. Experiments
methods and Atlas. Following [30, 26], [53, 48, 14, 13] are
In this section, we conduct a series of experiments to fine-tuned on ScanNet. To evaluate depth-based methods
evaluate the reconstruction quality and different design con- [37, 48, 13, 14] in 3D, we use the point cloud fusion to ob-
siderations of NeuralRecon. tain the 3D reconstruction following Atlas. For other depth-
based methods, we use the standard TSDF fusion proposed
4.1. Datasets, Metrics, Baselines and Protocols. in [31, 7]. For the reasons we detailed in the supplementary
material, in order to make a fair comparison with Atlas, we
Datasets. We perform the experiments on two indoor
also report the evaluation results using the double-layered
datasets, ScanNet (V2) [8] and 7-Scenes [39]. The ScanNet
mesh (same as Atlas). The evaluation of 3D geometry on 7-
dataset contains 1613 indoor scenes with ground-truth cam-
Scenes uses the single-layered mesh. We also evaluate the
era poses, surface reconstructions, and semantic segmenta-
depth filtering operation with multi-view consistency check,
tion labels. There are two training/validation splits com-
which will be elaborated in the supplementary material.
monly used in previous works (defined in [30] and [42]) for
the ScanNet dataset. We use the same training and valida-
4.2. Evaluation Results
tion data with the corresponding baseline methods to make
a fair comparison. The 7-Scenes dataset is another chal- ScanNet. 2D depth metrics and 3D geometry metrics are
lenging RGB-D dataset captured in indoor scenes. Follow- used on the ScanNet dataset. The 3D geometry evalua-
ing the baseline method [26], we use the model trained on tion results are shown in Tab. 1. Our method produces
ScanNet to perform the validation on 7-Scenes. much better performance than recent learning-based meth-
Metrics. The 3D reconstruction quality is evaluated using ods and achieves slightly better results than COLMAP. We
3D geometry metrics presented in [30], as well as standard believe that the improvements come from the joint recon-
2D depth metrics defined in [11]. The definitions of these struction and fusion design achieved by the GRU Fusion
metrics are detailed in the supplementary material. Among module. Compared to depth-based methods, NeuralRecon
Method Layer Comp ↓ Acc ↓ Recall ↑ Prec ↑ F-score ↑ Time (ms) ↓
MVDepthNet [48] single 0.040 0.240 0.831 0.208 0.329 48
GPMVS [13] single 0.031 0.879 0.871 0.188 0.304 51
DPSNet [14] single 0.045 0.284 0.793 0.223 0.344 322
COLMAP [37] single 0.069 0.135 0.634 0.505 0.558 2076
Ours single 0.128 0.054 0.479 0.684 0.562 30
Atlas [30] double 0.062 0.128 0.732 0.382 0.499 292
Ours double 0.106 0.073 0.609 0.450 0.516 30
DeepV2D [44] single 0.057 0.239 0.646 0.329 0.431 347
Consistent Depth [28] single 0.091 0.344 0.461 0.266 0.331 2321
Ours single 0.120 0.062 0.428 0.592 0.494 30
Table 1. 3D geometry metrics on ScanNet. We use two different training/validation splits following Atlas [30] (top block) and BA-Net
[42] (bottom block). We elaborate the meaning of the single and double layer in the supplementary material.
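As a quick consistency check on Tab. 1, assuming the F-score is the usual harmonic mean of precision and recall (as in the 3D metrics of [30]), the single-layer entry of our method gives F = 2 · Prec · Recall / (Prec + Recall) = 2 · 0.684 · 0.479 / (0.684 + 0.479) ≈ 0.563, which matches the reported 0.562 up to rounding.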
Method Abs Rel ↓ Abs Diff ↓ Sq Rel ↓ RMSE ↓ δ < 1.25 ↑ Comp ↑
COLMAP [37] 0.137 0.264 0.138 0.502 83.4 0.871
MVDepthNet [48] 0.098 0.191 0.061 0.293 89.6 0.928
GPMVS [13] 0.130 0.239 0.339 0.472 90.6 0.928
DPSNet [14] 0.087 0.158 0.035 0.232 92.5 0.928
Atlas [30] 0.065 0.123 0.045 0.251 93.6 0.999
Ours 0.065 0.106 0.031 0.195 94.8 0.909
Method Abs Rel ↓ Sq Rel ↓ RMSE ↓ RMSE log ↓ Sc Inv ↓ -
DeMoN [45] 0.231 0.520 0.761 0.289 0.284 -
BA-Net [42] 0.161 0.092 0.346 0.214 0.184 -
DeepV2D [44] 0.057 0.010 0.168 0.080 0.077 -
Consistent Depth [28] 0.073 0.037 0.217 0.105 0.103 -
Ours 0.047 0.024 0.164 0.093 0.092 -
Table 2. 2D depth metrics on ScanNet. We use two different training/validation splits following Atlas [30] (top block) and BA-Net [42]
(bottom block).
Compared to depth-based methods, NeuralRecon can produce coherent reconstructions both locally and globally. Our method also surpasses the volumetric baseline method Atlas [30] in accuracy, precision, and F-score. The improvements potentially come from the local fragment separation in our method, which can act as a view-selection mechanism that prevents irrelevant image features from being fused into the 3D volume. In terms of completeness and recall, the proposed method performs worse than both depth-based methods and Atlas. Since depth-based methods predict pixel-wise depth maps for each view, the coverage of their predictions is high by nature, but at the cost of accuracy. Being an offline approach, Atlas has the advantage of having global context from the entire sequence before predicting the geometry. As a result, Atlas sometimes achieves even better completeness than the ground truth due to its TSDF completion capability. However, Atlas tends to predict over-smoothed geometry, and the completed regions may be inaccurate.

As for the 2D depth metrics, NeuralRecon also outperforms previous state-of-the-art methods on almost all metrics, as shown in Tab. 2.

7-Scenes. 2D depth metrics and 3D geometry metrics are evaluated on the 7-Scenes dataset. As shown in Tab. 3, our method achieves performance comparable to the state-of-the-art method CNMNet [26] and outperforms all other methods. We believe that the accuracy of the proposed method can be further improved by leveraging the planar structure information as in CNMNet. Since the model used here is only trained on ScanNet, these results also demonstrate that NeuralRecon can generalize well beyond the domain of the training data.

Efficiency. We also report the average running time of the baselines and our method in Tab. 1. Only the inference time on key frames is counted. A detailed timing analysis for each module of NeuralRecon is presented in Tab. 4. For volumetric methods (Atlas and ours), the running time is obtained by dividing the time to reconstruct the TSDF volume of a local fragment by the number of key frames in the local fragment. Notice that the time for TSDF fusion is not included for depth-based methods. The running time for [44, 28, 24, 26, 45] and NeuralRecon is measured on an NVIDIA RTX 2080Ti GPU. We use the running times reported in [30] and [55] for [48, 14, 37, 13, 30] and [53], respectively.

As shown in Tab. 1, our time cost is 30 ms per key frame, achieving real-time speed at 33 key frames per second and outperforming all previous methods. Specifically, our method runs ∼10× faster than Atlas, and 77× faster than Consistent Depth. Predicting the volumetric representation removes the redundant computation in depth-based methods, which contributes to the fast running speed of our method. Compared to Atlas, incrementally reconstructing the geometry in local fragments avoids processing a huge 3D volume, leading to a faster speed. The use of sparse convolution also contributes to the superior efficiency of NeuralRecon.
Method Comp ↓ Acc ↓ Recall ↑ Prec ↑ F-score ↑ Img. Enc. Unproj. Sparse Conv. GRU Total
DeepV2D [44] 0.180 0.518 0.175 0.087 0.115 Level 1 1.27 3.70 2.18
CNMNet [26] 0.150 0.398 0.246 0.111 0.149
Ours 0.228 0.100 0.227 0.389 0.282
4.03 Level 2 1.21 3.84 2.24 29.56
Method δ < 1.25 ↑ Abs Rel ↓ Sq Rel ↓ RMSE ↓ Time ↓ Level 3 2.18 5.11 3.80
DeMoN [45] 31.88 0.3888 0.4198 0.8549 110
MVSNet [53] 64.09 0.2339 0.1904 0.5078 1050 Table 4. Timing analysis of NeuralRecon measured in millisec-
N-RGBD [24] 69.26 0.1758 0.1123 0.4408 202 onds per key frame. The level number indicates the different
MVDNet [48] 71.79 0.1925 0.2350 0.4585 48 coarse-to-fine level. Img. Enc. stands for image encoder, Unproj.
DPSNet [14] 70.96 0.1991 0.1420 0.4382 322
DeepV2D [44] 42.80 0.4370 0.5530 0.8690 347 stands for unprojection.
CNMNet [26] 76.64 0.1612 0.0832 0.3614 80
Ours 82.00 0.1550 0.1040 0.3470 30 Fusion 3D Geometry Metrics
#views
Area Method Recall Prec F-score
Table 3. 3D geometry metrics (top block) and 2D depth metrics i 5 OCC Linear 0.576 0.386 0.462
(bottom block) on 7-Scenes. Time is measured in milliseconds. ii 5 OCC Avg 0.535 0.432 0.478
iii 5 OCC GRU 0.572 0.426 0.488
iv 5 FBV GRU 0.613 0.421 0.494
of sparse convolution also contributes to the superior effi- - 7 FBV GRU 0.607 0.435 0.507
ciency of NeuralRecon. v 9 FBV GRU 0.609 0.450 0.516
- 11 FBV GRU 0.593 0.398 0.474
4.3. Ablation Study
Table 5. Ablation study. We report 3D geometry metrics on Scan-
In this section, we conduct several ablation experiments
Net. OCC: fuse 3D geometric features Glt within the occupied
on the ScanNet dataset to discuss the effectiveness of com- area where occupancy score o > θ. FBV: fuse 3D geometric fea-
ponents in our method. tures Glt within the fragment bounding volume. Linear: remove
GRU Fusion. We validate the GRU Fusion design by com- GRU-Fusion and use the conventional running-average-based lin-
paring rows from (i) to (iv) in Tab. 5. ear TSDF fusion to update the global TSDF volume. Avg: fuse 3D
geometric features Glt with the average operation. GRU: fuse 3D
To validate the benefit of feature fusion, we compare row
geometric features Glt with GRU. We use row (v) in all other ex-
(i) and row (ii) in Tab. 5. Using feature fusion with the av- periments. More details about ablation experiments can be found
erage operation obtains nearly 5% improvement for the pre- in the supplementary material.
cision metric than conventional linear TSDF fusion. Visual-
ization in Fig. 5 shows that feature fusion with the average
operation can reconstruct smoother geometry. These results Qualitative Results. We provide the qualitative results and
demonstrate that feature fusion can be more effective than the corresponding analysis in Fig. 4.
TSDF fusion using the same average operation.
Comparing row (ii) and row (iii) in Tab. 5 shows that replacing the average operation with a GRU gives a 4% improvement in recall. The mesh in Fig. 5 (iii) is also more complete than that in Fig. 5 (ii). These results demonstrate that the GRU is more effective at selectively integrating only the consistent information from the current fragment into the hidden state.
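To make the GRU-based feature fusion concrete, the sketch below shows a dense ConvGRU-style update in PyTorch, where the hidden state plays the role of the fused global feature volume and the input is the current fragment's geometric feature volume G_t^l. This is only an illustration of the gating equations under simplified assumptions; the actual model operates on sparse voxels with sparse 3D convolutions, so the layer types and shapes differ.

```python
import torch
import torch.nn as nn

class ConvGRUFusion3D(nn.Module):
    """Dense ConvGRU cell over a 3D feature volume; an illustrative stand-in
    for the sparse-convolution GRU fusion described in the paper."""
    def __init__(self, channels):
        super().__init__()
        self.conv_zr = nn.Conv3d(2 * channels, 2 * channels, 3, padding=1)  # update & reset gates
        self.conv_h = nn.Conv3d(2 * channels, channels, 3, padding=1)       # candidate hidden state

    def forward(self, h_prev, g_t):
        # h_prev: fused (hidden) feature volume; g_t: current-fragment features G_t^l.
        # Both are shaped (B, C, X, Y, Z).
        zr = torch.sigmoid(self.conv_zr(torch.cat([h_prev, g_t], dim=1)))
        z, r = zr.chunk(2, dim=1)
        h_tilde = torch.tanh(self.conv_h(torch.cat([r * h_prev, g_t], dim=1)))
        return (1 - z) * h_prev + z * h_tilde  # the gate decides how much of g_t to integrate

# usage sketch: the hidden state is updated once per incoming fragment
# fusion = ConvGRUFusion3D(channels=32); h = fusion(h, g_t)
```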
The recalls in rows (iii) and (iv) in Tab. 5 show that fusion within the fragment bounding volume produces much more complete results. The visualization results in Fig. 5 (iii) and (iv) show that, with fusion in the fragment bounding volume, our method produces fewer artifacts on the ground. Fusion in the fragment bounding volume can leverage the context information at fragment boundaries and produce more consistent and complete surface estimates.
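The OCC and FBV variants above differ only in which voxels take part in the feature fusion: OCC restricts fusion to voxels whose predicted occupancy score exceeds the threshold θ, while FBV fuses every voxel inside the current fragment bounding volume. The schematic sketch below uses hypothetical names and assumes a per-voxel occupancy-score grid and a boolean mask of the fragment bounding volume; whether OCC additionally intersects the bounding volume is an implementation detail assumed here.

```python
import numpy as np

def fusion_mask(occupancy, in_fragment_bv, mode="FBV", theta=0.5):
    """Select the voxels whose features G_t^l are fused into the hidden state."""
    if mode == "OCC":
        # occupied area only: voxels whose occupancy score exceeds theta
        return in_fragment_bv & (occupancy > theta)
    # FBV: all voxels inside the current fragment bounding volume
    return in_fragment_bv

# toy usage on an 8^3 grid
occ = np.random.rand(8, 8, 8)
fbv = np.ones((8, 8, 8), dtype=bool)
mask = fusion_mask(occ, fbv, mode="OCC", theta=0.5)
```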
Number of views. We set 5, 7, 9, and 11 views as the length of a fragment, respectively. As shown in row (v) of Tab. 5, the F-score improves by over 2% when 9 views are used per fragment instead of 5. As shown in the visualization results in Fig. 5 (v), with more views in a fragment, the geometry is reconstructed more accurately compared to Fig. 5 (iv).
Qualitative Results. We provide the qualitative results and the corresponding analysis in Fig. 4.
5. Conclusion
In this paper, we introduced NeuralRecon, a novel system for real-time 3D reconstruction from monocular video. The key idea is to jointly and incrementally reconstruct and fuse sparse TSDF volumes for each video fragment with 3D sparse convolutions and a GRU. This design enables NeuralRecon to output accurate and coherent reconstructions in real time. Experiments show that NeuralRecon outperforms state-of-the-art methods in both reconstruction quality and running speed. The sparse TSDF volume reconstructed by NeuralRecon can be directly used in downstream tasks such as 3D object detection, 3D semantic segmentation, and neural rendering. We believe that, by jointly training with these downstream tasks end-to-end, NeuralRecon enables new possibilities in learning-based multi-view perception and recognition systems.
Acknowledgement. The authors would like to acknowledge the support from the National Key Research and Development Program of China (No. 2020AAA0108901), NSFC (No. 61806176), and the ZJU-SenseTime Joint Lab of 3D Vision.
[Figure 4 panel labels: COLMAP, CNMNet, DeepV2D, Ground Truth]
Figure 4. Qualitative results on ScanNet. Compared to depth-based methods, NeuralRecon produces much more coherent reconstruction results. Notice that our method also recovers sharper geometry than Atlas [30], which illustrates the effectiveness of the local fragment design in our method. Reconstructing only within the local fragment window prevents irrelevant image features from far-away camera views from being fused into the 3D volume. The color indicates the surface normal. More qualitative results can be found in the supplementary material and on the project webpage. Zoom in for details.
Figure 5. Ablation study. The Roman numerals correspond to the rows of Tab. 5. The analysis is presented in Sec. 4.3.
References
[1] Augmented Reality with ARKit - Apple Developer.
[2] Henrik Aanæs, Rasmus Ramsbøl Jensen, George Vogiatzis, Engin Tola, and Anders Bjorholm Dahl. Large-Scale Data for Multiple-View Stereopsis. IJCV, 2016.
[3] Carlos Campos, Richard Elvira, Juan J. Gómez Rodríguez, José M. M. Montiel, and Juan D. Tardós. ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM. ArXiv, 2020.
[4] Shuo Cheng, Zexiang Xu, Shilin Zhu, Zhuwen Li, Li Erran Li, Ravi Ramamoorthi, and Hao Su. Deep stereo using adaptive thin volume representation with uncertainty awareness. In CVPR, 2020.
[5] Christopher B Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, and Silvio Savarese. 3D-R2N2: A unified approach for single and multi-view 3D object reconstruction. In ECCV, 2016.
[6] Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. In NeurIPS 2014 Workshop on Deep Learning, 2014.
[7] Brian Curless and Marc Levoy. A Volumetric Method for Building Complex Models from Range Images. In SIGGRAPH, 1996.
[8] Angela Dai, Angel X Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. In CVPR, 2017.
[9] Angela Dai, Christian Diller, and Matthias Nießner. SG-NN: Sparse Generative Neural Networks for Self-Supervised Scene Completion of RGB-D Scans. In CVPR, 2020.
[10] Angela Dai, Matthias Nießner, Michael Zollhöfer, Shahram Izadi, and Christian Theobalt. BundleFusion: Real-time globally consistent 3D reconstruction using on-the-fly surface reintegration. ACM TOG, 2017.
[11] David Eigen, Christian Puhrsch, and Rob Fergus. Depth map prediction from a single image using a multi-scale deep network. In NeurIPS, 2014.
[12] Xiaodong Gu, Zhiwen Fan, Siyu Zhu, Zuozhuo Dai, Feitong Tan, and Ping Tan. Cascade cost volume for high-resolution multi-view stereo and stereo matching. In CVPR, 2020.
[13] Yuxin Hou, Juho Kannala, and Arno Solin. Multi-view stereo by temporal nonparametric fusion. In ICCV, 2019.
[14] Sunghoon Im, Hae-Gon Jeon, Stephen Lin, and In So Kweon. DPSNet: End-to-end Deep Plane Sweep Stereo. In ICLR, 2019.
[15] Mengqi Ji, Juergen Gall, Haitian Zheng, Yebin Liu, and Lu Fang. SurfaceNet: An end-to-end 3D neural network for multiview stereopsis. In ICCV, 2017.
[16] Mengqi Ji, Jinzhi Zhang, Qionghai Dai, and Lu Fang. SurfaceNet+: An End-to-End 3D Neural Network for Very Sparse Multi-View Stereopsis. IEEE TPAMI, 2020.
[17] Chiyu Jiang, Avneesh Sud, Ameesh Makadia, Jingwei Huang, Matthias Nießner, and Thomas Funkhouser. Local implicit grid representations for 3D scenes. In CVPR, 2020.
[18] Abhishek Kar, Christian Häne, and Jitendra Malik. Learning a Multi-View Stereo Machine. In NeurIPS, 2017.
[19] Michael Kazhdan and Hugues Hoppe. Screened Poisson Surface Reconstruction. ACM TOG, 2013.
[20] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and Temples: Benchmarking Large-Scale Scene Reconstruction. ACM TOG, 2017.
[21] Kalin Kolev, Petri Tanskanen, Pablo Speciale, and Marc Pollefeys. Turning Mobile Phones into 3D Scanners. In CVPR, 2014.
[22] P. Labatut, J.-P. Pons, and R. Keriven. Robust and Efficient Surface Reconstruction From Range Data. Computer Graphics Forum, 2009.
[23] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In CVPR, 2017.
[24] Chao Liu, Jinwei Gu, Kihwan Kim, Srinivasa G Narasimhan, and Jan Kautz. Neural RGB->D Sensing: Depth and uncertainty from a video camera. In CVPR, 2019.
[25] Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, and Christian Theobalt. Neural Sparse Voxel Fields. In NeurIPS, 2020.
[26] Xiaoxiao Long, Lingjie Liu, Christian Theobalt, and Wenping Wang. Occlusion-Aware Depth Estimation with Adaptive Normal Constraints. In ECCV, 2020.
[27] William E. Lorensen and Harvey E. Cline. Marching Cubes: A High Resolution 3D Surface Construction Algorithm. SIGGRAPH, 1987.
[28] Xuan Luo, Jia-Bin Huang, Richard Szeliski, Kevin Matzen, and Johannes Kopf. Consistent Video Depth Estimation. ACM TOG, 2020.
[29] Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy Networks: Learning 3D reconstruction in function space. In CVPR, 2019.
[30] Zak Murez, Tarrence van As, James Bartolozzi, Ayan Sinha, Vijay Badrinarayanan, and Andrew Rabinovich. Atlas: End-to-End 3D Scene Reconstruction from Posed Images. In ECCV, 2020.
[31] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In ISMAR, 2011.
[32] Matthias Nießner, Michael Zollhöfer, Shahram Izadi, and Marc Stamminger. Real-Time 3D Reconstruction at Scale Using Voxel Hashing. ACM TOG, 2013.
[33] Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. In CVPR, 2019.
[34] Matia Pizzoli, Christian Forster, and Davide Scaramuzza. REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time. In ICRA, 2014.
[35] Tong Qin, Jie Pan, Shaozu Cao, and Shaojie Shen. A General Optimization-Based Framework for Local Odometry Estimation with Multiple Sensors. ArXiv, 2019.
[36] Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, and Hao Li. PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization. In ICCV, 2019.
[37] Johannes L. Schönberger, Enliang Zheng, Jan-Michael Frahm, and Marc Pollefeys. Pixelwise View Selection for Unstructured Multi-View Stereo. In ECCV, 2016.
[38] Thomas Schöps, Torsten Sattler, Christian Häne, and Marc Pollefeys. 3D Modeling on the Go: Interactive 3D Reconstruction of Large-Scale Scenes on Mobile Devices. In 3DV, 2015.
[39] Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, and Andrew Fitzgibbon. Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images. In CVPR, 2013.
[40] Sungjoon Choi, Q. Zhou, and V. Koltun. Robust reconstruction of indoor scenes. In CVPR, 2015.
[41] Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V Le. MnasNet: Platform-aware neural architecture search for mobile. In CVPR, 2019.
[42] Chengzhou Tang and Ping Tan. BA-Net: Dense Bundle Adjustment Networks. In ICLR, 2019.
[43] Haotian Tang, Zhijian Liu, Shengyu Zhao, Yujun Lin, Ji Lin, Hanrui Wang, and Song Han. Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution. In ECCV, 2020.
[44] Zachary Teed and Jia Deng. DeepV2D: Video to Depth with Differentiable Structure from Motion. In ICLR, 2020.
[45] Benjamin Ummenhofer, Huizhong Zhou, Jonas Uhrig, Nikolaus Mayer, Eddy Ilg, Alexey Dosovitskiy, and Thomas Brox. DeMoN: Depth and Motion Network for Learning Monocular Stereo. In CVPR, 2017.
[46] Julien Valentin, Adarsh Kowdle, Jonathan T. Barron, Neal Wadhwa, Max Dzitsiuk, Michael Schoenberg, Vivek Verma, Ambrus Csaszar, Eric Turner, Ivan Dryanovski, Joao Afonso, Jose Pascoal, Konstantine Tsotsos, Mira Leung, Mirko Schmidt, Onur Guleryuz, Sameh Khamis, Vladimir Tankovitch, Sean Fanello, Shahram Izadi, and Christoph Rhemann. Depth from Motion for Smartphone AR. ACM TOG, 2019.
[47] George Vogiatzis and Carlos Hernández. Video-Based, Real-Time Multi-View Stereo. Image and Vision Computing, 2011.
[48] Kaixuan Wang and Shaojie Shen. MVDepthNet: Real-Time Multiview Depth Estimation Neural Network. In 3DV, 2018.
[49] Silvan Weder, Johannes Schönberger, Marc Pollefeys, and Martin R. Oswald. RoutedFusion: Learning Real-Time Depth Map Fusion. In CVPR, 2020.
[50] Silvan Weder, Johannes L. Schönberger, Marc Pollefeys, and Martin R. Oswald. NeuralFusion: Online Depth Fusion in Latent Space, 2020.
[51] Xingbin Yang, L. Zhou, Hanqing Jiang, Z. Tang, Yuanbo Wang, H. Bao, and Guofeng Zhang. Mobile3DRecon: Real-time Monocular 3D Reconstruction on a Mobile Phone. IEEE TVCG, 2020.
[52] Zhenfei Yang, Fei Gao, and Shaojie Shen. Real-Time Monocular Dense Mapping on Aerial Robots Using Visual-Inertial Fusion. In ICRA, 2017.
[53] Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, and Long Quan. MVSNet: Depth Inference for Unstructured Multi-View Stereo. In ECCV, 2018.
[54] Lior Yariv, Yoni Kasten, Dror Moran, Meirav Galun, Matan Atzmon, Basri Ronen, and Yaron Lipman. Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance. In NeurIPS, 2020.
[55] Zehao Yu and Shenghua Gao. Fast-MVSNet: Sparse-to-dense multi-view stereo with learned propagation and Gauss-Newton refinement. In CVPR, 2020.
[56] Enliang Zheng, Enrique Dunn, Vladimir Jojic, and Jan-Michael Frahm. PatchMatch Based Joint View Selection and Depthmap Estimation. In CVPR, 2014.
[57] Huizhong Zhou, Benjamin Ummenhofer, and Thomas Brox. DeepTAM: Deep Tracking and Mapping. In ECCV, 2018.