Shin et al., "Deep Depth Estimation From Thermal Image," CVPR 2023
Table 1. Comprehensive comparison of multi-modal datasets. Compared to previous datasets [6, 8, 24, 25, 28, 53], the proposed Multi-Spectral Stereo (MS2) dataset provides about 195K synchronized and rectified multi-spectral stereo data pairs (i.e., RGB, NIR, thermal, LiDAR, and GNSS/IMU data) covering diverse locations (e.g., city, campus, residential, road, and suburban), times (e.g., morning, daytime, and nighttime), and weather conditions (e.g., clear sky, cloudy, and rainy).
However, two necessities are still missing: a well-established large-scale dataset and public benchmark results. Publicly available datasets for autonomous driving are overwhelmingly composed of the visible spectrum band (i.e., RGB images) and very rarely include other spectrum bands, such as the NIR and LWIR bands. In particular, despite the advantages of the LWIR band, only a few LWIR datasets have been released recently, and these datasets are indoor-oriented [8, 25, 28], small-scale [25, 53], publicly unavailable [6], or limited in sensor diversity [6, 24]. Therefore, there is a growing need for a large-scale multi-sensor driving dataset to investigate the feasibility and challenges of an autonomous driving perception system built on multi-spectral sensors.

The other necessity is a thorough validation of vision applications on the LWIR band. Estimating a depth map from monocular or stereo images is a fundamental task for geometric understanding. Despite numerous recent studies on depth estimation, these works have mainly focused on RGB images. However, thermal images, which typically have lower resolution, less texture, and more noise than RGB images, could pose a challenge for stereo-matching algorithms. This means the performance of these previous works in the thermal image domain is uncertain and cannot be guaranteed.

To this end, in this paper we provide a large-scale multi-spectral dataset along with exhaustive experimental results and a new perspective of depth unification, to encourage active research on various geometry algorithms from multi-spectral data toward high-level performance, reliability, and robustness against hostile conditions. Our contributions can be summarized as follows:

• We provide a large-scale Multi-Spectral Stereo (MS2) dataset, including stereo RGB, stereo NIR, stereo thermal, and stereo LiDAR data along with GNSS/IMU data. Our dataset provides about 195K synchronized data pairs taken from city, residential, road, campus, and suburban areas in the morning, daytime, and nighttime under clear-sky, cloudy, and rainy conditions.

• We perform an exhaustive validation and find that monocular and stereo depth estimation algorithms originally designed for the visible spectral band work reasonably well in the thermal spectral band.

• We propose a unified depth network that bridges the monocular and stereo depth estimation tasks from the perspective of a conditional random field approach.

2. Related Work

2.1. Thermal Image Dataset for 3D Vision

A well-established large-scale dataset is the most fundamental and highest-priority requirement for modern deep neural network training. For the visible spectrum band, numerous large-scale datasets have been proposed, such as the KITTI [15], DDAD [17], Cityscapes [7], Oxford [36], and nuScenes [4] datasets. On the other hand, the InfraRed (IR) spectrum band (e.g., near-IR, short-wave IR, long-wave IR) is included in only a few datasets, and only in a limited form, despite its superior environmental robustness.

A comprehensive comparison is shown in Tab. 1. Most datasets are insufficient to investigate the feasibility of geometric and semantic understanding from multi-spectrum image sensors under diverse outdoor driving scenarios. More specifically, these datasets are indoor-oriented [8, 25, 28], small-scale [25, 53], publicly unavailable [6], limited in sensor diversity [6, 24], limited in weather conditions [6, 24, 25], or missing RAW thermal data [53].

2.2. Depth From Visible Spectrum Band

Monocular Depth Estimation (MDE) has high-level universality because it estimates a depth map from a single image. Mainstream methods formulate depth estimation as per-pixel regression [26, 41, 42, 56], directly estimating per-pixel depth values through a neural network; as per-pixel classification [12, 13], discretizing the continuous depth range into discrete intervals; or as a combined classification-and-regression problem [2, 29].
However, MDE is an ill-posed problem; a single 2D image can be generated from an infinite number of distinct 3D scenes. Therefore, the estimated monocular depth map is inherently scale-ambiguous, generalizes poorly, and is less accurate than depth estimation from multi-view images.

Stereo Depth Estimation (SDE) can estimate a metric-scale depth map by utilizing a known camera baseline and a disparity map computed from a rectified stereo image pair. Existing stereo matching networks can be categorized into 3D cost volume [30, 37, 52, 55] and 4D cost volume based methods [5, 18, 20, 43, 54]. The former estimates a single-channel cost volume (e.g., D×H×W) by measuring the similarity between left and right features and then aggregates contextual information via 2D convolutions. These methods have high memory and computational efficiency, yet the encoded volume loses substantial content information, leading to unsatisfactory accuracy.

The latter builds a multi-channel cost volume (e.g., D×C×H×W) by concatenating the two left-right feature volumes [5, 20], a correlation volume and left-right features [18], or attention-added features [54], and then aggregates the 4D cost volume with 3D convolution layers. Current state-of-the-art models are mostly based on this approach. However, it demands high memory consumption and cubic computational complexity, which is expensive to deploy in real-world applications. The SDE task yields significant performance gains compared to the MDE task, yet it still struggles to find accurate corresponding points in inherently ill-posed regions such as occluded areas, repeated patterns, textureless regions, and reflective surfaces.

2.3. Depth From Thermal Spectrum Band

The thermal spectrum band is highly robust against various adverse weather and lighting conditions, such as rain, fog, dust, haze, and low light. However, due to the absence of a large-scale dataset, most previous studies on geometric understanding [3, 10, 21, 38, 47] are conducted on their own testbeds. Also, most works focus on utilizing a thermal camera along with other heterogeneous sensors for the target geometric task rather than focusing on the thermal camera itself.

For geometric understanding with deep neural networks, a few studies [22, 35, 44–46] have been proposed recently. Most of them focus on self-supervised depth estimation from thermal images with auxiliary modality guidance, such as aligned-and-paired RGB images [22], a style transfer network [35], and paired RGB images [45]. Unlike these previous studies, in this paper we target supervised depth estimation from single and stereo thermal images, which has not yet been actively explored.

3. Multi-Spectral Stereo (MS2) Dataset

3.1. Multi-Spectral Stereo Sensor System

Despite the well-known advantages of the long-wave infrared camera (i.e., thermal camera) [9, 19, 57], the absence of a large-scale dataset still hinders the development and investigation of condition-agnostic autonomous driving perception systems in the thermal spectrum domain. To this end, we designed a data collection platform that consists of RGB, NIR, thermal, and LiDAR stereo systems along with a GNSS/IMU module, as shown in Fig. 2-(a), (b), and (c). Each sensor's specification is described in Tab. 2.

Table 2. Sensor specifications of the multi-spectral stereo system. Our sensor system consists of RGB, NIR, thermal, and LiDAR stereo systems along with a GNSS/IMU module. The data from the RGB, NIR, and thermal stereo systems were taken at 15 fps with synchronized signals. LiDAR stereo data were taken at 10 fps.

| Sensor | Model | Frame Rate | Characteristics |
| --- | --- | --- | --- |
| RGB camera | PointGrey BlackFly-S BFS-U3-51S5C (Kowa LM5JC10M lens) | Max 75 fps | 2448×2048 pixels, global shutter, 82.2° (H) × 66.5° (V) FoV |
| NIR camera | Intel RealSense D435i | Max 90 fps | 1280×720 pixels, global shutter, 69° (H) × 42° (V) FoV |
| Thermal camera | FLIR A65C | Max 30 fps | 640×512 pixels, 45° (H) × 37° (V) FoV, uncooled VOx microbolometer, 16-bit RAW data |
| LiDAR | Velodyne VLP-16 | Max 20 fps | Accuracy: ±3 cm, measurement range: 100 m, 360° (H), ±15° (V) FoV |
| GNSS/IMU | LORD Microstrain 3DM-GX5-45 | 10/100 Hz | Position, velocity, attitude, acceleration, etc. |

Accurate time synchronization is an important prerequisite for various geometric tasks with multiple sensors, such as depth estimation, odometry, 3D detection, and 3D reconstruction. Therefore, we synchronize the RGB and NIR stereo cameras via an external synchronizer. The thermal stereo cameras are synchronized with the sync signal of the left thermal camera. Also, a software trigger is used to synchronize the two systems at the start time of each data acquisition. Please refer to the supplementary material for more details on calibration and the sensor system configuration.

3.2. Data Collection

We collect multi-spectral stereo data (i.e., stereo RGB, NIR, thermal, and LiDAR data) along with GNSS/IMU data under various locations, lighting conditions, and weather conditions. Specifically, we obtain synchronized multi-spectral data from campus, city, residential, suburban, and multiple road environments. We also provide time diversity (e.g., morning, daytime, and nighttime) and weather diversity (e.g., clear sky, cloudy, and rainy) for each representative location (Fig. 2-(d) and (e)).
[Figure 2 panels: (a) frontal view of sensor system, (b) sensor system details, (c) coordinate system of our platform, (f) driving scenario - campus (RGB/NIR/THR), (g) driving scenario - road (RGB/NIR/THR)]

Figure 2. Overview of our proposed Multi-Spectral Stereo (MS2) outdoor driving dataset. We designed a data collection platform that consists of RGB, NIR, thermal, and LiDAR stereo systems along with a GNSS/IMU module ((a), (b), (c)). The collected dataset was taken in campus, city, residential, road, and suburban locations across various time slots (morning, day, and night) and weather conditions (clear sky, cloudy, and rainy) ((d) and (e)). Depending on the surrounding conditions, each spectrum sensor shows different aspects, advantages, and disadvantages induced by its sensor characteristics ((f) and (g)). Further examples and details are described in the supplementary material.
This aims to investigate and evaluate the generalization and domain-gap handling abilities of a deep neural network. It also targets exploring the possibility of multi-sensor complementation and the characteristics of each sensor under various conditions (Fig. 2-(f) and (g)). Compared to previous datasets [6, 8, 24, 25, 28, 53], the proposed dataset provides about 195K synchronized and rectified multi-spectral data pairs (i.e., RGB, NIR, thermal, LiDAR, and GNSS/IMU data) covering diverse locations, times, weather conditions, and sensors.

3.3. Multi-Spectral Stereo (MS2) Depth Dataset

Ground-Truth Generation Process. To create a dense Ground-Truth (GT) depth map, we accumulate 10 successive stereo LiDAR scans by utilizing interpolated odometry information from the GNSS/IMU sensor, in a similar way to the KITTI dataset [15]. Specifically, we calculate the pose at each sensor's time stamp by interpolating the GNSS/IMU sensor data. Afterward, we aggregate the 10 successive stereo LiDAR scans for each target thermal image via transformation matrices between consecutive frames and refine the aggregated point cloud via the Iterative Closest Point (ICP) algorithm [1]. Then, the refined and aggregated 3D point cloud is projected onto the thermal image plane to obtain the final semi-dense depth map (a schematic sketch of this projection step is given below).

Training Set Configuration. From the MS2 dataset, we periodically sample the thermal images and filter out static vehicle movement to create training, validation, and evaluation splits for learning monocular and stereo depth networks. We utilize 26K data pairs for training, 4K pairs for validation, and 5.8K, 6.8K, and 5.2K pairs for evaluation under daytime, nighttime, and rainy conditions, respectively. We make the training set splits have almost zero overlap in time, weather, and location. The split details can be found in the supplementary material.
Figure 3. Overall pipeline of our proposed depth estimation network. We design a single network that can estimate both monocular and stereo depth maps from a given single or stereo thermal image. We bridge monocular and stereo depth estimation by regarding the cost volume as additional information for the Neural Window Conditional Random Field (NeWCRF) block [56]. Initially, the network extracts multi-scale feature maps via a Swin Transformer backbone [31] and aggregates global contextual information via a Pyramid Pooling Module (PPM) head [58]. If the right thermal image is available, the network generates a single-channel cost volume at each scale (i.e., D_scale × H_scale × W_scale) based on the feature similarity of the left-right features. If only the left image is available, the network utilizes a zero-filled cost volume. The depth maps are estimated from the multi-scale concatenated features via NeWCRF blocks [56].
4. Depth Estimation from Thermal Image

4.1. Bridging Monocular and Stereo Depth Estimation

In this section, we connect the Monocular Depth Estimation (MDE) and Stereo Depth Estimation (SDE) tasks from the Conditional Random Field (CRF) perspective. An MDE network has the advantage of high-level universality: it needs no extra constraints such as pre-rectification, extrinsic matrix information, or additional images. However, MDE networks suffer from inherent scale ambiguity and generalization issues. On the other hand, SDE networks provide an accurate metric-scale depth map by finding horizontal correspondences between rectified left and right images. But an SDE network struggles to provide a reliable depth map in ill-posed regions such as occluded areas, repeated patterns, textureless regions, and reflective surfaces.

The two tasks can complement each other when bridged and, at the same time, the network can flexibly estimate depth maps from given monocular or stereo images, as shown in Fig. 3. To this end, we utilize the recently proposed MDE network, Neural Window FC-CRF (NeWCRF) [56], to connect the two tasks. Specifically, we regard the estimated cost volume as additional information for the NeWCRF blocks. Therefore, when the right image is available, we add the cost volume of the multi-scale left-and-right features to the left image feature F_L^{scale} at each scale. If only the left image is available, the network utilizes a zero-filled cost volume.

4.2. Feature Extraction and Aggregation

We adopt the Swin Transformer [31] as our backbone network. The backbone extracts features at four scale levels (i.e., 1/4, 1/8, 1/16, and 1/32) from the given images. After that, the Pyramid Pooling Module (PPM) [58] aggregates global context information with global average pooling over receptive fields of 1, 2, 3, and 6 from the last scale level. The features of the remaining scales are provided to each decoder level in a skip-connected manner.

4.3. Cost Volume Construction

Most state-of-the-art stereo matching networks [5, 18, 54] utilize a 4D cost volume with 3D convolution layers to achieve higher performance. However, 4D cost volume based methods require costly memory and computation. They also make it hard to associate monocular depth estimation with the network architecture, because they always enforce the utilization of both left and right feature maps.

Therefore, we utilize a correlation cost volume (i.e., a 3D cost volume) [30, 37, 52, 55] that has a single-channel correlation map for each disparity level. This method loses some correlation information between the left-right features, yet it can easily be associated with a monocular depth estimation network as additional information. The cost volume of each scale is estimated as follows:

  C^{scale}(d, x, y) = \frac{1}{N_c} \langle f_l^{scale}(x, y), f_r^{scale}(x - d, y) \rangle,   (1)

where \langle \cdot, \cdot \rangle is the inner product, N_c denotes the number of channels, and f_l^{scale} and f_r^{scale} are the left and right feature maps at each scale.
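As an illustration of Eq. (1), here is a minimal PyTorch sketch of the single-channel correlation volume. The function name and the explicit loop over disparity levels are simplifications of my own (real implementations typically vectorize the shift); the monocular branch would receive a zero-filled volume of the same shape.

```python
import torch

def correlation_cost_volume(feat_l, feat_r, max_disp):
    """Single-channel correlation volume, Eq. (1): C[d, y, x] = <f_l(x, y), f_r(x - d, y)> / N_c.

    feat_l, feat_r: (B, C, H, W) left and right feature maps at one scale.
    Returns: (B, max_disp, H, W) cost volume.
    """
    b, c, h, w = feat_l.shape
    volume = feat_l.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (feat_l * feat_r).mean(dim=1)
        else:
            # Right features shifted by d pixels; the left-most d columns stay zero.
            volume[:, d, :, d:] = (feat_l[:, :, :, d:] * feat_r[:, :, :, :-d]).mean(dim=1)
    return volume

# Monocular input: a zero-filled volume of the same shape keeps the decoder interface unchanged.
# volume = feat_l.new_zeros(b, max_disp, h, w)
```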
The cost volume of each scale is concatenated with the corresponding feature map of the left image f_l^{scale} to form the skip-connection input F for the NeWCRF blocks.

4.4. Neural Window FC-CRF

NeWCRF [56] implements the traditional CRF as a neural network in a computation-efficient way by utilizing the shifted-window multi-head attention module [31]. Given the previous prediction result X and the concatenated feature F, the NeWCRF block estimates the unary potential \psi_u and the pairwise potential \psi_p via the multi-head attention mechanism (i.e., the NeWCRF block of Fig. 3), as follows:

  \psi_u = \theta_u(X), \qquad \psi_p = \sum_i \mathrm{SoftMax}(Q \cdot K^T + P) \cdot X,   (2)

where \theta_u is the parameter of a unary network and Q, K, and P are the query, key, and position embedding matrices of the attention block. After that, the optimization network, which consists of two MLP layers, estimates the current-stage result X'. This X' is then regarded as X for the next NeWCRF block.
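For intuition, the following is a heavily simplified, single-window, single-head sketch of the potentials in Eq. (2). It is not the authors' implementation: the real NeWCRF block [56] uses shifted-window multi-head attention with additional normalization, and the module names and the exact inputs of the query/key projections below are placeholder assumptions.

```python
import torch
import torch.nn as nn

class SimplifiedNeWCRFBlock(nn.Module):
    """Toy, single-window version of the unary/pairwise computation in Eq. (2)."""

    def __init__(self, dim):
        super().__init__()
        self.unary = nn.Linear(dim, dim)       # theta_u(X)
        self.to_q = nn.Linear(dim, dim)        # query projection (assumed input: X)
        self.to_k = nn.Linear(dim, dim)        # key projection (assumed input: F)
        self.optimize = nn.Sequential(         # two-MLP "optimization" head producing X'
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x, f, pos_bias):
        # x: (B, N, C) previous prediction feature X, f: (B, N, C) concatenated feature F,
        # pos_bias: (N, N) relative position embedding P for one window of N tokens.
        psi_u = self.unary(x)
        q, k = self.to_q(x), self.to_k(f)
        attn = torch.softmax(q @ k.transpose(-2, -1) + pos_bias, dim=-1)  # SoftMax(QK^T + P)
        psi_p = attn @ x                       # pairwise potential uses X as the value
        return self.optimize(psi_u + psi_p)    # current-stage result X'
```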
4.5. Disparity and Inverse Depth Prediction

The proposed network estimates prediction results at four scales (i.e., 1/4, 1/8, 1/16, and 1/32) from the last four NeWCRF blocks. When a single image is fed to the network, we regard the prediction results as an inverse depth map. For a stereo image pair, we regard the prediction results as a common disparity map. For the prediction feature X of each scale, the network employs two convolution layers to obtain a single-channel (disparity/inverse depth) volume. After that, the volume is upsampled and converted into a probability volume by the softmax function along the disparity dimension. Finally, the predicted value is computed as follows:

  D_{pred} = \sum_{k=0}^{D_{max}-1} k \cdot p_k,   (3)

where k denotes the disparity level, p_k indicates the corresponding probability, and D_{max} is the maximum value of the disparity range.
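A minimal sketch of the prediction head described above, i.e., a softmax over disparity levels followed by the expectation in Eq. (3). The function name is a placeholder, and the two prediction convolutions and the upsampling step are omitted for brevity.

```python
import torch
import torch.nn.functional as F

def soft_argmax_disparity(volume, d_max=192):
    """volume: (B, D, H, W) per-disparity-level scores from the prediction layers (D == d_max).
    Returns: (B, H, W) expected disparity, Eq. (3): sum_k k * p_k."""
    prob = F.softmax(volume, dim=1)                       # probability volume p_k
    levels = torch.arange(d_max, device=volume.device, dtype=volume.dtype).view(1, d_max, 1, 1)
    return (prob * levels).sum(dim=1)                     # soft-argmax over the disparity dimension

# For a single thermal image, the same head output is read as inverse depth instead of disparity.
```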
4.6. Loss Function

We utilize a multi-scale smooth L1 loss, which is commonly adopted in the SDE task, to train our network:

  L_{sup} = \sum_{scale=0}^{3} \lambda_{scale} \cdot \big( \mathrm{SmoothL1}(D^{scale}_{pred,mono}, D_{GT}) + \mathrm{SmoothL1}(D^{scale}_{pred,stereo}, D_{GT}) \big),   (4)

where \lambda_{scale} indicates the coefficient for the prediction result of each scale, D_{GT} denotes the GT disparity map, and SmoothL1 is the smooth L1 loss.

5. Experimental Results

5.1. Implementation Details

MDE and SDE Networks. To validate various MDE and SDE networks designed for the visible spectrum band, we train and evaluate representative MDE and SDE networks on the proposed MS2 dataset. Specifically, we adopt regression [26], classification [13], classification-and-regression [2], and modern transformer [56] based MDE networks (i.e., BTS, DORN, AdaBins, and NeWCRF). We also employ 3D cost volume [55] and 4D cost volume [18, 54] based SDE networks (i.e., AANet, GwcNet, and ACVNet). We utilize their official source code to implement each network architecture. All networks are initialized with ImageNet-pretrained [11] or provided backbone models, following their original implementations [2, 13, 18, 26, 54–56]. We utilize the PyTorch library [40] to implement our proposed method and the other comparison methods.

Optimizer and Data Augmentation. All models are trained for 60 epochs on a single A6000 GPU with 48GB of memory. We utilize a batch size of 8 for all MDE model training and 4 for all SDE model training. For our method, we use a batch size of 6. We adopt the AdamW optimizer [34] with an initial learning rate of 1e-4 for all model training. Cosine Annealing Warm Restarts [33] is used as the learning rate scheduler. For data augmentation, we apply random center crop-and-resize, brightness jitter, and contrast jitter for all model training. A horizontal flip is additionally applied to the MDE networks. We set the coefficients of the multi-scale L1 loss, \lambda_{scale}, to 0.5, 0.5, 0.7, and 1.0. The maximum value of the disparity range, D_{max}, is set to 192. (A schematic sketch of the loss in Eq. (4) and these optimizer settings is given just before Tab. 3.)

5.2. Depth Estimation from Thermal Images

We provide a comprehensive comparison of representative MDE and SDE networks on our MS2 depth dataset, as shown in Tab. 3. The advantage of depth estimation from thermal images can also be observed in Fig. 4.

Monocular Depth Estimation. The performance tendency of MDE networks is generally preserved in the thermal spectrum domain, similar to the KITTI depth benchmark results [15]. MDE networks with regression heads for depth map prediction (i.e., BTS and NeWCRF) have clear advantages in the error metrics over methods with classification heads, by directly regressing precise depth values. On the other hand, the classification heads (i.e., DORN and Ours) achieve higher accuracy scores by explicitly binning the depth range. The proposed unified network (i.e., Ours (Mono)) generally shows results comparable to the state-of-the-art MDE method, with higher scores in the accuracy metrics yet lower scores in some error metrics. We think the performance gap comes from the depth prediction head and loss function. All MDE networks utilize GT depth maps
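Returning to the loss in Eq. (4) and the optimizer settings listed in Sec. 5.1, the following is a minimal sketch using standard PyTorch modules. The model placeholder, the valid-pixel masking for the semi-dense GT, and the scheduler restart period are assumptions for illustration, not the authors' released training code.

```python
import torch
import torch.nn.functional as F

def multi_scale_loss(preds_mono, preds_stereo, gt_disp, lambdas=(0.5, 0.5, 0.7, 1.0)):
    """Eq. (4): smooth L1 on the monocular and stereo predictions at each of the four scales.

    preds_mono, preds_stereo: lists of four (B, H, W) predictions, assumed already upsampled
    to the GT resolution; gt_disp: (B, H, W) ground-truth disparity (semi-dense, 0 = no GT).
    """
    valid = gt_disp > 0                                    # supervise only pixels with LiDAR GT
    loss = 0.0
    for lam, p_m, p_s in zip(lambdas, preds_mono, preds_stereo):
        loss = loss + lam * (F.smooth_l1_loss(p_m[valid], gt_disp[valid])
                             + F.smooth_l1_loss(p_s[valid], gt_disp[valid]))
    return loss

# Optimizer and scheduler as listed in Sec. 5.1 (the model and T_0 below are placeholders).
model = torch.nn.Conv2d(1, 1, 3, padding=1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)
```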
Table 3. Quantitative comparison of depth estimation results on the proposed dataset. We compare our network with state-of-the-art monocular and stereo depth estimation networks [2, 13, 18, 26, 54–56]. Ours shows comparable results in both monocular and stereo depth estimation. Differing from the other networks, Ours has high-level practicality and flexibility in that it can estimate a depth map regardless of whether a single or a stereo thermal image is given as input. Reg and Cls indicate regression and classification heads for the MDE task; 3D CV and 4D CV denote 3D and 4D cost volumes for the SDE task.
(a) Monocular depth estimation results on the evaluation set of our MS2 depth dataset.

| Methods | Type | TestSet | AbsRel ↓ | SqRel ↓ | RMSE ↓ | RMSElog ↓ | δ<1.25 ↑ | δ<1.25² ↑ | δ<1.25³ ↑ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DORN [13] | Cls | Day | 0.144 | 1.288 | 5.483 | 0.230 | 0.856 | 0.941 | 0.970 |
| | | Night | 0.136 | 1.136 | 5.290 | 0.212 | 0.863 | 0.950 | 0.976 |
| | | Rain | 0.180 | 1.934 | 6.735 | 0.276 | 0.781 | 0.910 | 0.955 |
| | | Avg | 0.151 | 1.419 | 5.776 | 0.237 | 0.837 | 0.935 | 0.968 |
| BTS [26] | Reg | Day | 0.122 | 0.905 | 4.923 | 0.198 | 0.857 | 0.951 | 0.980 |
| | | Night | 0.114 | 0.798 | 4.701 | 0.184 | 0.870 | 0.959 | 0.984 |
| | | Rain | 0.157 | 1.395 | 6.053 | 0.243 | 0.791 | 0.926 | 0.969 |
| | | Avg | 0.129 | 1.008 | 5.169 | 0.206 | 0.843 | 0.947 | 0.978 |
| AdaBins [2] | Reg+Cls | Day | 0.129 | 0.976 | 5.108 | 0.205 | 0.847 | 0.947 | 0.979 |
| | | Night | 0.119 | 0.822 | 4.749 | 0.187 | 0.864 | 0.958 | 0.984 |
| | | Rain | 0.168 | 1.545 | 6.336 | 0.254 | 0.771 | 0.918 | 0.965 |
| | | Avg | 0.137 | 1.084 | 5.330 | 0.212 | 0.831 | 0.943 | 0.977 |
| NeWCRF [56] | Reg | Day | 0.120 | 0.864 | 4.852 | 0.195 | 0.858 | 0.952 | 0.982 |
| | | Night | 0.112 | 0.755 | 4.594 | 0.179 | 0.875 | 0.961 | 0.985 |
| | | Rain | 0.155 | 1.352 | 5.956 | 0.240 | 0.795 | 0.929 | 0.970 |
| | | Avg | 0.127 | 0.965 | 5.077 | 0.202 | 0.846 | 0.949 | 0.980 |
| Ours (Mono) | Cls | Day | 0.115 | 0.983 | 4.895 | 0.201 | 0.882 | 0.952 | 0.977 |
| | | Night | 0.107 | 0.850 | 4.658 | 0.185 | 0.894 | 0.961 | 0.981 |
| | | Rain | 0.152 | 1.567 | 6.020 | 0.247 | 0.822 | 0.928 | 0.964 |
| | | Avg | 0.123 | 1.103 | 5.134 | 0.208 | 0.869 | 0.948 | 0.975 |
| Ours (Stereo) | Cls | Day | 0.113 | 0.948 | 4.852 | 0.200 | 0.884 | 0.953 | 0.977 |
| | | Night | 0.105 | 0.811 | 4.584 | 0.183 | 0.896 | 0.961 | 0.981 |
| | | Rain | 0.149 | 1.499 | 5.940 | 0.245 | 0.826 | 0.929 | 0.965 |
| | | Avg | 0.120 | 1.057 | 5.068 | 0.207 | 0.872 | 0.949 | 0.975 |
(b) Disparity estimation results on the evaluation set of our MS2 depth dataset (lower is better for all metrics).

| Methods | Cost Volume | TestSet | EPE-all (px) | D1-all (%) | >1px (%) | >2px (%) | >3px (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GwcNet [18] | 4D CV | Day | 0.905 | 5.5 | 19.2 | 8.4 | 5.5 |
| | | Night | 0.946 | 5.6 | 26.0 | 10.2 | 5.6 |
| | | Rain | 1.070 | 7.2 | 24.3 | 11.1 | 7.2 |
| | | Avg | 0.969 | 6.0 | 23.3 | 9.9 | 6.0 |
| AANet [55] | 3D CV | Day | 0.939 | 5.8 | 20.2 | 8.8 | 5.8 |
| | | Night | 0.995 | 6.1 | 27.9 | 11.1 | 6.1 |
| | | Rain | 1.091 | 7.5 | 25.3 | 11.6 | 7.5 |
| | | Avg | 1.005 | 6.4 | 24.7 | 10.5 | 6.4 |
| ACVNet [54] | 4D CV | Day | 0.898 | 5.5 | 18.9 | 8.3 | 5.5 |
| | | Night | 0.943 | 5.5 | 25.9 | 10.1 | 5.5 |
| | | Rain | 1.056 | 7.2 | 23.6 | 10.9 | 7.2 |
| | | Avg | 0.962 | 6.0 | 23.0 | 9.8 | 6.0 |
| Ours (Mono) | 3D CV | Day | 1.033 | 6.4 | 23.1 | 10.5 | 6.4 |
| | | Night | 0.946 | 5.6 | 29.6 | 9.8 | 5.6 |
| | | Rain | 1.261 | 8.7 | 24.4 | 14.6 | 8.7 |
| | | Avg | 1.066 | 6.8 | 24.4 | 11.4 | 6.8 |
| Ours (Stereo) | 3D CV | Day | 0.957 | 5.7 | 22.7 | 9.1 | 5.7 |
| | | Night | 0.853 | 4.8 | 21.3 | 8.2 | 4.8 |
| | | Rain | 1.159 | 7.7 | 29.1 | 12.4 | 7.7 |
| | | Avg | 0.976 | 5.9 | 24.0 | 9.7 | 5.9 |
[Figure 4 panels: (a) RGB (reference only), (b) NIR (reference only), (c) THR, (d) GT disparity, (e) Ours (stereo)]

Figure 4. Qualitative results of stereo disparity estimation on the MS2 depth dataset. The disparity maps predicted from stereo thermal images are highly robust regardless of lighting and weather conditions. However, inherent hardware noise and the absence of high-frequency information lead to blurry predictions in specific regions, such as areas with similar thermal radiation values (i.e., temperature) and noisy areas generated by the sensor itself. We think multi-spectral modality fusion can achieve both robustness and reliability. Further results and comparisons with other MDE and SDE networks can be found in the supplementary material.
References

[1] Paul J Besl and Neil D McKay. Method for registration of 3-D shapes. In Sensor Fusion IV: Control Paradigms and Data Structures, volume 1611, pages 586–606. SPIE, 1992.
[2] Shariq Farooq Bhat, Ibraheem Alhashim, and Peter Wonka. AdaBins: Depth estimation using adaptive bins. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4009–4018, 2021.
[3] Paulo Vinicius Koerich Borges and Stephen Vidas. Practical infrared visual odometry. IEEE Transactions on Intelligent Transportation Systems, 17(8):2205–2213, 2016.
[4] Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11621–11631, 2020.
[5] Jia-Ren Chang and Yong-Sheng Chen. Pyramid stereo matching network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5410–5418, 2018.
[6] Yukyung Choi, Namil Kim, Soonmin Hwang, Kibaek Park, Jae Shin Yoon, Kyounghwan An, and In So Kweon. KAIST multi-spectral day/night data set for autonomous and assisted driving. IEEE Transactions on Intelligent Transportation Systems, 19(3):934–948, 2018.
[7] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[8] Weichen Dai, Yu Zhang, Shenzhou Chen, Donglei Sun, and Da Kong. A multi-spectral dataset for evaluating motion estimation systems. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 5560–5566. IEEE, 2021.
[9] Kevser Irem Danaci and Erdem Akagunduz. A survey on infrared image and video sets. arXiv preprint arXiv:2203.08581, 2022.
[10] Jeff Delaune, Robert Hewitt, Laura Lytle, Cristina Sorice, Rohan Thakker, and Larry Matthies. Thermal-inertial odometry for autonomous flight throughout the night. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1122–1128. IEEE, 2019.
[11] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[12] Raul Diaz and Amit Marathe. Soft labels for ordinal regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4738–4747, 2019.
[13] Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Batmanghelich, and Dacheng Tao. Deep ordinal regression network for monocular depth estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2002–2011, 2018.
[14] Stefano Gasperini, Patrick Koch, Vinzenz Dallabetta, Nassir Navab, Benjamin Busam, and Federico Tombari. R4Dyn: Exploring radar for self-supervised monocular depth estimation of dynamic scenes. In 2021 International Conference on 3D Vision (3DV), pages 751–760. IEEE, 2021.
[15] Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research, 32(11):1231–1237, 2013.
[16] Vitor Guizilini, Rares Ambrus, Wolfram Burgard, and Adrien Gaidon. Sparse auxiliary networks for unified monocular depth prediction and completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11078–11088, 2021.
[17] Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos, and Adrien Gaidon. 3D packing for self-supervised monocular depth estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[18] Xiaoyang Guo, Kai Yang, Wukui Yang, Xiaogang Wang, and Hongsheng Li. Group-wise correlation stereo network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3273–3282, 2019.
[19] Keli Huang, Botian Shi, Xiang Li, Xin Li, Siyuan Huang, and Yikang Li. Multi-modal sensor fusion for auto driving perception: A survey. arXiv preprint arXiv:2202.02703, 2022.
[20] Alex Kendall, Hayk Martirosyan, Saumitro Dasgupta, Peter Henry, Ryan Kennedy, Abraham Bachrach, and Adam Bry. End-to-end learning of geometry and context for deep stereo regression. In Proceedings of the IEEE International Conference on Computer Vision, pages 66–75, 2017.
[21] Shehryar Khattak, Christos Papachristos, and Kostas Alexis. Keyframe-based thermal-inertial odometry. Journal of Field Robotics, 37(4):552–579, 2020.
[22] Namil Kim, Yukyung Choi, Soonmin Hwang, and In So Kweon. Multispectral transfer network: Unsupervised depth estimation for all-day vision. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[23] Yeong-Hyeon Kim, Ukcheol Shin, Jinsun Park, and In So Kweon. MS-UDA: Multi-spectral unsupervised domain adaptation for thermal image semantic segmentation. IEEE Robotics and Automation Letters, 6(4):6497–6504, 2021.
[24] Alex Junho Lee, Younggun Cho, Young-sik Shin, Ayoung Kim, and Hyun Myung. ViViD++: Vision for visibility dataset. IEEE Robotics and Automation Letters, 7(3):6282–6289, 2022.
[25] Alex Junho Lee, Younggun Cho, Sungho Yoon, Youngsik Shin, and Ayoung Kim. ViViD: Vision for Visibility Dataset. In ICRA Workshop on Dataset Generation and Benchmarking of SLAM Algorithms for Robotics and VR/AR, Montreal, May 2019.
[26] Jin Han Lee, Myung-Kyu Han, Dong Wook Ko, and Il Hong Suh. From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv preprint arXiv:1907.10326, 2019.
[27] Chenglong Li, Wei Xia, Yan Yan, Bin Luo, and Jin Tang. Segmenting objects in day and night: Edge-conditioned CNN for thermal image semantic segmentation. arXiv preprint arXiv:1907.10303, 2019.
[28] Peize Li, Kaiwen Cai, Muhamad Risqi U. Saputra, Zhuangzhuang Dai, and Chris Xiaoxuan Lu. OdomBeyondVision: An indoor multi-modal multi-platform odometry dataset beyond the visible spectrum. arXiv preprint arXiv:2206.01589, 2022.
[29] Zhenyu Li, Xuyang Wang, Xianming Liu, and Junjun Jiang. BinsFormer: Revisiting adaptive bins for monocular depth estimation. arXiv preprint arXiv:2204.00987, 2022.
[30] Zhengfa Liang, Yiliu Feng, Yulan Guo, Hengzhu Liu, Wei Chen, Linbo Qiao, Li Zhou, and Jianfeng Zhang. Learning for disparity estimation through feature constancy. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2811–2820, 2018.
[31] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.
[32] Yunfei Long, Daniel Morris, Xiaoming Liu, Marcos Castro, Punarjay Chakravarty, and Praveen Narayanan. Radar-camera pixel depth association for depth completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12507–12516, 2021.
[33] Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.
[34] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.
[35] Yawen Lu and Guoyu Lu. An alternative of LiDAR in nighttime: Unsupervised depth estimation based on single thermal image. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 3833–3843, 2021.
[36] Will Maddern, Geoffrey Pascoe, Chris Linegar, and Paul Newman. 1 year, 1000 km: The Oxford RobotCar dataset. The International Journal of Robotics Research, 36(1):3–15, 2017.
[37] Nikolaus Mayer, Eddy Ilg, Philip Hausser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, and Thomas Brox. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4040–4048, 2016.
[38] Yasuto Nagase, Takahiro Kushida, Kenichiro Tanaka, Takuya Funatomi, and Yasuhiro Mukaigawa. Shape from thermal radiation: Passive ranging using multi-spectral LWIR measurements. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12661–12671, 2022.
[39] Jinsun Park, Yongseop Jeong, Kyungdon Joo, Donghyeon Cho, and In So Kweon. Adaptive cost volume fusion network for multi-modal depth estimation in changing environments. IEEE Robotics and Automation Letters, 2022.
[40] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017.
[41] René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vision transformers for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12179–12188, 2021.
[42] René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
[43] Zhelun Shen, Yuchao Dai, and Zhibo Rao. CFNet: Cascade and fused cost volume for robust stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13906–13915, 2021.
[44] Ukcheol Shin, Kyunghyun Lee, Byeong-Uk Lee, and In So Kweon. Maximizing self-supervision from thermal image for effective self-supervised learning of depth and ego-motion. IEEE Robotics and Automation Letters, 7(3):7771–7778, 2022.
[45] Ukcheol Shin, Kyunghyun Lee, Seokju Lee, and In So Kweon. Self-supervised depth and ego-motion estimation for monocular thermal video using multi-spectral consistency loss. IEEE Robotics and Automation Letters, 2021.
[46] Ukcheol Shin, Kwanyong Park, Byeong-Uk Lee, Kyunghyun Lee, and In So Kweon. Self-supervised monocular depth estimation from thermal images via adversarial multi-spectral adaptation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 5798–5807, 2023.
[47] Young-Sik Shin and Ayoung Kim. Sparse depth enhanced direct thermal-infrared SLAM beyond the visible spectrum. IEEE Robotics and Automation Letters, 4(3):2918–2925, 2019.
[48] Shreyas S. Shivakumar, Neil Rodrigues, Alex Zhou, Ian D. Miller, Vijay Kumar, and Camillo J. Taylor. PST900: RGB-thermal calibration, dataset and segmentation network. arXiv preprint arXiv:1909.10980, 2019.
[49] Yuxiang Sun, Weixun Zuo, and Ming Liu. RTFNet: RGB-thermal fusion network for semantic segmentation of urban scenes. IEEE Robotics and Automation Letters, 4(3):2576–2583, 2019.
[50] Yuxiang Sun, Weixun Zuo, Peng Yun, Hengli Wang, and Ming Liu. FuseSeg: Semantic segmentation of urban scenes based on RGB and thermal data fusion. IEEE Transactions on Automation Science and Engineering (TASE), 2020.
[51] Jie Tang, Fei-Peng Tian, Wei Feng, Jian Li, and Ping Tan. Learning guided convolutional network for depth completion. IEEE Transactions on Image Processing, 30:1116–1129, 2020.
[52] Alessio Tonioni, Fabio Tosi, Matteo Poggi, Stefano Mattoccia, and Luigi Di Stefano. Real-time self-adaptive deep stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 195–204, 2019.
[53] Wayne Treible, Philip Saponaro, Scott Sorensen, Abhishek Kolagunda, Michael O'Neal, Brian Phelan, Kelly Sherbondy, and Chandra Kambhamettu. CATS: A color and thermal stereo benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2961–2969, 2017.
[54] Gangwei Xu, Junda Cheng, Peng Guo, and Xin Yang. Attention concatenation volume for accurate and efficient stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12981–12990, 2022.
[55] Haofei Xu and Juyong Zhang. AANet: Adaptive aggregation network for efficient stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1959–1968, 2020.
[56] Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, and Ping Tan. Neural window fully-connected CRFs for monocular depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3916–3925, 2022.
[57] Yuxiao Zhang, Alexander Carballo, Hanting Yang, and Kazuya Takeda. Autonomous driving in adverse weather conditions: A survey. arXiv preprint arXiv:2112.08936, 2021.
[58] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2881–2890, 2017.