Reliable Fusion of ToF and Stereo Depth Driven by Confidence Measures
1 Introduction
Depth estimation is a challenging computer vision problem for which many dif-
ferent solutions have been proposed. Among them, passive stereo vision systems
are widely used since they only require a pair of standard cameras and can pro-
vide a high resolution depth estimation in real-time. However, even if recent
research in this field has greatly improved the quality of the estimated geom-
etry [1], results are still not completely reliable and strongly depend on scene
characteristics. Active devices like ToF cameras and light-coded cameras (e.g., Microsoft Kinect) are able to robustly estimate the 3D geometry of a scene in real time, but they are limited by a low spatial resolution and a high level of noise in their measurements, especially for low-reflectivity surfaces. Since the
characteristics of ToF cameras and stereo data are complementary, the problem
of their fusion has attracted considerable interest in the last few years.
An effective fusion scheme requires two fundamental building blocks: the first
is an estimation of dense confidence measures for each device and the second is an
efficient fusion algorithm that estimates the depth values from the data of the two
sensors and their confidence values. In this paper we address these requirements
by introducing accurate models for the estimation of the confidence measures for
ToF and stereo data depending on the scene characteristics at each location, and
then extending the Local Consistency (LC) fusion framework of [2] to account
for the confidence measures associated with the acquired data. First, the depth
data acquired by the ToF camera are upsampled to the spatial resolution of
the stereo vision images by an efficient upsampling algorithm based on image
segmentation and bilateral filtering. A reliable confidence map for the ToF depth
data is computed according to different clues including the mixed pixel effect
caused by the finite size of ToF sensor pixels. Second, a dense disparity map is
obtained by a global (or semi-global) stereo vision algorithm, and the confidence
measure of the estimated depth data is computed considering both the raw block
matching cost and the globally optimized cost function. Finally, the upsampled
ToF depth data and the stereo vision disparity map are fused together. The
proposed fusion algorithm extends the LC method [3] by taking into account
the confidence measures of the data produced by the two devices and providing
a dense disparity map with subpixel precision. Both the confidence measures and
the subpixel disparity estimation represent novel contributions not present in the
previous versions of the LC framework [3,2], and to the best of our knowledge,
the combination of local and global cost functions is new and not used by any
other confidence measure proposed in the literature.
2 Related Work
Matricial ToF range cameras have been the subject of several recent studies, e.g.,
[4,5,6,7,8,9]. In particular, [8] focuses on the various error sources that influence
range measurements while [9] presents a qualitative analysis of the influence of
scene reflectance on the acquired data.
Stereo vision systems have also been the subject of a significant amount of
research, and a recent review on this topic can be found in [1]. The accuracy of
stereo vision depth estimation strongly depends on the framed scene’s character-
istics and the algorithm used to compute the depth map, and a critical issue is
the estimation of the confidence associated with the data. Various metrics have
been proposed for this task and a complete review can be found in [10].
These two subsystems have complementary characteristics, and the idea of
combining ToF sensors with standard cameras has been used in several recent
works. A complete survey of this field can be found in [11,6]. Some work focused
on the combination of a ToF camera with a single color camera [12,13,14,15,16,17].
An approach based on bilateral filtering is proposed in [13] and extended in [14].
The approach of [16] instead exploits an edge-preserving scheme to interpolate
the depth data produced by the ToF sensor. The recent approach of [15] also
accounts for the confidence measure of ToF data. The combination of a ToF
camera and a stereo camera is more interesting, because in this case both sub-
systems can produce depth data [18,9,19,20]. A method based on a probabilistic fusion model has been proposed in [21] and extended in [22].
3 Proposed Method
We consider an acquisition system made of a ToF camera and a stereo vision
system. The goal of the proposed method is to provide a dense confidence map
for each depth map computed by the two sensors, then use this information to
fuse the two depth maps into a more accurate description of the 3D scene. The
approach assumes that the two acquisition systems have been jointly calibrated,
e.g., using the approach of [21]. In this method, the stereo pair is rectified and
calibrated using a standard approach [28], then the intrinsic parameters of the
ToF sensor are estimated. Finally, the extrinsic calibration parameters between
the two systems are estimated with a closed-form technique. The proposed al-
gorithm is divided into three different steps:
1. The low resolution depth measurements of the ToF camera are reprojected
into the lattice associated with the left camera and a high resolution depth-
map is computed by interpolating the ToF data. The confidence map of ToF
depth data is estimated using the method described in Section 4.
2. A high resolution depth map is computed by applying a stereo vision algo-
rithm on the images acquired by the stereo pair. The confidence map for
stereo depth data is estimated as described in Section 5.
3. The depth measurements obtained by the upsampled ToF data and the
stereo vision algorithm are fused together by means of an extended version of
the LC technique [3] using the confidence measures from the previous steps.
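To make the data flow concrete, a minimal Python-style outline of these three steps is sketched below; every function name here (upsample_tof, tof_confidence, stereo_match, stereo_confidence, lc_fusion) is a hypothetical placeholder for the corresponding step, not the authors' implementation.

    def fuse_tof_stereo(tof_depth, left_img, right_img, calib):
        # Step 1: reproject and upsample the ToF depth to the left camera
        # lattice and estimate its confidence map (Section 4).
        tof_disp = upsample_tof(tof_depth, left_img, calib)    # placeholder
        p_tof = tof_confidence(tof_depth, tof_disp)            # P_T, combining P_AI and P_LV

        # Step 2: dense stereo disparity and its confidence (Section 5).
        stereo_disp, local_cost, global_cost = stereo_match(left_img, right_img)
        p_stereo = stereo_confidence(local_cost, global_cost)  # P_S, Eq. (7)

        # Step 3: confidence-driven fusion with the extended LC framework
        # (Eq. (10)), producing a subpixel disparity map.
        return lc_fusion(tof_disp, p_tof, stereo_disp, p_stereo)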
\[
2\sigma_d = |d_1 - d_2| = \frac{bf}{z - \sigma_z} - \frac{bf}{z + \sigma_z} = bf\,\frac{2\sigma_z}{z^2 - \sigma_z^2} \;\Rightarrow\; \sigma_d = \frac{bf\,\sigma_z}{z^2 - \sigma_z^2} \tag{3}
\]
where b is the baseline of the stereo system and f is the focal length of the
camera. Equation (3) provides the corresponding standard deviation of the noise
in the disparity space for a given depth value. The standard deviation of the
measurements in the disparity space is also affected by the mean value of the
measurement itself, unlike the standard deviation of the depth measurement.
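As a minimal numeric sketch of Eq. (3) (our illustration, assuming metric depths and a focal length expressed in pixels):

    def disparity_noise_std(z, sigma_z, b, f):
        # sigma_d of Eq. (3): depth z and depth noise sigma_z in meters,
        # baseline b in meters, focal length f in pixels.
        return b * f * sigma_z / (z ** 2 - sigma_z ** 2)

    # Example: b = 0.1 m, f = 800 px, z = 2 m, sigma_z = 0.01 m
    # -> sigma_d = 0.8 * 0.01 / (4 - 0.0001), about 0.2 px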
In order to map the standard deviation of the disparity measurements to
the confidence values, we define two thresholds computed experimentally over
multiple measurements. The first is σmin = 0.5, corresponding to the standard
deviation of a bright object at the minimum measurable distance of 0.5 m,
while the second is σmax = 3, corresponding to the case of a dark object at
the maximum measurable distance of 5 m with the SR4000 sensor used in the
experimental results dataset. If a different sensor is employed, the two thresholds
can be updated by considering these two boundary conditions. Then, we assume
that values smaller than σmin correspond to the maximum confidence value,
i.e., PAI = 1, values bigger than σmax have PAI = 0 while values in the interval
[σmin , σmax ] are linearly mapped to the confidence range [0, 1], i.e.:
\[
P_{AI} =
\begin{cases}
1 & \text{if } \sigma_d \le \sigma_{min} \\[2pt]
\dfrac{\sigma_{max} - \sigma_d}{\sigma_{max} - \sigma_{min}} & \text{if } \sigma_{min} < \sigma_d < \sigma_{max} \\[4pt]
0 & \text{if } \sigma_d \ge \sigma_{max}
\end{cases}
\tag{4}
\]
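The mapping of Eq. (4) can be sketched as follows (our illustration; the clipping reproduces the two constant branches, and the thresholds are the SR4000 values given above):

    import numpy as np

    SIGMA_MIN = 0.5   # bright object at the minimum distance of 0.5 m
    SIGMA_MAX = 3.0   # dark object at the maximum distance of 5 m

    def p_ai(sigma_d):
        # Linear mapping of Eq. (4), clipped to [0, 1] so that values
        # below SIGMA_MIN map to 1 and values above SIGMA_MAX map to 0.
        return np.clip((SIGMA_MAX - sigma_d) / (SIGMA_MAX - SIGMA_MIN), 0.0, 1.0)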
Confidence from local variance One of the main limitations of (2) is that
it does not take into account the effect of the finite size of ToF sensor pixels,
i.e., the mixed pixel effect [22]. In order to account for this issue we introduce
another term in the proposed confidence model. When the scene area associated
with a pixel includes two regions at different depths, e.g. close to discontinuities,
the resulting estimated depth measure is a convex combination of the two depth
values. For this reason, it is reasonable to associate a low confidence to these
regions. The mixed pixel effect leads to convex combinations of depth values but
this is not true for the multipath effect. These considerations do not affect the
design of the ToF confidence since the LV metric just assumes that pixels in
depth discontinuities are less reliable. If pixel p_i^ToF in the low resolution lattice of the ToF camera is associated with a scene area crossed by a discontinuity, some of the pixels p_j^ToF in the 8-neighborhood N(p_i^ToF) of p_i^ToF belong to points at a closer distance, and some others to points at a farther distance. Following this intuition, the mean absolute difference of the points in N(p_i^ToF) has been used to compute the second confidence term, i.e.:
\[
D_l^{ToF} = \frac{1}{|N(p_i^{ToF})|} \sum_{j \in N(p_i^{ToF})} |z_i - z_j| \tag{5}
\]
where |N(p_i^ToF)| is the cardinality of the considered neighborhood, in this case equal to 8, and z_i and z_j are the depth values associated with pixels p_i^ToF and p_j^ToF, respectively. We use the mean absolute difference instead of the variance to avoid assigning very high values to edge regions due to the quadratic dependence of the variance on the local differences. For this term we used the depth values and not the disparity ones, because the same depth difference leads to different effects on the confidence depending on whether close or far points are considered. This computation is performed for every pixel with a valid depth value. Notice that some p_j^ToF considered in an 8-connected patch may not have a valid value. In order to obtain a reliable map, a constant value K_d = T_h is used in the summation of (5) in place of |z_i − z_j| for the pixels p_j^ToF without a valid depth value. To obtain the confidence information D_l on the left camera lattice, the samples p_i of this lattice are projected onto the ToF camera lattice and the corresponding confidence value is obtained by bilinear interpolation.
Points with high local variance are associated with discontinuities; therefore, low confidence should be assigned to them. Where the local variance is close to zero, the confidence should be higher. In order to compute the confidence term, we normalize D_l to the [0, 1] interval by defining a maximum valid absolute difference T_h, i.e.:

\[
P_{LV} =
\begin{cases}
1 - \dfrac{D_l}{T_h} & \text{if } D_l < T_h \\[4pt]
0 & \text{if } D_l \ge T_h
\end{cases}
\tag{6}
\]
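A direct, unoptimized sketch of Eqs. (5)-(6) on the ToF lattice follows (our illustration, assuming NumPy arrays; the projection of D_l onto the left lattice is omitted):

    import numpy as np

    def p_lv(z, valid, t_h):
        # Local variance confidence, Eqs. (5)-(6): z is the ToF depth map,
        # valid marks pixels with a valid depth, t_h is the maximum valid
        # absolute difference T_h. Invalid neighbors contribute K_d = T_h.
        h, w = z.shape
        d_l = np.full((h, w), t_h)          # invalid/border pixels -> P_LV = 0
        for i in range(1, h - 1):
            for j in range(1, w - 1):
                if not valid[i, j]:
                    continue
                diffs = []
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        if di == 0 and dj == 0:
                            continue
                        if valid[i + di, j + dj]:
                            diffs.append(abs(z[i, j] - z[i + di, j + dj]))
                        else:
                            diffs.append(t_h)   # K_d = T_h substitution
                d_l[i, j] = np.mean(diffs)      # Eq. (5)
        return np.where(d_l < t_h, 1.0 - d_l / t_h, 0.0)   # Eq. (6)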
Fig. 1. Comparison of local (blue) and global (red) costs: a) cost functions for a repetitive pattern; b) cost functions for a uniform region. The green line represents the ground truth disparity value.
In Fig. 1a the region surrounding the selected point has a periodic pattern, and in Fig. 1b the region
surrounding the selected point has a uniform color. However, the global cost
function has a sharp peak and conventional confidence measures based only on
global cost analysis would assign a high confidence to these pixels.
The terminology used to denote the points of interest on the cost functions is the following: the minimum cost for a pixel is denoted by C_1 and the corresponding disparity value by d_1, i.e., C_1 = C(d_1) = min_d C(d), where the disparity d has subpixel resolution. The second smallest cost value, which occurs at disparity d_2, is denoted by C_2. For the selection of C_2, disparity values too close to d_1 (i.e., |d_2 − d_1| ≤ 1) are excluded in order to avoid suboptimal local minima adjacent to d_1.
The proposed stereo confidence metric PS is the combination of multiple
clues, depending both on the properties of the local cost function and on the
relationship between local and global costs. In particular it is defined as the
product of three factors:
\[
P_S = \frac{\Delta C^l}{C_1^l} \left( 1 - \frac{\min\{\Delta d^l, \gamma\}}{\gamma} \right) \left( 1 - \frac{\min\{\Delta d^{lg}, \gamma\}}{\gamma} \right) \tag{7}
\]
where ΔC^l = C_2^l − C_1^l is the difference between the second and first minimum local costs, Δd^l = |d_2^l − d_1^l| is the corresponding absolute difference between the second and first minimum local cost locations, Δd^{lg} = |d_1^l − d_1^g| is the absolute difference between the local and global minimum cost locations, and γ is a normalization factor. The first term accounts for the robustness of the match: both the cost difference and the value of the minimum cost are important, as the presence of a single strong minimum with an associated small cost is usually a sufficient condition for a good match. However, in the case of multiple strong matches, the first term still provides a high score, e.g., in regions of the scene with a periodic pattern (Fig. 1a). The second term is a truncated measure of
the distance between the first two cost peaks. It discriminates potentially bad
matches due to the presence of multiple local minima. If the two minimum values are close enough, the associated confidence measure should provide a high
value since the global optimization is likely to propagate the correct value and to
provide a good disparity estimation. So far only the local cost has been consid-
ered so the last term accounts for the relationship between the local and global
cost functions, scaling the overall confidence measure depending on the level of
agreement between the local and global minimum locations. If the two minimum
locations coincide, there is a very high likelihood that the estimated disparity
value is correct, while on the other hand, if they are too far apart the global
optimization may have produced incorrect disparity estimations, e.g. due to the
propagation of disparity values in textureless regions. The constant γ controls
the weight of the two terms and sets the maximum distance of the two mini-
mum locations, after which the estimated value is considered unreliable. In our
experiments we set γ = 10. Finally, if a local algorithm is used to estimate the
disparity map, the same confidence measure can be used by considering only the
first two terms.
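For illustration, Eq. (7) can be evaluated per pixel from the sampled local and global cost curves as in the sketch below (our assumptions: the costs are sampled on a common disparity axis and C_1^l > 0):

    import numpy as np

    GAMMA = 10.0    # maximum distance between minima, as set in the paper

    def p_s(local_cost, global_cost, disparities):
        # Stereo confidence of Eq. (7) for one pixel; the three arrays
        # sample the local cost, the global cost and the disparity axis.
        i1 = np.argmin(local_cost)
        c1, d1 = local_cost[i1], disparities[i1]

        # Second minimum C2: exclude disparities with |d - d1| <= 1.
        mask = np.abs(disparities - d1) > 1
        i2 = np.argmin(np.where(mask, local_cost, np.inf))
        c2, d2 = local_cost[i2], disparities[i2]

        dg1 = disparities[np.argmin(global_cost)]    # global minimum location

        term1 = (c2 - c1) / c1                            # match robustness
        term2 = 1.0 - min(abs(d2 - d1), GAMMA) / GAMMA    # peak separation
        term3 = 1.0 - min(abs(d1 - dg1), GAMMA) / GAMMA   # local/global agreement
        return term1 * term2 * term3

With a local stereo algorithm only the first two terms would be kept, as stated above.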
Although the proposed metric is not as good as top performing stereo metrics
evaluated in [10] in terms of AUC (e.g., PKRN), it performs better when used
in our fusion framework. Indeed our goal is to propose a good confidence metric
for the stereo system in the context of data fusion, where low confidence should
be assigned to pixels belonging to textureless surfaces propagated by the global
optimization, since ToF data are more reliable there. This feature is well captured
by the proposed metric, but not by conventional stereo confidence metrics.
where f, g and f 0 , g 0 refer to points in the left and right image respectively, ∆
accounts for spatial proximity, ∆ψ and ∆ω encode color similarity, and γs , γc
and γt control the behavior of the distribution (see [3] for a detailed description).
For each point the plausibility originated by each valid depth measure is
computed and these multiple plausibilities are propagated to neighboring points
that fall within the active support. Finally, the overall plausibility accumulated
for each point is cross-checked by comparing the plausibility stored in the left
and right views and the output depth value for each point is selected by means of
a winner-takes-all strategy. The LC approach has been extended in [2] to allow
the fusion of two different disparity maps. In this case, for each point of the
input image there can be 0, 1 or 2 disparity hypotheses, depending on which
sensor provides a valid measurement. Although [2] produces reasonable results, it has the fundamental limitation that it gives exactly the same relevance to the information from the two sources without taking into account their reliability.
In this paper we propose an extension of this approach that accounts for the reliability of the ToF and stereo measurements described in Sections 4.2 and 5.2. In order to exploit these additional clues, we extend the model of [2] by multiplying the plausibility by an additional factor that depends on the reliability of the considered depth acquisition system, computed for each sensor at the considered point, as follows:
\[
\Omega'_f(d) = \sum_{g \in A} \Big[ P_T(g)\, P_{f,g,T}(d) + P_S(g)\, P_{f,g,S}(d) \Big] \tag{10}
\]
where P_T(g) and P_S(g) are the confidence maps for ToF and stereo data respectively, P_{f,g,T}(d) is the plausibility for ToF data and P_{f,g,S}(d) for stereo data.
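A sketch of the accumulation of Eq. (10) is given below (our illustration; plausibility_tof and plausibility_stereo stand for the P_{f,g,T} and P_{f,g,S} terms of the LC framework [3] and are placeholders passed in as functions):

    def omega(f, d, support, p_t, p_s, plausibility_tof, plausibility_stereo):
        # Accumulate the plausibility of disparity d at pixel f over the
        # active support A, weighting each clue by its sensor confidence.
        total = 0.0
        for g in support:            # g ranges over the active support A
            total += p_t[g] * plausibility_tof(f, g, d)
            total += p_s[g] * plausibility_stereo(f, g, d)
        return total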
The proposed fusion approach implicitly addresses the complementary na-
ture of the two sensors. In fact, in uniformly textured regions, where the stereo
range sensing is quite inaccurate, the algorithm should propagate mostly the
plausibility originated by the ToF camera. Conversely, in regions where the ToF
camera is less reliable (e.g. dark objects), the propagation of plausibility con-
cerned with the stereo disparity hypothesis should be more influential. Without
the two confidence terms of (10), all the clues are propagated with the same
weight, as in [2]. In this case an erroneous disparity hypothesis from a sensor
could negatively impact the overall result. Therefore, the introduction of reliabil-
ity measures allows us to automatically discriminate between the two disparity
hypotheses provided by the two sensors and thus improve the fusion results.
The adoption of the proposed model for the new plausibility is also supported by the nature of the confidence maps, which can be interpreted as the probability that the corresponding disparity measure is correct. A confidence of 0 means that the disparity value is not reliable, in which case the hypothesis should not be propagated. The opposite case is when the confidence is 1, meaning a high
likelihood that the associated disparity is correct. All the intermediate values will
contribute as weighting factors. This definition is also coherent when a disparity
value is not available, for example due to occlusions: the associated confidence
is 0 and propagation does not occur at all. An interesting observation on the effectiveness of this framework is that Eq. (10) can be extended to deal with more than two input disparity maps by simply adding further plausibility terms for the new disparity clues and their associated confidence measures. Other families of sensors can be included as well, by simply devising proper confidence measures.
Both ToF and stereo disparity maps are computed at subpixel resolution,
but the original LC algorithm [3] only produces integer disparities, therefore
we propose an additional extension in order to handle subpixel precision. We
consider a number of disparity bins equal to the number of disparities to be evaluated multiplied by the inverse of the desired subpixel resolution (i.e., we multiply by 2 if the resolution is 0.5). Then, at every step the algorithm
propagates the plausibility of a certain disparity by contributing to the closest
bin. With this strategy, the computation time remains the same as in the original
approach [3,31] and only the final winner-takes-all step is slightly affected.
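The subpixel binning can be sketched as follows (our illustration; names are hypothetical):

    import numpy as np

    def make_bins(d_min, d_max, step=0.5):
        # One bin per disparity step: the number of bins equals the number
        # of integer disparities times the inverse of the subpixel
        # resolution (e.g., x2 for a 0.5-pixel resolution).
        return np.arange(d_min, d_max + step, step)

    def accumulate(bins, hist, d, plausibility):
        # Propagate a plausibility value to the bin closest to disparity d.
        idx = int(round((d - bins[0]) / (bins[1] - bins[0])))
        hist[min(max(idx, 0), len(bins) - 1)] += plausibility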
7 Experimental Results
Fig. 2. Results of the proposed fusion framework. Each row corresponds to one of the 5
different scenes. Dark blue pixels correspond to points that have been ignored because
of occlusions or because a ground truth disparity value is not available. The intensity
of red pixels is proportional to the MSE. (Best viewed in color.)
The disparity map produced by the proposed fusion framework in all the scenes has higher accuracy than each of the two systems considered independently.
Fig. 3 shows the confidence maps that are used in the fusion process: the
first row shows the left color camera, the second row shows the ToF confidence
map, and the third row shows the stereo one. Starting from the ToF confidence, the amplitude- and intensity-related term tends to assign lower confidence to the upper part of the table, which is almost parallel to the emitted rays: the amplitude of the received signal is therefore low, reducing the precision. This
term also assigns a smaller confidence to farther regions, reflecting another well
known issue of ToF data. ToF confidence is low for dark objects, but measurement accuracy depends on the reflectivity of the surface at the ToF IR wavelength, which can differ between objects that look similar to the human eye (e.g., the black plastic finger in scene 5 reflects more IR light than the bear's feet). In
addition, the four corners of the image also have lower confidence, in agreement
with the lower quality of the signal in those regions, affected by higher distortion
and attenuation. Local variance instead, as expected, contributes by assigning a
lower confidence value to points near depth discontinuities.
The stereo confidence has on average a lower value, consistent with the fact that stereo data are less accurate (see Table 1), but it locally reflects the texture of the scene, providing high values in correspondence with high-frequency content and low values in regions with uniform texture (the blue table) or a periodic pattern (e.g., the green book). Scene 2 compared to scene 1 clearly shows the effect that textured and untextured regions have on the confidence map. The map in the first scene provides enough texture to consider the depth measurements in that region reliable. On the orange book on the left side, the stereo confidence assigns high values only to the edges and to the logo on the cover, correctly
Fig. 3. Confidence maps for ToF and stereo disparity. Brighter areas correspond to
higher confidence values, while darker pixels are less confident.
penalizing regions with uniform texture. The teddy bear in scenes 3, 4 and 5 has more texture than the table or the books, and the corresponding confidence values are overall higher. The proposed stereo metric has been developed targeting the fusion of data from the two sensors, and low confidence is deliberately associated with textureless regions, even if the estimated depth is correct.
Table 1 compares the proposed approach with other state-of-the-art methods for which we obtained an implementation from the authors or which we were able to re-implement. Since the output of the fusion process is a disparity map, we com-
puted the error in the disparity space and considered the mean squared error
(MSE) as the metric. For a fair comparison, we computed the error on the same
set of valid pixels for all the methods, where a pixel is considered valid if it has
a valid disparity value in all the compared maps and in the ground truth data.
We also consider the ideal case obtained by selecting for each pixel the ToF or
stereo disparity closer to the ground truth.
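For reference, the evaluation protocol can be sketched as follows (our illustration):

    import numpy as np

    def masked_mse(disp, gt, valid_masks):
        # MSE in disparity units over the pixels that are valid in every
        # compared map and in the ground truth, as used for Table 1.
        common = np.logical_and.reduce(valid_masks)
        return float(np.mean((disp[common] - gt[common]) ** 2))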
Scene       1      2      3      4      5    Avg.
ToF Int.   9.83  10.33  14.43   8.68  15.12  11.67
Stereo    19.17  27.83  18.06  25.52  11.49  20.42
Fusion     7.40   9.33   6.92   6.30   8.39   7.67
[2]        7.43   9.27  12.60   7.99  13.01  10.06
[13]       8.49   9.92  11.44   9.88  15.19  10.98
[23]       9.04  10.04  13.04   9.52  14.03  11.13
[22]      10.98  13.19   9.83  13.93  13.10  12.21
Ideal      2.50   2.60   3.22   2.42   3.16   2.78
Table 1. MSE in disparity units with respect to the ground truth, computed only on
non-occluded pixels for which a disparity value is available in all the methods.
The average MSE has been calculated over all five scenes, and the results are reported in Table 1. The disparity map of the proposed framework is compared with the estimates of the ToF and stereo systems alone and with the
state-of-the-art methods of [2], [13], [23] and [22]. For the methods of [2] and
[22] we obtained the results from the authors. The method of [22] has been com-
puted from the ToF viewpoint at a different resolution, therefore we reprojected
the data on the left camera viewpoint to compare it with other methods. We
re-implemented the methods of [13] and [23] following the description in the pa-
pers. From the MSE values on the five different scenes, it is noticeable how the
proposed framework provides more accurate results than the interpolated ToF
data and the stereo measurements alone. Even if stereo data typically have lower accuracy, the proposed method is still able to improve the results of the ToF interpolation, especially by leveraging the more accurate edge localization of stereo data. The proposed approach also obtains a lower average MSE than all
the compared methods. The average error is about 24% lower than [2], which is
the best among the compared schemes. Conventional stereo confidence metrics of [10] produce a higher MSE than our stereo metric, e.g., using PKRN as the confidence in the fusion framework yields an average MSE of 7.9. Our method performs better than the compared schemes on all scenes except the very simple scene 2; in particular, notice how it has a larger margin on the most complex scenes. This implies that our approach captures small details
and complex structures while many of the compared approaches rely on low pass
filtering and smoothing techniques which work well on simple planar surfaces but
cannot handle more complex situations. Enlarged figures and a more detailed
analysis are available at http://lttm.dei.unipd.it/paper_data/eccv16.
References
1. Tippetts, B., Lee, D., Lillywhite, K., Archibald, J.: Review of stereo vision al-
gorithms and their suitability for resource-limited systems. Journal of Real-Time
Image Processing (2013) 1–21
2. Dal Mutto, C., Zanuttigh, P., Mattoccia, S., Cortelazzo, G.: Locally consistent tof
and stereo data fusion. In: Workshop on Consumer Depth Cameras for Computer
Vision (ECCV Workshop). Springer (2012) 598–607
3. Mattoccia, S.: A locally global approach to stereo correspondence. In: Proc. of 3D
Digital Imaging and Modeling (3DIM). (October 2009)
4. Hansard, M., Lee, S., Choi, O., Horaud, R.: Time-of-Flight Cameras: Principles,
Methods and Applications. SpringerBriefs in Computer Science. Springer (2013)
5. Remondino, F., Stoppa, D., eds.: TOF Range-Imaging Cameras. Springer (2013)
6. Zanuttigh, P., Marin, G., Dal Mutto, C., Dominio, F., Minto, L., Cortelazzo, G.M.:
Time-of-Flight and Structured Light Depth Cameras: Technology and Applica-
tions. 1 edn. Springer International Publishing (2016)
7. Piatti, D., Rinaudo, F.: Sr-4000 and camcube3.0 time of flight (tof) cameras: Tests
and comparison. Remote Sensing 4(4) (2012) 1069–1089
8. Kahlmann, T., Ingensand, H.: Calibration and development for increased accuracy
of 3d range imaging cameras. Journal of Applied Geodesy 2 (2008) 1–11
9. Gudmundsson, S.A., Aanaes, H., Larsen, R.: Fusion of stereo vision and time of
flight imaging for improved 3d estimation. Int. J. Intell. Syst. Technol. Appl. 5
(2008) 425–433
10. Hu, X., Mordohai, P.: A quantitative evaluation of confidence measures for stereo
vision. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(11)
(2012) 2121–2133
11. Nair, R., Ruhl, K., Lenzen, F., Meister, S., Schäfer, H., Garbe, C., Eisemann,
M., Magnor, M., Kondermann, D.: A survey on time-of-flight stereo fusion. In
Grzegorzek, M., Theobalt, C., Koch, R., Kolb, A., eds.: Time-of-Flight and Depth
Imaging. Sensors, Algorithms, and Applications. Volume 8200 of Lecture Notes in
Computer Science. Springer Berlin Heidelberg (2013) 105–127
12. Diebel, J., Thrun, S.: An application of markov random fields to range sensing.
In: Proc. of NIPS, MIT Press (2005) 291–298
13. Yang, Q., Yang, R., Davis, J., Nister, D.: Spatial-depth super resolution for range
images. In: Proceedings of IEEE Conference on Computer Vision and Pattern
Recognition (CVPR). (2007) 1–8
14. Yang, Q., Ahuja, N., Yang, R., Tan, K., Davis, J., Culbertson, B., Apostolopoulos,
J., Wang, G.: Fusion of median and bilateral filtering for range image upsampling.
Image Processing, IEEE Transactions on (2013)
15. Schwarz, S., Sjostrom, M., Olsson, R.: Time-of-flight sensor fusion with depth mea-
surement reliability weighting. In: 3DTV-Conference: The True Vision - Capture,
Transmission and Display of 3D Video (3DTV-CON), 2014. (2014) 1–4
16. Garro, V., Dal Mutto, C., Zanuttigh, P., Cortelazzo, G.M.: A novel interpolation
scheme for range data with side information. In: Proc. of CVMP. (2009)
17. Dolson, J., Baek, J., Plagemann, C., Thrun, S.: Upsampling range data in dy-
namic environments. In: Proceedings of IEEE Conference on Computer Vision
and Pattern Recognition (CVPR). (2010) 1141–1148
18. Kuhnert, K.D., Stommel, M.: Fusion of stereo-camera and pmd-camera data for
real-time suited precise 3d environment reconstruction. In: Proc. of Int. Conf. on
Intelligent Robots and Systems. (2006) 4780 – 4785
19. Frick, A., Kellner, F., Bartczak, B., Koch, R.: Generation of 3d-tv ldv-content
with time-of-flight camera. In: Proc. of 3DTV Conf. (2009)
20. Kim, Y.M., Theobalt, C., Diebel, J., Kosecka, J., Micusik, B., Thrun, S.: Multi-
view image and tof sensor fusion for dense 3d reconstruction. In: Proc. of 3D
Digital Imaging and Modeling (3DIM). (October 2009)
21. Dal Mutto, C., Zanuttigh, P., Cortelazzo, G.: A probabilistic approach to ToF and
stereo data fusion. In: Proc. of 3DPVT, Paris, France (2010)
22. Dal Mutto, C., Zanuttigh, P., Cortelazzo, G.: Probabilistic tof and stereo data
fusion based on mixed pixels measurement models. IEEE Transactions on Pattern
Analysis and Machine Intelligence 37(11) (2015) 2260–2272
23. Zhu, J., Wang, L., Yang, R., Davis, J.: Fusion of time-of-flight depth and stereo
for high accuracy depth maps. In: Proceedings of IEEE Conference on Computer
Vision and Pattern Recognition (CVPR). (2008)
24. Zhu, J., Wang, L., Gao, J., Yang, R.: Spatial-temporal fusion for high accuracy
depth maps using dynamic mrfs. IEEE Transactions on Pattern Analysis and
Machine Intelligence 32 (2010) 899–909
25. Zhu, J., Wang, L., Yang, R., Davis, J.E., Pan, Z.: Reliability fusion of time-of-flight
depth and stereo geometry for high quality depth maps. IEEE Transactions on
Pattern Analysis and Machine Intelligence 33(7) (2011) 1400–1414
26. Nair, R., Lenzen, F., Meister, S., Schaefer, H., Garbe, C., Kondermann, D.: High
accuracy tof and stereo sensor fusion at interactive rates. In: Proceedings of Eu-
ropean Conference on Computer Vision Workshops (ECCVW). (2012)
27. Evangelidis, G., Hansard, M., Horaud, R.: Fusion of Range and Stereo Data for
High-Resolution Scene-Modeling. IEEE Transactions on Pattern Analysis and
Machine Intelligence 37(11) (2015) 2178 – 2192
28. Zhang, Z.: A flexible new technique for camera calibration. IEEE Transactions on
Pattern Analysis and Machine Intelligence 22 (1998) 1330–1334
29. Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space
analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(5)
(2002) 603 –619
30. Hirschmuller, H.: Stereo processing by semiglobal matching and mutual informa-
tion. IEEE Transactions on Pattern Analysis and Machine Intelligence (2008)
31. Mattoccia, S.: Fast locally consistent dense stereo on multicore. In: 6th IEEE
Embedded Computer Vision Workshop (CVPR Workshop). (June 2010)