Geo-Consistency For Multi-Camera Stereo
Marc-Antoine Drouin (National Research Council Canada) et al.
July 2005
Abstract
This paper presents a new model to overcome the occlusion problems coming from wide baseline multiple camera stereo. Rather than explicitly modeling occlusions in the matching cost function, it detects occlusions in the depth map obtained from regular efficient stereo matching algorithms. Occlusions are detected as inconsistencies of the depth map by computing the visibility of the map as it is reprojected into each camera. Our approach has the particularity of not discriminating between occluders and occludees. The matching cost function is modified according to the detected occlusions by removing the offending cameras from the computation of the matching cost. The algorithm gradually modifies the matching cost function according to the history of inconsistencies in the depth map, until convergence. While two graph-theoretic stereo algorithms are used in our experiments, our framework is general enough to be applied to many others. The validity of our framework is demonstrated using real imagery with different baselines.

Figure 1. Example of occlusion. Occluded pixels appear in white, occluders in black.
1. Introduction
The goal of binocular stereo is to reconstruct the 3D structure of a scene from two views. As the baseline gets wider, the problem of occlusion, which is often considered negligible with small baseline configurations, can become severe and limit the quality of the obtained depth map. Occlusion occurs when part of a scene is visible in one camera image but not in the other (see figure 1). The difficulty of detecting occlusion comes from the fact that it is induced by the 3D structure of the scene, which is unknown until the correspondence is established, as that is the final goal of the algorithm. We propose a novel multiple camera stereo algorithm that relies on photometric and geometric inconsistencies in the depth map to detect occlusions. As this algorithm is iterative, it does not explicitly model an occlusion state or add extra constraints to the cost function. This makes it possible to use a standard efficient algorithm during each iteration, instead of tackling a very difficult optimization problem. Furthermore, our approach guarantees to preserve the consistency between the recovered visibility and geometry, a property we call geo-consistency. In this paper, the maximum flow [19] and graph cut [2] formulations are used to solve each iteration. Our framework is general enough to be used with many other stereo algorithms. A survey paper by Scharstein and Szeliski [21] compares various standard algorithms.

The rest of this paper is divided as follows: in Section 2, previous work is presented. Section 3 describes occlusion modeling and geometric inconsistency. Our proposed algorithm is described in Section 4. Experimental results are presented in Section 5.
2. Previous work

In a recent empirical comparison of strategies to overcome occlusion for 2 cameras, Egnal [4] enumerates 5 basic ones: left-right checking, bimodality test, goodness jumps constraint, duality of depth discontinuity and occlusion, and uniqueness constraint. Some algorithms that have been proposed rely on one or more of these strategies, and are often based on varying a correlation window position or size [9, 6, 26, 10]. These methods are binocular in nature and do not generalize well to the case of multiple arbitrary cameras. Other algorithms use dynamic programming [16, 7, 3] because of its ability to efficiently solve more complex matching costs and smoothing terms. Two methods using graph theoretical approaches [8, 11] have been proposed, but again they do not generalize to multiple camera configurations.

When extending binocular stereo to multiple cameras, the amount of occlusion increases, since each pixel of the reference camera can be hidden in more than one supporting camera. This is particularly true when going from a single to a multiple-baseline configuration, such as a regular grid of cameras [15]. Some researchers have proposed specially designed algorithms to cope with occlusion in multiple camera configurations. Amongst these, Kang et al. [10] proposed a visibility approach. While they did not improve over adaptive windows, their scheme was based on the hypothesis that a low matching cost implies the absence of occlusion. This hypothesis is also made in [15, 20, 17, 18]. In contrast, we do not rely on such an assumption. In [27], a relief reconstruction approach based on belief propagation is presented, where the correct visibility is approximated by using a low resolution base surface obtained from manually established correspondences. In [14, 23], visibility-based methods are introduced. The matching cost incorporates the visibility information into a photo-consistency matching criterion, thereby implicitly modeling occlusion in the reconstruction process. Our method differs completely in the way it handles smoothing and by its ability to recover from bad "carving". Similarly, a level-set method [5] uses the visibility information from the evolving reconstructed surface to explicitly model occlusion. In [12], a stereo algorithm based on graph cuts is presented. It strictly enforces visibility constraints to guide the matching process and ensures that it does not contain any geometric inconsistencies. The formulation imposes strict constraints on the form of the smoothing term, constraints that do not apply to our method, as we will see.

3. Modeling occlusion and Geo-consistency

We have a set of reference pixels P, for which we want to compute depth, and a set of depth labels Z. A Z-configuration f : P → Z associates a depth label to every pixel. When occlusion is not modeled, the energy function to minimize is

    E(f) = Σ_{p∈P} e(p, f(p)) + Σ_{p∈P} Σ_{r∈N_p} s(p, r, f(p), f(r))    (1)

where the first sum is the pointwise likelihood, the second is the smoothing, and N_p is a neighborhood of pixel p. This can be solved efficiently because the likelihood term e(p, f(p)) is independent from e(p′, f(p′)) for p ≠ p′, and the smoothing term has a simple 2-site clique form.

To model occlusion, we must compute the volumetric visibility V_i(q, f) of a 3D reference point q from the point of view of a camera i, given a depth configuration f. It is set to 1 if the point is visible, and 0 otherwise. Visibility is a long range interaction, and knowledge about the immediate neighborhood configuration is insufficient most of the time for computing it. The visibility information is collected into a vector, the visibility mask

    V(q, f) = (V_1(q, f), . . . , V_N(q, f))

where N is the number of cameras other than the reference; a vector (1, . . . , 1) means that the 3D point is visible in all supporting cameras, while (0, . . . , 0) means it is invisible in all of them. We call M the set of all possible visibility masks; an M-configuration g : P → M associates a mask to every pixel. Using this, we transform Eq. 1 into an energy function with mask

    E(f, g) = Σ_{p∈P} e(p, f(p), g(p)) + smoothing.    (2)

Typically, we define

    e(p, z, m) = (m · C(p|z)) / |m|    for p ∈ P, z ∈ Z, m ∈ M

where the 3D point p|z is p augmented by z, and C(q) = (C_1(q), . . . , C_N(q)) is the vector of matching costs of 3D point q for each camera. We use |m| to represent the l1-norm, which is just the number of cameras used from q. The case where |m| = 0 is discussed in Section 4.2. A simple cost function is C_i(q) = (I_ref(M_ref q) − I_i(M_i q))², where M_ref and M_i are projection matrices from the world to the images of cameras ref and i respectively, and I_ref and I_i are these images. Now, in order to model occlusion properly, we simply need to examine the case g(p) = V(p|f(p), f).

If the visibility masks were already known and fixed, the occlusion problem would be solved and only photometric ambiguity would remain to be dealt with; the energy function (2) would then be relatively easy to minimize.
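To make the masked matching cost concrete, here is a minimal Python sketch of e(p, z, m) = (m · C(p|z)) / |m|. The intensity values and the three-camera setup are invented for illustration; only the formula itself comes from the text.

```python
# Sketch of the masked matching cost e(p, z, m) = (m . C(p|z)) / |m|.
# The per-camera costs C_i are toy squared intensity differences;
# all numeric values here are illustrative, not from the paper.

def matching_cost_vector(i_ref, i_supporting):
    """C(q): squared difference between the reference intensity and
    each supporting camera's intensity at the projection of q."""
    return [(i_ref - i) ** 2 for i in i_supporting]

def masked_cost(cost_vector, mask):
    """e(p, z, m): average cost over the cameras the 0/1 mask keeps.
    |m| is the l1-norm of the mask, i.e. the number of cameras used."""
    n_used = sum(mask)
    if n_used == 0:
        return None  # the |m| = 0 case is handled separately (Sec. 4.2)
    dot = sum(c * v for c, v in zip(cost_vector, mask))
    return dot / n_used

C = matching_cost_vector(100, [102, 98, 250])  # third camera occluded
print(masked_cost(C, (1, 1, 1)))  # all cameras kept
print(masked_cost(C, (1, 1, 0)))  # occluded camera removed
```

Note how removing the occluded camera from the mask drops the average cost sharply, which is the effect the detected inconsistencies are used to obtain.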
Since this is not the case and f and V(·, f) are dependent, we relax the problem by introducing the concept of geo-consistency: we say that a Z-configuration f is geo-consistent with an M-configuration g if

    g(p) ≤ V(p|f(p), f)    (3)

for each component of these vectors and all p ∈ P. The inequality thus allows the mask to contain a subset of the visible cameras. The removal of extra cameras has been observed to have little impact on the quality of the solution [15]. Our problem becomes the minimization of Eq. 2 in f and g, with the constraint that f is geo-consistent with g.

3.1. Solving simultaneously for depth and visibility

Let us define g⁰(p) = (1, . . . , 1) for all p ∈ P; this corresponds to the case where all cameras are visible from all points. Minimizing E(f, g⁰) in f is equivalent to minimizing E(f). In general, it is possible to minimize E(f, g) by explicitly testing all combinations of depth labels and visibility masks in Z × M. Since #M = 2^N, this effectively makes the problem too big to be solved except in the simplest cases. One way to reduce the number of visibility masks is to realize that, for a given camera configuration, some masks can occur for no configuration f. This makes it possible to precompute a smaller subset of M. Another way to reduce the number of masks is simply to decide on a reasonable subset to use [15]. Unfortunately, even with a small number of masks, it is still not practical to minimize in f and g simultaneously. We can, however, use photo-consistency alone to select the visibility mask of a pixel, if it is assumed equivalent to geo-consistency. In order to determine the mask for a pixel p at depth f(p), we can try each mask and select the most photo-consistent one, i.e. we define g*_f as

    g*_f(p) = arg min_{m∈M} e(p, f(p), m) w(m)

where w(m) is a weight function favoring g⁰ and eliminating improbable masks. The problem thus becomes the minimization of E(f, g*_f) in f. Since e is point-wise independent, the new problem reduces to the original formulation of Eq. 1 and is easily solved using standard algorithms. This technique is used in [15, 20, 17]. However, the selected masks are not guaranteed to preserve geo-consistency.

In space carving [14], the depth f(p) of a pixel is increased at a given step if it is not photo-consistent (which is determined using a threshold). When depth is changed at a point, the mask configuration g is updated accordingly, and so preserves geo-consistency. Space carving is a greedy algorithm that solves Eq. 2 subject to the constraint of Eq. 3 without smoothing. Kolmogorov and Zabih [12] tried to minimize an approximation of Eq. 2 subject to the constraint of Eq. 3 with spatial smoothing by moving iteratively from one geo-consistent solution to another.

4. Stereo with a new implicit occlusion model

We propose to reduce the dependency between f and g by making it temporal: we let f⁰ be the Z-configuration minimizing E(f, g⁰) in f, and for t > 0, we define iteratively f^t as the function minimizing

    Σ_{p∈P} e(p, f^t(p), V(p|f^t(p), f^{t−1})) + smoothing    (4)

and g^t as

    g^t(p) = V(p|f^t(p), f^{t−1}),

that is to say, f^t minimizes E(f^t, g^t), where g^t depends on f^t according to the above equation. This can be done using any standard algorithm. Unfortunately, this process does not always converge [10].

4.1. Using history for convergence

Because of the way g^t is defined, cameras that are removed at one iteration can be kept at the next, possibly introducing cycles. To guarantee convergence, we introduce a visibility history mask independent of the matching cost function value

    H(q, t) = (H_1(q, t), . . . , H_N(q, t))

where N is again the number of cameras other than the reference and

    H_i(q, t) = ∏_{0≤k≤t} V_i(q, f^k) = min_{0≤k≤t} V_i(q, f^k)    (5)

The new problem is obtained by substituting H for V in Eq. 4 to obtain

    E^t_H(f^t) = Σ_{p∈P} e(p, f^t(p), H(p|f^t(p), t−1)) + smoothing    (6)

Mutatis mutandis, f^t now minimizes E^t_H(f^t) and g^t(p) = H(p|f^t(p), t−1). This iterative process always converges (or stabilizes) in a polynomial number of steps. Indeed, H(q, t) is monotonically decreasing in t for all q; moreover, if H(q, t−1) = H(q, t) for all q, then f^t = f^{t+1}, since both are solutions to the same minimization problem, and the process has stabilized. Since each of the N · #P · #Z binary entries of H can only switch from 1 to 0, the number of iterations is bounded by N · #P · #Z.

Furthermore, after convergence, the final configuration f^{T+1} = f^T is geo-consistent with g^{T+1}; this comes from the fact that, for all p,

    g^{T+1}(p) = H(p|f^{T+1}(p), T) = H(p|f^T(p), T)
               ≤ V(p|f^T(p), f^T) = V(p|f^{T+1}(p), f^{T+1}).
Figure 3. Real (ground truth) status, in percentages, of pixels according to their classification, for scenes from the Middlebury comparative study [21]. In bold are the misclassifications favored by the overestimation of the disparity of occluded pixels.

                         Tsukuba Head and Lamp        Sawtooth
  Algo                occludee occluder regular  occludee occluder regular
  Real status of pixels classified from the depth map as occluders:
  bp [24]               44.8    16.3    38.9      42.6     3.8    53.6
  bnv [2]               50.4    15.4    34.2      42.6     4.3    53.3
  Real status of pixels classified from the depth map as occludees:
  bp [24]               15.5     5.9    76.6       5.5     1.1    93.4
  bnv [2]               16.4     5.8    77.8       7.2     1.1    91.7
  Real status of pixels classified from the depth map as regulars:
  bp [24]                1.0     2.0    97.0       0.5     1.5    98.0
  bnv [2]                1.0     2.0    96.9       0.5     1.5    98.0

Figure 2. Effect of object enlargement on classification of occluders and occludees of a scene viewed by 2 cameras (reference and supporting). The ground truth is in thick gray and the depth map in thick dashes. Occluders and occludees are shown for both ground truth (GT) and computed depth map (DM). Illustration of classification shift: the 5 zones respectively represent 1) regular pixels wrongly classified as occludees, 2) occludees correctly classified, 3) occludees wrongly classified as occluders, 4) occluders correctly classified, and 5) occluders wrongly classified as regular.

…ular are occluders than occludees. The observation above discourages the direct use of visibility to update the visibility history mask. Instead, we introduce a pseudo-visibility

    V′(q, f) = (V′_1(q, f), . . . , V′_N(q, f))
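The paper later computes this pseudo-visibility by rendering the depth map into each supporting camera and comparing minimal and maximal depth buffers: a point is pseudo-visible only when the two buffers agree at its projection. A toy Python sketch of that test, with a made-up integer `project` function standing in for the projection T_i:

```python
# Sketch of the pseudo-visibility test V'_i(q, f) = delta(L - G):
# render the depth map twice, keeping the minimal (L) and maximal (G)
# depth per projected location; a point is pseudo-visible only when
# both buffers agree at its projection. `project` is a hypothetical
# stand-in for T_i, and the point list is invented for illustration.

def depth_buffers(points, project):
    """Build min (L) and max (G) depth buffers over projected locations."""
    L, G = {}, {}
    for q, depth in points:
        u = project(q)
        L[u] = min(L.get(u, depth), depth)
        G[u] = max(G.get(u, depth), depth)
    return L, G

def pseudo_visible(q, L, G, project):
    """V'_i: 1 iff no other point of the mesh projects to q's location
    at a different depth (delta is 1 at 0 and 0 elsewhere)."""
    u = project(q)
    return 1 if L[u] == G[u] else 0

project = lambda q: q % 4                # toy projection T_i
points = [(0, 5.0), (4, 2.0), (1, 3.0)]  # q=0 and q=4 collide at u=0
L, G = depth_buffers(points, project)
print(pseudo_visible(0, L, G, project))  # colliding pair
print(pseudo_visible(1, L, G, project))  # alone at its location
```

Both colliding points get V′ = 0, which matches the framework's stated property of not discriminating between occluders and occludees.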
Figure 4. Left) Ordering constraint is satisfied. In this camera configuration, the epipolar lines are parallel to the X-axis. Line 2 is located to the left of line 1 in both images. Right) Ordering constraint is broken: line 2 appears to the left of line 1 in one image and to the right in the other.

…a reverse Z-buffer test. Two depth maps, L^f_i and G^f_i, are thus obtained; they contain the minimal and maximal depths observed by the camera. By comparing them, we can detect when two points of the mesh project to the same location for a given supporting camera. When using rectified images, this rendering process can be greatly sped up and simplified by replacing it with a line drawing using depth buffers. The pseudo-visibility function V′_i(q, f) can therefore be computed as

    V′_i(q, f) = δ(L^f_i(T_i q) − G^f_i(T_i q))

where δ is 1 at 0 and 0 elsewhere.

It is possible for a voxel to have all its cameras removed, i.e. H(p|z, t−1) = 0 even if V(p|z, t−1) ≠ 0. In practice, when this happens, we replace e(p, z, H(p|z, t−1)) by e(p, f^{t′+1}(p), H(p|z, t′)) in the minimization process that computes f^t (see Eq. 6), where t′ is the largest index such that H(p|z, t′) ≠ 0. In this case, depth is assigned only from the neighborhood through smoothing.

5. Experimental results

In all our experiments, the matching cost function was the same for all algorithms: that of [12], which is based on [1]. We used color images, but only the reference images in gray scale are shown here. As for the smoothing term, we used the experimentally defined smoothing function that also comes from [12]:

    s(p, r, f(p), f(r)) = λ g(p, r) l(f(p), f(r))

where g is defined as

    g(p, r) = 3 if |I_ref(M_ref p) − I_ref(M_ref r)| < 5, and 1 otherwise,

with l(p, r) = |f(p) − f(r)| for the maximum flow formulation [19] and l(p, r) = δ(f(p) − f(r)) for the graph cut formulation [2]. The parameter λ is user-defined. For each depth map computation, we chose the λ that achieved the best performance. A pixel disparity is considered erroneous if it differs by more than one disparity step from the ground truth. This error measurement is compatible with the one used in two comparative studies for 2-camera stereo [25, 21, 12].

When minimizing Eq. 6, a visibility mask must be kept for every voxel of the reconstruction volume, that is, for each p ∈ P and z ∈ Z. To reduce memory requirements and the number of iterations, we kept a single visibility history for each pixel p regardless of the disparity z, i.e. (5) becomes H_i(p, t) = ∏_{0≤k≤t} V_i(p|f^k(p), f^k). This saves a lot of memory, but convergence is no longer guaranteed. We simply stop iterating when H(p, t) = H(p, t−1) for all p ∈ P. We observed that running the algorithm any longer only produces minor modifications to f^t. However, the number of pixels with final zero masks increases, usually in regions where the ordering constraint is broken. Pixels with zero masks are more prone to error; we therefore tried to improve results by adding a second step that reintroduces eliminated cameras. This step consisted in fixing to their final values the depth labels of the pixels with non-zero final camera masks. The history of the others was discarded and the volumetric visibility recomputed, considering only occlusion caused by the fixed pixels. Finally, an additional minimization was run to produce a better depth map.

5.1. Middlebury

This dataset from Middlebury [22] consists of 6 series of 9 images of size 434 × 383. We used images 0 to 7 in our experiments. The disparities between images 2 and 6 range from 0 to 19 pixels, and 20 disparity steps were used. Since the ground truth was available for this dataset, we
Figure 6. Reference images for the Head and Lamp scene (left) and the Santa scene (right) from the Multiview Images database of the University of Tsukuba.