Semi Global Matching

1) The document proposes a Semi-Global Matching (SGM) stereo matching method that aims to achieve accurate matching at object boundaries while being robust to recording/illumination changes and efficient to calculate. 2) SGM performs pixelwise matching based on Mutual Information and approximates a global smoothness constraint by combining local 1D constraints along different directions. 3) A key contribution is a hierarchical calculation of Mutual Information that is almost as fast as intensity-based matching and can handle complex intensity transformations.

Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information

Heiko Hirschmüller
Institute of Robotics and Mechatronics Oberpfaffenhofen
German Aerospace Center (DLR)
P.O. Box 1116, 82230 Wessling, Germany
[email protected]

Abstract

This paper considers the objectives of accurate stereo matching, especially at object boundaries, robustness against recording or illumination changes and efficiency of the calculation. These objectives lead to the proposed Semi-Global Matching method that performs pixelwise matching based on Mutual Information and the approximation of a global smoothness constraint. Occlusions are detected and disparities determined with sub-pixel accuracy. Additionally, an extension for multi-baseline stereo images is presented. There are two novel contributions. Firstly, a hierarchical calculation of Mutual Information based matching is shown, which is almost as fast as intensity based matching. Secondly, an approximation of a global cost calculation is proposed that can be performed in a time that is linear in the number of pixels and disparities. The implementation requires just 1 second on typical images.

1. Introduction

Accurate, dense stereo matching is an important requirement for many applications, like 3D reconstruction. The boundaries of objects and fine structures are often the most difficult, as they can appear blurred. Additional practical problems originate from recording and illumination differences or reflections, because matching is often directly based on intensities that can have quite different values for corresponding pixels. Furthermore, fast calculations are often required, either because of real-time applications or because of large images or many images that have to be processed efficiently.

An application where all three objectives come together is the reconstruction of urban terrain, captured by an airborne pushbroom camera. Accurate matching at object boundaries is important for reconstructing structured environments. Robustness against recording differences and illumination changes is vital, because this often cannot be controlled. Finally, efficient (off-line) processing is necessary, because the images and disparity ranges are huge (e.g. several 100 MPixel with 1000 pixel disparity range).

2. Related Literature

There is a wide range of dense stereo algorithms [8] with different properties. Local methods, which are based on correlation, can have very efficient implementations that are suitable for real-time applications [5]. However, these methods assume constant disparities within a correlation window, which is incorrect at discontinuities and leads to blurred object boundaries. Certain techniques can reduce this effect [8, 5], but it cannot be eliminated. Pixelwise matching [1] avoids this problem, but requires other constraints for unambiguous matching (e.g. piecewise smoothness). Dynamic Programming techniques can enforce these constraints efficiently, but only within individual scanlines [1, 11]. This typically leads to streaking effects. Global approaches like Graph Cuts [7, 2] and Belief Propagation [10] enforce the matching constraints in two dimensions. Both approaches are quite memory intensive and Graph Cuts is rather slow. However, it has been shown [4] that Belief Propagation can be implemented very efficiently.

The matching cost is commonly based on intensity differences, which may be sampling insensitive [1]. Intensity based matching is very sensitive to recording and illumination differences, reflections, etc. Mutual Information has been introduced in computer vision for matching images with complex relationships of corresponding intensities, possibly even images of different sensors [12]. Mutual Information has already been used for correlation based stereo matching [3] and Graph Cuts [6]. It has been shown [6] that it is robust against many complex intensity transformations and even reflections.
3. Semi-Global Matching

3.1. Outline

The Semi-Global Matching (SGM) method is based on the idea of pixelwise matching of Mutual Information and approximating a global, 2D smoothness constraint by combining many 1D constraints. The algorithm is described in distinct processing steps, assuming a general stereo geometry of two or more images with known epipolar geometry. Firstly, the pixelwise cost calculation is discussed in Section 3.2. Secondly, the implementation of the smoothness constraint is presented in Section 3.3. Next, the disparity is determined with sub-pixel accuracy and occlusion detection in Section 3.4. An extension for multi-baseline matching is described in Section 3.5. Finally, the complexity and implementation are discussed in Section 3.6.

3.2. Pixelwise Cost Calculation

The matching cost is calculated for a base image pixel $p$ from its intensity $I_{bp}$ and the suspected correspondence $I_{mq}$ at $q = e_{bm}(p, d)$ of the match image. The function $e_{bm}(p, d)$ symbolizes the epipolar line in the match image for the base image pixel $p$ with the line parameter $d$. For rectified images $e_{bm}(p, d) = [p_x - d, p_y]^T$ with $d$ as disparity.

An important aspect is the size and shape of the area that is considered for matching. The robustness of matching is increased with large areas. However, the implicit assumption about constant disparity inside the area is violated at discontinuities, which leads to blurred object borders and fine structures. Certain shapes and techniques can be used to reduce blurring, but it cannot be avoided [5]. Therefore, the assumption of constant disparities in the vicinity of $p$ is discarded. This means that only the intensities $I_{bp}$ and $I_{mq}$ themselves can be used for calculating the matching cost.

One choice of pixelwise cost calculation is the sampling insensitive measure of Birchfield and Tomasi [1]. The cost $C_{BT}(p, d)$ is calculated as the absolute minimum difference of intensities at $p$ and $q = e_{bm}(p, d)$ in the range of half a pixel in each direction along the epipolar line.

Alternatively, the matching cost calculation is based on Mutual Information (MI) [12], which is insensitive to recording and illumination changes. It is defined from the entropy $H$ of two images (i.e. their information content) as well as their joint entropy.

$$MI_{I_1,I_2} = H_{I_1} + H_{I_2} - H_{I_1,I_2} \tag{1}$$

The entropies are calculated from the probability distributions $P$ of intensities of the associated images.

$$H_I = -\int_0^1 P_I(i) \log P_I(i) \, di \tag{2}$$

$$H_{I_1,I_2} = -\int_0^1 \int_0^1 P_{I_1,I_2}(i_1, i_2) \log P_{I_1,I_2}(i_1, i_2) \, di_1 \, di_2 \tag{3}$$

For well registered images the joint entropy $H_{I_1,I_2}$ is low, because one image can be predicted by the other, which corresponds to low information. This increases their Mutual Information. In the case of stereo matching, one image needs to be warped according to the disparity image $D$ for matching the other image, such that corresponding pixels are at the same location in both images, i.e. $I_1 = I_b$ and $I_2 = f_D(I_m)$.

Equation (1) operates on full images and requires the disparity image a priori. Both prevent the use of MI as matching cost. Kim et al. [6] transformed the calculation of the joint entropy $H_{I_1,I_2}$ into a sum of data terms using Taylor expansion. The data term depends on corresponding intensities and is calculated individually for each pixel $p$.

$$H_{I_1,I_2} = \sum_p h_{I_1,I_2}(I_{1p}, I_{2p}) \tag{4}$$

The data term $h_{I_1,I_2}$ is calculated from the probability distribution $P_{I_1,I_2}$ of corresponding intensities. The number of corresponding pixels is $n$. Convolution with a 2D Gaussian (indicated by $\otimes g(i, k)$) effectively performs Parzen estimation [6].

$$h_{I_1,I_2}(i, k) = -\frac{1}{n} \log\bigl(P_{I_1,I_2}(i, k) \otimes g(i, k)\bigr) \otimes g(i, k) \tag{5}$$

The probability distribution of corresponding intensities is defined with the operator $T[\,]$, which is 1 if its argument is true and 0 otherwise.

$$P_{I_1,I_2}(i, k) = \frac{1}{n} \sum_p T[(i, k) = (I_{1p}, I_{2p})] \tag{6}$$

Kim et al. argued that the entropy $H_{I_1}$ is constant and $H_{I_2}$ is almost constant as the disparity image merely redistributes the intensities of $I_2$. Thus, $h_{I_1,I_2}(I_{1p}, I_{2p})$ serves as cost for matching the intensities $I_{1p}$ and $I_{2p}$. However, if occlusions are considered then some intensities of $I_1$ and $I_2$ do not have a correspondence. These intensities should not be included in the calculation, which results in non-constant entropies $H_{I_1}$ and $H_{I_2}$. Therefore, it is suggested to calculate these entropies analogously to the joint entropy.

$$H_I = \sum_p h_I(I_p), \qquad h_I(i) = -\frac{1}{n} \log\bigl(P_I(i) \otimes g(i)\bigr) \otimes g(i) \tag{7}$$
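The data terms of equations (5) to (7) can be illustrated with a minimal sketch. It is not the implementation of the paper: the function names are hypothetical, a three-tap kernel with zero padding stands in for the Gaussian $g$, a small `eps` is added to avoid taking the logarithm of zero, and the toy intensity lists use only 4 intensity levels instead of 256. The marginal distribution is obtained here by summing the rows of the joint distribution.

```python
import math

def smooth_1d(values, kernel=(0.25, 0.5, 0.25)):
    """Convolve a list with a small symmetric kernel (zero padding at the ends)."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(values)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            k = i + j - radius
            if 0 <= k < len(values):
                acc += weight * values[k]
        out.append(acc)
    return out

def smooth_2d(table):
    """Separable 2D smoothing: rows first, then columns."""
    rows = [smooth_1d(row) for row in table]
    cols = [smooth_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def joint_probability(i1, i2, levels):
    """Equation (6): normalized joint histogram of corresponding intensities."""
    n = len(i1)
    p = [[0.0] * levels for _ in range(levels)]
    for a, b in zip(i1, i2):
        p[a][b] += 1.0 / n
    return p

def joint_entropy_terms(p, n, eps=1e-12):
    """Equation (5): h(i, k) = -(1/n) log(P (x) g) (x) g (eps is an assumption)."""
    logged = [[-math.log(v + eps) / n for v in row] for row in smooth_2d(p)]
    return smooth_2d(logged)

def entropy_terms(p_marginal, n, eps=1e-12):
    """Equation (7): h(i) = -(1/n) log(P (x) g) (x) g for a single image."""
    logged = [-math.log(v + eps) / n for v in smooth_1d(p_marginal)]
    return smooth_1d(logged)

# Toy correspondences: the second image has inverted intensities.
i1 = [0, 0, 1, 1, 2, 2, 3, 3]
i2 = [3, 3, 2, 2, 1, 1, 0, 0]
p = joint_probability(i1, i2, levels=4)
h12 = joint_entropy_terms(p, len(i1))
p1 = [sum(row) for row in p]          # marginal of the first image
h1 = entropy_terms(p1, len(i1))
```

In this toy case the intensity pair (0, 3) occurs frequently and receives a smaller joint data term than the pair (0, 0), which never occurs, even though the intensities of the frequent pair differ strongly.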
The probability distribution $P_I$ must not be calculated over the whole images $I_1$ and $I_2$, but only over the corresponding parts (otherwise occlusions would be ignored and $H_{I_1}$ and $H_{I_2}$ would be almost constant). That is easily done by just summing the corresponding rows and columns of the joint probability distribution, e.g. $P_{I_1}(i) = \sum_k P_{I_1,I_2}(i, k)$. The resulting definition of Mutual Information is,

$$MI_{I_1,I_2} = \sum_p mi_{I_1,I_2}(I_{1p}, I_{2p}) \tag{8a}$$

$$mi_{I_1,I_2}(i, k) = h_{I_1}(i) + h_{I_2}(k) - h_{I_1,I_2}(i, k). \tag{8b}$$

This leads to the definition of the MI matching cost.

$$C_{MI}(p, d) = -mi_{I_b, f_D(I_m)}(I_{bp}, I_{mq}) \quad \text{with } q = e_{bm}(p, d) \tag{9}$$

The remaining problem is that the disparity image is required for warping $I_m$, before $mi()$ can be calculated. Kim et al. suggested an iterative solution, which starts with a random disparity image for calculating the cost $C_{MI}$. This cost is then used for matching both images and calculating a new disparity image, which serves as the base of the next iteration. The number of iterations is rather low (e.g. 3), because even wrong disparity images (e.g. random) allow a good estimation of the probability distribution $P$. This solution is well suited for iterative stereo algorithms like Graph Cuts [6], but it would increase the runtime of non-iterative algorithms unnecessarily.

Therefore, a hierarchical calculation is proposed, which recursively uses the (up-scaled) disparity image that has been calculated at half resolution as initial disparity. If the overall complexity of the algorithm is $O(WHD)$ (i.e. width × height × disparity range), then the runtime at half resolution is reduced by factor $2^3 = 8$. Starting with a random disparity image at a resolution of $\frac{1}{16}$th and initially calculating 3 iterations increases the overall runtime by the factor,

$$1 + \frac{1}{2^3} + \frac{1}{4^3} + \frac{1}{8^3} + 3\,\frac{1}{16^3} \approx 1.14. \tag{10}$$

Thus, the theoretical runtime of the hierarchically calculated $C_{MI}$ would be just 14% slower than that of $C_{BT}$, ignoring the overhead of MI calculation and image scaling. It is noteworthy that the disparity image of the lower resolution level is used only for estimating the probability distribution $P$ and calculating the costs $C_{MI}$ of the higher resolution level. Everything else is calculated from scratch to avoid passing errors from lower to higher resolution levels.

3.3. Aggregation of Costs

Pixelwise cost calculation is generally ambiguous and wrong matches can easily have a lower cost than correct ones, due to noise, etc. Therefore, an additional constraint is added that supports smoothness by penalizing changes of neighboring disparities. The pixelwise cost and the smoothness constraints are expressed by defining the energy $E(D)$ that depends on the disparity image $D$.

$$E(D) = \sum_p C(p, D_p) + \sum_{q \in N_p} P_1 \, T[|D_p - D_q| = 1] + \sum_{q \in N_p} P_2 \, T[|D_p - D_q| > 1] \tag{11}$$

The first term is the sum of all pixel matching costs for the disparities of $D$. The second term adds a constant penalty $P_1$ for all pixels $q$ in the neighborhood $N_p$ of $p$, for which the disparity changes a little bit (i.e. 1 pixel). The third term adds a larger constant penalty $P_2$ for all larger disparity changes. Using a lower penalty for small changes permits an adaptation to slanted or curved surfaces. The constant penalty for all larger changes (i.e. independent of their size) preserves discontinuities [2]. Discontinuities are often visible as intensity changes. This is exploited by adapting $P_2$ to the intensity gradient, i.e. $P_2 = \frac{P_2'}{|I_{bp} - I_{bq}|}$. However, it always has to be ensured that $P_2 \geq P_1$.

The problem of stereo matching can now be formulated as finding the disparity image $D$ that minimizes the energy $E(D)$. Unfortunately, such a global minimization (2D) is NP-complete for many discontinuity preserving energies [2]. In contrast, the minimization along individual image rows (1D) can be performed efficiently in polynomial time using Dynamic Programming [1, 11]. However, Dynamic Programming solutions easily suffer from streaking [8], due to the difficulty of relating the 1D optimizations of individual image rows to each other in a 2D image. The problem is that very strong constraints in one direction (i.e. along image rows) are combined with none or much weaker constraints in the other direction (i.e. along image columns).

This leads to the new idea of aggregating matching costs in 1D from all directions equally. The aggregated (smoothed) cost $S(p, d)$ for a pixel $p$ and disparity $d$ is calculated by summing the costs of all 1D minimum cost paths that end in pixel $p$ at disparity $d$ (Figure 1). It is noteworthy that only the cost of the path is required and not the path itself.

Let $L_r$ be a path that is traversed in the direction $r$. The cost $L_r(p, d)$ of the pixel $p$ at disparity $d$ is defined recursively as,

$$L_r(p, d) = C(p, d) + \min\bigl(L_r(p - r, d),\; L_r(p - r, d - 1) + P_1,\; L_r(p - r, d + 1) + P_1,\; \min_i L_r(p - r, i) + P_2\bigr). \tag{12}$$

The pixelwise matching cost $C$ can be either $C_{BT}$ or $C_{MI}$. The remainder of the equation adds the lowest cost of the
previous pixel $p - r$ of the path, including the appropriate penalty for discontinuities. This implements the behavior of equation (11) along an arbitrary 1D path. This cost does not enforce the visibility or ordering constraint, because both concepts cannot be realized for paths that are not identical to epipolar lines. Thus, the approach is more similar to Scanline Optimization [8] than traditional Dynamic Programming solutions.

Figure 1. Aggregation of costs: (a) minimum cost path $L_r(p, d)$; (b) 16 paths from all directions $r$.

The values of $L$ permanently increase along the path, which may lead to very large values. However, equation (12) can be modified by subtracting the minimum path cost of the previous pixel from the whole term.

$$L_r(p, d) = C(p, d) + \min\bigl(L_r(p - r, d),\; L_r(p - r, d - 1) + P_1,\; L_r(p - r, d + 1) + P_1,\; \min_i L_r(p - r, i) + P_2\bigr) - \min_k L_r(p - r, k) \tag{13}$$

This modification does not change the actual path through disparity space, since the subtracted value is constant for all disparities of a pixel $p$. Thus, the position of the minimum does not change. However, the upper limit can now be given as $L \leq C_{max} + P_2$.

The costs $L_r$ are summed over paths in all directions $r$. The number of paths must be at least 8 and should be 16 for providing a good coverage of the 2D image.

$$S(p, d) = \sum_r L_r(p, d) \tag{14}$$

The upper limit for $S$ is easily determined as $S \leq 16(C_{max} + P_2)$.

3.4. Disparity Computation

The disparity image $D_b$ that corresponds to the base image $I_b$ is determined as in local stereo methods by selecting for each pixel $p$ the disparity $d$ that corresponds to the minimum cost, i.e. $\min_d S(p, d)$. For sub-pixel estimation, a quadratic curve is fitted through the neighboring costs (i.e. at the next higher or lower disparity) and the position of the minimum is calculated. Using a quadratic curve is theoretically justified only for a simple correlation using the sum of squared differences. However, it is used as an approximation due to the simplicity of calculation.

The disparity image $D_m$ that corresponds to the match image $I_m$ can be determined from the same costs, by traversing the epipolar line that corresponds to the pixel $q$ of the match image. Again, the disparity $d$ is selected which corresponds to the minimum cost, i.e. $\min_d S(e_{mb}(q, d), d)$. However, the cost aggregation step does not treat the base and match images symmetrically. Therefore, better results can be expected if $D_m$ is calculated from scratch. Outliers are filtered from $D_b$ and $D_m$, using a median filter with a small window (i.e. 3 × 3).

The calculation of $D_b$ as well as $D_m$ permits the determination of occlusions and false matches by performing a consistency check. Each disparity of $D_b$ is compared with its corresponding disparity of $D_m$. The disparity is set to invalid ($D_{inv}$) if both differ.

$$D_p = \begin{cases} D_{bp} & \text{if } |D_{bp} - D_{mq}| \leq 1,\; q = e_{bm}(p, D_{bp}), \\ D_{inv} & \text{otherwise.} \end{cases} \tag{15}$$

The consistency check enforces the uniqueness constraint, by permitting one to one mappings only.

3.5. Extension for Multi-Baseline Matching

The algorithm could be extended for multi-baseline matching, by calculating a combined pixelwise matching cost of correspondences between the base image and all match images. However, valid and invalid costs would be mixed near discontinuities, depending on the visibility of a pixel in a match image. The consistency check (Section 3.4) can only distinguish between valid (visible) and invalid (occluded or mismatched) pixels, but it cannot separate valid and invalid costs afterwards. Thus, the consistency check would invalidate all areas that are not seen by all images, which leads to unnecessarily large invalid areas. Without the consistency check, invalid costs would introduce matching errors near discontinuities, which leads to fuzzy object borders.

Therefore, it is better to calculate several disparity images from individual image pairs, exclude all invalid pixels by the consistency check and then combine the result. Let the disparity $D_k$ be the result of matching the base image $I_b$ against a match image $I_{mk}$. The disparities of the images $D_k$ are scaled differently, according to some factor $t_k$. For rectified images, this factor corresponds to the length of the baseline between $I_b$ and $I_{mk}$.

The robust combination selects the median of all disparities $\frac{D_{kp}}{t_k}$ for a certain pixel $p$. Additionally, the accuracy
is increased by calculating the weighted mean of all correct disparities (i.e. within the range of 1 pixel around the median). This is done by using $t_k$ as weighting factor.

$$D_p = \frac{\sum_{k \in V_p} D_{kp}}{\sum_{k \in V_p} t_k}, \qquad V_p = \Bigl\{ k \;\Big|\; \Bigl| \frac{D_{kp}}{t_k} - \operatorname{med}_i \frac{D_{ip}}{t_i} \Bigr| \leq \frac{1}{t_k} \Bigr\} \tag{16}$$

This combination is robust against matching errors in some disparity images and it also increases the accuracy.

3.6. Complexity and Implementation

The calculation of the pixelwise cost $C_{MI}$ starts with collecting all alleged correspondences (i.e. defined by an initial disparity as described in Section 3.2) and calculating $P_{I_1,I_2}$. The size of $P$ is the square of the number of intensities, which is constant (i.e. 256 × 256). The subsequent operations consist of Gaussian convolutions of $P$ and calculating the logarithm. The complexity depends only on the collection of alleged correspondences due to the constant size of $P$. Thus, the complexity is $O(WH)$ with $W$ as image width and $H$ as image height.

The pixelwise matching costs for all pixels $p$ at all disparities $d$ are scaled to 11 bit integer values and stored in a 16 bit array $C(p, d)$. Scaling to 11 bit guarantees that all aggregated costs do not exceed the 16 bit limit. A second 16 bit integer array of the same size is used for the aggregated cost values $S(p, d)$. The array is initialized with 0. The calculation starts for each direction $r$ at all pixels $b$ of the image border with $L_r(b, d) = C(b, d)$. The path is traversed in forward direction according to equation (13). For each visited pixel $p$ along the path, the costs $L_r(p, d)$ are added to $S(p, d)$ for all disparities $d$. The calculation of equation (13) requires $O(D)$ steps at each pixel, since the minimum cost of the previous pixel (e.g. $\min_k L_r(p - r, k)$) is constant for all disparities and can be pre-calculated. Each pixel is visited exactly 16 times, which results in a total complexity of $O(WHD)$. The regular structure and simple operations (i.e. additions and comparisons) permit parallel calculations using integer based SIMD (Single Instruction, Multiple Data) assembler instructions.

The disparity computation and consistency check require visiting each pixel at each disparity a constant number of times. Thus, the complexity is $O(WHD)$ as well.

The 16 bit arrays $C$ and $S$ have a size of $W \times H \times D$, which can exceed the available memory for larger images and disparity ranges. The suggested remedy is to split the input image into tiles, which are processed individually. The tiles overlap each other by a few pixels, since the pixels at the image border receive support by the global cost function only from one side. The overlapping pixels are ignored for combining the tiles to the final disparity image. This solution allows processing of almost arbitrarily large images.

4. Experimental Results

4.1. Stereo Images with Ground Truth

Three stereo image pairs with ground truth [8, 9] have been selected for evaluation (first row of Figure 2). The images 2 and 4 have been used from the Teddy and Cones image sequences. All images have been processed with a disparity range of 32 pixel.

The MWMF method is a local, correlation based, real time algorithm [5], which has been shown [8] to produce better object borders (i.e. less fuzzy) than many other local methods. The second row of Figure 2 shows the resulting disparity images. The blurring of object borders is typical for local methods. The calculation of Teddy has been performed in just 0.071s on a Xeon with 2.8GHz.

Belief Propagation (BP) [10] minimizes a global cost function (e.g. equation (11)) by iteratively passing messages in a graph that is defined by the four connected image grid. The messages are used for updating the nodes of the graph. The disparity is in the end selected individually at each node. Similarly, SGM can be described as passing messages independently, from all directions along 1D paths for updating nodes. This is done sequentially as each message depends on one predecessor only. Thus, messages are passed through the whole image. In contrast, BP sends messages in a 2D graph. Thus, the schedule of messages that reaches each node is different and BP requires an iterative solution. The number of iterations determines the distance from which information is passed in the image.

The efficient BP algorithm [4] (http://people.cs.uchicago.edu/~pff/bp/) uses a hierarchical approach and several optimizations for reducing the complexity. The complexity and memory requirements are very similar to SGM. The third row of Figure 2 shows good results of Tsukuba. However, the results of Teddy and especially Cones are rather blocky, despite attempts to get the best results by parameter tuning. The calculation of Teddy took 4.5s on the same computer.

The Graph Cuts method [7] (http://www.cs.cornell.edu/People/vnk/software.html) iteratively minimizes a global cost function (e.g. equation (11) with $P_1 = P_2$) as well. The fourth row of Figure 2 shows the results, which are much better for Teddy and Cones, especially near object borders and fine structures like the leaves. However, the complexity of the algorithm is much higher. The calculation of Teddy has been done in 55s on the same computer.

The results of SGM with $C_{BT}$ as matching cost (i.e. the same as for BP and GC) are shown in the fifth row of Figure
Figure 2. Comparison of different stereo methods. Columns: Tsukuba (384 x 288, Disp. 32), Teddy (450 x 375, Disp. 32), Cones (450 x 375, Disp. 32). Rows: left images; local, correlation (MWMF); Belief Propagation (BP); Graph Cuts (GC); SGM with BT (i.e. intensity based matching cost); SGM with HMI (i.e. hierarchical Mutual Information as matching cost).

Figure 3. Result of matching modified Teddy images with SGM (HMI). Panels: left image; modified right image; resulting disparity image (SGM HMI).

2, using the best parameters for the set of all images. The quality of the result comes close to Graph Cuts. Only the textureless area on the right of the Teddy is handled worse. Slanted surfaces appear smoother than with Graph Cuts, due to sub-pixel interpolation. The calculation of Teddy has been performed in 1.0s. Cost aggregation requires almost half of the processing time.

The last row of Figure 2 shows the result of SGM with the hierarchical calculation of $C_{MI}$ as matching cost. The disparity images of Tsukuba and Teddy appear equally good and Cones appears much better. This is an indication that the matching tolerance of MI is beneficial even for carefully captured images. The calculation of Teddy took 1.3s. This is just 30% slower than the non-hierarchical, intensity based version.

The disparity images have been compared to the ground truth. All disparities that differ by more than 1 are treated as errors. Occluded areas (i.e. identified using the ground truth) have been ignored. Missing disparities (i.e. black areas) have been interpolated by using the lowest neighboring disparities. Figure 4 presents the resulting graph. This quantitative analysis confirms that SGM performs as well as other global approaches. Furthermore, MI based matching results in even better disparity images.

Figure 4. Errors of different stereo methods. (Bar chart of errors in percent, on an axis from 0 to 18, for MWMF, BP, GC, SGM (BT) and SGM (HMI) on Tsukuba, Teddy and Cones.)

The power of MI based matching can be demonstrated by manually modifying the right image of Teddy by dimming the upper half and inverting the intensities of the lower half (Figure 3). Such an image pair cannot be matched by intensity based costs. However, the MI based cost handles this situation easily as shown on the right. More examples of the power of MI based stereo matching are shown by Kim et al. [6].

4.2. Stereo Images of an Airborne Pushbroom Camera

The SGM (HMI) method has been tested on huge images (i.e. several 100MPixel) of an airborne pushbroom camera, which records 5 panchromatic images at different angles. The appropriate camera model and the non-linearity of the flight path have been taken into account for calculating the epipolar lines.

A difficult test object is Neuschwanstein castle (Figure 5a), because of high walls and towers, which result in high disparity changes and large occluded areas. The castle has been recorded 4 times using different flight paths. Each flight path results in a multi-baseline stereo image from which the disparity has been calculated. All disparity images have been combined for increasing robustness.

Figure 5b shows the end result, using a hierarchical, correlation based method [13]. The object borders appear fuzzy and the towers are mostly unrecognized. The result of the SGM (HMI) method is shown in Figure 5c. All object borders and towers have been properly detected. Stereo methods with intensity based pixelwise costs (e.g. Graph Cuts and SGM (BT)) failed on these images completely, because of large intensity differences of correspondences. This is caused by recording differences as well as unavoidable changes of lighting and the scene during the flight (i.e. corresponding points are recorded at different times on the flight path). Nevertheless, the MI based matching cost handles the differences easily.

The processing time is one hour on a 2.8GHz Xeon for matching 11MPixel of a base image against 4 match images with an average disparity range of 400 pixel.
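As a compact recap of the aggregation step on which all SGM results of this section rely, the recursion of equation (13), the summation of equation (14) and the minimum selection of Section 3.4 can be sketched as follows. This is a simplified illustration under stated assumptions and not the implementation evaluated above: it uses only the two path directions along a single scanline instead of 8 or 16 paths through a 2D image, the function names are hypothetical, and the toy cost values and penalties are invented.

```python
def aggregate_path(cost, p1, p2):
    """Equation (13) along one path.

    cost[x][d] is the pixelwise cost C(p, d) of the x-th pixel on the path.
    Returns the path costs L_r(p, d) for every pixel of the path.
    """
    num_disp = len(cost[0])
    inf = float("inf")
    paths = [list(cost[0])]                 # L_r(b, d) = C(b, d) at the border
    for x in range(1, len(cost)):
        prev = paths[-1]
        prev_min = min(prev)                # min_k L_r(p - r, k), shared by all d
        row = []
        for d in range(num_disp):
            best = min(
                prev[d],
                prev[d - 1] + p1 if d > 0 else inf,
                prev[d + 1] + p1 if d + 1 < num_disp else inf,
                prev_min + p2,
            )
            row.append(cost[x][d] + best - prev_min)
        paths.append(row)
    return paths

def aggregate_scanline(cost, p1, p2):
    """Equation (14), restricted to the two directions along one scanline."""
    forward = aggregate_path(cost, p1, p2)
    backward = aggregate_path(cost[::-1], p1, p2)[::-1]
    return [[f + b for f, b in zip(fr, br)] for fr, br in zip(forward, backward)]

def select_disparities(s):
    """Select for each pixel the disparity with the minimum cost."""
    return [min(range(len(row)), key=row.__getitem__) for row in s]

# Toy scanline with 4 disparities: disparity 1 is correct everywhere,
# but noise gives the middle pixel a lower raw cost at disparity 3.
cost = [
    [5, 0, 5, 5],
    [5, 0, 5, 5],
    [5, 2, 5, 1],
    [5, 0, 5, 5],
    [5, 0, 5, 5],
]
raw = select_disparities(cost)                                   # [1, 1, 3, 1, 1]
smoothed = select_disparities(aggregate_scanline(cost, 1, 3))    # [1, 1, 1, 1, 1]
```

In this example the outlier at the middle pixel is removed by the aggregation, and every path cost stays within the bound $C_{max} + P_2 = 5 + 3$ given in Section 3.3.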
Figure 5. Neuschwanstein castle (Germany), recorded by an airborne pushbroom camera: (a) top view of Neuschwanstein; (b) result of correlation method; (c) result of SGM (HMI).

5. Conclusion

It has been shown that a hierarchical calculation of a Mutual Information based matching cost can be performed at almost the same speed as an intensity based matching cost. This opens the way for robust, illumination insensitive stereo matching in a broad range of applications. Furthermore, it has been shown that a global cost function can be approximated efficiently in $O(WHD)$.

The resulting Semi-Global Matching (SGM) method performs much better matching than local methods and is almost as accurate as global methods. However, SGM is much faster than global methods. A near real-time performance on small images has been demonstrated as well as an efficient calculation of huge images.

6. Acknowledgments

I would like to thank Klaus Gwinner, Johann Heindl, Frank Lehmann, Martin Oczipka, Sebastian Pless, Frank Scholten and Frank Trauthan for inspiring discussions and Daniel Scharstein and Richard Szeliski for making stereo images with ground truth available.

References

[1] S. Birchfield and C. Tomasi. Depth discontinuities by pixel-to-pixel stereo. In Proceedings of the Sixth IEEE International Conference on Computer Vision, pages 1073–1080, Mumbai, India, January 1998.
[2] Y. Boykov, O. Veksler, and R. Zabih. Efficient approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11):1222–1239, 2001.
[3] G. Egnal. Mutual information as a stereo correspondence measure. Technical Report MS-CIS-00-20, Computer and Information Science, University of Pennsylvania, Philadelphia, USA, 2000.
[4] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient belief propagation for early vision. In IEEE Conference on Computer Vision and Pattern Recognition, 2004.
[5] H. Hirschmüller, P. R. Innocent, and J. M. Garibaldi. Real-time correlation-based stereo vision with reduced border errors. International Journal of Computer Vision, 47(1/2/3):229–246, April-June 2002.
[6] J. Kim, V. Kolmogorov, and R. Zabih. Visual correspondence using energy minimization and mutual information. In International Conference on Computer Vision, 2003.
[7] V. Kolmogorov and R. Zabih. Computing visual correspondence with occlusions using graph cuts. In International Conference for Computer Vision, pages 508–515, 2001.
[8] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1/2/3):7–42, April-June 2002.
[9] D. Scharstein and R. Szeliski. High-accuracy stereo depth maps using structured light. In IEEE Conference for Computer Vision and Pattern Recognition, volume 1, pages 195–202, Madison, Wisconsin, USA, June 2003.
[10] J. Sun, H. Y. Shum, and N. N. Zheng. Stereo matching using belief propagation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(7):787–800, July 2003.
[11] G. Van Meerbergen, M. Vergauwen, M. Pollefeys, and L. Van Gool. A hierarchical symmetric stereo algorithm using dynamic programming. International Journal of Computer Vision, 47(1/2/3):275–285, April-June 2002.
[12] P. Viola and W. M. Wells. Alignment by maximization of mutual information. International Journal of Computer Vision, 24(2):137–154, 1997.
[13] F. Wewel, F. Scholten, and K. Gwinner. High resolution stereo camera (HRSC) - multispectral 3D-data acquisition and photogrammetric data processing. Canadian Journal of Remote Sensing, 26(5):466–474, 2000.
