Semi Global Matching
Information
Heiko Hirschmüller
Institute of Robotics and Mechatronics Oberpfaffenhofen
German Aerospace Center (DLR)
P.O. Box 1116, 82230 Wessling, Germany
of intensities at $p$ and $q = e_{bm}(p, d)$ in the range of half a pixel in each direction along the epipolar line.

Alternatively, the matching cost calculation is based on Mutual Information (MI) [12], which is insensitive to recording and illumination changes. It is defined from the entropy $H$ of two images (i.e. their information content) as well as their joint entropy.

$$MI_{I_1,I_2} = H_{I_1} + H_{I_2} - H_{I_1,I_2} \tag{1}$$

The entropies are calculated from the probability distributions $P$ of intensities of the associated images.

Kim et al. argued that the entropy $H_{I_1}$ is constant and $H_{I_2}$ is almost constant, as the disparity image merely redistributes the intensities of $I_2$. Thus, $h_{I_1,I_2}(I_{1p}, I_{2p})$ serves as cost for matching the intensities $I_{1p}$ and $I_{2p}$. However, if occlusions are considered, then some intensities of $I_1$ and $I_2$ do not have a correspondence. These intensities should not be included in the calculation, which results in non-constant entropies $H_{I_1}$ and $H_{I_2}$. Therefore, it is suggested to calculate these entropies analogously to the joint entropy.

$$H_I = \sum_p h_I(I_p), \qquad h_I(i) = -\frac{1}{n} \log\big(P_I(i) \otimes g(i)\big) \otimes g(i) \tag{7}$$
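As an illustration of equation (7), the following Python sketch computes the per-intensity terms $h_I$ and the entropy $H_I$ from the intensities of corresponding pixels only. It is a sketch under stated assumptions and not the paper's implementation: $n$ is taken as the number of correspondences, $g$ as a small normalized 1D Gaussian kernel, borders are handled by clamping, and probabilities are floored at a hypothetical `eps` to avoid log(0). All function names are illustrative.

```python
import math

def gaussian_kernel(radius=3, sigma=1.0):
    """Normalized 1D Gaussian kernel g (radius and sigma are assumptions)."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve(values, kernel):
    """Convolve a 1D sequence with a symmetric kernel, clamping at the borders."""
    radius = len(kernel) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # clamp index to valid range
            acc += w * values[j]
        out.append(acc)
    return out

def entropy_terms(intensities, levels=256, eps=1e-12):
    """h_I(i) = -(1/n) * log(P_I(i) (*) g(i)) (*) g(i), as in equation (7).

    `intensities` holds the intensities of corresponding pixels only,
    so occluded pixels must already be excluded by the caller.
    """
    n = len(intensities)
    prob = [0.0] * levels
    for v in intensities:
        prob[v] += 1.0 / n                      # probability distribution P_I
    g = gaussian_kernel()
    smoothed = convolve(prob, g)                # P_I (*) g
    logs = [-math.log(max(p, eps)) / n for p in smoothed]  # eps is an assumption
    return convolve(logs, g)                    # second Gaussian convolution

def entropy(intensities, levels=256):
    """H_I = sum over all corresponding pixels p of h_I(I_p)."""
    h = entropy_terms(intensities, levels)
    return sum(h[v] for v in intensities)
```

For a uniform distribution the smoothing leaves $P_I$ unchanged, so `entropy(list(range(8)) * 4, levels=8)` evaluates to log 8 up to rounding, which agrees with the usual entropy definition.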
The probability distribution $P_I$ must not be calculated over the whole images $I_1$ and $I_2$, but only over the corresponding parts (otherwise occlusions would be ignored and $H_{I_1}$ and $H_{I_2}$ would be almost constant). That is easily done by just summing the corresponding rows and columns of the joint probability distribution, e.g. $P_{I_1}(i) = \sum_k P_{I_1,I_2}(i, k)$. The resulting definition of Mutual Information is,

$$MI_{I_1,I_2} = \sum_p mi_{I_1,I_2}(I_{1p}, I_{2p}) \tag{8a}$$

$$mi_{I_1,I_2}(i, k) = h_{I_1}(i) + h_{I_2}(k) - h_{I_1,I_2}(i, k). \tag{8b}$$

This leads to the definition of the MI matching cost.

$$C_{MI}(p, d) = -mi_{I_b, f_D(I_m)}(I_{bp}, I_{mq}) \quad \text{with } q = e_{bm}(p, d) \tag{9}$$

The remaining problem is that the disparity image is required for warping $I_m$, before $mi()$ can be calculated. Kim et al. suggested an iterative solution, which starts with a random disparity image for calculating the cost $C_{MI}$. This cost is then used for matching both images and calculating a new disparity image, which serves as the base of the next iteration. The number of iterations is rather low (e.g. 3), because even wrong disparity images (e.g. random) allow a good estimation of the probability distribution $P$. This solution is well suited for iterative stereo algorithms like Graph Cuts [6], but it would increase the runtime of non-iterative algorithms unnecessarily.

Therefore, a hierarchical calculation is proposed, which recursively uses the (up-scaled) disparity image that has been calculated at half resolution as initial disparity. If the overall complexity of the algorithm is $O(WHD)$ (i.e. width × height × disparity range), then the runtime at half resolution is reduced by the factor $2^3 = 8$. Starting with a random disparity image at a resolution of 1/16th and initially calculating 3 iterations increases the overall runtime by the factor,

$$1 + \frac{1}{2^3} + \frac{1}{4^3} + \frac{1}{8^3} + 3\,\frac{1}{16^3} \approx 1.14. \tag{10}$$

Thus, the theoretical runtime of the hierarchically calculated $C_{MI}$ would be just 14% slower than that of $C_{BT}$, ignoring the overhead of MI calculation and image scaling. It is noteworthy that the disparity image of the lower resolution level is used only for estimating the probability distribution $P$ and calculating the costs $C_{MI}$ of the higher resolution level. Everything else is calculated from scratch to avoid passing errors from lower to higher resolution levels.

3.3. Aggregation of Costs

Pixelwise cost calculation is generally ambiguous and wrong matches can easily have a lower cost than correct ones, due to noise, etc. Therefore, an additional constraint is added that supports smoothness by penalizing changes of neighboring disparities. The pixelwise cost and the smoothness constraints are expressed by defining the energy $E(D)$ that depends on the disparity image $D$.

$$E(D) = \sum_p \Big( C(p, D_p) + \sum_{q \in N_p} P_1\, T[|D_p - D_q| = 1] + \sum_{q \in N_p} P_2\, T[|D_p - D_q| > 1] \Big) \tag{11}$$

The first term is the sum of all pixel matching costs for the disparities of $D$. The second term adds a constant penalty $P_1$ for all pixels $q$ in the neighborhood $N_p$ of $p$ for which the disparity changes a little bit (i.e. 1 pixel). The third term adds a larger constant penalty $P_2$ for all larger disparity changes. Using a lower penalty for small changes permits an adaptation to slanted or curved surfaces. The constant penalty for all larger changes (i.e. independent of their size) preserves discontinuities [2]. Discontinuities are often visible as intensity changes. This is exploited by adapting $P_2$ to the intensity gradient, i.e. $P_2 = \frac{P_2'}{|I_{bp} - I_{bq}|}$. However, it always has to be ensured that $P_2 \ge P_1$.

The problem of stereo matching can now be formulated as finding the disparity image $D$ that minimizes the energy $E(D)$. Unfortunately, such a global minimization (2D) is NP-complete for many discontinuity preserving energies [2]. In contrast, the minimization along individual image rows (1D) can be performed efficiently in polynomial time using Dynamic Programming [1, 11]. However, Dynamic Programming solutions easily suffer from streaking [8], due to the difficulty of relating the 1D optimizations of individual image rows to each other in a 2D image. The problem is that very strong constraints in one direction (i.e. along image rows) are combined with none or much weaker constraints in the other direction (i.e. along image columns).

This leads to the new idea of aggregating matching costs in 1D from all directions equally. The aggregated (smoothed) cost $S(p, d)$ for a pixel $p$ and disparity $d$ is calculated by summing the costs of all 1D minimum cost paths that end in pixel $p$ at disparity $d$ (Figure 1). It is noteworthy that only the cost of the path is required and not the path itself.

Let $L_r$ be a path that is traversed in the direction $r$. The cost $L_r(p, d)$ of the pixel $p$ at disparity $d$ is defined recursively as,

$$L_r(p, d) = C(p, d) + \min\big(L_r(p - r, d),\; L_r(p - r, d - 1) + P_1,\; L_r(p - r, d + 1) + P_1,\; \min_i L_r(p - r, i) + P_2\big). \tag{12}$$

The pixelwise matching cost $C$ can be either $C_{BT}$ or $C_{MI}$. The remainder of the equation adds the lowest cost of the
previous pixel $p - r$ of the path, including the appropriate penalty for discontinuities. This implements the behavior of equation (11) along an arbitrary 1D path. This cost does not enforce the visibility or ordering constraint, because both concepts cannot be realized for paths that are not identical to epipolar lines. Thus, the approach is more similar to Scanline Optimization [8] than traditional Dynamic Programming solutions.

Figure 1. Aggregation of costs. (a) Minimum cost path $L_r(p, d)$. (b) 16 paths from all directions $r$.

The values of $L$ permanently increase along the path, which may lead to very large values. However, equation (12) can be modified by subtracting the minimum path cost of the previous pixel from the whole term.

$$L_r(p, d) = C(p, d) + \min\big(L_r(p - r, d),\; L_r(p - r, d - 1) + P_1,\; L_r(p - r, d + 1) + P_1,\; \min_i L_r(p - r, i) + P_2\big) - \min_k L_r(p - r, k) \tag{13}$$

This modification does not change the actual path through disparity space, since the subtracted value is constant for all disparities of a pixel $p$. Thus, the position of the minimum does not change. However, the upper limit can now be given as $L \le C_{max} + P_2$.

The costs $L_r$ are summed over paths in all directions $r$. The number of paths must be at least 8 and should be 16 for providing a good coverage of the 2D image.

$$S(p, d) = \sum_r L_r(p, d) \tag{14}$$

The upper limit for $S$ is easily determined as $S \le 16(C_{max} + P_2)$.

3.4. Disparity Computation

The disparity image $D_b$ that corresponds to the base image $I_b$ is determined as in local stereo methods by selecting for each pixel $p$ the disparity $d$ that corresponds to the minimum cost, i.e. $\min_d S(p, d)$. For sub-pixel estimation, a quadratic curve is fitted through the neighboring costs (i.e. at the next higher or lower disparity) and the position of the minimum is calculated. Using a quadratic curve is theoretically justified only for a simple correlation using the sum of squared differences. However, it is used as an approximation due to the simplicity of calculation.

The disparity image $D_m$ that corresponds to the match image $I_m$ can be determined from the same costs, by traversing the epipolar line that corresponds to the pixel $q$ of the match image. Again, the disparity $d$ is selected which corresponds to the minimum cost, i.e. $\min_d S(e_{mb}(q, d), d)$. However, the cost aggregation step does not treat the base and match images symmetrically. Therefore, better results can be expected if $D_m$ is calculated from scratch. Outliers are filtered from $D_b$ and $D_m$, using a median filter with a small window (i.e. 3 × 3).

The calculation of $D_b$ as well as $D_m$ permits the determination of occlusions and false matches by performing a consistency check. Each disparity of $D_b$ is compared with its corresponding disparity of $D_m$. The disparity is set to invalid ($D_{inv}$) if both differ by more than 1.

$$D_p = \begin{cases} D_{bp} & \text{if } |D_{bp} - D_{mq}| \le 1,\; q = e_{bm}(p, D_{bp}), \\ D_{inv} & \text{otherwise.} \end{cases} \tag{15}$$

The consistency check enforces the uniqueness constraint, by permitting one to one mappings only.

3.5. Extension for Multi-Baseline Matching

The algorithm could be extended for multi-baseline matching, by calculating a combined pixelwise matching cost of correspondences between the base image and all match images. However, valid and invalid costs would be mixed near discontinuities, depending on the visibility of a pixel in a match image. The consistency check (Section 3.4) can only distinguish between valid (visible) and invalid (occluded or mismatched) pixels, but it cannot separate valid and invalid costs afterwards. Thus, the consistency check would invalidate all areas that are not seen by all images, which leads to unnecessarily large invalid areas. Without the consistency check, invalid costs would introduce matching errors near discontinuities, which leads to fuzzy object borders.

Therefore, it is better to calculate several disparity images from individual image pairs, exclude all invalid pixels by the consistency check and then combine the result. Let the disparity $D_k$ be the result of matching the base image $I_b$ against a match image $I_{mk}$. The disparities of the images $D_k$ are scaled differently, according to some factor $t_k$. For rectified images, this factor corresponds to the length of the baseline between $I_b$ and $I_{mk}$.

The robust combination selects the median of all disparities $\frac{D_{kp}}{t_k}$ for a certain pixel $p$. Additionally, the accuracy
is increased by calculating the weighted mean of all correct disparities (i.e. within the range of 1 pixel around the median). This is done by using $t_k$ as weighting factor.

$$D_p = \frac{\sum_{k \in V_p} D_{kp}}{\sum_{k \in V_p} t_k}, \qquad V_p = \left\{ k \;\middle|\; \left| \frac{D_{kp}}{t_k} - \operatorname{med}_i \frac{D_{ip}}{t_i} \right| \le \frac{1}{t_k} \right\} \tag{16}$$

This combination is robust against matching errors in some disparity images and it also increases the accuracy.

3.6. Complexity and Implementation

The calculation of the pixelwise cost $C_{MI}$ starts with collecting all alleged correspondences (i.e. defined by an initial disparity as described in Section 3.2) and calculating $P_{I_1,I_2}$. The size of $P$ is the square of the number of intensities, which is constant (i.e. 256 × 256). The subsequent operations consist of Gaussian convolutions of $P$ and calculating the logarithm. The complexity depends only on the collection of alleged correspondences due to the constant size of $P$. Thus, the complexity is $O(WH)$, with $W$ as image width and $H$ as image height.

The pixelwise matching costs for all pixels $p$ at all disparities $d$ are scaled to 11 bit integer values and stored in a 16 bit array $C(p, d)$. Scaling to 11 bit guarantees that all aggregated costs do not exceed the 16 bit limit. A second 16 bit integer array of the same size is used for the aggregated cost values $S(p, d)$. The array is initialized with 0. The calculation starts for each direction $r$ at all pixels $b$ of the image border with $L_r(b, d) = C(b, d)$. The path is traversed in forward direction according to equation (13). For each visited pixel $p$ along the path, the costs $L_r(p, d)$ are added to $S(p, d)$ for all disparities $d$. The calculation of equation (13) requires $O(D)$ steps at each pixel, since the minimum cost of the previous pixel (e.g. $\min_k L_r(p - r, k)$) is constant for all disparities and can be pre-calculated. Each pixel is visited exactly 16 times, which results in a total complexity of $O(WHD)$. The regular structure and simple operations (i.e. additions and comparisons) permit parallel calculations using integer based SIMD¹ assembler instructions.

The disparity computation and consistency check require visiting each pixel at each disparity a constant number of times. Thus, the complexity is $O(WHD)$ as well.

The 16 bit arrays $C$ and $S$ have a size of $W \times H \times D$, which can exceed the available memory for larger images and disparity ranges. The suggested remedy is to split the input image into tiles, which are processed individually. The tiles overlap each other by a few pixels, since the pixels at the image border receive support by the global cost function only from one side. The overlapping pixels are ignored for combining the tiles to the final disparity image. This solution allows processing of almost arbitrarily large images.

¹ Single Instruction, Multiple Data

4. Experimental Results

4.1. Stereo Images with Ground Truth

Three stereo image pairs with ground truth [8, 9] have been selected for evaluation (first row of Figure 2). The images 2 and 4 have been used from the Teddy and Cones image sequences. All images have been processed with a disparity range of 32 pixels.

The MWMF method is a local, correlation based, real time algorithm [5], which has been shown [8] to produce better object borders (i.e. less fuzzy) than many other local methods. The second row of Figure 2 shows the resulting disparity images. The blurring of object borders is typical for local methods. The calculation of Teddy has been performed in just 0.071s on a Xeon with 2.8GHz.

Belief Propagation (BP) [10] minimizes a global cost function (e.g. equation (11)) by iteratively passing messages in a graph that is defined by the four connected image grid. The messages are used for updating the nodes of the graph. The disparity is in the end selected individually at each node. Similarly, SGM can be described as passing messages independently, from all directions along 1D paths, for updating nodes. This is done sequentially as each message depends on one predecessor only. Thus, messages are passed through the whole image. In contrast, BP sends messages in a 2D graph. Thus, the schedule of messages that reaches each node is different and BP requires an iterative solution. The number of iterations determines the distance from which information is passed in the image.

The efficient BP algorithm² [4] uses a hierarchical approach and several optimizations for reducing the complexity. The complexity and memory requirements are very similar to SGM. The third row of Figure 2 shows good results of Tsukuba. However, the results of Teddy and especially Cones are rather blocky, despite attempts to get the best results by parameter tuning. The calculation of Teddy took 4.5s on the same computer.

The Graph Cuts method³ [7] iteratively minimizes a global cost function (e.g. equation (11) with $P_1 = P_2$) as well. The fourth row of Figure 2 shows the results, which are much better for Teddy and Cones, especially near object borders and fine structures like the leaves. However, the complexity of the algorithm is much higher. The calculation of Teddy has been done in 55s on the same computer.

² http://people.cs.uchicago.edu/~pff/bp/
³ http://www.cs.cornell.edu/People/vnk/software.html

The results of SGM with $C_{BT}$ as matching cost (i.e. the same as for BP and GC) are shown in the fifth row of Figure
2, using the best parameters for the set of all images. The quality of the result comes close to Graph Cuts. Only the textureless area on the right of the Teddy is handled worse. Slanted surfaces appear smoother than with Graph Cuts, due to sub-pixel interpolation. The calculation of Teddy has been performed in 1.0s. Cost aggregation requires almost half of the processing time.

[Figure 2: image columns Tsukuba (384 × 288, Disp. 32), Teddy (450 × 375, Disp. 32) and Cones (450 × 375, Disp. 32). Recoverable row labels: Left Images; Local, correlation (MWMF); Belief Propagation (BP); SGM with BT (i.e. intensity based matching cost).]

The last row of Figure 2 shows the result of SGM with the hierarchical calculation of $C_{MI}$ as matching cost. The disparity images of Tsukuba and Teddy appear equally good and Cones appears much better. This is an indication that the matching tolerance of MI is beneficial even for carefully captured images. The calculation of Teddy took 1.3s. This is just 30% slower than the non-hierarchical, intensity based version.

The disparity images have been compared to the ground truth. All disparities that differ by more than 1 are treated as errors. Occluded areas (i.e. identified using the ground truth) have been ignored. Missing disparities (i.e. black areas) have been interpolated by using the lowest neighboring disparities. Figure 4 presents the resulting graph. This quantitative analysis confirms that SGM performs as well as other global approaches. Furthermore, MI based matching results in even better disparity images.

[Plot "Errors of different methods": vertical axis Errors [%]; legend MWMF, BP, GC, SGM (BT), SGM (HMI).]

The power of MI based matching can be demonstrated by manually modifying the right image of Teddy by dimming the upper half and inverting the intensities of the lower half (Figure 3). Such an image pair cannot be matched by intensity based costs. However, the MI based cost handles this situation easily as shown on the right. More examples of the power of MI based stereo matching are shown by Kim et al. [6].

4.2. Stereo Images of an Airborne Pushbroom Camera

The SGM (HMI) method has been tested on huge images (i.e. several 100 MPixel) of an airborne pushbroom camera, which records 5 panchromatic images at different angles. The appropriate camera model and the non-linearity of the flight path have been taken into account for calculating the epipolar lines.

A difficult test object is Neuschwanstein castle (Figure 5a), because of high walls and towers, which result in high disparity changes and large occluded areas. The castle has been recorded 4 times using different flight paths. Each flight path results in a multi-baseline stereo image from which the disparity has been calculated. All disparity images have been combined for increasing robustness.

Figure 5b shows the end result, using a hierarchical, correlation based method [13]. The object borders appear fuzzy and the towers are mostly unrecognized. The result of the SGM (HMI) method is shown in Figure 5c. All object borders and towers have been properly detected. Stereo methods with intensity based pixelwise costs (e.g. Graph Cuts and SGM (BT)) failed on these images completely,