IEEE Transactions on Consumer Electronics, Vol. 57, No. 3, August 2011

Real-Time Disparity Estimation Algorithm for Stereo Camera Systems

Sang Hwa Lee and Siddharth Sharma

Sang Hwa Lee and Siddharth Sharma are with the Department of Electrical Engineering and Computer Science, BK21 Information Technology, INMC, Seoul National University, Kwanak-gu, Seoul, 151-742, South Korea.
Contributed Paper. Manuscript received 07/14/11. Current version published 09/19/11. Electronic version published 09/19/11. 0098 3063/11/$20.00 © 2011 IEEE

Abstract: This paper proposes a real-time stereo matching algorithm using GPU programming. The likelihood model is implemented in GPU programming for real-time operation, and a prior model is proposed to improve the accuracy of disparity estimation. First, the likelihood matching based on the rank transform is implemented in GPU programming. Shared memory handling in graphic hardware is introduced in calculating the likelihood model. The prior model considers the smoothness of the disparity map and is defined as a pixel-wise energy function using adaptive interaction among neighboring disparities. The disparity is determined by minimizing the joint energy function, which combines the likelihood model with the prior model. These processes are performed in a multi-resolution approach. The disparity map is interpolated using the reliability of the likelihood model and color-based similarity in the neighborhood. This paper evaluates the proposed approach with the Middlebury stereo images. According to the experiments, the proposed algorithm shows good estimation accuracy at over 30 frames/second for a 640×480 image and a disparity range of 60. The proposed disparity estimation algorithm is applied to real-time stereo camera systems such as 3-D image display, depth-based object extraction, and 3-D rendering.

Index Terms: Real-time stereo matching, GPU programming, prior model, multi-resolution, object extraction, 3D rendering.

I. INTRODUCTION
Recently, industrial demand for and interest in stereoscopic image systems have increased due to 3D movies and TV. Depth information is the main element in 3D image systems. We model 3D space and render 3D objects using depth information [42]. We also extract objects using depth information in the scene, which is applicable to various object-based systems. Stereo matching, or disparity estimation, finds the depth information from stereoscopic images. The goals of this paper are to propose a real-time algorithm for disparity estimation and to show application systems based on a real-time stereo camera system. The proposed stereo camera system operates in the usual PC environment with graphic hardware, so general users can make 3D contents or various object-based application systems by themselves.

Stereo matching is one of the most active research topics in computer vision. Its task is to estimate disparity information between two or more images obtained from slightly different viewpoints. As the interest in and need for 3-D display and contents have recently increased, the stereo camera and stereo matching have become more important in visual systems. Excellent overviews of the various issues involved in stereo matching are presented by Scharstein and Szeliski [1], and Redert [2]. The algorithms for stereo matching can be broadly classified into two categories, local and global algorithms.

The local algorithms estimate the disparity at a pixel using only image observations available in a finite neighboring window. The local algorithms are sometimes called maximum likelihood estimation. Kanade and Okutomi used an adaptive window mechanism to take care of both textureless and disparity discontinuity regions [3]. Their work has been the basis of the approaches that use variable windows to calculate matching costs. Boykov tried to find the optimal window by a plausibility test [4]. Veksler determined the proper window sizes and shapes from cost evaluation and user-defined parameters [5], [6]. Some researchers proposed multiple windowing, which selected the optimal windows out of a predefined window set [7], [8]. The above approaches usually exploited some candidate matching windows with variable sizes and center locations, and searched for the optimal window from cost evaluation. More recently, there are methods that find adaptive support weights in a matching window. Darrel proposed a radial similarity transform [9], and Xu developed the radial adaptive window [10]. Heo proposed an adaptive normalized cross correlation matching, which is especially suitable for luminance changes [11]. Yoon and Kwon proposed a general local matching method, which did not assume candidate windows. They used color similarity and geometric proximity to form adaptive support weights. These adaptive weights can form arbitrarily deformable shapes and sizes of matching windows [12]. Tombari modified the adaptive support weights defined in [12] using color segmentation information [13]. The adaptive support weights generalized the various local matching methods and showed the best performance.

Unlike the local algorithms, which focus on the likelihood matching in the windows, the global methods consider the correlation of disparities in the neighborhood. The correlation is usually modeled as a smoothness prior function, which is combined with the likelihood matching function. The joint energy function is then globally minimized to find the disparity. Cooperative stereo algorithms construct a 3-D energy space for each pixel and disparity, and update the energy space
iteratively using support and inhibition regions. Li exploited a linear update rule [14], and Zitnick proposed a nonlinear update formula [15]. The cooperative stereo algorithms do not consider the correlation of disparities directly, but exploit the interaction implicitly in energy space. More advanced approaches are related to Markov random field (MRF) models and energy minimization methods. The MRF model based algorithms rely on energy minimization methods such as graph cuts [16], [17], belief propagation [18], [19], nonlinear diffusion [20], [21], and dynamic programming [22], [23], [24]. In those energy minimization methods, the 3-D energy fields are generated using likelihood and prior models for every disparity at each pixel, and the disparity fields are estimated in the iterative process of energy minimization. Global energy minimization methods are usually combined with a Maximum a Posteriori (MAP) estimator to create the mathematical framework for stereo matching. Various modifications of global algorithms have been reported in the literature [1], [19], [26], [27], [28], [29], [30].

The algorithms mentioned above concentrate on improving the accuracy of disparity estimation. The MRF modeling and energy minimization processes need too much computational load to be implemented in real-time stereo systems. As the interest in 3-D visual systems has increased, real-time stereo matching is also required in consumer applications. This paper deals with real-time stereo matching using graphic hardware and GPU programming. Our goal is to estimate dense disparity maps faster than 30 frames/second for a 640×480 image and a disparity range of 60 pixels. Thanks to the advance in graphic hardware, fast image processing and parallelization are possible for real-time application systems. Some researchers used graphic hardware and GPU programming to improve the speed [31], [32], [33]. They usually implemented the likelihood matching function in GPU programming, and optimized the disparity field using dynamic programming. These approaches showed fast (near real-time) operation, but suffered from the artifacts of dynamic programming. Hosni approximated the adaptive support weight in [12] using GPU programming [36]. Liang [37] and Yang [38] tried to reduce the computational load in belief propagation. Even though their works showed good estimation accuracy with near real-time speed, they are not suitable for real-time applications. When we focus on the speed, the accuracy is a bit lower. On the other hand, when we focus on improving the accuracy, the speed is lower. Thus, we need some tradeoff between speed and accuracy. This paper implements fast likelihood matching in GPU programming, and proposes a global optimization scheme to improve the accuracy within a small number of iterations. A memory handling scheme and a model of the correlation of the disparity field are proposed for fast and accurate disparity estimation.

The rest of the paper is organized as follows. Section II describes the likelihood matching based on the rank transform. Section III explains the proposed GPU programming and memory handling technique in detail. The proposed prior modeling to improve disparity accuracy is explained in Section IV. The multi-resolution approach of the proposed algorithm is explained in Section V. We show the experimental results of disparity estimation, and apply the real-time stereo system to depth-based object extraction in Section VI. Finally, we conclude this paper in Section VII.

II. LIKELIHOOD MATCHING
Likelihood matching is the basic process to find correspondences. As we described in Section I, SSD and SAD are fast but inaccurate, while adaptive windows and support weights are accurate but slow. This paper exploits the rank transform as the likelihood matching function. The rank transform has shown good matching results compared with the various matching schemes [39], [40]. We implement the rank transform in GPU programming, and propose some implementation techniques for real-time operation. The rank transform at a pixel $(i, j)$ is expressed as

$$r(i,j) = M - \sum_{(x,y)\in W} U\big(I(i+x,\,j+y) - I(i,j)\big), \qquad (1)$$

where $U(\cdot)$ is the unit step function and $M$ is the number of pixels in the window $W$. The matching is performed by the sum of absolute differences (SAD) for rank transformed images,

$$\mathrm{SAD}(d_{ij}) = \sum_{(x',y')\in B_{ij}} \big| r_r(x',y') - r_l(x'+d_{ij},\,y') \big|, \qquad (2)$$

where $B_{ij}$ is a matching window centered at $(i, j)$, and $r_r(i,j)$ and $r_l(i,j)$ are the rank transformed values of the right and left images respectively. In (2), $d_{ij}$ is the disparity value on the horizontal epipolar line.

We implement a sliding window method in calculating (2). When we consider the SAD values for adjacent blocks in the left image, the block locations in the left image are shifted by one pixel in the horizontal direction as shown in Fig. 1. Thus, the leftmost column of pixels in the block centered at $(i, j)$ slips out of the block, and the rightmost column of pixels slips into the block centered at $(i+1, j)$ to calculate (2). When the matching block in the right image is the same for two adjacent blocks in the left image, the whole calculation of (2) is to subtract the SAD value of the leftmost pixels and to add the SAD value of the rightmost pixels to the previous total SAD value. In this case, the order of calculating $\mathrm{SAD}(d_{ij})$ is changed. We usually calculate the SAD values of all disparities for a block in the left image. In the sliding window method, on the other hand, we first calculate $\mathrm{SAD}(d_{ij})$ and then calculate $\mathrm{SAD}(d_{i+1,j})$ of the adjacent block. We should note that the disparities $d_{ij}$ at $(i, j)$ and $d_{i+1,j}$ at $(i+1, j)$
have the following relation in the sliding window method: they differ by one pixel,

$$\big| d_{ij} - d_{i+1,j} \big| = 1 .$$

Furthermore, if the SAD value of one column of pixels is computed by GPU parallel processing, the calculation of (2) becomes much faster. This paper first aggregates the likelihood matching cost for each column of pixels in GPU programming, and then exploits the sliding window concept to calculate the whole SAD value for each disparity. This reduces the calculation time of the SAD aggregation, which is usually the main part of the time consumption.

Fig. 1. Sliding window method. Two adjacent blocks are depicted in blue and red squares. For the same matching block in the right image, the leftmost pixels are slipped out and the rightmost pixels are slipped in for calculating the SAD values. In this case, the disparity values of the blocks differ by 1 pixel.

III. GPU PROGRAMMING
The GPU implementation consists of several parts:

a) Image pyramid generation
b) Rank transform for each level
c) Block based aggregation of initial level
d) Hierarchical aggregation for higher pyramidal levels
e) Color similarity based disparity interpolation
f) Prior model post-processing

We employ pinned memory reads for CPU/GPU data exchange for efficient memory cache usage. Image pyramids (mip-maps) are generated using the GPU's texture filtering hardware, which efficiently generates and stores the required number of image mip-maps.

A. RT Kernel
The rank transform is computed by separate kernels for the left and right images at each pyramid level. Each thread of the kernel calculates the rank transform of one pixel. For this, each thread block first copies the data required by its threads for the rank transform into its shared memory using the texture cache. Each thread then reads the pixel data in the window of the rank transform from low latency shared memory. Since the same offset neighborhood is also required to generate color weights for the prior model and for color guided image interpolation from the disparity map in the lower level, we include these two processes in the RT kernel. They use the image data already stored in shared memory, thus greatly reducing the memory read costs.

B. Cost Aggregation Kernel
Implementation of the sliding window method suffers greatly from inefficient memory management. Device memory access is one of the primary bottlenecks of any data intensive application on a GPU. It was observed that the conventional implementations had two drawbacks:

a) For aggregation, the image data is read directly from device memory, thus threads continuously and alternately read data from the left and right images. This temporal inconsistency forces the texture cache to flush after each read, which makes it ineffective.

b) To make the best use of the sliding window technique, the whole image is evaluated for each disparity one by one. This results in numerous overlapped memory accesses for images spread across the disparity range.

We propose an efficient high speed design for the cost aggregation kernel, named the grouped disparity method, which can be used for any separable cost aggregation kernel. We perform the disparity evaluation in groups of disp_grp disparities (16 in this paper). That is to say, we calculate the SAD for disp_grp disparities for each read of data. Basically, each block is responsible for calculating the disparities of N rows, thus each thread handles N pixels, which facilitates the sliding window approach. First, an image window with its width extended by disp_grp is stored in shared memory through texture reads. Since the box filter is separable, threads sum and store the vertical column SAD for disp_grp disparities in different arrays. Then, each thread collaboratively uses the column sums calculated by other threads and performs the horizontal aggregation. By evaluating multiple disparities (disp_grp) in one iteration, we increase the memory reuse, which reduces the device memory reads. Next, in order to calculate the disparity for the next image row, threads flush the first row of the previous aggregation window and read the new bottom row for the current aggregation window into shared memory. The same aggregation procedure is then repeated.

C. Cost Aggregation Kernel in the Multi-Resolution
In the multi-resolution scheme, the disparity evaluation for a level is guided by the disparity estimate map generated from the lower level. This implies that a normal sliding window technique cannot be used as an efficient cost aggregation scheme, since the whole disparity range is not evaluated for each pixel. Evaluating each pixel separately according to its disparity results in an irregular memory access pattern unsuitable for a GPU platform. To address these issues, we propose a
variation in the cost_aggr_kernel. We propose that each thread block first evaluate the range of initial disparities (offset range) for its sub-image. For example, if the initial disparity map in the sub-image window varies within [30, 40], then [30-d, 40+d] becomes the disparity range for the thread block. After the required data are read and stored into shared memory as in the previous kernel, the kernel employs the proposed grouped disparity method for this small range of disparity. While the block evaluates the sliding window aggregation for the whole sub-image, the cost for each pixel is calculated only in the range below,

$$\big[\,\mathrm{disp}_{\mathrm{off}}(i,j) - d,\ \mathrm{disp}_{\mathrm{off}}(i,j) + d\,\big] .$$

The entire disparity evaluation for a pixel is done in one iteration, and we store the final disparity result in global memory once. The structure and data flow of the GPU programming is shown in Fig. 2.

Fig. 2. Structure of GPU programming and data flow.

IV. PRIOR MODELING
This paper exploits the smoothness prior of the disparity field to improve the accuracy. Since the processing of the prior modeling should be simple and parallelized for real-time operation, we introduce the prior modeling as a kind of postprocessing after finding the initial disparity map by the fast likelihood matching. Dynamic programming is popular for fast prior modeling, but it suffers from some artifacts on the scan-lines [32], [33], [34], [35]. Adaptive dynamic programming and belief propagation do not show real-time performance [23], [24], [37], [38].

The proposed prior modeling is similar to the MRF modeling, and the joint energy function is minimized iteratively. The joint energy function is defined as

$$E(d_{ij}) = \alpha_{ij}\,\mathrm{SAD}(d_{ij}) + (1-\alpha_{ij}) \sum_{n\in N} \alpha_n\, w_c(d_n)\, \big| d_{ij} - d_n \big|, \qquad (3)$$

where $\mathrm{SAD}(d_{ij})$ is the likelihood matching error at a pixel $(i, j)$ and disparity $d_{ij}$. $d_n$ is the disparity value in the neighborhood $N$, which is the first-order MRF window with four elements. The parameter $\alpha_{ij}$ is a weighting factor of the smoothness prior, and is defined by the reliability of the likelihood matching at $(i, j)$. If the disparity is correctly estimated with only the likelihood matching, the reliability is large, and vice versa. In the paper, $\alpha_{ij}$ is adaptively changed with respect to the reliability of the likelihood matching. When $\alpha_{ij}$ is small, the disparity map becomes much smoother and the effect of the likelihood matching is reduced. On the other hand, larger values of $\alpha_{ij}$ decrease the effect of the smoothness prior and increase the effect of the likelihood matching. In the case of bad likelihood matching and occlusion, $\alpha_{ij}$ is small so that the effect of the unreliable likelihood matching is reduced in determining the disparity. We compared some disparity maps obtained by the likelihood matching with the ground truths, and observed the relation between likelihood matching and reliability. Based on our observation, we propose a method to define the reliability $R(i, j)$ by the ratio between the first and second minimum matching errors,

$$\alpha_{ij} \propto R(i,j) = \frac{\mathrm{SAD}(d_{ij}^{2})}{\mathrm{SAD}(d_{ij}^{1})}, \qquad (4)$$

where $\mathrm{SAD}(d_{ij}^{1})$ and $\mathrm{SAD}(d_{ij}^{2})$ are the first and second minimum matching errors, respectively. $\alpha_{ij}$ is determined by $R(i, j)$,

$$\alpha_{ij} = \exp\!\left(-\frac{R(i,j)}{C}\right), \qquad (5)$$

where $C$ is a constant to control the $\alpha_{ij}$ value. Examples of well-matched and ill-matched pixels by the likelihood matching are shown in Fig. 3. In our observation, the pixels that have large ratios between the first and second minimum matching errors show reliable disparity estimation using only the likelihood matching.
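The likelihood stage of Section II and the reliability ratio of (4) can be sketched on the CPU as follows. This is a minimal NumPy illustration, not the GPU kernels of Section III: the function names, the edge-replication border handling, the choice $U(0) = 1$, the small `eps` guard in the ratio, and taking the "second minimum" as the second-smallest cost over all disparities are assumptions made for the example. The window radii default to the 7x7 and 9x9 sizes listed in Table I.

```python
import numpy as np

def rank_transform(img, radius=3):
    """Rank transform (1): r = M - sum of U(I(i+x, j+y) - I(i, j)) over W.

    Assumptions: U(0) = 1, M counts every pixel of the window, and image
    borders are handled by edge replication.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    size = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    count = np.zeros((h, w), dtype=np.int64)
    for dy in range(size):
        for dx in range(size):
            neighbor = padded[dy:dy + h, dx:dx + w]
            count += (neighbor >= img)  # U(neighbor - center)
    return size * size - count

def box_sum(a, radius):
    """Window sum via separable running sums (column sums, then row sums)."""
    size = 2 * radius + 1
    padded = np.pad(np.asarray(a, dtype=np.float64), radius, mode="edge")
    c = np.cumsum(padded, axis=0)
    c = np.vstack([np.zeros((1, c.shape[1])), c])
    col = c[size:] - c[:-size]            # vertical pass
    r = np.cumsum(col, axis=1)
    r = np.hstack([np.zeros((r.shape[0], 1)), r])
    return r[:, size:] - r[:, :-size]     # horizontal pass

def sad_cost_volume(rank_left, rank_right, max_disp, radius=4):
    """SAD (2) for d = 0..max_disp, with the right image as reference."""
    h, w = rank_right.shape
    cost = np.empty((max_disp + 1, h, w))
    for d in range(max_disp + 1):
        # r_l(x + d, y): read the left rank image d pixels to the right.
        shifted = np.pad(rank_left, ((0, 0), (0, d)), mode="edge")[:, d:]
        cost[d] = box_sum(np.abs(rank_right - shifted), radius)
    return cost

def wta_with_reliability(cost, eps=1e-6):
    """Winner-take-all disparity and the ratio R of (4).

    eps only guards against division by zero; it is not part of (4).
    """
    disparity = np.argmin(cost, axis=0)
    two_smallest = np.partition(cost, 1, axis=0)[:2]
    reliability = (two_smallest[1] + eps) / (two_smallest[0] + eps)
    return disparity, reliability
```

For a left image that is the right image shifted by two pixels, `wta_with_reliability(sad_cost_volume(rank_transform(left), rank_transform(right), max_disp=5))` returns disparity 2 away from the image borders, with a large ratio there because the best matching error is zero.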
Fig. 3. Examples of likelihood matching. (a) Reliable likelihood matching with a large ratio between the first and second minimum errors, (b) unreliable likelihood matching with a small ratio.

The term $w_c(d_n)$ in (3) is a weight based on the color similarity between $(i, j)$ and the neighbor $n$. Fig. 4 shows the neighborhood for prior modeling and the color-based interaction. When the color of the neighboring pixel $n$ is similar to that of the center pixel $(i, j)$, the weight is increased so that the effect of neighbors with similar color is emphasized. On the other hand, the weight is decreased to stop the interaction of the neighborhood when the color is different. The color-based weight preserves the disparity discontinuity in processing the smoothness prior modeling. We define the color-based weight using the color distance between $I(i, j)$ and $I(n)$,

$$w_c(d_n) = A \exp\!\left(-\frac{\lVert I(n) - I(i,j) \rVert}{\sigma_c}\right), \qquad (6)$$

where $A$ is a constant, and $\sigma_c$ is the mean of the standard deviations of the color components.

Note that the reliability parameter $\alpha_n$ of the neighborhood is also exploited in the prior modeling. This means that the smoothness prior works with respect to the reliability of the neighboring pixel $n$. When the reliability of pixel $n$ is high, the disparity $d_n$ strongly affects $d_{ij}$ through the weighted smoothness prior. It is desirable not to use the unreliable disparities in the neighborhood, which is achieved by reducing their relative interactivity. This prevents the erroneous disparities from propagating in the prior modeling.

From the initial disparity map by the likelihood matching, the disparity map is recursively updated as below,

$$d_{ij}^{(k+1)} = \arg\min_{d_{ij}} E\big(d_{ij} \mid d_n^{(k)}\big), \qquad (7)$$

where $d_n^{(k)}$ is the disparity value estimated in the $k$-th iteration. $E\big(d_{ij} \mid d_n^{(k)}\big)$ means that the neighboring disparity values estimated in the $k$-th iteration are exploited in (3). Consequently, the proposed prior modeling makes the disparity map smoother, especially in the occlusion and textureless regions, where the likelihood matching is unreliable and unstable. It also prevents the unreliable disparities from propagating in the prior models by using color-based weighting and reliability.

Fig. 4. The prior modeling based on color similarity. The interaction and propagation of neighboring disparities are determined by the color-based similarity.

V. MULTI-RESOLUTION APPROACH
The proposed algorithm is performed in a multi-resolution scheme. That is to say, the stereo images are decomposed into a Gaussian pyramid structure and disparity maps are estimated by successive refinement from the lowest resolution to the highest one. At the lowest resolution, the disparity map is estimated by the algorithm in Sections II and III. Then, the disparity values are doubled and interpolated to the upper resolution. The disparity map in the lower resolution is zero-order interpolated by the neighboring disparity with the most similar color. The color-based interpolation preserves the discontinuity of the disparity map. Since the color similarity has been calculated in the processing of the prior modeling, we just exploit the results in the disparity interpolation. In the upper resolution, the estimated disparity map from the lower resolution is refined with a small search range. For the whole disparity range, $0 \le d \le S$, and the number of pyramid layers $L$, the search range for the lowest resolution is set as

$$0 \le d \le \left\lceil S / 2^{L} \right\rceil, \qquad (8)$$

where $\lceil\cdot\rceil$ is the ceiling function. For the next layers, we set the search ranges to prevent estimation errors in the lower resolutions from propagating into the higher resolutions. We define search ranges with negative and positive directions, which are not usual in stereoscopic ordering constraints,

$$-\beta \left\lceil \frac{S}{2^{L}} \right\rceil \le d \le \beta \left\lceil \frac{S}{2^{L}} \right\rceil, \qquad (9)$$

where the parameter $\beta$ adjusts the search ranges. The reduced search range improves the speed of the estimation process. The multi-resolution approach improves both estimation accuracy and speed. The proposed color-based interpolation preserves the discontinuity of disparity maps, and the reduced search range in the lower resolution decreases the occlusion area.
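The search ranges of (8) and (9) and the doubling step between pyramid levels can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, `beta` stands for the adjusting parameter of (9), and the upsampling uses plain pixel replication, whereas the paper selects the neighboring disparity with the most similar color.

```python
import math

def lowest_level_range(S, L):
    """Search range (8) at the lowest resolution: 0 <= d <= ceil(S / 2**L)."""
    return 0, math.ceil(S / 2 ** L)

def refinement_range(S, L, beta):
    """Signed refinement range (9): -beta*ceil(S/2**L) <= d <= beta*ceil(S/2**L)."""
    half = beta * math.ceil(S / 2 ** L)
    return -half, half

def propagate(disparity_lower):
    """Double the disparities and upsample 2x by zero-order hold.

    Simplification: every value is replicated into a 2x2 block; the
    color-guided choice of neighbor described in the text is omitted.
    """
    return [[2 * d for d in row for _ in (0, 1)]
            for row in disparity_lower for _ in (0, 1)]
```

With a whole range of S = 60 and L = 4, `lowest_level_range(60, 4)` gives `(0, 4)`, so only five candidate disparities are evaluated at the coarsest level, and `propagate([[1, 2]])` gives `[[2, 2, 4, 4], [2, 2, 4, 4]]`.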
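Returning to the prior model of Section IV, one update of (7) at a single pixel can be sketched as follows, combining the joint energy (3) with the color weight (6). The function names, the default values of `A` and `sigma_c`, and the use of a Euclidean color distance are assumptions for illustration; the paper runs this step in parallel over the whole disparity map for a limited number of iterations.

```python
import numpy as np

def color_weight(color_center, color_neighbor, A=1.0, sigma_c=10.0):
    """Color weight (6): A * exp(-||I(n) - I(i, j)|| / sigma_c)."""
    diff = np.asarray(color_neighbor, float) - np.asarray(color_center, float)
    return A * np.exp(-np.linalg.norm(diff) / sigma_c)

def update_pixel(sad, alpha_ij, neighbors):
    """One update (7): the disparity minimizing the joint energy (3).

    sad       -- 1-D array, sad[d] is the matching error of candidate d
    alpha_ij  -- reliability weight of the center pixel
    neighbors -- iterable of (d_n, alpha_n, w_c) for the four first-order
                 neighbors, taken from the previous iteration
    """
    sad = np.asarray(sad, dtype=float)
    candidates = np.arange(len(sad))
    prior = np.zeros(len(sad))
    for d_n, alpha_n, w_c in neighbors:
        prior += alpha_n * w_c * np.abs(candidates - d_n)
    energy = alpha_ij * sad + (1.0 - alpha_ij) * prior
    return int(np.argmin(energy))
```

For example, with `sad = [5, 1, 4, 4, 6]` and four neighbors at disparity 3 with unit weights, a reliable pixel (`alpha_ij = 0.9`) keeps its likelihood minimum at 1, while an unreliable one (`alpha_ij = 0.1`) is pulled to the neighbors' disparity 3.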

TABLE I
Parameters in the experiments.
Parameter                 Value
RT window size            7x7
Block aggregation size    9x9
2:1 pyramid layers        4
Gaussian filter           5x5, variance = 1.0
β in (9)                  α_ij 1.1

Fig. 5. The test stereo left images and their true disparity maps. (top) Cones, (bottom) Teddy.

VI. EXPERIMENTAL RESULTS
A. Disparity Estimation in Real-Time
To test real-time stereo matching, the proposed algorithm has been implemented in the usual PC environment with general graphic hardware. We evaluated the proposed algorithm using the Middlebury stereo images, Cones and Teddy, as shown in Fig. 5. Some optimized parameters of the proposed algorithm are summarized in Table I. We counted the time consumption from image loading to disparity map generation. The accuracy of the disparity maps is evaluated as in [1], and the speed is evaluated by DES (disparity estimations per second),

$$\mathrm{DES} = \frac{(\text{image size}) \times (\text{disparity range})}{\text{time consumption}} .$$

The frame rates are also calculated as

$$\mathrm{fps} = \frac{\mathrm{DES}}{640 \times 480 \times 60} .$$

For real-time applications with a VGA (640×480) image and 60 pixels of disparity range, we need 553 MDES, or 30 frames/second (fps). We restrict the number of iterations in processing the prior modeling to keep 30 fps. We evaluate the proposed algorithm in terms of both accuracy and speed. However, it is a bit difficult to compare the accuracy, since most reports of fast algorithms focus on the operation speed. There are not enough benchmarks to compare the accuracy of fast algorithms.

TABLE II
Time consumption for Teddy and Cones.
Process                          Time (ms)   fps
Rank Transform (left/right)      1.8         313.3
Cost aggregation (9x9 window)    6.4         88.1
Prior model processing           7.2         78.3
Total time consumption           15.4        36.6

First, we decompose the time consumption of the proposed method in the original resolution. Table II shows the time consumption of each process. The cost aggregation consumes much time in GPU programming since it needs much calculation. The prior modeling is usually processed in CPU-based operation, thus it takes much time, too. When we implement the proposed method in the multi-resolution approach, the time consumption for cost aggregation and prior modeling is decreased, since the image size and disparity range are also reduced. However, we need additional computation for the Gaussian pyramid construction and disparity interpolation. We limit our final results to 30 fps.

Table III summarizes some results of the proposed algorithm and of fast methods. As expected, the algorithms for real-time operation are not as accurate as the state of the art. However, the proposed algorithm is competitive with the other algorithms for both real-time operation and accuracy. Fig. 6 shows the disparity maps by the proposed algorithm. As we can see, the erroneous regions in the disparity maps become smoother, and discontinuity is well preserved by processing the prior modeling and color-based interpolation. The unreliable disparities of occlusion and textureless regions are corrected by the prior modeling. These results show that the prior modeling and color-based interpolation in the multi-resolution approach improve the accuracy of disparity estimation.

According to the experiments, the proposed algorithm estimates the disparity map in real-time with reasonable accuracy. Since there is room for further processing within the time limitation (30 fps), it is possible to improve the accuracy of the proposed algorithm.
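The DES and frame-rate definitions of this subsection amount to the following arithmetic. This is a small sketch with hypothetical function names; the frame rate is normalized to a 640×480 image with a disparity range of 60, as in the fps formula.

```python
def des(width, height, disparity_range, seconds):
    """Disparity estimations per second for one processed stereo pair."""
    return width * height * disparity_range / seconds

def vga_fps(des_value):
    """Equivalent frame rate for a 640x480 image and 60 disparities."""
    return des_value / (640 * 480 * 60)

# 30 frames/second at VGA with 60 disparities needs about 553 MDES.
required = des(640, 480, 60, 1.0 / 30.0)
print(round(required / 1e6))   # 553
```

Applied to the first row of Table III, `vga_fps(1268e6)` gives about 68.8, in line with the 68.7 fps listed there.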
TABLE III
Comparison of accuracy and speed. SAD in the fifth row is block matching of color components.
                                                     Teddy                 Cones
Algorithm                           fps (DES)        nonoc  all    disc    nonoc  all    disc
Rank Transform (RT only)            68.7 (1268 M)    14.4   23.2   32.1    7.0    17.1   18.7
RT + prior in a resolution          36.2 (675 M)     13.3   22.2   30.3    6.2    16.4   17.2
RT in the multi-resolution          50.3 (928 M)     9.2    18.2   21.3    5.8    15.1   14.9
RT + prior in the multi-resolution  31.2 (583 M)     7.7    16.7   18.2    5.0    14.2   13.5
SAD (11x11)                         67.9 (1252 M)    19.6   27.2   34.0    13.5   22.6   22.0
LSTDP [34]                          0.7 (14 M)       11.1   16.4   23.4    6.39   11.8   13.5
GORDP [35]                          3.3 (60 M)       9.03   16.8   18.4    13.1   20.1   20.1
HierarBP [29]                       accuracy         5.05   8.45   13.1    2.56   7.27   6.59
AdaptBP [19]                        accuracy         4.22   7.06   11.8    2.48   7.92   7.32

Fig. 6. Disparity maps by the proposed algorithm for Cones and Teddy images. (a) Proposed method (Rank Transform (RT) + prior model in the multi-resolution), (b) RT in the multi-resolution, (c) RT + prior model in a resolution, (d) RT only in a resolution.

B. Stereo Camera Systems
The proposed method is applicable to stereo camera systems for 3D display and object-based systems. Even though 3-D images have been available for some time, there have been no stereoscopic camera systems that obtain depth information (disparity maps) in real-time. This has delayed the spread of 3-D camera systems. Since the proposed stereo matching is designed for the usual PC environment, it is expected that various 3-D application systems with coupled webcams can be developed using the proposed algorithm.

To set up the stereo camera system, we use two usual PC cameras. We first calibrate and rectify the two camera images so that the two cameras are virtually parallel to each other. This means that the epipolar geometry is a horizontal line on the same scanline. Descriptor-based features such as SIFT or SURF [43] are exploited to find the geometric transform between the two images, and each image is warped to parallel alignment. Then, stereo matching is performed on the rectified stereo images to estimate depth information in the scene.

Fig. 7 shows examples of anaglyphs for 3D display on usual monitors. Red and blue components are separated based on the disparity information. Though the two rectified images can be directly displayed as anaglyphs, we exploit the disparity information for 3D display. Since the disparity information can be adjusted, we can make various depth fields for further spatial reality in the scene. The manipulation of depth fields from stereo matching is a main advantage for displaying 3D scenes. The users generate 3D video contents in real-time using depth-based anaglyph images.

Fig. 8 is an example of depth-based object separation. Two human objects are extracted and separated by using depth information from the disparity estimation. With the usual segmentation methods like background subtraction, it is almost impossible to separate multiple objects in a scene. The disparity information is exploited to extract the multiple objects in real-time. When we combine edge and background information with depth fields, the objects are more clearly extracted. The extracted objects are separated from the background and can be rendered with new background images for virtual reality systems [41], [42].

As these examples of stereoscopic systems show, real-time stereo matching and stereoscopic cameras can be applied to various applications. The users exploit depth information for 3D contents generation and object-based
interaction. In this sense, the proposed real-time stereo matching algorithm is very useful for usual PC-based applications. Our further work will focus on developing application systems using the real-time stereo camera and depth information. Depth-based 3D contents generation and object-based manipulation are good applications of a real-time stereoscopic system in the usual PC environment.

handling in graphic hardware is introduced in aggregating the matching errors, which improves the speed. The prior model reflects the smoothness of the disparity field and is implemented by a pixel-wise energy function. The disparity at each pixel is finally determined by minimizing the joint energy function, which combines the likelihood matching error with the prior energy model. This processing is performed in a hierarchical approach. The disparity interpolation is proposed using the reliability of the likelihood model and the color-based similarity of the neighborhood, which preserves the discontinuity of the disparity map. According to the experiments with Middlebury stereo images, the proposed method shows good estimation accuracy at more than 30 frames per second for 640x480 images. The proposed method is suitable for real-time stereo systems in the usual PC environment.

ACKNOWLEDGMENT
This research was supported by Basic Science Research
Program through the National Research Foundation of Korea
(NRF) funded by the Ministry of Education, Science and
Technology (2010-0007200).

REFERENCES
[1] D. Scharstein and R. Szeleiski, “A taxonomy and evaluation of dense
(a) (b) two frame stereo correspondence algorithm,” International Journal of
Fig. 7. Anaglyph for 3D display by separating red and blue components Computer Vision, vol. 47, no. 1/2/3, pp. 7-42, 2002
using disparity information. (a) Cones, (b) Teddy. [2] A. Redert, E. Hendricks, and J. Biemond, “Correspondence estimation
in image pairs,” IEEE Signal Proc. Magazine, vol. 16, no. 3, pp. 29-46,
1999.
[3] T. Kanade and M. Okutomi, “A stereo matching algorithm with an
adaptive window: Theory and experiments,” IEEE Trans. on Pattern
Analysis and Machine Intelligence, vol. 16, no. 9, pp. 920-932, 1994.
[4] Y. Boykov, O. Veksler, and R. Zabih, “A Variable Window Approach to
Early Vision,” IEEE Trans. Pattern Analysis and Machine Intelligence,
vol. 20, no. 12, pp. 1283-1294, 1998.
[5] O. Veksler, “Stereo correspondence with compact windows via
minimum ratio cycle,” IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 24, no. 12, pp. 1654-1660, 2002.
[6] O. Veksler, “Fast Variable Window for Stereo Correspondence using
Integral Images,” Proc. of IEEE Conf. Computer Vision and Pattern
Recognition, vol. 1, pp. 556-561, 2003.
[7] A. Fusiello, V. Roberto, and E. Trucco, “Efficient Stereo with Multiple
Windowing,” Proc. of IEEE Conf. Computer Vision and Pattern
Recognition, pp. 856-863, 1997.
[8] S. B. Kang, R. Szeliski, and C. Jinxjang, “Handling occlusions in dense
multi-view stereo,” Proc. of IEEE Conf. Computer Vision and Pattern
Recognition, vol. 1, pp. 103-110, 2001.
[9] T. Darrel, “A radial cumulative similarity transform for robust image
correspondence,” Proc. of IEEE Conf. Computer Vision and Pattern
Recognition, pp. 656-662, 1998.
[10] Y. Xu, D. Wang, T. Feng, and H.-Y. Shum, “Stereo computation using
Fig. 8. Examples of depth-based object separation. The objects are
radial adaptive windows,” Proc. of Int'l Conf. Pattern Recognition, vol.
extracted in real-time using depth and background information.
3, pp. 595-598, 2002.
[11] Yong Seok Heo, Kyong Mu Lee, Sang Uk Lee, “Robust Stereo
Matching Using Adaptive Normalized Cross Correlation,” IEEE Trans.
VII. CONCLUSION Pattern Analysis and Machine Intelligence, vol. 32, 2010.
[12] K. J. Yoon and I. S. Kweon, “Adaptive support weight approach for
This paper has proposed a real-time stereo matching correspondence search,” IEEE Trans. Pattern Analysis and Machine
algorithm using GPU programming. The proposed approach Intelligence, vol. 28, no.4, pp. 650-656, 2006.
first calculates the likelihood matching error based on rank [13] F. Tombari, S. Mattoccia, L. D. Stefano, “Segmentation-based adaptive
support for accurate stereo correspondence,” Proc. of IEEE Pacific-Rim
transform. The likelihood matching error is parallelized and Symposium on Image and Video Technology, vol. LNCS 4872, no.4, pp.
implemented in GPU programming. The adaptive memory 427-438, 2007.
[14] Ze-Nian Li and Gongzhu Hu, “Analysis of disparity gradient based
cooperative stereo,” IEEE Trans. on Image Processing, vol. 5, no. 11,
pp. 1493-1506, 1996.
[15] C. L. Zitnick and T. Kanade, “A cooperative algorithm for stereo
matching and occlusion detection,” IEEE Trans. on Pattern Analysis and
Machine Intelligence, vol. 22, no. 7, pp. 675-684, 2000.
[16] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy
minimization with graph cuts,” IEEE Trans. on Pattern Analysis and
Machine Intelligence, vol. 23, no. 11, pp. 1222-1239, 2001.
[17] V. Kolmogorov and R. Zabih, “Computing visual correspondence with
occlusions using graph cuts,” Proc. of IEEE Conf. Computer Vision, vol.
1, pp. 508-515, 2001.
[18] J. Sun, N. N. Zheng, and H. Y. Shum, “Stereo matching using belief
propagation,” IEEE Trans. on Pattern Analysis and Machine
Intelligence, vol. 25, no. 7, pp. 787-800, 2003.
[19] A. Klaus, M. Sormann, and K. Karner, “Segment-based stereo matching
using belief propagation and a self-adapting dissimilarity measure,”
Proc. of Int'l Conf. Pattern Recognition, vol. 3, pp. 15-18, 2006.
[20] D. Scharstein and R. Szeliski, “Stereo matching with non-linear
diffusion,” International Journal of Computer Vision, vol. 28, no. 2, pp.
155-174, 1998.
[21] S. H. Lee, Y. Kanatsugu, and J.-I. Park, “Stochastic diffusion for stereo
matching and line fields estimation,” International Journal of Computer
Vision, vol. 47, no. 1/2/3, pp. 195-218, 2002.
[22] M. Gong and Y. Yang, “Fast stereo matching using reliability-based
dynamic programming and consistency constraints,” Proc. of IEEE Conf.
Computer Vision, vol. 1, pp. 610-617, 2003.
[23] O. Veksler, “Stereo correspondence by dynamic programming on a
tree,” Proc. of IEEE Conf. Computer Vision and Pattern Recognition,
vol. 2, pp. 384-390, 2005.
[24] C. Kim, K. Lee, and S. Lee, “A dense stereo matching using two-pass
DP with generalized ground control points,” Proc. of IEEE Conf.
Computer Vision and Pattern Recognition, vol. 2, pp. 1075-1082, 2005.
[25] J. Sun, Y. Li, S. B. Kang, and H. Y. Shum, “Symmetric stereo matching
for occlusion handling,” Proc. of IEEE Conf. Computer Vision and
Pattern Recognition, vol. 2, pp. 399-406, 2005.
[26] Z. Wang and Z. Zheng, “A region based stereo matching algorithm
using cooperative optimization,” Proc. of IEEE Conf. Computer Vision
and Pattern Recognition, vol. 1, pp. 1-8, 2008.
[27] Z. L. Xu and J. Jia, “Stereo matching: an outlier confidence approach,”
Proc. of European Conf. Computer Vision, vol. LNCS 5305, pp. 775-787,
2008.
[28] Q. Yang, L. Wang, R. Yang, H. Stewenius, and D. Nister, “Stereo
matching with color-weighted correlation, hierarchical belief
propagation and occlusion handling,” IEEE Trans. on Pattern Analysis
and Machine Intelligence, vol. 31, no. 3, pp. 492-504, 2009.
[29] S. Srivatava, S. J. Ha, S. H. Lee, N. I. Cho, and S. U. Lee, “Stereo
matching using hierarchical belief propagation along ambiguity
gradient,” IEEE Int'l Conf. Image Processing, pp. 2085-2088, 2009.
[30] Q. Yang, R. Yang, J. Davis, and D. Nister, “Spatial-depth super
resolution for range images,” Proc. of IEEE Conf. Computer Vision and
Pattern Recognition, vol. 3, pp. 1-8, 2007.
[31] Z. Wang and Z. Zheng, “Near real-time reliable stereo matching using
programmable graphics hardware,” Proc. of IEEE Conf. Computer
Vision and Pattern Recognition, vol. 1, pp. 924-931, 2005.
[32] J. Congote, J. Barandiaran, I. Barandiaran, and O. Ruiz, “Real-time
dense stereo matching with dynamic programming,” Proc. of CEIG,
2009.
[33] L. Wang, M. Liao, M. Gong, R. Yang, and D. Nister, “High
quality real-time stereo using adaptive cost aggregation and
dynamic programming,” Proc. of 3DPVT, pp. 798-805, 2006.
[34] Yi Deng and Xueyin Lin, “A fast line segment based dense stereo
algorithm using tree dynamic programming,” Proc. European
Conf. Computer Vision, LNCS 3953, pp. 201-212, 2006.
[35] M. Gong and Y.-H. Yang, “Real-time stereo matching using
orthogonal reliability dynamic programming,” IEEE Trans. Image
Processing, vol. 16, no. 3, pp. 879-885, 2007.
[36] A. Hosni, M. Bleyer, and M. Gelautz, “Near real-time stereo with
adaptive support weight approaches,” Proc. of 3DPVT, 2010.
[37] C. K. Liang, C. C. Cheng, Y. C. Lai, L. G. Chen, and H. H. Chen,
“Hardware-efficient belief propagation,” Proc. of IEEE Conf.
Computer Vision and Pattern Recognition, pp. 80-87, 2009.
[38] Q. Yang, K.-H. Tan, and N. Ahuja, “A constant-space belief
propagation algorithm for stereo matching,” Proc. of IEEE Conf.
Computer Vision and Pattern Recognition, 2010.
[39] R. Zabih and J. Woodfill, “Non-parametric local transforms for
computing visual correspondence,” Proc. ECCV, pp. 151-158,
1994.
[40] J. Banks and M. Bennamoun, “Reliability analysis of the rank
transform for stereo matching,” IEEE Trans. on Systems, Man, and
Cybernetics, Part B: Cybernetics, vol. 31, no. 6, 2001.
[41] Sae Woon Ryu, Sang Hwa Lee, Sang Chul Ahn, and Jong-Il Park,
“Tangible video teleconference system using real-time image-based
relighting,” IEEE Trans. on Consumer Electronics, vol. 55,
no. 3, pp. 1162-1168, Aug. 2009.
[42] Sae Woon Ryu, Sang Hwa Lee, and Jong-Il Park, “Real-time 3D
Surface modeling for image-based relighting,” IEEE Trans. on
Consumer Electronics, vol. 55, no. 4, pp. 2431-2435, Nov. 2009.
[43] D. G. Lowe, “Distinctive image features from scale-invariant
keypoints,” International Journal of Computer Vision, vol. 60, pp.
91-110, 2004.

BIOGRAPHIES

Sang Hwa Lee received the B.S., M.S., and Ph.D.
degrees in electrical engineering from Seoul National
University, Seoul, Korea, in 1994, 1996, and 2000,
respectively. He was a visiting researcher at NHK
STRL in Tokyo, Japan, from 2000 to 2002. He has been with
BK21 Information Technology, Department of Electrical
Engineering, Seoul National University, since 2003,
where he is currently a BK research professor. His research interests include
image processing, stereoscopic systems, HCI, pattern recognition, and
computer vision.

Siddharth Sharma received his BE degree from MIT,
Manipal University in 2009. He was previously a research
trainee at Wise-Automotive, Korea, where he worked on
automotive vision systems. He is currently an assistant
engineer at the Institute of New Media and Communication,
Seoul National University, Korea. His research interests
lie in real-time vision system design.
