Prashan 2008
Research Online
1-1-2008
Farzad Safaei
University of Wollongong, [email protected]
Recommended Citation
Premaratne, Prashan and Safaei, Farzad: Feature based stereo correspondence using moment invariant
2008, 104-108.
https://ro.uow.edu.au/infopapers/1667
Research Online is the open access institutional repository for the University of Wollongong. For further information
contact the UOW Library: [email protected]
Feature based stereo correspondence using moment invariant
Abstract
Autonomous navigation is seen as a vital tool in harnessing the enormous potential of Unmanned Aerial
Vehicles (UAVs) and small robotic vehicles for both military and civilian use. Even though laser-based
scanning solutions for Simultaneous Localization And Mapping (SLAM) are considered the most reliable for
depth estimation, they are not feasible for use in UAVs and small land-based vehicles due to their
physical size and weight. Stereovision is considered the best approach for any autonomous navigation
solution, as stereo rigs are lightweight and inexpensive. However, stereoscopy, which
estimates depth information from pairs of stereo images, can still be computationally expensive
and unreliable. This is mainly because some of the algorithms used in successful stereovision solutions
have high computational requirements that cannot be met by small robotic vehicles. In our research, we
implement a feature-based stereovision solution using moment invariants as a metric to find
corresponding regions in image pairs. This reduces the computational complexity and improves the
accuracy of the disparity measures, which is significant for use in UAVs and small
robotic vehicles.
Keywords
Feature, based, stereo, correspondence, using, moment, invariant
Disciplines
Physical Sciences and Mathematics
Publication Details
Premaratne, P. & Safaei, F. 2008, ''Feature based stereo correspondence using moment invariant'', 4th
International Conference on Information and Automation for Sustainability. Sustainable Development
Through Effective Man-Machine Co-Existence, IEEE Region 10 and ICIAFS, Colombo, Sri Lanka, pp.
104-108.
Abstract
Autonomous navigation is seen as a vital tool in harnessing the enormous potential of Unmanned Aerial Vehicles (UAVs) and small robotic vehicles for both military and civilian use. Even though laser-based scanning solutions for Simultaneous Localization And Mapping (SLAM) are considered the most reliable for depth estimation, they are not feasible for use in UAVs and small land-based vehicles due to their physical size and weight. Stereovision is considered the best approach for any autonomous navigation solution, as stereo rigs are lightweight and inexpensive. However, stereoscopy, which estimates depth information from pairs of stereo images, can still be computationally expensive and unreliable. This is mainly because some of the algorithms used in successful stereovision solutions have high computational requirements that cannot be met by small robotic vehicles. In our research, we implement a feature-based stereovision solution using moment invariants as a metric to find corresponding regions in image pairs. This reduces the computational complexity and improves the accuracy of the disparity measures, which is significant for use in UAVs and small robotic vehicles.

Keywords
moment invariants, feature-based stereo correspondence, sum of squared difference (SSD)

I. INTRODUCTION
Stereo vision is a mechanism for obtaining depth information from digital images. The challenge in stereovision is how to find corresponding points in the left image and the right image, known as the correspondence problem. Once a pair of corresponding points is found, the depth can be computed using triangulation. There are two prominent approaches to finding such corresponding pairs, namely area-based and feature-based techniques. In the area-based techniques, every pixel in a designated area of one image is compared with the pixels in the same row of the other image. This is done with a few constraints, such as maximum disparity, to avert false matches. Some of the well-known techniques in this approach are Hierarchical Block Matching [1], Census [2], Correlation Matching [3-4] and Zitnick-Kanade (Cooperative Algorithm for Stereo Matching and Occlusion Detection) [5-6] algorithms. The feature-based methods rely on finding special features in corresponding pairs and may result in fewer depth values, lowering the computational complexity.

Our approach is aimed at controlling small robotic vehicles using stereovision for depth calculation. This depth information will be used in control algorithms to detect and avoid obstacles. If this depth information is to be useful, it needs to be estimated in real time, which requires the stereovision algorithm to be computationally inexpensive.

Given our success in using moment invariants for recognizing hand gestures, moment invariants can be of great use in finding corresponding regions in stereo pairs [7-8]. Moment invariants are invariant to rotation, scale and shift. The rotation invariance is especially beneficial to the stereo correspondence problem, as any misalignment or non-flat ground conditions can create a slightly rotated version of the scene in one of the cameras. In our approach, we rely on edge-corner detection algorithms such as Harris corner detection [8] to produce reliable feature points. This will result in fewer points of interest compared to area-based techniques [9-18]. An image can be separated into a collection of blocks, each of which can be marked as a candidate or not depending on whether it contains corners (or edges). These blocks can be matched with the help of moment invariants, and the disparity of the identified features can then be calculated simply.

In this paper, Section II details the general stereo matching approaches and the moment invariant based technique is presented in detail in Section III. This is followed by our experimental results and the conclusion.

II. STEREO MATCHING APPROACHES
In area-based stereo matching, for a given pair of stereo images, the corresponding points are supposed to lie on the epipolar lines [19]. Area-based techniques rely on the assumption of surface continuity, and often involve some correlation measure to construct a disparity map with an estimate of disparity for each point visible in the stereo pair. Area-based techniques produce much denser disparity maps, which is critical in obstacle detection and avoidance.

Since corresponding points are the same real point in the captured scene projected into the left and right images, we can assume that their surroundings in both pictures would be quite similar. Area-based methods use this similarity for detecting corresponding points [9-13]. It is computed from the difference in local neighborhoods (usually a constant size
Figure panel labels: Census Algorithm; Zitnick-Kanade Stereo Algorithm
Figure 2. Left and Right stereo images with 'corners' marked using Harris corner detection
Figure 3. Left image is divided into blocks of size 20x20 pixels

pairs with subpixel accuracy. They will also include object-dependent constraints in the solution of the correspondence problem, such as 'corners' when using the Harris corner detection algorithm.

III. MOMENT INVARIANT BASED STEREO MATCHING
Using invariant moments to locate corresponding features in stereo pairs will be less computationally intensive, as the number of block comparisons depends on the disparity constraint as well as the number of features. In our approach, 'corners' are used as features, as shown in Fig. 2. The left image is then divided into 20x20 pixel blocks, and each block is marked according to whether it contains features (Fig. 3). The blocks containing features are used to calculate the first 4 moment invariants using Equations (3) to (6). Even though up to 7 such moment invariants can be calculated, 4 are adequate to uniquely represent a square. Using epipolar and disparity constraints, we can now evaluate the adjoining 20x20 pixel blocks in the Right image for matches with the blocks containing the features. The 'closeness' of these moments is decided using a threshold that depends on the image scenery. When such blocks are identified, the simple depth calculation formula can be used to calculate the depth to the identified feature as follows:

$$d = b \frac{f}{D}$$

where d is the depth of the object from the camera plane, f is the focal length of the camera, D is the disparity and b is the baseline distance. Here the underlying assumption is that the epipolar lines run parallel to the image lines, so that corresponding points lie on the same image lines.

A. Moment Invariants
The moment invariants algorithm is known as one of the most effective methods to extract descriptive features for object recognition applications. The algorithm has been widely applied in the classification of aircraft, ships, ground targets, etc. [20, 21]. Essentially, the algorithm derives a number of self-characteristic properties from a binary image of an object. These properties are invariant to rotation, scale and translation. Let f(i,j) be a point of a digital image of size M×N (i = 1, 2, …, M and j = 1, 2, …, N). The two-dimensional moments and central moments of order (p + q) of f(i,j) are defined as:

$$m_{pq} = \sum_{i=1}^{M} \sum_{j=1}^{N} i^{p} j^{q} f(i,j) \qquad (1)$$

$$U_{pq} = \sum_{i=1}^{M} \sum_{j=1}^{N} (i - \bar{i})^{p} (j - \bar{j})^{q} f(i,j) \qquad (2)$$

where $\bar{i} = m_{10}/m_{00}$ and $\bar{j} = m_{01}/m_{00}$. The first four moment invariants are:

$$\varphi_1 = \eta_{20} + \eta_{02} \qquad (3)$$

$$\varphi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \qquad (4)$$

$$\varphi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \qquad (5)$$

$$\varphi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \qquad (6)$$

where $\eta_{pq}$ are the normalized central moments defined by:

$$\eta_{pq} = \frac{U_{pq}}{U_{00}^{r}}$$

with $r = (p + q)/2 + 1$ in the standard formulation of moment invariants.

B. Example of Invariant Properties
Fig. 4 shows an image containing the letter 'A', together with rotated and scaled, translated, and noisy versions of it. Their respective moment invariants, calculated using Equations (1) to (6) and three further equations not defined in this paper, are shown in Table 1. It is clear from Table 1 that the algorithm produces effectively the same result for the first three versions of the letter 'A' despite the different transformations applied to them. Only one value, φ1, displays a small discrepancy of 5.7%, due to the difference in scale. The values of φ2, φ3, φ4, φ5, φ6 and φ7 are effectively the same for the three figures. The last letter, however, reveals the drawback of the algorithm: it is susceptible to noise. Specifically, the added noisy spot in the letter has changed the entire set of moment invariants. This drawback suggests that moment invariants give the best results when applied to noise-free images. Since the algorithm is robust to these transformations, a simple classifier can exploit the moment invariant values to differentiate and recognize the letter 'A' from other letters. In this paper, we have used the first 4 moment invariants to compare similar regions rather than calculating the similarity between every pixel using correlation.
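The definitions in Equations (1) to (6) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `moment_invariants` and `rotate90`, the synthetic L-shaped test block, and the exponent r = (p + q)/2 + 1 are assumptions made for illustration only.

```python
def moment_invariants(block):
    """Return [phi1, phi2, phi3, phi4] for a 2-D block given as a list of rows."""
    M, N = len(block), len(block[0])
    # Pixels with 1-based indices (i, j), following the convention of Eq. (1).
    pixels = [(i + 1, j + 1, block[i][j]) for i in range(M) for j in range(N)]
    m00 = sum(f for _, _, f in pixels)
    if m00 == 0:
        raise ValueError("empty block: moments are undefined")
    i_bar = sum(i * f for i, _, f in pixels) / m00  # m10 / m00
    j_bar = sum(j * f for _, j, f in pixels) / m00  # m01 / m00

    def eta(p, q):
        # Central moment U_pq of Eq. (2), normalized by U_00 ** r.
        u_pq = sum((i - i_bar) ** p * (j - j_bar) ** q * f for i, j, f in pixels)
        r = (p + q) / 2 + 1  # assumed exponent (standard choice)
        return u_pq / m00 ** r  # U_00 equals m_00

    phi1 = eta(2, 0) + eta(0, 2)                                                # Eq. (3)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2                    # Eq. (4)
    phi3 = (eta(3, 0) - 3 * eta(1, 2)) ** 2 + (3 * eta(2, 1) - eta(0, 3)) ** 2  # Eq. (5)
    phi4 = (eta(3, 0) + eta(1, 2)) ** 2 + (eta(2, 1) + eta(0, 3)) ** 2          # Eq. (6)
    return [phi1, phi2, phi3, phi4]


def rotate90(block):
    """Rotate a block by 90 degrees (transpose, then reverse the row order)."""
    return [list(row) for row in zip(*block)][::-1]


# Hypothetical 20x20 binary block containing an L-shaped 'feature'.
block = [[0] * 20 for _ in range(20)]
for i in range(4, 12):
    for j in range(5, 8):
        block[i][j] = 1
for j in range(5, 14):
    block[10][j] = block[11][j] = 1

original = moment_invariants(block)
rotated = moment_invariants(rotate90(block))
# The largest difference between the two descriptors should be at rounding level.
print(max(abs(a - b) for a, b in zip(original, rotated)))
```

A 90-degree rotation is used here because it maps the pixel grid onto itself exactly; for arbitrary rotation angles, resampling introduces small differences in the computed values.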
Figure 4. Letter 'A' in different orientations
TABLE I. MOMENT INVARIANT VALUES FOR 'A'

In the algorithm, each block containing at least one 'feature' is compared with the other blocks in the same horizontal region of the other image. Since only 4 values are compared per block, against 400 such comparisons (one per pixel of a 20x20 block) in the case of correlation, the computational saving will be immense and will lead to faster depth computations. The algorithm is moderately easy to implement and requires only a moderate computational effort from the CPU. Feature extraction, as a result, can proceed rapidly and efficiently. The summary of the proposed technique is presented in Figure 5.

If certain blocks do not contain any features, the search area in the other image (Right) can be restricted using the maximum disparity constraint. This will further cut down the computational requirements as opposed to area-based correlation techniques.

realtime applications that we are interested in. Fig. 5 summarizes the major steps in the proposed algorithm.

The major advantage of the local approaches presented here is speed and suitability for hardware implementation. Global optimization algorithms commonly require 2 to 3 orders of magnitude more time than even the software implementations of local methods.

V. SUMMARY
In many respects, feature-based algorithms are established as the most robust way to implement stereo vision for industrial-type stereo problems. The advantages offered by using features are that feature-based representations contain desirable statistical properties and provide algorithmic flexibility to the programmer. The flexibility is that algorithmic constraints can be applied explicitly to the data structures rather than implicitly, as with area-based correlation techniques. In particular, the use of 'corners' leads to algorithms which are as locally accurate as the precision to which the edges can be extracted.

Even though feature-based techniques do not produce dense disparity maps, their values are more accurate than those of the area-based techniques. Some of the reasons for the errors of area-based techniques are that the presence of shadows produces erroneous results; some surfaces reflect light non-uniformly; backgrounds are usually flat single-colored surfaces; and some parts of the scene are occluded in one of the images.
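The block comparison and depth calculation described in Section III can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the names `match_feature_blocks` and `depth_from_disparity` are hypothetical, the descriptors are assumed to be precomputed lists of 4 moment invariants per block, 'closeness' is taken to be a squared Euclidean distance below a threshold, and the search is quantized to whole blocks (a finer search would slide the window pixel by pixel). All numeric values are made up.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth d = b * f / D for a rectified pair (epipolar lines along image rows)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return baseline_m * focal_px / disparity_px


def match_feature_blocks(left_desc, right_desc, has_feature, max_disp_blocks, threshold):
    """Match left-image feature blocks against right-image blocks in the same block row.

    left_desc, right_desc: grids [row][col] of 4-value moment invariant descriptors.
    has_feature: grid of booleans marking left blocks that contain a corner.
    Returns {(row, col): matched_right_col} for blocks with a match below threshold.
    """
    matches = {}
    for r, row in enumerate(left_desc):
        for c, desc in enumerate(row):
            if not has_feature[r][c]:
                continue  # only blocks containing features are compared
            best_col, best_dist = None, threshold
            # In a rectified pair the match lies at the same or a smaller column,
            # and the maximum disparity constraint bounds how far to search.
            for c2 in range(max(0, c - max_disp_blocks), c + 1):
                dist = sum((a - b) ** 2 for a, b in zip(desc, right_desc[r][c2]))
                if dist < best_dist:
                    best_col, best_dist = c2, dist
            if best_col is not None:
                matches[(r, c)] = best_col
    return matches


# Hypothetical descriptors for one row of three 20x20 blocks in each image.
left = [[[0.20, 0.01, 0.0, 0.0], [0.35, 0.05, 0.002, 0.001], [0.18, 0.0, 0.0, 0.0]]]
right = [[[0.35, 0.05, 0.002, 0.001], [0.20, 0.01, 0.0, 0.0], [0.18, 0.0, 0.0, 0.0]]]
features = [[False, True, False]]

matches = match_feature_blocks(left, right, features, max_disp_blocks=1, threshold=1e-4)
block_size = 20  # pixels per block side, as in the paper
for (r, c), c2 in matches.items():
    disparity = (c - c2) * block_size
    if disparity > 0:
        # Assumed camera values: focal length 400 px, baseline 0.5 m.
        print((r, c), depth_from_disparity(disparity, focal_px=400.0, baseline_m=0.5))
```

Each candidate comparison touches only 4 numbers per block, which is where the saving over a 400-pixel correlation window comes from; the threshold would need tuning per scene, as noted in Section III.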
Figure 6. Comparison of Moment Invariant technique with correlation method using SSD and SAD metrics

We managed to demonstrate that feature-based techniques relying on moment invariants for matching can process a frame in the order of tenths of seconds in software implementations. This implies that the algorithm can comfortably reach higher video rates using DSP and FPGA implementations [22]. At the moment, there is no technique for achieving simultaneously the high quality range obtained from global optimization with the fast run-times of local schemes.

REFERENCES
[1] Q. Koschan, V. Rodehorst and K. Spiller, "Color stereo vision using hierarchical block matching and active color illumination", Proc. 13th Int. Conf. Pattern Recog., Vol. 1, pp. 835-839, 1996.
[2] R. Zabih and J. Woodfill, "Non-parametric local transforms for computing visual correspondence", Third European Conf. Computer Vision, 1994.
[3] J. C. M. van Beek and J. J. Lukkien, "A parallel algorithm for stereo vision based on correlation", Proc. 3rd Int. Conf. High Performance Computing, 1996.
[4] H. Hirschmüller, P. R. Innocent, J. M. Garibaldi, "Real-Time Correlation-Based Stereo Vision with Reduced Border Errors", Int. Journal of Computer Vision, Vol. 47(1/2/3), pp. 229-246, 2002.
[5] C. Zitnick and T. Kanade, "A Cooperative Algorithm for Stereo Matching and Occlusion Detection", tech. report CMU-RI-TR-99-35, Robotics Institute, Carnegie Mellon University, October 1999.
[6] C. Zitnick and T. Kanade, "A Cooperative Algorithm for Stereo Matching and Occlusion Detection", IEEE Trans. Pattern Analysis and Machine Intellig., Vol. 22-7, pp. 675-684, 2000.
[7] P. Premaratne and Q. Nguyen, "Consumer electronics control system based on hand gesture moment invariants", IET Computer Vision, vol. 1-1, pp. 35-41, 2007.
[8] P. Premaratne, F. Safaei and Q. Nguyen, "Moment Invariant Based Control System Using Hand Gestures", in Intelligent Computing in Signal Processing and Pattern Recognition, Lecture Notes in Control and Information Sciences, Springer Berlin / Heidelberg, Vol. 345/2006, pp. 322-333, 2006.
[9] C. Harris and M. Stephens, "A Combined Corner and Edge Detector", Proc. 4th Alvey Vision Conference, pp. 147-151, 1988.
[10] L. Di Stefano, M. Marchionni, S. Mattoccia, G. Neri, "A Fast Area-Based Stereo Matching Algorithm", 15th IAPR/CIPRS International Conference on Vision Interface, Calgary, Canada, May 27-29, 2002.
[11] A. Fusiello, V. Roberto, E. Trucco, "Experiments with a new Area-Based Stereo Algorithm", International Conference on Image Analysis and Processing, Florence, 1997.
[12] S. T. Barnard and W. B. Thompson, "Disparity Analysis of Images", IEEE Trans. of PAMI, vol. PAMI-2, pp. 4, 1980.
[13] M. J. Hannah, "Bootstrap Stereo", Proc. Image Understanding Workshop, 1980.
[14] M. J. Hannah, "SRI's Baseline Stereo System", Proc. of DARPA Image Understanding Workshop, pp. 149-155, 1985.
[15] M. J. Hannah, "A System for Digital Stereo Image Matching", Photogrammetric Engineering and Remote Sensing, pp. 1765-1770, 1989.
[16] P. Burt and B. Julesz, "Modifications of the Classical notion of Panum's Fusional Area", Perception, vol. 9, pp. 671-682, 1980.
[17] R. A. Lane, N. A. Thacker, N. L. Seed, "Stretch-Correlation as a Real-Time Alternative to Feature Based Stereo Matching Algorithms", Image and Vision Comp. Journal, vol. 12, No. 4, May 1994.
[18] R. A. Lane, N. A. Thacker, N. L. Seed, P. A. Ivey, "A Stereo Vision Processor", Proc. of IEEE Custom Integrated Circuits Conference, 1995.
[19] J. Weng, "Camera calibration with distortion models and accuracy evaluation", IEEE Trans. Patt. Anal. Machine Intel., Vol. 14, pp. 965-980, 1992.
[20] P. Premaratne, "ISAR ship classification; An alternative approach", CSSIP-DSTO Internal Publication, Australia, March 2003.
[21] Q. Zhongliang and W. Wenjun, "Automatic ship classification by superstructure moment invariants and two-stage classifier", ICCS/ISITA '92 Comm. on the Move, pp. 544-547, 1992.
[22] J. van der Horst, R. van Leeuwen, H. Broers, R. Kleihorst, P. Jonker, "A Real-Time Stereo SmartCam, using FPGA, SIMD and VLIW", Proc. of the 2nd Workshop on Applications of Computer Vision, May 12, 2006.