Speed-up Template Matching through Integral Image based Weak Classifiers
Received Jul 02, 2013. Revised Jan 07, 2014. Accepted Jan 07, 2014.
Tirui Wu [email protected]
Ford Motor Research & Engineering (Nanjing) Co., Ltd., Nanjing, China
Alexander Toet [email protected]
TNO, Kampweg 5, 3769DE, Soesterberg, The Netherlands
Abstract
Template matching is a widely used pattern recognition method, especially in industrial
inspection. However, the computational costs of traditional template matching increase
dramatically with both template and scene image size. This makes traditional template
matching less useful for many (e.g. real-time) applications. In this paper, we present a
method to speed-up template matching. First, candidate match locations are determined
using a cascaded blockwise computation of integral image based binary test patterns.
Then, traditional template matching is applied at the candidate match locations to de-
termine the best overall match. The results show that the proposed method is fast and
robust.
Keywords: Template matching, integral images, binary test patterns.
www.jprr.org
1. Introduction
Matching a template sub-image into a given image is one of the most common techniques
used in signal and image processing [1], and widely used in many fields related to com-
puter vision and image processing, such as image retrieval [2], image recognition [3], image
registration [4], object detection [5] and stereo matching [6].
Traditional template matching consists of sliding the template over the search area and,
at each position, calculating a correlation (or distortion) measure that estimates the degree of
(dis)similarity between the template and the image. The maximum-correlation
(or minimum-distortion) position is then taken to represent the instance of the template in
the image under examination, with a threshold on the (dis)similarity measure allowing
for rejection of poor matches. The typical distortion measures used in template matching
algorithms are the sum of absolute differences (SAD) and the sum of squared differences
(SSD), while normalized cross-correlation (NCC) is by far the most widely used correlation
measure.
The NCC value ρ representing the similarity of a template image T(i, j) of size m × n at
the location (x, y) in a scene image I of size M × N is defined as

ρ(x, y) = \frac{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (I(x+i, y+j) − µ_I)(T(i, j) − µ_T)}{\sqrt{\sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (I(x+i, y+j) − µ_I)^2 \cdot \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (T(i, j) − µ_T)^2}}   (1)

where µ_I and µ_T denote the mean values of the current image window and the template, respectively.
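As a plain-numpy sketch (an illustration of Eq. (1), not the authors' implementation; the function name is ours), the NCC at a single location can be computed as:

```python
import numpy as np

def ncc(scene, template, x, y):
    """NCC (Eq. 1) between template T and the same-size scene window
    whose top-left corner is at (x, y)."""
    m, n = template.shape
    window = scene[x:x + m, y:y + n].astype(np.float64)
    T = template.astype(np.float64)
    dw = window - window.mean()          # I(x+i, y+j) - mu_I
    dt = T - T.mean()                    # T(i, j) - mu_T
    denom = np.sqrt((dw ** 2).sum() * (dt ** 2).sum())
    return (dw * dt).sum() / denom if denom > 0 else 0.0
```

Sliding this over every position of an M × N scene costs O(MNmn) operations, which is exactly the expense the method proposed below reduces.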
c 2014 JPRR. All rights reserved. Permissions to make digital or hard copies of all or part of
this work for personal or classroom use may be granted by JPRR provided that copies are not
made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. To copy otherwise, or to republish, requires a fee and/or
special permission from JPRR.
Due to the complexity of the matching function and the large number of locations to check,
traditional template matching is computationally expensive. To speed up the basic approach,
two categories of approaches have been proposed: (i) efficient representation of
both the template and the image so that matching can be done quickly, and (ii) fast search
(reduced precision) techniques to reduce the number of matching operations between the
target and the image. A well-known example of the first approach is the use of the fast
Fourier transform (FFT) to simultaneously calculate the correlation between the template
and every part of the input image as the product of their Fourier transforms [7]. A typi-
cal example of the second approach is the use of multi-resolution schemes [8, 9] to locate
a coarse-resolution template into a coarse-resolution representation of the image and then
refining the search at higher resolution levels only at locations where the low-resolution
similarity is high. However, these methods still have limited practical value, due to algorithmic
complexity, strict conditions for application, and sensitivity to local variations in luminance,
scale, rotation and distortion.
Here we propose a new template matching scheme that combines computational efficiency
with low sensitivity to local image variations, using blockwise computed binary test patterns
based on integral images to determine candidate match locations over the image support
[10]. Integral images have previously been applied to speed-up template matching, e.g.
to accelerate the computation of the NCC [11, 12], and to efficiently compute polynomial
approximations [13] or characteristic features [14, 15] of image patches. Also, it has been
demonstrated that template matching can be performed quickly and robustly by computing and
matching image features in a blockwise fashion from integral images [16]. The contribution
of the proposed approach is that it enables fast and robust assessment of candidate match
locations through a cascaded blockwise computation of weak integral image based binary
test patterns (classification ability of an individual binary test pattern is weak). This ap-
proach speeds up conventional template matching by quickly discarding background image
regions that are unlikely to contain the template. The use of integral images allows fast
computation of the individual weak block binary test patterns, while their cascaded compu-
tation allows early termination of the computational process if the initial estimates suggest
that the location corresponds to a poor match.
The rest of this paper is organized as follows. First we introduce the concept of integral
images. Next we describe our new fast template matching scheme. Then, we present the
results of some computational experiments that demonstrate the efficiency and robustness
of our new method. Finally, we end with some conclusions.
2. Integral images
This section describes the concept of integral images that was introduced by Viola and
Jones [10] in computer vision and which is based on prior work in computer graphics [17].
Integral images (also known as summed-area tables) allow fast computation of rectangular
image features since they enable the summation of image values over any rectangle image
region in constant time. In the next section we will show that integral images can also serve
to speed up template matching for rectangular shaped templates. Let i(x, y) be the
original image value at location x, y. The integral image I(x, y) is an intermediate image
representation with a value at image location x, y that is equal to the sum of the pixel values
above and to the left of x, y including the value at the location x, y itself:
I(x, y) = \sum_{x'=0}^{x} \sum_{y'=0}^{y} i(x', y')   (3)
The integral image can be computed in a single pass over the original image, using the
following pair of recursive formulas [10]:

s(x, y) = s(x, y − 1) + i(x, y),   I(x, y) = I(x − 1, y) + s(x, y)   (4)

where s(x, y) is the cumulative row sum, with s(x, −1) = 0 and I(−1, y) = 0. Once the
integral image is available, the sum of the image values over any rectangle [x_a, x_b] × [y_a, y_b]
can be obtained with only four array references:

\sum_{x=x_a}^{x_b} \sum_{y=y_a}^{y_b} i(x, y) = I(x_b, y_b) + I(x_a − 1, y_a − 1) − I(x_a − 1, y_b) − I(x_b, y_a − 1)   (5)
Integral images have been widely used to speed up the computation of region-based statis-
tical measures, such as area sums [18], covariance [19, 20], and co-occurrence [21] and have
successfully been applied to texture mapping [17], the detection of features [22], faces [10],
humans [23], and objects [24], stereo correspondence [25], and adaptive thresholding [26].
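Eqs. (3)–(5) can be sketched in numpy as follows (a minimal illustration; the function names are ours, and `cumsum` performs the single-pass accumulation):

```python
import numpy as np

def integral_image(img):
    """Integral image I (Eq. 3): I[x, y] = sum of img over rows 0..x, cols 0..y.
    Computed in a single pass via cumulative sums."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(I, xa, ya, xb, yb):
    """Sum of the original image over [xa, xb] x [ya, yb] (Eq. 5):
    three additions/subtractions and at most four array accesses."""
    s = I[xb, yb]
    if xa > 0:
        s -= I[xa - 1, yb]
    if ya > 0:
        s -= I[xb, ya - 1]
    if xa > 0 and ya > 0:
        s += I[xa - 1, ya - 1]
    return s
```

The guards handle rectangles touching the top or left border, where Eq. (5) would index I at −1; a common alternative is to pad the integral image with a zero row and column.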
Fig. 1: (a) Original image with (b) its integral image representation.
Fig. 2: Using integral images it takes only three additions and four memory accesses to calculate the sum
of intensities over a rectangular image region of any size (S = A-B-C+D).
is computed over the current image window in the same order in which the weak binary
test patterns c_i^T were computed for the template image (i.e. by partitioning the M_T × N_T
local image window into k × l rectangular blocks). To reduce the overall computational
costs of the matching process, classification is performed in a cascaded fashion: each local
weak image block binary test pattern is first compared to the corresponding weak template
block binary test pattern c_i^T, before computing the next one (see Fig. 3). If c_i = c_i^T then
the next local weak image block binary test pattern is computed, else the computation of
local weak image block binary test patterns is terminated and the position of the current
image window is shifted one step further. If all weak local image block binary test patterns
and template block binary test patterns are the same, then the current window passes the
overall (strong) classification and its position is added to a list of candidate target positions.
In practice, the requirement of identity between the complete set of template and image
weak block binary test patterns is too strict since there will typically be rotation, scale or
distortion differences between the template and the image content. The result would be
that no scan location would pass all binary test patterns and no candidate matches would
be found. To include some tolerance for small image variations in the matching process we
introduce a limit parameter LN E representing the maximal number (the limit) of instances
in which the corresponding weak image- and template binary test patterns are allowed to
be different (Not Equal ), i.e. the maximal number of weak binary test pattern rejections
that is allowed before final overall rejection occurs. Thus, the current position of the sliding
template window is added to the list of candidate matches if it is rejected by at most LN E
weak binary test patterns, or equivalently, if it passes at least k × l − LN E weak binary test
patterns. LN E value range is [0, k × l], when LN E equals k × l, the current scanned posi-
tion will pass all the tests which reduces the proposed method to the traditional template
matching.
In the second phase of the template matching method proposed here the overall best
match is determined by applying conventional template matching to the set of candidate
matches identified in the first phase.
In many practical situations, such as industrial and medical environments, images are often
captured under stable illumination conditions. As a result, the difference between the mean
grayscale value of the template and the mean of the corresponding target area in the scene
will be small. This allows a further reduction of the computational effort: first compare
the mean of the template image with the mean of the current image window, and skip further
inspection if this difference exceeds a prespecified intensity threshold T; else, test the current
location further by computing the cascade of weak block binary test patterns.
The implementation of the template matching process proposed in this study involves the
following steps:
1. Compute integral images for both the scene and template images.
2. Compute the set of weak template block binary test patterns c_i^T, i ∈ {1, . . . , k × l}, for
the template image.
3. Slide an inspection window (with the size of the template image) to the next position
in the scene.
4. If the difference between the mean value of the local window and the mean value of
the template exceeds a given threshold T go to step 3, else continue.
5. Calculate the weak image block binary test patterns c_i, i ∈ {1, . . . , k × l}, over the window
support and compare them to the corresponding weak template block binary test
patterns, while the number of block rejections is less than or equal to a given limit
L_NE.
6. If the total number of block rejections does not exceed L_NE assign a candidate status
to the current image window, else reject the current window position as a candidate
target location.
7. Repeat steps 3–6 until the entire scene image has been scanned.
8. Determine the overall best match by applying conventional template matching to the
set of candidate target locations.
Fig. 3: The computation of a strong template binary test pattern as a cascade of weak block binary test
patterns.
4. Experiments
In this section we investigate the sensitivity of the proposed template matching process
to variations in the limit parameter L_NE representing the maximal allowed number of
block rejections, and we investigate the efficiency of the new method by comparing its
performance with that of traditional and FFT based template matching.
6. Efficiency
To give an impression of the computational efficiency of the proposed method, we compare
its performance both with traditional template matching and with an FFT based tem-
plate matching method (http://docs.opencv.org), implemented by the OpenCV function
matchTemplate which has been optimized using Intel’s SSE (Streaming SIMD Extensions:
http://software.intel.com) instructions. Figure 5 shows a template image (Fig.5a) and the
image from which it was cropped (Fig.5b).
Fig. 5: (a) A template image and (b) the scene from which it was taken.
This template was matched both to the scene from which it was taken (Fig.6a) and to
9 other images created by cropping windows with different orientations and positions from
the original scene (Fig.6b-j).
Table 2 lists the results of both template matching procedures for the 10 images shown
in Figure 6.
Table 2: Performance of the proposed method relative to traditional and FFT based template matching.
Image   Actual position   Traditional/FFT match   Proposed match   t_trad (ms)   t_FFT (ms)   t_new (ms)   Speedup vs trad.   Speedup vs FFT
d       (84, 199)         (12, 270)               (97, 161)        1293.91       23.51        1.20         1074.54            19.53
e       (202, 241)        (203, 224)              (207, 221)       1293.09       23.50        1.44         895.29             16.27
f       (216, 244)        (211, 264)              (202, 259)       1287.68       23.42        1.51         854.75             15.55
g       (151, 243)        (153, 235)              (151, 234)       1292.35       23.46        1.59         810.47             14.71
h       (154, 297)        (150, 316)              (145, 317)       1281.57       23.36        1.57         815.91             14.87
i       (228, 293)        (227, 294)              (226, 293)       1278.37       23.60        1.59         801.51             14.8
j       (110, 320)        (4, 288)                (143, 265)       1293.53       23.65        0.94         1368.84            25.03
Fig. 6: Match results of the proposed method (red) compared to conventional (green) and FFT based (blue)
template matching for the template from Fig.5a and for different orientations and sections of the original
scene (Fig.5b). Note that conventional and FFT based template matching yield exactly the same template
positions (i.e. the green and blue windows coincide).
The coordinates represent the position of the top-left corner of the template. Listed
are the actual position coordinates and the results of the three template matching methods
investigated (traditional matching, FFT based matching, and the proposed method).
Coordinates in bold indicate incorrect matches. These results show that restricting traditional
template matching to a set of candidate target locations determined from integral
image based weak block binary test patterns that are computed in a cascaded fashion can
reduce computation time by up to three orders of magnitude. Also, the proposed method
yields a speed-up of at least one order of magnitude relative to the FFT based method. In
some cases (Fig.6d and 6j) both traditional and FFT based template matching fail to find
reasonable matches (when the scene image is rotated there is no longer an element-wise
correspondence to the template image that yields a global maximum of the NCC, and local
maxima at other locations will be taken as matches), while the new approach yields the
correct target position because it rejects these locations as false matches already in the first
candidate target classification stage.
7. Conclusions
In this paper we proposed a new approach to speed up template matching. Integral images
are used to efficiently compute local (image and template) binary test patterns in a cascaded
and patchwise fashion, which allows early termination of the computational process if the
initial estimates suggest that the local image region corresponds to a poor match. The
present results show that the proposed method can provide up to three orders of magnitude
speedup over the traditional computation of normalized cross correlation, and up to an
order of magnitude improvement over a state-of-the-art FFT based method. Furthermore,
it is relatively robust to local image distortions. A limitation of the proposed method is
that the parameter L_NE cannot be determined adaptively or based on the content
of the input template image. We plan to address this issue in a future study.
References
[1] S. Omachi and M. Omachi, Fast template matching with polynomials, IEEE Transactions on
Image Processing, vol. 16(8), 2007, pp. 2139-2149.
[2] A. Del Bimbo and P. Pala, Visual image retrieval by elastic matching of user sketches, IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 19(2), 1997, pp. 121-132.
[3] H. Peng, L. Fulmi, and C. Zheru, Document image recognition based on template matching of
component block projections, IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 25(9), 2003, pp. 1188-1192.
[4] Y. Bentoutou, N. Taleb, K. Kpalma, and J. Ronsin, An Automatic Image Registration for
Applications in Remote Sensing, IEEE Transactions on Geoscience and Remote Sensing, vol.
43(9), 2005, pp. 2127-2137.
[5] R.M. Dufour, E.L. Miller, and N.P. Galatsanos, Template matching based object recognition
with unknown geometric parameters, IEEE Transactions on Image Processing, vol. 11(12),
2002, pp. 1385-1396.
[6] L.D. Stefano, M. Marchionni, and S. Mattoccia, A fast area-based stereo matching algorithm,
Image and Vision Computing, vol. 22(12), 2004, pp. 983-1005.
[7] S.L. Kilthau, M.S. Drew, and T. Moller, Full search content independent block matching based
on the fast Fourier transform, Proceedings of the 2002 International Conference on Image
Processing, vol. I, IEEE, Piscataway, NJ, USA, 2002, pp. 669-672.
[8] A. Rosenfeld and G.J. Vanderburg, Coarse-fine template matching, IEEE Transactions on
Systems, Man and Cybernetics, vol. SMC-7(2), 1977, pp. 104-107.
[9] S.L. Tanimoto, Template matching in pyramids, Computer Graphics and Image Processing,
vol. 16(4), 1981, pp. 356-369.
[10] P. Viola and M.J. Jones, Robust real-time face detection, International Journal of Computer
Vision, vol. 57(2), 2004, pp. 137-154.
[11] J.P. Lewis, Fast template matching, Vision Interface 95, 1995, pp. 120-123.
[12] K. Briechle and U.D. Hanebeck, Template matching using fast normalized cross correlation,
Optical Pattern Recognition XII, vol. SPIE-4387, The International Society for Optical Engi-
neering, Bellingham, WA, USA, 2001, pp. 95-102.
[13] H. Schweitzer, J.W. Bell, and F. Wu, Very fast template matching, Computer Vision ECCV
2002, Springer, Berlin Heidelberg, Germany, 2006, pp. 358-372.
[14] Y. Hel-Or and H. Hel-Or, Real-time pattern matching using projection kernels, IEEE Trans-
actions on Pattern Analysis and Machine Intelligence, vol. 27(9), 2005, pp. 1430-1445.
[15] F. Tang and H. Tao, Fast multi-scale template matching using binary features, Proceedings of
the IEEE Workshop on Applications of Computer Vision WACV ’07, 2007, p. 36.
[16] G. Guo and C.R. Dyer, Patch-based image correlation with rapid filtering, IEEE Conference
on Computer Vision and Pattern Recognition (CVPR ’07), IEEE, Piscataway, NJ, USA, 2007,
pp. 1-6.
[17] F.C. Crow, Summed-area tables for texture mapping, Proceedings of the 11th annual conference
on Computer Graphics and Interactive Techniques (SIGGRAPH ’84), vol. 18, ACM, New York,
NY, USA, 1984, pp. 207-212.
[18] P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features,
Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR 2001), vol. I, IEEE, Piscataway, NJ, USA, 2001, pp. 511-518.
[19] O. Tuzel, F. Porikli, and P. Meer, Region covariance: A fast descriptor for detection and
classification, Computer Vision ECCV 2006, vol. LNCS 3952, Part II, Springer-Verlag, Berlin
Heidelberg, 2006, pp. 589-600.
[20] F. Porikli and O. Tuzel, Fast construction of covariance matrices for arbitrary size image win-
dows, Proceedings of the IEEE International Conference on Image Processing 2006, IEEE,
Piscataway, NJ, USA, 2006, pp. 1581-1584.
[21] X. Wang, G. Doretto, T. Sebastian, J. Rittscher, and P. Tu, Shape and appearance context
modeling, Proceedings of the 11th International Conference on Computer Vision (ICCV 2007),
IEEE, Piscataway, NJ, USA, 2007, pp. 1-8.
[22] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, Speeded-up robust features (SURF), Computer
Vision and Image Understanding, vol. 110(3), 2008, pp. 346-359.
[23] C. Beleznai and H. Bischof, Fast human detection in crowded scenes by contour integration and
local shape estimation, Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2009), IEEE, Piscataway, NJ, USA, 2009, pp. 2246-2253.
[24] C. Messom and A. Barczak, Stream processing of integral images for real-time object detec-
tion, Proceedings of the Ninth International Conference on Parallel and Distributed Computing,
Applications and Technologies (PDCAT 2008), IEEE, Piscataway, NJ, 2008, pp. 405-412.
[25] O. Veksler, Fast variable window for stereo correspondence using integral images, Proceed-
ings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
(CVPR’03), vol. I, IEEE, Piscataway, NJ, USA, 2003, pp. 556-561.
[26] F. Shafait, D. Keysers, and T.M. Breuel, Efficient implementation of local adaptive thresholding
techniques using integral images, Document Recognition and Retrieval XV, vol. SPIE-6815,
article id 681510, The International Society for Optical Engineering, Bellingham, WA, USA,
2008.
[27] C. Harris and M. Stephens, A combined corner and edge detector, Proceedings of the Fourth
Alvey Vision Conference, 1988, pp. 147-151.