
Fast Template Matching: J. P. Lewis Industrial Light & Magic P.O. Box 2459 San Rafael CA 94912 U.S.A

This document summarizes J.P. Lewis's paper on fast normalized cross-correlation for template matching. It discusses how normalized cross-correlation is typically computed in the spatial domain due to lack of an efficient frequency domain expression. The document then shows that unnormalized cross-correlation can be efficiently normalized using precomputed summed-area tables containing integrals of the image and image squared over the search window. This avoids directly computing local image statistics at each candidate match location during template matching.


J.P. Lewis, Fast Template Matching, Vision Interface 95, Canadian Image Processing and Pattern Recognition Society, Quebec City, Canada, May 15-19, 1995, p. 120-123.
(Also see the expanded and corrected version "Fast Normalized Cross-Correlation").

Fast Template Matching


J. P. Lewis
Industrial Light & Magic
P.O. Box 2459 San Rafael CA 94912 U.S.A.

Abstract

Although it is well known that cross correlation can be efficiently implemented in the transform domain, the normalized form of cross correlation preferred for template matching applications does not have a simple frequency domain expression. Normalized cross correlation is usually computed in the spatial domain for this reason. This short paper shows that unnormalized cross correlation can be efficiently normalized using precomputed tables containing the integral of the image and image$^2$ over the search window.

1 Template Matching by Cross Correlation

Correlation is an important tool in image processing, pattern recognition, and other fields. The correlation between two signals (cross correlation) is a standard approach to feature detection [1, 2] as well as a building block for more sophisticated recognition techniques (e.g. [3]). Textbook presentations of correlation commonly mention the convolution theorem and the attendant possibility of efficiently computing correlation in the frequency domain via the fast Fourier transform. Unfortunately the normalized form of correlation (correlation coefficient) preferred in many applications does not have a correspondingly simple and efficient frequency domain expression, and spatial domain implementation is recommended instead (e.g. [2], p. 585; also see e.g. [4] sections 13.2 and 14.5). This paper shows that the unnormalized correlation can be efficiently normalized using precomputed tables of the integral of the signal and signal$^2$, i.e., summed-area tables [5].

Template matching techniques [1] attempt to answer some variation of the following question: Does the image contain a specified view of some feature, and if so, where? The use of cross correlation for template matching is motivated by the distance measure (squared Euclidean distance)

$$d^2_{f,t}(u,v) = \sum_{x,y} [f(x,y) - t(x-u,y-v)]^2$$

(the sum is over $x, y$ under the window containing the feature positioned at $u, v$). In the expansion of $d^2$

$$d^2_{f,t}(u,v) = \sum_{x,y} [f^2(x,y) - 2f(x,y)\,t(x-u,y-v) + t^2(x-u,y-v)]$$

the term $\sum t^2(x-u,y-v)$ is constant. If the term $\sum f^2(x,y)$ is approximately constant then the remaining cross correlation term

$$c(u,v) = \sum_{x,y} f(x,y)\,t(x-u,y-v)$$

is a measure of the similarity between the image and the feature.

2 Normalized Cross Correlation

If the image energy $\sum f^2(x,y)$ is not constant, however, feature matching by cross correlation can fail. For example, the correlation between the template and an exactly matching region in the image may be less than the correlation between the template and a bright spot. Another drawback of cross correlation is that the range of $c(u,v)$ is dependent on both the size of the template and the template and image amplitudes.

Variation in the image energy under the template can be reduced by high-pass filtering the image before cross correlation. In a transform domain implementation the filtering can be conveniently added to the frequency domain processing, but selection of the cutoff frequency is problematic: a low cutoff may leave significant image energy variations, whereas a high cutoff may remove information useful to the match.

Normalized cross correlation overcomes these difficulties by normalizing the image and template vectors to
unit length, yielding a cosine-like correlation coefficient

$$\gamma(u,v) = \frac{\sum_{x,y} [f(x,y) - \bar{f}_{u,v}]\,[t(x-u,y-v) - \bar{t}]}{\left\{ \sum_{x,y} [f(x,y) - \bar{f}_{u,v}]^2 \sum_{x,y} [t(x-u,y-v) - \bar{t}]^2 \right\}^{0.5}} \qquad (1)$$

where $\bar{t}$ is the mean of the template and $\bar{f}_{u,v}$ is the mean of $f(x,y)$ in the region under the template.

3 Computation

Consider the numerator in (1) and assume that we have images $f'(x,y) \equiv f(x,y) - \bar{f}_{u,v}$ and $t'(x,y) \equiv t(x,y) - \bar{t}$ in which the mean value has already been removed:

$$\gamma^{num}(u,v) = \sum_{x,y} f'(x,y)\,t'(x-u,y-v) \qquad (2)$$

For a search window of size $M^2$ and a template of size $N^2$, Eq. (2) requires approximately $N^2(M-N+1)^2$ additions and $N^2(M-N+1)^2$ multiplications.

Eq. (2) is a convolution of the image with the reversed template $t'(-x,-y)$ and can be computed by

$$\mathcal{F}^{-1}\{\mathcal{F}(f')\,\mathcal{F}^*(t')\} \qquad (3)$$

where $\mathcal{F}$ is the Fourier transform. The complex conjugate accomplishes reversal of the template via the Fourier transform property $\mathcal{F}\{f^*(-x)\} = F^*(\omega)$.

Implementations of the fast Fourier transform (FFT) algorithm generally require that $f'$ and $t'$ be extended with zeros to a common power of two. The complexity of computing (2) by the transform method (3) is then $12M^2 \log_2 M$ real multiplications and $12M^2 \log_2 M$ real additions [6]. When $M$ is much larger than $N$ the complexity of the direct 'spatial' computation (2) is approximately $N^2M^2$ multiplications/additions, and the direct method is faster than the transform method. The transform method becomes relatively more efficient as $N$ approaches $M$ and with larger $M, N$.

4 Normalized Cross Correlation in the Transform Domain

Examining again the numerator of (1), we note that the mean of the template can be precomputed, leaving

$$\gamma^{num}(u,v) = \sum_{x,y} f(x,y)\,t'(x-u,y-v) - \bar{f}_{u,v} \sum_{x,y} t'(x-u,y-v)$$

Since $t'$ has zero mean and thus zero sum, the term $\bar{f}_{u,v} \sum t'(x-u,y-v)$ is also zero, so the numerator of the normalized cross correlation can be computed using (3).

Examining the denominator of (1), the length of the template vector can be precomputed in approximately $3N^2$ operations (small compared to the cost of the cross correlation), and in fact the template can be pre-normalized to length one.

The problematic quantities are those in the expression $\sum_{x,y} [f(x,y) - \bar{f}_{u,v}]^2$. The image mean and local energy must be computed at each $u, v$, i.e. at $(M-N+1)^2$ locations, resulting in almost $3N^2(M-N+1)^2$ operations (counting add, subtract, multiply as one operation each). This computation is more than is required for the direct computation of (2) and it may considerably outweigh the computation indicated by (3) when the transform method is applicable. A more efficient means of computing the image mean and energy under the template is desired.

These quantities can be efficiently computed from summed-area tables containing the integral (running sum) of the image and image square over the search area, i.e.,

$$s(u,v) = f(u,v) + s(u-1,v) + s(u,v-1) - s(u-1,v-1)$$

$$s^2(u,v) = f^2(u,v) + s^2(u-1,v) + s^2(u,v-1) - s^2(u-1,v-1)$$

with $s(u,v) = s^2(u,v) = 0$ when either $u < 0$ or $v < 0$. Here $s^2$ names the running sum of $f^2$, not the square of $s$. The energy of the image under the template positioned at $u, v$ is then

$$e_f(u,v) = s^2(u+N-1,v+N-1) - s^2(u-1,v+N-1) - s^2(u+N-1,v-1) + s^2(u-1,v-1)$$

and similarly for the image sum under the template. This technique of computing a definite sum from a precomputed running sum was introduced in [5] to rapidly low-pass filter texture images.

The problematic quantity $\sum_{x,y} [f(x,y) - \bar{f}_{u,v}]^2$ can now be computed with very few operations since it expands into an expression involving only the image sum and sum squared under the template. The construction of the tables requires approximately $3M^2$ operations, which is less than the cost of computing the numerator by (3) and considerably less than the $3N^2(M-N+1)^2$ required to compute $\sum_{x,y} [f(x,y) - \bar{f}_{u,v}]^2$ at each $u, v$.
5 Application

The integration of synthetic and processed images into special effects sequences often requires accurate tracking of sequence movement and features. The algorithm described in this paper was used for this purpose in the recent movie Forrest Gump. The special effect sequences in that movie included the replacement of various moving elements and the addition of a contemporary actor into historical film and video sequences. Manually picked features from one frame of a sequence were automatically tracked over the remaining frames; this information was used as the basis for further processing.

The relative performance of feature tracking by the transform domain algorithm is a two-dimensional function of the problem size (search window size) and the ratio of the template size to search window size. Relative performance increases along the problem size axis, with an additional ripple reflecting the relation between the search window size and the bounding power of two (Fig. 1). The property that the relative performance is greater on larger problems is desirable. Table 1 illustrates the performance obtained in practice.

Note that while a small (e.g. $10^2$) template size would suffice in an ideal digital image, in practice larger feature sizes are more immune to imaging noise such as the effects of film grain and motion blur. Due to the high digital resolution required to represent video and film, a small movement across frames may correspond to a distance of many pixels. Also the selected features are of course constrained to the available features in the image; distinct "features" are not always available at preferred scales and locations. As a result of these considerations search windows of $50^2$ and larger are often employed.

6 Conclusion

The transform domain normalized cross correlation algorithm is well suited to special effects feature tracking. Manual tracking of multiple features over several minutes of frames was not feasible, and small errors in manual tracking would have prevented the seamless integration of new image elements.

The approach presented here should also be compared to spatial multigrid convolution approaches. We note that while these approaches would work for some examples, we have encountered other cases in which the available features (e.g. a configuration of small spots on a wall or floor) have very little low-frequency content and the low-frequency search would not be robust.

References

[1] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, New York: Wiley, 1973.

[2] R.C. Gonzalez and R.E. Woods, Digital Image Processing (third edition), Reading, Massachusetts: Addison-Wesley, 1992.

[3] R. Brunelli and T. Poggio, "Face Recognition: Features versus Templates", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp. 1042-1052, 1993.

[4] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in C, Second Edition, Cambridge: Cambridge University Press, 1992.

[5] F. Crow, "Summed-Area Tables for Texture Mapping", Computer Graphics, vol. 18, no. 3, pp. 207-212, 1984.

[6] A. Goshtasby, S.H. Gage, and J.F. Bartholic, "A Two-Stage Cross Correlation Approach to Template Matching", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, no. 3, pp. 374-378, 1984.
Figure 1: Measured relative performance of transform domain versus spatial domain normalized cross correlation as a function of the search window size (depth axis) and the ratio of the template size to search window size.

search window(s)        length        direct        transform
168 × 86                896 frames    15 hours      1.7 hours
115 × 200, 150 × 150    490 frames    14.3 hours    57 minutes

Table 1. Two tracking sequences from Forrest Gump were re-timed using both transform and direct methods using identical templates and search windows on an ≈ 60 SPECmark workstation. These times include a $16^2$ sub-pixel search at the location of the best whole-pixel match. The sub-pixel search was computed using Eq. (1) (direct approach) in all cases.
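
The timings in Table 1 can be set beside the multiplication counts quoted in Section 3. The sketch below is illustrative arithmetic only: the helper names are hypothetical, the window is taken to be square, $M$ in the transform count is assumed to be the window size rounded up to a power of two, and the example sizes $M = 128$, $N = 32$ are not taken from the table (which does not list template sizes).

```python
import math


def direct_mults(m, n):
    """Multiplications for direct evaluation of (2): N^2 (M - N + 1)^2."""
    return n * n * (m - n + 1) ** 2


def transform_mults(m):
    """12 M^2 log2 M real multiplications, with M rounded up to a power of two
    (the rounding is an assumption made for this sketch)."""
    p = 1 << (m - 1).bit_length()
    return 12 * p * p * math.log2(p)


def measured_speedup(direct_hours, transform_hours):
    """Ratio of direct to transform running time, as reported in Table 1."""
    return direct_hours / transform_hours


# Hypothetical problem size: 128 x 128 search window, 32 x 32 template.
predicted = direct_mults(128, 32) / transform_mults(128)

# Measured ratios from the two rows of Table 1.
row1 = measured_speedup(15.0, 1.7)
row2 = measured_speedup(14.3, 57.0 / 60.0)
```

Under these assumptions the predicted ratio is 7, while the two measured rows give roughly 8.8 and 15. The counts ignore the normalization and sub-pixel steps, so only the order of magnitude is comparable.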
