2-D Convolution and Correlation
In this section, we discuss the use of spatial filters for image processing. Spatial filter-
ing is used in a broad spectrum of image processing applications, so a solid under-
standing of filtering principles is important. As mentioned at the beginning of this
chapter, the filtering examples in this section deal mostly with image enhancement.
Other applications of spatial filtering are discussed in later chapters.
The name filter is borrowed from frequency domain processing (the topic of
Chapter 4) where “filtering” refers to passing, modifying, or rejecting specified fre-
quency components of an image. For example, a filter that passes low frequencies
is called a lowpass filter. The net effect produced by a lowpass filter is to smooth an
image by blurring it. We can accomplish similar smoothing directly on the image
itself by using spatial filters.
Spatial filtering modifies an image by replacing the value of each pixel by a function of the values of the pixel and its neighbors. If the operation performed on the image pixels is linear, then the filter is called a linear spatial filter; otherwise, the filter is a nonlinear spatial filter (see Section 2.6 regarding linearity). We will focus attention first on linear filters and then introduce some basic nonlinear filters. Section 5.3 contains a more comprehensive list of nonlinear filters and their application.
www.EBooksWorld.ir
As coordinates x and y are varied, the center of the kernel moves from pixel to pixel,
generating the filtered image, g, in the process.†
Observe that the center coefficient of the kernel, w(0, 0), aligns with the pixel at location (x, y). For a kernel of size m × n, we assume that m = 2a + 1 and n = 2b + 1, where a and b are nonnegative integers. This means that our focus is on kernels of odd size in both coordinate directions. (It certainly is possible to work with kernels of even size, or mixed even and odd sizes. However, working with odd sizes simplifies indexing, and is also more intuitive because the kernels have centers falling on integer values and are spatially symmetric.) In general, linear spatial filtering of an image of size M × N with a kernel of size m × n is given by the expression

g(x, y) = Σ_{s=−a}^{a} Σ_{t=−b}^{b} w(s, t) f(x + s, y + t)        (3-31)

where x and y are varied so that the center (origin) of the kernel visits every pixel in f once. For a fixed value of (x, y), Eq. (3-31) implements the sum of products of the form shown in Eq. (3-30), but for a kernel of arbitrary odd size. As you will learn in the following section, this equation is a central tool in linear filtering.
† A filtered pixel value typically is assigned to a corresponding location in a new image created to hold the results of filtering. It is seldom the case that filtered pixels replace the values of the corresponding location in the original image, as this would change the content of the image while filtering is being performed.
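Equation (3-31) translates almost directly into code. The following sketch (the function and variable names are our own, not from the text) assumes an odd-sized kernel and zero padding of the input, choices discussed below:

```python
import numpy as np

def correlate2d(f, w):
    """Sum of products of Eq. (3-31): g(x, y) = sum_s sum_t w(s, t) f(x+s, y+t).
    Assumes an odd-sized kernel and zero padding of f."""
    m, n = w.shape
    a, b = (m - 1) // 2, (n - 1) // 2
    fp = np.pad(f, ((a, a), (b, b)))       # zero-pad so the sum is defined at the borders
    M, N = f.shape
    g = np.zeros((M, N))
    for x in range(M):
        for y in range(N):
            # w(s, t) aligns with fp[x + s + a, y + t + b] for s = -a..a, t = -b..b
            g[x, y] = np.sum(w * fp[x:x + m, y:y + n])
    return g
```

Correlating a kernel with a discrete unit impulse, as the discussion below shows, yields the kernel rotated by 180°.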
Figure 3.29(a) shows a 1-D function, f, and a kernel, w. The kernel is of size 1 × 5, so
a = 2 and b = 0 in this case. Figure 3.29(b) shows the starting position used to per-
form correlation, in which w is positioned so that its center coefficient is coincident
with the origin of f.
The first thing we notice is that part of w lies outside f, so the summation is undefined in that area. A solution to this problem is to pad function f with enough 0's on either side. (Zero padding is not the only padding option, as we will discuss in detail later in this chapter.) In general, if the kernel is of size 1 × m, we need (m − 1)/2 zeros on either side of f in order to handle the beginning and ending configurations of w with respect to f. Figure 3.29(c) shows a properly padded function. In this starting configuration, all coefficients of the kernel overlap valid values.
The first correlation value is the sum of products in this initial position, computed using Eq. (3-32) with x = 0:

g(0) = Σ_{s=−2}^{2} w(s) f(s + 0) = 0
This value is in the leftmost location of the correlation result in Fig. 3.29(g).
To obtain the second value of correlation, we shift the relative positions of w and
f one pixel location to the right [i.e., we let x = 1 in Eq. (3-32)] and compute the sum
of products again. The result is g(1) = 8, as shown in the leftmost, nonzero location
in Fig. 3.29(g). When x = 2, we obtain g(2) = 2. When x = 3, we get g(3) = 4 [see Fig.
3.29(e)]. Proceeding in this manner by varying x one shift at a time, we “build” the
correlation result in Fig. 3.29(g). Note that it took 8 values of x (i.e., x = 0, 1, 2, … , 7 )
to fully shift w past f so the center coefficient in w visited every pixel in f. Sometimes,
it is useful to have every element of w visit every pixel in f. For this, we have to start
with the rightmost element of w coincident with the origin of f, and end with the leftmost element of w coincident with the last element of f (additional padding would be required). Figure 3.29(h) shows the result of this extended, or full, correlation. As Fig. 3.29(g) shows, we can obtain the “standard” correlation by cropping the full correlation in Fig. 3.29(h).
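The walk-through above can be reproduced in a few lines. The kernel values below are an assumption chosen to match Fig. 3.29: they reproduce the values g(0) = 0, g(1) = 8, g(2) = 2, and g(3) = 4 quoted in the text.

```python
import numpy as np

f = np.array([0, 0, 0, 1, 0, 0, 0, 0])   # unit impulse at x = 3, as in Fig. 3.29(a)
w = np.array([1, 2, 4, 2, 8])            # 1 x 5 kernel (values assumed)

a = (len(w) - 1) // 2                    # (m - 1)/2 = 2 zeros on either side
fp = np.pad(f, a)                        # zero-padded f, as in Fig. 3.29(c)

# g(x) = sum over s = -2..2 of w(s) f(x + s), one shift of x at a time
g = np.array([np.sum(w * fp[x:x + len(w)]) for x in range(len(f))])
print(g)                                 # [0 8 2 4 2 1 0 0]: w rotated by 180 degrees
```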
There are two important points to note from the preceding discussion. First, cor-
relation is a function of displacement of the filter kernel relative to the image. In
other words, the first value of correlation corresponds to zero displacement of the
kernel, the second corresponds to one unit displacement, and so on.† The second
thing to notice is that correlating a kernel w with a function that contains all 0’s and a single 1 yields a copy of w, but rotated by 180°. (Rotating a 1-D kernel by 180° is equivalent to flipping the kernel about its axis.) A function that contains a single 1 with the rest being 0’s is called a discrete unit impulse. Correlating a kernel with a discrete unit impulse yields a rotated version of the kernel at the location of the impulse.
The right side of Fig. 3.29 shows the sequence of steps for performing convolution
(we will give the equation for convolution shortly). The only difference here is that
the kernel is pre-rotated by 180° prior to performing the shifting/sum of products
operations. As the convolution in Fig. 3.29(o) shows, the result of pre-rotating the
kernel is that now we have an exact copy of the kernel at the location of the unit
impulse. In fact, a foundation of linear system theory is that convolving a function
with an impulse yields a copy of the function at the location of the impulse. We will
use this property extensively in Chapter 4.
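Convolution, then, differs only in pre-rotating the kernel. A minimal sketch with the same assumed values as before:

```python
import numpy as np

f = np.array([0, 0, 0, 1, 0, 0, 0, 0])   # discrete unit impulse at x = 3
w = np.array([1, 2, 4, 2, 8])            # kernel values assumed, as before

fp = np.pad(f, (len(w) - 1) // 2)        # zero padding as for correlation
wr = w[::-1]                             # pre-rotate the kernel by 180 degrees

# convolution = correlation performed with the rotated kernel
g = np.array([np.sum(wr * fp[x:x + len(w)]) for x in range(len(f))])
print(g)                                 # [0 1 2 4 2 8 0 0]: an exact copy of w
```

The same result follows from np.convolve(f, w, mode='same'), which performs the rotation internally.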
The 1-D concepts just discussed extend easily to images, as Fig. 3.30 shows. For a
kernel of size m × n, we pad the image with a minimum of (m − 1)/2 rows of 0’s at the top and bottom and (n − 1)/2 columns of 0’s on the left and right. In this case,
m and n are equal to 3, so we pad f with one row of 0’s above and below and one
column of 0’s to the left and right, as Fig. 3.30(b) shows. Figure 3.30(c) shows the
initial position of the kernel for performing correlation, and Fig. 3.30(d) shows the
final result after the center of w visits every pixel in f, computing a sum of products
at each location. As before, the result is a copy of the kernel, rotated by 180°. We will
discuss the extended correlation result shortly.
For convolution, we pre-rotate the kernel as before and repeat the sliding sum of products just explained. (In 2-D, rotation by 180° is equivalent to flipping the kernel about one axis and then the other.) Figures 3.30(f) through (h) show the result. You see again that convolution of a function with an impulse copies the function to the location of the impulse. As noted earlier, correlation and convolution yield the same result if the kernel values are symmetric about the center.
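In 2-D the pre-rotation is a flip about both axes. A sketch with a deliberately asymmetric kernel (values chosen arbitrarily) makes the copy property visible:

```python
import numpy as np

f = np.zeros((5, 5))
f[2, 2] = 1                              # 2-D discrete unit impulse
w = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])             # asymmetric kernel, so rotation matters

wr = np.flip(w)                          # 180 degrees: flip one axis, then the other
fp = np.pad(f, 1)                        # one ring of zeros for a 3 x 3 kernel

g = np.zeros((5, 5))
for x in range(5):
    for y in range(5):
        g[x, y] = np.sum(wr * fp[x:x + 3, y:y + 3])

print(g[1:4, 1:4])                       # an exact copy of w at the impulse location
```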
The concept of an impulse is fundamental in linear system theory, and is used in
numerous places throughout the book. A discrete impulse of strength (amplitude) A
located at coordinates ( x0 , y0 ) is defined as
δ(x − x0, y − y0) = { A   if x = x0 and y = y0
                    { 0   otherwise                        (3-33)
† In reality, we are shifting f to the left of w every time we increment x in Eq. (3-32). However, it is more intuitive
to think of the smaller kernel moving right over the larger array f. The motion of the two is relative, so either
way of looking at the motion is acceptable. The reason we increment f and not w is that indexing the equations
for correlation and convolution is much easier (and clearer) this way, especially when working with 2-D arrays.
For example, the unit impulse in Fig. 3.29(a) is given by δ(x − 3) in the 1-D version of the preceding equation. (Recall that A = 1 for a unit impulse.) Similarly, the impulse in Fig. 3.30(a) is given by δ(x − 2, y − 2) [remember, the origin is at (0, 0)].
Summarizing the preceding discussion in equation form, the correlation of a kernel w of size m × n with an image f(x, y), denoted as (w ☆ f)(x, y), is given by Eq. (3-31), which we repeat here for convenience:

(w ☆ f)(x, y) = Σ_{s=−a}^{a} Σ_{t=−b}^{b} w(s, t) f(x + s, y + t)        (3-34)
Because our kernels do not depend on (x, y), we will sometimes make this fact explicit by writing the left side of the preceding equation as w ☆ f(x, y). Equation (3-34) is evaluated for all values of the displacement variables x and y so that the center point of w visits every pixel in f,† where we assume that f has been padded appropriately.
† As we mentioned earlier, the minimum number of required padding elements for a 2-D correlation is (m − 1)/2 rows above and below f, and (n − 1)/2 columns on the left and right. With this padding, and assuming that f is of size M × N, the values of x and y required to obtain a complete correlation are x = 0, 1, 2, …, M − 1 and y = 0, 1, 2, …, N − 1. This assumes that the starting configuration is such that the center of the kernel coincides with the origin of the image, which we have defined to be at the top, left (see Fig. 2.19).
In a similar manner, the convolution of a kernel w of size m × n with an image f(x, y), denoted by (w ★ f)(x, y), is defined as

(w ★ f)(x, y) = Σ_{s=−a}^{a} Σ_{t=−b}^{b} w(s, t) f(x − s, y − t)        (3-35)

where the minus signs align the coordinates of f and w when one of the functions is rotated by 180° (see Problem 3.17). This equation implements the sum of products process to which we refer throughout the book as linear spatial filtering. That is, linear spatial filtering and spatial convolution are synonymous.
Because convolution is commutative (see Table 3.5), it is immaterial whether w or f is rotated, but rotation of the kernel is used by convention. Our kernels do not depend on (x, y), a fact that we sometimes make explicit by writing the left side of Eq. (3-35) as w ★ f(x, y). When the meaning is clear, we let the dependence of the previous two equations on x and y be implied, and use the simplified notation w ☆ f and w ★ f. As with correlation, Eq. (3-35) is evaluated for all values of the displacement variables x and y so that the center of w visits every pixel in f, which we assume has been padded. The values of x and y needed to obtain a full convolution are x = 0, 1, 2, …, M − 1 and y = 0, 1, 2, …, N − 1. The size of the result is M × N.
We can define correlation and convolution so that every element of w (instead of
just its center) visits every pixel in f. This requires that the starting configuration be
such that the right, lower corner of the kernel coincides with the origin of the image.
Similarly, the ending configuration will be with the top left corner of the kernel coin-
ciding with the lower right corner of the image. If the kernel and image are of sizes
m × n and M × N, respectively, the padding would have to increase to (m − 1) padding elements above and below the image, and (n − 1) elements to the left and right.
Under these conditions, the size of the resulting full correlation or convolution array
will be of size Sv × Sh , where (see Figs. 3.30(e) and (h), and Problem 3.19),
Sv = m + M − 1 (3-36)
and
Sh = n + N − 1 (3-37)
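As a check on Eqs. (3-36) and (3-37), a full convolution can be sketched directly by increasing the padding to m − 1 and n − 1 elements per side (the sizes and data below are chosen arbitrarily):

```python
import numpy as np

M, N, m, n = 5, 7, 3, 3                  # image and kernel sizes (chosen arbitrarily)
rng = np.random.default_rng(1)
f = rng.random((M, N))
w = rng.random((m, n))

# pad so that every element of w visits every pixel of f
fp = np.pad(f, ((m - 1, m - 1), (n - 1, n - 1)))
wr = np.flip(w)                          # pre-rotate for convolution
Sv, Sh = m + M - 1, n + N - 1            # Eqs. (3-36) and (3-37)

g = np.array([[np.sum(wr * fp[x:x + m, y:y + n])
               for y in range(Sh)] for x in range(Sv)])
print(g.shape)                           # (7, 9) = (Sv, Sh)
```

Note that the corner values of the full result involve a single product each, since only one kernel element overlaps the image there.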
Often, spatial filtering algorithms are based on correlation and thus implement
Eq. (3-34) instead. To use the algorithm for correlation, we input w into it; for con-
volution, we input w rotated by 180°. The opposite is true for an algorithm that
implements Eq. (3-35). Thus, either Eq. (3-34) or Eq. (3-35) can be made to perform
the function of the other by rotating the filter kernel. Keep in mind, however, that
the order of the functions input into a correlation algorithm does make a difference,
because correlation is neither commutative nor associative (see Table 3.5).
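In 1-D, NumPy exposes both operations, and the rotation trick is easy to verify (the sample values are chosen arbitrarily; note the 1-D kernel here has odd length, which keeps the two 'same'-mode results aligned):

```python
import numpy as np

f = np.array([1., 3., 2., 5., 4.])
w = np.array([1., 2., 3.])

conv = np.convolve(f, w, mode='same')             # rotates the kernel internally
corr_rot = np.correlate(f, w[::-1], mode='same')  # correlation with pre-rotated kernel

print(conv)                                       # [ 5. 11. 18. 20. 23.]
print(np.array_equal(conv, corr_rot))             # True
```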
www.EBooksWorld.ir
TABLE 3.5  Some fundamental properties of convolution and correlation. A dash means that the property does not hold.

Property       Convolution                         Correlation
Commutative    f ★ g = g ★ f                       —
Associative    f ★ (g ★ h) = (f ★ g) ★ h           —
Distributive   f ★ (g + h) = (f ★ g) + (f ★ h)     f ☆ (g + h) = (f ☆ g) + (f ☆ h)
FIGURE 3.31  Examples of smoothing kernels: (a) is a box kernel; (b) is a Gaussian kernel.

(a) Box kernel:
    (1/9) ×  1 1 1
             1 1 1
             1 1 1

(b) Gaussian kernel:
    (1/4.8976) ×  0.3679 0.6065 0.3679
                  0.6065 1.0000 0.6065
                  0.3679 0.6065 0.3679
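The Gaussian values in Fig. 3.31(b) are samples of exp(−(s² + t²)/2σ²). With σ = 1 (our assumption; the figure does not state σ) at s, t ∈ {−1, 0, 1} they reproduce the figure, including the 4.8976 normalizing sum:

```python
import numpy as np

sigma = 1.0                              # assumed value of the standard deviation
s, t = np.meshgrid([-1, 0, 1], [-1, 0, 1])
w = np.exp(-(s**2 + t**2) / (2 * sigma**2))

print(np.round(w, 4))                    # the 0.3679 / 0.6065 / 1.0000 pattern of (b)
print(round(float(w.sum()), 4))          # 4.8976, the normalizing constant
```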
        | 1 |             | 1 1 1 |
c r^T = | 1 | [1 1 1]  =  | 1 1 1 |  = w

A separable kernel of size m × n can be expressed as the outer product of two vectors, v and w:

w = v w^T        (3-41)

where v and w are vectors of size m × 1 and n × 1, respectively. For a square kernel of size m × m, we write

w = v v^T        (3-42)
It turns out that the product of a column vector and a row vector is the same as the
2-D convolution of the vectors (see Problem 3.24).
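This equality between the outer product and the 2-D convolution of the two vectors (viewed as m × 1 and 1 × n arrays) is easy to check with a direct full-convolution sketch (the helper and the vector values are ours, chosen for illustration):

```python
import numpy as np

def conv2d_full(A, B):
    """Full 2-D convolution by definition: shift B by every index of A, accumulate."""
    Ma, Na = A.shape
    Mb, Nb = B.shape
    out = np.zeros((Ma + Mb - 1, Na + Nb - 1))
    for i in range(Ma):
        for j in range(Na):
            out[i:i + Mb, j:j + Nb] += A[i, j] * B
    return out

v = np.array([1., 2., 1.])               # m x 1 column (values assumed)
r = np.array([1., 0., -1.])              # 1 x n row (values assumed)

full = conv2d_full(v.reshape(-1, 1), r.reshape(1, -1))
print(np.array_equal(full, np.outer(v, r)))   # True
```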
The importance of separable kernels lies in the computational advantages that
result from the associative property of convolution. If we have a kernel w that can be decomposed into two simpler kernels, such that w = w1 ★ w2, then it follows from the commutative and associative properties in Table 3.5 that

w ★ f = (w1 ★ w2) ★ f = (w2 ★ w1) ★ f = w2 ★ (w1 ★ f) = (w1 ★ f) ★ w2        (3-43)

This equation says that convolving a separable kernel with an image is the same as convolving w1 with f first, and then convolving the result with w2.
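For a separable kernel, then, two 1-D passes give the same result as one 2-D pass. A sketch with assumed kernel values (the helper function is ours, not from the text):

```python
import numpy as np

def conv_same(f, w):
    """'Same'-size 2-D convolution with zero padding; odd-sized kernel assumed."""
    m, n = w.shape
    a, b = (m - 1) // 2, (n - 1) // 2
    fp = np.pad(f, ((a, a), (b, b)))
    wr = np.flip(w)                      # pre-rotate the kernel by 180 degrees
    M, N = f.shape
    return np.array([[np.sum(wr * fp[x:x + m, y:y + n])
                      for y in range(N)] for x in range(M)])

rng = np.random.default_rng(0)
f = rng.random((6, 6))
w1 = np.array([[1.], [2.], [1.]])        # 3 x 1 column kernel (values assumed)
w2 = np.array([[1., 2., 1.]])            # 1 x 3 row kernel
w = w1 @ w2                              # the full, separable 3 x 3 kernel

g_direct = conv_same(f, w)               # one pass: on the order of MNmn operations
g_separable = conv_same(conv_same(f, w1), w2)   # two passes: about MN(m + n)
print(np.allclose(g_direct, g_separable))       # True
```

The operation counts in the comments anticipate the computational-advantage discussion that follows.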
For an image of size M × N and a kernel of size m × n, implementation of Eq. (3-35) requires on the order of MNmn multiplications and additions. (We assume that the values of M and N include any padding of f prior to performing convolution.) This is because it follows directly from that equation that each pixel in the output (filtered) image depends on all the coefficients in the filter kernel. But, if the kernel is separable and we use Eq. (3-43), then the first convolution, w1 ★ f, requires on the order of MNm