CH 3
NW N NE
W * E
SW S SE
b) eight-neighborhood N8
1 2 1
2 4 2
1 2 1
b) 3 by 3 mask
[the remaining panels of this figure were lost in extraction]
1 0   0 1   0 0   0 0
0 0   0 0   1 0   0 1
a) E: external corners

0 1   1 0   1 1   1 1
1 1   1 1   0 1   1 0
b) I: internal corners
Figure 3.5: The 2 by 2 masks for counting the foreground objects in a binary image. The
1's represent foreground pixels, and the 0's represent background pixels.
The object-counting procedure is given below. We will use this syntax for all procedures given in the text. Note that all routines are called procedures, but those that are functions include a return statement (as in C) to return a value. To keep the procedures short and simple, we will often use utility procedures within them such as external_match and internal_match. The code for very straightforward utility procedures such as these is usually omitted. We also omit type declarations, which are language-dependent, but we specify the required types in the text and explain important variables in comments. Finally, we use global constants for various sizes rather than clouding the procedure calls with extra arguments.

In the object-counting procedure, the constant MaxRow is the row number of the last row in the image, while MaxCol is the column number of the last column. The first row and the first column are assumed to be row and column zero, the default for C arrays.
{
  if external_match(L, P) then E := E + 1;
  if internal_match(L, P) then I := I + 1;
};
return((E - I) / 4);
}
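The corner-counting procedure can be assembled into runnable form as follows. This is an illustrative Python sketch, not the text's exact code: the 2 by 2 window test plays the role of external_match and internal_match, and the image is padded with a background border so that corners on the image boundary are also counted.

```python
def count_objects(B):
    """Count foreground objects in binary image B (list of lists of 0/1).
    Assumes objects have no holes and do not touch diagonally."""
    E = I = 0  # external and internal corner counts
    rows, cols = len(B), len(B[0])
    # Pad with a background border so every 2x2 window is examined.
    padded = [[0] * (cols + 2)] + \
             [[0] + row + [0] for row in B] + [[0] * (cols + 2)]
    for r in range(rows + 1):
        for c in range(cols + 1):
            window = (padded[r][c], padded[r][c + 1],
                      padded[r + 1][c], padded[r + 1][c + 1])
            ones = sum(window)
            if ones == 1:
                E += 1      # external corner: exactly one 1 in the window
            elif ones == 3:
                I += 1      # internal corner: exactly three 1's
    return (E - I) // 4

# A small test image with three objects (a square, a domino, a point).
image = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0],
         [1, 1, 0, 1]]
```

For each object without holes, E exceeds I by exactly 4, which is why the count is (E - I) / 4.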
What is the maximum number of times that procedure count_objects examines each pixel of the image? How can procedures external_match and internal_match be coded to be as efficient as possible?
Obtain some graph paper to represent a pixel array and blacken some region of connected
squares (keep it small at first). The blackened squares correspond to the foreground pixels
and the empty squares correspond to the background. Imagine that the pixels are all
city blocks and you are driving around the blackened region in a clockwise direction. Do
your right turns correspond to E corners or I corners? What about left turns? Is there
a relationship between the number of left turns and the number of right turns made in
driving the complete perimeter? If so, what is it? In driving the entire perimeter, did you
ever cross over or touch a previously visited intersection? Is that ever possible? Why or why
not? Before answering, consider the case of only two blackened blocks touching diagonally
across a single shared intersection. Do your left-right counting rules still hold? Does the
object-counting formula still hold?
1 1 0 1 1 1 0 1
1 1 0 1 0 1 0 1
1 1 1 1 0 0 0 1
0 0 0 0 0 0 0 1
1 1 1 1 0 1 0 1
0 0 0 1 0 1 0 1
1 1 0 1 0 0 0 1
1 1 0 1 0 1 1 1
a) binary image

1 1 0 1 1 1 0 2
1 1 0 1 0 1 0 2
1 1 1 1 0 0 0 2
0 0 0 0 0 0 0 2
3 3 3 3 0 4 0 2
0 0 0 3 0 4 0 2
5 5 0 3 0 0 0 2
5 5 0 3 0 2 2 2
b) connected components labeling

Figure 3.6: A binary image (a) and its connected components labeling (b).
Suppose that B is a binary image and that B(r, c) = B(r', c') = v, where either v = 0 or v = 1. The pixel (r, c) is connected to the pixel (r', c') with respect to value v if there is a sequence of pixels (r, c) = (r_0, c_0), (r_1, c_1), ..., (r_n, c_n) = (r', c') in which B(r_i, c_i) = v for i = 0, ..., n, and (r_i, c_i) neighbors (r_{i-1}, c_{i-1}) for each i = 1, ..., n. The sequence of pixels (r_0, c_0), ..., (r_n, c_n) forms a connected path from (r, c) to (r', c'). A connected component of value v is a set of pixels C, each having value v, such that every pair of pixels in the set is connected with respect to v. Figure 3.6a) shows a binary image with five such connected components of 1's; these components are actually connected with respect to either the eight-neighborhood or the four-neighborhood definition.
Definition 1: A connected components labeling of a binary image B is a labeled image LB in which the value of each pixel is the label of its connected component.

A label is a symbol that uniquely names an entity. While character labels are possible, positive integers are more convenient and are most often used to label the connected components. Figure 3.6b) shows the connected components labeling of the binary image of Figure 3.6a).
There are a number of different algorithms for the connected components labeling operation. Some algorithms assume that the entire image can fit in memory and employ a simple, recursive algorithm that works on one component at a time, but can move all over the image while doing so. Other algorithms were designed for larger images that may not fit in memory and work on only two rows of the image at a time. Still other algorithms were designed for massively parallel machines and use a parallel propagation strategy. We will look at two different algorithms in this chapter: the recursive search algorithm and a row-by-row algorithm that uses a special union-find data structure to keep track of components.
Suppose that B is a binary image with MaxRow + 1 rows and MaxCol + 1 columns. We wish to find the connected components of the 1-pixels and produce a labeled output image LB in which every pixel is assigned the label of its connected component. The strategy, adapted from the Tanimoto AI text, is to first negate the binary image, so that all the 1-pixels become -1's. This is needed to distinguish unprocessed pixels (-1) from those of component label 1. We will accomplish this with a function called negate that inputs the binary image B and outputs the negated image LB, which will become the labeled image. Then the process of finding the connected components becomes one of finding a pixel whose value is -1 in LB, assigning it a new label, and calling procedure search to find its neighbors that have value -1 and recursively repeat the process for these neighbors. The utility function neighbors(L,P) is given a pixel position defined by L and P. It returns the set of pixel positions of all of its neighbors, using either the 4-neighborhood or 8-neighborhood definition. Only neighbors that represent legal positions on the binary image are returned. The neighbors are returned in scan-line order as shown in Figure 3.7. The recursive connected components labeling algorithm is a set of six procedures, including negate, print, and neighbors, which are left for the reader to code.
  1
2 * 3
  4
a) four-neighborhood

1 2 3
4 * 5
6 7 8
b) eight-neighborhood

Figure 3.7: Scan-line order of the neighbors of a pixel (*) returned by the utility function neighbors.
The classical algorithm, deemed so because it is based on the classical connected components algorithm for graphs, was described in Rosenfeld and Pfaltz (1966). The algorithm makes two passes over the image: one pass to record equivalences and assign temporary labels and the second to replace each temporary label by the label of its equivalence class. In between the two passes, the recorded set of equivalences, stored as a binary relation, is processed to determine the equivalence classes of the relation. Since that time, the union-find algorithm, which dynamically constructs the equivalence classes as the equivalences are found, has been widely used in computer science applications. The union-find data structure allows efficient construction and manipulation of equivalence classes represented by tree structures. The addition of this data structure is a useful improvement to the classical algorithm.
{
  if LB[L',P'] == -1
    then search(LB, label, L', P');
}
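Assembled into runnable form, the recursive algorithm might look like the following Python sketch. This is an illustrative 4-neighborhood version, not the text's exact six procedures: negate is folded into the driver and print is omitted.

```python
def neighbors(LB, L, P):
    """Return legal neighbor positions of (L, P) in scan-line order
    (4-neighborhood: north, west, east, south)."""
    positions = [(L - 1, P), (L, P - 1), (L, P + 1), (L + 1, P)]
    return [(r, c) for (r, c) in positions
            if 0 <= r < len(LB) and 0 <= c < len(LB[0])]

def search(LB, label, L, P):
    """Label pixel (L, P) and recursively label its unprocessed (-1) neighbors."""
    LB[L][P] = label
    for (Lp, Pp) in neighbors(LB, L, P):
        if LB[Lp][Pp] == -1:
            search(LB, label, Lp, Pp)

def recursive_connected_components(B):
    # Negate: 1-pixels become -1 (unprocessed), 0-pixels stay 0.
    LB = [[-B[r][c] for c in range(len(B[0]))] for r in range(len(B))]
    label = 0
    for L in range(len(B)):
        for P in range(len(B[0])):
            if LB[L][P] == -1:        # unprocessed 1-pixel: new component
                label += 1
                search(LB, label, L, P)
    return LB

B = [[1, 1, 0, 1],
     [0, 1, 0, 1],
     [1, 0, 0, 1]]
LB = recursive_connected_components(B)
```

Note that Python's recursion limit makes this sketch suitable only for small components; the row-by-row algorithm below avoids deep recursion.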
Figure 3.8: The first five steps of the recursive labeling algorithm applied to the first component of the binary image of Figure 3.6. The image shown is the (partially) labeled image LB. The boldface pixel of the image is the one being processed by the search procedure. Using the neighborhood orderings shown in Figure 3.7, the first unprocessed neighbor of the boldface pixel whose value is -1 is selected at each step as the next pixel to be processed. [The five pixel grids of this figure were lost in extraction.]
Union-Find Structure   The purpose of the union-find data structure is to store a collection of disjoint sets and to efficiently implement the operations of union (merging two sets into one) and find (determining which set a particular element is in). Each set is stored as a tree structure in which a node of the tree represents a label and points to its one parent node. This is accomplished with only a vector array PARENT whose subscripts are the set of possible labels and whose values are the labels of the parent nodes. A parent value of zero means that this node is the root of the tree. Figure 3.9 illustrates the tree structure for two sets of labels {1,2,3,4,8} and {5,6,7}. Label 3 is the parent node and set label for the first set; label 7 is the parent node and set label for the second set. The values in array PARENT tell us that nodes 3 and 7 have no parents, label 2 is the parent of label 1, label 3 is the parent of labels 2, 4, and 8, and so on. Note that element 0 of the array is not used, since 0 represents the background label, and a value of 0 in the array means that a node has no parent.

The find procedure is given a label X and the parent array PARENT. It merely follows the parent pointers up the tree to find the label of the root node of the tree that X is in. The union procedure is given two labels X and Y and the parent array PARENT. It modifies the structure (if necessary) to merge the set containing X with the set containing Y. It starts at labels X and Y and follows the parent pointers up the tree until it reaches
PARENT

label: 1 2 3 4 5 6 7 8
value: 2 3 0 3 7 7 0 3
Figure 3.9: The union-find data structure for two sets of labels. The first set contains the labels {1,2,3,4,8}, and the second set contains labels {5,6,7}. For each integer label i, the value of PARENT[i] is the label of the parent of i or zero if i is a root node and has no parent.
Algorithm 3: Find
Algorithm 4: Union
the roots of the two sets. If the roots are not the same, one label is made the parent of the other. The procedure for union given here arbitrarily makes X the parent of Y. It is also possible to keep track of the set sizes and to attach the smaller set to the root of the larger set; this has the effect of keeping the tree depths down.
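The find and union procedures (Algorithms 3 and 4) can be sketched in Python as follows. The call sequence at the bottom is an illustrative assumption that rebuilds the two sets of Figure 3.9.

```python
MAX_LAB = 100   # assumed global size constant, as in the text's convention

def find(X, parent):
    """Follow parent pointers up the tree to the root label of X's set."""
    while parent[X] != 0:
        X = parent[X]
    return X

def union(X, Y, parent):
    """Merge the set containing X with the set containing Y."""
    rootX, rootY = find(X, parent), find(Y, parent)
    if rootX != rootY:
        parent[rootY] = rootX   # arbitrarily make X's root the parent

# Index 0 is reserved for the background; a value of 0 means "no parent".
parent = [0] * (MAX_LAB + 1)

# Rebuild the structure of Figure 3.9: sets {1,2,3,4,8} and {5,6,7}.
union(2, 1, parent); union(3, 2, parent); union(3, 4, parent)
union(3, 8, parent); union(7, 5, parent); union(7, 6, parent)
```

After these calls, PARENT[1..8] holds 2 3 0 3 7 7 0 3, exactly the values shown in Figure 3.9.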
The union-find data structure makes the classical connected components labeling algorithm more efficient. The first pass of the algorithm performs label propagation to propagate a pixel's label to its neighbors to the right and below it. Whenever a situation arises in which two different labels can propagate to the same pixel, the smaller label propagates and each such equivalence found is entered in the union-find structure. At the end of the first pass, each equivalence class has been completely determined and has a unique label, which is the root of its tree in the union-find structure. A second pass through the image then performs a translation, assigning to each pixel the label of its equivalence class.
The procedure uses two additional utility functions: prior_neighbors and labels. The prior_neighbors function returns the set of neighboring 1-pixels above and to the left of a given one and can be coded for a 4-neighborhood (in which case the north and west neighbors are returned) or for an 8-neighborhood (in which case the northwest, north, northeast, and west neighbors are returned). The labels function returns the set of labels currently assigned to a given set of pixels.
Figure 3.10 illustrates the application of the classical algorithm with union-find to the binary image of Figure 3.6. Figure 3.10a) shows the labels for each pixel after the first pass. Figure 3.10b) shows the union-find data structure indicating that the equivalence classes determined in the first pass are {1,2}, {3,7}, {4}, {5}, and {6}. Figure 3.10c) shows the final labeling of the image after the second pass. The connected components represent regions of the image for which both shape and intensity properties can be computed. We will discuss
procedure initialize();
"Initialize global variable label and array PARENT."
{
  "Initialize label."
  label := 0;
  "Initialize the union-find structure."
  for i := 1 to MaxLab
    PARENT[i] := 0;
}
The operations of binary morphology input a binary image B and a structuring element S, which is another, usually much smaller, binary image. The structuring element represents a shape; it can be of any size and have arbitrary structure that can be represented by a binary image. However, there are a number of common structuring elements such as a rectangle of
"Initialize structures."
initialize();
"Pass 1 assigns initial labels to each row L of the image."
for L := 0 to MaxRow
  for P := 0 to MaxCol
    if B[L,P] == 1
    then
    {
      A := prior_neighbors(L,P);
      if isempty(A)
      then { label := label + 1; M := label; }
      else M := min(labels(A));
      LB[L,P] := M;
      for X in labels(A) and X <> M
        union(M, X, PARENT);
    };
"Pass 2 replaces each Pass 1 label by the label of its equivalence class."
for L := 0 to MaxRow
  for P := 0 to MaxCol
    if B[L,P] == 1
    then LB[L,P] := find(LB[L,P], PARENT);
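In runnable form, the two passes with union-find might look like the following Python sketch. It is an illustrative version using the 4-neighborhood prior neighbors; variable names follow the pseudocode above.

```python
def find(X, parent):
    """Follow parent pointers to the root label of X's equivalence class."""
    while parent[X] != 0:
        X = parent[X]
    return X

def classical_with_union_find(B):
    rows, cols = len(B), len(B[0])
    LB = [[0] * cols for _ in range(rows)]
    parent = [0] * (rows * cols + 1)   # generous bound on the label count
    label = 0
    # Pass 1: assign initial labels and record equivalences.
    for L in range(rows):
        for P in range(cols):
            if B[L][P] == 1:
                # prior neighbors: the north and west 1-pixels
                A = [LB[r][c] for (r, c) in ((L - 1, P), (L, P - 1))
                     if r >= 0 and c >= 0 and B[r][c] == 1]
                if not A:
                    label += 1
                    M = label
                else:
                    M = min(A)
                LB[L][P] = M
                for X in A:
                    if X != M and find(X, parent) != find(M, parent):
                        parent[find(X, parent)] = find(M, parent)  # union
    # Pass 2: replace each label by the label of its equivalence class.
    for L in range(rows):
        for P in range(cols):
            if B[L][P] == 1:
                LB[L][P] = find(LB[L][P], parent)
    return LB

B = [[1, 1, 0, 1],
     [0, 0, 0, 1],
     [1, 1, 1, 1]]
LB = classical_with_union_find(B)
```

Here the bottom row touches the right column, so the temporary label assigned at the bottom left is merged into the class of the label assigned earlier on the right.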
Suppose a binary image has one foreground region, a rectangle of size 1000 by 1000. How
many times does the recursive algorithm look at (read or write) each pixel? How many
times does the classical procedure look at each pixel?
Exercise 4 Relabeling
Because equivalent labels are merged into one equivalence class, some of the initial labels from Pass 1 are lost in Pass 2, producing a final labeling whose numeric sequence of labels often has many gaps. Write a relabeling procedure that converts the labeling to one that has a contiguous sequence of numbers from 1 to the number of components in the image.
1 1 0 2 2 2 0 3
1 1 0 2 0 2 0 3
1 1 1 1 0 0 0 3
0 0 0 0 0 0 0 3
4 4 4 4 0 5 0 3
0 0 0 4 0 5 0 3
6 6 0 4 0 0 0 3
6 6 0 4 0 7 7 3
a) after Pass 1

PARENT

label: 1 2 3 4 5 6 7
value: 0 1 0 0 0 0 3
b) the union-find structure after Pass 1

1 1 0 1 1 1 0 3
1 1 0 1 0 1 0 3
1 1 1 1 0 0 0 3
0 0 0 0 0 0 0 3
4 4 4 4 0 5 0 3
0 0 0 4 0 5 0 3
6 6 0 4 0 0 0 3
6 6 0 4 0 3 3 3
c) after Pass 2

Figure 3.10: The classical connected components algorithm with union-find applied to the binary image of Figure 3.6.
Design and implement a row-by-row labeling algorithm that uses the run-length encoding of a binary image instead of the image itself and uses the LABEL field of the structure to store the labels of the runs.
specified dimensions [BOX(l,w)] or a circular region of specified diameter [DISK(d)]. Some image processing packages offer a library of these primitive structuring elements. Figure 3.12 illustrates some common structuring elements and several nonstandard ones.
The purpose of the structuring elements is to act as probes of the binary image. One pixel of the structuring element is denoted as its origin; this is often the central pixel of a symmetric structuring element, but may in principle be any chosen pixel. Using the origin as a reference point, translations of the structuring element can be placed anywhere on the image and can be used to either enlarge a region by that shape or to check whether or not the shape fits inside a region. For example, we might want to check the size of holes by seeing if a smaller disk fits entirely within a region, while a larger disk does not.
The basic operations of binary morphology are dilation, erosion, closing, and opening. As the names indicate, a dilation operation enlarges a region, while an erosion makes it smaller. A closing operation can close up internal holes in a region and eliminate "bays" along the boundary. An opening operation can get rid of small portions of the region that jut out from the boundary into the background region. The mathematical definitions are as follows:
[Figure 3.11's binary image grid and run-length tables were lost in extraction.]
Figure 3.11: Binary image (a) and its run-length encoding (b) and (c). Each run of 1's is encoded by its row (ROW) and the columns of its starting and ending points (START_COL and END_COL). In addition, for each row of the image, ROW_START points to the first run of the row and ROW_END points to the last run of the row. The LABEL field will hold the component label of the run; it is initialized to zero.
Definition 2: The translation X_t of the set X of pixels by the position vector t is defined by

    X_t = { x + t | x ∈ X }    (3.1)

Thus the translation of a set of 1's in a binary image moves the entire set of ones by the specified amount. The translation t would be specified as an ordered pair (r, c), where r is the amount to move in rows and c is the amount to move in columns.
Definition 3: The dilation of binary image B by structuring element S is denoted by B ⊕ S and is defined by

    B ⊕ S = ∪_{b ∈ B} S_b    (3.2)

Here each S_b is the translation of S that places the origin of S at pixel b of B; the dilation is the union of these translated copies.
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
a) BOX(3,5)

  1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
  1 1 1
b) DISK(5)

  1 1 1
1       1
1       1
1       1
  1 1 1
c) RING(5)

[panels d), e), and f), showing nonstandard structuring elements, were lost in extraction]

Figure 3.12: Some common structuring elements.
Definition 4: The erosion of binary image B by structuring element S is denoted by B ⊖ S and is defined by

    B ⊖ S = { b | b + s ∈ B for every s ∈ S }    (3.3)
The erosion operation also sweeps the structuring element over the entire image. At each
position where every 1-pixel of the structuring element covers a 1-pixel of the binary image,
the binary image pixel corresponding to the origin of the structuring element is ORed to
the output image. Figure 3.13d illustrates an erosion of the binary image of Figure 3.13a
by the 3 by 3 rectangular structuring element.
Dilation and erosion are the most primitive operations of mathematical morphology.
There are two more common operations that are composed of these two: closing and opening.
Definition 5: The closing of binary image B by structuring element S is denoted by B • S and is defined by

    B • S = (B ⊕ S) ⊖ S    (3.4)
a) Binary image B    b) Structuring element S (a 3 by 3 square of 1's)
c) Dilation B ⊕ S    d) Erosion B ⊖ S
e) Closing B • S     f) Opening B ∘ S
[the pixel grids of the six panels were lost in extraction]
Figure 3.13: The basic operations of binary morphology. Foreground pixels are shown as
1's. Background pixels, whose value is 0, are shown as blanks.
Definition 6: The opening of binary image B by structuring element S is denoted by B ∘ S and is defined by

    B ∘ S = (B ⊖ S) ⊕ S    (3.5)
Figure 3.13e illustrates the closing of the binary image of Figure 3.13a by the 3 by 3
rectangular structuring element; Figure 3.13f illustrates the opening of the binary image by
the same structuring element.
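The four operations of Definitions 3 through 6 can be sketched directly on coordinate sets in Python. This set-based version is an illustrative assumption (practical implementations work on image arrays), but it follows the definitions term by term.

```python
def dilate(B, S):
    """B dilated by S: the union of translates of S over every pixel of B."""
    return {(br + sr, bc + sc) for (br, bc) in B for (sr, sc) in S}

def erode(B, S):
    """B eroded by S: pixels b such that b + s lies in B for every s in S."""
    return {(br, bc) for (br, bc) in B
            if all((br + sr, bc + sc) in B for (sr, sc) in S)}

def close(B, S):
    """Closing: dilation followed by erosion."""
    return erode(dilate(B, S), S)

def open_(B, S):
    """Opening: erosion followed by dilation ('open' shadows a builtin)."""
    return dilate(erode(B, S), S)

# A 3 x 3 square structuring element centered on its origin.
S = {(r, c) for r in (-1, 0, 1) for c in (-1, 0, 1)}
# A 4 x 4 square region plus one isolated noise pixel.
B = {(r, c) for r in range(4) for c in range(4)} | {(0, 10)}
opened = open_(B, S)   # removes the isolated pixel, keeps the square
```

The opening removes the noise pixel because no translate of S fits inside it, while the 4 x 4 square survives intact.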
A camera takes an image I of a penny, a dime, and a quarter lying on a white background
and not touching one another. Thresholding is used successfully to create a binary image
B with 1 bits for the coin regions and 0 bits for the background. You are given the known
diameters of the coins D_P, D_D, and D_Q. Using the operations of mathematical morphology (dilation, erosion, opening, closing) and the logical operators AND, OR, NOT, and MINUS (set difference), show how to produce three binary output images: P, D, and Q. P should
contain just the penny (as 1 bits), D should contain just the dime, and Q should contain
just the quarter.
Closings and openings are useful in imaging applications where thresholding, or some other initial process, produces a binary image with tiny holes in the connected components or with a pair of components that should be separate joined by a thin region of foreground pixels. Figure 3.14a is a 512 × 512 16-bit gray-scale medical image, Figure 3.14b is the result of thresholding to select pixels with gray tones above 1070, and Figure 3.14c is the result of performing an opening operation to separate the organs and a closing to get rid of small holes. The structuring element used in the opening was DISK(13), and the structuring element used in the closing was DISK(2).
Binary morphology can also be used to perform very specific inspection tasks in industrial machine vision. Sternberg (1985) showed how a watch gear could be inspected to check whether it had any missing or broken teeth. Figure 3.15a shows a binary image of a watch gear. The watch gear has four holes inside of the main object and is surrounded by a number of teeth, which are individually visible in the image. In order to process the watch gear images, Sternberg defined several special purpose structuring elements whose shapes and sizes were derived from the physical properties of the watch gear. The following structuring elements are used in the watch-gear inspection algorithm:
hole ring: a ring of pixels whose diameter is slightly larger than the diameters of the four holes in the watch gears. It fits just around these holes and can be used to mark a few pixels at their centers.
hole mask: an octagon that is slightly larger than the holes in the watch gears.
gear body: a disk structuring element that is as big as the gear minus its teeth.
sampling ring spacer: a disk structuring element that is used to move slightly
outward from the gear body.
a) Medical image G    b) Thresholded image B    c) Opened and closed image
[Figure 3.14's images were lost in extraction]
sampling ring width: a disk structuring element that is used to dilate outward to
Sternberg used a ring structuring element to detect the centers of the holes in the gear-tooth
inspection task. If your system only supports disk and box structuring elements, what can
you do to detect the centers of the holes?
Suppose a satellite image of a region can be thresholded so that the water pixels are 1's. However, bridges across rivers produce thin lines of 0's cutting across the river regions. a) Describe how to restore the bridge pixels to the water region. b) Describe how to detect the thin bridges as separate objects.
Binary morphology can also be used to extract primitive features of an object that can be used to recognize the object. For instance, the corners of flat two-dimensional objects can be good primitives in shape recognition. If an object with sharp corners is opened
a) original image B
b) B1 = B ⊖ hole ring
c) B2 = B1 ⊕ hole mask
d) B3 = B OR B2
e) B7 (see text)
f) B8 = B AND B7
g) B9 = B8 ⊕ tip spacing
Figure 3.15: The gear-tooth inspection procedure (courtesy of Stanley R. Sternberg with
permission of Academic Press).
a) original    b) opening    c) corners
[the images of this figure, illustrating corner detection via opening, were lost in extraction]
One use of binary morphology is to identify certain components of a binary image that satisfy certain shape and size constraints. It is often possible to derive a structuring element that when applied to a binary image removes the components that do not satisfy the constraints and leaves a few 1-pixels of those components that do satisfy the constraints. But we want the entire components, not just what remains of them after the erosion. The conditional dilation operation was defined to solve this problem.
Definition 7: Given an original binary image B, a processed binary image C, and a structuring element S, let C_0 = C and C_n = (C_{n-1} ⊕ S) ∩ B. The conditional dilation of C by S with respect to B is defined by

    C |B S = C_m    (3.6)

where the index m is the smallest index satisfying C_m = C_{m-1}.
This definition is intended for discrete sets of points arising from finite digital images. It says that the set C = C_0 is repeatedly dilated by structuring element S, and each time the result is reduced to only the subset of pixels that were 1's in the original binary image B. Figure 3.17 illustrates the operation of conditional dilation. In the figure, the binary image B was eroded by structuring element V to select components in which 3-pixel long vertical edges could be found. Two of the components were selected, as shown in the result image C. In order to see these entire components, C is conditionally dilated by D with respect to the original image B to produce the results.
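Conditional dilation iterates to a fixed point, which the following Python sketch makes explicit on coordinate sets. The seed, mask, and cross-shaped structuring element at the bottom are illustrative assumptions, not data from the text.

```python
def dilate(C, S):
    """Dilation of coordinate set C by structuring element S."""
    return {(r + sr, c + sc) for (r, c) in C for (sr, sc) in S}

def conditional_dilate(C, B, S):
    """Repeatedly compute (C_{n-1} dilated by S) masked with B until
    a fixed point C_m = C_{m-1} is reached (Definition 7)."""
    current = set(C)
    while True:
        nxt = dilate(current, S) & B
        if nxt == current:
            return current
        current = nxt

B = {(0, 0), (0, 1), (0, 2), (2, 0), (2, 4)}   # two small components
C = {(0, 0)}                                   # seed inside one component
S = {(0, -1), (0, 0), (0, 1), (-1, 0), (1, 0)} # 4-neighborhood cross
result = conditional_dilate(C, B, S)
```

With a cross-shaped S, the fixed point recovers exactly the 4-connected component of B that contains the seed, which is why conditional dilation restores whole components after a selective erosion.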
a) Binary image B    b) Structuring element V    c) Structuring element D
d) C = B ⊖ V         e) C |B D
[Figure 3.17's pixel grids were lost in extraction]
area:

    A = Σ_{(r,c) ∈ R} 1    (3.7)

which means that the area is just a count of the pixels in the region R.

centroid:

    r̄ = (1/A) Σ_{(r,c) ∈ R} r    (3.8)

    c̄ = (1/A) Σ_{(r,c) ∈ R} c    (3.9)

The centroid (r̄, c̄) is thus the "average" location of the pixels in the set R. Note that even though each (r, c) ∈ R is a pair of integers, (r̄, c̄) is generally not a pair of integers; often a precision of tenths of a pixel is justifiable for the centroid.
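Equations 3.7 through 3.9 translate directly into code for a region given as a set of (row, column) pixels, as in this illustrative Python sketch.

```python
def area(R):
    """Equation 3.7: the area is the number of pixels in R."""
    return len(R)

def centroid(R):
    """Equations 3.8 and 3.9: the average pixel location of R."""
    A = len(R)
    r_bar = sum(r for (r, c) in R) / A
    c_bar = sum(c for (r, c) in R) / A
    return (r_bar, c_bar)

# A 2 x 3 rectangular region.
R = {(r, c) for r in range(2) for c in range(3)}
```

For the 2 x 3 rectangle the centroid (0.5, 1.0) is not a pixel location, illustrating the note above.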
The gear-tooth example was designed to use only morphological and logical operations that
could be rapidly executed on a specially-designed machine. Given that we are looking for
larger-than-normal gaps between the teeth, how could the detection be performed in a way
that minimizes the morphological operations for general purpose machines on which they
do not run rapidly?
The length of the perimeter P of a region is another global property. A simple definition of the perimeter of a region without holes is the set of its interior border pixels. A pixel of a region is a border pixel if it has some neighboring pixel that is outside the region. When 8-connectivity is used to determine whether a pixel inside the region is connected to a pixel outside the region, the resulting set of perimeter pixels is 4-connected. When 4-connectivity is used to determine whether a pixel inside the region is connected to a pixel outside the region, the resulting set of perimeter pixels is 8-connected. This motivates the following definition for the 4-connected perimeter P4 and the 8-connected perimeter P8 of a region R:
perimeter:

    P4 = { (r,c) ∈ R | N8(r,c) − R ≠ ∅ }
    P8 = { (r,c) ∈ R | N4(r,c) − R ≠ ∅ }
Describe an algorithm to generate a binary image of a region without holes, given only its
perimeter.
To compute the length |P| of perimeter P, the pixels in P must be ordered in a sequence P = < (r_0, c_0), ..., (r_{K-1}, c_{K-1}) >, each pair of successive pixels in the sequence being neighbors, including the first and last pixels. Then the perimeter length |P| is defined by
Design an algorithm to compute the area of a region without holes, given only its perimeter.
Is it possible to perform the task without regenerating the binary image?
perimeter length:

    |P| = |{ k | (r_{k+1}, c_{k+1}) ∈ N4(r_k, c_k) }|
          + √2 |{ k | (r_{k+1}, c_{k+1}) ∈ N8(r_k, c_k) − N4(r_k, c_k) }|    (3.10)

where k + 1 is computed modulo K, the length of the pixel sequence. Thus two vertically or horizontally adjacent pixels in the perimeter cause value 1 to be added to the total, while two diagonally adjacent pixels cause about 1.4 to be added.
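Equation 3.10 can be sketched in Python for a perimeter given as a cyclic sequence of neighboring pixels; the traced square at the bottom is an illustrative example.

```python
import math

def perimeter_length(P):
    """Equation 3.10: count 1 per 4-neighbor move and sqrt(2) per
    diagonal move around the cyclic pixel sequence P."""
    K = len(P)
    total = 0.0
    for k in range(K):
        (r1, c1), (r2, c2) = P[k], P[(k + 1) % K]   # k+1 taken modulo K
        if abs(r1 - r2) + abs(c1 - c2) == 1:
            total += 1.0            # horizontal or vertical move
        else:
            total += math.sqrt(2)   # diagonal move
    return total

# Border of a 3 x 3 square region, traced clockwise: 8 unit moves.
square = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2),
          (2, 1), (2, 0), (1, 0)]
```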
With the area A and perimeter P defined, a common measure of the circularity of the region is the length of the perimeter squared divided by the area.

circularity(1):

    C1 = |P|² / A    (3.11)
However, for digital shapes, |P|²/A assumes its smallest value not for digital circles, as it would for continuous planar shapes, but for digital octagons or diamonds, depending on whether the perimeter is computed as the number of its 4-neighboring border pixels or as the length of the border, counting 1 for vertical or horizontal moves and √2 for diagonal moves. To solve this problem, Haralick (1974) proposed a second circularity measure
circularity(2):

    C2 = μ_R / σ_R    (3.12)

where μ_R and σ_R are the mean and standard deviation of the distance from the centroid of the shape to the shape boundary. They can be computed according to the following formulas:

    μ_R = (1/K) Σ_{k=0}^{K−1} ||(r_k, c_k) − (r̄, c̄)||    (3.13)

    σ_R = [ (1/K) Σ_{k=0}^{K−1} ( ||(r_k, c_k) − (r̄, c̄)|| − μ_R )² ]^{1/2}    (3.14)

where the set of pixels (r_k, c_k), k = 0, ..., K−1 lie on the perimeter P of the region. The circularity measure C2 increases monotonically as the digital shape becomes more circular and is similar for digital and continuous shapes.
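Equations 3.12 through 3.14 can be sketched as follows; the square perimeter at the bottom is an illustrative example, with the centroid supplied by the caller.

```python
import math

def circularity2(P, centroid):
    """C2 = mu_R / sigma_R over the distances from the centroid to the
    perimeter pixels (Equations 3.12-3.14)."""
    r_bar, c_bar = centroid
    K = len(P)
    dists = [math.hypot(r - r_bar, c - c_bar) for (r, c) in P]
    mu = sum(dists) / K
    sigma = math.sqrt(sum((d - mu) ** 2 for d in dists) / K)
    return mu / sigma

# Border of a 3 x 3 square region; its centroid is (1, 1).
square = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2),
          (2, 1), (2, 0), (1, 0)]
c2 = circularity2(square, (1.0, 1.0))
```

For this square, the four corner pixels lie at distance √2 and the four edge pixels at distance 1, giving C2 ≈ 5.83; a more circular digital shape would score higher.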
Figure 3.18 illustrates some of these basic properties on a simple labeled image having three regions: an ellipse, a rectangle, and a 3 × 3 square.
[Figure 3.18: a labeled image with three regions (an ellipse, a rectangle, and a 3 × 3 square, on a background of 0's) and a table of the computed properties of the three regions; the pixel grid and table values were lost in extraction.]
Suppose you have a collection of two-dimensional shapes. Some of them are triangles, some are rectangles, some are octagons, some are circles, and some are ellipses or ovals. Devise a recognition strategy for these shapes. You may use the operations of mathematical morphology and/or the properties defined so far.
Topmost left
Topmost right
Leftmost top
Leftmost bottom
Rightmost top
Rightmost bottom
Bottommost left
Bottommost right
Figure 3.19: The eight extremal points of a region and the normally oriented bounding box
that encloses the region. The dotted lines pair together opposite extremal points and form
the extremal point axes of the shape.
Extremal points occur in opposite pairs: topmost left with bottommost right; topmost right with bottommost left; rightmost top with leftmost bottom; and rightmost bottom with leftmost top. Each pair of opposite extremal points defines an axis. Useful properties of the axis include its axis length and orientation. Because the extremal points come from a spatial digitization or quantization, the standard Euclidean distance formula will provide distances that are biased slightly low. (Consider, for example, the length covered by two pixels horizontally adjacent. From the left edge of the left pixel to the right edge of the right pixel is a length of 2, but the distance between the pixel centers is only 1.) The appropriate calculation for distance adds a small increment to the Euclidean distance to account for this. The increment depends on the orientation angle θ of the axis and is given by
    Q(θ) = 1/|cos θ|  if |θ| < 45°
    Q(θ) = 1/|sin θ|  if |θ| ≥ 45°    (3.15)

With this increment, the length of the extremal axis from extremal point (r1, c1) to extremal point (r2, c2) is

    √((r2 − r1)² + (c2 − c1)²) + Q(θ)    (3.16)
Spatial moments are often used to describe the shape of a region. There are three second order spatial moments of a region. They are denoted by μ_rr, μ_rc, and μ_cc and are defined as follows:

    μ_rr = (1/A) Σ_{(r,c) ∈ R} (r − r̄)²    (3.17)

    μ_rc = (1/A) Σ_{(r,c) ∈ R} (r − r̄)(c − c̄)    (3.18)

    μ_cc = (1/A) Σ_{(r,c) ∈ R} (c − c̄)²    (3.19)
Thus μ_rr measures row variation from the row mean, μ_cc measures column variation from the column mean, and μ_rc measures row and column variation from the centroid. These quantities are often used as simple shape descriptors, as they are invariant to translation and scale change of a 2D shape.
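Equations 3.17 through 3.19 can be sketched in Python for a region given as a set of pixels; the horizontal bar at the bottom is an illustrative example.

```python
def second_moments(R):
    """Return (mu_rr, mu_rc, mu_cc) per Equations 3.17-3.19."""
    A = len(R)
    r_bar = sum(r for (r, c) in R) / A
    c_bar = sum(c for (r, c) in R) / A
    mu_rr = sum((r - r_bar) ** 2 for (r, c) in R) / A
    mu_rc = sum((r - r_bar) * (c - c_bar) for (r, c) in R) / A
    mu_cc = sum((c - c_bar) ** 2 for (r, c) in R) / A
    return mu_rr, mu_rc, mu_cc

# A horizontal 1 x 5 bar: all of its variation is across columns.
bar = {(0, c) for c in range(5)}
```

For the bar, μ_rr = 0 and μ_rc = 0 while μ_cc = 2, reflecting the elongation along the column direction.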
The second spatial moments have value and meaning for a region of any shape, the same way that the covariance matrix has value and meaning for any two-dimensional probability distribution. If the region is an ellipse, there is an algebraic meaning that can be given to the second spatial moments.

If a region R is an ellipse whose center is the origin, then R can be expressed as

    R = { (r, c) | dr² + 2erc + fc² ≤ 1 }    (3.20)
A relationship exists between the coefficients d, e, and f of the equation of the ellipse and the second moments μ_rr, μ_rc, and μ_cc. It is given by

    [ d  e ]           1              [  μ_cc  −μ_rc ]
    [ e  f ]  =  ─────────────────── [ −μ_rc   μ_rr ]    (3.21)
                 4(μ_rr μ_cc − μ_rc²)
Since the coefficients d, e, and f determine the lengths of the major and minor axes and the orientation of the ellipse, this relationship means that the second moments μ_rr, μ_rc, and μ_cc also determine the lengths of the major and minor axes and the orientation of the ellipse. Ellipses are frequently the result of imaging circular objects. Ellipses also provide a rough approximation to other elongated objects.
The major axis of the ellipse has length

    [ 8 ( μ_rr + μ_cc + [ (μ_rr − μ_cc)² + 4μ_rc² ]^{1/2} ) ]^{1/2}

The minor axis is oriented at an angle of 90° counterclockwise from the major axis and has length

    [ 8 ( μ_rr + μ_cc − [ (μ_rr − μ_cc)² + 4μ_rc² ]^{1/2} ) ]^{1/2}
* best axis:
Some image regions (objects) have a natural axis; for example, a pencil or hammer, or
the characters 'I', '/' and '-'. A best axis for an object can be computed as that axis about
which the region pixels have least second moment. Using an analogy from mechanics, this
Figure 3.20: Moment about an axis is computed by summing the squared distance of each
pixel from the axis.
is an axis of least inertia, an axis about which we could spin the pixels with least energy input. Note that for a circular disk, all axes have equal minimum (and maximum) inertia. It is known that an axis of least inertia must pass through the centroid (r̄, c̄) of our set of pixels (unit masses), and we will assume this here. First, we compute the second moment of a point set about an arbitrary axis; then we'll find the axis of least second moment. A set of moments about a selected set of axes might provide a good set of features for recognizing objects, as we shall see in the next chapter. For example, the second moment of character 'I' about a vertical axis through its centroid is very small, whereas that of the character '/' or '-' is not small.
Figure 3.20 shows a set of pixels and an axis making angle \(\alpha\) with the row axis. The
angle \(\beta = \alpha + 90^\circ\) is the angle that a perpendicular to the axis makes with the row axis.
To compute the second moment of the point set about the axis, we need to sum the squares
of the distances d for all pixels; we normalize by the number of pixels to obtain a feature
that does not change significantly with the number of pixels making up the shape. Note
that, since we are summing \(d^2\), the angles \(\alpha\) and \(\beta\) can be changed by \(\pm 180^\circ\) with no change
to the second moment. Equation 3.22 gives the formula for computing the second moment:
\(\circ\) is the vector scalar product that is used to project the vector V onto the unit vector in
direction \(\beta\), giving length d. Any axis can be specified by the three parameters \(\bar r\), \(\bar c\), and \(\alpha\).
\[ \mu_{\alpha\alpha}(\bar r, \bar c, \alpha) = \frac{1}{A} \sum_{(r,c) \in R} d^2, \qquad d = (r - \bar r,\; c - \bar c) \circ (\cos\beta, \sin\beta) \tag{3.22} \]
where \(\beta = \alpha + \pi/2\).
Write a program module, or C++ class, that manages a bag of 2D points and provides the
following functionality. A bag is different from a set in that duplicate points are allowed.
After creating the feature extraction module of the previous exercise, enhance it to compute
the second moments about horizontal, vertical, and diagonal axes through the centroid of the
points. Thus, five different second moments will be available for any bag of points. Create
a set of 20x20 binary images of digits from '0' to '9' for test data, or access some existing
data. Write a program that scans an image of a digit and computes the five moments. Study
whether or not the five moments have potential for recognizing the input digit.
The above formula can be used to compute several moments to capture some information about the shape of the point set; for example, moments about the vertical, horizontal,
and diagonal axes are useful for classifying alphabetic characters in standard orientation.
The least (and most) inertia is an invariant property of the point set and translates and
rotates with the point set. The axis of least inertia can be obtained by minimizing \(\mu_{\alpha\alpha}(\bar r, \bar c, \alpha)\).
Assuming now that the best axis must pass through the centroid, we need only differentiate
the formula with respect to \(\alpha\) to determine the best \(\hat\alpha\).
\[ \tan 2\hat\alpha = \frac{2 \sum_{(r,c) \in R} (r - \bar r)(c - \bar c)}{\sum_{(r,c) \in R} (r - \bar r)^2 - \sum_{(r,c) \in R} (c - \bar c)^2} = \frac{2\mu_{rc}}{\mu_{rr} - \mu_{cc}} \tag{3.23} \]
There are two extreme values for \(\hat\alpha\), a minimum and a maximum, which are 90 degrees
apart. We have already seen the method to distinguish the two in the above discussion
about the major and minor axes of an ellipse. In fact, the above formula allows us to
compute an ellipse that approximates the point set in the sense of these moments. Note
that highly symmetrical objects, such as squares and circles, will cause a zero-divide in the
above formula; hence the case analysis used with the elliptical data must also be done here.
Differentiate the formula in Equation 3.22 and show how the best (and worst) axes are
obtained in Equation 3.23.
Exercise 16 Verify that the best axis passes through the centroid
Verify that the axis of least inertia must pass through the centroid. Consult the references
at the chapter's end or other references on statistical regression or mechanics; or, prove it
yourself.
can be used for keeping track of adjacencies between foreground and background regions in
the binary case and for keeping track of all adjacencies in the general image segmentation
case.
8 Definition A region adjacency graph (RAG) is a graph in which each node represents a
region of the image, and an edge connects two nodes if the two regions are adjacent.
Figure 3.21 gives an example of a region adjacency graph for a binary image of foreground and background regions. The foreground regions have been labeled as usual with
positive integers. The background regions have been labeled with zero for the large region
that starts at the upper left pixel of the image and with negative integers for the hole regions.
[Figure 3.21: a labeled binary image with foreground regions 1, 2, and 3, background region
0, and hole regions -1, -2, and -3, together with its region adjacency graph.]
adjacencies, and if 8-adjacency is specified, diagonal adjacencies between points with different labels. As new adjacencies are detected, new edges are added to the region adjacency
graph data structure being constructed. There are two issues related to the efficiency of
this algorithm. The first is with respect to space. It is possible for an image to have tens of
thousands of labels. In this case, it may not be feasible, or at least not suitable in a paging
environment, to keep the entire structure in internal memory at once. The second issue relates to execution time. When moving along an image, point by point, the same adjacency
(i.e., the same two region labels) will be detected over and over again. It is desirable to enter
the adjacency into the data structure as infrequently as possible. These issues are addressed
in the exercise.
Design a data structure for keeping track of adjacencies while constructing a region adjacency
graph. Give algorithms that construct the graph from an arbitrary labeled image and that
attempt to minimize references to the data structure. Discuss how you would store the final
RAG in permanent storage (on disk) and how you would handle the case where the RAG
is too large to keep in internal memory during its construction.
Thresholds can be selected interactively by a user of an interactive package, but for image
analysis processes that must run automatically, we would like to be able to compute the
thresholds automatically. The basis for choosing a threshold is the histogram of the gray-tone
image.
9 Definition
Figure 3.22: Histogram of the image of a bruised cherry displaying two modes, one representing the bruised portion and the other the nonbruised portion.
Figure 3.22 shows the image of a bruised cherry and its histogram. The histogram has two
distinct modes representing the bruised portion and nonbruised portion of the cherry.
A histogram can be computed by using an array data structure and a very simple procedure. Let H be a vector array dimensioned from 0 to MaxVal, where 0 is the value of the
smallest possible gray-level value and MaxVal is the value of the largest. Let I be the two-dimensional image array with row values from 0 to MaxRow and column values from 0 to
MaxCol as in the previous sections. The histogram procedure is given by the following code.
Compute the histogram H of gray-tone image I.
procedure histogram(I,H);
{
"Initialize the bins of the histogram to zero."
for i := 0 to MaxVal
H[i] := 0;
"Compute values by accumulation."
for r := 0 to MaxRow
for c := 0 to MaxCol
{
grayval := I[r,c];
H[grayval] := H[grayval] + 1;
};
};
into a single bin, usually for purposes of displaying the histogram when there are many
possible gray tones. In this case the procedures can easily be modified to calculate the bin
number as a function of the gray tone. If binsize is the number of gray tones per bin, then
grayval/binsize truncated to its integer value gives the correct bin subscript.
Given the histogram, automatic procedures can be written to detect peaks and valleys
of the histogram function. The simplest case is when we are looking for a single threshold
that separates the image into dark pixels and light pixels. If the distributions of dark pixels
and bright pixels are widely separated, then the image histogram will be bimodal, one mode
corresponding to the dark pixels and one mode corresponding to the bright pixels. With
little distribution overlap, the threshold value can easily be chosen as any value in the valley
between the two dominant histogram modes, as shown in Figure 3.23a. However, as the
distributions for the bright and dark pixels become more and more overlapped, the choice
of threshold value becomes more difficult, because the valley begins to disappear as the two
distributions begin to merge together, as shown in Figure 3.23b.
Figure 3.23: Two image histograms. The histogram on the left (a) has two easily separable
modes; the one on the right (b) has overlapped modes that make it more difficult to find a
suitable threshold.
Several different methods have been proposed for automatic threshold determination. We
discuss here the Otsu method, which selects the threshold based on the minimization of
the within-group variance of the two groups of pixels separated by the thresholding operator. For this discussion, we will specify the histogram function as a probability function
P, where P(0), ..., P(I) represent the histogram probabilities of the observed gray values
0, ..., I; \(P(i) = |\{(r,c) \mid \mathrm{Image}(r,c) = i\}| / |R \times C|\), where \(R \times C\) is the spatial domain of the
image. If the histogram is bimodal, the histogram thresholding problem is to determine a
best threshold t separating the two modes of the histogram from each other. Each threshold
t determines a variance for the group of values that are less than or equal to t and a variance
for the group of values greater than t. The definition of best threshold suggested by Otsu
is that threshold for which the weighted sum of within-group variances is minimized. The
weights are the probabilities of the respective groups.
We motivate the within-group variance criterion by considering the situation that sometimes happens at a ski school. A preliminary test of capabilities is given and the histogram
of the resulting scores is bimodal. There are advanced skiers and novices. Lessons that
are aimed at the advanced skiers go too fast for the others, and lessons that are aimed at
the level of the novices are boring to the advanced skiers. To fix this situation, the teacher
decides to divide the class into two mutually exclusive and homogeneous groups based on
the test score. The question is to determine which test score to use as the dividing criterion.
Ideally, each group should have test scores that have a unimodal bell-shaped histogram, one
around a lower mean and one around a higher mean. This would indicate that each group
is homogeneous within itself and different from the other.
A measure of group homogeneity is variance. A group with high homogeneity will have
low variance. A group with low homogeneity will have high variance. One possible way to
choose the dividing criterion is to choose a dividing score such that the resulting weighted
sum of the within-group variances is minimized. This criterion emphasizes high group homogeneity. A second way to choose the dividing criterion is to choose a dividing score that
maximizes the resulting squared difference between the group means. This difference is related to the between-group variance. Both dividing criteria lead to the same dividing score
because the sum of the within-group variances and the between-group variances is a constant.
Let \(\sigma_W^2\) be the weighted sum of group variances, that is, the within-group variance. Let
\(\sigma_1^2(t)\) be the variance for the group with values less than or equal to t and \(\sigma_2^2(t)\) be the
variance for the group with values greater than t. Let \(q_1(t)\) be the probability for the group
with values less than or equal to t and \(q_2(t)\) be the probability for the group with values
greater than t. Let \(\mu_1(t)\) be the mean for the first group and \(\mu_2(t)\) the mean for the second
group. Then the within-group variance \(\sigma_W^2\) is defined by
\[ \sigma_W^2(t) = q_1(t)\,\sigma_1^2(t) + q_2(t)\,\sigma_2^2(t) \tag{3.24} \]
where
\[ q_1(t) = \sum_{i=1}^{t} P(i), \qquad q_2(t) = \sum_{i=t+1}^{I} P(i) \tag{3.25} \]
\[ \mu_1(t) = \sum_{i=1}^{t} i\,P(i)/q_1(t), \qquad \mu_2(t) = \sum_{i=t+1}^{I} i\,P(i)/q_2(t) \]
\[ \sigma_1^2(t) = \sum_{i=1}^{t} [i - \mu_1(t)]^2\,P(i)/q_1(t) \tag{3.26} \]
\[ \sigma_2^2(t) = \sum_{i=t+1}^{I} [i - \mu_2(t)]^2\,P(i)/q_2(t) \tag{3.27} \]
The best threshold t can then be determined by a simple sequential search through all
possible values of t to locate the threshold t that minimizes \(\sigma_W^2(t)\). In many situations this
search can be reduced to a search between the two modes. However, identification of the modes is
really equivalent to the identification of separating values between the modes.
There is a relationship between the within-group variance \(\sigma_W^2(t)\) and the total variance
\(\sigma^2\), which does not depend on the threshold. The total variance is defined by
\[ \sigma^2 = \sum_{i=1}^{I} (i - \mu)^2\,P(i) \]
where
\[ \mu = \sum_{i=1}^{I} i\,P(i) \]
The relationship between the total variance and the within-group variance can make the
calculation of the best threshold less computationally complex. By rewriting \(\sigma^2\), we have
\[ \sigma^2 = \sum_{i=1}^{t} [i - \mu_1(t) + \mu_1(t) - \mu]^2\,P(i) + \sum_{i=t+1}^{I} [i - \mu_2(t) + \mu_2(t) - \mu]^2\,P(i) \]
But
\[ \sum_{i=1}^{t} [i - \mu_1(t)][\mu_1(t) - \mu]\,P(i) = 0 \quad \text{and} \quad \sum_{i=t+1}^{I} [i - \mu_2(t)][\mu_2(t) - \mu]\,P(i) = 0 \]
Since
\[ q_1(t) = \sum_{i=1}^{t} P(i) \quad \text{and} \quad q_2(t) = \sum_{i=t+1}^{I} P(i), \]
expanding the squares and dropping the vanishing cross terms gives
\[ \sigma^2 = \left[ q_1(t)\,\sigma_1^2(t) + q_2(t)\,\sigma_2^2(t) \right] + \left[ q_1(t)[\mu_1(t) - \mu]^2 + q_2(t)[\mu_2(t) - \mu]^2 \right] \tag{3.28} \]
The first bracketed term is the within-group variance \(\sigma_W^2\). It is just the sum of the weighted
variances of each of the two groups. The second bracketed term is called the between-group
variance \(\sigma_B^2\). It is just the sum of the weighted squared distances between the means of each
group and the grand mean. The between-group variance can be further simplified. Note
that the grand mean can be written as
\[ \mu = q_1(t)\,\mu_1(t) + q_2(t)\,\mu_2(t) \tag{3.29} \]
Using Eq. (3.29) to eliminate \(\mu\) in Eq. (3.28), substituting \(1 - q_1(t)\) for \(q_2(t)\), and simplifying,
we obtain
\[ \sigma^2 = \sigma_W^2(t) + q_1(t)[1 - q_1(t)]\,[\mu_1(t) - \mu_2(t)]^2 \]
Since the total variance \(\sigma^2\) does not depend on t, the t minimizing \(\sigma_W^2(t)\) will be the t
maximizing the between-group variance \(\sigma_B^2(t)\),
\[ \sigma_B^2(t) = q_1(t)\,[1 - q_1(t)]\,[\mu_1(t) - \mu_2(t)]^2 \tag{3.30} \]
To determine the maximizing t for \(\sigma_B^2(t)\), the quantities determined by Eqs. (3.25) to (3.27)
all have to be determined. However, this need not be done independently for each t. There
is a relationship between the value computed for t and that computed for the next t: t+1.
We have directly from Eq. (3.25) the recursive relationship
\[ q_1(t+1) = q_1(t) + P(t+1) \tag{3.31} \]
with initial value \(q_1(1) = P(1)\).
From Eq. (3.26) we obtain the recursive relation
\[ \mu_1(t+1) = \frac{q_1(t)\,\mu_1(t) + (t+1)\,P(t+1)}{q_1(t+1)} \tag{3.32} \]
with the initial value \(\mu_1(0) = 0\). Finally, from Eq. (3.29) we have
\[ \mu_2(t+1) = \frac{\mu - q_1(t+1)\,\mu_1(t+1)}{1 - q_1(t+1)} \tag{3.33} \]
Automatic threshold-finding algorithms only work well when the images to be thresholded satisfy their assumptions about the distribution of the gray-tone values over the image.
The Otsu automatic threshold finder assumes a bimodal distribution of gray-tone values. If
the image approximately fits this constraint, it will do a good job. If the image is not at
all bimodal, the results are not likely to be useful. Figure 3.24 illustrates the application
of the Otsu operator to the gray-tone image of some toy blocks shown in a). The operator
returned a threshold of 93 from the possible range of 0 to 255. The pixels below and above
the threshold are shown in b) and c), respectively. Only the very dark regions of the image
have been isolated.
[Figure 3.24 panels: a) original image; b) pixels below 93; c) pixels above 93.]
Figure 3.24: A gray-tone image and the pixels below and above the threshold of 93 (shown
in white) found by the Otsu automatic thresholding operator.
If the gray-tone values of an image are strongly dependent on the location within the
image, for example lighter in the upper left corner and darker in the lower right, then it
may be more appropriate to use local instead of global thresholds. This idea is sometimes
called dynamic thresholding. In some applications, the approximate shapes and sizes of the
objects to be found are known in advance. In this case a technique called knowledge-based
thresholding, which evaluates the resultant regions and chooses the threshold that provides
the best results, can be employed. Finally, some images are just not thresholdable, and
alternate techniques must be used to find the objects in them.
Write a program to implement the Otsu automatic threshold finder. Try the program on
several different types of scanned images.
3.9 References
There are a number of different algorithms for the connected components labeling operation,
each designed to address a certain task. Tanimoto (1990) assumes that the entire image can
fit in memory and employs a simple, recursive algorithm that works on one component at
a time, but can move all over the image. Other algorithms were designed for larger images
that may not fit in memory and work on only two rows of the image at a time. Rosenfeld
and Pfaltz (1966) developed the two-pass algorithm that uses a global equivalence table and
is sometimes called the 'classical' connected components algorithm. Lumia, Shapiro, and
Zuniga (1983) developed another two-pass algorithm that uses a local equivalence table to
avoid paging problems. Danielsson and Tanimoto (1983) designed an algorithm for massively parallel machines that uses a parallel propagation strategy. Any algorithm that keeps
track of equivalences can use the union-find data structure (Tarjan, 1975) to efficiently perform set-union operations.
Serra (1982) produced the first systematic theoretical treatment of mathematical morphology. Sternberg (1985) designed a parallel pipeline architecture for rapidly performing
the operations and applied it to problems in medical imaging and industrial machine vision.
He also extended the binary morphology operations to gray-scale morphology (1986), which
has become a standard image filtering operation. Haralick, Sternberg, and Zhuang (1987)
published a tutorial paper on both binary and gray-scale morphology that has helped to
show their value to the computer vision community. Shapiro, MacDonald, and Sternberg
(1987) showed that morphological feature detection can be used for object recognition.
Automatic thresholding has been addressed in a number of papers. The method described in this text is due to Otsu (1979). Other methods have been proposed by Kittler
and Illingworth (1986) and by Cho, Haralick, and Yi (1989). Sahoo et al. (1988) give a
general survey of thresholding techniques.
1. S. L. Tanimoto, The Elements of Artificial Intelligence Using Common LISP, W. H.
Freeman and Company, New York, 1990.
2. A. Rosenfeld and J. L. Pfaltz, "Sequential Operations in Digital Picture Processing,"
Journal of the Association for Computing Machinery, Vol. 13, 1966, pp. 471-494.
3. R. Lumia, L. G. Shapiro, and O. Zuniga, "A New Connected Components Algorithm
for Virtual Memory Computers," Computer Vision, Graphics, and Image Processing,
Vol. 22, 1983, pp. 287-300.
4. P.-E. Danielsson and S. L. Tanimoto, "Time Complexity for Serial and Parallel Propagation in Images," in Architecture and Algorithms for Digital Image Processing, A.
Oosterlinck and P.-E. Danielsson (eds.), Proceedings of the SPIE, Vol. 435, 1983, pp.
60-67.
5. R. E. Tarjan, "Efficiency of a Good but not Linear Set Union Algorithm," Journal of
the Association for Computing Machinery, Vol. 22, 1975, pp. 215-225.
6. J. Serra, Image Analysis and Mathematical Morphology, Academic Press, New York,
1982.
7. S. R. Sternberg, "An Overview of Image Algebra and Related Architectures," Integrated Technology for Parallel Image Processing, Academic Press, London, 1985, pp.
79-100.
8. S. R. Sternberg, "Grayscale Morphology," Computer Vision, Graphics, and Image
Processing, Vol. 35, 1986, pp. 333-355.
9. R. M. Haralick, S. R. Sternberg, and X. Zhuang, "Image Analysis Using Mathematical
Morphology," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.
PAMI-9, 1987, pp. 523-550.
10. L. G. Shapiro, R. S. MacDonald, and S. R. Sternberg, "Ordered Structural Shape
Matching with Primitive Extraction by Mathematical Morphology," Pattern Recognition, Vol. 20, No. 1, 1987, pp. 75-90.
11. R. M. Haralick, "A Measure of Circularity of Digital Figures," IEEE Transactions on
Systems, Man, and Cybernetics, Vol. SMC-4, 1974, pp. 394-396.
12. N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-9, 1979, pp. 62-66.
13. J. Kittler and J. Illingworth, "Minimum Error Thresholding," Pattern Recognition,
Vol. 19, 1986, pp. 41-47.
14. S. Cho, R. M. Haralick, and S. Yi, "Improvement of Kittler and Illingworth's Minimum
Error Thresholding," Pattern Recognition, Vol. 22, 1989, pp. 609-617.
15. P. K. Sahoo, et al., "A Survey of Thresholding Techniques," Computer Vision, Graphics, and Image Processing, Vol. 41, 1988, pp. 233-260.