Chapter 09 Solutions (Merged)

9.1 Find the length of the shortest path from (i) (1, 1) to (5, 3) and (ii) (1, 6) to
(3, 1), using (a) 4-connectivity and (b) 8-connectivity.

(i) (1, 1) to (5, 3)


(a) 6. Under 4-connectivity the shortest-path length is the city-block distance:
|5 − 1| + |3 − 1| = 4 + 2 = 6.

(b) 4. Under 8-connectivity it is the chessboard distance: max(|5 − 1|, |3 − 1|) = 4.

(The 6 × 6 coordinate grids illustrating the paths are not reproduced here.)

(ii) (1, 6) to (3, 1)


(a) 7. City-block distance: |3 − 1| + |1 − 6| = 2 + 5 = 7.

(b) 5. Chessboard distance: max(|3 − 1|, |1 − 6|) = 5.

(The 6 × 6 coordinate grids illustrating the paths are not reproduced here.)
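The two metrics can be checked with a minimal Python sketch (not part of the original solutions); with no obstacles, the 4-connected shortest-path length is the city-block distance and the 8-connected one is the chessboard distance:

def d4(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])      # city-block (Manhattan)

def d8(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))  # chessboard (Chebyshev)

print(d4((1, 1), (5, 3)), d8((1, 1), (5, 3)))   # 6 4
print(d4((1, 6), (3, 1)), d8((1, 6), (3, 1)))   # 7 5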

9.2 What is the difference between the result of opening performed once
and twice? What is idempotency?

Opening is erosion followed by dilation. The erosion operation shrinks the
foreground, eliminating thin protrusions and breaking narrow isthmuses to
produce a smoother boundary; the dilation operation then expands the resulting
foreground back to approximately its original size. If we were to perform a
second opening there would be no thin protrusions left to eliminate, since they
were removed in the first pass. Thus the second opening produces no change.

Idempotency is the property of an operation whereby applying it twice or more
gives the same result as applying it once (e.g. opening and closing).
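Idempotency of opening is easy to verify numerically. A minimal sketch, assuming NumPy and SciPy are available (the random test image and 3 x 3 structuring element are arbitrary choices):

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
img = rng.random((64, 64)) > 0.5            # arbitrary binary test image
se = np.ones((3, 3), dtype=bool)            # 3 x 3 structuring element

once = ndimage.binary_opening(img, structure=se)
twice = ndimage.binary_opening(once, structure=se)
print(np.array_equal(once, twice))          # True: opening is idempotent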

9.3 Sketch the structuring elements required for the hit-or-miss transform to
locate (i) isolated points in an image, (ii) end-points in a binary skeleton
and (iii) junction points in a binary skeleton. Several structuring elements
may be needed in some cases to locate all possible orientations.
(i)
0 0 0
0 1 0
0 0 0

(ii)
0 0 0
0 1 0
0 0 1

and the other 7 SEs produced by rotating this by 45° intervals.

(iii) There are two different types of junction that need to be considered.

T-junctions, which require

0 1 0
0 1 1
0 1 0

and the 7 SEs formed by rotating this by 45° intervals,

and Y-junctions, which require

0 1 0
1 1 0
0 0 1

and the other 7 SEs produced by rotating this by 45° intervals.

Thus 16 SEs in total.
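The rotated variants need not be written out by hand. A sketch of a hypothetical helper that generates the 8 orientations of a 3 x 3 SE by stepping its outer ring of neighbours one position (a 45° rotation):

import numpy as np

def rot45(se):
    # Move each entry of the outer ring one position clockwise;
    # the centre element stays fixed.
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    out = se.copy()
    for (r0, c0), (r1, c1) in zip(ring, ring[1:] + ring[:1]):
        out[r1, c1] = se[r0, c0]
    return out

end_point = np.array([[0, 0, 0],
                      [0, 1, 0],
                      [0, 0, 1]])
ses = [end_point]
for _ in range(7):
    ses.append(rot45(ses[-1]))    # the 8 orientations of the end-point SE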

9.4 How can the hit-or-miss transform be used to perform erosion? How can the
hit-or-miss transform, together with the NOT (or inverse) operation, be used
to perform dilation?

The hit-or-miss transform can be applied to all the pixels in an image to
determine whether each pixel has at least one background neighbor; if it does,
that pixel becomes a background pixel. This implements erosion.
To implement dilation of the original image, the image is inverted (using NOT),
the same transform is applied, and the result is inverted again.
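A minimal sketch of the erosion half of this answer, using SciPy's hit-or-miss implementation (the helper name and the 3 x 3 cross are illustrative assumptions, not the book's code):

import numpy as np
from scipy import ndimage

def erosion_via_hom(img):
    # Mark every foreground pixel that has at least one 4-connected
    # background neighbour (one hit-or-miss pass per neighbour position),
    # then turn the marked pixels into background. This reproduces
    # erosion by a 3 x 3 cross.
    centre = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
    boundary = np.zeros(img.shape, dtype=bool)
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        miss = np.zeros((3, 3), dtype=int)
        miss[1 + dr, 1 + dc] = 1            # background required here
        boundary |= ndimage.binary_hit_or_miss(img, centre, miss)
    return img & ~boundary

img = np.zeros((7, 7), dtype=bool)
img[2:5, 1:6] = True
print(np.array_equal(erosion_via_hom(img), ndimage.binary_erosion(img)))  # True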

9.5 If an edge detector has produced long lines in its output that are
approximately x pixels thick, what is the longest length spurious
spur (prune) that you could expect to see after thinning to a single-
pixel thickness? Test your estimate on some real images. Hence,
approximately how many iterations of pruning should be applied to
remove spurious spurs from lines that were thinned down from a
thickness of x pixels?

If the output lines are x pixels thick and they are then skeletonized to a
thickness of one pixel, the band of “lost” pixels is x − 1 thick. So any spur
perpendicular to the skeleton can be no longer than (x − 1)/2; if the spur is at
an angle to the skeleton it can be no longer than ((x − 1)/2)·√2. One iteration
of pruning removes either a perpendicular or an angled pixel from the spur.
Thus (x − 1)/2 iterations are required.

9.6 Sketch the skeleton of (i) a square (ii) an equilateral triangle (iii) a
circle.

(Sketches omitted here; in the original the skeletons are shown in red. The
skeleton of a square is its two diagonals, forming an X; the skeleton of an
equilateral triangle is the three angle bisectors, forming a Y; the skeleton of
a circle is the single point at its center.)


9.7 How can the MAT be used to reconstruct the original shape of the
region from which it was derived?

The medial axis transform produces a grayscale image of the skeleton, where the
value of each pixel represents its distance to the nearest boundary in the
original image. By constructing circles centered at each skeleton pixel with
radii equal to the pixel values, the original object can be “grown” back from
the MAT image.
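A minimal sketch of that growing step (Euclidean discs are assumed; the exact disc convention depends on the distance metric used to build the MAT):

import numpy as np

def reconstruct_from_mat(mat):
    # Union of discs, one per nonzero MAT pixel, with radius equal to
    # the stored distance value.
    rows, cols = np.indices(mat.shape)
    out = np.zeros(mat.shape, dtype=bool)
    for r, c in zip(*np.nonzero(mat)):
        out |= (rows - r) ** 2 + (cols - c) ** 2 <= mat[r, c] ** 2
    return out

mat = np.zeros((11, 11))
mat[5, 5] = 3                       # toy MAT: one medial pixel, radius 3
disc = reconstruct_from_mat(mat)    # grows back a single disc of radius 3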

9.8 What shape and size of structuring element would you need to use in
order to detect just the horizontal lines in figure P9.1?

A structuring element that just fits inside the smallest horizontal line (and
doesn’t fit inside the vertical and diagonal lines) is required.

9.9 The features in the image shown in figure P9.2(i) are flawed by small
gaps, which have been removed in the image shown in figure P9.2(ii).
What processing operation would achieve this result? What size and
shape of structuring element is required?

Binary closing would remove the flaws. The structuring element has to fit
inside the foreground shapes. The structuring element should be
circular—considering the rounded edges throughout the image—and its
diameter should be just smaller than the width of the long strand in the
image.

9.10 What is (i) the skeleton and (ii) the medial axis transform of figure P9.3?

(Result images omitted here. Note: the image for (ii), the MAT, has been
log-stretched for better resolution.)

9.11 Which distance metric is used to obtain the distance transform in
figure 9.22?

The transform image was generated using the N8 (chessboard) distance metric.

9.12 Grayscale dilation and erosion are generalizations of binary dilation and
erosion. Describe how they are implemented.

Grayscale dilation involves superimposing a structuring element, defined by a
pattern of 1’s, over every pixel in the grayscale image. Only those image
pixels lying under a 1 are considered, and the resulting output pixel is the
maximum of those considered pixel values. Grayscale erosion is implemented in
essentially the same way as grayscale dilation, the difference being that the
minimum of the considered pixel values is chosen, rather than the maximum.
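Both operations are available directly in SciPy; a small sketch showing that, for a flat structuring element, they reduce to local maximum and minimum filters:

import numpy as np
from scipy import ndimage

img = np.random.default_rng(1).integers(0, 256, (5, 5)).astype(np.uint8)
footprint = np.ones((3, 3), dtype=bool)     # the "pattern of 1's"

dil = ndimage.grey_dilation(img, footprint=footprint)   # local maximum
ero = ndimage.grey_erosion(img, footprint=footprint)    # local minimum

print(np.array_equal(dil, ndimage.maximum_filter(img, footprint=footprint)))  # True
print(np.array_equal(ero, ndimage.minimum_filter(img, footprint=footprint)))  # True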

9.13 What is the top hat transformation and when is it used? Explain how
the top hat transformation can help to segment dark characters on a
light, but variable, background. Draw a one-dimensional profile
through an image to illustrate your explanation.
The top hat transformation is analogous to unsharp masking and is given
by:

TH = Image – Open (Image).

As such, the top hat transform is used (like the unsharp mask) to enhance
gray-level detail in the presence of shading. The top hat transformation
begins with the grayscale opening of the given image, which flattens the image
by “elevating” the dark level of the background and objects. This opening is
essentially max(min(image)): a minimum filter defined by the structuring
element is applied to every pixel in the image (erosion), and then a maximum
filter is applied to every pixel of the resulting, eroded image (dilation).
The initial erosion darkens the dark characters in the ROI and the following
dilation separates the darkened characters from the background. The top hat
transform is then completed when the opened image is subtracted from the
original image, making the background uniform and dark, and the characters
bright. The following images help illustrate the top hat transform:

Figure 1: Original. Figure 2: After opening. Figure 3: Original − Opening (top hat).
Figure 4: DipProfile, original. Figure 5: DipProfile, top hat.
(Figures not reproduced here.)

Figure 1 is the original image. Figure 2 is the image after grayscale opening
has been performed. Figure 3 is the top hat transform of Figure 1, i.e. the
original image minus the grayscale opened image. Figure 4 is a profile through
the original image (Fig 1) and Figure 5 is a profile through the resulting top
hat transform image (Fig 3), showing the text characters segmented from the
background.

Another example: here is a profile through a line of text in the original
image, and the corresponding profile after the top hat transform (profile
figures not reproduced here). We can see that all the background intensities
have been subtracted, leaving just the text.
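A minimal sketch of TH = Image − Open(Image) on a synthetic example. One assumption worth flagging: since this top hat isolates bright detail, the image is inverted first so the dark characters become bright peaks; the 7 x 7 SE size (chosen to be larger than a character stroke) is also an assumption:

import numpy as np
from scipy import ndimage

x = np.linspace(0, 1, 64)
img = (150 + 80 * x)[None, :].repeat(64, axis=0)   # light, variable background
img[20:24, 10:50] -= 100                           # a dark "stroke" of text
img = img.astype(np.uint8)

inv = 255 - img                                    # characters become bright
tophat = inv - ndimage.grey_opening(inv, size=(7, 7))
# The library call gives the same result:
print(np.array_equal(tophat, ndimage.white_tophat(inv, size=(7, 7))))  # True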

9.14 Why is finding the approximate convex hull using thickening so slow?

The approximate convex hull is essentially a “45° convex hull” approximation,
meaning the boundaries of the convex hull must contain only orientations that
are multiples of 45°. As such, the structuring elements of the approximate
convex hull need to be used in turn, followed by their individual 90°
rotations, giving a total of eight structuring elements to be applied on each
iteration. The iteration process (the thickening) continues until the
boundaries of the image objects meet the multiple-of-45° condition, i.e. no
further changes occur in the image under the application of the eight
structuring elements. If the image contains numerous objects at varying
orientations, with many boundaries that are not multiples of 45° (specifically
if the objects are “thin” with “lengthy” boundaries), then the
eight-structuring-element iteration process can be very lengthy, because the
iterations must continue until each of the image objects is thickened enough
for its boundary to meet the multiple-of-45° condition.

9.15 What would be an effective way to remove “pepper” noise in a grayscale
image? Explain.

A 3-by-3 structuring element can be used to perform grayscale closing.
Grayscale closing smooths an image from below its brightness surface, raising
the dark-intensity “pepper” pixels toward the surrounding brightness so that
they no longer stand out as dark spots but blend in with the general gray
level of the image. This is the opposite of grayscale opening, which decreases
the brightness of small bright pixel regions, smoothing the image down towards
the general gray level.
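A small sketch (synthetic image; the 5% noise density and seed are arbitrary):

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(2)
img = np.full((64, 64), 128, dtype=np.uint8)
img[rng.random((64, 64)) < 0.05] = 0         # sprinkle "pepper" noise

clean = ndimage.grey_closing(img, size=(3, 3))
print((clean == 0).sum())                    # expected 0: dark spots filled in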
10.1 What is image segmentation and why might one want to segment an
image? Describe an image segmentation algorithm and explain its
advantages and disadvantages.

Image segmentation is the partitioning of an image to separate regions of
interest from the background. This is done so that the image is easier to
analyze. It is typically used to locate objects within an image. An application
would be to distinguish a tumor from healthy tissue, or to distinguish one
organ from other parts of the body.

The isodata algorithm is a typical automatic (unsupervised) algorithm for image
segmentation. First the image is thresholded using the mean of all the pixel
values as a starting value. The means of the resulting foreground pixels and
background pixels are then calculated, and a new threshold value is obtained by
averaging these two sample means. The process is repeated until the threshold
value no longer changes.
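A minimal sketch of the isodata iteration (an illustration, not the book's code; it assumes both classes remain non-empty at every step):

import numpy as np

def isodata_threshold(img, tol=0.5):
    t = img.mean()                              # starting value: global mean
    while True:
        fg, bg = img[img > t], img[img <= t]
        t_new = 0.5 * (fg.mean() + bg.mean())   # average of the two means
        if abs(t_new - t) < tol:
            return t_new
        t = t_new

rng = np.random.default_rng(3)
img = np.concatenate([rng.normal(50, 10, 500), rng.normal(180, 10, 500)])
print(isodata_threshold(img))                   # about 115, midway between modes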

The algorithm works very well if the spreads of the individual distributions
are approximately equal. However, the disadvantage is that it does not
perform well where the distributions have differing variances or are of
abnormal shape.

10.2 Explain the basis for optimal segmentation using the Otsu method.
The Otsu method treats an image’s gray-level histogram as a probability
density function where, given that ni is the number of pixels with a gray
value of i and N is the total number of image pixels, the probability that a
particular pixel has a particular gray-level is given by pi = ni/N.

By thresholding the image at level k, we can define ω(k) = Σ_{i=0}^{k} p_i,
the probability of the class below the threshold, and μ(k) = Σ_{i=0}^{k} i·p_i,
the cumulative mean up to level k (with μ_T the global mean over all L gray
levels).
The Otsu method then involves finding the value of k that maximizes the
between-class variance

σ_B²(k) = [μ_T·ω(k) − μ(k)]² / [ω(k)(1 − ω(k))]

(and thereby minimizes the within-class variance). This maximizes the
separation of the two classes (foreground and background), thus minimizing
their overlap.
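A compact sketch of the criterion (standard Otsu formulation; the synthetic bimodal image is an arbitrary test case):

import numpy as np

def otsu_threshold(img, levels=256):
    hist = np.bincount(img.ravel(), minlength=levels)
    p = hist / hist.sum()
    omega = np.cumsum(p)                        # class-0 probability up to k
    mu = np.cumsum(np.arange(levels) * p)       # cumulative mean up to k
    mu_t = mu[-1]                               # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return np.nanargmax(sigma_b)                # k maximizing sigma_B^2

rng = np.random.default_rng(4)
img = np.concatenate([rng.normal(60, 12, 5000), rng.normal(190, 12, 5000)])
img = np.clip(img, 0, 255).astype(np.uint8)
print(otsu_threshold(img))                      # a level between the two modes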

10.3 Explain the difference between contextual and non-contextual segmentation
methods.
Contextual segmentation exploits the relationships between image
features. For example, contextual segmentation might group pixels
together based on similar location or similar gradients. On the other hand,
non-contextual segmentation ignores the relationships between image
features. In this case, pixels are grouped together based upon a defined,
global attribute like gray-level.

10.4 What segmentation method is particularly useful for segmenting images that
contain a variable background? Explain the basis of the method and why it
works.
Adaptive thresholding is useful for segmenting variable background
images. It changes the image threshold dynamically, as opposed to
applying a single, global threshold to all of the pixels in the image. Each
pixel is considered to have a local neighborhood of n x n pixels, the mean
or median of which is used to calculate the local threshold value.
Subsequently, the pixel of interest is set to black or white depending on
whether it falls below or above this local threshold value. The adaptive
threshold technique is improved when an additive constant is chosen
which can be subtracted from the calculated mean or median of the n x n
neighborhood.
The technique is successful at compensating for variations in the
background so long as the n x n neighborhood is large enough, but not too
large.
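A minimal sketch, assuming a mean-based local threshold (the neighbourhood size n and additive constant c are the user-supplied parameters mentioned above):

import numpy as np
from scipy import ndimage

def adaptive_threshold(img, n=15, c=5):
    local_mean = ndimage.uniform_filter(img.astype(float), size=n)
    return img > local_mean - c      # True = foreground (white)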

10.5 Distinguish between automatic and semi-automatic methods of segmentation,
giving examples of each.
Automatic, or unsupervised, methods of segmentation proceed without
user intervention. Once the algorithm is chosen and implemented, the
user has no further interaction with the process—the algorithm contains all
the formulae and variables internally. Examples of unsupervised
segmentation are Otsu and isodata thresholding.

Semi-automatic, or supervised, methods of segmentation require that the user
inputs values for constants and other parameters before the algorithm can
proceed. An example of supervised segmentation would be local adaptive
thresholding, where the user must define the n x n region, choose whether the
mean or median should be used, and supply the additive constant.

10.6 Design an energy term for a snake to track lines of constant gray
value.
Let f(x, y) be the image in question. We need to design both an internal
energy term and an external energy term for our snake. If we are tracking
lines of constant gray level, then we want our snake to behave like a thin
metal strip rather than a shrinking elastic band. As such, the internal
energy function needs to be defined as the sum of the curvatures of the
snake measured at the control points. This can be taken to be the
measure of the slope of the line connecting any two given control points.
Hence:

E_internal = Σ [(y_{i+1} − y_i) / (x_{i+1} − x_i)]²

where (x_i, y_i) and (x_{i+1}, y_{i+1}) represent the coordinates of two
adjacent control points and the sum ranges from i = 1 to i = N. Likewise, the
external energy term can be defined as the sum of the squares of the image
gradient, since our snake simply needs to distinguish the boundary of a
constant gray-level line. Hence:

E_external = Σ [∇f(x_i, y_i)]²

where, again, the sum ranges from i = 1 to i = N. Hence, the energy term for
our snake is:

E_snake = Σ [(y_{i+1} − y_i) / (x_{i+1} − x_i)]² + Σ [∇f(x_i, y_i)]²

where both sums range from i = 1 to i = N.
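For illustration, a hypothetical NumPy helper evaluating this energy over adjacent pairs of control points (assumptions: integer-aligned points, no two adjacent points share an x coordinate, and a precomputed squared-gradient image grad_sq):

import numpy as np

def snake_energy(pts, grad_sq):
    # pts: (N, 2) array of control points (x_i, y_i)
    # grad_sq: image of squared gradient magnitude |grad f|^2
    dx = np.diff(pts[:, 0])
    dy = np.diff(pts[:, 1])
    e_int = np.sum((dy / dx) ** 2)              # slope-based internal term
    xi, yi = pts[:, 0].astype(int), pts[:, 1].astype(int)
    e_ext = np.sum(grad_sq[yi, xi])             # external (image) term
    return e_int + e_ext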

10.7 Illustrate the use of the distance transform and morphological watershed
for separating objects that touch each other.
The distance transform of a binary image with overlapping objects is taken,
for example by multiple successive erosions of the objects in the image using
a structuring element that reflects a user-determined distance metric. The
distance transform can be viewed as a three-dimensional surface, with the
gray-level intensities representing “height”. This distance-transform image is
then thresholded at a given level, which is akin to “flooding” the topographic
surface until the overlapping objects in the original image become “hills”
separated by a flooded “valley”.

Typically, this technique is implemented by complementing the “surface”,
causing the “hills” to become “valleys”. The resulting “image” is “immersed”
in water, which begins filling the “basins” formed by the image
complementation. “Dams” are built to separate the flooded basins from one
another; the dams form the watershed lines between the “basins”, which segment
the image into separated, non-overlapping regions.
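A minimal end-to-end sketch with two touching discs (scikit-image's watershed is assumed to be available; the 0.8 marker threshold is an arbitrary choice):

import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

yy, xx = np.indices((80, 120))
img = ((xx - 45) ** 2 + (yy - 40) ** 2 < 25 ** 2) | \
      ((xx - 78) ** 2 + (yy - 40) ** 2 < 25 ** 2)   # two overlapping discs

dist = ndimage.distance_transform_edt(img)      # the "height" surface
peaks = dist > 0.8 * dist.max()                 # crude marker detection
markers, _ = ndimage.label(peaks)               # one marker per "hill"
labels = watershed(-dist, markers, mask=img)    # flood the inverted surface
print(labels.max())                             # 2: the discs are separated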

10.8 Explain why the watershed lines of a binary image correspond to the
“skiz” lines.
The skiz (skeleton by influence zones) lines are the result of skeletonizing
the background of an image, reducing the space between the objects to
interconnected lines that partition the image into regions. If the skiz lines
are superimposed over the original image, the objects in the image are
separated. The skiz lines are formed by repeatedly thinning the background
until the influence zone grown around one object meets that of another,
forming a boundary line between the two regions. This corresponds to the
watershed lines because, when the thresholded image is complemented (the
regions becoming “valleys”) and the valleys are “filled with water”, the dams
(watershed lines) are formed where the boundaries of the “water-filled”
regions meet. These watershed lines match the skiz lines because they
represent the same region boundaries.
Solutions to Chapter 11

1. Shape features include area, perimeter, maximum Feret’s diameter,
eccentricity, circularity, Euler number, Fourier descriptors, medial axis
transform, and convex hull.

• Translation: When we translate an object, we simply move it in the direction
of a specified vector. Many shape features are invariant under translation. In
fact, the only feature in the above list that is affected by translation is a
Fourier descriptor: translating the original image changes the phase spectrum
of the Fourier transform, although its magnitude spectrum remains unchanged.
• Rotation: In order to rotate an image, we specify a point and an angle, then use
that point as an axis of rotation to rotate the image by the specified angle. All of
the shape features in the above list are invariant under rotation.
• Scaling: Scaling an object is a linear transformation that enlarges or diminishes
objects. Therefore, the features invariant under scaling are those that are
independent of size units. This includes eccentricity, circularity, and Euler
number.
• Noise: Image noise is the random variation of brightness or color information.
Shape features are invariant with respect to noise as long as the noise does not
affect the shape of the object in question. For example, if an image is somewhat
grainy, the object may still be discernible. However, if the noise begins to
obscure the image, shape features may no longer be good descriptors because
the noise is creating artificial shape change within the image.
• Illumination: Illumination is the use of light and shadow. In an image, heavy use
of illumination will affect many of the shape features. For example, if an image is
originally black and white, the shape features will be very easy to describe. When
shadow is introduced, parts of the image will become gray. If part of an object is
in shadow, its area may appear smaller than in the original image. Similarly, the
medial axis transform may be affected because the edges of the image have
been obscured in the shadow.
2.

i. Pattern: A pattern is a theme of recurring objects that repeat in a
predictable manner. Patterns help us to describe the world in a
computationally friendly manner. In terms of image analysis, a pattern may be
either a repeating motif in the image or, more subtly, any recurring theme.

ii. Class: A class is a way of describing objects that have common properties.
Classification involves sorting objects in an image into separate classes. We use
measurable features (i.e. area, petal length, circularity) in order to enumerate
classes.

iii. Classifier: A classifier evaluates the evidence presented from feature extraction
and makes a decision as to the class each object should be assigned.

iv. Feature Space: A feature space is the n-dimensional space in which feature
vectors (vectors containing a set of measured features) can be plotted as points.
In feature space, each feature constitutes a dimension. For example, if we
measured sepal length, petal length, and petal width of irises, we would have
three-dimensional feature space.

v. Decision Rule: The decision rule compares the sample mean to the hypothesized
mean. If the sample mean is “close” to the hypothesized mean, we accept the
null hypothesis, i.e. there is no discernible difference between the hypothesized
set and the sample set.

vi. Discriminant Function: Given a set of linear combinations of variables
whose values are as close as possible within groups and as far apart as
possible between groups, the linear combinations are called discriminant
functions. They are of the form:

f_km = u_0 + u_1 X_1km + u_2 X_2km + … + u_p X_pkm

where

f_km = the value (score) on the canonical discriminant function for case m in
group k;
X_ikm = the value on discriminant variable X_i for case m in group k; and
u_i = coefficients which produce the desired characteristics in the function.

3. A training set is a set of data that is used to fit, or train, a model for
prediction or classification of values that are known in the training set but
unknown in future data. It is typically chosen randomly to avoid skewing
future predictions, and is usually about 20% of the total sample set.

4.
X (10 samples of the three variables x, y, z):

x  y  z
7  4  3
4  1  8
6  3  5
8  6  1
8  5  7
7  2  9
8  2  2
7  4  5
9  5  8
5  3  3

Step 1: Calculate the covariances for xx, xy, xz, yx, yy, yz, zx, zy and zz.

xx = 2.09 yy = 2.25

xy = yx = 1.45 yz = zy = -1.15

xz = zx = -0.39 zz = 7.09

Step 2: Arrange these values into a matrix of the following form:


cov(x,x) cov(x,y) cov(x,z)
C= cov(y,x) cov(y,y) cov(y,z)
cov(z,x) cov(z,y) cov(z,z)

C =   2.09   1.45  −0.39
      1.45   2.25  −1.15
     −0.39  −1.15   7.09
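The result can be checked in one line with NumPy (bias=True divides by n, matching the values above, rather than the n − 1 of the sample covariance):

import numpy as np

X = np.array([[7, 4, 3], [4, 1, 8], [6, 3, 5], [8, 6, 1], [8, 5, 7],
              [7, 2, 9], [8, 2, 2], [7, 4, 5], [9, 5, 8], [5, 3, 3]])
C = np.cov(X, rowvar=False, bias=True)   # rowvar=False: variables in columns
print(np.round(C, 2))
# [[ 2.09  1.45 -0.39]
#  [ 1.45  2.25 -1.15]
#  [-0.39 -1.15  7.09]]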


5. Using the discriminant.xls program

μ σ π λ
4 2 0.50000 1
10 1 0.50000 1

a = 3.0      b² − 4ac = 642.5
b = −72.0    x+ = 16.225
c = 378.5    x− = 7.775

The optimal decision point is 7.775.
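The spreadsheet result can be reproduced directly: equating the two weighted class-conditional Gaussians and taking logs gives a quadratic in x. A sketch (multiplying the coefficients through by 4 gives the a = 3, b = −72, c = 378.5 shown above):

import numpy as np

mu1, s1, p1 = 4.0, 2.0, 0.5
mu2, s2, p2 = 10.0, 1.0, 0.5

a = 1 / s2**2 - 1 / s1**2
b = -2 * (mu2 / s2**2 - mu1 / s1**2)
c = mu2**2 / s2**2 - mu1**2 / s1**2 + 2 * np.log((p1 * s2) / (p2 * s1))
print(sorted(np.roots([a, b, c])))   # [7.775, 16.225]; 7.775 lies between the means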

6. When a classifier is designed, a training set of images is used. If the
classes to which these images belong are known, we refer to the process as
supervised learning. If the classes to which the images belong are unknown,
then the process is referred to as unsupervised learning; in unsupervised
learning the data are plotted to see whether they cluster naturally.
7.
(a)
-7.427 2.328

1-NN: Nearest neighbor is vector 2 in class 3


3-NN: Two of the three nearest neighbors are vectors 2 and 1 in class 3

(b)
-4.797 -1.408

1-NN: Nearest neighbor is vector 1 in class 2


3-NN: Two of the three nearest neighbors are vectors 1 and 3 in class 2

(c)
1.079 -1.754

1-NN: Nearest neighbor is vector 5 in class 1


3-NN: Two of the three nearest neighbors are in class 2

(d)
4.821 2.435

1-NN: Nearest neighbor is vector 4 in class 1


3-NN: Two of the three nearest neighbors are vectors 4 and 1 in class 1

(e)
2.545 0.065

1-NN: Nearest neighbor is vector 5 in class 1


3-NN: Two of the three nearest neighbors are vectors 5 and 3 in class 1
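The class tables for this exercise are not reproduced here, so the following is a generic sketch of the classifier being applied (train_X and train_y are hypothetical placeholders for the exercise's labelled feature vectors):

import numpy as np

def knn_classify(x, train_X, train_y, k=3):
    # Majority class among the k nearest training vectors (Euclidean).
    d = np.linalg.norm(train_X - np.asarray(x), axis=1)
    nearest = train_y[np.argsort(d)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]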

8. The features used to extract the letter E were Euler number (lack of holes)
and eccentricity. You could also skeletonize it and look for three end-points.

9. In heart disease the arteries can be segmented and skeletonized, and the
shapes categorized and described with a grammar that could then be analyzed.
