Unit 4 DIVP
Some image processing methods have images as both inputs and outputs, whereas
segmentation belongs to the class of methods whose inputs are images but whose outputs
are attributes extracted from those images. Segmentation subdivides an image into its
constituent regions or objects. The level of detail to which the subdivision is carried
depends on the problem being solved. That is, segmentation should stop when the objects
or regions of interest in an application have been detected.
Point, Line, and Edge Detection
The three types of image features of interest here are isolated points, lines, and edges.
An isolated point may be viewed as a line whose length and width are equal to one pixel.
A line may be viewed as an edge segment in which the intensity of the background on
either side of the line is either much higher or much lower than the intensity of the line
pixels.
Similarly, edge pixels are pixels at which the intensity of an image function changes
abruptly, and edges are sets of connected edge pixels. Edge detectors are local image
processing methods designed to detect edge pixels.
Detection of Isolated Points
Point detection should be based on the second derivative. This implies using the
Laplacian:

∇²f(x, y) = ∂²f/∂x² + ∂²f/∂y²

where the partial derivatives are computed as follows.
In the x-direction, we have
∂²f(x, y)/∂x² = f(x+1, y) + f(x−1, y) − 2f(x, y)
In the y-direction, we have
∂²f(x, y)/∂y² = f(x, y+1) + f(x, y−1) − 2f(x, y)
The Laplacian is then
∇²f(x, y) = f(x+1, y) + f(x−1, y) + f(x, y+1) + f(x, y−1) − 4f(x, y)
Using the Laplacian mask in Fig. (a), we say that a point has been detected at the location on which
the mask is centered if the absolute value of the response of the mask at that point
exceeds a specified threshold. Such points are labeled 1 in the output image and all others
are labeled 0, thus producing a binary image. The output is obtained using the following
expression:
g(x, y) = 1 if |R(x, y)| ≥ T; 0 otherwise

where g is the output image, T is a nonnegative threshold, and R(x, y) is the response of
the Laplacian mask at (x, y).
Fig. (a) Laplacian mask
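As a rough illustration, the detector can be sketched in Python with NumPy and SciPy. The function name and the use of scipy.ndimage.correlate are our own choices, not from the text; T is the nonnegative threshold defined above.

```python
import numpy as np
from scipy.ndimage import correlate

def detect_points(f, T):
    """Binary point-detection image g per the expression above (a sketch)."""
    # Laplacian mask from the 4-neighbor formula derived above; the figure's
    # mask may instead be the 8-neighbor variant with -8 at the center.
    mask = np.array([[0,  1, 0],
                     [1, -4, 1],
                     [0,  1, 0]], dtype=float)
    R = correlate(f.astype(float), mask)   # response R(x, y) of the mask
    return (np.abs(R) >= T).astype(np.uint8)
```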
Line Detection
The Laplacian mask is isotropic: its response is independent of direction (with respect
to the four directions over which the mask is defined: vertical, horizontal, and the two diagonals).
Often, interest lies in detecting lines in specified directions. Consider the masks in Fig.
Suppose that an image with a constant background and containing various lines (oriented
at 0°, ± 45° and 90°) is filtered with the first mask.
The maximum responses would occur at image locations in which a horizontal line
passed through the middle row of the mask. This is easily verified by sketching a simple
array of 1s with a line of a different intensity (say, 5s) running horizontally through the
array. A similar experiment would reveal that the second mask in Fig. responds best to
lines oriented at +45°; the third mask to vertical lines; and the fourth mask to lines in the
−45° direction. The preferred direction of each mask is weighted with a larger coefficient
(i.e., 2) than other possible directions. The coefficients in each mask sum to zero,
indicating a zero response in areas of constant intensity.
Let R1, R2, R3 and R4 denote the responses of the masks in Fig. from left to right.
Suppose that an image is filtered (individually) with the four masks. If, at a given point in
the image, |Rk| > |Rj| for all j ≠ k, that point is said to be more likely associated with a line
in the direction of mask k. For example, if at a point in the image |R1| > |Rj| for j = 2, 3, 4,
that particular point is said to be more likely associated with a horizontal line.
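As a sketch (the names and the use of SciPy are our own choices, not from the text), the four masks and the per-pixel comparison of responses can be written as:

```python
import numpy as np
from scipy.ndimage import correlate

# The four 3x3 line-detection masks described above; which diagonal mask
# corresponds to +45 deg vs -45 deg depends on the axis convention used.
masks = [
    np.array([[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]]),  # horizontal
    np.array([[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]]),  # one diagonal
    np.array([[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]]),  # vertical
    np.array([[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]]),  # other diagonal
]

def likely_line_direction(f):
    """Per-pixel index k of the mask with the largest |R_k| (a sketch)."""
    R = np.stack([correlate(f.astype(float), m.astype(float)) for m in masks])
    return np.argmax(np.abs(R), axis=0)   # label 0..3 per pixel
```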
Edge Models
Edge detection is the approach used most frequently for segmenting images based on
abrupt (local) changes in intensity. Edge models are classified according to their intensity
profiles. A step edge involves a transition between two intensity levels occurring ideally
over the distance of 1 pixel. Figure (a) shows a section of a vertical step edge and a
horizontal intensity profile through the edge. Edges are more closely modeled as having
an intensity ramp profile, such as the edge in Fig. (b). The slope of the ramp is inversely
proportional to the degree of blurring in the edge. A third model of an edge is the so-
called roof edge, having the characteristics illustrated in Fig. (c). Roof edges are models
of lines through a region, with the base (width) of a roof edge being determined by the
thickness and sharpness of the line.
Figure (a) shows the image from which the segment was extracted. Figure (b) shows a
horizontal intensity profile. This figure shows also the first and second derivatives of the
intensity profile. Moving from left to right along the intensity profile, we note that the first
derivative is positive at the onset of the ramp and at points on the ramp, and it is zero in areas
of constant intensity. The second derivative is positive at the beginning of the ramp, negative
at the end of the ramp, zero at points on the ramp, and zero at points of constant intensity.
The signs of the derivatives just discussed would be reversed for an edge that transitions from
light to dark. The intersection between the zero intensity axis and a line extending between
the extrema of the second derivative marks a point called the zero crossing of the second
derivative. The magnitude of the first derivative can be used to detect the presence of an edge
at a point in an image. Similarly, the sign of the second derivative can be used to determine
whether an edge pixel lies on the dark or light side of an edge. We note two additional
properties of the second derivative around an edge: (1) it produces two values for every edge
in an image (an undesirable feature); and (2) its zero crossings can be used for locating the
centers of thick edges.
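A quick numerical check of this behavior, using simple forward differences on a synthetic dark-to-light ramp profile (the array values are illustrative only):

```python
import numpy as np

# A 1-D dark-to-light intensity profile: flat, ramp up, flat.
profile = np.array([0, 0, 0, 1, 2, 3, 4, 5, 5, 5], dtype=float)

d1 = np.diff(profile)        # first derivative: positive on the ramp, else 0
d2 = np.diff(profile, n=2)   # second derivative: + at ramp onset, - at its end
print(d1)  # [0. 0. 1. 1. 1. 1. 1. 0. 0.]
print(d2)  # [0. 1. 0. 0. 0. 0. -1. 0.]
# The zero crossing of d2, between its + and - extrema, marks the edge center.
```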
We conclude this section by noting that there are three fundamental steps performed in
edge detection:
1. Image smoothing for noise reduction. The need for this step is amply illustrated by the
results in the second and third columns of Fig.
2. Detection of edge points. As mentioned earlier, this is a local operation that extracts
from an image all points that are potential candidates to become edge points.
3. Edge localization. The objective of this step is to select from the candidate edge points
only the points that are true members of the set of points comprising an edge.
Basic Edge Detection
Detecting changes in intensity for the purpose of finding edges can be accomplished
using first- or second-order derivatives.
The image gradient and its properties
The tool of choice for finding edge strength and direction at location (x, y) of an image f
is the gradient, denoted ∇f and defined as the vector

∇f = [gx, gy]ᵀ = [∂f/∂x, ∂f/∂y]ᵀ

This vector has the important geometrical property that it points in the direction of the
greatest rate of change of f at location (x, y). The magnitude (length) of ∇f,

M(x, y) = √(gx² + gy²)

is the value of the rate of change in that direction, and the direction of the gradient
vector is given by the angle

α(x, y) = tan⁻¹(gy / gx)

measured with respect to the x-axis. As in the case of the gradient image, α(x, y) also is an
image of the same size as the original, created by the array division of image gy by
image gx. The direction of an edge at an arbitrary point (x, y) is orthogonal to the
direction, α(x, y), of the gradient vector at that point.
Gradient operators
Obtaining the gradient of an image requires computing the partial derivatives ∂f/∂x and
∂f/∂y at every pixel location in the image. We know that

gx = ∂f(x, y)/∂x = f(x+1, y) − f(x, y)   and
gy = ∂f(x, y)/∂y = f(x, y+1) − f(x, y)
These two equations can be implemented for all pertinent values of x and y by filtering
f(x, y) with the 1-D masks.
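For instance, with NumPy these forward differences can be taken directly (the placeholder image and the row/column convention for x and y are assumptions on our part):

```python
import numpy as np

f = np.random.rand(64, 64)  # placeholder grayscale image, x = rows, y = columns

gx = np.diff(f, axis=0)   # gx(x, y) = f(x+1, y) - f(x, y), shape (M-1, N)
gy = np.diff(f, axis=1)   # gy(x, y) = f(x, y+1) - f(x, y), shape (M, N-1)
```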
When diagonal edge direction is of interest, we need a 2-D mask. The Roberts cross-
gradient operators (Roberts [1965]) are one of the earliest attempts to use 2-D masks
with a diagonal preference.
The Roberts operators are based on implementing the diagonal differences
gx = ∂f/∂x = (z9 − z5)   and   gy = ∂f/∂y = (z8 − z6)
Masks of size 2x2 are simple conceptually, but they are not as useful for computing edge
direction as masks that are symmetric about the center point, the smallest of which are of
size 3x3. These masks take into account the nature of the data on opposite sides of the
center point and thus carry more information regarding the direction of an edge.
The simplest digital approximations to the partial derivatives using masks of size 3x3 are
given by
gx = ∂f/∂x = (z7 + z8 + z9) − (z1 + z2 + z3)   and
gy = ∂f/∂y = (z3 + z6 + z9) − (z1 + z4 + z7)
In these formulations, the difference between the third and first rows of the region
approximates the derivative in the x-direction, and the difference between the third and
first columns approximates the derivative in the y-direction. Intuitively, we would expect
these approximations to be more accurate than the approximations obtained using the
Roberts operators.
Equations can be implemented over an entire image by filtering with the two masks in
Figs. These masks are called the Prewitt operators. (Prewitt [1970]).
A slight variation of the preceding two equations uses a weight of 2 in the center
coefficient:
gx = ∂f/∂x = (z7 + 2z8 + z9) − (z1 + 2z2 + z3)   and
gy = ∂f/∂y = (z3 + 2z6 + z9) − (z1 + 2z4 + z7)
Figures show the masks used to implement Eqs. These masks are called the Sobel
operators. Note that the coefficients of all the masks in Fig. sum to zero, thus giving a
response of zero in areas of constant intensity, as expected of a derivative operator.
In addition, Eqs. give identical results for vertical and horizontal edges when the Sobel or
Prewitt masks are used. It is possible to modify the masks so that they have their strongest
responses along the diagonal directions. Figure shows the two additional Prewitt and
Sobel masks needed for detecting edges in the diagonal directions.
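A minimal sketch of the Sobel gradient, assuming x indexes rows and y indexes columns as in the equations above; the function name is ours:

```python
import numpy as np
from scipy.ndimage import correlate

# Sobel masks matching the equations above (x along rows, y along columns).
sobel_x = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)
sobel_y = sobel_x.T   # the Prewitt masks are the same with the 2s replaced by 1s

def gradient_sobel(f):
    """Edge strength M(x, y) and direction alpha(x, y) (a sketch)."""
    gx = correlate(f.astype(float), sobel_x)
    gy = correlate(f.astype(float), sobel_y)
    magnitude = np.hypot(gx, gy)   # M = sqrt(gx**2 + gy**2)
    angle = np.arctan2(gy, gx)     # alpha, measured w.r.t. the x-axis
    return magnitude, angle
```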
The Hough Transform
A straight line can be represented in normal form as x cos θ + y sin θ = ρ. Figure (a)
illustrates the geometrical interpretation of the parameters ρ and θ. A horizontal
line has θ = 0 with ρ being equal to the positive x-intercept. Similarly, a vertical line has θ =
90 with ρ being equal to the positive y-intercept, or θ = -90 with ρ being equal to the negative
y-intercept. Each sinusoidal curve in Figure (b) represents the family of lines that pass
through a particular point (xk,yk) in the xy-plane. The intersection point (ρ’,θ’) in Fig.(b)
corresponds to the line that passes through both (xi,yi) and (xj,yj) in Fig.(a).
The computational attractiveness of the Hough transform arises from subdividing the ρθ
parameter space into so-called accumulator cells, as Fig.(c) illustrates, where (ρmin,ρmax) and
(θmin,θmax) are the expected ranges of the parameter values: -90 ≤ θ ≤ 90 and –D ≤ ρ ≤ D
where D is the maximum distance between opposite corners in an image. The cell at
coordinates (i, j) with accumulator value A(i, j) corresponds to the square associated with
parameter-space coordinates (ρi,θj). Initially, these cells are set to zero. Then, for every non-
background point (xk,yk) in the xy-plane, we let θ equal each of the allowed subdivision
values on the θ-axis and solve for the corresponding ρ using the equation ρ = xk cos θ + yk sin θ.
The resulting ρ values are rounded off to the nearest allowed cell value along the ρ-axis,
and the corresponding accumulator cell is incremented.
An approach based on the Hough transform is as follows:
1. Obtain a binary edge image using any of the techniques discussed earlier.
2. Specify subdivisions in the ρθ-plane.
3. Examine the counts of the accumulator cells for high pixel concentrations.
4. Examine the relationship (principally for continuity) between pixels in a chosen cell.
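A minimal accumulator sketch following steps 2 and 3 above, assuming a binary edge image as input; cell sizes of 1 degree and 1 pixel, and the function name, are our own choices:

```python
import numpy as np

def hough_accumulate(edges):
    """Vote in the rho-theta accumulator for a binary edge image (a sketch)."""
    rows, cols = edges.shape
    D = int(np.ceil(np.hypot(rows - 1, cols - 1)))  # max corner-to-corner distance
    thetas = np.deg2rad(np.arange(-90, 91))         # -90 <= theta <= 90, 1 deg cells
    A = np.zeros((2 * D + 1, thetas.size), dtype=int)  # rho in [-D, D], 1 px cells
    ys, xs = np.nonzero(edges)                      # nonbackground points (xk, yk)
    for x, y in zip(xs, ys):
        # rho = x cos(theta) + y sin(theta), rounded to the nearest cell
        rho = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        A[rho + D, np.arange(thetas.size)] += 1
    return A   # peaks in A correspond to candidate lines (step 3)
```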
Thresholding
Regions were identified by first finding edge segments and then attempting to link the
segments into boundaries. Thresholding is a technique for partitioning images directly into
regions based on intensity values and/or properties of these values. In its basic form, a
point (x, y) is labeled an object point if f(x, y) > T, and a background point otherwise, for
some threshold T. When T is a constant applicable over an entire image, this process is
referred to as global thresholding. When the value of T changes over an image, we use
the term variable thresholding.
Figure (b) shows a more difficult thresholding problem involving a histogram with three
dominant modes corresponding, for example, to two types of light objects on a dark
background. Here, multiple thresholding classifies a point (x,y) as belonging to the
background if f(x,y) ≤ T1, to one object class if T1 < f(x,y) ≤ T2 and to the other object
class if f(x,y) > T2.
That is, the segmented image is given by

g(x, y) = 1 if f(x, y) > Txy; 0 otherwise

where f(x, y) is the input image and Txy is the threshold at (x, y). This equation is
evaluated for all pixel locations in the image, and a different threshold is computed at
each location (x, y) using the pixels in the neighborhood Sxy.
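Both global and multiple thresholding can be sketched in a few lines (the function names are ours; T, T1, T2 are the thresholds discussed above):

```python
import numpy as np

def global_threshold(f, T):
    """g(x, y) = 1 where f(x, y) > T, else 0 (global thresholding)."""
    return (f > T).astype(np.uint8)

def multiple_threshold(f, T1, T2):
    """Three classes: background and two object classes (dual thresholds)."""
    g = np.zeros_like(f, dtype=np.uint8)
    g[(f > T1) & (f <= T2)] = 1   # first object class
    g[f > T2] = 2                 # second object class
    return g                      # 0 marks the background (f <= T1)
```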
Region-Based Segmentation
Region Growing
As its name implies, region growing is a procedure that groups pixels or subregions into
larger regions based on predefined criteria for growth. The basic approach is to start with
a set of “seed” points and from these grow regions by appending to each seed those
neighboring pixels that have predefined properties similar to the seed (such as specific
ranges of intensity or color).
Selecting a set of one or more starting points often can be based on the nature of the
problem. When a priori information is not available, the procedure is to compute at every
pixel the same set of properties that ultimately will be used to assign pixels to regions
during the growing process. If the result of these computations shows clusters of values,
the pixels whose properties place them near the centroid of these clusters can be used as
seeds. The selection of similarity criteria depends not only on the problem under
consideration, but also on the type of image data available.
Descriptors alone can yield misleading results if connectivity properties are not used in
the region-growing process. For example, visualize a random arrangement of pixels with
only three distinct intensity values. Grouping pixels with the same intensity level to form
a “region” without paying attention to connectivity would yield a segmentation result that
is meaningless. Another problem in region growing is the formulation of a stopping rule.
Region growth should stop when no more pixels satisfy the criteria for inclusion in that
region. Criteria such as intensity values, texture, and color are local in nature and do not
take into account the “history” of region growth. Additional criteria that increase the
power of a region-growing algorithm utilize the concept of size, likeness between a
candidate pixel and the pixels grown so far (such as a comparison of the intensity of a
candidate and the average intensity of the grown region), and the shape of the region
being grown.
Let: f(x,y) denote an input image array; S(x,y) denote a seed array containing 1s at the
locations of seed points and 0s elsewhere; and Q denote a predicate to be applied at each
location (x,y). Arrays f and S are assumed to be of the same size.
A basic region-growing algorithm based on 8-connectivity may be stated as follows.
1. Find all connected components in S(x,y) and erode each connected component to one
pixel; label all such pixels found as 1. All other pixels in S are labeled 0.
2. Form an image fQ such that, at a pair of coordinates (x,y), let fQ(x,y) = 1 if the input
image satisfies the given predicate, Q, at those coordinates; otherwise, let fQ(x,y) = 0
3. Let g be an image formed by appending to each seed point in S all the 1-valued points
in fQ that are 8-connected to that seed point.
4. Label each connected component in g with a different region label (e.g., 1, 2, 3, …).
This is the segmented image obtained by region growing.
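A simplified sketch of the growth step, assuming the predicate Q compares a candidate's intensity with its seed's intensity within a tolerance max_diff (a parameter of our own choosing); seeds are given directly as coordinates rather than as an array S:

```python
import numpy as np
from collections import deque

def region_grow(f, seeds, max_diff):
    """Grow 8-connected regions from seed points (a sketch).

    Predicate Q: a pixel joins a region if its intensity differs from the
    seed's intensity by at most max_diff. `seeds` is a list of (x, y) tuples.
    """
    g = np.zeros(f.shape, dtype=int)
    for label, (sx, sy) in enumerate(seeds, start=1):
        queue = deque([(sx, sy)])
        g[sx, sy] = label
        while queue:
            x, y = queue.popleft()
            for dx in (-1, 0, 1):          # scan the 8-neighborhood
                for dy in (-1, 0, 1):
                    nx, ny = x + dx, y + dy
                    if (0 <= nx < f.shape[0] and 0 <= ny < f.shape[1]
                            and g[nx, ny] == 0
                            and abs(float(f[nx, ny]) - float(f[sx, sy])) <= max_diff):
                        g[nx, ny] = label
                        queue.append((nx, ny))
    return g  # 0 = unassigned; 1, 2, ... = region labels
```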
Properties of DCT
1. The DCT is a real transform. This property makes it attractive in comparison to
the Fourier transform.
2. The DCT has excellent energy compaction properties. For that reason it is widely
used in image compression standards (for example, the JPEG standard).
3. There are fast algorithms to compute the DCT, similar to the FFT for computing
the DFT.
WALSH TRANSFORM (WT)