Unit 4 DIVP

Image Segmentation

In some image processing methods, both the inputs and the outputs are images, whereas
segmentation belongs to the class of methods in which the inputs are images but the outputs
are attributes extracted from those images. Segmentation subdivides an image into its
constituent regions or objects. The level of detail to which the subdivision is carried
depends on the problem being solved. That is, segmentation should stop when the objects
or regions of interest in an application have been detected.
Point, Line, and Edge Detection
The three basic types of image features considered here are isolated points, lines, and edges.
An isolated point may be viewed as a line whose length and width are equal to one pixel.
A line may be viewed as an edge segment in which the intensity of the background on
either side of the line is either much higher or much lower than the intensity of the line
pixels.
Similarly, edge pixels are pixels at which the intensity of an image function changes
abruptly, and edges are sets of connected edge pixels. Edge detectors are local image
processing methods designed to detect edge pixels.
Detection of Isolated Points
We know from the earlier discussion that point detection should be based on the second
derivative. This implies using the Laplacian:

∇²f(x, y) = ∂²f/∂x² + ∂²f/∂y²

where the partial derivatives are obtained using the following discrete approximations.
In the x-direction, we have
∂²f(x, y)/∂x² = f(x+1, y) + f(x-1, y) - 2f(x, y)
In the y-direction, we have
∂²f(x, y)/∂y² = f(x, y+1) + f(x, y-1) - 2f(x, y)
The Laplacian is then
∇²f(x, y) = f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1) - 4f(x, y)
Using the Laplacian mask in Fig. (a), we say that a point has been detected at the location on which
the mask is centered if the absolute value of the response of the mask at that point
exceeds a specified threshold. Such points are labeled 1 in the output image and all others
are labeled 0, thus producing a binary image. The output is obtained using the following
expression:
g(x, y) = 1 if |R(x, y)| ≥ T; 0 otherwise
where g is the output image, T is a nonnegative threshold, and R is the response of the Laplacian mask at (x, y).
Fig Laplacian mask
Line Detection
The Laplacian mask is isotropic, so its response is independent of direction (with respect
to the four directions of the Laplacian mask: vertical, horizontal, and two diagonals).
Often, interest lies in detecting lines in specified directions. Consider the masks in Fig.
Suppose that an image with a constant background and containing various lines (oriented
at 0°, ± 45° and 90°) is filtered with the first mask.
The maximum responses would occur at image locations in which a horizontal line
passed through the middle row of the mask. This is easily verified by sketching a simple
array of 1s with a line of a different intensity (say, 5s) running horizontally through the
array. A similar experiment would reveal that the second mask in Fig. responds best to
lines oriented at ± 45° ; the third mask to vertical lines; and the fourth mask to lines in the
- 45° direction. The preferred direction of each mask is weighted with a larger coefficient
(i.e., 2) than other possible directions. The coefficients in each mask sum to zero,
indicating a zero response in areas of constant intensity.
Let R1, R2, R3 and R4 denote the responses of the masks in Fig. from left to right.
Suppose that an image is filtered (individually) with the four masks. If, at a given point in
the image, |Rk| > |Rj| for all j ≠ k, that point is said to be more likely associated with a line
in the direction of mask k. For example, if at a point in the image, |R1| > |Rj| for j = 2, 3, 4,
that particular point is said to be more likely associated with a horizontal line.
Edge Models
Edge detection is the approach used most frequently for segmenting images based on
abrupt (local) changes in intensity. Edge models are classified according to their intensity
profiles. A step edge involves a transition between two intensity levels occurring ideally
over the distance of 1 pixel. Figure (a) shows a section of a vertical step edge and a
horizontal intensity profile through the edge. Edges are more closely modeled as having
an intensity ramp profile, such as the edge in Fig. (b). The slope of the ramp is inversely
proportional to the degree of blurring in the edge. A third model of an edge is the so-
called roof edge, having the characteristics illustrated in Fig. (c). Roof edges are models
of lines through a region, with the base (width) of a roof edge being determined by the
thickness and sharpness of the line.

Figure (a) shows the image from which the segment was extracted. Figure (b) shows a
horizontal intensity profile. This figure shows also the first and second derivatives of the
intensity profile. As moving from left to right along the intensity profile, we note that the first
derivative is positive at the onset of the ramp and at points on the ramp, and it is zero in areas
of constant intensity. The second derivative is positive at the beginning of the ramp, negative
at the end of the ramp, zero at points on the ramp, and zero at points of constant intensity.
The signs of the derivatives just discussed would be reversed for an edge that transitions from
light to dark. The intersection between the zero intensity axis and a line extending between
the extrema of the second derivative marks a point called the zero crossing of the second
derivative. The magnitude of the first derivative can be used to detect the presence of an edge
at a point in an image. Similarly, the sign of the second derivative can be used to determine
whether an edge pixel lies on the dark or light side of an edge. We note two additional
properties of the second derivative around an edge: (1) it produces two values for every edge
in an image (an undesirable feature); and (2) its zero crossings can be used for locating the
centers of thick edges.

We conclude this section by noting that there are three fundamental steps performed in
edge detection:
1. Image smoothing for noise reduction. The need for this step is amply illustrated by the
results in the second and third columns of Fig.
2. Detection of edge points. As mentioned earlier, this is a local operation that extracts
from an image all points that are potential candidates to become edge points.
3. Edge localization. The objective of this step is to select from the candidate edge points
only the points that are true members of the set of points comprising an edge.
Basic Edge Detection
Detecting changes in intensity for the purpose of finding edges can be accomplished
using first- or second-order derivatives.
The image gradient and its properties
The tool of choice for finding edge strength and direction at location (x, y) of an image f,
is the gradient, denoted by ∇f and defined as the vector

∇f ≡ grad(f) = [gx, gy]ᵀ = [∂f/∂x, ∂f/∂y]ᵀ

The magnitude (length) of vector ∇f, denoted M(x, y), where

M(x, y) = mag(∇f) = √(gx² + gy²)

is the value of the rate of change in the direction of the gradient vector.
Note that gx, gy and M(x, y) are images of the same size as the original, created when x
and y are allowed to vary over all pixel locations in f.
The direction of the gradient vector is given by the angle

α(x, y) = tan⁻¹(gy / gx)

measured with respect to the x-axis. As in the case of the gradient image, α(x, y) is also an
image of the same size as the original, created by the array division of image gy by
image gx. The direction of an edge at an arbitrary point (x, y) is orthogonal to the
direction, α(x, y), of the gradient vector at the point. This vector has the important
geometrical property that it points in the direction of the greatest rate of change of f at
location (x, y).
Gradient operators
Obtaining the gradient of an image requires computing the partial derivatives ∂f/∂x and
∂f/∂y at every pixel location in the image. We know that

gx = ∂f(x, y)/∂x = f(x+1, y) - f(x, y)   and
gy = ∂f(x, y)/∂y = f(x, y+1) - f(x, y)
These two equations can be implemented for all pertinent values of x and y by filtering
f(x, y) with the 1-D masks.

When diagonal edge direction is of interest, we need a 2-D mask. The Roberts cross-
gradient operators (Roberts [1965]) are one of the earliest attempts to use 2-D masks
with a diagonal preference.
The Roberts operators are based on implementing the diagonal differences
gx = ∂f/∂x = (z9 - z5)   and   gy = ∂f/∂y = (z8 - z6)
Masks of size 2x2 are simple conceptually, but they are not as useful for computing edge
direction as masks that are symmetric about the center point, the smallest of which are of
size 3x3. These masks take into account the nature of the data on opposite sides of the
center point and thus carry more information regarding the direction of an edge.
The simplest digital approximations to the partial derivatives using masks of size 3x3 are
given by
gx = ∂f/∂x = (z7 + z8 + z9) - (z1 + z2 + z3)   and
gy = ∂f/∂y = (z3 + z6 + z9) - (z1 + z4 + z7)
In these formulations, the difference between the third and first rows of the region
approximates the derivative in the x-direction, and the difference between the third and
first columns approximates the derivative in the y-direction. Intuitively, we would expect
these approximations to be more accurate than the approximations obtained using the
Roberts operators.
Equations can be implemented over an entire image by filtering with the two masks in
Figs. These masks are called the Prewitt operators. (Prewitt [1970]).

A slight variation of the preceding two equations uses a weight of 2 in the center
coefficient:
gx = ∂f/∂x = (z7 + 2z8 + z9) - (z1 + 2z2 + z3)   and
gy = ∂f/∂y = (z3 + 2z6 + z9) - (z1 + 2z4 + z7)
Figures show the masks used to implement Eqs. These masks are called the Sobel
operators. Note that the coefficients of all the masks in Fig. sum to zero, thus giving a
response of zero in areas of constant intensity, as expected of a derivative operator.
In addition, Eqs. give identical results for vertical and horizontal edges when the Sobel or
Prewitt masks are used. It is possible to modify the masks so that they have their strongest
responses along the diagonal directions. Figure shows the two additional Prewitt and
Sobel masks needed for detecting edges in the diagonal directions.

Edge Linking and Boundary Detection


Ideally, edge detection should yield sets of pixels lying only on edges. In practice, these
pixels seldom characterize edges completely because of noise, breaks in the edges due to
non uniform illumination, and other effects that introduce spurious discontinuities in
intensity values.
Therefore, edge detection typically is followed by linking algorithms designed to
assemble edge pixels into meaningful edges and/or region boundaries. Three techniques
for doing this are:
The first requires knowledge about edge points in a local region (e.g., a 3x3
neighborhood);
the second requires that points on the boundary of a region be known; and
the third is a global approach that works with an entire edge image.
Local processing
The two principal properties used for establishing similarity of edge pixels in this kind of
analysis are (1) the strength (magnitude) and (2) the direction of the gradient vector.
Let Sxy denote the set of coordinates of a neighborhood centered at point (x,y) in an
image. An edge pixel with coordinates (s,t) in Sxy is similar in magnitude to the pixel at
(x, y) if

|M(s, t) - M(x, y)| ≤ E

where E is a positive threshold.


An edge pixel with coordinates (s, t) in Sxy has an angle similar to the pixel at (x, y) if

|α(s, t) - α(x, y)| ≤ A

where A is a positive angle threshold.
A simplification particularly well suited for real time applications consists of the
following steps:
1. Compute the gradient magnitude and angle arrays, M(x,y) and ɑ(x,y) of the input image
f(x,y),
2. Form a binary image g, whose value at any pair of coordinates (x, y) is given by:

g(x, y) = 1 if M(x, y) > TM and α(x, y) ∈ [A - TA, A + TA]; 0 otherwise

where TM is a threshold, A is a specified angle direction, and TA defines a “band” of
acceptable directions about A.
3. Scan the rows of g and fill (set to 1) all gaps (sets of 0s) in each row that do not exceed
a specified length K. Note that, by definition, a gap is bounded at both ends by one or
more 1s. The rows are processed individually, with no memory between them.
4. To detect gaps in any other direction, θ, rotate g by this angle and apply the horizontal
scanning procedure in Step 3. Rotate the result back by –θ. When interest lies in
horizontal and vertical edge linking, Step 4 becomes a simple procedure in which g is
rotated ninety degrees, the rows are scanned, and the result is rotated back.
Regional processing
Often, the locations of regions of interest in an image are known or can be determined.
This implies that knowledge is available regarding the regional membership of pixels in
the corresponding edge image.
In such situations, we can use techniques for linking pixels on a regional basis, with the
desired result being an approximation to the boundary of the region. One approach to this
type of processing is functional approximation, where we fit a 2-D curve to the known
points.
Figure shows a set of points representing an open curve in which the end points have been
labeled A and B. These two points are by definition vertices of the polygon, and a straight
line is drawn between them. We then compute the perpendicular distance from all other
points on the curve to this line and select the point that yields the maximum distance. If this
distance exceeds a specified threshold, the corresponding point, labeled C, is declared a vertex.
An algorithm for finding a polygonal fit to open and closed curves may be stated as
follows:
1. Let P be a sequence of ordered, distinct, 1-valued points of a binary image. Specify two
starting points, A and B. These are the two starting vertices of the polygon.
2. Specify a threshold, T and two empty stacks, OPEN and CLOSED.
3. If the points in P correspond to a closed curve, put A into OPEN and put B into OPEN
and into CLOSED. If the points correspond to an open curve, put A into OPEN and B
into CLOSED.
4. Compute the parameters of the line passing from the last vertex in CLOSED to the last
vertex in OPEN.
5. Compute the distances from the line in Step 4 to all the points in P whose sequence
places them between the vertices from Step 4. Select the point Vmax, with the maximum
distance, Dmax(ties are resolved arbitrarily).
6. If Dmax > T, place Vmax at the end of the OPEN stack as a new vertex. Go to Step 4.
7. Else, remove the last vertex from OPEN and insert it as the last vertex of CLOSED.
8. If OPEN is not empty, go to Step 4.
9. Else, exit. The vertices in CLOSED are the vertices of the polygonal fit to the points in
P.
Global processing using the Hough transform
In regional processing, it makes sense to link a given set of pixels only if we know
that they are part of the boundary of a meaningful region. Often, we have to work with
unstructured environments in which all we have is an edge image and no knowledge about
where objects of interest might be. In such situations, all pixels are candidates for linking and
thus have to be accepted or eliminated based on predefined global properties. In this section,
we develop an approach based on whether sets of pixels lie on curves of a specified shape.
Once detected, these curves form the edges or region boundaries of interest.
Given n points in an image, suppose that we want to find subsets of these points that lie
on straight lines. One possible solution is to find first all lines determined by every pair of
points and then find all subsets of points that are close to particular lines. This is a
computationally prohibitive task in all but the most trivial applications.
Hough proposed an alternative approach, commonly referred to as the Hough transform.
Consider a point (xi,yi) in the xy-plane and the general equation of a straight line in slope-
intercept form, yi= axi + b. Infinitely many lines pass through (xi,yi) but they all satisfy the
equation yi= axi + b for varying values of a and b. However, writing this equation as b= -xia
+ yi and considering the ab-plane (also called parameter space) yields the equation of a
single line for a fixed pair (xi,yi) . Furthermore, a second point (xj,yj) also has a line in
parameter space associated with it, and, unless they are parallel, this line intersects the line
associated with (xi,yi) at some point (a’,b’)where a’ is the slope and b’ the intercept of the line
containing both (xi,yi) and (xj,yj) in the xy-plane. In fact, all the points on this line have lines
in parameter space that intersect at (a', b'). The following figure illustrates these concepts.
A practical difficulty with this approach, however, is that a (the slope of a line) approaches
infinity as the line approaches the vertical direction. One way around this difficulty is to use
the normal representation of a line:

x cos θ + y sin θ = ρ

Figure (a) illustrates the geometrical interpretation of the parameters ρ and θ. A horizontal
line has θ = 0 with ρ being equal to the positive x-intercept. Similarly, a vertical line has θ =
90 with ρ being equal to the positive y-intercept, or θ = -90 with ρ being equal to the negative
y-intercept. Each sinusoidal curve in Figure (b) represents the family of lines that pass
through a particular point (xk,yk) in the xy-plane.The intersection point (ρ’,θ’) in Fig.(b)
corresponds to the line that passes through both (xi,yi) and (xj,yj) in Fig.(a).
The computational attractiveness of the Hough transform arises from subdividing the ρθ
parameter space into so-called accumulator cells, as Fig.(c) illustrates, where (ρmin,ρmax) and
(θmin,θmax) are the expected ranges of the parameter values: -90 ≤ θ ≤ 90 and –D ≤ ρ ≤ D
where D is the maximum distance between opposite corners in an image. The cell at
coordinates (i, j) with accumulator value A(i, j) corresponds to the square associated with
parameter-space coordinates (ρi, θj). Initially, these cells are set to zero. Then, for every non-
background point (xk, yk) in the xy-plane, we let θ equal each of the allowed subdivision
values on the θ-axis, solve for the corresponding ρ using ρ = xk cos θ + yk sin θ, and
increment the accumulator cell closest to that (ρ, θ) pair.
An approach based on the Hough transform is as follows:
1. Obtain a binary edge image using any of the techniques discussed earlier.
2. Specify subdivisions in the ρθ-plane.
3. Examine the counts of the accumulator cells for high pixel concentrations.
4. Examine the relationship (principally for continuity) between pixels in a chosen cell.

Thresholding
Regions were identified earlier by first finding edge segments and then attempting to link the
segments into boundaries. Thresholding is a technique for partitioning images directly into
regions based on intensity values and/or properties of these values.

The basics of intensity thresholding


Suppose that the intensity histogram in Fig. (a) corresponds to an image f(x,y), composed
of light objects on a dark background, in such a way that object and background pixels
have intensity values grouped into two dominant modes. One obvious way to extract the
objects from the background is to select a threshold T, that separates these modes.
Then, any point (x, y) in the image at which f(x, y) > T is called an object point;
otherwise, the point is called a background point. In other words, the segmented image
g(x, y), is given by

g(x, y) = 1 if f(x, y) > T; 0 if f(x, y) ≤ T

When T is a constant applicable over an entire image, the process given in this equation is
referred to as global thresholding. When the value of T changes over an image, we use
the term variable thresholding.
Figure (b) shows a more difficult thresholding problem involving a histogram with three
dominant modes corresponding, for example, to two types of light objects on a dark
background. Here, multiple thresholding classifies a point (x,y) as belonging to the
background if f(x,y) ≤ T1, to one object class if T1 < f(x,y) ≤ T2 and to the other object
class if f(x,y) > T2.
That is, the segmented image is given by

g(x, y) = a if f(x, y) > T2; b if T1 < f(x, y) ≤ T2; c if f(x, y) ≤ T1

where a, b, and c are any three distinct intensity values.

Basic Global Thresholding


When the intensity distributions of objects and background pixels are sufficiently distinct,
it is possible to use a single (global) threshold applicable over the entire image.
The following iterative algorithm can be used for this purpose:
1. Select an initial estimate for the global threshold, T.
2. Segment the image using T in the thresholding equation above. This will produce two
groups of pixels: G1, consisting of all pixels with intensity values > T, and G2, consisting
of pixels with values ≤ T.
3. Compute the average (mean) intensity values m1 and m2 for the pixels in G1 and G2
respectively.
4. Compute a new threshold value:
T = ½ (m1+m2)
5. Repeat Steps 2 through 4 until the difference between values of T in successive
iterations is smaller than a predefined parameter ∆T.
This simple algorithm works well in situations where there is a reasonably clear valley
between the modes of the histogram related to objects and background. Parameter ∆T is
used to control the number of iterations in situations where speed is an important issue.
In general, the larger ∆T is, the fewer iterations the algorithm will perform. The initial
threshold must be chosen greater than the minimum and less than the maximum intensity
level in the image. The average intensity of the image is a good initial choice for T.
Variable Thresholding
Image partitioning
One of the simplest approaches to variable thresholding is to subdivide an image into
non-overlapping rectangles. This approach is used to compensate for non-uniformities in
illumination and/or reflectance. The rectangles are chosen small enough so that the
illumination of each is approximately uniform.
Variable thresholding based on local image properties
The basic approach to local thresholding uses the standard deviation and mean of the
pixels in a neighborhood of every point in an image. Let σxy and mxy denote the standard
deviation and mean value of the set of pixels contained in a neighborhood, Sxy centered
at coordinates (x,y) in an image.
The following are common forms of variable, local thresholds:

Txy = a σxy + b mxy

where a and b are nonnegative constants, and

Txy = a σxy + b mG

where mG is the global image mean. The segmented image is computed as

g(x, y) = 1 if f(x, y) > Txy; 0 if f(x, y) ≤ Txy

where f(x,y) is the input image. This equation is evaluated for all pixel locations in the
image, and a different threshold is computed at each location (x, y) using the pixels in the
neighborhood Sxy.

Region-Based Segmentation
Region Growing
As its name implies, region growing is a procedure that groups pixels or sub regions into
larger regions based on predefined criteria for growth. The basic approach is to start with
a set of “seed” points and from these grow regions by appending to each seed those
neighboring pixels that have predefined properties similar to the seed (such as specific
ranges of intensity or color).
Selecting a set of one or more starting points often can be based on the nature of the
problem. When a priori information is not available, the procedure is to compute at every
pixel the same set of properties that ultimately will be used to assign pixels to regions
during the growing process. If the result of these computations shows clusters of values,
the pixels whose properties place them near the centroid of these clusters can be used as
seeds. The selection of similarity criteria depends not only on the problem under
consideration, but also on the type of image data available.
Descriptors alone can yield misleading results if connectivity properties are not used in
the region-growing process. For example, visualize a random arrangement of pixels with
only three distinct intensity values. Grouping pixels with the same intensity level to form
a “region” without paying attention to connectivity would yield a segmentation result that
is meaningless. Another problem in region growing is the formulation of a stopping rule.
Region growth should stop when no more pixels satisfy the criteria for inclusion in that
region. Criteria such as intensity values, texture, and color are local in nature and do not
take into account the “history” of region growth. Additional criteria that increase the
power of a region-growing algorithm utilize the concept of size, likeness between a
candidate pixel and the pixels grown so far (such as a comparison of the intensity of a
candidate and the average intensity of the grown region), and the shape of the region
being grown.
Let: f(x,y) denote an input image array; S(x,y) denote a seed array containing 1s at the
locations of seed points and 0s elsewhere; and Q denote a predicate to be applied at each
location (x,y). Arrays f and S are assumed to be of the same size.
A basic region-growing algorithm based on 8-connectivity may be stated as follows.
1. Find all connected components in S(x,y) and erode each connected component to one
pixel; label all such pixels found as 1. All other pixels in S are labeled 0.
2. Form an image fQ such that, at a pair of coordinates (x,y), let fQ(x,y) = 1 if the input
image satisfies the given predicate, Q, at those coordinates; otherwise, let fQ(x,y) = 0
3. Let g be an image formed by appending to each seed point in S all the 1-valued points
in fQ that are 8-connected to that seed point.
4. Label each connected component in g with a different region label (e.g., 1, 2, 3, ...). This is
the segmented image obtained by region growing.

Region Splitting and Merging


An alternative is to subdivide an image initially into a set of arbitrary, disjoint regions and
then merge and/or split the regions in an attempt to satisfy the conditions of segmentation.
Let R represent the entire image region and select a predicate Q. One approach for
segmenting R is to subdivide it successively into smaller and smaller quadrant regions so
that, for any region Ri, Q(Ri) =True.
We start with the entire region. If Q(R)=FALSE, we divide the image into
quadrants. If Q is FALSE for any quadrant, we subdivide that quadrant into sub
quadrants, and so on. This particular splitting technique has a convenient representation in
the form of so-called quadtrees, that is, trees in which each node has exactly four
descendants, as Fig. shows (the images corresponding to the nodes of a quadtree
sometimes are called quadregions or quadimages). The root of the tree corresponds to the
entire image, and each node corresponds to the subdivision of a node into four
descendant nodes. In this case, only R4 was subdivided further.
If only splitting is used, the final partition normally contains adjacent regions with
identical properties. This drawback can be remedied by allowing merging as well as
splitting. Satisfying the constraints of segmentation outlined requires merging only
adjacent regions whose combined pixels satisfy the predicate Q. That is, two adjacent
regions Rj and Rk are merged only if Q(Rj U Rk)= TRUE. The preceding discussion can
be summarized by the following procedure in which, at any step, we
1. Split into four disjoint quadrants any region Ri for which Q(Ri) = FALSE.
2. When no further splitting is possible, merge any adjacent regions Rj and Rk for which
Q(Rj U Rk) = TRUE.
3. Stop when no further merging is possible.
It is customary to specify a minimum quad region size beyond which no further splitting is
carried out.
Numerous variations of the preceding basic theme are possible. For example, a significant
simplification results if in Step 2 we allow merging of any two adjacent regions Ri and Rj if
each one satisfies the predicate individually. This results in a much simpler (and faster)
algorithm, because testing of the predicate is limited to individual quad regions. As the
following example shows, this simplification is still capable of yielding good segmentation
results.
Image transform
Image transformation relies on using a set of basis functions on which the image is
projected to form the transformed image. Image transforms are necessary for image
analysis and image processing. Transformations are mathematical functions that allow us
to convert from one domain to another.
Therefore, transformation of a signal f(x) will convert the signal into a new signal, say,
g(y), in a different domain. However, transformation does not change the information
content present in the signal/image. Depending on the transform used, the transformed
image represents the image data in a more compact form, which helps in storage and
transmission of the image easily.
Some transforms also separate the noise from the image, making the information in the
image clearer. Transformations such as Fourier transformation, cosine transformation
provide us the information about the frequency content in an image. In the cosine
transform, the basis functions are a series of cosine functions and the resulting domain is
the frequency domain. In the case of the wavelet transform, the basis functions are
derived from a mother wavelet function by dilation and translation.
For our understanding, we first consider a one-dimensional discrete signal with N
samples and represent the signal as f(x), 0 ≤ x ≤ N - 1. Then, a transform of the signal f(x)
will convert the signal into a new signal g(y), such that if f(x) has N samples, then g(y)
also has N samples. The generic form of such transforms is:

g(y) = Σx T(y, x) f(x),   x = 0, 1, ..., N - 1

where T(y, x) is called the forward transformation kernel.


The Eqn. shows that to carry out the transformation, all values of f(x) are required to
compute each of the N values of g(y). Let f and g denote the corresponding column vectors.
Then, the transformation equation will be of the form
g = T f
where the matrix T is of dimension N x N and contains the values T(y, x) for different y, x.
The Eqn. can also be extended for transformation of 2-D signals.
THE DISCRETE COSINE TRANSFORM (DCT)
In the Discrete Cosine Transform, the coefficients carry information about the pixels of the
image. Most of this information is concentrated in a small number of coefficients, while the
remaining coefficients carry very little information and can be discarded with almost no
visible loss of quality. Doing so reduces the file size in the DCT domain, which is why the
DCT is used for lossy compression.

Properties of DCT
 The DCT is a real transform. This property makes it attractive in comparison to
the Fourier transform.
 The DCT has excellent energy compaction properties. For that reason it is widely
used in image compression standards (as for example JPEG standards).
 There are fast algorithms to compute the DCT, similar to the FFT for computing
the DFT.
WALSH TRANSFORM (WT)

Properties of Walsh Transform


 Unlike the Fourier transform, which is based on trigonometric terms, the Walsh
transform consists of a series expansion of basis functions whose values are only -
1 or 1 and they have the form of square waves. These functions can be
implemented more efficiently in a digital environment than the exponential basis
functions of the Fourier transform.
 The forward and inverse Walsh kernels are identical except for a constant
multiplicative factor of 1/N for 1-D signals.
 The forward and inverse Walsh kernels are identical for 2-D signals. This is
because the array formed by the kernels is a symmetric matrix having orthogonal
rows and columns, so its inverse array is the same as the array itself.
 The concept of frequency exists also in Walsh transform basis functions. We can
think of frequency as the number of zero crossings or the number of transitions in
a basis vector and we call this number sequency. The Walsh transform exhibits the
property of energy compaction.
 For the fast computation of the Walsh transform there exists an algorithm called
Fast Walsh Transform (FWT). This is a straightforward modification of the FFT.
KARHUNEN-LOEVE (KLT) or HOTELLING TRANSFORM

Drawbacks of the Karhunen-Loeve transform


 Its basis functions depend on the covariance matrix of the image, and hence they have
to be recomputed and transmitted for every image.
 Perfect decorrelation is not possible, since images can rarely be modelled as
realizations of ergodic fields.
 There are no fast computational algorithms for its implementation.
 Despite the better energy compaction offered by the KL transform, it requires a huge
amount of computation.
