
21AI601 – COMPUTER VISION

UNIT II & LP 9 – IMAGE PYRAMIDS AND GAUSSIAN DERIVATIVE FILTERS, GABOR FILTERS AND DWT

1. IMAGE PYRAMIDS
Image information occurs over many different spatial scales. Image pyramids, multi-resolution representations of images, are a useful data structure for analyzing and manipulating images over a range of spatial scales. Here we discuss three of them, in a progression of complexity. The first is the Gaussian pyramid, which creates versions of the input image at multiple resolutions. This is useful for analysis across different spatial scales, but it does not separate the image into different frequency bands. The Laplacian pyramid provides that extra level of analysis, breaking the image into different isotropic spatial frequency bands. The steerable pyramid provides a clean separation of the image into different scales and orientations. There are various other differences between these pyramids, which we describe below.

As a motivating example, suppose we want to detect the birds in the example image using the normalized correlation approach. If we have a template of a bird, normalized correlation will only detect the birds whose size in the image is similar to that of the template. To introduce scale invariance, one possible solution is to build a set of templates covering a wide range of possible sizes and apply each of them to the image. The ensemble of templates will then be able to detect birds of different sizes. The disadvantage of this approach is that it is computationally expensive: detecting large birds requires convolutions with large kernels, which are slow.
Another alternative is to change the image size, resulting in a multiscale image pyramid. In this example, the original image has a resolution of 848 × 643 pixels. Each image in the pyramid is obtained by scaling down the image from the previous level, reducing its size by 25%. This operation is called downsampling and we will study it in detail later. We can now use the pyramid to detect birds of different sizes using a single template. The red box in the figure denotes the size of the template used. The figure shows how birds of different sizes become detectable at, at least, one of the levels of the pyramid. This method is more efficient because the template can be kept small and the convolutions remain computationally cheap.

Multiscale Image Pyramid

Each image is 25% smaller than the previous one. The red box indicates the size of a template used for detecting flying birds. As the size of the template is fixed, it will only be able to detect the birds that tightly fit inside the box. Birds that are smaller or larger will not be detected within a single scale. By running the same template across many levels of the pyramid, different bird instances are detected at different scales.
Multiscale image processing and image pyramids have many applications beyond scale-invariant object detection.
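The following is a minimal sketch, in Python with numpy/scipy, of the multiscale detection idea described above: build a pyramid of progressively smaller images and run the same normalized-correlation template at every level. The 75% per-level scale factor, the number of levels, and the names gray_image and bird_template are illustrative assumptions, not values prescribed by these notes.

```python
import numpy as np
from scipy import ndimage

def build_multiscale_pyramid(image, scale=0.75, num_levels=6):
    """Progressively smaller copies of `image` (each level ~25% smaller)."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(num_levels - 1):
        pyramid.append(ndimage.zoom(pyramid[-1], scale, order=1))
    return pyramid

def normalized_correlation(image, template):
    """Map of normalized correlation between image windows and the template."""
    image = np.asarray(image, dtype=float)
    t = template - template.mean()
    t = t / (np.linalg.norm(t) + 1e-8)            # zero-mean, unit-norm template
    n = template.size
    window = np.ones_like(template, dtype=float)
    local_mean = ndimage.correlate(image, window) / n
    local_sqsum = ndimage.correlate(image ** 2, window)
    local_norm = np.sqrt(np.maximum(local_sqsum - n * local_mean ** 2, 1e-12))
    return ndimage.correlate(image, t) / local_norm

# Example: the same small template detects larger birds at the coarser levels.
# scores = [normalized_correlation(level, bird_template).max()
#           for level in build_multiscale_pyramid(gray_image)]
```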
1.1 Linear image transforms
Let's first look at some general properties of linear image transforms. For an input image x with N pixels, a linear transform is:

r = PT x

where r is a vector of dimensionality M, and P is a matrix of size N × M. The columns of P = [P0, P1, ..., PM-1] are the projection vectors. The vector r contains the transform coefficients: ri = PiT x. The vector r is a different representation of the image x than the original pixel space. The transform P is said to be critically sampled when M = N, oversampled when M > N, and undersampled when M < N. We are interested in transforms that are invertible, so that we can recover the input x from the projection coefficients r:

x = B r = Σi ri Bi

The columns of B = [B0, B1, ..., BM-1] are the basis vectors. The input signal x is reconstructed as a linear combination of the basis vectors Bi weighted by the representation coefficients ri. The transform P is complete, encoding all image structure, if it is invertible. If the transform is critically sampled (i.e., M = N) and complete, then B = (PT)-1. If it is overcomplete (oversampled and complete), then the inverse can be obtained using the pseudoinverse B = (P PT)-1 P.
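A small numpy sketch of this notation may help: r = PT x, reconstruction x = B r, with B = (PT)-1 in the critically sampled case and B = (P PT)-1 P in the overcomplete case. The matrix sizes and random projection vectors are toy choices for illustration only.

```python
import numpy as np

N, M = 8, 8                        # critically sampled: M = N
rng = np.random.default_rng(0)
x = rng.standard_normal(N)         # input "image" with N pixels
P = rng.standard_normal((N, M))    # columns P_i are the projection vectors

r = P.T @ x                        # transform coefficients r_i = P_i^T x
B = np.linalg.inv(P.T)             # basis vectors (columns of B)
assert np.allclose(B @ r, x)       # x recovered as sum_i r_i B_i

# Overcomplete case (M > N): use the pseudoinverse for the basis.
M = 12
P = rng.standard_normal((N, M))
r = P.T @ x
B = np.linalg.inv(P @ P.T) @ P
assert np.allclose(B @ r, x)
```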

1.2 Gaussian Pyramid


A Gaussian filter is a natural one to use to blur an image, since multiple applications of a Gaussian filter are equivalent to a single application of a wider Gaussian filter. Here is an elegant, efficient algorithm for making a reduced-resolution version of an input image. It involves two steps: convolving the image with a low-pass filter (for example, the 4-th binomial filter b4 = [1, 4, 6, 4, 1]/16, normalized to sum to 1, applied separably in each dimension), and then subsampling the result by a factor of 2. Each level is obtained by filtering the previous level with the 4-th binomial filter with a stride of 2 (in each dimension). Applied recursively, this algorithm generates a sequence of images, each one a smaller, lower-resolution version of the previous one.
To make the filters more intuitive, it is useful to write the two steps in matrix form. The recursive construction of level k + 1 of the Gaussian pyramid for a 1D image is:

gk+1 = Dk Bk gk = Gk gk

where Dk is the downsampling operator, Bk is convolution with the 4-th binomial filter, and Gk = Dk Bk is the blur-and-downsample operator for level k. We call the sequence of images g0, g1, ..., gN the Gaussian pyramid. The first level of the Gaussian pyramid is the input image: g0 = x.

It is useful to work through a concrete example. If x is a 1D signal of length 8, and if we assume zero boundary conditions, the matrices for computing g1 = D0 B0 x are an 8 × 8 convolution matrix B0 and a 4 × 8 downsampling matrix D0 (sketched below).
The first level of the Gaussian pyramid is a signal g1 of length 4. Applying the recursion, we can write the output of each level as a function of the input x: g2 = G1 G0 x, g3 = G2 G1 G0 x, and so on. For 2D images the operations are analogous; the figure shows the Gaussian pyramid of an example image.
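A minimal 2D version of the same recursion, assuming the separable [1, 4, 6, 4, 1]/16 binomial filter, zero boundary conditions, and stride-2 subsampling (the function names are illustrative):

```python
import numpy as np
from scipy import ndimage

b4 = np.array([1, 4, 6, 4, 1]) / 16.0

def blur_and_downsample(img):
    """One Gaussian-pyramid step: separable binomial blur, then stride 2."""
    out = ndimage.convolve1d(img, b4, axis=0, mode='constant')
    out = ndimage.convolve1d(out, b4, axis=1, mode='constant')
    return out[::2, ::2]

def gaussian_pyramid(img, num_levels):
    """Return [g0, g1, ..., g_{num_levels-1}] with g0 = img."""
    levels = [np.asarray(img, dtype=float)]
    for _ in range(num_levels - 1):
        levels.append(blur_and_downsample(levels[-1]))
    return levels
```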

1.3 Laplacian pyramid


In the Gaussian pyramid, each level loses some of the fine image details available in the previous level. The Laplacian pyramid is simple: at each level, it represents what is present in the Gaussian pyramid image at that level but not at the next, lower-resolution level. We compute this by expanding the lower-resolution Gaussian pyramid image to the same pixel resolution as the neighboring higher-resolution Gaussian pyramid image, and then subtracting the two. This calculation is made in a recursive, telescoping fashion.
Let’s look at the steps for calculating a Laplacian pyramid. What we want is to compute the difference between gk and gk+1. To do this we first need to upsample the image gk+1 so that it has the same size as gk. Let Fk = Bk Uk be the upsample-and-blur operator for pyramid level k. The operator Fk first applies the upsampling operator Uk, which inserts zeros between samples, followed by blurring with the same filter Bk that we used for the Gaussian pyramid. The Laplacian pyramid coefficients lk at pyramid level k are:

lk = gk - 2 Fk gk+1

For instance, for a 1D input x of length 8, and assuming zero boundary conditions, the operators for computing the first level of the Laplacian pyramid are the 8 × 4 zero-insertion upsampler U0 and the upsample-and-blur operator F0 = B0 U0, so that l0 = x - 2 F0 g1 (a sketch follows below). The factor 2 is necessary because inserting zeros decreases the average value of the signal gk+1 by a factor of 2.
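The following sketch reconstructs these operators for the length-8 example, using the same B0 and D0 as in the earlier Gaussian-pyramid sketch and the factor-of-2 convention described above:

```python
import numpy as np

b4 = np.array([1, 4, 6, 4, 1]) / 16.0
# B0: zero-boundary convolution with b4 (same matrix as in the earlier sketch).
B0 = np.array([[b4[j - i + 2] if abs(j - i) <= 2 else 0.0
                for j in range(8)] for i in range(8)])
D0 = np.eye(8)[::2]                        # keep samples 0, 2, 4, 6
x = np.arange(8, dtype=float)              # length-8 test signal (g0 = x)
g1 = D0 @ B0 @ x                           # Gaussian-pyramid level 1

U0 = np.zeros((8, 4))
U0[2 * np.arange(4), np.arange(4)] = 1.0   # zero-insertion upsampler (4 -> 8)
F0 = B0 @ U0                               # upsample-and-blur operator

l0 = x - 2.0 * (F0 @ g1)                   # first Laplacian-pyramid level
assert np.allclose(l0 + 2.0 * (F0 @ g1), x)   # g0 is recovered exactly
```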
The Laplacian pyramid is an overcomplete representation (more coefficients than
pixels): the dimensionality of the representation is higher than the dimensionality of the
input.
Note that the reconstruction property of the Laplacian pyramid does not depend on the filters
used for subsampling and upsampling. Even if we used random filters the reconstruction
property would still hold.
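A compact 2D sketch of building a Laplacian pyramid and reconstructing the input, using the same binomial filter as before; the factor of 4 in the expand step is the 2D counterpart of the factor 2 above (2 per dimension), and the function names are illustrative:

```python
import numpy as np
from scipy import ndimage

b4 = np.array([1, 4, 6, 4, 1]) / 16.0

def pyr_reduce(img):
    """Blur with the separable binomial filter and keep every other pixel."""
    out = ndimage.convolve1d(img, b4, axis=0, mode='constant')
    out = ndimage.convolve1d(out, b4, axis=1, mode='constant')
    return out[::2, ::2]

def pyr_expand(img, shape):
    """Insert zeros, blur, and rescale (factor 2 per dimension, so 4 in 2D)."""
    up = np.zeros(shape)
    up[::2, ::2] = img
    up = ndimage.convolve1d(up, b4, axis=0, mode='constant')
    up = ndimage.convolve1d(up, b4, axis=1, mode='constant')
    return 4.0 * up

def laplacian_pyramid(img, num_levels):
    """num_levels - 1 band-pass levels plus the final low-pass residual."""
    gauss = [np.asarray(img, dtype=float)]
    for _ in range(num_levels - 1):
        gauss.append(pyr_reduce(gauss[-1]))
    bands = [g - pyr_expand(gn, g.shape) for g, gn in zip(gauss[:-1], gauss[1:])]
    return bands + [gauss[-1]]

def reconstruct(levels):
    """Exact inverse of laplacian_pyramid, whatever filters were used."""
    img = levels[-1]
    for band in reversed(levels[:-1]):
        img = band + pyr_expand(img, band.shape)
    return img
```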

1.3.1 Image blending


The Laplacian pyramid is used in many image processing and analysis applications. Here we show one fun application: image blending. The goal is to combine two images into one, with a mask defining how the images are combined. Suppose we want to blend the following two images (an apple and an orange) using the mask shown on the right:

Making a sharp transition from one image to another gives an artifactually sharp image boundary (see the straight edge of the apple/orange blend). Using the Laplacian pyramid, we can transition from one image to the other over many different spatial scales to create a gradual blend between the two images. First, we build the Laplacian pyramid for the two input images; in this example we use 7 levels and we also keep the last low-pass residual:

and the Gaussian pyramid of the mask as shown below (note that we use 8
levels, one level more than for the Laplacian pyramid):

m0, m1, m2, m3, m4, m5, m6, m7 (the eight levels of the mask's Gaussian pyramid)

Now we combine the three pyramids to compute the Laplacian pyramid of the blended image. At each level k, the blended coefficients are obtained by mixing the two Laplacian pyramids with the corresponding level of the mask's Gaussian pyramid:

lk(blended) = mk lk(image 1) + (1 - mk) lk(image 2)

where the products are element-wise. The blended image is then reconstructed from this combined Laplacian pyramid.
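A short sketch of this blending step, assuming the laplacian_pyramid, gaussian_pyramid, and reconstruct helpers sketched earlier, and two same-sized grayscale images plus a mask with values in [0, 1] (all names are illustrative):

```python
def blend_pyramids(lap_a, lap_b, mask_gauss):
    """Mix two Laplacian pyramids level by level using the mask's Gaussian pyramid."""
    return [m * a + (1.0 - m) * b
            for a, b, m in zip(lap_a, lap_b, mask_gauss)]

# Example usage with the helpers sketched earlier:
# lap_apple  = laplacian_pyramid(apple, 8)    # 7 band levels + low-pass residual
# lap_orange = laplacian_pyramid(orange, 8)
# mask_gauss = gaussian_pyramid(mask, 8)      # m0 ... m7
# blended = reconstruct(blend_pyramids(lap_apple, lap_orange, mask_gauss))
```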

1.4 Steerable pyramid


The Laplacian pyramid provides a richer representation than the Gaussian pyramid, but we would like an even more expressive image representation. The steerable pyramid adds information about image orientation: the steerable representation is a multi-scale, oriented representation that is translation-invariant, non-aliased, and self-invertible. Ideally, we would like an image transformation that is shiftable, meaning we could perform interpolations in position, scale, and orientation using linear combinations of a set of basis coefficients. The steerable pyramid goes part way there.
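To make the steerability idea concrete, here is a small sketch using first-derivative-of-Gaussian filters, a classic steerable basis. This illustrates only the steering property, not the particular bandpass filters used inside the steerable pyramid; the kernel size and sigma are assumed values.

```python
import numpy as np

def gaussian_derivative_basis(size=15, sigma=2.0):
    """Two basis filters: the x- and y-derivatives of a 2D Gaussian."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    gx = -xx / sigma ** 2 * g
    gy = -yy / sigma ** 2 * g
    return gx, gy

gx, gy = gaussian_derivative_basis()
theta = np.deg2rad(30.0)
# The filter oriented at angle theta is an exact linear combination of the two
# basis filters; by linearity, the filter responses steer the same way, so an
# image only needs to be convolved with the basis filters once.
g_theta = np.cos(theta) * gx + np.sin(theta) * gy
```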

We analyze in orientation using a steerable filter bank. We form a decomposition in scale by introducing a low-pass filter (designed to work with the selected bandpass filters), and recursively breaking the low-pass filtered component into angular and low-pass frequency components. Pyramid subsampling steps are preceded by sufficient low-pass filtering to remove aliasing.

To ensure that the image can be reconstructed from the steerable filter transform
coefficients, the filters must be designed so that their sums of squared magnitudes “tile”
in the frequency domain. We reconstruct by applying each filter a second time to the
steerable filter representation, and we want the final system frequency response to be flat,
for perfect reconstruction.
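The following numerical sketch illustrates the tiling condition with a toy radial low-pass/high-pass pair whose squared magnitudes sum to one everywhere in the frequency plane. The raised-cosine transition and cutoff values are assumptions, not the exact steerable-pyramid filters.

```python
import numpy as np

n = 128
fx = np.fft.fftfreq(n)                            # frequencies in cycles/sample
r = np.sqrt(fx[None, :] ** 2 + fx[:, None] ** 2)  # radial frequency

lo, hi = 0.125, 0.25                              # assumed transition band
t = np.clip((r - lo) / (hi - lo), 0.0, 1.0)
H = np.sin(0.5 * np.pi * t)                       # high-pass: 0 below lo, 1 above hi
L = np.cos(0.5 * np.pi * t)                       # low-pass: 1 below lo, 0 above hi

# The squared magnitudes tile: applying each filter once for analysis and once
# for synthesis gives a flat overall frequency response (perfect reconstruction).
assert np.allclose(H ** 2 + L ** 2, 1.0)
```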
One block of the Steerable pyramid computation

The steerable pyramid is a self-inverting, overcomplete representation (more coefficients than pixels).

The following block diagram shows the steps to build a 2-level steerable pyramid and the reconstruction of the input. The architecture has two parts: 1) the analysis network (or encoder), which transforms the input image x into a representation composed of r = [b0,0, ..., b0,n, b1,0, ..., b1,n, ..., bk−1,0, ..., bk−1,n] and the low-pass residual gk−1; and 2) the synthesis network (or decoder), which reconstructs the input from the representation r.
Steps to Build Steerable Pyramid
