
BİL 467/561 – Image Processing

TOBB ETU
Fall 2022
Lectures 1 and 2
Introduction to Digital Image Processing
and
Digital Image Processing Fundamentals

1
Course overview
• Lecturer: Dr. Toygar Akgün / Teaching assistant: ?
• Main textbooks:
• Digital Image Processing, Global Edition (4th Ed.), by Rafael C. Gonzalez,
Richard E. Woods
• Image Processing, Analysis, and Machine Vision (4th Ed.), by Milan Sonka,
Vaclav Hlavac, Roger Boyle
• Prerequisites
• Linear algebra
• Entry level signal processing
• Python or C/C++

2
Course materials and grading
• Your main resources should be the textbooks and the code samples.
• Main SW framework will be OpenCV.
• Programming exercises will require a computer.
• Grading:
• Midterm (30%)
• Final (40%)
• Quizzes (30%)
• Grading feedback. Suggestions? Preferences?

3
Very coarse outline
• Basics of image sensing, acquisition and sampling, basics of image transforms
• Histogram processing
• 2D spatial filtering, non-linear filtering
• Image restoration and reconstruction, denoising
• Morphological image processing
• Color, multi-spectral and hyper-spectral image processing
• Image segmentation, super-pixel methods, pattern classification with template
matching
• Advanced topics – if time permits – compression, wavelets, resolution synthesis
4
Motivation
• Humans are highly visual creatures.
• We were drawing long before we were writing – maybe even properly talking.
• We are extremely good at processing visual information.
• High level visual tasks such as:
• Object detection, classification, tracking
• Color classification
• Change detection
• Image analysis, segmentation and interpretation
are trivial for the human brain.

5
Motivation
• Images are highly efficient at conveying information:
One picture is worth more than ten thousand words.
• As such, a large portion of the accumulated and accumulating human knowledge
(historical, technical documents, art, entertainment, medical and surveillance
records) is visual in nature…

6
Images and pixels
• An image is a two-dimensional function, 𝑓(𝑥, 𝑦), where 𝑥 and 𝑦 are spatial
(plane) coordinates, and the amplitude of 𝑓 at any pair of coordinates (𝑥, 𝑦) is
called the intensity or gray level of the image at that point.
• When 𝑥, 𝑦, and the intensity values of 𝑓 are all finite, discrete quantities, we call
the image a digital image.
• The field of digital image processing refers to processing digital images by means
of a digital computer.
• Note that a digital image is composed of a finite number of elements, each of
which has a particular location and value.
• These elements are called picture elements, a.k.a. pixels.

7
Image processing and computer vision
• The terms image processing and computer vision are often used interchangeably.
• There are two commonly proposed definitions of image processing:
• Image processing is a discipline where both the input and output of the processing
are images.
• Image processing is a discipline where the output image of the processing is
intended for a visual inspection by a human observer/agent.
• There is one for computer vision:
• Computer vision’s ultimate goal is to use computers to emulate human vision,
including learning and being able to make inferences and take actions based on
visual inputs.

8
Levels of visual processing
• Low-level processes are characterized by the fact that both the inputs and
outputs are images.
• Noise reduction, contrast enhancement, image sharpening, …
• Mid-level processes are characterized by the fact that the inputs generally are
images, but the outputs are attributes extracted from those images (e.g., edges,
contours, and the identity of individual objects).
• Segmentation (partitioning an image into regions or objects), description of those objects to
reduce them to a form suitable for computer processing, and classification (recognition) of
individual objects.
• High-level processes are characterized by performing the cognitive functions
normally associated with human vision.
• Semantics - “making sense” of recognized objects, image captioning, anomaly detection, …

9
What is an “imaging modality”?
• An image is a signal sampled (captured) on a 2D regular spatial grid (pixel
locations).
• We have different imaging modalities depending on which characteristic of the
underlying signal is captured and what type of a sensor is used:
• Gamma-ray imaging (EM radiation captured for nuclear medicine and astronomical
observations)
• X-ray imaging (EM radiation captured for medical and industrial imaging)
• Ultra-violet band imaging (EM radiation captured for lithography, industrial inspection,
microscopy, lasers, biological imaging, and astronomical observations)
• Visible band imaging (EM radiation captured for commercial imaging, mobile phones, art,
entertainment)
• Infra-red band imaging (EM radiation captured for surveillance, night vision)
• Micro-wave band imaging (EM radiation captured for imaging radars)
• Radio-band imaging (EM radiation captured for Magnetic Resonance Imaging machines)
• Ultra-sound imaging (Mechanical waves captured for ultra-sound machines)

10
Components of an image processing system

• Specialized image processing hardware may be a GPU, FPGA or ASIC.

11
Human visual system
• The innermost membrane of the eye is the
retina, which lines the inside of the wall’s
entire posterior portion.
• When the eye is focused, light from an object
is imaged on the retina.
• Pattern vision is afforded by discrete light
receptors distributed over the surface of the
retina.
• There are two types of receptors: cones and
rods.

12
Human visual system
• There are between 6 and 7 million cones in
each eye.
• They are located primarily in the central
portion of the retina, called the fovea, and are
highly sensitive to color.
• Humans can resolve fine details because each
cone is connected to its own nerve end.
• Muscles rotate the eye until the image of a
region of interest falls on the fovea. Cone
vision is called photopic or bright-light vision.

13
Human visual system
• The number of rods is much larger: Some 75
to 150 million are distributed over the retina.
• The larger area of distribution, and the fact
that several rods are connected to a single
nerve ending, reduces the amount of detail
discernible by these receptors.
• Rods capture an overall image of the field of
view.

14
Human visual system
• Rods are not involved in color vision and are
sensitive to low levels of illumination.
• For example, objects that appear brightly
colored in daylight appear as colorless forms
in moonlight because only the rods are
stimulated.
• This phenomenon is known as scotopic or
dim-light vision.

15
Cone cell sensitivity for red, green and blue

• The human eye is most sensitive to green, because the sensitivity band of the
green cone receptors substantially overlaps with those of the blue- and
red-sensing cones.

16
Human visual system
• In an ordinary photographic camera, the lens has a fixed focal length.
• Focusing at various distances is achieved by varying the distance between the
lens and the imaging plane, where the film (or imaging chip in the case of a digital
camera) is located.

17
Human visual system
• In the human eye, the converse is true; the distance between the center of the
lens and the imaging sensor (the retina) is fixed, and the focal length needed to
achieve proper focus is obtained by varying the shape of the lens.
• The fibers in the ciliary body accomplish this by flattening or thickening the lens
for distant or near objects, respectively.

18
Light and electromagnetic spectrum

19
Visible band
• Light is a type of electromagnetic radiation that can be sensed by the eye.
• The visible (color) band of the electromagnetic spectrum spans the range from
approximately 0.43 µm (violet) to about 0.79 µm (red).
• For convenience, the color spectrum is divided into six broad regions: violet, blue,
green, yellow, orange, and red.
• No color (or other component of the electromagnetic spectrum) ends abruptly;
rather, each range blends smoothly into the next.

20
Color
• The colors perceived in an object are determined by the nature of the light
reflected by the object.
• A body that reflects light relatively balanced in all visible wavelengths appears
white to the observer.
• However, a body that favors reflectance in a limited range of the visible spectrum
exhibits some shades of color.
• For example, green objects reflect light with wavelengths primarily in the 500 to
570 nm range, while absorbing most of the energy at other wavelengths.

21
Grayscale
• Light that is void of color is called monochromatic (or achromatic) light.
• The only attribute of monochromatic light is its intensity.
• Because the intensity of monochromatic light is perceived to vary from black to
grays and finally to white, the term gray level is used commonly to denote
monochromatic intensity (we use the terms intensity and gray level
interchangeably in subsequent discussions).
• The range of values of monochromatic light from black to white is usually called
the gray scale, and monochromatic images are frequently referred to as
grayscale images.

22
Image acquisition using a single sensing element
• The figure below shows the components of a single sensing element.
• A familiar sensor of this type is the photodiode, which is constructed of silicon
materials and whose output is a voltage proportional to light intensity.

23
Image acquisition using a single sensing element
• Using a filter in front of a sensor improves its selectivity. For example, an optical
green-transmission filter favors light in the green band of the color spectrum.
• As a result, the sensor output would be stronger for green light than for other
visible light components.

24
Image acquisition using sensor arrays
• Most CMOS sensors cannot sample all color
planes at all pixels. So at any given pixel
location we only have one color component.
• The most typical sampling pattern is called the
Bayer pattern (bottom right).
• Note that the green channel is sampled twice as
often as the red and blue channels, because the
human eye is most sensitive to green.
• The image processing technique used to obtain
all color values at all pixels is called
demosaicking.

25
A simple image formation model

26
Image sampling and quantization

27
Representing digital images

28
Representing digital images
• It is important to note that the positive
x-axis extends downward, and the
positive y-axis extends to the right.
• This is precisely the right-handed
Cartesian coordinate system with which
you are familiar but shown rotated by
90° so that the origin appears at the
top left.
• The x direction is usually referred to as the row direction
• The y direction is usually referred to as the column direction

29
Bit depth or intensity resolution
• The number of bits used to represent the values at each pixel location is called
the bit depth of the image.
• When an image can have $2^k$ possible intensity levels, it is common practice to
refer to it as a “k-bit image”.
• Hence, a 256-level image is called an 8-bit image.
• Depending on the number of pixels and the bit depth of each pixel, an uncompressed
image can be megabytes in size.

30
Spatial resolution
• Intuitively, spatial resolution is a measure of
the smallest discernible detail in an image.
• Spatial resolution is not equivalent to, and cannot be measured solely by, the
number of pixels in an image.
• A precise definition of resolution requires frequency-domain concepts that we
will cover later.

31
Image interpolation
• Interpolation is used in tasks such as zooming, shrinking, rotating, and geometrically correcting digital
images.
• Image reduced to 72 dpi and zoomed back to its original 930 dpi using (a) nearest neighbor
interpolation, (b) bilinear interpolation and (c) bicubic interpolation.


32
Nearest neighbor
• Just copy the nearest pixel’s value.

33
Bilinear interpolation – 1D
• Interpolate by fitting a straight line to the two nearest neighbors.

34
Bilinear interpolation – 2D
• Just apply 1-D linear interpolation three times.
• First twice in x (or y) direction to get two
intermediate values
• Then a third time in y (or x if you picked y in the
previous step) direction using the intermediate
values obtained in the previous step.
• These ratios can be combined into single
multipliers for all four neighbors.
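• As a concrete illustration, here is a minimal NumPy sketch of 2-D bilinear interpolation at a
single non-integer location; the function name and the tiny test image are made up for the example.

```python
import numpy as np

def bilinear(img, x, y):
    """Bilinearly interpolate a 2-D array img at fractional row x, column y."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, img.shape[0] - 1), min(y0 + 1, img.shape[1] - 1)
    dx, dy = x - x0, y - y0
    # Interpolate twice along y, then once along x (the order is interchangeable).
    top = (1 - dy) * img[x0, y0] + dy * img[x0, y1]
    bot = (1 - dy) * img[x1, y0] + dy * img[x1, y1]
    return (1 - dx) * top + dx * bot

img = np.array([[10, 20], [30, 40]], dtype=float)
print(bilinear(img, 0.5, 0.5))   # 25.0, the average of the four neighbors
```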

35
Cubic interpolation – 1D
• Interpolate by fitting a third degree polynomial
$$f(x) = ax^3 + bx^2 + cx + d$$
$$f'(x) = 3ax^2 + 2bx + c$$
to the four nearest neighbors.

• The condition on the first derivative of the cubic polynomial ensures smoothness
at the left and right connection points.

36
Cubic interpolation – 2D
• Just apply 1-D cubic interpolation five times.
• First four times in x (or y) direction to get four intermediate values
• Then a fifth time in y (or x if you picked y in the first step) direction using the
intermediate values obtained in the previous step.

37
Neighbors of a pixel
• A pixel p at coordinates (x, y) has two horizontal
and two vertical neighbors with coordinates
(x + 1, y), (x − 1, y), (x, y + 1), (x, y − 1)

• This set of pixels, called the 4-neighbors of p, is denoted N4(p).

38
Neighbors of a pixel
• The four diagonal neighbors of p have coordinates
(x + 1, y + 1), (x + 1, y − 1), (x − 1, y + 1), (x − 1, y − 1)
and are denoted ND(p).
• These neighbors, together with the 4-neighbors,
are called the 8-neighbors of p, denoted by N8( p).
• The set of image locations of the neighbors of a
point p is called the neighborhood of p.
• The neighborhood is said to be closed if it contains
p. Otherwise, the neighborhood is said to be open.

39
Distance measures
• For pixels p, q, and s, with coordinates ( x, y), (u, v), and (w, z), respectively, D is a metric if
(a) D(p,q) ≥ 0 (D(p,q)=0 if and only if p=q),
(b) D(p,q) = D(q, p), and
(c) D(p,s) ≤ D(p,q) + D(q,s).

• The Euclidean distance between p and q is defined as $D_e(p, q) = \sqrt{(x - u)^2 + (y - v)^2}$

• The city-block distance between p and q is defined as $D_4(p, q) = |x - u| + |y - v|$
• The pixels with $D_4 = 1$ are the 4-neighbors of (x, y).

• The chess-board distance between p and q is defined as $D_8(p, q) = \max(|x - u|, |y - v|)$
• The pixels with $D_8 = 1$ are the 8-neighbors of the pixel at (x, y).
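• A minimal sketch of the three distance measures for two pixel coordinates (the function
names and the sample points are for illustration only):

```python
import math

def euclidean(p, q):        # D_e
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def city_block(p, q):       # D_4
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def chessboard(p, q):       # D_8
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (2, 3), (5, 7)
print(euclidean(p, q), city_block(p, q), chessboard(p, q))   # 5.0 7 4
```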

40
Elementwise versus matrix operations
• An elementwise operation involving one or more images is carried out on a pixel-
by-pixel basis.
• For example, elementwise multiplication for two “images”:
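• A short NumPy comparison of elementwise and matrix multiplication on two small 2 × 2
"images" (the values are arbitrary):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A * B)   # elementwise (Hadamard) product: [[ 5 12], [21 32]]
print(A @ B)   # matrix product:                 [[19 22], [43 50]]
```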

41
Basic set operations

42
Logical operations

43
Spatial operations
• Spatial operations are performed directly on the pixels of an image:
(1) Single-pixel operations (common)
(2) Neighborhood operations (common)
(3) Geometric spatial transformations (uncommon)

44
Single-pixel operations
• The simplest operation we perform on a digital image is to alter the intensity of
its pixels individually using a transformation function, T, of the form:
s = T(z)
where z is the intensity of a pixel in the original image and s is the (mapped)
intensity of the corresponding pixel in the processed image.
• Typical gamma correction and simple contrast enhancement techniques can be
implemented this way.
• Note that because the number of possible intensity levels in any practical image format
is finite (256 for 8-bit images), this mapping can easily be implemented as a look-up table.
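• A minimal sketch of the look-up-table idea: gamma correction s = T(z) = 255·(z/255)^γ applied
to an 8-bit image with a single table lookup. The input file name and the γ value are placeholders.

```python
import numpy as np
import cv2

gamma = 0.5   # example value; gamma < 1 brightens, gamma > 1 darkens

# Precompute T(z) once for the 256 possible 8-bit intensities.
table = np.round([255.0 * (z / 255.0) ** gamma for z in range(256)]).astype(np.uint8)

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
out = cv2.LUT(img, table)                              # single-pixel (point) operation
cv2.imwrite("gamma_corrected.png", out)
```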

45
Example: Contrast processing

46
Neighborhood operations
• Let Sxy denote the set of coordinates of a
neighborhood centered on an arbitrary
point (x, y) in an image, f.
• Neighborhood processing generates a
corresponding pixel at the same
coordinates in an output (processed)
image, g, such that the value of that pixel
is determined by a specified operation on
the neighborhood of pixels in the input
image with coordinates in the set Sxy .

47
Neighborhood operations
• For example, suppose that the specified
operation is averaging in a rectangular
neighborhood of size m × n centered on
(x, y).

• The coordinates of pixels in this region are the elements of the set Sxy. We can
express this averaging operation as

$$g(x, y) = \frac{1}{mn} \sum_{(r, c) \in S_{xy}} f(r, c)$$
where r and c are the row and column
coordinates of the pixels whose coordinates
are in the set Sxy.
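• A direct (unoptimized) sketch of the m × n neighborhood average above, assuming odd m and n
and zero padding at the borders; in practice the same result is obtained with cv2.blur or
cv2.boxFilter. The random test image is a placeholder.

```python
import numpy as np

def neighborhood_average(f, m, n):
    """Average over an m x n neighborhood centered on each pixel (odd m, n; zero padding)."""
    M, N = f.shape
    pm, pn = m // 2, n // 2
    padded = np.pad(f, ((pm, pm), (pn, pn)), mode="constant")
    g = np.zeros_like(f, dtype=float)
    for x in range(M):
        for y in range(N):
            # Window centered on (x, y) of the original image.
            g[x, y] = padded[x:x + m, y:y + n].mean()
    return g

f = np.random.randint(0, 256, size=(8, 8)).astype(float)
g = neighborhood_average(f, 3, 3)
```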

48
Neighborhood operations
• Image g is created by varying the
coordinates (x,y) so that the center of
the neighborhood moves from pixel to
pixel in image f, and then repeating the
neighborhood operation at each new
location.

49
Example: Sharpen

50
Example: Blur

51
Geometric transformations
• We use geometric transformations to modify the spatial arrangement of pixels in an
image.
• These transformations are called rubber-sheet transformations because they may
be viewed as analogous to “printing” an image on a rubber sheet, then stretching
or shrinking the sheet according to a predefined set of rules.
• Geometric transformations of digital images consist of two basic operations:
1. Spatial transformation of coordinates.
2. Intensity interpolation that assigns intensity values to the spatially transformed
pixels.

52
Geometric transformations
• The transformation of coordinates may be expressed as:

where (x, y) are pixel coordinates in the original image and (x′, y′) are the corresponding pixel
coordinates of the transformed image.
• For example, the transformation (x′, y′) = (x/2, y/2) shrinks the original image to half its size in
both spatial directions.

• Our interest is in so-called affine transformations (6 parameters), which include scaling,
translation, rotation, and shearing.

• The key characteristic of an affine transformation in 2-D is that it preserves points, straight lines,
and planes.

53
Geometric transformations
• The previous expression can be used to express the transformations just mentioned, except
translation, which would require that a constant 2-D vector be added to the right side of
the equation.
• However, it is possible to use homogeneous coordinates to express all four affine
transformations using a single 3 × 3 matrix in the following general form:

• This transformation can scale, rotate, translate, or shear an image, depending on the
values chosen for the elements of matrix A. (a11, a12, a13, a21, a22, a23 – 6 parameters)
• Multiple transformations can be composed by chaining the corresponding matrix multiplications.
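• A hedged OpenCV/NumPy sketch of composing affine transformations in homogeneous coordinates:
a rotation about the image origin (top-left corner), a scaling, and a translation are multiplied
into a single 3 × 3 matrix, and the top 2 × 3 part is passed to cv2.warpAffine. The file names
and parameter values are placeholders.

```python
import numpy as np
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
h, w = img.shape

theta = np.deg2rad(15)                     # rotate by 15 degrees about the origin
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])
S = np.array([[0.5, 0,   0],               # scale by 0.5 in both directions
              [0,   0.5, 0],
              [0,   0,   1]])
T = np.array([[1, 0, w / 4],               # translate right and down
              [0, 1, h / 4],
              [0, 0, 1]])

A = T @ S @ R                              # composition: applied right to left
out = cv2.warpAffine(img, A[:2, :], (w, h))   # warpAffine takes the 2x3 part
cv2.imwrite("transformed.png", out)
```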

54
Geometric transformations
• The preceding transformation moves the coordinates of pixels in an
image to new locations.

• To complete the process, we have to assign intensity values to those
locations.

• This task is accomplished using intensity interpolation: nearest
neighbor, bilinear, or bicubic interpolation.

55
Example: Rotate

56
Nearest neighbor, bilinear and bicubic
Input image

Output image:
• Rotate and
• Zoom out
Imagine that pixel values
are assigned at the
centers of the boxes (+).

57
Nearest neighbor, bilinear and bicubic

Imagine that pixel values
are assigned at the
centers of the boxes (+).

58
Geometric transformations
• We can implement geometric transformations in two basic ways.
• The first is forward mapping, which consists of scanning the pixels of the input
image and, at each location (x, y), computing the spatial location (x′, y′) of the
corresponding pixel in the output image using the transformation formula.
• But this approach does not quite work. Why?

59
Geometric transformations
• There are two problems with the forward mapping approach:
1. Two or more pixels in the input image can be transformed to the same location
in the output image, raising the question of how to combine multiple output
values into a single output pixel value.
2. It is possible that some output locations may not be assigned a pixel at all.
• The second approach (in practice, the only workable one) is to start from the
output and work backward.

60
Geometric transformations
• The second approach, called inverse mapping, scans the output pixel locations
and, at each location (x′, y′), computes the corresponding location in the input
image using (x, y) = A−1(x′, y′).
• It then interpolates among the nearest input pixels to determine the intensity of
the output pixel value.
• Inverse mappings are more efficient to implement than forward mappings and
are used in numerous commercial implementations of spatial transformations
(for example, MATLAB uses this approach).
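• To make the idea concrete, here is a minimal sketch of inverse mapping for a pure scaling
transform with nearest-neighbor interpolation; real implementations (e.g., cv2.warpAffine)
handle general affine matrices and bilinear/bicubic interpolation. The function name and scale
factors are illustrative only.

```python
import numpy as np

def scale_inverse_mapping(f, sx, sy):
    """Scale image f by (sx, sy) using inverse mapping + nearest-neighbor interpolation."""
    M, N = f.shape
    out_M, out_N = int(M * sx), int(N * sy)
    g = np.zeros((out_M, out_N), dtype=f.dtype)
    for xp in range(out_M):                            # scan the OUTPUT grid ...
        for yp in range(out_N):
            x = min(int(round(xp / sx)), M - 1)        # ... and map back to the input
            y = min(int(round(yp / sy)), N - 1)
            g[xp, yp] = f[x, y]
    return g

f = np.arange(16, dtype=np.uint8).reshape(4, 4)
print(scale_inverse_mapping(f, 2, 2))
```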

61
Geometric transformations

62
Geometric transformations

63
Image registration
• Image registration is an important application of digital image processing used to
align two or more images of the same scene.
• In image registration, we have available an input image and a reference image.
• The objective is to transform the input image geometrically to produce an output
image that is aligned (registered) with the reference image.
• Unlike the discussion in the previous section where transformation functions are
known, the geometric transformation needed to produce the output, registered
image generally is not known, and must be estimated.

64
Image registration
• Examples of image registration include:
• Aligning two or more images taken at approximately the same time, but using
different imaging systems, such as an MRI (magnetic resonance imaging) scanner and
a PET (positron emission tomography) scanner, or
• Multiple cameras located at different locations.
• Or, perhaps the images were taken at different times using the same instruments,
such as satellite images of a given location taken several days, months, or even
years apart.

65
Image registration
• In either case, combining the images or performing quantitative analysis and
comparisons between them requires compensating for geometric distortions
caused by:
• Differences in viewing angle,
• Distance,
• Orientation,
• Sensor resolution,
• Shifts in object location,
• and other factors.

66
Image registration

[Figure: input image, reference image, tie points (control points), and registered image]

67
A closely related problem is image stitching

68
Key/tie/control points
• One of the principal approaches for solving the problem just discussed is to use
tie points (also called key or control points).
• These are corresponding points whose locations are known precisely in the input
and reference images.
• Approaches for selecting tie points range from selecting them interactively to
using algorithms that detect these points automatically.
• Some imaging systems have physical artifacts (such as small metallic objects)
embedded in the imaging sensors.
• These produce a set of known points (called reseau marks or fiducial marks)
directly on all images captured by the system. These known points can then be
used as guides for establishing tie points.

69
Estimating the transformation function
• The problem of estimating the transformation function is one of modeling.
• For example, suppose that we have a set of four tie points each in an input and a
reference image. A simple model based on a bilinear approximation is given by:
x = c1v + c2w + c3vw + c4
y = c5v + c6w + c7vw + c8
(v, w for input image and x, y for reference image)

• During the estimation phase, (v, w) and (x, y) are the coordinates of tie points in
the input and reference images, respectively.
• If we have four pairs of corresponding tie points in both images, we can write
eight equations using equations above and use them to solve for the eight
unknown coefficients, c1 through c8.
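• A sketch of this estimation step: the eight equations from four tie-point pairs are stacked
into a linear system and solved with NumPy. The tie-point coordinates below are made up for
the example.

```python
import numpy as np

# Four corresponding tie points: (v, w) in the input image, (x, y) in the reference image.
vw = np.array([[10, 10], [200, 15], [190, 180], [12, 175]], dtype=float)
xy = np.array([[13, 12], [205, 10], [198, 190], [ 8, 182]], dtype=float)

# Each pair gives two equations:  x = c1 v + c2 w + c3 vw + c4,  y = c5 v + c6 w + c7 vw + c8
A = np.zeros((8, 8))
b = np.zeros(8)
for i, ((v, w), (x, y)) in enumerate(zip(vw, xy)):
    A[2 * i]     = [v, w, v * w, 1, 0, 0, 0, 0]
    A[2 * i + 1] = [0, 0, 0, 0, v, w, v * w, 1]
    b[2 * i], b[2 * i + 1] = x, y

c = np.linalg.solve(A, b)   # c1 ... c8
print(c)
```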

70
Forward mapping
• After the coefficients have been computed, we let (v, w) denote the coordinates
of each pixel in the input image, and (x, y) become the corresponding coordinates
of the output image.
• The same set of coefficients, c1 through c8 , are used in computing all coordinates
(x, y); we just step through all (v, w) in the input image to generate the
corresponding (x, y) in the output, registered image.
• If the tie points were selected correctly, this new image should be registered with
the reference image, within the accuracy of the bilinear approximation model.

71
Backward mapping
• For reasons we discussed previously, the forward mapping approach is
problematic.
• The backward mapping approach is much preferable for implementation
purposes.
• So instead of computing the transformation that maps the input to the output
image, the inverse mapping coefficients, c’1 through c’8 , are used in computing all
coordinates (v, w); we just step through all (x, y) in the output image to generate
the corresponding (v, w) in the input image.
v = c’1x + c’2y + c’3xy + c’4
w = c’5x + c’6y + c’7xy + c’8

• As most of the computed pixel locations will be off-grid in the input image, use
interpolation (bilinear, bicubic) to compute their intensity values.

72
Image registration
• In situations where four tie points are insufficient to obtain satisfactory
registration, an approach used frequently is to select a larger number of tie points
and then treat the quadrilaterals formed by groups of four tie points as
subimages.
• The subimages are processed as above, with all the pixels within a quadrilateral
being transformed using the coefficients determined from the tie points
corresponding to that quadrilateral.
• Then we move to another set of four tie points and repeat the procedure until all
quadrilateral regions have been processed.

73
Image registration
• It is possible to use more complex regions than quadrilaterals, and to employ
more complex models, such as polynomials fitted by least squares algorithms.
• The number of control points and sophistication of the model required to solve a
problem is dependent on the severity of the geometric distortion.
• Finally, keep in mind that the transformations defined previously or any other
model for that matter, only map the spatial coordinates of the pixels in the input
image.
• We still need to perform intensity interpolation using any of the methods
discussed previously to assign intensity values to the transformed pixels.

74
Realistic image registration
• Image registration is quite common in realistic image processing
applications/problems.

• A realistic image registration process flow can be given as follows (see the OpenCV sketch after the list):


1. Detection of key points (key pixels) in the images to be registered
2. Obtaining local patches centered at these key pixels and extracting descriptors from
these patches
3. Matching the key points between the images to be registered using the descriptors
extracted around key pixels
4. Using these matches to obtain the equations required to compute the transformation
(affine, etc.) between the images to be matched
5. Finally, using these transformation parameters to map the input image to the
reference image and completing the registration task
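• A hedged OpenCV sketch of this five-step pipeline using ORB key points and descriptors,
brute-force matching, and a RANSAC-estimated homography; the file names are placeholders, and
the choice of detector, transform model, and thresholds varies by application.

```python
import numpy as np
import cv2

input_img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)       # placeholder
ref_img   = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)   # placeholder

# 1-2. Detect key points and extract local descriptors.
orb = cv2.ORB_create(1000)
kp1, des1 = orb.detectAndCompute(input_img, None)
kp2, des2 = orb.detectAndCompute(ref_img, None)

# 3. Match descriptors between the two images (brute-force Hamming matcher).
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# 4. Use the matched locations to estimate the transformation (homography here).
src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# 5. Map the input image onto the reference image's grid.
h, w = ref_img.shape
registered = cv2.warpPerspective(input_img, H, (w, h))
cv2.imwrite("registered.png", registered)
```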

75
Detection of key points
• In many real-life scenarios the key points cannot be marked into the image during
the acquisition process.
• So, you need to extract such key pixels from the images.
• These key pixels are typically distinguished as pixels with high gradients in
multiple directions (corners, etc.)
• One of the most well-known key point detectors is the Harris corner detector
(introduced by Chris Harris and Mike Stephens in 1988).
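• A minimal OpenCV sketch of Harris corner detection; the threshold (here 1% of the maximum
response) and the input file name are arbitrary choices for the example.

```python
import numpy as np
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)    # placeholder file name

# Harris response: neighborhood blockSize=2, Sobel aperture ksize=3, Harris parameter k=0.04.
response = cv2.cornerHarris(np.float32(img), 2, 3, 0.04)

# Keep pixels whose corner response exceeds 1% of the maximum response.
corners = np.argwhere(response > 0.01 * response.max())
print(f"{len(corners)} corner pixels detected")
```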

76
Detection of key points

77
Extracting local descriptors
• If your key point detector is a simple one (such as Harris corner detector) that can
only localize key pixels without extracting local features that can be matched
between input and reference images, then you also need to compute these local
features using local patches centered at your detected key pixels.
• As a simple solution, you can use the pixel values within the local patch.
• A histogram of gradients of the patch is a better and more popular choice.
• Some algorithms combine key point/pixel detection and local feature/descriptor
extraction into a single algorithm:
1. Scale-Invariant Feature Transform (SIFT): Invented in 1999, still in use.
2. Speeded-Up Robust Features (SURF): Designed to be faster than SIFT.

78
Computing the image transform parameters
• You can have hundreds of local features centered at key points in both images to
be registered.
• Exhaustive search is used to compare these local descriptors and find the best
matching pairs.
• These matched pixel locations provide the equations used to compute the
parameters of the image transformation that relates the input and reference
images.

79
Computing the image transform parameters

80
Vector and matrix operations
• As we saw in geometric transformations, matrix / vector representations are
quite common in image processing.
• Pixel coordinate transformations are much simpler to represent in matrix / vector
format.
• Multispectral and hyperspectral image processing are typical areas in which
vector and matrix operations are used routinely.

81
Vector and matrix operations
• For example, typical color images are formed in
RGB color space by using red, green, and blue
component images.
• Here we see that each pixel of an RGB image
has three components, which can be organized
in the form of a column vector z:

$$\mathbf{z} = \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}$$

• Here $z_1$ is the intensity of the pixel in the red image, and $z_2$ and $z_3$ are the
corresponding pixel intensities in the green and blue images, respectively.

82
Vector and matrix operations
• Thus, an RGB color image of size M × N can be represented by three component
images of this size, or by a total of MN vectors of size 3 × 1.

• A general multispectral / hyperspectral case involving n component images will
result in n-dimensional vectors:

83
Vector and matrix operations
• Entire images can be treated as matrices (or, equivalently, as vectors), a fact that has important
implications in the solution of numerous image processing problems.
• For example, we can express an image of size M × N as a column vector of dimension MN × 1 by
letting the first M elements of the vector equal the first column of the image, the next M
elements equal the second column, and so on.
• With images formed in this manner, we can express a broad range of linear processes applied to
an image by using the notation
g = Hf + n
• Here f is an MN × 1 vector representing an input image, n is an MN × 1 vector representing an
M × N noise pattern, g is an MN × 1 vector representing a processed image, and H is an MN × MN
matrix representing a linear process applied to the input image.
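• A small sketch of the g = Hf + n notation on a tiny image: the image is stacked column by
column into an MN × 1 vector, a linear process H (here, a one-pixel downward shift within each
column, built with a Kronecker product) is applied, noise is added, and the result is reshaped
back. The sizes and the choice of H are illustrative only.

```python
import numpy as np

M, N = 4, 3
f_img = np.arange(M * N, dtype=float).reshape(M, N)

# Column-major stacking: the first M elements are the first column, and so on.
f = f_img.flatten(order="F").reshape(-1, 1)        # MN x 1

# H shifts each image column down by one pixel (zero fill at the top).
S = np.eye(M, k=-1)                                # M x M shift matrix
H = np.kron(np.eye(N), S)                          # MN x MN, acts on each column block

n = 0.1 * np.random.randn(M * N, 1)                # MN x 1 noise pattern
g = H @ f + n                                      # the linear model g = Hf + n

g_img = g.reshape(M, N, order="F")                 # back to an M x N image
print(g_img)
```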

84
Image transforms
• All the image processing approaches discussed thus far operate directly on the
pixels of an input image; that is, they work directly in the spatial domain.
• In some cases, image processing tasks are best formulated by:
1. Transforming the input images,
2. Carrying out the specified task in the transform domain, and
3. Applying the inverse transform to return to the spatial domain.

85
Linear transforms
• A particularly important class of 2-D linear transforms, denoted T(u, v), can be
expressed in the general form:
$$T(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, r(x, y, u, v) \quad \text{for } u = 0, 1, \ldots, M-1 \text{ and } v = 0, 1, \ldots, N-1$$

• $f(x, y)$ is an input image, and $r(x, y, u, v)$ is called a forward transformation kernel.
• As before, $x$ and $y$ are spatial variables, while $M$ and $N$ are the row and column
dimensions of the input image $f$.
• Variables $u$ and $v$ are called the transform variables.
• $T(u, v)$ is called the forward transform of $f(x, y)$.

86
Linear transforms
• Given $T(u, v)$, we can recover $f(x, y)$ using the inverse transform of $T(u, v)$:

$$f(x, y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} T(u, v)\, s(x, y, u, v) \quad \text{for } x = 0, 1, \ldots, M-1 \text{ and } y = 0, 1, \ldots, N-1$$

• $s(x, y, u, v)$ is called an inverse transformation kernel.
• $r(x, y, u, v)$ and $s(x, y, u, v)$ together are called a transform pair.

87
Transform domain processing flow

88
Frequency domain processing
• One of the most famous, useful and widely used linear transforms is the Fourier transform, whose
kernels are:

$$r(x, y, u, v) = e^{-j2\pi(ux/M + vy/N)}$$

$$s(x, y, u, v) = \frac{1}{MN}\, e^{j2\pi(ux/M + vy/N)}$$

• The resulting transform pair is:

$$T(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j2\pi(ux/M + vy/N)} \quad \text{for } u = 0, 1, \ldots, M-1 \text{ and } v = 0, 1, \ldots, N-1$$

$$f(x, y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} T(u, v)\, e^{j2\pi(ux/M + vy/N)} \quad \text{for } x = 0, 1, \ldots, M-1 \text{ and } y = 0, 1, \ldots, N-1$$
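• A quick NumPy check that the definition above matches the library 2-D FFT on a small random
image (the 8 × 8 size is arbitrary):

```python
import numpy as np

M, N = 8, 8
f = np.random.rand(M, N)

# Direct evaluation of T(u, v) from the definition.
x = np.arange(M).reshape(M, 1)
y = np.arange(N).reshape(1, N)
T = np.zeros((M, N), dtype=complex)
for u in range(M):
    for v in range(N):
        T[u, v] = np.sum(f * np.exp(-2j * np.pi * (u * x / M + v * y / N)))

print(np.allclose(T, np.fft.fft2(f)))   # True: the definition matches np.fft.fft2
```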

89
Short notes on Fourier Transform
• A kernel is said to be separable if

$$r(x, y, u, v) = r_1(x, u)\, r_2(y, v)$$

• A kernel is said to be symmetric if $r_1(x, u)$ is functionally equal to $r_2(y, v)$, so that

$$r(x, y, u, v) = r_1(x, u)\, r_1(y, v)$$

• It can be shown that the Fourier kernels are separable and symmetric, and that
separable and symmetric kernels allow 2-D transforms to be computed using 1-D
transforms.
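• Separability in practice: the 2-D Fourier transform can be computed by applying a 1-D FFT to
every row and then to every column, as this NumPy check illustrates (the array size is arbitrary):

```python
import numpy as np

f = np.random.rand(16, 12)

rows_transformed = np.fft.fft(f, axis=1)       # 1-D FFT of each row
two_d = np.fft.fft(rows_transformed, axis=0)   # then 1-D FFT of each column

print(np.allclose(two_d, np.fft.fft2(f)))      # True: the separable kernel allows this
```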

90
Why is frequency domain analysis important?
• Let’s start with some results from linear algebra
• Eigenvalues and eigenvectors:

$$\mathbf{A}\mathbf{e} = \lambda \mathbf{e}$$

• Once the eigenvalues and eigenvectors of a matrix $\mathbf{A}$ are known, these
eigenvectors can be used as a basis to decompose any given vector.
• This decomposition is special because:

$$\mathbf{x} = k_1 \mathbf{e}_1 + k_2 \mathbf{e}_2 + \ldots + k_N \mathbf{e}_N \;\Rightarrow\; \mathbf{A}\mathbf{x} = k_1 \lambda_1 \mathbf{e}_1 + k_2 \lambda_2 \mathbf{e}_2 + \ldots + k_N \lambda_N \mathbf{e}_N$$

91
LTI systems
• Linear Time Invariant (LTI) systems are of key importance as many real-life
problems can be modelled as one.

x(t) → T → y(t)

• Linearity is defined by two properties:
• Scaling: T[a x(t)] = a T[x(t)]
• Superposition: T[x1(t) + x2(t)] = T[x1(t)] + T[x2(t)]

• Time invariance means:
• If T[x(t)] = y(t), then T[x(t − τ)] = y(t − τ)

92
LSI systems
• Linear Spatially Invariant (LSI) systems have an index or a spatial variable as
input instead of time.

x(s) → T → y(s)

• Linearity is defined by two properties:
• Scaling: T[a x(s)] = a T[x(s)]
• Superposition: T[x1(s) + x2(s)] = T[x1(s)] + T[x2(s)]

• Spatial invariance means:
• If T[x(s)] = y(s), then T[x(s − q)] = y(s − q)

93
LTI/LSI systems
• LTI/LSI systems are defined by their impulse response and the convolution
operation:

$$y(t) = \sum_{\tau=-\infty}^{\infty} x(\tau)\, h(t - \tau)$$

• The convolution operation is not easy to compute or interpret.
• As such, the design of the system – that is, the design of the impulse response h(t)
– is quite hard in the time domain.

94
Eigen-functions
• The idea of eigenvectors extends beyond linear algebra.
• In the case of LTI systems we have eigenfunctions:

$$T[e(t)] = \lambda\, e(t)$$

• It can be shown that complex exponentials are eigenfunctions of LTI systems.
• So, for any function x(t) that is decomposed as a combination of complex
exponentials, the effect of an LTI system on x(t) becomes very easy to compute.

95
Inner product operation on functions
• The inner product operator also extends beyond linear algebra.
• The inner product can be defined on discrete functions as

$$\sum_{n=-\infty}^{\infty} f[n]\, g[n]$$

• This operation can be interpreted as taking the projection of f onto g.

96
Fourier Transform is a projection
• Fourier Transform is simply a projection onto a set of basis functions that are the
complex exponentials.
• Fourier transform of a function can be interpreted as showing how much energy
is present at different frequencies.
• Probably the only downside of Fourier analysis is that you lose all time
resolution.

97
Fourier Transform is a projection
• It can be shown that convolution in the time domain becomes multiplication in the
frequency domain (which is simply the Fourier transform domain).
• As a result, the values of the Fourier transform of the system impulse response can
be designed as multipliers to amplify or attenuate specific frequencies present in
the Fourier transform of the input signal.
• A simple inverse transform completes the system design by providing the impulse
response coefficients in time-domain.
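• A 1-D NumPy check of this property: circular convolution in the time domain equals elementwise
multiplication of the DFTs. The signal and impulse response below are random placeholders of
equal length.

```python
import numpy as np

N = 64
x = np.random.rand(N)   # input signal
h = np.random.rand(N)   # impulse response (same length; circular convolution)

# Circular convolution computed directly in the time domain.
y_time = np.zeros(N)
for n in range(N):
    for k in range(N):
        y_time[n] += x[k] * h[(n - k) % N]

# The same result via the frequency domain: multiply the DFTs, then invert.
y_freq = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

print(np.allclose(y_time, y_freq))   # True
```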

98
Pixels as random values
• We can treat image intensities as random quantities.

• For example, let $z_i$, for $i = 0, 1, \ldots, L-1$, denote the values of all possible
intensities in an $M \times N$ digital image. The probability, $p(z_k)$, of intensity level $z_k$
occurring in the image is estimated as

$$p(z_k) = \frac{n_k}{MN}$$

• Here $n_k$ is the number of times that intensity $z_k$ occurs in the image, and $MN$ is
the total number of pixels.

99
Pixels as random values
• Once we have $p(z_k)$, we can determine a number of important image
characteristics, such as the mean and variance:

$$m = \sum_{k=0}^{L-1} z_k\, p(z_k) \qquad\qquad \sigma^2 = \sum_{k=0}^{L-1} (z_k - m)^2\, p(z_k)$$
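• A short NumPy sketch that estimates p(z_k) from an 8-bit image and computes the mean and
variance from it; the random image stands in for real data.

```python
import numpy as np

L = 256
img = np.random.randint(0, L, size=(128, 128))     # placeholder 8-bit image
MN = img.size

counts = np.bincount(img.ravel(), minlength=L)     # n_k for k = 0 ... L-1
p = counts / MN                                    # p(z_k) = n_k / MN

z = np.arange(L)
mean = np.sum(z * p)                               # m
var = np.sum((z - mean) ** 2 * p)                  # sigma^2

print(mean, var)
print(np.isclose(mean, img.mean()), np.isclose(var, img.var()))   # both True
```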

100
