
Image Processing and Computer Vision

CMPE-443/674
Department of Computer Engineering

UET (Lahore)

Lecture 1
Welcome to Image Processing and Computer Vision!

What is Computer Vision?


Computer Vision is about enabling a computer to understand the contents of images and videos in a manner similar to the human visual system.

What is Image Processing?


Image processing is about presenting the data in images/videos in an alternative manner so that vision problems
become easier to solve. Sharpening an image and computing the Fourier Transform of an image are examples
of image processing.

−→ In this course we will be dealing with images and videos


Understanding images on the computer

−→ Example: an image with a width of 800 pixels and a height of 533 pixels has a resolution of 800x533.

How are images presented on computer hardware?


−→ An image (on the computer screen) is a 2D array of small dots called pixels.
−→ The dimensions of the image in terms of pixels are called the resolution of the image. The same image taken at a higher resolution will have more pixels and is 'clearer' when we see it.
−→ Each pixel has its own color.
−→ The pixel color comes from three light emitting elements, which are called the R, G and B elements of the pixel (see the 'Pixels in the hardware' figure).
−→ We can control the light intensity of each element separately (through software) to make different colors.
−→ The R element produces pure red light, the G element produces pure green light and the B element produces pure blue light.
Pixel color
−→ Usually, but not always, the R, G and B values of a pixel are each stored as an 8-bit number (1 byte). This is called 8-bit color.

−→ If the R, G and B values are made equal then the pixel becomes a shade of grey

−→ Sometimes the R, G and B values of a pixel are stored together as a single 24-bit number (8 bits per channel). This is called true color.

How are images stored in computer software?


Images are stored as arrays of numbers
• Black and white images • Grayscale images • Colored images
Black and white images

−→ Each pixel is either black or white (ignore the red square in the figure): black = 0, white = 1.
−→ The image is stored as a 2D array (or matrix) of binary values, e.g.

    [0 0 ... 0]
    [0 0 ... 0]
    [⋮      ⋮ ]
    [1 1 ... 0]

    (How the computer sees the black and white image)

−→ Such an array is also called a black and white image array.
Grayscale images

−→ Each pixel is a shade of grey ranging from black to white (ignore the red square in the figure): black = 0, white = 255; dark greys are near 0, light greys are near 255.
−→ The image is stored as a 2D array (or matrix) of bytes, e.g.

    [22 24 ... 30]
    [22 24 ... 30]
    [⋮         ⋮ ]
    [93 95 ... 24]

    (How the computer sees the grayscale image)

−→ Such an array is also called a grayscale image array.
Colored images

−→ Each pixel has its own color.
−→ Color information is usually, but not always, stored by saving the R, G and B values of each pixel.
−→ The image is stored as three 2D arrays (of bytes), called the R, G and B channels: one array for R values, one for G values and one for B values, e.g.

    R channel         G channel           B channel
    [18 20 ... 30]    [23 25 ... 34]      [27 29 ... 23]
    [18 20 ... 24]    [23 25 ... 30]      [27 29 ... 30]
    [⋮         ⋮ ]    [⋮           ⋮ ]    [⋮         ⋮ ]
    [78 80 ... 23]    [102 104 ... 25]    [86 88 ... 20]

    (How the computer sees the colored image)

−→ Each channel is itself a grayscale image.

    (Figure: the original colored image and its R, G and B channels)
Understanding videos

...
−→ A video is just a sequence of images. Each image in the video is called a frame.
−→ A small amount of motion takes place between two consecutive frames
−→ When the frames are played very rapidly we get the illusion of motion. The number of images
played in one second is called the framerate.
−→ Videos usually have a framerate of 25 frames per second.
−→ An HD video 'usually' has a resolution of 1920x1080 pixels and a framerate of 25 frames per second.
How are videos stored in a computer?
Videos are stored as a sequence of image arrays.
• Black and white videos (a sequence of black and white image arrays)
• Greyscale videos (a sequence of grayscale image arrays)
• Colored videos (three sequences of grayscale image arrays, namely the R, G and B channels)

(Figure: black and white videos as sequences of binary image arrays; grayscale videos as sequences of grayscale image arrays; colored videos as three sequences, the R, G and B channels)


−→ The computer sees images and videos as arrays of numbers
−→ Vision is the ability to be able to extract meaningful information from images and videos
−→ Computer Vision is about giving a computer the sense of vision.
A computer has to extract meaningful information from images and videos, which are stored as arrays of numbers.

    (Figure: how we see an image vs. how the computer sees it, as the R, G and B channel arrays, and what we expect the computer to produce)

−→ Computer Vision is a challenging problem to solve!


What do we mean by 'meaningful information'?

What do you see in the first image? Is it the number 2 or lots of colored circles? Both answers are technically correct but different.
−→ Meaningful information can be subjective!
Do you see shapes in the second image? Are the regions A and B of different colors? We may see things which are not there because our brain 'fills in the blanks'.
−→ Meaningful information can be deceptive!
−→ In Computer Vision we are expecting a computer to take an array of numbers and see what we are seeing, even if those things are not there!
−→ Computer Vision is a challenging, subjective and deceptive problem to solve!
Extracting information from images
−→ Images are stored as arrays (or matrices) inside the computer:

    [x11 x12 ... x1n]
    [x21 x22 ... x2n]
    [⋮            ⋮ ]
    [xm1 xm2 ... xmn]
Image Statistics
−→ For any portion of the image array we can compute things like the mean, the variance, a histogram, a probability distribution etc. These are called image statistics.
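As a concrete illustration, here is a minimal NumPy sketch of these statistics over a portion of an image (the patch values and the bin count are arbitrary placeholders):

import numpy as np

# a hypothetical 4x4 portion (patch) of a grayscale image array
patch = np.array([[22, 24, 30, 31],
                  [22, 24, 30, 29],
                  [93, 95, 24, 20],
                  [90, 94, 25, 22]], dtype=np.uint8)

mean = patch.mean()                      # average intensity
var = patch.var()                        # variance of the intensities
hist, edges = np.histogram(patch, bins=8, range=(0, 256))
pdf = hist / hist.sum()                  # normalized histogram = empirical probability distribution
print(mean, var, pdf)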
Geometry within images
−→ For any portion of the image array we can check whether certain kinds of shapes are present within the image. We can locate lines, circles and other kinds of shapes within that portion (Hough transform). This is called geometry within images.
−→ Computer vision algorithms are 'broadly' categorized into three types:
(i) Statistics based algorithms (use image statistics)
(ii) Geometry based algorithms (use geometry within images)
(iii) Algorithms which attempt to mimic the human visual system, e.g. neural networks.
Some interesting problems entailing Computer Vision

Industrial Inspection

−→ A foreign object is detected in an industrial process


Object Tracking

−→ Certain moving objects are identified


Digitizing a book from video

−→ The following is done.


(i) Pages of a textbook are flipped
(ii) A video recording is made
(iii) A soft copy of the book is produced
Optical character recognition

−→ Text within an image or video is recognized.


Logistics for this course

Course information
−→ Join the WhatsApp group ''CMPE-443/674 IPCV Fall24''

How to reach me
−→ Send me a text or audio message on WhatsApp (fastest)
Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering

UET (Lahore)

Lecture 2
Mathematical terms and their notation
Scalars, vectors and matrices
−→ Scalars are denoted by lower case letters such as s, a, b etc.
−→ Vectors are denoted by boldface lower case letters with a bar, such as x, y.
−→ Matrices are denoted by upper case letters such as M, X etc.
−→ x will be treated as a column vector in this course.

e.g. x ∈ R⁴ ⇒ x = [x1 x2 x3 x4]^T

−→ To write x as a row vector we will use x^T:

e.g. x ∈ R⁴ ⇒ x^T = [x1 x2 x3 x4]

−→ In general, x ∈ R^d ⇒ x = [x1 x2 ... xd]^T ⇒ x^T = [x1 x2 ... xd]
Magnitude of a vector
−→ The magnitude of a vector x ∈ R^d is given by

∥x∥ = √(x1² + x2² + · · · + xd²)

e.g. in R², ∥x∥ = √(x1² + x2²); in R³, ∥x∥ = √(x1² + x2² + x3²)

−→ Magnitude is also called the Euclidean norm.
Unit vector
A vector whose Euclidean norm is 1 is called a unit vector. ∥x∥ = 1 ⇔ x is a unit vector.
−→ Unit vectors will be denoted by boldface lower case letters with a hat, e.g. x̂ = [1/3 2/3 2/3]^T.

Normalization of a vector
−→ Dividing a vector by its magnitude makes it a unit vector (this is called normalization of the vector):

x̂ = x / ∥x∥ is the normalization of x

e.g. x = [0.1 0.2 0.2]^T has ∥x∥ = 0.3, so x̂ = [1/3 2/3 2/3]^T.

Angle between vectors


Orthonormal vectors
If two vectors have a Euclidean norm of 1 and an angle of 90◦ between them then they are orthonormal vectors.
−→ If x, y ∈ Rd are orthonormal then ∥x∥ = 1, ∥y∥ = 1, θ = 90◦

Similarity between vectors


−→ The similarity between vectors is measured by considering the angle between them.

Dot Product
Let x = [x1 x2 ... xd]^T ∈ R^d and y = [y1 y2 ... yd]^T ∈ R^d.

Then x · y = ⟨x, y⟩ = (x, y) = x1y1 + x2y2 + · · · + xdyd = x^T y = ∥x∥ ∥y∥ cos θ

−→ The dot product can be used to compute the similarity between two vectors based on the angle between them:

θ = arccos( (x · y) / (∥x∥ ∥y∥) )

−→ Two vectors are said to be similar when they are aligned.
−→ θ = 90° ⇒ x · y = 0.
−→ If x and y are orthonormal vectors then ∥x∥ = 1, ∥y∥ = 1 and x · y = x^T y = 0.
−→ The dot product can also be used to compute the Euclidean norm of a vector:

Let x = [x1 x2 ... xd]^T ∈ R^d. Then

∥x∥ = √(x1² + x2² + · · · + xd²) = √(x · x) = √⟨x, x⟩ = √(x, x) = √(x^T x)

Vectors are shown as arrows and also as points

−→ The (tip of the) vector represents a point (in a vector space).
−→ We can also call a vector a point.
−→ The straight line distance between 'points' x ∈ R^d and y ∈ R^d is given by

∥x − y∥ = √((x − y)^T (x − y)) = √((x1 − y1)² + (x2 − y2)² + · · · + (xd − yd)²)
Outer product
Let x = [x1 x2 ... xd]^T ∈ R^d and y = [y1 y2 ... yk]^T ∈ R^k.

Then the outer product of x and y is given by

    x y^T = [x1y1 x1y2 ... x1yk]
            [x2y1 x2y2 ... x2yk]
            [⋮               ⋮ ]
            [xdy1 xdy2 ... xdyk]

Outer product vs inner product


1. Inner product of x and y = x^T y
   Outer product of x and y = x y^T

2. For the inner product of x and y, dim(x) = dim(y).
   For the outer product of x and y no such condition is needed.

3. The inner product of x and y gives a scalar.
   The outer product of x and y gives a matrix.
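A small NumPy sketch of these operations (the vectors x and y are arbitrary illustrative values):

import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, 4.0])

inner = x @ y                        # x^T y, a scalar
outer = np.outer(x, y)               # x y^T, a 3x3 matrix

norm_x = np.sqrt(x @ x)              # Euclidean norm via the dot product
x_hat = x / norm_x                   # normalization: a unit vector

# angle between x and y from the dot product
theta = np.arccos(inner / (np.linalg.norm(x) * np.linalg.norm(y)))
print(inner, outer.shape, norm_x, x_hat, theta)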
Functions
Functions can be 'broadly' categorized into four categories.

Scalar single variable functions
The input is a scalar e.g. x and the output is a scalar e.g. f = f(x).

Scalar multi-variable functions
The input is a vector e.g. x and the output is a scalar e.g. f = f(x) = f(x1, x2, ..., xd).

Vector single-variable functions
The input is a scalar e.g. x and the output is a vector e.g.

f = f(x) = [f1(x) ... fn(x)]^T

Vector multi-variable functions
The input is a vector e.g. x and the output is also a vector e.g.

f = f(x) = [f1(x) ... fn(x)]^T = [f1(x1, x2, ..., xd) ... fn(x1, x2, ..., xd)]^T
Scalar multivariate functions
−→ These are functions which take a vector as an input and produce a scalar as an output.
e.g. f(x) = sin(x1) + cos(x2) + 5, x = [x1 x2]^T ∈ R²
−→ The function can be plotted as a 2D surface. (Note: in the figure the axes X, Y, Z correspond to x1, x2, f.)
Gradient of a function

I am standing at a point x in the domain of a function f. In which direction should I move so that I experience the maximum rate of increase in the function? Answer: move in the direction of the gradient of f, ∇f.
−→ ∇f is a vector in the domain of the function f, whose direction points towards the maximum rate of increase of the function, and whose magnitude gives the value of that maximum rate of change.
Computing the gradient
If f(x) is a multivariate function where x ∈ R^d, then

∇f = [∂f/∂x1  ∂f/∂x2  ...  ∂f/∂xd]^T
Example
f(x) = sin(x1) + cos(x2) + 5, x = [x1 x2]^T ∈ R²
We are standing at the point x0 = [−5.588 −2.05]^T in the domain of f. In which direction should we move so that the function increases the most rapidly? What is the maximum rate of change of the function?

Solution

∇f = [∂f/∂x1  ∂f/∂x2]^T = [cos(x1)  −sin(x2)]^T

At x = x0 = [−5.588 −2.05]^T, ∇f = [cos(−5.588)  −sin(−2.05)]^T ≈ [0.768  0.887]^T

So we need to move in the direction [0.768 0.887]^T.

The maximum rate of change of the function = ∥∇f∥ = √(0.768² + 0.887²) ≈ 1.17

−→ This means that if we are standing at x0 and move a small amount dl in the direction [0.768 0.887]^T, the function will increase by ≈ 1.17 dl.
−→ −∇f gives us the direction in which the function f decreases the most rapidly.
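A quick numerical check of this example (a sketch; the step size dl is an arbitrary small value):

import numpy as np

def f(x):
    return np.sin(x[0]) + np.cos(x[1]) + 5

def grad_f(x):
    # analytic gradient: [cos(x1), -sin(x2)]^T
    return np.array([np.cos(x[0]), -np.sin(x[1])])

x0 = np.array([-5.588, -2.05])
g = grad_f(x0)
print(g)                      # ≈ [0.768, 0.887]
print(np.linalg.norm(g))      # ≈ 1.17, the maximum rate of increase

# moving a small step dl along the unit gradient direction increases f by ≈ ∥∇f∥ dl
dl = 1e-4
x1 = x0 + dl * g / np.linalg.norm(g)
print((f(x1) - f(x0)) / dl)   # ≈ 1.17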
Example
f(x) = (x1 − 1)² + (x2 − 2)² + 4, x = [x1 x2]^T ∈ R². We are at x0 = [1 2]^T in the domain of f.
In which direction should we move so that the function increases the most rapidly?

Solution

∇f = [∂f/∂x1  ∂f/∂x2]^T = [2(x1 − 1)  2(x2 − 2)]^T

At x = x0 = [1 2]^T, ∇f = [2(1 − 1)  2(2 − 2)]^T = [0 0]^T

The gradient (at x0) is 0. So which direction do we move in?

−→ When the gradient is 0 it means that the function is 'flat' at the given point. The function is (locally) minimum or maximum at the given point.
Difference between gradient and derivative
1. A derivative is computed with respect to any one variable of the input vector, whereas the gradient is
computed with respect to all the variables of the input vector
2. The derivative is a scalar quantity whereas the gradient is a vector quantity.

Partial gradient
The gradient is computed with respect to only some of the variables in the input vector.
Example
f(x) = x1² + sin(x2) + cos(x3) + 8x4, x = [x1 x2 x3 x4]^T, p = [x1 x2]^T, q = [x3 x4]^T, r = [x2 x3]^T
−→ ∇p f = partial gradient of f w.r.t. p

⇒ ∇p f = [∂f/∂x1  ∂f/∂x2]^T = [2x1  cos(x2)]^T
  ∇q f = [∂f/∂x3  ∂f/∂x4]^T = [−sin(x3)  8]^T
  ∇r f = [∂f/∂x2  ∂f/∂x3]^T = [cos(x2)  −sin(x3)]^T
What does the partial gradient tell us?
Example
f(x) = x1² + sin(x2) + cos(x3) + 8x4, x = [x1 x2 x3 x4]^T, x0 = [2 3 −1 −2]^T
We are standing at x0. How should we move in the [x2 x4]^T direction so that the function decreases the most rapidly?

Solution
Let t = [x2 x4]^T. We need to move in the direction of −∇t f at [x2 x4]^T = [3 −2]^T.

⇒ −∇t f = −[∂f/∂x2  ∂f/∂x4]^T = [−cos(x2)  −8]^T

At [x2 x4]^T = [3 −2]^T, −∇t f = [−cos(3)  −8]^T ≈ [0.99  −8]^T
Gradient of a vector single variable function
If we have a function

f(x) = [f1(x) ... fn(x)]^T ⇒ ∇f = [df1(x)/dx ... dfn(x)/dx]^T

Gradient of a vector multi-variable function
If we have a function

f(x) = [f1(x) ... fn(x)]^T = [f1(x1, x2, ..., xd) ... fn(x1, x2, ..., xd)]^T ⇒ ∇f = [∇f1 ... ∇fn]
Exercises
Show that for vectors x, y ∈ R^d and for matrices M ∈ R^(k×d) and A ∈ R^(d×d):

−→ ∇x (y^T x) = ∇x (x^T y) = y
−→ ∇x (M x) = M^T
−→ ∇x (x^T A x) = (A + A^T) x
Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering

UET (Lahore)

Lecture 3
Discrete domain and continuous domain functions

Examples of Scalar Single Variable (SSV) functions
Discrete domain: f[n] = cos(πn/8)
Continuous domain: f(x) = cos(πx/8)

Examples of Scalar Multi-Variable (SMV) functions
Discrete domain: f[n] = cos(πn1/8) cos(πn2/4)
Continuous domain: f(x) = cos(πx1/8) cos(πx2/4)

Examples of Vector Single Variable (VSV) functions
Continuous domain: f(x) = [f1(x) f2(x)]^T = [sin(πx/8)  e^(−|x|)]^T
Discrete domain: f[n] = [f1[n] f2[n]]^T = [sin(πn/8)  e^(−|n|)]^T

Examples of Vector Multi-Variable (VMV) functions
Continuous domain: f(x) = [f1(x) f2(x)]^T = [e^(−|x2|) cos(0.2πx1)  e^(−|x1|) sin(0.4πx2)]^T, x = [x1 x2]^T ∈ R²
Discrete domain: f[n] = [f1[n] f2[n]]^T = [e^(−|n2|) cos(0.2πn1)  e^(−|n1|) sin(0.4πn2)]^T, n = [n1 n2]^T
Grayscale images
−→ A grayscale image is a discrete domain SMV function.
−→ A normalized discrete domain SMV function can be shown as a grayscale image.

Colored Images
−→ A colored image is a VMV function:

i[n] = [r[n] g[n] b[n]]^T = [r[n1, n2] g[n1, n2] b[n1, n2]]^T

(Figure: a colored image and its R, G and B channels)
Linear Transformations
−→ These are VMV functions of the form f(x) = M x or f[n] = M n, where M is a matrix of constants.
−→ If the matrix M is k×d then it will map a d-dimensional vector to a k-dimensional vector:

    [m11 m12 ... m1d] [x1]   [x̂1]
    [m21 m22 ... m2d] [x2]   [x̂2]
    [⋮            ⋮ ] [⋮ ] = [⋮ ]
    [mk1 mk2 ... mkd] [xd]   [x̂k]

(and likewise f[n] = M n = n̂ in the discrete domain)

Self mapping linear transformations

−→ These are linear transformations which map a d-dimensional vector to a d-dimensional vector.
−→ There are three basic types of linear transformations:
• Rotation
• Scaling
• Shearing
Rotation
The projection of a point onto a plane is rotated by an angle θ in that plane.

Rotation in 2D
−→ The projection of the point x in the x1−x2 plane (which is again x) is rotated by an angle θ in that plane to get f(x).
−→ The transformation matrix is

    M = [cos(θ)  −sin(θ)]
        [sin(θ)   cos(θ)]

−→ f(x) = M x

Rotation in 3D
−→ The projection of the point x in the x1−x3 plane is rotated by an angle θ in that plane to get f(x).
−→ Only the x1 and x3 coordinates change while moving from x to f(x).
−→ The transformation matrix is

    M = [cos(θ)  0  −sin(θ)]
        [0       1   0     ]
        [sin(θ)  0   cos(θ)]

−→ f(x) = M x
Rotation about an arbitrary point
The projection of a point onto a plane is rotated by an angle θ about a point c.

Rotation in 2D
−→ The projection of the point x in the x1−x2 plane (which is again x) is rotated by an angle θ in that plane, about the point c, to get f(x).
−→ The transformation matrix is

    M = [cos(θ)  −sin(θ)]
        [sin(θ)   cos(θ)]

−→ f(x) = c + M(x − c)

Rotation in 3D
−→ The projection of the point x in the x1−x3 plane is rotated by an angle θ in that plane, about the point c, to get f(x).
−→ The transformation matrix is

    M = [cos(θ)  0  −sin(θ)]
        [0       1   0     ]
        [sin(θ)  0   cos(θ)]

−→ f(x) = c + M(x − c)
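A small NumPy sketch of 2D rotation about a point c (θ, c and x are illustrative values):

import numpy as np

theta = np.pi / 2                      # rotation angle
c = np.array([1.0, 1.0])               # center of rotation
x = np.array([2.0, 1.0])               # point to rotate

M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

f_x = c + M @ (x - c)                  # f(x) = c + M(x - c)
print(f_x)                             # [1, 2]: x rotated 90° about c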
Scaling
A vector is stretched by a factor k.

Scaling in 2D
−→ The vector x is stretched by a factor k to get f(x).
−→ The transformation matrix is

    M = [k  0]
        [0  k]

−→ f(x) = M x

Scaling in 3D
−→ The vector x is stretched by a factor k to get f(x).
−→ The transformation matrix is

    M = [k  0  0]
        [0  k  0]
        [0  0  k]

−→ f(x) = M x
Scaling with reference to an arbitrary point
A vector is stretched by a factor k with reference to an arbitrary point c.

Scaling in 2D
−→ The vector x is stretched by a factor k with reference to an arbitrary point c, to get f(x).
−→ The transformation matrix is

    M = [k  0]
        [0  k]

−→ f(x) = c + M(x − c)

Scaling in 3D
−→ The vector x is stretched by a factor k with reference to an arbitrary point c, to get f(x).
−→ The transformation matrix is

    M = [k  0  0]
        [0  k  0]
        [0  0  k]

−→ f(x) = c + M(x − c)
Shearing
One coordinate of a vector is increased in proportion to another coordinate.

Shearing in 2D
−→ The x2 coordinate of the vector x increases by kx1.
−→ The transformation matrix is

    M = [1  0]
        [k  1]

−→ f(x) = M x

Shearing in 3D
−→ The x3 coordinate of the vector x increases by kx2.
−→ The transformation matrix is

    M = [1  0  0]
        [0  1  0]
        [0  k  1]

−→ f(x) = M x
Eigenvectors
−→ The mapped vector is 'similar' to the original vector.
−→ The mapped vector is a constant times the original vector. This constant is called the eigenvalue of the eigenvector.

Example of an eigenvector in 2D

x = [−4 3]^T is an eigenvector of M = [1 4; 3 2] because M x = −2x.
The eigenvalue is −2.

Example of an eigenvector in 3D

x = [1 1 2]^T is an eigenvector of M = [3 4 −2; 1 4 −1; 2 6 −1] because M x = 3x.
The eigenvalue is 3.
Image Processing through linear transformations

Rotating images by θ about a point c in the image plane
Rotating grayscale images
−→ M is a matrix for rotation by angle θ in the image plane:

    M = [cos(θ)  −sin(θ)]   ⇒   M⁻¹ = [cos(−θ)  −sin(−θ)]
        [sin(θ)   cos(θ)]             [sin(−θ)   cos(−θ)]

−→ The pixel n0 moves to the pixel position n = f[n0] = c + M(n0 − c).
−→ The pixel n comes from the pixel position n0 = f⁻¹[n] = c + M⁻¹(n − c).
−→ The new image i at the target pixel n = the original image i0 at the source pixel NN(n0):
i[n] = i0[NN(n0)], where NN(n0) is the nearest neighbor of n0.
−→ The value of the original image i0 at pixels outside the image frame is taken to be 0.
−→ To rotate a colored image each channel is rotated separately in the same manner.

Scaling images by k with reference to a point c in the image plane
Scaling grayscale images
−→ M is a matrix for scaling by a factor k in the image plane:

    M = [k  0]   ⇒   M⁻¹ = [k⁻¹  0  ]
        [0  k]             [0    k⁻¹]

−→ The pixel n0 moves to the pixel position n = f[n0] = c + M(n0 − c).
−→ The pixel n comes from the pixel position n0 = f⁻¹[n] = c + M⁻¹(n − c).
−→ The new image i at the target pixel n = the original image i0 at the source pixel NN(n0):
i[n] = i0[NN(n0)], where NN(n0) is the nearest neighbor of n0.
−→ The value of the original image i0 at pixels outside the image frame is taken to be 0.
−→ To scale a colored image each channel is scaled separately in the same manner.
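A minimal NumPy sketch of this inverse-mapping procedure for a grayscale image (the image i0, the angle and the center are illustrative; rounding implements the nearest-neighbor step):

import numpy as np

def rotate_image(i0, theta, c):
    # rotate image i0 by theta about point c using inverse mapping with
    # nearest-neighbor interpolation; pixels coming from outside the frame are 0
    H, W = i0.shape
    Minv = np.array([[np.cos(-theta), -np.sin(-theta)],
                     [np.sin(-theta),  np.cos(-theta)]])
    out = np.zeros_like(i0)
    for n1 in range(H):
        for n2 in range(W):
            n0 = c + Minv @ (np.array([n1, n2]) - c)       # n0 = c + M^-1 (n - c)
            r, s = int(round(n0[0])), int(round(n0[1]))    # NN(n0)
            if 0 <= r < H and 0 <= s < W:
                out[n1, n2] = i0[r, s]
    return out

i0 = np.random.randint(0, 256, (64, 64)).astype(np.uint8)  # placeholder image
c = np.array([31.5, 31.5])                                 # center of the frame
rotated = rotate_image(i0, np.pi / 6, c)

For scaling, only Minv changes (to the matrix with k⁻¹ on the diagonal).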
Exercises
1. An image pixel n is to be rotated about a point c by θ rad, and the vector corresponding to the resulting point scaled with reference to c by a factor of 0.25, through a function f[n].
(a) Write down a mathematical expression for f[n].
(b) Write down a mathematical expression for the resulting pixel p.

2. A (continuous domain) image pixel x is to be rotated about a point c1 by θ rad, and the vector corresponding to the resulting point scaled with reference to c2 by a factor of 0.25, through a function f(x). Write down a mathematical expression for f(x).

3. A 1280p video frame i[n] is rotated by 90° about the point c ∈ R², which is the 'perfect' center of the frame, to give a new 1280p frame î[n].
(a) What are the coordinates of c?
(b) Write down î[n] in terms of i[n].
(c) How many pixels have an unknown value in î[n]?
Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering

UET (Lahore)

Lecture 4
Using a continuous domain equivalent of a discrete domain function
Continuous domain equivalent of a discrete domain SSV function
Let f[n] be a discrete domain SSV function.
−→ We assume that the samples f[n] are 1 unit apart (in the dimension n) w.r.t. a continuous variable x.
−→ We now imagine a smooth function f(x) connecting the 'heads' of f[n].
−→ The value of f(x) is only known 'correctly' when x = n.

Continuous domain equivalent of a discrete domain SMV function
Let f[n], n = [n1, n2]^T, be a discrete domain SMV function.
−→ We assume that the samples f[n] are 1 unit apart (in each dimension n1, n2) w.r.t. a continuous variable x = [x1, x2]^T.
−→ We now imagine a smooth function f(x) connecting the 'heads' of f[n].
−→ The value of f(x) is only known 'correctly' when x = n.
Estimating derivatives from a discrete domain function

Estimating the derivatives of an SSV function f(x) from its samples f[n]
Goal
To be able to estimate the derivatives of the function f(x) from its samples f[n], at any value of n.
Solution
−→ f[n] is replaced with its continuous domain equivalent function f(x).
−→ The Taylor series of f(x) is used to estimate the derivatives of f at any value of n.

Using the Taylor series to estimate derivatives of an SSV function
Let f[n] be a discrete domain SSV function, and let f(x) be its continuous domain equivalent, f(x)|x=n = f[n].

−→ We will skip the derivation.

−→ df(x)/dx |x=n = (df/dx)[n] ≈ (f[n+1] − f[n−1]) / 2

−→ d²f(x)/dx² |x=n = (d²f/dx²)[n] ≈ f[n+1] − 2f[n] + f[n−1]
Estimating the gradient of an SMV function f(x) from its samples f[n]
Goal
To be able to estimate the partial derivatives (and hence the gradient) of the function f[n] at any discrete point.
Solution
−→ f[n] is replaced with its continuous domain equivalent function f(x).
−→ The Taylor series approximation for the derivatives of an SSV function is used to calculate the partial derivatives of f(x).

Using the Taylor series to estimate the gradient of an SMV function
Let f[n] be a discrete domain SMV function, and let f(x) be its continuous domain equivalent, f(x)|x=n = f[n].

−→ We will skip the derivation.

−→ ∂f(x)/∂x1 |x=n ≈ (f[n1+1, n2] − f[n1−1, n2]) / 2

−→ ∂²f(x)/∂x1² |x=n ≈ f[n1+1, n2] − 2f[n1, n2] + f[n1−1, n2]

−→ ∂f(x)/∂x2 |x=n ≈ (f[n1, n2+1] − f[n1, n2−1]) / 2

−→ ∂²f(x)/∂x2² |x=n ≈ f[n1, n2+1] − 2f[n1, n2] + f[n1, n2−1]

−→ We can now 'estimate' the gradient of f[n].
Derivative Filters

Image (i), a 4×8 example:

    235  18 253  79 128   8 250 161
     45 123 165 112  31 184  89 197
    178 164  80 115  86 163  67 159
    120  63 215  53 229  54 154 125

Derivative filters (g): the x1 partial derivative filter is the column [−0.5 0 0.5]^T and the x2 partial derivative filter is the row [−0.5 0 0.5], each anchored at its middle entry.

−→ The image i is shifted so that the target index n = [1 5]^T is aligned under the origin (anchor) = [0 0]^T of the filter.
−→ The multiply-add operation is applied between the filter g and image i to get the result h at the target pixel n.

Mathematical notation
h[n] = (g ⊗ i)[n] (filter g applied to i)
Dealing with boundaries

(Figure: an example image and filter, with the border rows of the image replicated outside the image boundary)

If parts of the filter need to operate on image values outside the image boundary then what do we do?
−→ There are two popular ways to deal with this problem (for pixels outside the image boundary):
• Simply use 0 for the value of the image.
• Copy the image value from the nearest pixel in the image.
−→ The choice of how we deal with this problem will depend on the scenario.
Filtering
Filtering of SSV functions

Function (f): 232 86 104 49 135 143 40 232    Filter (g): 33 134 69 165

−→ The function f is shifted so that the target index n = 3 is aligned under the origin (anchor) = 0 of the filter.
−→ The multiply-add operation is applied between the filter g and function f to get the result h at the target index n.

Mathematical notation

h[n] = (g ⊗ f)[n] = Σ(k=−1 to 2) g[k] f[k+n] = Σ(k=−∞ to ∞) g[k] f[k+n]  (filter g applied to f)
Correlation and convolution (SSV functions)
−→ The shift-multiply-add operation (used for filtering) is called correlation, denoted by ⊗.
−→ The filtering result h[n] is obtained by computing the correlation of the filter g[n] with the function f[n]:

h[n] = (g ⊗ f)[n] = Σ(k=−∞ to ∞) g[k] f[k+n]

−→ If we flip the filter before the shift-multiply-add operation then we get a new result:

h_new[n] = (g_flipped ⊗ f)[n] = Σ(k=−∞ to ∞) g_flipped[k] f[k+n]
         = Σ(k=−∞ to ∞) g[−k] f[k+n] = Σ(k=−∞ to ∞) g[k] f[n−k] = (g ∗ f)[n]

−→ (g_flipped ⊗ f)[n] = (g ∗ f)[n] ⇒ (g ⊗ f)[n] = (g_flipped ∗ f)[n].
−→ We can apply the filter g to the function f using convolution instead of correlation, but convolution and correlation are slightly different operations.
Filtering of SMV functions

(Figure: an example 2D function f and a small 2D filter g)

−→ The function f is shifted so that the target index n = [1 5]^T is aligned under the origin (anchor) = [0 0]^T of the filter.
−→ The multiply-add operation is applied between the filter g and function f to get the result h at the target index n.

Mathematical notation
Let V = dom(g), and let R ⊂ V be the active region of the filter: k ∉ R ⇒ g[k] = 0.

h[n] = (g ⊗ f)[n] = Σ(k∈R) g[k] f[k+n] = Σ(k∈V) g[k] f[k+n]  (filter g applied to f)
Correlation and convolution (SMV functions)
−→ The shift-multiply-add operation (used for filtering) is called correlation, denoted by ⊗.
−→ The filtering result h[n] is obtained by computing the correlation of the filter g[n] with the function f[n]:

h[n] = (g ⊗ f)[n] = Σ(k∈V) g[k] f[k+n]

−→ If we flip the filter before the shift-multiply-add operation then we get a new result:

h_new[n] = (g_flipped ⊗ f)[n] = Σ(k∈V) g_flipped[k] f[k+n]
         = Σ(k∈V) g[−k] f[k+n] = Σ(k∈V) g[k] f[n−k] = (g ∗ f)[n]

where V = dom(g).
−→ (g_flipped ⊗ f)[n] = (g ∗ f)[n] ⇒ (g ⊗ f)[n] = (g_flipped ∗ f)[n].
−→ We can do filtering of SMV functions using convolution instead of correlation, but SMV convolution and SMV correlation are slightly different operations.
Filtering grayscale images
Example

    159 195  93  95 113 144 125 100
    156 105 112 246 170   9 151 242
     64  30 196 192 189 235 135 189
     81 219   2  70 113 115 133 214

The image i is to be processed so that the value at each pixel n = [n1, n2]^T is replaced with the average value of its eight neighbors.
(a) What filter should be used for doing the filtering through correlation?
(b) Write down a mathematical expression for the filtering operation in the form of a 2-variable summation.

Solution
(a) The filter will be a 3×3 square with a 0 in the center (origin) and each other value equal to 1/8.
(b) Let R ⊂ dom(g) be the active region of the filter: k ∉ R ⇒ g[k] = 0.
Let h[n] be the processed image:

h[n] = (g ⊗ i)[n] = Σ(k∈R) g[k] i[k+n] = Σ(k1=−1 to 1) Σ(k2=−1 to 1) g[k1, k2] i[n1+k1, n2+k2]
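A sketch of this averaging filter using SciPy's correlate2d (zero padding is assumed at the boundaries):

import numpy as np
from scipy.signal import correlate2d

i = np.array([[159, 195,  93,  95, 113, 144, 125, 100],
              [156, 105, 112, 246, 170,   9, 151, 242],
              [ 64,  30, 196, 192, 189, 235, 135, 189],
              [ 81, 219,   2,  70, 113, 115, 133, 214]], dtype=float)

g = np.full((3, 3), 1 / 8)    # 3x3 filter: 1/8 everywhere ...
g[1, 1] = 0                   # ... except a 0 at the center (origin)

h = correlate2d(i, g, mode='same', boundary='fill', fillvalue=0)
print(np.round(h, 1))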
Filtering grayscale images
Example

    215 122 192  20  24 252 119 238
     80  62  88  64  61 155 173  53
    217 254  77 209  74 210  39 243
    242 212 133 229 145 174 187  37

The image i[n], n = [n1, n2]^T, is to be processed so that it shifts 1 pixel to the right and 2 pixels upward.
(a) What filter g[n] should be used for doing the filtering through convolution?
(b) Write down a mathematical expression for the filtering operation in the form of a 2-variable summation.
(c) Compute the result h[n] (4×8 image) of part (b).

Solution
Try this yourself!
Why use convolution instead of correlation?
−→ Convolution has some nice properties which are not present in exactly the same manner in correlation.

Properties of convolution (they hold for both SSV and SMV functions)

Commutative Property: (f ∗ g)[n] = (g ∗ f)[n]
Associative Property: (f ∗ g ∗ h)[n] = ((f ∗ g) ∗ h)[n] = (f ∗ (g ∗ h))[n]
Homogeneous Property: (f ∗ kg)[n] = k(f ∗ g)[n], k ∈ R
Additive Property: (f ∗ (g + h))[n] = (f ∗ g)[n] + (f ∗ h)[n]
Homogeneous + Additive ⇒ Linear: (f ∗ (ag + bh))[n] = a(f ∗ g)[n] + b(f ∗ h)[n], a, b ∈ R
Exercises
1. On slide 7 generate the 4×8 image h[n] by doing the following operations:
(a) h[n] = (g ⊗ i)[n]
(b) h[n] = (g_flipped ∗ i)[n],
where g is the x1 partial derivative filter.

2. On slide 10 prove that

Σ(k=−∞ to ∞) g[−k] f[k+n] = Σ(k=−∞ to ∞) g[k] f[n−k] = (g ∗ f)[n]

3. On slide 12 prove that

Σ(k∈V) g[−k] f[k+n] = Σ(k∈V) g[k] f[n−k],

where V = dom(g).

4. Revisit the problem on slide 14 and compute the 4×8 image (g ⊗ i)[n].
Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering

UET (Lahore)

Lecture 5
Processing discrete signals and images

Representations of discrete signals

SSV functions: a stem plot, an array (or matrix), or a normalized image, e.g.

    35 126 200 101 41 219 108 4

SMV functions: a stem plot, an array (or matrix), or a normalized image, e.g.

    250  35 126 101  41 219 108  4
    143  78  88 234  74 154  27 50
    219 123 230 105 171  55 107 81
Filtering of discrete signals

Filtering SSV signals

Function f[n]: 35 126 200 101 41 219 108 4
Filter g[n]: −4 1 1 1 (anchored so that g spans indices −1 to 2)
Flipped filter g_flipped[n] = g[−n]: 1 1 1 −4

Result h[n] = (g ⊗ f)[n] = (g_flipped ∗ f)[n]: 361 287 −162 −439 −36 167 −764 −428

−→ Correlation and convolution can be interchanged if the first function (filter) is flipped.

Filtering SSV signals with a symmetric filter

Function f[n]: 35 126 200 101 41 219 108 4
Filter g[n] = g_flipped[n] = g[−n]: 1 −1 2 −1 1

Result h[n] = (g ⊗ f)[n] = (g_flipped ∗ f)[n] = (g ∗ f)[n]: 144 118 249 306 70 394 34 119

−→ Correlation and convolution are equivalent when the first function (filter) is symmetric.
Filtering SMV signals

Function f[n]:

    250  35 126 101  41 219 108  4
    143  78  88 234  74 154  27 50
    219 123 230 105 171  55 107 81

Filter g[n]:          Flipped filter g_flipped[n] = g[−n]:

    -1 1 1                1 2 -1
    -1 2 1                1 1 -1

Result h[n] = (g ⊗ f)[n] = (g_flipped ∗ f)[n]:

     535  -54  318  117  200  505    1 -100
     649   12  524  470  227  547 -157  -31
     782  280  686  371  286  153  163   78

−→ Correlation and convolution can be interchanged if the first function (filter) is flipped.

Filtering SMV signals with a symmetric filter

Function f[n]: (same as above)

Filter g[n] = g_flipped[n] = g[−n]:

    -1  2 -1
     1  3  1
    -1  2 -1

Result h[n] = (g ⊗ f)[n] = (g_flipped ∗ f)[n] = (g ∗ f)[n]:

     993  406  378  776  203 1013  397  193
    1287  -44  924  708  554  684  356  132
     988  743  782 1022  433  650  307  423

−→ Correlation and convolution are equivalent when the filter is symmetric.


Smoothing a function

Smoothing an SSV function
Simple smoothing

(Figure: original function f[n] and the smoothing result h[n] = (g ∗ f)[n])

Smoothing filter: g[n] = (1/3) [1 1 1]. Why are we doing (g ∗ f) instead of (g ⊗ f)?

−→ The value at each pixel is replaced with the average of itself and its immediate (two) neighbors, in order to smooth the profile of the function:

h[n] = (f[n−1] + f[n] + f[n+1]) / 3

−→ Advantage of smoothing: the noise (random change) in the function is reduced.
−→ Disadvantage of smoothing (the blurring effect): the function gets blurred. Sharp changes (details) are lost.

Weighted smoothing
−→ Give more weight to the target pixel n to reduce blurring effects.
Smoothing filter: g[n] = (1/4) [1 2 1]

Gaussian smoothing
−→ Give more weight to the target pixel n using a normalized Gaussian distribution.
Smoothing filter: g[n] = gauss5[n] = [0.0545 0.2442 0.4026 0.2442 0.0545]
Smoothing an SMV function
Simple smoothing
Smoothing filter:

    g[n] = (1/9) [1 1 1]
                 [1 1 1]
                 [1 1 1]

−→ Advantage of smoothing: the noise (random change) in the function is reduced.
−→ Disadvantage of smoothing: the function gets blurred. Sharp changes (details) are lost.

Weighted smoothing
−→ Give more weight to the target pixel n to reduce blurring effects.
Smoothing filter:

    g[n] = (1/16) [1 2 1]
                  [2 4 2]
                  [1 2 1]

Gaussian smoothing
−→ Give more weight to the target pixel n using a Gaussian distribution.
Smoothing filter:

    g[n] = gauss5x5[n] = [0.0030 0.0133 0.0219 0.0133 0.0030]
                         [0.0133 0.0596 0.0983 0.0596 0.0133]
                         [0.0219 0.0983 0.1621 0.0983 0.0219]
                         [0.0133 0.0596 0.0983 0.0596 0.0133]
                         [0.0030 0.0133 0.0219 0.0133 0.0030]
Image sharpening
What does blurring take away?

(Figure: original, smoothed, and 5× the difference = 5× the detail lost)

−→ Blurring removes the edges (or fine details) from the image.

Let's add it back!

(Figure: original, 5× the detail lost, and their sum)

−→ Adding back the lost details to the original image will sharpen it!
−→ For sharpening an image f we can compute

h = f + α(f − g ∗ f),

where g is a Gaussian filter and α is a multiplier, which is 5 in this case.
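A sketch of this sharpening step (unsharp masking), assuming the Gaussian blur comes from SciPy's gaussian_filter; α and σ are illustrative:

import numpy as np
from scipy.ndimage import gaussian_filter

def sharpen(f, alpha=5.0, sigma=1.5):
    # h = f + alpha (f - g*f): add back the detail removed by Gaussian blurring
    f = f.astype(float)
    blurred = gaussian_filter(f, sigma=sigma)       # g * f
    h = f + alpha * (f - blurred)                   # boost the lost detail
    return np.clip(h, 0, 255).astype(np.uint8)      # keep a valid 8-bit range

f = np.random.randint(0, 256, (64, 64)).astype(np.uint8)   # placeholder image
sharp = sharpen(f)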
Derivative and gradient in SSV functions

−→ ∇f(x) = (df(x)/dx) x̂, where x̂ is the unit vector in the x direction.
−→ The derivative is a part of the gradient.
−→ From the samples, ∇f[n] ≈ [(f[n+1] − f[n−1]) / 2].

Domain point n | ∇f[n]     | ∥∇f[n]∥ | dir(∇f[n])
0              | [0]       | 0       | any
5              | [−0.3827] | 0.3827  | [−1]
10             | [0.3827]  | 0.3827  | [1]
Edge Detection
SSV function

(Figures: the original function as a stem plot and as an image, each with its normalized gradient magnitude)

−→ The edge occurs where ∥∇f∥ is high.
Why are there two indices for the edge in the stem plot? Why is the edge two pixels thick?

S2V function (which represents an image)

(Figures: the original image and its normalized gradient magnitude, as stem plots, smooth interpolations and images)

−→ The edge occurs where ∥∇f∥ is high.
Why is the edge two pixels thick?


Edge detection through filtering

(Figures: original f[n] or f(x); the x1 derivative detects horizontal edges; the x2 derivative detects vertical edges; the magnitude (norm) of the gradient detects both)

Edge detection of noisy images

(Figures: original with noise added, and its normalized gradient magnitude)

−→ The edges and the noise both occur where ∥∇f∥ is high.
−→ The image is first smoothed a little before calculating ∥∇f∥, to suppress the noise.
Sobel edge detector
−→ Uses a weighted smoothing filter before calculating the magnitude of the gradient.

x1 derivative:

∂f(x)/∂x1 ≈ (1/2)[−1 0 1]^T ⊗ (1/4)[1 2 1] ⊗ f[n]
          = (1/2)[1 0 −1]^T ∗ (1/4)[1 2 1] ∗ f[n]

          = (1/8) [ 1  2  1] ∗ f[n]
                  [ 0  0  0]
                  [−1 −2 −1]

x2 derivative:

∂f(x)/∂x2 ≈ (1/2)[−1 0 1] ⊗ (1/4)[1 2 1]^T ⊗ f[n]
          = (1/2)[1 0 −1] ∗ (1/4)[1 2 1]^T ∗ f[n]

          = (1/8) [1  0  −1] ∗ f[n]
                  [2  0  −2]
                  [1  0  −1]

∥∇f∥ = √( (∂f(x)/∂x1)² + (∂f(x)/∂x2)² )
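A sketch of the Sobel detector with SciPy (zero padding at the boundary, one of the two options discussed earlier):

import numpy as np
from scipy.signal import convolve2d

def sobel_magnitude(f):
    # estimate ∥∇f∥ with the two 3x3 Sobel kernels (smoothing + central difference)
    kx1 = np.array([[ 1,  2,  1],
                    [ 0,  0,  0],
                    [-1, -2, -1]]) / 8.0           # x1 derivative kernel
    kx2 = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]]) / 8.0             # x2 derivative kernel
    f = f.astype(float)
    fx1 = convolve2d(f, kx1, mode='same', boundary='fill', fillvalue=0)
    fx2 = convolve2d(f, kx2, mode='same', boundary='fill', fillvalue=0)
    return np.sqrt(fx1**2 + fx2**2)

f = np.random.randint(0, 256, (64, 64))            # placeholder image
edges = sobel_magnitude(f)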
Output of the Sobel edge detector

(Figures: original with noise added, and the output of the Sobel edge detector)


Derivative of Gaussian (DoG) edge detector
−→ Uses a Gaussian smoothing filter before calculating the magnitude of the gradient.

x1 derivative:

∂f(x)/∂x1 ≈ (1/2)[−1 0 1]^T ⊗ gauss5x5[n] ⊗ f[n]
          = (1/2)[1 0 −1]^T ∗ gauss5x5[n] ∗ f[n]

                     [  52  232  382  232   52]
                     [  95  425  701  425   95]
          = (10⁻⁴) × [   0    0    0    0    0] ∗ f[n]
                     [ −95 −425 −701 −425  −95]
                     [ −52 −232 −382 −232  −52]

x2 derivative:

∂f(x)/∂x2 ≈ (1/2)[−1 0 1] ⊗ gauss5x5[n] ⊗ f[n]
          = (1/2)[1 0 −1] ∗ gauss5x5[n] ∗ f[n]

                     [ 52  95  0  −95  −52]
                     [232 425  0 −425 −232]
          = (10⁻⁴) × [382 701  0 −701 −382] ∗ f[n]
                     [232 425  0 −425 −232]
                     [ 52  95  0  −95  −52]

∥∇f∥ = √( (∂f(x)/∂x1)² + (∂f(x)/∂x2)² )

Output of the DoG detector using a 5×5 Gaussian kernel gauss5x5[n]

(Figures: original with noise added, and the output of the DoG filter using a 5×5 Gaussian kernel)
Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering

UET (Lahore)

Lecture 6
Gaussian functions

SSV Gaussian:

gauss[n] = (1 / (σ√(2π))) e^(−(n−µ)² / (2σ²)), n ∈ R

SMV Gaussian:

gauss[n] = (1 / √((2π)^k |Σ|)) e^(−½ (n−µ)^T Σ⁻¹ (n−µ)), n ∈ R^k

−→ In this course we will be taking Σ = σ²I, where I is the k × k identity matrix, so that Σ⁻¹ = σ⁻²I.

S2V Gaussian:

gauss[n] = (1 / √((2π)² |Σ|)) e^(−½ (n−µ)^T Σ⁻¹ (n−µ)), n ∈ R²
Revisit the DoG filter

(Figures: original, and the output of the DoG filter using a 5 × 5 Gaussian filter)


−→ There are still some problems with the results.
Problem of thick edges
−→ Usually a number of pixels around the edge have a high value for ∥∇f∥ and show up as edges.
−→ We use a technique called non maxima suppression to make the edges one pixel thick.

Non maxima suppression

−→ ∥∇f∥ starts increasing as we move towards the edge at right angles (direction of +∇f or −∇f).
−→ The true edge is at the pixel where ∥∇f∥ is maximum. Eliminate the neighbors of this pixel (the non maxima).
Quantized gradient direction

(Figures: the neighboring pixels corresponding to the 8 quantized directions, and the nearest neighbor in the direction of ∇f; the grey levels indicate the magnitude of the gradient)

−→ The gradient at any pixel n is classified into one of 8 possible directions.
−→ The similarity of the gradient at pixel n is computed with the unit vector in each of the 8 directions:

Similarity(∇f[n], v̂k) = (∇f · v̂k) / (∥∇f∥ ∥v̂k∥) = (∇f · v̂k) / ∥∇f∥,

where v̂k is the unit vector in the direction k.

−→ The direction with the highest similarity is the quantized gradient direction:

dir(∇f[n]) = argmax over k of Similarity(∇f[n], v̂k)
Pseudocode for non maxima suppression

for (each pixel n in the image f) {
    k = dir(∇f[n]), k ∈ Z : 1 ≤ k ≤ 8
    n0 = neighbor of n in the direction k
    if (∥∇f[n0]∥ > ∥∇f[n]∥) {
        separately mark n as a non maximum;
    }
    k = dir(−∇f[n]), k ∈ Z : 1 ≤ k ≤ 8
    n0 = neighbor of n in the direction k
    if (∥∇f[n0]∥ > ∥∇f[n]∥) {
        separately mark n as a non maximum;
    }
}
for (each pixel n in the image f) {
    if (n is a non maximum) {
        ∥∇f[n]∥ = 0;
    }
}
Results

(Figures: original f[n]; DoG filter of size 5×5 applied to f[n] to estimate ∇f; non maxima suppression applied after the DoG filter)
Problem of weak edges
−→ Noise within the image filtered by a DoG filter usually shows up as weak edges.
−→ We use a technique called hysteresis thresholding to remove such weak edges.

Hysteresis thresholding
−→ Edge pixels where the intensity of ∥∇f∥ is below a certain percentile Plow are eliminated.
−→ Edge pixels where the intensity of ∥∇f∥ is above a certain percentile Phigh are marked as strong edges.
−→ Edge pixels where the intensity of ∥∇f∥ is between Plow and Phigh are marked as weak edges.
−→ If a weak edge has at least K neighbors which are strong edges, then mark that weak edge as a strong edge.
−→ Ultimately no weak edge should have K or more neighbors which are strong edges.
Pseudocode for hysteresis thresholding

for (each pixel n in the image f) {
    classify n as a strong, weak or non-existent edge using its percentile position based on ∥∇f∥;
}
done = false;
while (!done) {
    done = true;
    for (each pixel n in the image f) {
        if (pixel n is a weak edge) {
            if (pixel n has K or more strong edge neighbors) {
                pixel n is now a strong edge;
                done = false;
            }
        }
    }
}
for (each pixel n in the image f) {
    if (pixel n is a strong edge) {∥∇f[n]∥ = 255;} else {∥∇f[n]∥ = 0;}
}

−→ Note that the 'while' loop in this algorithm can also be implemented recursively.
Results

(Figures: original f[n]; DoG filter of size 5×5 applied to f[n] to estimate ∇f; non maxima suppression of ∥∇f∥ applied after the DoG filter; hysteresis thresholding applied after non maxima suppression, with Plow = 40, Phigh = 80, K = 1)
Canny edge detector
−→ Detect edges using a DoG filter.
−→ Make edges one pixel thick using non maxima suppression.
−→ Use hysteresis thresholding to eliminate weak edges.
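In practice this whole pipeline is available off the shelf; a sketch using OpenCV's implementation (note that its two thresholds are absolute gradient values, not the percentiles used on the previous slides, and 'frame.png' is a placeholder file name):

import cv2

img = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)

# cv2.Canny performs smoothing + gradient estimation, non maxima suppression
# and hysteresis thresholding with the given low/high thresholds
edges = cv2.Canny(img, threshold1=100, threshold2=200)

cv2.imwrite('edges.png', edges)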
Background mathematics for future work

Null space (kernel) of a matrix
Consider a linear transformation matrix A ∈ R^(k×d):
−→ dom(A) = R^d
−→ target(A) = R^k
−→ ker(A) = V ⊂ dom(A) = R^d : x ∈ V ⇒ Ax = 0 ∈ R^k
−→ ker(A) is the set of vectors x such that Ax = 0 ∈ R^k.
−→ x = 0 ∈ R^d ⇒ x ∈ ker(A)

Computing the kernel of a matrix A ∈ R^(k×d)

1. Compute rref(A).
2. Take the free columns (columns a, b, c, ...) of rref(A) to get a matrix free(A) = [F(p×q); 0((k−p)×q)].
3. Create the matrix B by inserting the q × q identity matrix into the corresponding rows (rows a, b, c, ...) of −F(p×q).
−→ ker(A) = span{col(B)}
−→ Any linear combination of the columns of B will produce a member of ker(A).
Example
Compute the kernel of A = [1 2 3 4 5; 6 7 8 9 0]

Solution
A ∈ R^(2×5)
−→ dom(A) = R^5
−→ target(A) = R^2
−→ ker(A) = V ⊂ dom(A) = R^5 : x ∈ V ⇒ Ax = 0

rref(A) = [1 0 −1 −2 −7]   ⇒   free(A) = [−1 −2 −7] = [F(2×3)]
          [0 1  2  3  6]                 [ 2  3  6]   [0(0×3)]

          [ 1  2  7]
          [−2 −3 −6]
    ⇒ B = [ 1  0  0]   ⇒   ker(A) = span{col(B)} = span{ [1 −2 1 0 0]^T, [2 −3 0 1 0]^T, [7 −6 0 0 1]^T }
          [ 0  1  0]
          [ 0  0  1]
Eigenspace, eigenvectors and eigenvalues
Consider a (self mapping) linear transformation matrix A ∈ R^(d×d):
−→ dom(A) = R^d
−→ target(A) = R^d
−→ eig(A) = V ⊂ dom(A) = R^d : x ∈ V ⇒ Ax = λx, λ a constant.
−→ eig(A) is the set of vectors x such that Ax = λx, where λ is a constant.
x is called an eigenvector of A with eigenvalue λ.
−→ x = 0 ∈ R^d ⇒ x ∈ eig(A), ∀λ ∈ C
−→ λ = 0 ⇒ Ax = 0 ∈ R^d ⇒ x ∈ ker(A)

Computing the eigenvalues of a matrix

If A ∈ R^(d×d) and x ∈ eig(A) ⇒ Ax = λx
⇒ (Ax − λx) = 0
⇒ (A − λI)x = 0, I = I(d×d)
If (A − λI) is invertible then x = (A − λI)⁻¹ 0 = 0.
If x ≠ 0 then (A − λI) is non-invertible
⇒ det(A − λI) = 0
Example
Compute the eigenvalues of the following matrix:

    A = [3 −2 5]
        [1  0 7]
        [0  0 2]

Solution
−→ Solve det(A − λI) = 0, I = I(3×3):

    det(A − λI) = det [3−λ  −2   5 ]
                      [1    −λ   7 ]
                      [0     0  2−λ]
                = −λ³ + 5λ² − 8λ + 4 = 0
⇒ λ = 1, 2, 2
⇒ The eigenvalues are 1, 2 and 2.
Computing the eigenvectors of a matrix
−→ Eigenvectors are computed using the eigenvalues.
If x ∈ eig(A) with eigenvalue λ
⇒ Ax = λx
⇒ (A − λI)x = 0, I = I(d×d)
⇒ x ∈ ker(A − λI)
−→ Calculate ker(A − λI) for each unique eigenvalue λ to get the eigenvectors of A.

Example
Compute the eigenvectors of the following matrix:

    A = [3 −2 5]
        [1  0 7]
        [0  0 2]

Solution
−→ We have already computed the eigenvalues of A as 1, 2 and 2.

λ = 1: need to compute ker(A − I).

    (A − I) = [2 −2 5]   ⇒   rref(A − I) = [1 −1 0]
              [1 −1 7]                     [0  0 1]
              [0  0 1]                     [0  0 0]

    free(A − I) = [−1; 0] = [F(2×1); 0(1×1)] ⇒ B = [1; 1; 0]
    ⇒ ker(A − I) = span{col(B)} = span{ [1 1 0]^T }

λ = 2: need to compute ker(A − 2I).

    (A − 2I) = [1 −2 5]   ⇒   rref(A − 2I) = [1 −2 0]
               [1 −2 7]                      [0  0 1]
               [0  0 0]                      [0  0 0]

    free(A − 2I) = [−2; 0] = [F(2×1); 0(1×1)] ⇒ B = [2; 1; 0]
    ⇒ ker(A − 2I) = span{col(B)} = span{ [2 1 0]^T }
Results

Matrix A = [3 −2 5; 1 0 7; 0 0 2]

Eigenvalue | Algebraic multiplicity | Eigenvector(s)  | Geometric multiplicity
λ1 = 1     | 1                      | x1 = [1 1 0]^T  | 1
λ2 = 2     | 2                      | x2 = [2 1 0]^T  | 1

−→ We can check the results:
Ax1 = λ1 x1
Ax2 = λ2 x2

−→ det(A) = product of the eigenvalues of A
−→ Tr(A) = sum of the diagonal values of A = sum of the eigenvalues of A
−→ Tr(A) is called the trace of A
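These computations can also be checked numerically; a small NumPy sketch:

import numpy as np

A = np.array([[3, -2, 5],
              [1,  0, 7],
              [0,  0, 2]], dtype=float)

eigenvalues, eigenvectors = np.linalg.eig(A)   # the columns of `eigenvectors` are eigenvectors
print(eigenvalues)                             # 1, 2, 2 in some order

# check Ax = λx for each pair, and the determinant/trace identities
for lam, x in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ x, lam * x)
assert np.isclose(np.linalg.det(A), np.prod(eigenvalues))
assert np.isclose(np.trace(A), np.sum(eigenvalues))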
Exercises
1. Compute the eigenvalues and eigenvectors of the matrix A given below:

    A = [1 0 0]
        [0 1 0]
        [0 0 0]

(a) What is the algebraic multiplicity of each eigenvalue?
(b) What is the geometric multiplicity of each eigenvalue?
(c) What are the eigenspaces of A?
Remember that ker(A − λI) is an eigenspace of A if λ is an eigenvalue of A.

2. Consider the following image f[n]. Will the pixel n0 = [2 3]^T survive after non maxima suppression of the image gradient's magnitude? Explain your answer.

    10 16 20 180 250 251 250 244
     4  8 10 170 182 248 246 240
     3  4 10  20 164 200 242 251
     1  2 16  24  31 190 250 120
     2  9 10  22  18  23  16  34

3. Hysteresis thresholding is to be applied to the following matrix:

    10 20 250 170  50  10
     8 16 180 248  40   6
     4 18 100 200 250  20
     2  8  18 144 200 172

(a) Using a low percentile of 40 (Plow = 40) and a high percentile of 80 (Phigh = 80), classify each pixel as 'strong', 'weak' or 'eliminated'.
(b) By applying hysteresis thresholding to the result of part (a) using K = 1 neighbor, reclassify each pixel as 'strong' or 'eliminated'.
(c) How does your result in part (b) change if the value of K is changed from 1 to 3?
Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering

UET (Lahore)

Lecture 7
Estimating local structure within an image

−→ We want to understand the local structure of an image around a given pixel n (shown in red).
−→ We make a small patch (of, let's say, 5 × 5 pixels) centered around n, and examine the image.
−→ We then move the patch a little bit away from n in a certain direction d and examine the image.
−→ If the new patch looks similar (shown in blue) we say that the structure is similar in the direction d.
−→ If the new patch looks different (shown in green) we say that the structure is different in the direction d.
−→ Why must the patch be small in size?
Three important questions

−→ Given a pixel n (red) and a direction d (arrow) in an image f[n], how can we quantitatively measure the change in structure of f as we move in the direction d, from n to n + d?
−→ If we are standing at n, in which direction d* should we move so that the structure of f changes the most rapidly?
−→ If we are standing at n, in which direction d# should we move so that the structure of f changes the most slowly?
−→ To answer these questions we need to compute the structure tensor at pixel n.
Tensor
−→ The 'simplest' definition of a tensor is that it is a multidimensional array of numbers.
−→ The number of indices needed to fully describe an element within the tensor is called the rank of the tensor.
−→ A simple one dimensional array of numbers = vector = tensor of rank 1
−→ A simple two dimensional array of numbers = matrix = tensor of rank 2
−→ A simple N dimensional array of numbers = tensor of rank N

What kind of a tensor is a colored image?

−→ A colored image is 'basically' a simple 3 dimensional array of numbers (the R, G and B frames).
−→ We need 3 indices to fully describe an element (pixel) within the tensor (image).
−→ Colored images are tensors of rank 3.
Structure tensor
Let x represent the x1 direction and let y represent the x2 direction.
−→ The structure tensor S of image f[n] at pixel n is a 2 × 2 matrix given by

    S[n] = [(w ⊗ fx²)[n]   (w ⊗ fxfy)[n]]  =  [(w ∗ fx²)[n]   (w ∗ fxfy)[n]]   if w is symmetric
           [(w ⊗ fxfy)[n]  (w ⊗ fy²)[n] ]     [(w ∗ fxfy)[n]  (w ∗ fy²)[n] ]

−→ fx = image containing the partial derivative of f in the x direction (DoG filter)
−→ fy = image containing the partial derivative of f in the y direction (DoG filter)
−→ w[n] = patch weights (usually taken from a Gaussian distribution). The target pixel n is given the highest weight, i.e. 0.1621. For a 5 × 5 patch the Gaussian weights would be

    w[n] = [0.0030 0.0133 0.0219 0.0133 0.0030]
           [0.0133 0.0596 0.0983 0.0596 0.0133]
           [0.0219 0.0983 0.1621 0.0983 0.0219]
           [0.0133 0.0596 0.0983 0.0596 0.0133]
           [0.0030 0.0133 0.0219 0.0133 0.0030]

−→ Note: there are two Gaussian distributions used in this process. The one in the DoG filter has a standard deviation of σ and the one used in w[n] has a standard deviation of ρ.
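A sketch of this computation with SciPy (Sobel derivatives stand in for the DoG derivatives, and ρ = 2 is illustrative):

import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def structure_tensor(f, rho=2.0):
    # S as a field: smooth the products fx*fx, fx*fy, fy*fy with Gaussian weights w
    f = f.astype(float)
    fx = sobel(f, axis=0) / 8.0      # derivative along x1 (a stand-in for the DoG filter)
    fy = sobel(f, axis=1) / 8.0      # derivative along x2
    Sxx = gaussian_filter(fx * fx, rho)
    Sxy = gaussian_filter(fx * fy, rho)
    Syy = gaussian_filter(fy * fy, rho)
    return Sxx, Sxy, Syy

f = np.zeros((6, 8)); f[:2, :] = 255; f[2:, 6:] = 255    # the example image below
Sxx, Sxy, Syy = structure_tensor(f)
n = (2, 3)
S = np.array([[Sxx[n], Sxy[n]],
              [Sxy[n], Syy[n]]])
lam_min, lam_max = np.linalg.eigvalsh(S)                 # λmin, λmax of S[n]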
Example
For the following image calculate the structure tensor at pixel n = [2 3]^T, using a 5 × 5 patch with Gaussian weights.

    255 255 255 255 255 255 255 255
    255 255 255 255 255 255 255 255
      0   0   0   0   0   0 255 255
      0   0   0   0   0   0 255 255
      0   0   0   0   0   0 255 255
      0   0   0   0   0   0 255 255
Answer to the first question

Given a pixel n (red) and a direction d (arrow) in an image f[n], how can we quantitatively measure the change in structure of f as we move in the direction d, from n to n + d?
−→ The Structural Similarity Difference SSD in the direction d at pixel n is given by

SSD_d[n] = (distance between patches centered at n and n + d)² ≈ d^T S[n] d

−→ The structure tensor at pixel n is used.

Example
For the image from the previous example, calculate the change in structure at the pixel n = [2 3]^T in the direction d̂ = [−1 0]^T, using a 5 × 5 patch with Gaussian weights.
Answer to the second question

If we are standing at n, in which direction d* should we move so that the structure of f changes the most rapidly?
−→ The eigenvector corresponding to the larger eigenvalue λmax of the structure tensor at n gives the direction d* in which the structure changes the most rapidly. We usually take the unit vector d̂* as the direction.
−→ The structure tensor at pixel n is used.
−→ Prove that SSD_d̂*[n] = λmax.

Example
For the image from the previous example, calculate the direction at the pixel n = [2 3]^T in which the structure changes the most rapidly, using a 5 × 5 patch with Gaussian weights. Also give a numerical value for the change in structure.
Answer to the third question

If we are standing at n, in which direction d# should we move so that the structure of f changes the most slowly?
−→ The eigenvector corresponding to the smaller eigenvalue λmin of the structure tensor at n gives the direction d# in which the structure changes the most slowly. We usually take the unit vector d̂# as the direction.
−→ The structure tensor at pixel n is used.
−→ Prove that SSD_d̂#[n] = λmin.
−→ λmin = SSD_d̂#[n] ≤ SSD_d̂[n] ≤ SSD_d̂*[n] = λmax

Example
For the image from the previous example, calculate the direction at the pixel n = [2 3]^T in which the structure changes the most slowly, using a 5 × 5 patch with Gaussian weights. Also give a numerical value for the change in structure.
The importance of edges
What will an image without edges look like?

(Figure: a constant image f[n], and an image in which the red pixel marks a corner)

−→ No edges means no change in structure (a flat region): SSD_d[n] = 0, ∀d, n. Why?
−→ Edges define structure.
−→ A sharp change in direction while moving along an edge is called a corner.
−→ Corner sharpness is a measure of how 'quickly' the edge direction changes around a corner.
Flat, edge or corner?
−→ The structure tensor can help us in classifying a pixel as belonging to a flat region, an edge or a corner in the image.

Flat region: the maximum and minimum numerical changes in structure are almost equal to 0 ⇒ λmax ≈ λmin ≈ 0.
Edge: the maximum numerical change in structure is significant whereas the minimum change is almost 0 ⇒ λmax >> λmin ≈ 0.
Corner: the maximum and minimum numerical changes in structure are both significant ⇒ λmax > λmin >> 0.
Corner detectors
−→ If a particular pixel n is a corner in the image, then the eigenvalues λmax, λmin of S[n] will be such that λmax > λmin >> 0.
−→ |S| = λmin λmax ≈ 0 in flat regions and at edges, but is significant at corners.
−→ Tr(S) = λmin + λmax ≈ 0 in flat regions, is slightly significant at edges, but is very significant at corners.

Name of corner detector | Measure of corner sharpness                | Detection criterion
Simple corner detector  | λmin                                       | λmin > τ
Rohr corner detector    | |S|                                        | |S| > τ
Harris corner detector  | |S| / Tr(S) = λmin λmax / (λmin + λmax)    | |S| / Tr(S) > τ

*Note: |S| / Tr(S) = λmin λmax / (λmin + λmax) ≈ λmin at non-corners.

−→ The Rohr corner detector suggests an alternative measure to λmin, to bypass the computations needed to get λmin, whereas the Harris corner detector suggests a computationally less expensive alternative to λmin.
Results

(Figures:
Harris corners detected with σ = 0.2, ρ = 2 and τ = 90th percentile of Tr(S);
Harris corners detected with σ = 0.5, ρ = 2 and τ = 80th percentile of Tr(S);
Rohr corners detected with σ = 1, ρ = 6 and τ = 98th percentile of |S|;
Rohr corners detected with σ = 3, ρ = 6 and τ = 95th percentile of |S|)


Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering

UET (Lahore)

Lecture 9
Structure Tensor
What is necessary and what is supplementary?
−→ The necessary things are:
▶ Computing the structure tensor of an image f at index n.
▶ Computing the eigenvalues λmin, λmax and corresponding (unit) eigenvectors d̂#, d̂∗ of the structure
tensor.
▶ d̂# and d̂∗ give the respective directions for the minimum and maximum change in the structure
of the image f at pixel n.
▶ Comparing the eigenvalues of the structure tensor:
λmax ≈ λmin ≈ 0 ⇒ flat region
λmax >> λmin ≈ 0 ⇒ edge
λmax > λmin >> 0 ⇒ corner
−→ The rest is supplementary


Detecting straight lines in images

Parameters of a straight line


−→ The parameters of a straight line are (two) numbers which completely define the line. There are
different ways of specifying the parameters of a line.
Slope and intercept form (m − c form)

−→ The slope m and intercept c completely define the line


Image space and parameter space (m − c form)

−→ For a point (m0, c0) in the parameter space there is a line y = m0 x + c0 in the image space. They
correspond to each other.
−→ We now fix a point (xi, yi) on the line in the image space and allow the parameters m0 and c0 to be
variables, i.e. m and c. This gives a line in the parameter space, with slope −xi and vertical intercept
yi.
−→ For a point (xi, yi) in the image space there is a line c = −xi m + yi in the parameter space. They
correspond to each other.
−→ Multiple points in the image space will form multiple lines in the parameter space.
−→ Collinear points in the image space will form lines in the parameter space that intersect at a common
point (m0, c0).
−→ This point of intersection gives the parameters of the line connecting the original collinear points.
Detecting a single line using the slope intercept form

−→ Make a 2D array representing the parameter space and set each value to 0. This 2D array is called
the accumulator array, denoted here by A(m, c).
−→ For each pixel on the original line in the image space, draw the corresponding line in the
accumulator array (parameter space), giving a vote of +1 to each cell it passes through.
−→ The cell with the highest votes represents the parameters of the original line in the image space.
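A toy sketch of these three steps, assuming the edge pixels (xi, yi) are already extracted and the (m, c) grid is discretized by hand (both the ranges and the resolutions are arbitrary assumptions):

import numpy as np

def hough_mc(points, m_vals, c_vals):
    # Accumulator array A(m, c), initialized to 0.
    A = np.zeros((len(m_vals), len(c_vals)))
    for x, y in points:
        for i, m in enumerate(m_vals):
            c = -x * m + y                      # line c = -xi*m + yi in parameter space
            j = np.argmin(np.abs(c_vals - c))   # nearest c cell
            A[i, j] += 1                        # +1 vote
    # Cell with the highest votes = parameters of the original line.
    i, j = np.unravel_index(np.argmax(A), A.shape)
    return m_vals[i], c_vals[j]

# e.g. three points on y = 2x + 1:
print(hough_mc([(0, 1), (1, 3), (2, 5)],
               np.linspace(-5, 5, 101), np.linspace(-10, 10, 201)))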
Problem of range with slope intercept form
−→ Vertical lines in the image space correspond to a slope of ±∞ in the parameter space.
−→ These vertical lines cannot be marked (as a point) in the parameter space.
−→ Almost vertical lines in the image space correspond to a slope whose magnitude is very large in the
parameter space.
−→ Almost vertical lines cannot be marked (as a point) in the parameter space unless the accumulator
array is very large.
−→ The solution is an alternative parameterization of the line in image space, which is called the angle
distance form.
Angle distance form (θ − ρ form)

−→ The angle θ and distance ρ (as shown in the figure) completely define the line.
−→ The distance ρ is usually measured in number of pixels.
−→ Here we are taking the center of the image as the origin. Alternatively we could take the top
left corner of the image as the origin, and parameterize about the center xc = [xc yc ]T to get the
following equation in image space (for the same ρ and θ as shown in the figure).
(x − xc ) cos (θ) + (y − yc ) sin (θ) = ρ
Image space and parameter space

−→ For a point θ0 , ρ0 in the parameter space there is a line x cos (θ0 ) + y sin (θ0 ) = ρ0 in the image
space. They correspond to each other.
−→ No matter which line we draw in the image space, its corresponding point (θ0, ρ0) in the parameter
space will be such that 0 ≤ θ0 ≤ π and −ρmax ≤ ρ0 ≤ ρmax, where ρmax is half the diagonal length of
the image.
−→ The range of the accumulator array is therefore 0 ≤ θ ≤ π and −ρmax ≤ ρ ≤ ρmax.
−→ For a point (x0 , y0 ) in the image space there is a sinusoid ρ = x0 cos (θ) + y0 sin (θ) in the parameter
space. The point and sinusoid correspond to each other.
−→ All lines have an angle parameter that is between 0 and π if the radius parameter is allowed to be
negative.
−→ Multiple points in the image space will form multiple sinusoids in the parameter space.
−→ Collinear points in the image space will form sinusoids in the parameter space that intersect at a
common point (θ0, ρ0).
−→ This point of intersection gives the parameters of the original line in the image space.
Detecting a single line using the angle distance form (Hough transform)
−→ Make an accumulator array whose dimensions θ and ρ vary from 0 to π and from −ρmax to ρmax
respectively. Set each value in the accumulator array to 0.
−→ For each pixel on the original line in the image space, draw the corresponding sinusoid in the
accumulator array (parameter space), giving a vote of +1 to each index it passes through.
−→ The index with the highest votes represents the parameters of the line.
−→ We can now use this index as the parameters (θ0 , ρ0 ) to draw the original line in the image space.
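A sketch of the whole procedure on a binary edge image, with the image center as origin as above (the 1-pixel ρ bins and 1° θ bins are assumptions):

import numpy as np

def hough_lines(edges, n_theta=180):
    h, w = edges.shape
    rho_max = np.hypot(h, w) / 2                 # half the diagonal length
    n_rho = int(np.ceil(2 * rho_max))
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    rhos = np.linspace(-rho_max, rho_max, n_rho)
    A = np.zeros((n_theta, n_rho))
    yc, xc = h / 2.0, w / 2.0                    # center of the image as origin
    for yi, xi in zip(*np.nonzero(edges)):
        x, y = xi - xc, yi - yc
        # One sinusoid rho = x cos(theta) + y sin(theta) per edge pixel.
        r = x * np.cos(thetas) + y * np.sin(thetas)
        j = np.round((r + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        A[np.arange(n_theta), j] += 1            # +1 vote along the sinusoid
    i, j = np.unravel_index(np.argmax(A), A.shape)
    return thetas[i], rhos[j]                    # (theta0, rho0) of the strongest line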
Results
−→ This is a computationally expensive algorithm. Can we do better?
Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering

UET (Lahore)

Lecture 10
Detecting circles in images

Parameters of a circle
−→ The parameters of a circle are (three) numbers which completely define that circle. The most popular
parameters of a circle are its two center co-ordinates and its radius.
Center radius form (c − r form)

−→ The center c = [a b]T = (a, b) and radius r completely define the circle
Image space and parameter space (c − r form)

−→ For a point (a0, b0, r0) in the parameter space there is a circle r0² = (x − a0)² + (y − b0)² in the image
space. They correspond to each other.
−→ The circle r0² = (x − a0)² + (y − b0)² is said to be parameterized in the center-radius form (c − r
form).
−→ We now fix a point (xi, yi) on the circle in the image space and allow the parameters (a0, b0) to be
variables, i.e. (a, b). This gives a circle in the parameter space, with center (xi, yi) and radius r0.
−→ For a point (xi, yi) in the image space there is a circle r0² = (a − xi)² + (b − yi)² in the parameter
space. They correspond to each other.
−→ Multiple points in the image space will form multiple circles in the parameter space:
(xi, yi) ↦ r0² = (a − xi)² + (b − yi)², 1 ≤ i ≤ n
−→ Concyclic points (with known radius r0) in the image space will form circles in the parameter space
that intersect at a common point (a0, b0).
−→ This point of intersection gives the center of the circle connecting the original concyclic points.
Detecting a single circle using the center radius form
−→ Choose the radius r0 of the circle that is to be detected.
−→ Make an accumulator array A(a, b) representing the parameter space for the center of the circle and
set each value to 0.
−→ For each pixel (xi, yi) on the edges in the image space, draw the corresponding circle
r0² = (a − xi)² + (b − yi)² in the accumulator array (parameter space), giving a vote of +1 to each cell
it passes through.
−→ The cell with the highest votes represents the center (a0, b0) of the original circle in the image space.
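A minimal sketch of these steps for a known radius r0; rasterizing each circle with 360 angular samples, and restricting candidate centers to the image area, are both assumptions:

import numpy as np

def hough_circle(edges, r0):
    h, w = edges.shape
    A = np.zeros((w, h))                          # accumulator A(a, b), all zeros
    ts = np.linspace(0, 2 * np.pi, 360, endpoint=False)
    for yi, xi in zip(*np.nonzero(edges)):
        # Draw the circle r0^2 = (a - xi)^2 + (b - yi)^2 in parameter space.
        a = np.round(xi + r0 * np.cos(ts)).astype(int)
        b = np.round(yi + r0 * np.sin(ts)).astype(int)
        ok = (a >= 0) & (a < w) & (b >= 0) & (b < h)
        cells = np.unique(np.stack([a[ok], b[ok]], axis=1), axis=0)
        A[cells[:, 0], cells[:, 1]] += 1          # +1 vote per cell passed through
    a0, b0 = np.unravel_index(np.argmax(A), A.shape)
    return a0, b0                                 # center of the detected circle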

What should be the size of the accumulator array?
Results
−→ This is a computationally expensive algorithm. Can we do better?

   
−→ [a0 b0]T = [xi yi]T ± r0 ∇̂f(xi, yi), where ∇̂f(xi, yi) is the unit vector in the direction of the
gradient at (xi, yi).
−→ For each point (xi, yi) on the edges in the image space, mark two points on the corresponding circle
in the parameter space. These two points are given by the equation above.
−→ Concyclic points in the image space will vote for a common point (a0, b0) in the parameter space.
−→ This common point is the center of the circle in the image space.
−→ Instead of drawing whole circles in the parameter space we only mark two points on each circle
(where we 'expect' the maximum to occur).
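A sketch of this more efficient voting scheme; central-difference gradients stand in for whatever gradient operator is used (an assumption), and each edge pixel now casts at most two votes instead of rasterizing a whole circle:

import numpy as np

def fast_circle_votes(f, edges, r0):
    h, w = f.shape
    fy, fx = np.gradient(f.astype(float))
    A = np.zeros((w, h))
    for yi, xi in zip(*np.nonzero(edges)):
        g = np.array([fx[yi, xi], fy[yi, xi]])
        norm = np.linalg.norm(g)
        if norm == 0:
            continue                      # no gradient direction to follow
        g /= norm                         # unit gradient vector
        for s in (+1, -1):                # (a0, b0) = (xi, yi) ± r0 * g_hat
            a = int(round(xi + s * r0 * g[0]))
            b = int(round(yi + s * r0 * g[1]))
            if 0 <= a < w and 0 <= b < h:
                A[a, b] += 1
    return A                              # the peak marks the circle center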
−→ Can we make the line detection algorithm using the Hough transform more efficient in a similar
way?
−→ For each point (xi, yi) on the edges in the image space, mark one point on the corresponding sinusoid
in the parameter space. This point is (θ0, ρ0), where
θa = ∠∇f(xi, yi), θb = ∠∇f(xi, yi) − π, both wrapped so that 0 ≤ θa ≤ 2π and 0 ≤ θb ≤ 2π,
θ0 = min(θa, θb),
ρ0 = xi cos(θ0) + yi sin(θ0).
−→ Collinear points in the image space will vote for a common point (θ0, ρ0) in the parameter space.
−→ This common point contains the parameters of the line in the image space.
−→ Instead of drawing whole sinusoids in the parameter space we only mark one point on that sinusoid
(where we 'expect' the maximum to occur)
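A matching sketch for lines: one (θ0, ρ0) vote per edge pixel, computed as in the equations above (arctan2-based gradient angles wrapped into [0, 2π) are an implementation assumption):

import numpy as np

def fast_line_votes(f, edges):
    h, w = f.shape
    fy, fx = np.gradient(f.astype(float))
    yc, xc = h / 2.0, w / 2.0             # center of the image as origin
    votes = []
    for yi, xi in zip(*np.nonzero(edges)):
        theta_a = np.arctan2(fy[yi, xi], fx[yi, xi]) % (2 * np.pi)
        theta_b = (theta_a - np.pi) % (2 * np.pi)
        theta0 = min(theta_a, theta_b)    # the representative angle in [0, pi)
        rho0 = (xi - xc) * np.cos(theta0) + (yi - yc) * np.sin(theta0)
        votes.append((theta0, rho0))      # bin into the accumulator as before
    return votes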
Generalized Hough Transform
−→ The computer is first trained to recognize
the given object by learning the ϕ table of
the object.
Training the computer (learning the ϕ table)
1. The interval 0 to 2π is divided into n parts (to represent the different possible edge directions):
ϕk = k(2π/n), k = 0, 1, . . . , n − 1
2. Compute the centroid (xc, yc) of the object.
3. Insert the data r_n^i = (r_n^i, α_n^i) (shown in red in the figure) of each pixel n on the boundary of
the object into row i of the ϕ table. The correct row i is decided (based on the edge direction of the
pixel n) using
i = argmin_k |∠(∇f[n]) − ϕk|, 0 ≤ ∠(∇f[n]) ≤ 2π
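A sketch of the training step; storing the ϕ table as a Python list of rows of (r, α) pairs, and measuring α from the boundary pixel towards the centroid, are assumptions (the ± in the testing step absorbs the resulting sign ambiguity):

import numpy as np

def train_phi_table(boundary, centroid, grad_angle, n=8):
    """boundary: list of (x, y) pixels on the object's boundary;
    grad_angle[y, x]: edge-gradient angle at that pixel, in [0, 2*pi)."""
    phi = 2 * np.pi * np.arange(n) / n            # phi_k = k * (2*pi / n)
    table = [[] for _ in range(n)]
    xc, yc = centroid
    for x, y in boundary:
        # Row i chosen from the edge direction at pixel n
        # (nearest phi_k, with no wrap-around, as in the formula above).
        i = int(np.argmin(np.abs(grad_angle[y, x] - phi)))
        r = np.hypot(xc - x, yc - y)              # r_n: distance to the centroid
        alpha = np.arctan2(yc - y, xc - x)        # alpha_n: angle of that vector
        table[i].append((r, alpha))
    return table, phi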
−→ The computer is now tested to recognize
the given object, using the object’s ϕ table.
Testing the computer (using the ϕ table)
1. For each pixel n on the edges in the image, the correct rows u and v in the ϕ table are decided using
u = argmin_k |∠(+∇f[n]) − ϕk|
v = argmin_k |∠(−∇f[n]) − ϕk|

2. Use each entry k in row u and row v of the ϕ table to make estimates (in the parameter space) of
the centroid c = (xc, yc) as follows:
c = n ± r_k^u [cos(α_k^u) sin(α_k^u)]T, c = n ± r_k^v [cos(α_k^v) sin(α_k^v)]T
where [r_k^u α_k^u] ∈ row u and [r_k^v α_k^v] ∈ row v.
3. A maximum in the parameter space will indicate
the centroid of the object.
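And a matching sketch of the testing step, under the same assumptions as the training sketch:

import numpy as np

def vote_centroids(edge_pixels, grad_angle, table, phi, shape):
    h, w = shape
    A = np.zeros((w, h))                          # parameter space for the centroid
    for x, y in edge_pixels:
        ang = grad_angle[y, x]
        # Rows u and v come from +grad f and -grad f respectively.
        for a_dir in (ang, (ang + np.pi) % (2 * np.pi)):
            row = int(np.argmin(np.abs(a_dir - phi)))
            for r, alpha in table[row]:
                for s in (+1, -1):                # c = n ± r [cos(alpha) sin(alpha)]^T
                    cx = int(round(x + s * r * np.cos(alpha)))
                    cy = int(round(y + s * r * np.sin(alpha)))
                    if 0 <= cx < w and 0 <= cy < h:
                        A[cx, cy] += 1
    return A                                      # the maximum indicates the centroid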
Results
Scale and orientation

−→ We need something called an image descriptor.


−→ SIFT - Scale Invariant Feature Transform
Exercises
1. Consider the following image
 
 10  16  20 180 250 251 250 244
  4   8  10 170 182 248 246 240
  3   4  10  20 164 200 242 251
  1   2  16  24  31 190 250 120
  2   9  10  22  18  23  16  34
One unit of length is 10 pixels. Lines in the image space are to be detected using the Hough
transform.
(a) Give the equation of the line in parameter space corresponding to the pixel P with a value of 164
in the image space.
(b) Which cells in the accumulator array (parameter space) will be marked by P if the computationally
efficient version of the Hough transform is used?
2. Once again consider the image in Problem 1. One unit of length = 10 pixels. Circles of radius 2.4
units in the image space are to be detected using the Hough transform.
(a) Give the equation of the circle in parameter space corresponding to the pixel P with a value of 164
in the image space.
(b) Which cells in the accumulator array (parameter space) will be marked by P if the computationally
efficient version of the Hough transform is used?

3. Once again consider the image in Problem 1. One unit of length = 10 pixels. An object in the image
space with centroid (1.8, 2.2) is to be detected using the generalized Hough transform. The interval
from 0 to 2π is divided into 8 portions for the ϕ table.

(a) Which row in the ϕ table corresponding to the pixel P with a value of 164 will be updated?
(b) What will get stored in this row?
