Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering
UET (Lahore)
Lecture 1
Welcome to Image Processing and Computer Vision!
−→ If the R, G and B values are made equal then the pixel becomes a shade of grey
−→ Sometimes the R, G and B values are each stored as an 8-bit number, giving 24 bits per pixel. This is called true color.
−→ Each pixel has its own color.
−→ Color information is usually, but not always, stored by saving the R, G and B values of each pixel.
−→ The image is stored as three 2D arrays (of bytes): one array for R values, one array for G values and one array for B values. These arrays are called the R, G and B channels.
How the computer sees a colored image
−→ Each channel is itself a grayscale image.
Understanding videos
−→ A video is just a sequence of images. Each image in the video is called a frame.
−→ A small amount of motion takes place between two consecutive frames
−→ When the frames are played very rapidly we get the illusion of motion. The number of images
played in one second is called the framerate.
−→ Videos usually have a framerate of 25 frames per second.
−→ An HD video 'usually' has a resolution of 1920x1080 pixels and a framerate of 25 frames per second.
How are videos stored in a computer?
Videos are stored as a sequence of image arrays.
• Black and white videos (a sequence of black and white image arrays)
• Greyscale videos (a sequence of grayscale image arrays)
• Colored videos (three sequences of grayscale image arrays, namely the R, G and B channels).
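To make the storage picture concrete, here is a minimal NumPy sketch (illustrative only; the 4x8 size, the random pixel values and the 25-frame 'video' are arbitrary assumptions, not from the slides):

import numpy as np

height, width = 4, 8
r = np.random.randint(0, 256, (height, width), dtype=np.uint8)  # R channel
g = np.random.randint(0, 256, (height, width), dtype=np.uint8)  # G channel
b = np.random.randint(0, 256, (height, width), dtype=np.uint8)  # B channel

# Stacking the three 2D arrays gives the usual height x width x 3 color image.
image = np.dstack([r, g, b])

# Making R, G and B equal at every pixel yields a shade of grey.
grey = ((r.astype(np.uint16) + g + b) // 3).astype(np.uint8)

# A video is a sequence of frames: shape (frames, height, width, 3).
video = np.stack([image] * 25)   # one second of a static 'video' at 25 fps
print(image.shape, grey.shape, video.shape)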
What do you see in the image? Is it the number 2 or lots of colored circles?
−→ Both answers are technically correct but different.
−→ Meaningful information can be subjective!
Do you see shapes in the left image? Are the regions A and B of different colors?
−→ We may see things which are not there because our brain 'fills in the blanks'.
−→ Meaningful information can be deceptive!
−→ In Computer Vision we are expecting a computer to take an array of numbers and see what we are seeing, even if those things are not there!
−→ Computer Vision is a challenging, subjective and deceptive problem to solve!
Extracting information from images
−→ Images are stored as arrays (or matrices) inside the computer:
$$\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}$$
Image Statistics
−→ For any portion of the image array we can compute things like the mean, the variance, a histogram, a probability distribution etc . . . These are called image statistics.
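A small illustrative sketch of these image statistics in NumPy (the 3x3 patch values are arbitrary assumptions):

import numpy as np

# A hypothetical 8-bit grayscale portion of an image array.
patch = np.array([[18, 20, 30],
                  [23, 25, 34],
                  [78, 80, 23]], dtype=np.uint8)

mean = patch.mean()                                      # mean gray level
variance = patch.var()                                   # variance
hist, _ = np.histogram(patch, bins=256, range=(0, 256))  # histogram
pdf = hist / hist.sum()                                  # probability distribution
print(mean, variance, pdf.sum())                         # pdf sums to 1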
Geometry within images
−→ For any portion of the image array we can check whether certain kinds of shapes are present within the image. We can locate lines, circles and other kinds of shapes within that portion (Hough transform). This is called geometry within images.
−→ Computer vision algorithms are 'broadly' categorized into three types.
(i) Statistics based algorithms (use image statistics)
(ii) Geometry based algorithms (use geometry within images)
(iii) Algorithms which attempt to mimic the human visual system, e.g. neural networks.
Some interesting problems entailing Computer Vision
Industrial Inspection
Course information
−→ Join the WhatsApp group ''CMPE-443/674 IPCV Fall24''
How to reach me
−→ Send me a text or audio message on WhatsApp (fastest)
Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering
UET (Lahore)
Lecture 2
Mathematical terms and their notation
Scalars, vectors and matrices
−→ Scalars are denoted by lower case letters such as s, a, b etc . . .
−→ Vectors will be denoted by boldface lower case letters having a bar, such as x̄, ȳ.
−→ Matrices will be denoted by upper case letters such as M, X etc . . .
−→ x will be treated as a column vector in this course.
e.g. $x \in \mathbb{R}^4 \Rightarrow x = [x_1\; x_2\; x_3\; x_4]^T$
$\|x\| = \sqrt{x_1^2 + x_2^2}$ (in 2D),  $\|x\| = \sqrt{x_1^2 + x_2^2 + x_3^2}$ (in 3D)
−→ Magnitude is also called the Euclidean norm.
Unit vector
A vector whose Euclidean norm is 1 is called a unit vector. ∥x∥ = 1 ⇔ x is a unit vector.
−→ Unit vectors will be denoted with a boldface lower case letter having a hat, e.g. $\hat{x}$.
Normalization of a vector
−→ Dividing a vector by its magnitude will make it a unit vector (called normalization of a vector).
$\hat{x} = \dfrac{x}{\|x\|}$ is the normalization of x.
Let $x, y \in \mathbb{R}^d$. Then
$$x \cdot y = \langle x, y \rangle = (x, y) = x_1 y_1 + x_2 y_2 + \cdots + x_d y_d = x^T y = \|x\|\, \|y\| \cos\theta$$
−→ The dot product can be used to compute the similarity between two vectors based on the angle
between them.
$$\theta = \arccos\!\left(\frac{x \cdot y}{\|x\|\, \|y\|}\right)$$
−→ Two vectors are said to be similar when
they are aligned.
−→ $\theta = 90° \Rightarrow x \cdot y = 0$.
−→ If x and y are orthonormal vectors then $\|x\| = 1$, $\|y\| = 1$ and $x \cdot y = x^T y = 0$.
−→ The dot product can also be used to compute the Euclidean norm of a vector.
Let $x \in \mathbb{R}^d \Rightarrow x = [x_1\; x_2\; \dots\; x_d]^T$
$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_d^2} = \sqrt{x \cdot x} = \sqrt{\langle x, x \rangle} = \sqrt{(x, x)} = \sqrt{x^T x}$$
The outer product of two vectors:
$$x y^T = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{bmatrix} \begin{bmatrix} y_1 & y_2 & \cdots & y_k \end{bmatrix} = \begin{bmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_k \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_k \\ \vdots & \vdots & & \vdots \\ x_d y_1 & x_d y_2 & \cdots & x_d y_k \end{bmatrix}$$
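These definitions map directly onto NumPy; a minimal sketch (the vectors x and y below are arbitrary examples):

import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, 0.0, 1.0])

dot = x @ y                        # x . y = x^T y
norm_x = np.sqrt(x @ x)            # Euclidean norm via the dot product
x_hat = x / np.linalg.norm(x)      # normalization gives a unit vector

# Angle between the vectors, from x . y = ||x|| ||y|| cos(theta)
theta = np.arccos(dot / (np.linalg.norm(x) * np.linalg.norm(y)))

outer = np.outer(x, y)             # x y^T, a 3x3 matrix here
print(dot, norm_x, np.linalg.norm(x_hat), np.degrees(theta), outer.shape)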
I am standing at point X in the domain of a function f. In which direction should I move so that I experience the maximum rate of increase in the function? Answer: move in the direction of the gradient of f, ∇f.
−→ ∇f is a vector in the domain of the function f, whose direction points toward the maximum rate of increase of the function, and whose magnitude gives the value of the maximum rate of change of the function.
Computing the gradient
If f(x) is a multivariate function, where $x \in \mathbb{R}^d$, then
$$\nabla f = \begin{bmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \\ \vdots \\ \partial f/\partial x_d \end{bmatrix}$$
Example
$f(x) = \sin(x_1) + \cos(x_2) + 5$, $x = [x_1\; x_2]^T \in \mathbb{R}^2$
We are standing at the point $x_0 = [-5.588\; -2.05]^T$ in the domain of f. In which direction should we move so that the function increases the most rapidly? What is the maximum rate of change of the function?
Solution
$$\nabla f = \begin{bmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{bmatrix} = \begin{bmatrix} \cos(x_1) \\ -\sin(x_2) \end{bmatrix}. \quad \text{At } x_0 = \begin{bmatrix} -5.588 \\ -2.05 \end{bmatrix}: \ \nabla f = \begin{bmatrix} \cos(-5.588) \\ -\sin(-2.05) \end{bmatrix} \approx \begin{bmatrix} 0.768 \\ 0.887 \end{bmatrix}$$
So we need to move in the direction $[0.768\; 0.887]^T$.
The maximum rate of change of the function $= \|\nabla f\| = \sqrt{0.768^2 + 0.887^2} \approx 1.17$
−→ This means that if we are standing at $x_0$ and move a small amount dl in the direction $[0.768\; 0.887]^T$, the function will increase by ≈ 1.17 dl.
−→ −∇f gives us the direction in which the function f decreases the most rapidly.
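A quick numerical check of this example in Python (the small step dl is an arbitrary choice):

import numpy as np

def f(x):
    # f(x) = sin(x1) + cos(x2) + 5
    return np.sin(x[0]) + np.cos(x[1]) + 5.0

def grad_f(x):
    # Analytic gradient: [cos(x1), -sin(x2)]^T
    return np.array([np.cos(x[0]), -np.sin(x[1])])

x0 = np.array([-5.588, -2.05])
g = grad_f(x0)
print(g, np.linalg.norm(g))         # ~ [0.768, 0.887], rate ~ 1.17

# Moving a small step dl along the unit gradient raises f by ~ ||grad f|| dl.
dl = 1e-5
step = dl * g / np.linalg.norm(g)
print((f(x0 + step) - f(x0)) / dl)  # ~ 1.17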
Example
$f(x) = (x_1 - 1)^2 + (x_2 - 2)^2 + 4$, $x = [x_1\; x_2]^T \in \mathbb{R}^2$. We are at $x_0 = [1\; 2]^T$ in the domain of f.
In which direction should we move so that the function increases the most rapidly?
Solution
$$\nabla f = \begin{bmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{bmatrix} = \begin{bmatrix} 2(x_1 - 1) \\ 2(x_2 - 2) \end{bmatrix}$$
$$\text{At } x = x_0 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}: \quad \nabla f = \begin{bmatrix} 2(1-1) \\ 2(2-2) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
The gradient (at $x_0$) = 0. So which direction do we move in?
−→ When the gradient is 0 it means that the function is 'flat' at the given point. The function is (locally)
minimum or maximum at the given point.
Difference between gradient and derivative
1. A derivative is computed with respect to any one variable of the input vector, whereas the gradient is
computed with respect to all the variables of the input vector
2. The derivative is a scalar quantity whereas the gradient is a vector quantity.
Partial gradient
The gradient is computed with respect to only some of the variables in the input vector.
Example
$$f(x) = x_1^2 + \sin(x_2) + \cos(x_3) + 8x_4, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}, \quad p = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad q = \begin{bmatrix} x_3 \\ x_4 \end{bmatrix}, \quad r = \begin{bmatrix} x_2 \\ x_3 \end{bmatrix}$$
−→ $\nabla_p f$ = partial gradient of f w.r.t. p
$$\Rightarrow \nabla_p f = \begin{bmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 \\ \cos(x_2) \end{bmatrix}, \quad \nabla_q f = \begin{bmatrix} \partial f/\partial x_3 \\ \partial f/\partial x_4 \end{bmatrix} = \begin{bmatrix} -\sin(x_3) \\ 8 \end{bmatrix}, \quad \nabla_r f = \begin{bmatrix} \partial f/\partial x_2 \\ \partial f/\partial x_3 \end{bmatrix} = \begin{bmatrix} \cos(x_2) \\ -\sin(x_3) \end{bmatrix}$$
What does the partial gradient tell us?
Example
$$f(x) = x_1^2 + \sin(x_2) + \cos(x_3) + 8x_4, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}, \quad x_0 = \begin{bmatrix} 2 \\ 3 \\ -1 \\ -2 \end{bmatrix}$$
We are standing at $x_0$. How should we move in the $[x_2\; x_4]^T$ direction so that the function decreases the most rapidly?
Solution
Let $t = [x_2\; x_4]^T$. We need to move in the direction of $-\nabla_t f$ at $[x_2\; x_4]^T = [3\; -2]^T$.
$$\Rightarrow -\nabla_t f = -\begin{bmatrix} \partial f/\partial x_2 \\ \partial f/\partial x_4 \end{bmatrix} = \begin{bmatrix} -\cos(x_2) \\ -8 \end{bmatrix}. \quad \text{At } \begin{bmatrix} x_2 \\ x_4 \end{bmatrix} = \begin{bmatrix} 3 \\ -2 \end{bmatrix}: \ -\nabla_t f = \begin{bmatrix} -\cos(3) \\ -8 \end{bmatrix} \approx \begin{bmatrix} 0.99 \\ -8 \end{bmatrix}$$
Gradient of a vector single variable function
If we have a function
$$f(x) = \begin{bmatrix} f_1(x) \\ \vdots \\ f_n(x) \end{bmatrix} \Rightarrow \nabla f = \begin{bmatrix} df_1(x)/dx \\ \vdots \\ df_n(x)/dx \end{bmatrix}$$
−→ $\nabla_x (y^T x) = \nabla_x (x^T y) = y$
−→ $\nabla_x (Mx) = M^T$
−→ $\nabla_x (x^T A x) = (A + A^T)x$
Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering
UET (Lahore)
Lecture 3
Discrete domain and continuous domain functions
Examples of Scalar Single Variable (SSV) functions
Discrete domain
$f[n] = \cos(\pi n / 8)$
Continuous domain
$f(x) = \cos(\pi x / 8)$
Examples of Scalar Multi-Variable (SMV) functions
Discrete domain
$f[n] = \cos(\pi n_1 / 8)\cos(\pi n_2 / 4)$
Continuous domain
$f(x) = \cos(\pi x_1 / 8)\cos(\pi x_2 / 4)$
Examples of Vector Single Variable (VSV) functions
Continuous domain: $$f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \end{bmatrix} = \begin{bmatrix} \sin(\pi x/8) \\ e^{-|x|} \end{bmatrix}$$
Discrete domain: $$f[n] = \begin{bmatrix} f_1[n] \\ f_2[n] \end{bmatrix} = \begin{bmatrix} \sin(\pi n/8) \\ e^{-|n|} \end{bmatrix}$$
Examples of Vector Multi-Variable (VMV) functions
Continuous domain: $$f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \end{bmatrix} = \begin{bmatrix} e^{-|x_2|}\cos(0.2\pi x_1) \\ e^{-|x_1|}\sin(0.4\pi x_2) \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2$$
Discrete domain: $$f[n] = \begin{bmatrix} f_1[n] \\ f_2[n] \end{bmatrix} = \begin{bmatrix} e^{-|n_2|}\cos(0.2\pi n_1) \\ e^{-|n_1|}\sin(0.4\pi n_2) \end{bmatrix}, \quad n = \begin{bmatrix} n_1 \\ n_2 \end{bmatrix}$$
Grayscale images
−→ A grayscale image is a discrete domain SMV function.
−→ A colored image is a VMV function:
$$i[n] = \begin{bmatrix} r[n] \\ g[n] \\ b[n] \end{bmatrix} = \begin{bmatrix} r[n_1, n_2] \\ g[n_1, n_2] \\ b[n_1, n_2] \end{bmatrix}$$
[Figure: a colored image and its R, G and B channels]
Linear Transformations
−→ These are VMV functions of the form f (x) = M x or f [n] = M n, where M is a matrix of
constants.
−→ If the matrix M is k × d then it will map a d-dimensional vector to a k-dimensional vector.
Continuous domain:
$$f(x) = Mx = \hat{x}: \quad \begin{bmatrix} m_{11} & m_{12} & \cdots & m_{1d} \\ m_{21} & m_{22} & \cdots & m_{2d} \\ \vdots & & & \vdots \\ m_{k1} & m_{k2} & \cdots & m_{kd} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{bmatrix} = \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \\ \vdots \\ \hat{x}_k \end{bmatrix}$$
Discrete domain:
$$f[n] = Mn = \hat{n}: \quad \begin{bmatrix} m_{11} & m_{12} & \cdots & m_{1d} \\ m_{21} & m_{22} & \cdots & m_{2d} \\ \vdots & & & \vdots \\ m_{k1} & m_{k2} & \cdots & m_{kd} \end{bmatrix} \begin{bmatrix} n_1 \\ n_2 \\ \vdots \\ n_d \end{bmatrix} = \begin{bmatrix} \hat{n}_1 \\ \hat{n}_2 \\ \vdots \\ \hat{n}_k \end{bmatrix}$$
[Figure: an example linear transformation f(x) = Mx applied to points in the image plane]
Eigenvectors
−→ The mapped vector is 'similar' to the original vector.
−→ The mapped vector is a constant times the original vector. This constant is called the eigenvalue of
the eigenvector.
Example of an eigenvector in 2D
$x = \begin{bmatrix} -4 \\ 3 \end{bmatrix}$ is an eigenvector of $M = \begin{bmatrix} 1 & 4 \\ 3 & 2 \end{bmatrix}$ because $Mx = -2x$. The eigenvalue is $-2$.
Example of an eigenvector in 3D
$x = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$ is an eigenvector of $M = \begin{bmatrix} 3 & 4 & -2 \\ 1 & 4 & -1 \\ 2 & 6 & -1 \end{bmatrix}$ because $Mx = 3x$. The eigenvalue is $3$.
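A quick check of the 2D example with NumPy (np.linalg.eig also recovers the other eigenvalue):

import numpy as np

M = np.array([[1.0, 4.0],
              [3.0, 2.0]])
x = np.array([-4.0, 3.0])

# M x = -2 x, so x is an eigenvector of M with eigenvalue -2.
print(M @ x)             # [ 8. -6.] = -2 * x

# numpy returns all eigenvalues and (normalized) eigenvectors at once.
vals, vecs = np.linalg.eig(M)
print(vals)              # the eigenvalues 5 and -2 (order may vary)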
Image Processing through linear transformations
Rotating images by θ about a point c in the image plane
Rotating grayscale images
−→ M is a matrix for rotation by angle θ in the image plane.
$$M = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \Rightarrow M^{-1} = \begin{bmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{bmatrix}$$
−→ The pixel $n_0$ moves to the pixel position $n = f[n_0] = c + M(n_0 - c)$.
−→ The pixel $n$ comes from the pixel position $n_0 = f^{-1}[n] = c + M^{-1}(n - c)$.
−→ The new image i at the target pixel n = the original image $i_0$ at the source pixel $NN(n_0)$:
$i[n] = i_0[NN(n_0)]$, where $NN(n_0)$ is the nearest neighbor of $n_0$.
−→ The value of the original image i0 at pixels outside the image frame is taken to be 0.
−→ To rotate a colored image each channel is rotated separately in the same manner.
Scaling images by k with reference to a point c in the image plane
Scaling grayscale images
−→ M is a matrix for scaling by a factor k in the image plane.
$$M = \begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix} \Rightarrow M^{-1} = \begin{bmatrix} k^{-1} & 0 \\ 0 & k^{-1} \end{bmatrix}$$
−→ The pixel $n_0$ moves to the pixel position $n = f[n_0] = c + M(n_0 - c)$.
−→ The pixel $n$ comes from the pixel position $n_0 = f^{-1}[n] = c + M^{-1}(n - c)$.
−→ The new image i at the target pixel n = the original image $i_0$ at the source pixel $NN(n_0)$:
$i[n] = i_0[NN(n_0)]$, where $NN(n_0)$ is the nearest neighbor of $n_0$.
−→ The value of the original image i0 at pixels outside the image frame is taken to be 0.
−→ To scale a colored image each channel is scaled separately in the same manner.
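A minimal sketch of this inverse-mapping procedure for rotation (the 4x4 test image and its center are arbitrary assumptions; scaling works the same way, with the scaling matrix M instead):

import numpy as np

def rotate_image(i0, theta, c):
    # Inverse mapping: each target pixel n is filled from the source
    # position n0 = c + M^{-1}(n - c), sampled at its nearest neighbor.
    H, W = i0.shape
    M_inv = np.array([[np.cos(-theta), -np.sin(-theta)],
                      [np.sin(-theta),  np.cos(-theta)]])
    out = np.zeros_like(i0)   # pixels coming from outside the frame stay 0
    for r in range(H):
        for col in range(W):
            n0 = c + M_inv @ (np.array([r, col], dtype=float) - c)
            nn = np.rint(n0).astype(int)          # NN(n0)
            if 0 <= nn[0] < H and 0 <= nn[1] < W:
                out[r, col] = i0[nn[0], nn[1]]
    return out

img = np.arange(16, dtype=np.uint8).reshape(4, 4)
c = np.array([1.5, 1.5])      # the 'perfect' center of a 4x4 frame
print(rotate_image(img, np.pi / 2, c))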
Exercises
1. An image pixel n is to be rotated about a point c by θ rad and the vector corresponding to the resulting point scaled with reference to c by a factor of 0.25, through a function f[n].
(a) Write down a mathematical expression for f [n]
(b) Write down a mathematical expression for the resulting pixel p
2. A (continuous domain) image pixel x is to be rotated about a point $c_1$ by θ rad and the vector corresponding to the resulting point scaled with reference to $c_2$ by a factor of 0.25, through a function f(x). Write down a mathematical expression for f(x)
3. A 1280p video frame i[n] is rotated by 90° about the point $c \in \mathbb{R}^2$ which is the 'perfect' center of the frame to give a new 1280p frame $\hat{i}[n]$.
(a) What are the coordinates of c?
(b) Write down $\hat{i}[n]$ in terms of i[n].
(c) How many pixels have an unknown value in $\hat{i}[n]$?
Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering
UET (Lahore)
Lecture 4
Using a continuous domain equivalent of a discrete domain function
Continuous domain equivalent of a discrete domain SSV function
Let f [n] be a discrete domain SSV function.
−→ We assume that the samples i.e. f [n] are
1 unit apart, (in the dimension n) w.r.t. a
continuous variable x.
−→ We now imagine a smooth function f (x)
connecting the 'heads' of f [n].
−→ The value of f (x) is only known 'correctly'
when x = n.
Continuous domain equivalent of a discrete domain SMV function
Let f [n], n = [n1 , n2 ]T be a discrete domain SMV
function.
−→ We assume that the samples i.e. f [n] are
1 unit apart (in each dimension n1 , n2 ),
w.r.t. a continuous variable x = [x1 , x2 ]T .
−→ We now imagine a smooth function f (x)
connecting the 'heads' of f [n].
−→ The value of f (x) is only known
'correctly' when x = n.
Estimating derivatives from a discrete domain function
Estimating the derivatives of a SSV function f (x) from its samples f [n]
Goal
To be able to estimate the derivatives of the function
f (x) from its samples f [n], at any value of n.
Solution
−→ f [n] is replaced with its continuous
domain equivalent function f (x).
−→ The Taylor series of f (x) is used to
estimate the derivatives of f at any value
of n.
Using the Taylor series to estimate derivatives of a SSV function.
Let f [n] be a discrete domain SSV function, and let f (x) be its continuous domain equivalent.
$f(x) \approx f[n]$
−→ $\dfrac{df(x)}{dx}\Big|_n = \dfrac{df}{dx}[n] \approx \dfrac{f[n+1] - f[n-1]}{2}$
−→ $\dfrac{d^2 f(x)}{dx^2}\Big|_n = \dfrac{d^2 f}{dx^2}[n] \approx f[n+1] - 2f[n] + f[n-1]$
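A short numerical sketch of these two estimates (the sample function cos(πn/8) is borrowed from the earlier examples):

import numpy as np

n = np.arange(32)
f = np.cos(np.pi * n / 8)            # samples of a smooth function

# Estimates at the interior samples n = 1 .. 30:
df  = (f[2:] - f[:-2]) / 2           # df/dx[n]  ~ (f[n+1] - f[n-1]) / 2
d2f = f[2:] - 2 * f[1:-1] + f[:-2]   # d2f/dx2[n] ~ f[n+1] - 2f[n] + f[n-1]

# Compare with the true derivative -(pi/8) sin(pi n / 8).
true_df = -(np.pi / 8) * np.sin(np.pi * n[1:-1] / 8)
print(np.abs(df - true_df).max())    # small approximation error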
Estimating the gradient of a SMV function f (x) from its samples f [n]
Goal
To be able to estimate the partial derivatives (and
hence gradient) of the function f [n] at any discrete
point.
Solution
−→ f [n] is replaced with its continuous
domain equivalent function f (x).
−→ The Taylor series approximation for the
derivatives of an SSV function is used to
calculate the partial derivatives of f (x).
Using the Taylor series to estimate gradient of a SMV function.
Let f [n] be a discrete domain SMV function, and let f (x) be its continuous domain equivalent.
f (x) ≈ f [n]
−→ We will skip the derivation
−→ The image i is shifted so that the target index n = [1 5]T is aligned under the origin (anchor) =
[0 0]T of the filter.
−→ The multiply add operation is applied between the filter g and image i to get the result h at the target
pixel n.
Mathematical notation
h[n] = (g ⊗ i)[n] (filter g applied to i)
Dealing with boundaries
[Figure: an image and a filter positioned so that part of the filter lies outside the image boundary]
If parts of the filter need to operate on image values outside the image boundary then what do we do?
−→ There are two popular ways to deal with this problem (for pixels outside the image boundary).
• Simply use 0 for the value of the image.
• Copy the value from the nearest pixel inside the image.
−→ The choice of how we deal with this problem will depend on the scenario.
Filtering
Filtering of SSV functions
Function (f ) Filter (g)
−→ The function f is shifted so that the target index n = 3 is aligned under the origin (anchor) = 0 of
the filter.
−→ The multiply add operation is applied between the filter g and function f to get the result h at the
target index n.
Mathematical notation
$$h[n] = (g \otimes f)[n] = \sum_{k=-1}^{2} g[k]\, f[k+n] = \sum_{k=-\infty}^{\infty} g[k]\, f[k+n] \quad \text{(filter } g \text{ applied to } f\text{)}$$
Correlation and convolution (SSV functions)
−→ The shift multiply add operation (used for filtering) is called correlation denoted by ⊗.
−→ The filtering result h[n] is obtained by computing the correlation of the filter g[n] with the function
f [n].
$$h[n] = (g \otimes f)[n] = \sum_{k=-\infty}^{\infty} g[k]\, f[k+n]$$
−→ If we flip the filter before the shift multiply add operation then we get a new result
$$h_{new}[n] = (g_{flipped} \otimes f)[n] = \sum_{k=-\infty}^{\infty} g_{flipped}[k]\, f[k+n] = \sum_{k=-\infty}^{\infty} g[-k]\, f[k+n] = \sum_{k=-\infty}^{\infty} g[k]\, f[n-k] = (g * f)[n]$$
−→ $(g_{flipped} \otimes f)[n] = (g * f)[n] \Rightarrow (g \otimes f)[n] = (g_{flipped} * f)[n]$.
−→ We can apply the filter g to the function f using convolution instead of correlation, but convolution
and correlation are slightly different operations.
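A one-dimensional sketch of this equivalence in NumPy (the filter g is an arbitrary example; its anchor is assumed to be at the center tap):

import numpy as np

f = np.array([35, 126, 200, 101, 41, 219, 108, 4], dtype=float)
g = np.array([-1.0, 2.0, 1.0])       # a hypothetical filter

corr = np.correlate(f, g, mode='same')        # shift-multiply-add: g (x) f
conv = np.convolve(f, g[::-1], mode='same')   # flipping first: g_flipped * f

print(np.allclose(corr, conv))                # True: g (x) f = g_flipped * f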
Filtering of SMV functions
Function (f ) Filter (g)
−→ The function f is shifted so that the target index n = [1 5]T is aligned under the origin (anchor) =
[0 0]T .
−→ The multiply add operation is applied between the filter g and function f to get the result h at the
target index n.
Mathematical notation
Let $V = \text{dom}(g)$ and let $R \subset V$ be such that $k \notin R \Rightarrow g[k] = 0$; R is the active region of the filter.
$$h[n] = (g \otimes f)[n] = \sum_{k \in R} g[k]\, f[k+n] = \sum_{k \in V} g[k]\, f[k+n] \quad \text{(filter } g \text{ applied to } f\text{)}$$
Correlation and convolution (SMV functions)
−→ The shift multiply add operation (used for filtering) is called correlation denoted by ⊗.
−→ The filtering result h[n] is obtained by computing the correlation of the filter g[n] with the function
f [n].
$$h[n] = (g \otimes f)[n] = \sum_{k \in V} g[k]\, f[k+n]$$
−→ If we flip the filter before the shift multiply add operation then we get a new result
$$h_{new}[n] = (g_{flipped} \otimes f)[n] = \sum_{k \in V} g_{flipped}[k]\, f[k+n] = \sum_{k \in V} g[-k]\, f[k+n] = \sum_{k \in V} g[k]\, f[n-k] = (g * f)[n]$$
where $V = \text{dom}(g)$
−→ $(g_{flipped} \otimes f)[n] = (g * f)[n] \Rightarrow (g \otimes f)[n] = (g_{flipped} * f)[n]$.
−→ We can do filtering of SMV functions using convolution instead of correlation, but SMV convolution
and SMV correlation are slightly different operations.
Filtering grayscale images
Example
159 195 93 95 113 144 125 100
156 105 112 246 170 9 151 242
64 30 196 192 189 235 135 189
81 219 2 70 113 115 133 214
The image i is to be processed so that the value at each pixel $n = [n_1, n_2]$ is replaced with the average value of its eight neighbors.
(a) What filter should be used for doing filtering through correlation?
(b) Write down a mathematical expression for the filtering operation in the form of a 2-variable summation.
Solution
(a) The filter will be a 3×3 square with a 0 in the center (origin) and each other value equal to 1/8.
(b) Let $R \subset \text{dom}(g)$ : $k \notin R \Rightarrow g[k] = 0$
Let h[n] be the processed image
$$h[n] = (g \otimes f)[n] = \sum_{k \in R} g[k]\, f[k+n] = \sum_{k_1=-1}^{1} \sum_{k_2=-1}^{1} g[k_1, k_2]\, f[k_1+n_1, k_2+n_2]$$
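A sketch of this filtering operation using SciPy (scipy.ndimage.correlate performs the shift-multiply-add; taking outside pixels as 0 is one of the two boundary options discussed earlier):

import numpy as np
from scipy.ndimage import correlate

i = np.array([[159, 195,  93,  95, 113, 144, 125, 100],
              [156, 105, 112, 246, 170,   9, 151, 242],
              [ 64,  30, 196, 192, 189, 235, 135, 189],
              [ 81, 219,   2,  70, 113, 115, 133, 214]], dtype=float)

# 3x3 filter: 0 at the origin, 1/8 elsewhere -> average of the 8 neighbors.
g = np.full((3, 3), 1 / 8)
g[1, 1] = 0.0

h = correlate(i, g, mode='constant', cval=0.0)
print(h.round(1))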
Filtering grayscale images
Example
The image i[n], n = [n1 , n2 ] is to be processed so that it shifts 1 pixel to the right and 2 pixels upward.
(a) What filter g[n] should be used for doing filtering through convolution?
(b) Write down a mathematical expression for the filtering operation in the form of a 2-variable summation.
(c) Compute the result h[n] (4x8 image) of part (b)
Solution
Try this yourself!
Why use convolution instead of correlation?
−→ Convolution has some nice properties which are not present in exactly the same manner in correlation.
Properties of convolution
The following properties hold identically for SSV and SMV functions.
Commutative property: (f ∗ g)[n] = (g ∗ f)[n]
Associative property: (f ∗ g ∗ h)[n] = ((f ∗ g) ∗ h)[n] = (f ∗ (g ∗ h))[n]
Homogeneous property: (f ∗ kg)[n] = k(f ∗ g)[n], k ∈ R
Additive property: (f ∗ (g + h))[n] = (f ∗ g)[n] + (f ∗ h)[n]
Homogeneous + additive ⇒ linear: (f ∗ (ag + bh))[n] = a(f ∗ g)[n] + b(f ∗ h)[n], a, b ∈ R
Exercises
1. On slide 7 generate the 4x8 image h[n] by doing the following operations
(a) h[n] = (g ⊗ i)[n]
(b) $h[n] = (g_{flipped} * i)[n]$,
where g is the $x_1$ partial derivative filter
4. Revisit the problem on slide 14 and compute the 4x8 image (g ⊗ i)[n]
Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering
UET (Lahore)
Lecture 5
Processing discrete signals and images
[Figure: an SMV function shown as a stem plot, as an array (matrix), and as a normalized image]
Filtering of discrete signals
Filtering SSV signals
Filtering through correlation and through convolution
Function f[n]: 35 126 200 101 41 219 108 4
−→ Correlation and convolution are equivalent when the first function (filter) is symmetric.
Filtering SMV signals
Filtering through correlation and through convolution
Function f[n]:
250  35 126 101  41 219 108   4
143  78  88 234  74 154  27  50
219 123 230 105 171  55 107  81
Filter g[n]:
-1  1  1
-1  2  1
Flipped filter g_flipped[n] = g[-n]:
 1  2 -1
 1  1 -1
Result h[n] = (g ⊗ f)[n] = (g_flipped ∗ f)[n]:
535  -54 318 117 200 505    1 -100
649   12 524 470 227 547 -157  -31
782  280 686 371 286 153  163   78
−→ Correlation and convolution can be interchanged if the first function (filter) is flipped.
Filtering SMV signals with a symmetric filter
Filtering through correlation and through convolution
Function f[n]:
250  35 126 101  41 219 108   4
143  78  88 234  74 154  27  50
219 123 230 105 171  55 107  81
Filter g[n] (symmetric, so g_flipped[n] = g[-n] = g[n]):
-1  2 -1
 1  3  1
-1  2 -1
Result h[n] = (g ⊗ f)[n] = (g_flipped ∗ f)[n] = (g ∗ f)[n] — identical either way.
Why are we doing (g ∗ f) instead of (g ⊗ f)?
Smoothing filter g[n]
−→ The value at each pixel is replaced with the average of that of itself and that of its immediate (two) neighbors, in order to smooth the profile of the function:
$$g = \frac{1}{3}\begin{bmatrix} 1 & 1 & 1 \end{bmatrix}$$
−→ Advantage of smoothing: the noise (random change) in the function is reduced.
−→ Disadvantage of smoothing: the function gets blurred. Sharp changes (details) are lost.
Weighted smoothing
−→ Give more weight to the target pixel n to reduce blurring effects:
$$g = \frac{1}{4}\begin{bmatrix} 1 & 2 & 1 \end{bmatrix}$$
Smoothing filter g[n] for images
−→ Averaging over a 3 × 3 neighborhood:
$$g = \frac{1}{9}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$
−→ Weighted smoothing for images:
$$g = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$$
Gaussian smoothing
−→ Give more weight to the target pixel n using a normalized Gaussian distribution.
−→ $\nabla f(x) = \dfrac{df(x)}{dx}\,\hat{x}$, where $\hat{x}$ is the unit vector in the x direction.
−→ The derivative is a part of the gradient.
Domain point x | ∇f(x) ≈ [(f[x+1] − f[x−1])/2] | ∥∇f(x)∥ | dir(∇f(x))
0              | [0]                            | 0       | any
5              | [−0.3827]                      | 0.3827  | [−1]
10             | [0.3827]                       | 0.3827  | [1]
Edge Detection
SSV function
Detecting vertical edges (the x2 derivative); detecting the magnitude (norm) of the gradient
Edge detection of noisy images
$$\frac{\partial f(x)}{\partial x_1} = \frac{1}{2}\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \otimes \frac{1}{4}\begin{bmatrix} 1 & 2 & 1 \end{bmatrix} \otimes f[n] = \frac{1}{2}\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} * \frac{1}{4}\begin{bmatrix} 1 & 2 & 1 \end{bmatrix} * f[n] = \frac{1}{8}\begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * f[n]$$
$$\frac{\partial f(x)}{\partial x_2} = \frac{1}{2}\begin{bmatrix} -1 & 0 & 1 \end{bmatrix} \otimes \frac{1}{4}\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \otimes f[n] = \frac{1}{2}\begin{bmatrix} 1 & 0 & -1 \end{bmatrix} * \frac{1}{4}\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} * f[n] = \frac{1}{8}\begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} * f[n]$$
$$\|\nabla f\| = \sqrt{\left(\frac{\partial f(x)}{\partial x_1}\right)^2 + \left(\frac{\partial f(x)}{\partial x_2}\right)^2}$$
[Figure: output of the Sobel edge detector on the original image with noise added, and output of the DoG filter using a 5x5 Gaussian kernel]
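A minimal sketch of this gradient-magnitude edge detector using the two kernels derived above (the 8x8 step-edge test image is an arbitrary assumption):

import numpy as np
from scipy.ndimage import convolve

# The two 1/8-scaled filters derived above (the Sobel kernels).
sobel_x1 = np.array([[ 1,  2,  1],
                     [ 0,  0,  0],
                     [-1, -2, -1]]) / 8.0
sobel_x2 = np.array([[ 1, 0, -1],
                     [ 2, 0, -2],
                     [ 1, 0, -1]]) / 8.0

def gradient_magnitude(f):
    # ||grad f|| = sqrt((df/dx1)^2 + (df/dx2)^2), via convolution.
    d1 = convolve(f, sobel_x1, mode='constant', cval=0.0)
    d2 = convolve(f, sobel_x2, mode='constant', cval=0.0)
    return np.hypot(d1, d2)

f = np.zeros((8, 8))
f[:, 4:] = 255.0                       # a vertical step edge
print(gradient_magnitude(f).round(1))  # large response along the edge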
Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering
UET (Lahore)
Lecture 6
Gaussian functions
SSV Gaussian
$$\text{gauss}[n] = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(n-\mu)^2}{2\sigma^2}}, \quad n \in \mathbb{R}$$
SMV Gaussian
$$\text{gauss}[n] = \frac{1}{\sqrt{(2\pi)^2\,|\Sigma|}}\, e^{-\frac{1}{2}(n-\mu)^T \Sigma^{-1} (n-\mu)}, \quad n \in \mathbb{R}^2, \ \Sigma \text{ the covariance matrix}$$
−→ ∥∇f ∥ starts increasing as we move towards the edge at right angles (direction of +∇f or −∇f ).
−→ The true edge is at the pixel where ∥∇f ∥ is maximum. Eliminate the neighbors of this pixel (non
maxima).
Quantized gradient direction
[Figure: neighboring pixels corresponding to the 8 quantized directions; nearest neighbor in the direction of ∇f. The gray levels indicate the magnitude of the gradient.]
−→ The gradient at any pixel n is classified into one of 8 possible directions.
−→ The similarity of the gradient at pixel n is computed with the unit vector in each of the 8 directions.
−→ The direction with the highest similarity is the quantized gradient direction.
Original f [n] DoG filter of size 5 × 5 applied to f [n] to Non maxima suppression applied after the
estimate ∇f . DoG filter
Problem of weak edges
−→ Noise within the image filtered by a DoG
filter usually shows up as weak edges.
−→ We use a technique called hysteresis
thresholding to remove such weak edges.
Hysteresis thresholding
−→ Edge pixels where the intensity of ∇f is below a certain percentile Plow are eliminated.
−→ Edge pixels where the intensity of ∇f is above a certain percentile Phigh are marked as strong edges.
−→ Edge pixels where the intensity of ∇f is between Plow and Phigh are marked as weak edges.
−→ If a weak edge has at least K neighbors which are strong edges, then mark that weak edge as a strong edge.
−→ Ultimately no weak edge should have K or more neighbors which are strong edges.
Pseudocode for hysteresis thresholding
for (each pixel n in the image f) {
    classify n as a strong, weak or non-existent edge using its percentile position based on ∥∇f∥;
}
done = false;
while (!done) {
    done = true;
    for (each pixel n in the image f) {
        if (pixel n is a weak edge) {
            if (pixel n has K or more strong edge neighbors) {
                mark pixel n as a strong edge;
                done = false;
            }
        }
    }
}
for (each pixel n in the image f) {
    if (pixel n is a strong edge) { ∇f[n] = 255; } else { ∇f[n] = 0; }
}
−→ Note that the 'while' loop in this algorithm can also be implemented recursively.
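A runnable sketch of the same procedure in Python (the 8-neighborhood counting via shifted views is an implementation choice, not from the slides):

import numpy as np

def hysteresis_threshold(grad_mag, p_low=40, p_high=80, k=1):
    # Classify pixels by percentile of ||grad f||, then repeatedly promote
    # weak edges with >= k strong neighbors, as in the pseudocode above.
    lo, hi = np.percentile(grad_mag, [p_low, p_high])
    strong = grad_mag >= hi
    weak = (grad_mag >= lo) & ~strong
    done = False
    while not done:
        done = True
        padded = np.pad(strong, 1)
        counts = sum(padded[1 + dr:1 + dr + strong.shape[0],
                            1 + dc:1 + dc + strong.shape[1]]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0))    # strong 8-neighbor count
        promote = weak & (counts >= k)
        if promote.any():
            strong |= promote
            weak &= ~promote
            done = False
    return np.where(strong, 255, 0)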
Results
[Figure: original f[n]; DoG filter of size 5×5 applied to f[n] to estimate ∇f; non-maxima suppression (of ∇f) applied after the DoG filter; hysteresis thresholding applied after non-maxima suppression. Plow = 40, Phigh = 80, K = 1]
Canny edge detector
−→ Detect edges using a DoG filter.
−→ Make edges one pixel thick using non-maxima suppression.
−→ Use hysteresis thresholding to eliminate weak edges.
Background mathematics for future work
Null space (kernel) of a matrix
Consider a linear transformation matrix A ∈ Rk×d
−→ dom (A) = Rd
−→ target (A) = Rk
−→ $\ker(A) = \{x \in \text{dom}(A) = \mathbb{R}^d : Ax = 0 \in \mathbb{R}^k\}$
−→ $\ker(A)$ is the set of vectors x such that $Ax = 0 \in \mathbb{R}^k$.
−→ $x = 0 \in \mathbb{R}^d \Rightarrow x \in \ker(A)$
2. Consider the following image f [n]. Will the pixel n0 = [2 3]T survive after non maxima suppression
of the image gradient’s magnitude? Explain your answer.
10 16 20 180 250 251 250 244
4 8 10 170 182 248 246 240
3 4 10 20 164 200 242 251
1 2 16 24 31 190 250 120
2 9 10 22 18 23 16 34
3. Hysteresis thresholding is to be applied to the following matrix.
10 20 250 170 50 10
8 16 180 248 40 6
4 18 100 200 250 20
2 8 18 144 200 172
(a) Using a low percentile of 40 (Plow = 40) and high percentile of 80 (Phigh = 80) classify each
pixel as 'strong', 'weak' or 'eliminated'.
(b) By applying hysteresis thresholding to the result of part (a) using K = 1 neighbor, reclassify each pixel as 'strong' or 'eliminated'.
(c) How does your result in part (b) change if the value of K is changed from 1 to 3?
Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering
UET (Lahore)
Lecture 7
Estimating local structure within an image
−→ We want to understand the local structure of an image around a given pixel n (shown in red color).
−→ We make a small patch (of let’s say 5 × 5 pixels) centered around n, and examine the image.
−→ We then move the patch a little bit away from n in a certain direction d and examine the image.
−→ If the new patch looks similar (shown in blue color) we say that the structure is similar in the direction d.
−→ If the new patch looks different (shown in green color) we say that the structure is different in the direction d.
−→ Why must the patch be small in size?
Three important questions
−→ Given a pixel n (red) and a direction d (arrow) in an image f[n], how can we quantitatively measure the change in structure of f as we move in the direction d, from n to n + d?
−→ If we are standing at n, in which direction $d^*$ should we move so that the structure of f changes the most rapidly?
−→ If we are standing at n, in which direction $d^{\#}$ should we move so that the structure of f changes the most slowly?
−→ To answer these questions we need to compute the structure tensor at pixel n.
Tensor
−→ The 'simplest' definition of a tensor is that it is a multidimensional array of numbers.
−→ The number of indices needed to fully describe an element within the tensor is called the rank of
the tensor.
−→ A simple one dimensional array of numbers = vector = tensor of rank 1
−→ A simple two dimensional array of numbers = matrix = tensor of rank 2
−→ A simple N dimensional array of numbers = tensor of rank N
What kind of a tensor is a colored image?
−→ A colored image is 'basically' a simple 3-dimensional array of numbers (the R, G and B frames).
−→ We need 3 indices to fully describe an element (pixel) within the tensor (image)
−→ Colored images are tensors of rank 3.
Structure tensor
Let x represent the $x_1$ direction and let y represent the $x_2$ direction.
−→ The structure tensor S of image f[n] at pixel n is a 2 × 2 matrix. With $f_x$ and $f_y$ the DoG estimates of the partial derivatives and w[n] a Gaussian weighting window over the patch, it is given by
$$S[n] = \sum_{k} w[k] \begin{bmatrix} f_x[n+k]^2 & f_x[n+k]\, f_y[n+k] \\ f_x[n+k]\, f_y[n+k] & f_y[n+k]^2 \end{bmatrix}$$
−→ Note: There are two Gaussian distributions used in this process. The one in the DoG filter has a
standard deviation of σ and the one used in w[n] has a standard deviation of ρ.
Example
For the following image calculate the structure tensor at pixel n = [2 3]T , using a 5 × 5 patch with Gaussian
weights.
255 255 255 255 255 255 255 255
255 255 255 255 255 255 255 255
0 0 0 0 0 0 255 255
0 0 0 0 0 0 255 255
0 0 0 0 0 0 255 255
0 0 0 0 0 0 255 255
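A sketch of this computation using SciPy's gaussian_filter for both the DoG derivatives (std σ) and the patch weights (std ρ); the test image below mimics the example above, and the values of σ and ρ are arbitrary choices:

import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor(f, n, sigma=1.0, rho=1.0):
    # DoG derivative estimates (std sigma), then Gaussian patch
    # weighting (std rho) of the derivative products.
    fx = gaussian_filter(f, sigma, order=(1, 0))   # derivative along rows
    fy = gaussian_filter(f, sigma, order=(0, 1))   # derivative along columns
    sxx = gaussian_filter(fx * fx, rho)
    sxy = gaussian_filter(fx * fy, rho)
    syy = gaussian_filter(fy * fy, rho)
    r, c = n
    return np.array([[sxx[r, c], sxy[r, c]],
                     [sxy[r, c], syy[r, c]]])

f = np.zeros((6, 8))
f[:2, :] = 255.0       # 255s in the top rows
f[2:, 6:] = 255.0      # 255s in the right columns
S = structure_tensor(f, (2, 3))
vals, vecs = np.linalg.eigh(S)    # lambda_min, lambda_max and eigenvectors
print(vals)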
Answer to the first question
Given a pixel n (red) and a direction d (arrow) in an image f [n], how can we quantitatively measure the
change in structure of f as we move in the direction d, from n to n + d?
−→ The Structural Similarity Difference SSD in the direction d at pixel n is given by the quadratic form
$$SSD_d[n] = d^T S[n]\, d$$
where S[n] is the structure tensor at n.
255 255 255 255 255 255 255 255
255 255 255 255 255 255 255 255
0 0 0 0 0 0 255 255
0 0 0 0 0 0 255 255
0 0 0 0 0 0 255 255
0 0 0 0 0 0 255 255
Answer to the second question
If we are standing at n, in which direction $d^*$ should we move so that the structure of f changes the most rapidly?
−→ The eigenvector corresponding to the larger eigenvalue $\lambda_{max}$ of the structure tensor at n gives the direction $d^*$, in which the structure changes the most rapidly. We usually take the unit vector $\hat{d}^*$ as the direction.
−→ The structure tensor at pixel n is used.
−→ Prove that $SSD_{\hat{d}^*}[n] = \lambda_{max}$
Example
For the following image calculate the direction at the pixel n = [2 3]T in which the structure changes the most
rapidly, using a 5 × 5 patch with Gaussian weights. Also give a numerical value for the change in structure.
255 255 255 255 255 255 255 255
255 255 255 255 255 255 255 255
0 0 0 0 0 0 255 255
0 0 0 0 0 0 255 255
0 0 0 0 0 0 255 255
0 0 0 0 0 0 255 255
Answer to the third question
If we are standing at n, in which direction $d^{\#}$ should we move so that the structure of f changes the most slowly?
−→ The eigenvector corresponding to the smaller eigenvalue $\lambda_{min}$ of the structure tensor at n gives the direction $d^{\#}$, in which the structure changes the most slowly. We usually take the unit vector $\hat{d}^{\#}$ as the direction.
−→ The structure tensor at pixel n is used.
−→ Prove that $SSD_{\hat{d}^{\#}}[n] = \lambda_{min}$
−→ $\lambda_{min} = SSD_{\hat{d}^{\#}}[n] \le SSD_{\hat{d}}[n] \le SSD_{\hat{d}^*}[n] = \lambda_{max}$
Example
For the following image calculate the direction at the pixel n = [2 3]T in which the structure changes the most
slowly, using a 5 × 5 patch with Gaussian weights. Also give a numerical value for the change in structure.
255 255 255 255 255 255 255 255
255 255 255 255 255 255 255 255
0 0 0 0 0 0 255 255
0 0 0 0 0 0 255 255
0 0 0 0 0 0 255 255
0 0 0 0 0 0 255 255
The importance of edges
What will an image without edges look like?
f[n] (Image)
−→ No edges means no change in structure (flat region): $SSD_d[n] = 0 \ \forall d, n$ — why?
−→ Edges define structure.
−→ A sharp change in direction while moving along an edge is called a corner.
−→ Corner sharpness is a measure of how 'quickly' the edge direction changes around a corner.
[Figure: the red pixel shows a corner]
Flat, Edge or corner?
−→ The structure tensor can help us in classifying a pixel as belonging to a flat region, an edge or a
corner in the image
Flat region Edge Corner
−→ |S| = λmin λmax ≈ 0 in flat regions and at edges but is significant at corners.
−→ Tr (S) = λmin + λmax ≈ 0 in flat regions, is slightly significant at edges, but is very significant at
corners.
Name of corner detector | Measure of corner sharpness | Detection criterion
Simple corner detector  | λmin                        | λmin > τ
Rohr corner detector    | |S|                         | |S| > τ
Harris corner detector  | |S| / Tr(S) = λmin λmax / (λmin + λmax) | |S| / Tr(S) > τ

*Note: |S| / Tr(S) = λmin λmax / (λmin + λmax) ≈ λmin at non-corners.
−→ The Rohr corner detector suggests an alternative measure to λmin, to bypass the computations needed to get λmin, whereas the Harris corner detector suggests a computationally less expensive alternative to λmin.
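A sketch of the Harris measure built directly from the structure-tensor entries (the values of σ, ρ, the small epsilon guard and the L-shaped test image are arbitrary assumptions):

import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(f, sigma=1.0, rho=1.5):
    # |S| / Tr(S) at every pixel, from the structure-tensor entries.
    fx = gaussian_filter(f, sigma, order=(1, 0))
    fy = gaussian_filter(f, sigma, order=(0, 1))
    sxx = gaussian_filter(fx * fx, rho)
    sxy = gaussian_filter(fx * fy, rho)
    syy = gaussian_filter(fy * fy, rho)
    det = sxx * syy - sxy ** 2      # |S| = lambda_min * lambda_max
    tr = sxx + syy                  # Tr(S) = lambda_min + lambda_max
    return det / (tr + 1e-12)       # ~ lambda_min away from corners

f = np.zeros((16, 16))
f[8:, 8:] = 255.0                   # an L-shaped corner at (8, 8)
R = harris_response(f)
print(np.unravel_index(R.argmax(), R.shape))   # peak near the corner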
Results
Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering
UET (Lahore)
Lecture 9
Structure Tensor
What is necessary and what is supplementary?
−→ The necessary things are:
▶ Computing the structure tensor of an image f at index n.
▶ Computing the eigenvalues $\lambda_{min}$, $\lambda_{max}$ and corresponding (unit) eigenvectors $\hat{d}^{\#}$, $\hat{d}^*$ of the structure tensor.
▶ $\hat{d}^{\#}$ and $\hat{d}^*$ give the respective directions for the minimum and maximum change in the structure.
−→ For a point m0 , c0 in the parameter space there is a line y = m0 x + c0 in the image space. They
correspond to each other.
−→ We now fix a point xi , yi on the line in the image space and allow the parameters m0 and c0 to be
variables i.e. m and c. This gives a line in the parameter space, with slope −xi and vertical intercept
yi .
−→ For a point $(x_i, y_i)$ in the image space there is a line $c = -x_i m + y_i$ in the parameter space. They correspond to each other.
−→ Multiple points in the image space will form multiple lines in the parameter space.
−→ Colinear points in the image space will form lines in the parameter space, that intersect at a common
point (m0 , c0 ).
−→ This point of intersection gives the parameters of the line connecting the original colinear points.
Detecting a single line using the slope intercept form
−→ Make a 2D array representing the parameter space and set each value to 0. This 2D array is called
the accumulator array, denoted by A(m, c) over here.
−→ For each pixel on the original line in the image space draw the corresponding line in the accumulator array (parameter space), giving a vote of +1 to each cell it passes through.
−→ The cell with the highest votes represents the parameters of the original line in the image space.
Problem of range with slope intercept form
−→ Vertical lines in the image space correspond to a slope of ±∞ in the parameter space.
−→ These vertical lines cannot be marked (as a point) in the parameter space.
−→ Almost vertical lines in the image space correspond to a slope whose magnitude is very large in the
parameter space.
−→ Almost vertical lines cannot be marked (as a point) in the parameter space unless the accumulator
array is very large.
−→ The solution is an alternative parameterization of the line in image space, which is called the angle
distance form.
Angle distance form (θ − ρ form)
−→ The angle θ and distance ρ (as shown in the figure) completely define the line.
−→ The distance ρ is usually measured in number of pixels.
−→ Here we are taking the center of the image as the origin. Alternatively we could take the top
left corner of the image as the origin, and parameterize about the center xc = [xc yc ]T to get the
following equation in image space (for the same ρ and θ as shown in the figure).
(x − xc ) cos (θ) + (y − yc ) sin (θ) = ρ
Image space and parameter space
−→ For a point θ0 , ρ0 in the parameter space there is a line x cos (θ0 ) + y sin (θ0 ) = ρ0 in the image
space. They correspond to each other.
−→ No matter which line we draw in the image space, its corresponding point θ0 , ρ0 in the parameter
space will be such that 0 ≤ θ0 ≤ π and −ρmax ≤ ρ0 ≤ ρmax . ρmax = half diagonal length of image.
−→ The range of the accumulator array is such that 0 ≤ θ0 ≤ π and −ρmax ≤ ρ0 ≤ ρmax .
−→ For a point (x0 , y0 ) in the image space there is a sinusoid ρ = x0 cos (θ) + y0 sin (θ) in the parameter
space. The point and sinusoid correspond to each other.
−→ All lines have an angle parameter that is between 0 and π if the radius parameter is allowed to be
negative.
−→ Multiple points in the image space will form multiple sinusoids in the parameter space.
−→ Colinear points in the image space will form sinusoids in the parameter space, that intersect at a
common point (θ0 , ρ0 ).
−→ This point of intersection gives the parameters of the original line in the image space.
Detecting a single line using the angle distance form (Hough transform)
−→ Make an accumulator array whose dimensions θ and ρ vary from 0 to π and from −ρmax to ρmax
respectively. Set each value in the accumulator array to 0.
−→ For each pixel on the original line in the image space draw the corresponding sinusoid in the accumulator array (parameter space), giving a vote of +1 to each index it passes through.
−→ The index with the highest votes represents the parameters of the line.
−→ We can now use this index as the parameters (θ0 , ρ0 ) to draw the original line in the image space.
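A sketch of this accumulator-voting algorithm (the 180 angle bins, the integer ρ quantization and the horizontal test line are arbitrary choices):

import numpy as np

def hough_lines(edges):
    # Accumulator over theta in [0, pi) and rho in [-rho_max, rho_max];
    # every edge pixel votes along its sinusoid rho = x cos t + y sin t.
    H, W = edges.shape
    thetas = np.linspace(0, np.pi, 180, endpoint=False)
    rho_max = int(np.ceil(np.hypot(H, W)))
    acc = np.zeros((len(thetas), 2 * rho_max + 1), dtype=int)
    ys, xs = np.nonzero(edges)
    for t, theta in enumerate(thetas):
        rhos = np.rint(xs * np.cos(theta) + ys * np.sin(theta)).astype(int)
        np.add.at(acc, (t, rhos + rho_max), 1)   # +1 vote per cell
    t0, r0 = np.unravel_index(acc.argmax(), acc.shape)
    return thetas[t0], r0 - rho_max              # (theta_0, rho_0)

edges = np.zeros((32, 32))
edges[10, :] = 1                                 # the horizontal line y = 10
theta0, rho0 = hough_lines(edges)
print(np.degrees(theta0), rho0)                  # ~ 90 degrees, rho ~ 10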
Results
−→ This is a computationally expensive algorithm. Can we do better?
Image Processing and Computer Vision
CMPE-443/674
Department of Computer Engineering
UET (Lahore)
Lecture 10
Detecting circles in images
Parameters of a circle
−→ The parameters of a circle are (three) numbers which completely define that circle. The most popular
parameters of a circle are its two center co-ordinates and its radius.
Center radius form (c − r form)
−→ The center c = [a b]T = (a, b) and radius r completely define the circle
Image space and parameter space (c − r form)
−→ For a point (a0 , b0 ), r0 in the parameter space there is a circle r02 = (x − a0 )2 + (y − b0 )2 in the image
space. They correspond to each other.
−→ The circle r02 = (x − a0 )2 + (y − b0 )2 is said to be parameterized in the center-radius form (c − r
form).
−→ We now fix a point xi , yi on the circle in the image space and allow the parameters (a0 , b0 ) to be
variables i.e. (a, b). This gives a circle in the parameter space, (with center (xi , yi ) and radius r0 ).
−→ For a point $(x_i, y_i)$ in the image space there is a circle $r_0^2 = (a - x_i)^2 + (b - y_i)^2$ in the parameter space. They correspond to each other.
−→ Multiple points in the image space will form multiple circles in the parameter space.
$(x_i, y_i) \mapsto r_0^2 = (a - x_i)^2 + (b - y_i)^2, \quad 1 \le i \le n$
−→ Concyclic points (with known radius r0 ) in the image space will form circles in the parameter space,
that intersect at a common point (a0 , b0 ).
−→ This point of intersection gives the center of the (orange) circle connecting the original concyclic
points.
Detecting a single circle using the center radius form
−→ Choose the radius r0 of the circle that is to be detected.
−→ Make an accumulator array A(a, b) representing the parameter space for the center of the circle and
set each value to 0.
−→ For each pixel $(x_i, y_i)$ on the edges in the image space draw the corresponding circle $r_0^2 = (a - x_i)^2 + (b - y_i)^2$ in the accumulator array (parameter space), giving a vote of +1 to each cell it passes through.
−→ The cell with the highest votes represents the center (a0 , b0 ) of the original circle in the image space.
−→ $\begin{bmatrix} a_0 \\ b_0 \end{bmatrix} = \begin{bmatrix} x_i \\ y_i \end{bmatrix} \pm r_0\, \widehat{\nabla f}(x_i, y_i)$
−→ For each point xi , yi on the edges in the image space mark two points on the corresponding circle in
the parameter space. These two points are given by the equation on the previous slide
−→ Concyclic points in the image space will vote for a common point (a0 , b0 ) in the parameter space.
−→ This common point is the center of the (orange) circle in the image space
−→ Instead of drawing whole circles in the parameter space we only mark two points on that circle
(where we 'expect' the maximum to occur)
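A sketch of this two-votes-per-edge-pixel scheme for a known radius r0 (the use of gaussian_filter for the gradient estimates and the coordinate conventions below are assumptions):

import numpy as np
from scipy.ndimage import gaussian_filter

def circle_center(edges, f, r0):
    # Each edge pixel votes for the two candidate centers located r0 away
    # along +/- the unit gradient direction, instead of a whole circle.
    fy = gaussian_filter(f, 1.0, order=(1, 0))     # df/dy (rows)
    fx = gaussian_filter(f, 1.0, order=(0, 1))     # df/dx (columns)
    acc = np.zeros(edges.shape, dtype=int)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        g = np.array([fx[y, x], fy[y, x]])
        norm = np.linalg.norm(g)
        if norm == 0:
            continue
        for s in (1, -1):                          # two votes per edge pixel
            a, b = np.rint(np.array([x, y]) + s * r0 * g / norm).astype(int)
            if 0 <= b < acc.shape[0] and 0 <= a < acc.shape[1]:
                acc[b, a] += 1
    b0, a0 = np.unravel_index(acc.argmax(), acc.shape)
    return a0, b0                                  # detected center (a0, b0)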
−→ Can we make the line detection algorithm using the Hough transform more efficient in a similar
way?
−→ For each point xi , yi on the edges in the image space mark one point on the corresponding sinusoid
in the parameter space. This point is given by
(θ0 , ρ0 ), where
θa = ∠∇f (xi , yi ), θb = ∠∇f (xi , yi ) − π : 0 ≤ θa ≤ 2π, 0 ≤ θb ≤ 2π
θ0 = min(θa , θb )
ρ0 = xi cos (θ0 ) + yi sin (θ0 )
−→ Colinear points in the image space will vote for a common point (θ0 , ρ0 ) in the parameter space.
−→ This common point contains the parameters of the line in the image space.
−→ Instead of drawing whole sinusoids in the parameter space we only mark one point on that sinusoid
(where we 'expect' the maximum to occur)
Generalized Hough Transform
−→ The computer is first trained to recognize
the given object by learning the ϕ table of
the object.
Training the computer (learning the ϕ table)
1. The interval 0 to 2π is divided into n parts (to represent the different possible edge directions):
$$\phi_k = k\,\frac{2\pi}{n}$$
2. Compute the centroid (xc , yc ) of the object.
3. Insert the data $\bar{r}_n = (r_n, \alpha_n)$ (shown in red color) of each pixel n on the boundary of the object into row i of the ϕ table (see figure). The correct row i is decided (based on the edge direction of the pixel n) using
$$i = \operatorname*{argmin}_k \,|\angle(\nabla f[n]) - \phi_k|, \quad 0 \le \angle(\nabla f[n]) \le 2\pi$$
−→ The computer is now tested to recognize
the given object, using the object’s ϕ table.
Testing the computer (using the ϕ table)
1. For each pixel n on the edges in the image the correct rows u and v in the ϕ table are decided using
$$u = \operatorname*{argmin}_k \,|\angle(+\nabla f[n]) - \phi_k|, \quad v = \operatorname*{argmin}_k \,|\angle(-\nabla f[n]) - \phi_k|$$
3. Once again consider the image in Problem 1. One unit of length = 10 pixels. An object in the image
space with centroid (1.8, 2.2) is to be detected using the generalized Hough transform. The interval
from 0 to 2π is divided into 8 portions for the ϕ table.
(a) Which row in the ϕ table corresponding to the pixel P with a value of 164 will be updated?
(b) What will get stored in this row?