Classical Computer Vision - Session 1

CLASSICAL COMPUTER VISION
MEET THE INSTRUCTOR

Nezar Ahmed
Machine Learning Lead
Synapse Analytics
Master’s Student
Computer Communication and Engineering
Cairo University
AI Instructor
ITI / Epsilon AI / AMIT
DISCLAIMER AND ACKNOWLEDGMENT
Some of the slides are taken from:
Mayada Hadhoud, PhD, Computer Engineering Department, Cairo University
Computer Vision: Foundations and Applications course, Stanford Vision and Learning Lab
Various courses, articles, and tutorials on computer vision, such as TutorialsPoint, PyImageSearch, Analytics Vidhya, Medium, and Towards Data Science.
CLASSICAL COMPUTER VISION
What is computer vision?
Computer vision is a branch of AI that enables computers and systems to process, analyze, and interpret visual data such as images, videos, point clouds, and other visual inputs, much the same way humans do. The actions we take as humans after perceiving a visual input are mimicked within our software to perform certain actions or recommendations based on the information perceived. The main difference between machines and us is that we train the machine to perform certain tasks, so it does not have general intelligence like humans. However, in the specific tasks it is trained on, it can outperform humans in the accuracy and speed of its decisions, as well as in the scalability of applying such use cases.
CLASSICAL COMPUTER VISION
What do we mean by the word classical (traditional)?
The word classical (traditional) means that there are no neural networks involved in the process of feature extraction and decision making; only image processing is done to extract features such as colors, edges, corners, and objects. These features are human engineered, where you have to choose which features to look for, which is the opposite of deep learning, which learns to extract the important features by itself to differentiate between images, as in classification tasks.
What is the difference between computer vision and image processing?
Image processing is a subset of computer vision, where computer vision uses image processing algorithms to emulate human vision.
IMAGE PROCESSING VS COMPUTER VISION
Image Processing: Apply mathematical functions and transformations to an image.
Input ⇒ Image
Output ⇒ Image
Computer Vision: Emulate human vision.
Input ⇒ Image or Video
Output ⇒ Decision (Classification, Detection, Tracking, Segmentation)

If the goal is to enhance an image for future use, it is image processing.
If the goal is to recognize an object, it is computer vision.
COMPUTER VISION SYSTEM
Image Acquisition ⇒ Preprocessing ⇒ Feature Extraction ⇒ Decision / Task ⇒ Postprocessing

Image Acquisition: an image, multiple images, or a video.
Preprocessing: image enhancements and transformations such as noise removal, contrast enhancement, normalization, cropping, and shearing.
Feature Extraction: edge detection, regions, interest points, textures, geometrical shapes, and color patterns.
Decision / Task: classification, detection, tracking, matching, and segmentation.
Postprocessing: the algorithm that is built upon the decision to apply countless applications.
COMPUTER VISION APPLICATIONS
The applications of computer vision are everywhere around us, such as:
● Face detection
● Retail applications like video analytics
● Biometrics like iris, face, and fingerprint recognition
● Optical character recognition
● Vision-based interaction games like XBOX Kinect
● Biomedical applications like cancer detection and X-ray analysis
● Self-driving cars
● Object counting
● Parking occupancy detection
● Flow analysis like traffic flow analysis
● Industrial applications like defect inspection and reading barcodes
● Agriculture applications like crop and yield monitoring
IMAGE DATA STRUCTURE
There are several image data structures that can be used with images:
● Matrices
● Chains (as chain codes)
● Pyramids (as Matrix pyramids, Tree pyramid, Gaussian pyramid, Laplacian Pyramid)

The most common data structure used with images is the matrix, where we can perform matrix operations to change pixel values based on the processing needed, as we will see along the course.

Mostly we use a NumPy array to represent the image in Python. To represent an image in native Python, we can think of it as a list of lists (a 2-D or 3-D list).
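As a minimal sketch (assuming NumPy and OpenCV are installed; 'image.png' is a placeholder path), this is how the two representations look in Python:

import cv2
import numpy as np

img = cv2.imread('image.png')            # loads the image as a NumPy array (H x W x 3, BGR order)
print(type(img), img.shape, img.dtype)   # e.g. <class 'numpy.ndarray'> (480, 640, 3) uint8

nested = img.tolist()                    # the same image as native Python nested lists (3-D list)
print(nested[0][0])                      # [B, G, R] values of the top-left pixel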
IMAGE TYPES
Images are made of units called pixels, and these pixels, like any data in the computer, are ultimately represented in 0s and 1s. This raises the question of how we can represent images just by representing all of their units in 0s and 1s.
Image Types:
Binary image: an image where all of its pixels use the most basic representation, 0s and 1s, where 0 means black and 1 means white (only 1 bit/pixel).
Grayscale image: an image where the pixels are represented as intensities, where the lowest intensity represents black and the highest represents white. This can be done by representing each pixel with 1 byte (8 bits), so it can take values from 0 to 2^8 - 1 (255).
RGB image: an image where the pixels are represented as colors, which introduces the idea of channels. Each pixel is represented by 3 numbers, each from 0 to 255, noting that the final color is the mixture of the 3 colors red, green, and blue.
IMAGE TYPES
The image types are as shown:
IMAGE TYPES
Example: How is a pixel represented in a binary image?

Note: Normally 1 means white and 0 means black, but in this example they are switched so that the letter C is represented by 1s.
IMAGE TYPES
Example: How is a pixel represented in a grayscale image?
IMAGE TYPES
Example: How is a pixel represented in an RGB image?
CONVERSION OF RGB TO GRAYSCALE
Conversion of RGB to grayscale can be done in 3 different ways:
Lightness method
The average of the highest and lowest channel values: (max(R, G, B) + min(R, G, B)) / 2.

Average method
The average of the 3 components: (R + G + B) / 3.

Luminosity method (the best one)

The luminosity method is the best method as it is based on research on human vision. The research proposes that our eyes react to each color in a different manner. Specifically, our eyes are most sensitive to green, then to red, and finally to blue, so the green channel gets the largest weight.
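A small sketch of the three conversions with NumPy (the luminosity weights 0.299/0.587/0.114 are the common BT.601 choice; 'image.png' is a placeholder path):

import cv2
import numpy as np

img = cv2.imread('image.png')                            # OpenCV loads channels in BGR order
b, g, r = [img[..., i].astype(np.float64) for i in range(3)]

lightness  = (np.maximum(np.maximum(r, g), b) + np.minimum(np.minimum(r, g), b)) / 2
average    = (r + g + b) / 3
luminosity = 0.299 * r + 0.587 * g + 0.114 * b           # weighted toward green, then red, then blue
# cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) uses this luminosity-style weighting internally.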
CONVERSION OF RGB TO GRAYSCALE
IMAGE RESOLUTION VS. IMAGE SIZE
Image resolution is the number of pixels (dots) represented in a linear inch (ppi on screen or dpi on printed paper); hence, when we say an image has a resolution of 72, it means that every 1 inch of the image has 72 pixels to represent that inch. The higher the resolution of the image, the better the quality of the image.
On the other hand, the image size means the height and the width of the image itself.
For example, we can have 4 screens or TVs with the same display size (40 inch), yet each of them provides a different resolution, such as HD (1280x720), FHD (1920x1080), 2K (2560x1440), and 4K (3840x2160). This means that for the same screen size, we have more pixels, so every inch of the screen will have more pixels per inch (ppi), which in turn means a smaller pixel size to fit in one inch.
But wait, why is a smaller pixel size better, and why does it give us better quality? Let's illustrate with an example.
IMAGE RESOLUTION VS. IMAGE SIZE

Let us now assume we have 5x5 pixels as shown ⇒

(Note: Your screen is made of such pixels, and they can be seen if you look closely at the screen.)

If we try to make a circle from these 5x5 pixels ⇒

This is not the greatest circle, but it is the best we can do with only 25 pixels. Let us try the same thing again with 10x10 pixels of the same pixel size.
IMAGE RESOLUTION VS. IMAGE SIZE
By using 4 times the space we had, with the same pixel size, we were able to get a better looking circle. However, it unfortunately took more space, which in the example of the screen means a bigger screen size. But what if we want to put more pixels in the same screen size?

The answer is simple: make the pixels smaller.

This is the intuition behind the resolution term.

What if we made it 20x20 pixels but with the same screen size? This means that we will make the pixels smaller by half.
IMAGE RESOLUTION VS. IMAGE SIZE
You can see now that with the same size but with more resolution (smaller pixel size, higher ppi, or cramming more pixels into the same area), we see a better quality circle.
RESOLUTION AND SIZE IN COMPUTER VISION
In computer vision, the case is a little bit tweaked: the resolution and the size are used to express the number of pixels in the image, while not expressing at all the quality of the image (the information inside the image).
Example:
If we had an image with a resolution (size) of 600x800 and we resized it in Photoshop to 3840x2160 (4K), does this mean that we now have a higher quality image or more information than the original image? Of course not.

Hence, simply put, image size or resolution in computer vision represents the number of pixels, not the real quality; the quality comes from the resolution of the camera the image was taken with (noting that no compression techniques are applied to it).
COLOR MODELS/SPACES
The color model (space) is the mathematical representation of colors as numbers in a coordinate system, where each number represents a characteristic of a color based on the model used.
Note: Color models and spaces are different: sRGB and AdobeRGB are color spaces based on the RGB color model. In our course, we will deal with color models mainly, but we will use the word space interchangeably with it.
Color Models:
● RGB
● CMYK
● HSV / HSL
● L*a*b
COLOR MODELS/SPACES
RGB
The RGB model is a triple of numbers expressing the red, green, and blue (primary color) combination of a pixel. The primary colors at full intensities (255, 255, 255) give white, and at zero intensities (0, 0, 0) give black.
This color model is an additive model: the more of each color we add, the brighter the pixel becomes.
Since each channel ranges from 0 ⇒ 255, we have a combination of 256x256x256 = 16,777,216 (16M) colors that can be created.
COLOR MODELS/SPACES
CMY(K)
The CMY(K) model is a triple (quadruple) of numbers expressing the cyan, magenta, and yellow (secondary color) combination of a pixel, in addition to the black (key) component. It is mainly used in color print production and is derived from the RGB model.
It is a subtractive model: cyan is white minus red (cyan = 255 - red), and similarly yellow is the complement of blue and magenta the complement of green.
COLOR MODELS/SPACES
HSV/HSL
The HSV color model transforms the RGB model from the cube model to a cylindrical one. As we have seen, in RGB the lightness (white) of a color is an additive combination of red, green, and blue, while in HSV or HSL it has its own dimension.
Hue represents the color itself as an angle, where 0° represents red, 120° represents green, and 240° represents blue.
Saturation represents the purity (chroma) of the color: 0 is the white color, while 1 (100%) is the fully saturated pure color.
Value / Lightness represents the luminance of the color: 0 means black (no light), while 1 represents full light.
Note: 1 in value means the color is fully shining, while 1 in lightness means the color is white due to too much light. To get a fully shining color in HSL, we set lightness to ½.
COLOR MODELS/SPACES
COLOR MODELS/SPACES
L*a*b
The Lab color model is a triple of numbers that represents the following:
L is the lightness, where the highest value is white and the lowest value is black, while gray is at the center (similar to grayscale).
a originates from the center of L, where one side represents green and the other represents red.
b originates from the center of L, where one side represents blue and the other represents yellow.

L*a*b is not as intuitive as RGB, CMYK, and HSV/HSL; however, it is heavily used in many computer vision applications.
COLOR MODELS/SPACES
COLOR MODELS/SPACES
The question that should be on your mind right now is: what is the importance of the color models other than RGB, which we all know?
The CMYK model is not important in the computer vision field; however, it is the main model used in printers, as RGB colors result from transmitted light while CMYK colors result from reflected light, as in printers, which is why we need inks of those colors. (If you didn't get this paragraph, no problem at all.)
HSV/HSL is mainly used when we want to track a specific color range, as it is far easier to define a range using HSV than using RGB (a sketch of this follows below).
The L*a*b color model is mainly used to overcome various lighting condition problems: in RGB and HSV it has no meaning to measure the distance between two colors, as the difference in colors is not perceptual. Unlike RGB and HSV/HSL, the difference between colors (for example, the Euclidean distance) is meaningful in L*a*b; if the difference between 2 colors is small, this means that these 2 colors are actually close to each other perceptually.
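A minimal sketch of color range tracking in HSV with OpenCV (the blue range below and the image path are illustrative placeholders):

import cv2
import numpy as np

img = cv2.imread('image.png')                      # BGR image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)         # convert to HSV

lower_blue = np.array([100, 50, 50])               # example lower bound (H, S, V); H in OpenCV is 0-179
upper_blue = np.array([130, 255, 255])             # example upper bound

mask = cv2.inRange(hsv, lower_blue, upper_blue)    # white where the pixel falls inside the range
tracked = cv2.bitwise_and(img, img, mask=mask)     # keep only the tracked color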
HISTOGRAM
A histogram is a bar chart (horizontal or vertical) showing the frequency of occurrence of an event, where the x-axis has the event (value) you want to count while the y-axis contains the frequency.
HISTOGRAM
What is a histogram in images?
The histogram of an image shows the frequency of pixel intensity values, where the x-axis represents the pixel value while the y-axis represents the frequency of that value.
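As a small sketch (assuming a grayscale image at a placeholder path), the histogram can be computed with either NumPy or OpenCV:

import cv2
import numpy as np

gray = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE)
hist_np, bin_edges = np.histogram(gray.ravel(), bins=256, range=(0, 256))
hist_cv = cv2.calcHist([gray], [0], None, [256], [0, 256])   # same counts, shape (256, 1)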

Grayscale RGB (Colored)


HISTOGRAM
Applications of histograms
● Analyzing images, as many properties can be studied from the histogram.
● The brightness of an image can be adjusted by knowing the details of its histogram.
● The contrast of an image can be computed and adjusted from the histogram.
● Can be used for histogram equalization (will be explained later).
● Can be used in thresholding.

Let us now come across the two terminologies of brightness and contrast.
BRIGHTNESS
Brightness is a relative term and a subjective property, where a source reflecting more light leads to an increase in the pixel intensities.
Note: Subjective means based on the personal perspective or preference of a person; some people can say that an image is bright while others see it as dark.

Does a pixel of intensity 200 in a grayscale image mean a bright pixel?

As we have said, brightness is relative, hence we should compare it to another pixel intensity: if the other is lower then it is darker, and vice versa.
How can we calculate the brightness of a pixel in a colored image?
We average the pixel intensities of the RGB values, hence brightness = (R + G + B) / 3.
When the brightness is decreased, the color appears dull, and when brightness increases, the color is clearer.
BRIGHTNESS
How can we make an image brighter?
We add the same value to all of its pixels (in the case of RGB, to all channels) so the image appears brighter.
Note: If a value exceeds 255 after the addition, you should clip it to 255.
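A minimal NumPy sketch of brightening (the added value 50 and the image path are examples):

import cv2
import numpy as np

img = cv2.imread('image.png')
brighter = np.clip(img.astype(np.int16) + 50, 0, 255).astype(np.uint8)   # add 50 everywhere, clip at 255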
CONTRAST
Contrast is the amount of differentiation in colors (in RGB or grayscale) that exists between various image features, where a higher contrast level means higher color or grayscale variation in the image.
Mathematically, contrast is the difference between the maximum and minimum intensity values in an image (Contrast = max. pixel intensity - min. pixel intensity).

What is the contrast of this image?

Since all pixels' intensities are the same, the contrast is 0.

Note: To calculate it for a colored image, we either convert it to grayscale and then calculate the contrast, or calculate each channel on its own and report a contrast per channel, or even average them to get a single-valued contrast.
CONTRAST
Do you think that the difference between maximum and minimum values is a good representation of the contrast of an image?
This methodology is not good in the case of outliers, but what do outliers mean in an image? Outliers here means that we have some values with low frequencies at the min/max ends, which can give a wrong quantification of contrast.
Example:
Assume an image where one of its pixels is black and one of its pixels is white; this leads to an image with full contrast regardless of all other pixels. Hence we should take the histogram of the image into consideration and ignore a certain percentage of pixels to account for outliers.
One of the famous approaches to use as an indicator of contrast is the root mean square method, which calculates the standard deviation of the histogram (Contrast = 2σ).
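A tiny sketch of the two contrast measures on a grayscale image (path is a placeholder; the 2σ definition follows the slide):

import cv2

gray = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE)
minmax_contrast = int(gray.max()) - int(gray.min())     # sensitive to outlier pixels
rms_contrast = 2 * float(gray.std())                    # 2 * standard deviation of the intensities (2σ)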
BRIGHTNESS AND CONTRAST
To change the brightness and contrast of an image, you can simply apply the alpha-beta rule on its array of pixels, as suggested by OpenCV:

new image = α * old image + β

α: gain parameter which controls the contrast level.

β: bias parameter which controls the brightness level.
Notes:
● Theoretically α ranges over 0 < α < infinity, where at α = 1 there is no change in contrast.
● Theoretically β ranges over 0 <= β < 2^bpp, where at β = 0 there is no change in brightness.
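A minimal OpenCV sketch of the alpha-beta rule (the alpha and beta values here are arbitrary examples):

import cv2

img = cv2.imread('image.png')                             # placeholder path
adjusted = cv2.convertScaleAbs(img, alpha=1.5, beta=30)   # new = clip(1.5 * old + 30) as 8-bit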
HISTOGRAM SLIDING
Histogram sliding is shifting the whole histogram to the right or the left as shown below.

Can you guess what is the importance of histogram sliding?


HISTOGRAM SLIDING
Histogram sliding is controlling the brightness of the image.
HISTOGRAM STRETCHING
Histogram stretching means that we fully stretch our histogram to both sides as shown:

Can you guess what is the importance of histogram stretching?


HISTOGRAM STRETCHING
Histogram stretching is used to increase the contrast of an image
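A small sketch of min-max contrast stretching with NumPy (assuming an 8-bit grayscale array gray whose maximum is larger than its minimum):

import numpy as np

lo, hi = int(gray.min()), int(gray.max())
stretched = ((gray.astype(np.float64) - lo) * 255.0 / (hi - lo)).astype(np.uint8)   # map [lo, hi] onto [0, 255]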
PMF AND CDF
PMF is the probability mass function, which gives the probability of occurrence of each value relative to the total number of occurrences, while CDF is the cumulative distribution function, which calculates the cumulative sum of the PMF.
Assuming we have the following 5x5 image with 3 bpp (values 0⇒7):

2 5 2 1 3
3 4 1 0 0
2 5 7 5 7
3 2 7 7 2
1 2 3 1 3

Let us now calculate the PMF of such an image.
PMF AND CDF
Pixel Intensity   Occurrence Count   PMF    CDF
0                 2                  2/25   2/25
1                 4                  4/25   6/25
2                 6                  6/25   12/25
3                 5                  5/25   17/25
4                 1                  1/25   18/25
5                 3                  3/25   21/25
6                 0                  0/25   21/25
7                 4                  4/25   25/25
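A minimal NumPy sketch reproducing this table for the 5x5 example image:

import numpy as np

img = np.array([[2, 5, 2, 1, 3],
                [3, 4, 1, 0, 0],
                [2, 5, 7, 5, 7],
                [3, 2, 7, 7, 2],
                [1, 2, 3, 1, 3]])
levels = 8                                                # 3 bpp ⇒ 8 intensity levels
counts, _ = np.histogram(img, bins=levels, range=(0, levels))   # occurrence count per intensity
pmf = counts / img.size                                   # probability mass function
cdf = np.cumsum(pmf)                                      # cumulative distribution function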
PMF AND CDF
The PMF is not monotonically increasing, while the CDF is monotonically increasing; if we try to graph any CDF, it will look something like this:

As we can see, the CDF is monotonically increasing, which is important in our next topic, histogram equalization. PMF and CDF are the initial steps of histogram equalization, where we will then use the obtained CDF to equalize.
HISTOGRAM EQUALIZATION
Histogram equalization is mainly a practice to enhance the contrast of an image by approximately flattening the histogram over all values.
HISTOGRAM EQUALIZATION
Equalization algorithm steps (the example is 3 bpp):
1- Calculate the PMF of the image.
2- Calculate the CDF from the PMF of the image.
3- Multiply the CDF of each pixel by the number of levels we have - 1 (mostly 255).
4- Replace each pixel intensity by the floor of its corresponding new value from step 3.
A code sketch of these steps is shown below.
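Continuing the PMF/CDF sketch above (reusing img, cdf, and levels from it), the remaining steps are:

mapping = np.floor(cdf * (levels - 1)).astype(np.uint8)   # step 3: scale the CDF by (number of levels - 1)
equalized = mapping[img]                                  # step 4: replace each pixel by its floored new value
# For 8-bit images, cv2.equalizeHist(gray) applies the same idea with 256 levels.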
HISTOGRAM EQUALIZATION
ADAPTIVE HISTOGRAM EQUALIZATION
Adaptive histogram equalization means that rather than concentrating on the image as a whole (global), we concentrate on each part on its own so that we can improve the local contrast rather than the global one. This is done by gridding the image (MxN tiles) and applying the same histogram equalization as before on each tile. The neighboring tiles are then combined using bilinear interpolation to remove the artificial boundaries.
CLAHE
CLAHE stands for Contrast Limited Adaptive Histogram Equalization, which is similar to adaptive equalization except that it limits the contrast of each tile to a certain value to avoid over-amplification of contrast. The excess values over the limit are clipped and redistributed over the remaining histogram values that are not over the limit.
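A minimal OpenCV sketch of CLAHE (the clip limit and tile grid size are example values):

import cv2

gray = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE)         # placeholder path
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # limit contrast per tile of an 8x8 grid
enhanced = clahe.apply(gray)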
CLAHE
GEOMETRIC TRANSFORMATION
Geometric transformations are essential in all computer vision systems. We will concentrate here on spatial affine transformations.
Types of affine transformations:
● Translation
● Resizing (magnification)
● Rotation
● Shearing

We will come across the homography transformation as well later in the session.
GEOMETRIC TRANSFORMATION
The affine transformation in general is any transformation that preserves collinearity and ratios of distances.
Preserving collinearity means that all points lying on a line initially still lie on a line after the transformation.
Preserving ratios of distances means that the midpoint of a line segment is still the midpoint after the transformation.

Let us assume we have (x, y) as the coordinates of a pixel before transformation, and (u, v) as the coordinates of the same pixel after transformation:
u = c11x + c12y + c13
v = c21x + c22y + c23
Can you guess, for translation, resizing, shearing, and rotation, which coefficients affect them?
GEOMETRIC TRANSFORMATION

For translation: c13 and c23 are responsible (noting that c11 = c22 = 1 and c12 = c21 = 0).
For resizing: c11 and c22 are responsible (noting that c12 = c21 = c13 = c23 = 0).
For shearing and rotation: all of the coefficients are responsible.
GEOMETRIC TRANSFORMATION
The previous equation can be represented as an affine matrix as shown:
Explained visually (important): link
GEOMETRIC TRANSFORMATION
Examples of combined transformations:
Translate to origin ⇒ Rotate by 23° ⇒ Translate back to the original location (a code sketch of this composition follows).
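A minimal OpenCV sketch of the combined example, rotating about the image center by 23° (cv2.getRotationMatrix2D builds the translate-rotate-translate composition for us):

import cv2

img = cv2.imread('image.png')                         # placeholder path
h, w = img.shape[:2]
M = cv2.getRotationMatrix2D(center=(w / 2, h / 2), angle=23, scale=1.0)   # 2x3 affine matrix
rotated = cv2.warpAffine(img, M, (w, h))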
HOMOGRAPHY
Homography (perspective transform) is a way to transform an image through matrix multiplication, where we map the points from one image to the corresponding points in another image using a 3x3 homography matrix.
Consider the following 2 images, where the red dot represents the same physical point in both images:
HOMOGRAPHY
We can notice that the coordinates in both images are not the same, as the book was captured from a different perspective (angle) each time. So if we have a matrix that can be multiplied such that the coordinates from the first image are mapped to the coordinates of the second image, we can say that this matrix is the homography matrix.

H here is the homography matrix, but the question is: how is the homography matrix constructed? To answer this question we need to understand 2 terminologies:
Homogeneous Coordinates and Projective Space
HOMOGRAPHY
Homogeneous coordinates are a system of coordinates used in the projective space, where you can think of the projective space as a plane located at Z = 1 in 3D space.
The coordinates we are used to dealing with are the Cartesian coordinates (xc, yc), while the homogeneous coordinates are (xh, yh, wh), where xc = xh/wh and yc = yh/wh.
Note: the Cartesian coordinates are the same as the homogeneous ones when wh = 1.
Example:
The point (0.62, 1.39) is the same as (0.62, 1.39, 1),
or (1.24, 2.78, 2),
or (0.31, 0.695, 0.5),
or even (620, 1390, 1000).

Returning to the same link again for visualizing projection: link


HOMOGRAPHY
HOMOGRAPHY
HOMOGRAPHY
Hence we can notice that the homography matrix is similar to what we have done in the affine transformation, but this time with the last row changed so that we work in a different space called the projective space.
In computer graphics and computer vision, homogeneous coordinates in the projective space offer a few advantages compared to the Cartesian coordinate system in Euclidean space. One advantage is that they allow us to combine image transformations like rotation and scaling with translation as one matrix multiplication instead of a matrix multiplication followed by a vector addition. It means that we can then chain complex matrices into one single transformation matrix, which helps computers perform fewer calculations.
Hence now we can say that the homography matrix is calculated by accounting for all the transformations done, including the projection to the projective space.
Note: This illustration is simplified, as there is a concept called the pinhole camera model that we use to calculate the intrinsic and extrinsic matrices to project the 3D space to the 2D space.
HOMOGRAPHY
Applications of homography:
● Augmented reality
● Perspective correction (very important)
● Stitching of images ⇒
A code sketch of estimating and applying a homography is shown below.
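A minimal OpenCV sketch (the four point correspondences and the image path are made-up placeholders; in practice the points come from matched features or known corners):

import cv2
import numpy as np

src_pts = np.float32([[56, 65], [368, 52], [28, 387], [389, 390]])   # points in image 1
dst_pts = np.float32([[0, 0], [300, 0], [0, 300], [300, 300]])       # where they should land

H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)      # 3x3 homography matrix
img1 = cv2.imread('image1.png')
warped = cv2.warpPerspective(img1, H, (300, 300))                    # perspective-corrected view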
LAB 1
CONVOLUTION
Convolution is the most important topic in image processing: it is a mathematical operation that is done between 2 arrays of numbers. The arrays can be 1D, 2D, or 3D; however, for images they are mostly 2D arrays, where one array is the image pixels and the other array is called the kernel (filter). At each position, the kernel and the image patch it covers are of the same size.

Note: Filters can be thought of as stacks of kernels, but we will use the two terms interchangeably.
The convolution operation in image processing can be used for many functionalities like computing derivatives, detecting edges, applying blurs, noise reduction, etc.
The kernel values (matrix) and their pattern define the functionality performed, and the kernel is mostly nxn where n is an odd number: (3x3), (5x5), (7x7), (9x9), (11x11), etc.
CONVOLUTION
The convolution operator is *, so we can say output_image = input_image * kernel.
Basic Example

(0x2) + (-1x2) + (0x2) + (-1x2) + (5x3) + (-1x2) + (0x2) + (-1x2) + (0x2) = 7

How to perform the convolution operation on an image (a code sketch follows):

● Slide the kernel over the image.
● Multiply the corresponding elements, then add them, as done in the example.
● Repeat the procedure until all values of the output image are calculated.
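A minimal NumPy sketch of 2D convolution with zero padding and stride 1 (a plain nested-loop version for clarity, not an optimized one):

import numpy as np

def convolve2d(image, kernel):
    # Flip the kernel (true convolution; without the flip this would be correlation,
    # which is what cv2.filter2D actually computes).
    kernel = np.flipud(np.fliplr(kernel))
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(np.float64), ((ph, ph), (pw, pw)))   # zero padding, "same" output size
    out = np.zeros(image.shape, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])   # the kernel from the example above
image = np.full((5, 5), 2); image[2, 2] = 3                 # patch of 2s with a 3 in the middle
print(convolve2d(image, sharpen)[2, 2])                     # 7.0, matching the worked example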
CONVOLUTION
https://commons.wikimedia.org/wiki/File:2D_Convolution_Animation.gif
PADDING
Padding values can be as shown in the previous slides or just 0s as shown below:
https://theano-pymc.readthedocs.io/en/latest/tutorial/conv_arithmetic.html
STRIDES
The stride is the step size used to move the filter horizontally and vertically along the image.
https://miro.medium.com/max/1400/1*[email protected]
PADDING/STRIDES AND OUTPUT SIZES
There are 2 famous types of padding called valid and same padding.
https://i.stack.imgur.com/0rs9l.gif

Valid Same
PADDING/STRIDES AND OUTPUT SIZES
We have noticed in the previous 2 slides that both padding and strides affect the output size of the convolution; let us formulate this output size (see the relation below).

Note: Padding values can be 0s or a resemblance of the neighbouring values, as we have shown before.
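For a square input of size W, kernel size K, padding P, and stride S, the standard relation is:

output size = floor((W - K + 2P) / S) + 1

With 'valid' padding P = 0; with 'same' padding and stride 1, P = (K - 1) / 2 so the output size equals the input size.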
CONVOLUTION ON COLORED IMAGES
Everything we have said applies in the same way to RGB images with 2D convolution, as shown below:
CONVOLUTION ON COLORED IMAGES
Everything we have said applies in the same way to RGB images, noting that the values of the 3 channels after convolution are added together in 3D convolution, as shown below:
KERNELS (FILTERS)
Filters are mainly of two types:
● Low pass filters, which are commonly smoothing filters and noise reduction filters.
○ All values are positive.
○ The sum of all values is equal to 1.
○ The edges are reduced due to the blurring effect.
○ As the size of the filter grows, more smoothing takes place.
● High pass filters, which are commonly edge detection filters and sharpening filters.
○ Values are both positive and negative.
○ The sum of all values is equal to 0.
○ The edges are stressed due to the derivative effect.
○ As the size of the filter grows, more edge content is stressed.
SMOOTHING (BLURRING) FILTERS
Smoothing means that edges are not sharp and clear but blurred, so that the transition from one color to another is very smooth. A blurring effect can be sensed when we zoom in too much on a photo; however, that is not true blurring, but it gives the same sense.
Common filter types (assuming a 3x3 filter, but it can be of any size):

Mean Filter:
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9

Weighted Average Filter:
1/18  1/18  1/18
1/18 10/18  1/18
1/18  1/18  1/18

Gaussian Filter:
1/16 2/16 1/16
2/16 4/16 2/16
1/16 2/16 1/16
SMOOTHING (BLURRING) FILTERS
By increasing the filter size, what happens?

3x3 5x5 7x7 9x9


SMOOTHING (BLURRING) FILTERS
What is the difference between the 3 types of smoothing filters?
As we can notice, all 3 filters apply blurring and reduce noise; however, the weighted average and Gaussian filters concentrate more on the center pixel than the mean filter, which is logically better. Between the Gaussian and the weighted average, we can notice that the Gaussian gives more weight to the near neighbours of the pixel than to the far ones, which makes it more solid in blurring and applies the idea of weighted averaging in a more logical way.
For this 5x5 Gaussian filter, we can notice that the weight is higher as we go nearer to the center pixel.
SMOOTHING (BLURRING) FILTERS
One of the problems of all the filters we have come across is that, while trying to remove noise from the image, they wash out the edges as well, as we have said before. That is why a filter called the bilateral filter was introduced, to remove noise (blur) while keeping the edge information as much as possible.
For an image, the domain is the set of all possible pixel locations, while the range is the set of all possible pixel intensity values. All the filters we have talked about are called domain filters, where the weights are assigned using spatial closeness (domain) only, which is an issue as they won't consider whether a group of pixels contains an edge or not; they don't care about it, and this is why they wash out edges. Taking the intensity values (range) into consideration along with the location (domain) solves this. It is named bilateral because the domain filter makes sure that only nearby pixels are considered for blurring, and the range filter makes sure that the weights of the filter change based on the pixel intensities.
Note: Bilateral filters don't have constant weights; the weights change with each convolution based on the pixel values, as we have said.
SMOOTHING (BLURRING) FILTERS
Bilateral Filter vs Gaussian Filter

Note: Although the bilateral filter is more powerful in many cases, it is much slower, as it computes intensity differences (edges) before assigning weight values to the kernel (filter).
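A minimal OpenCV comparison sketch (the parameter values are common illustrative choices):

import cv2

img = cv2.imread('image.png')                                            # placeholder path
gaussian  = cv2.GaussianBlur(img, (9, 9), 0)                             # domain-only smoothing, edges get blurred
bilateral = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)  # edge-preserving smoothing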
EDGE (DERIVATIVE) FILTERS
Edges are one of the basic features of any image. Edges are generally horizontal, vertical, or diagonal. By finding the places where the edges are in an image, we can sharpen our images and make them clearer. Moreover, edges can be combined to form curves and shapes that act as features of an image.
Common filter types (assuming a 3x3 filter, but it can be of any size):
Note: The Prewitt and Sobel filters below are vertical edge detectors and can be made horizontal by transposing the filter. The Laplacian shown is the positive variant and can be made negative by multiplying all values by -1.
Prewitt Filter:
-1 0 1
-1 0 1
-1 0 1

Sobel Filter:
-1 0 1
-2 0 2
-1 0 1

Laplacian Filter:
0  1  0
1 -4  1
0  1  0
EDGE (DERIVATIVE) FILTERS
Edge Types:
● Step Edge: an abrupt change from one value to another.
● Ridge Edge: an abrupt change in value that then returns back to the original one.
● Ramp Edge: the change is not instantaneous but occurs over a finite distance.
● Roof Edge: the change is not instantaneous but occurs over a finite distance and then goes back to the original value.
EDGE (DERIVATIVE) FILTERS
Edge Type Examples:
EDGE (DERIVATIVE) FILTERS
Edge Type Examples: What is the type of edge in each picture?
EDGE (DERIVATIVE) FILTERS
Can you guess how edges are detected?
Edges are generally detected at:
● local extrema of the first derivative
● zero crossings of the second derivative.

Let us see the following image ⇒


EDGE (DERIVATIVE) FILTERS
How the Prewitt and Sobel filters (local extrema of the 1st derivative) work
Since one side of the filter is negative and the other is positive, we can say that it computes the gradient, making the edge more visible. Let us take a solid, stressed example to understand more:
EDGE (DERIVATIVE) FILTERS
To decide whether the change should be considered an edge or not, we compare it to a certain threshold, as shown below:
EDGE (DERIVATIVE) FILTERS
As we have seen in the previous slide, if the color intensity does not change (no edges), the output will be all zeros, while if there is a significant change (an edge), there will be a high intensity between two low intensities, showing the edge values.

⇒ ⇒

Original Vertical Horizontal


EDGE (DERIVATIVE) FILTERS
Can you tell what the difference between Prewitt and Sobel is?
Sobel stresses the edges more than Prewitt. A code sketch applying both follows.
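A small OpenCV sketch applying both vertical edge detectors (the image path is a placeholder; cv2.filter2D applies the kernel by correlation, and for edge detection the sign flip relative to true convolution does not matter):

import cv2
import numpy as np

gray = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE).astype(np.float64)

prewitt_v = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=np.float64)
prewitt_edges = cv2.filter2D(gray, -1, prewitt_v)                  # vertical edges with Prewitt
sobel_edges = cv2.Sobel(gray, cv2.CV_64F, dx=1, dy=0, ksize=3)     # vertical edges with Sobel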
EDGE (DERIVATIVE) FILTERS
Can you guess a filter to get the diagonal edges?
Since we need to compute the gradient in a diagonal direction, we put 0s on the diagonal while keeping a gradient off the diagonal, as shown below:

0 -1 -2        2  1  0
1  0 -1        1  0 -1
2  1  0        0 -1 -2

Can you guess how we have constructed these filters?

Just add the horizontal and vertical Prewitt filters. You can construct a diagonal edge filter using Sobel in the same way.
EDGE (DERIVATIVE) FILTERS
Laplacian Filter (zero crossings of the 2nd derivative)
Laplacian filters are edge detector filters as well, but we can also use them as sharpening filters to enhance the edges of objects by highlighting them, making the details more significant. Sharpening can be done by applying a Laplacian filter on the image to get a detail image, then adding it onto the original image to give a sharpened one.
Notes:
● Sharpening enhances the edges as well as the associated noise, so we need to apply noise reduction before it, or otherwise the noise will be highlighted as well.
● The Laplacian filter is actually similar to subtracting a smoothed (blurred) image from the original image to give a detail image:
Original image - Smoothed image = Detail image (Edges)
Original image + Detail image (Edges) = Sharpened image
Check the next slide to understand visually; a code sketch follows.
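A minimal sketch of sharpening following the two equations above (the blur size is an illustrative choice, and the image path is a placeholder):

import cv2

img = cv2.imread('image.png')
smoothed = cv2.GaussianBlur(img, (5, 5), 0)     # noise-reduced / low-pass version
detail = cv2.subtract(img, smoothed)            # Original - Smoothed = Detail (edges)
sharpened = cv2.add(img, detail)                # Original + Detail = Sharpened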
EDGE (DERIVATIVE) FILTERS
EDGE (DERIVATIVE) FILTERS
Difference between filters depending on 1st or 2nd derivatives?
EDGE (DERIVATIVE) FILTERS
The Laplacian is better at locating the exact edge, as its edges are the zero crossings between two local extrema, which is more compact and closer to the true edge.
TEMPLATE MATCHING
Template matching is a method where we search inside a large image for the location that matches a template image.
Can you suggest a filter to find the following template image?
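A minimal OpenCV sketch of template matching (both paths are placeholders; TM_CCOEFF_NORMED is one common similarity measure):

import cv2

img = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)
template = cv2.imread('template.png', cv2.IMREAD_GRAYSCALE)

result = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)                # best match score and its top-left corner
h, w = template.shape
cv2.rectangle(img, max_loc, (max_loc[0] + w, max_loc[1] + h), 255, 2)   # draw the detected region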
CANNY EDGE DETECTOR
The Canny edge detector is a multi-stage edge detector that can detect a wide range of edges in any input image. The Canny edge detector depends on the 1st derivative, like the Sobel and Prewitt filters.
Steps of the Canny edge detector:
● Noise reduction.
● Gradient calculation as we have said before (get the 1st derivative).
● Non-max suppression.
● Double thresholding.
● Edge tracking using hysteresis thresholding.

We normally convert RGB images to grayscale before applying the Canny edge detector.
CANNY EDGE DETECTOR
Step 1: Noise Reduction
Noise reduction should be applied because noise affects the gradients that we build our edge detection on, so we apply a Gaussian blur using any kernel size (3x3, 5x5, 7x7, ...). Mostly a 5x5 filter is used.
Note: A bilateral filter can be used instead, but with more computation and time.
Remember: The Gaussian equation used to create the filter is shown below.
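The standard 2D Gaussian used to build the kernel is G(x, y) = (1 / (2πσ^2)) * exp(-(x^2 + y^2) / (2σ^2)), sampled over the kernel grid and normalized so the weights sum to 1.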
CANNY EDGE DETECTOR
Step 2: Compute the gradients
As we have said before, we apply filters that act as gradient filters to detect a change in the intensity of pixels (edges). This is done by convolving the image 2 times with an edge filter such as the Sobel filter, once for horizontal and once for vertical. This results in 2 images, Ix and Iy, which contain the vertical and horizontal edges respectively.

Vertical Horizontal
CANNY EDGE DETECTOR
Now let us compute the magnitude and direction of the gradient (two output images).

This can be done as follows (see the relations at the end of this slide) ⇒

The magnitude |G| of the image results in the following image ⇒

The result is as expected, but the problem here is that some edges are thick while others are thin, so we will use non-max suppression to thin the edges by mitigating the thick ones. Ideally the image should have thin edges for better detection.
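The standard relations are |G| = sqrt(Ix^2 + Iy^2) for the magnitude and θ = arctan2(Iy, Ix) for the direction, giving one magnitude image and one direction image.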
CANNY EDGE DETECTOR
Step 3: Non-max Suppression: each pixel has an intensity and an angle computed in the gradient step, so we will zoom in, for example, on the upper-left red box in the next slide.
CANNY EDGE DETECTOR
We can notice that the pixel (i, j) has a direction of -π (horizontal, from right to left). The algorithm checks the neighbouring pixels in the same direction, which are pixels (i, j-1) and (i, j+1), annotated by the blue boxes. If one of those two pixels is more intense than the one being processed, then only the more intense one is kept.
CANNY EDGE DETECTOR
By checking the two blue box pixels, we can notice that pixel (i, j-1) is the more intense as it has a value of 255 (white pixel); hence the intensity of pixel (i, j) will be set to 0. If there are no pixels in the edge direction with more intense values, then the value of the current pixel is kept.
CANNY EDGE DETECTOR
Now it is your turn to solve: what do you expect to do with the following pixel?
CANNY EDGE DETECTOR
Since pixel (i-1, j+1) is more intense, the intensity of pixel (i, j) should be set to 0 as well. We keep iterating over all pixels, checking the pixels in the same direction, and either set them to 0 or keep them as they are based on the values of the neighbours.
CANNY EDGE DETECTOR
The result will be now as shown below (All edges are thin now):
CANNY EDGE DETECTOR
Step 4: Double thresholding
The double thresholding step aims at identifying 3 kinds of pixels: strong, weak, and non-relevant.
Here we set two thresholds, called the high and low thresholds, as hyper-parameters, where:
● Strong pixel: the pixel value is above the high threshold.
● Non-relevant pixel: the pixel value is lower than the low threshold.
● Weak pixel: the pixel value is between the low and high thresholds. A weak pixel will be identified in the next step as either strong or non-relevant based on the hysteresis mechanism.
The strong pixels will be set to 255, the non-relevant pixels will be set to 0, and the weak pixels will be set to a value in between (25, 50, 100, ...).
CANNY EDGE DETECTOR
The output should be something like this (before and after):
Note: it is not the same image we were working on but this image shows the same idea.
CANNY EDGE DETECTOR
Step 5: Hysteresis Tracking Mechanism
The weak pixels in the previous image will be set either to strong (255) or non-relevant (0) to decide whether each one is an edge or not.
A weak pixel is considered strong if and only if at least one of the pixels around it is a strong one, or it is connected via weak pixels to a strong pixel.
Assuming an image of 3x3 pixels, hysteresis works as shown:
CANNY EDGE DETECTOR
The final output now will be as follows:
CANNY EDGE DETECTOR
What are the main parameters affecting the canny edge detector operation?
● The size of the Gaussian kernel used in the smoothing phase.
● The upper threshold of the double thresholding phase.
● The lower threshold of the double thresholding phase.
What happens if the smoothing filter size increases?
It will detect larger edges and ignore finer edges. If the kernel size is small, it will be able to detect fine features.
CANNY EDGE DETECTOR
What happens if the upper threshold decreases (is set too low)?
This will increase the number of undesirable edge fragments by treating neighbouring noise as edges.
What happens if the lower threshold increases (is set too high)?
Edges that exist but carry some noise will not be considered as edges.
A tight range means that the lower threshold is high and the upper threshold is low, so the range is small, and vice versa for a wide range. (A code sketch of using Canny follows.)
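A minimal OpenCV sketch (the blur size and the two thresholds are example values to tune):

import cv2

gray = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE)   # placeholder path
blurred = cv2.GaussianBlur(gray, (5, 5), 0)            # step 1: noise reduction
edges = cv2.Canny(blurred, 100, 200)                   # low and high thresholds; steps 2-5 run internally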
MORPHOLOGY
The word 'morphology' generally refers to a branch of biology that deals with the form and structure of animals and plants. Morphological operations are simple transformations that are applied to shapes and structures in binary or grayscale images.
Applications of morphology:
● Removing unwanted effects in post-processing:
○ Remove small objects that can be assumed to be noise.
○ Smooth the edges of larger objects.
○ Fill some holes in objects or regions.
○ Link objects together.
● Object description and analysis:
○ Locating objects with a certain structure.
○ Locating patterns in an image.
○ Locating the boundary of an object.
MORPHOLOGY
There is a slight overlap between Morphology and Image Segmentation. Morphology
consists of methods that can be used to pre-process the input data of Image
Segmentation or to post-process the output of the Image Segmentation stage. In other
words, once the segmentation is complete, morphological operations can be used to
remove imperfections in the segmented image and deliver information on the shape
and structure of the image as shown in the following figure.
MORPHOLOGY
Morphology in mathematics
Thinking of mathematical morphology takes us to set theory, where sets in mathematical morphology correspond to objects in the image. How are objects represented in binary and grayscale images?
● In binary images, the elements of the set (object) are the coordinates (x, y) of the pixels belonging to the object (Z^2).
● In grayscale images, the elements of the set (object) are the coordinates (x, y) of the pixels belonging to the object as well as the gray level (Z^3).
Have we taken set theory before? (Remember primary school.)
Yes, we have: the sets of numbers and the operations done on them, such as union, intersection, complement, and difference, which we used to represent on Venn diagrams. Check the following slides.
In this part we will take them from the image perspective.
MORPHOLOGY
MORPHOLOGY
From the image perspective:
Structuring Element (SE): small sets or subimages used to probe an image under study for properties of interest. The SE is actually a matrix with specific dimensions and a pattern of 0s and 1s (in binary images) or 0-255 (in grayscale images) that specifies the shape of the structuring element. For example, the following SEs of 3x3 size:

What are we going to do with these structuring elements? Let us discuss the 3 terminologies we will use with structuring elements (Fit, Hit, and Miss).
MORPHOLOGY
The structuring element is said to:
Fit: if all pixels of value 1 in the SE match pixels of value 1 in the image.
Hit: if at least one of the pixels of value 1 in the SE matches a pixel of value 1 in the image.
Miss: if none of the pixels of value 1 in the SE matches a pixel of value 1 in the image.
Notes:
● Zero-valued pixels in the SE are ignored.
● For grayscale images, instead of the value 1 in the image, we search for any non-zero value in the image to say whether it is a hit or a fit. We will cover this part again at the end of the morphology part.
MORPHOLOGY
MORPHOLOGY
Now we are ready to take on the main 2 concepts of morphology.
Erosion is the removal of structures of certain shapes and sizes that are given by the SE (a process of shrinking) ⇒ keep only the pixels where the SE fits.
MORPHOLOGY
Dilation is the filling of structures of certain shapes and sizes that are given by the SE (a process of growing) ⇒ keep the pixels where the SE hits.
MORPHOLOGY
Importance of Erosion:
● Shrinks or thins objects in images.
● Removes image components.
● Strips away extrusions.
● Splits joined objects.
Importance of Dilation:
● Grows or thickens objects in images.
● Bridges gaps between objects.
● Repairs intrusions.
● Fills small holes in objects.
MORPHOLOGY
Dilation Examples
MORPHOLOGY
Erosion Examples
MORPHOLOGY
Combining Erosion and Dilation
Opening is erosion followed by dilation with the same SE.
Closing is dilation followed by erosion with the same SE.
(A code sketch of the four operations is shown below.)
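A minimal OpenCV sketch of the four operations on a binary image (the 3x3 square SE and the path are example choices):

import cv2
import numpy as np

binary = cv2.imread('binary.png', cv2.IMREAD_GRAYSCALE)   # placeholder path (0/255 image)
se = np.ones((3, 3), np.uint8)                            # 3x3 square structuring element

eroded  = cv2.erode(binary, se, iterations=1)
dilated = cv2.dilate(binary, se, iterations=1)
opened  = cv2.morphologyEx(binary, cv2.MORPH_OPEN, se)    # erosion then dilation
closed  = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, se)   # dilation then erosion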
MORPHOLOGY
Opening is idempotent which means that once an image has been opened, subsequent
openings with the same structuring element have no further effect on that image.
Opening is called so because it can open up a gap between objects connected by a thin
bridge of pixels. Any regions that have survived the erosion are restored to their original
size by the dilation.

Similarly Closing is idempotent which means that once an image has been closed,
subsequent closings with the same structuring element have no further effect on that
image.
Closing is so called because it can fill holes in the regions while keeping the initial
region sizes.
MORPHOLOGY

If we apply erosion to the original image, the noise is removed but undesired thinning of the fingerprint takes place; with opening, we remove the noise while keeping the fingerprint. However, if you look closely you will notice some discontinuities in the fingerprint; if we apply dilation to cure this, some lines of the fingerprint will be joined together, but with closing after the opening, we reach perfect fingerprint lines.
MORPHOLOGY
Boundary Extraction
Boundary extraction can be done by subtracting the eroded image from the original image: erosion removes the boundary and keeps the filling inside, so if we subtract the filling (the eroded image) from the original image, it gives us the borders (boundaries). (A code sketch is shown below.)
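A tiny sketch of boundary extraction (the path and the 3x3 SE are example choices):

import cv2
import numpy as np

binary = cv2.imread('binary.png', cv2.IMREAD_GRAYSCALE)
se = np.ones((3, 3), np.uint8)
boundary = cv2.subtract(binary, cv2.erode(binary, se))   # original minus eroded = object boundaries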
MORPHOLOGY
MORPHOLOGY
Erosion in Grayscale:
The erosion of an image by a structuring element SE at any location (x, y) is defined as the minimum value of the image in the region that coincides with the SE when the origin of the SE is at (x, y).
Dilation in Grayscale:
The dilation of an image by a structuring element SE at any location (x, y) is defined as the maximum value of the image in the region that coincides with the SE when the origin of the SE is at (x, y).
Effects of erosion and dilation in Grayscale:
● The output image tends to be darker with erosion and vice versa for dilation.
● In dilation, dark details tend to be reduced or eliminated as the brighter areas tend to grow.
● In erosion, bright details that are smaller than the SE tend to be reduced.
MORPHOLOGY
In the opening of a grayscale image, we remove small light details, while the overall gray levels and larger bright features remain relatively undisturbed.
In the closing of a grayscale image, we remove small dark details, while the overall gray levels and larger dark features remain relatively undisturbed.
Didn't get what was said in the previous sentences? Check the figure in the following slide.
MORPHOLOGY
LAB 2
