Classical Computer Vision - Session 1
COMPUTER VISION
MEET THE INSTRUCTOR
Nezar Ahmed
Machine Learning Lead
Synapse Analytics
Master’s Student
Computer Communication and Engineering
Cairo University
AI Instructor
ITI / Epsilon AI / AMIT
DISCLAIMER AND ACKNOWLEDGMENT
Some of the slides are taken from:
Dr. Mayada Hadhoud, Computer Engineering Department, Cairo
University
Computer Vision: Foundations and Applications Course.
Stanford Vision and Learning Lab
Various courses, articles, and tutorials on computer vision, such as
TutorialsPoint, PyImageSearch, Analytics Vidhya, Medium, and
Towards Data Science.
CLASSICAL COMPUTER VISION
What is computer vision?
Computer vision is a branch of AI that enables computers and systems to process,
analyze, and interpret visual data such as images, videos, point clouds, and other visual
inputs, much the way humans do. The actions we take as humans after perceiving a
visual input are mimicked in software to perform certain actions or give
recommendations based on the information perceived. The main difference between
machines and us is that we train the machine to perform specific tasks, so it does not
have general intelligence like humans. However, on the specific tasks it is
trained on, it can outperform humans in the accuracy and speed of its
decisions, as well as in the scalability of applying such use cases.
CLASSICAL COMPUTER VISION
What do we mean by the word classical (traditional)?
The word classical (traditional) means that no neural networks are included in the
process of extracting features and taking decisions; only image processing is done to extract
features such as colors, edges, corners, and objects. These features
are human-engineered: you have to choose which features to look for. This is the
opposite of deep learning, which learns to extract the important features by itself
to differentiate between images, as in classification tasks.
What is the difference between computer vision and image processing?
Image processing is a subset of computer vision: computer vision uses
image processing algorithms to emulate human vision.
IMAGE PROCESSING VS COMPUTER VISION
Image Processing: Apply mathematical functions and transformations to an image.
Input ⇒ Image
Output ⇒ Image
Computer Vision: Emulate the human vision.
Input ⇒ Image or Video
Output ⇒ Decision (Classification, Detection, Tracking, Segmentation)
If the goal is to enhance an image for future use, then it is image processing.
If the goal is to recognize an object, then it is computer vision.
COMPUTER VISION SYSTEM
A typical computer vision pipeline (figure):
● Image Acquisition: an image, multiple images, or a video.
● Image enhancements and transformations: noise removal, contrast enhancement, normalization, cropping, shearing.
● Feature Extraction: edge detection, regions, interest points, textures, geometrical shapes, color patterns.
● Decision tasks: classification, detection, tracking, image matching, segmentation.
● Postprocessing: the algorithm that is built upon the decision, to apply countless applications.
COMPUTER VISION APPLICATIONS
The applications of computer vision are everywhere around us, for example:
● Face detection
● Retail applications like video analytics
● Biometrics like iris, face, and fingerprint recognition
● Optical character recognition (OCR)
● Vision-based interaction games like Xbox Kinect
● Biomedical applications like cancer detection and X-ray analysis
● Self-driving cars
● Object counting
● Parking occupancy detection
● Flow analysis like traffic flow analysis
● Industrial applications like defect inspection and reading barcodes
● Agriculture applications like crop and yield monitoring
IMAGE DATA STRUCTURE
There are several image data structures that can be used with images:
● Matrices
● Chains (as chain codes)
● Pyramids (as Matrix pyramids, Tree pyramid, Gaussian pyramid, Laplacian Pyramid)
The most common data structure used with images is the matrix, where we can
perform matrix operations to change pixel values based on the processing needed, as we
will see throughout the course.
We mostly use a NumPy array to represent an image in Python. To represent an image
in native Python, we can think of it as a list of lists (a 2-D or 3-D list), as sketched below.
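A minimal sketch (not from the original slides) of the two representations, using NumPy as the slides suggest; the values here are arbitrary illustrative pixels:

```python
import numpy as np

# 2-D list (grayscale-like): a tiny 3x3 "image" of intensities
img_list = [[0, 128, 255],
            [64, 64, 64],
            [255, 0, 0]]

# The same data as a NumPy array, which is what we actually use in practice
img_gray = np.array(img_list, dtype=np.uint8)
print(img_gray.shape)        # (3, 3) -> height x width

# A 3-D array for an RGB image: height x width x 3 channels
img_rgb = np.zeros((3, 3, 3), dtype=np.uint8)
img_rgb[0, 0] = [255, 0, 0]  # set the top-left pixel to pure red
print(img_rgb.shape)         # (3, 3, 3)
```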
IMAGE TYPES
Images are made of units called pixels. Like any data in a computer, these pixels are
ultimately represented in 0s and 1s. This raises the question: how can we
represent images when all of their units are represented in 0s and 1s?
Image Types:
Binary image: an image where all pixels use the most basic
representation, 0s and 1s, where 0 means black and 1 means white. (Only 1 bit/pixel)
Grayscale image: an image where the pixels are represented as intensities, where the
lowest intensity represents black and the highest represents white. This is done by
representing each pixel with 1 byte (8 bits), so it can take values from 0 to 2^8 - 1 (255).
RGB image: an image where the pixels are represented as colors, which introduces the
idea of channels. Each pixel is represented by 3 numbers, each from 0
to 255, and the final color is a mixture of the 3 colors red, green, and blue.
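As a hedged sketch of how the three types look in code (the file name "photo.jpg" is a hypothetical placeholder; note that OpenCV loads color images in BGR channel order):

```python
import cv2

color = cv2.imread("photo.jpg", cv2.IMREAD_COLOR)       # 3 channels, 0-255 each
gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)     # 1 channel, 0-255

# Binary image: threshold the grayscale image so every pixel becomes 0 or 255
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

print(color.shape, gray.shape, binary.dtype)
```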
IMAGE TYPES
The image types are as shown:
IMAGE TYPES
Example: How is a pixel represented in a binary image?
Note: Normally 1 means white and 0 means black, but in this example they are switched
so that the letter C is represented by 1s.
IMAGE TYPES
Example: How is a pixel represented in a grayscale image?
IMAGE TYPES
Example: How is a pixel represented in an RGB image?
CONVERSION OF RGB TO GRAYSCALE
Conversion of RGB to grayscale can be done in 3 different ways:
Lightness method
The average of the highest and lowest of the three components.
Average method
The average of the 3 components (red, green, and blue).
Luminosity method
A weighted average of the 3 components.
The luminosity method is the best method, as it is based on research on human vision.
The research proposes that our eyes react to each color in a different manner:
specifically, our eyes are most sensitive to green, then to red, and finally to blue. (See the sketch below.)
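A minimal sketch of the three methods in NumPy. The luminosity weights below (0.21, 0.72, 0.07) are one commonly cited choice and are an assumption on my part; other references use 0.299/0.587/0.114:

```python
import numpy as np

def to_gray(img_rgb, method="luminosity"):
    img = img_rgb.astype(np.float32)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    if method == "lightness":
        # average of the highest and lowest component per pixel
        gray = (np.maximum(np.maximum(r, g), b) + np.minimum(np.minimum(r, g), b)) / 2
    elif method == "average":
        gray = (r + g + b) / 3
    else:
        # luminosity: weighted by how sensitive the eye is to each color
        gray = 0.21 * r + 0.72 * g + 0.07 * b
    return gray.astype(np.uint8)
```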
CONVERSION OF RGB TO GRAYSCALE
IMAGE RESOLUTION VS. IMAGE SIZE
Image resolution is the number of pixels (dots) in a linear inch (ppi on
screen or dpi on printed paper). When we say an image has a resolution of 72, it means
that every 1 inch of the image has 72 pixels to represent it. The higher the
resolution of the image, the better its quality.
On the other hand, the image size means the height and the width of the image itself.
For example, we can have 4 screens or TVs with the same display size (40 inches), while
each of them provides a different resolution: HD (1280x720), FHD (1920x1080), 2K
(2560x1440), and 4K (3840x2160). This means that for the same screen size, we have more
pixels, so every inch of the screen has more pixels per inch (ppi), which
in turn means a smaller pixel size so that they fit in one inch.
But wait, why is a smaller pixel size
better, and why does it give us better quality?
Let's illustrate with an example.
IMAGE RESOLUTION VS. IMAGE SIZE
This is not the greatest circle, but it is the best we can do with only 25 pixels. Let us try the
same thing again with 10x10 pixels of the same pixel size.
IMAGE RESOLUTION VS. IMAGE SIZE
By using 4 times the space we had, with the same pixel size, we were able to draw a
better-looking circle. Unfortunately, it took more space, which in the example
of the screen means a bigger screen size. But what if we want to put more pixels in the
same screen size?
Hence we can simply say that image size or resolution in computer vision represents the
number of pixels, not the real quality; the quality comes from the resolution of
the camera the image was taken with. (Assuming no compression
techniques are applied to it.)
COLOR MODELS/SPACES
The color model (space) is a mathematical representation of colors as numbers in a
coordinate system, where each number represents a characteristic of a color based on
the model used.
Note: Color models and spaces are different: sRGB and AdobeRGB are color spaces
based on the RGB color model. In this course we will deal mainly with color models, but
we can use the word space interchangeably.
Color Models:
● RGB
● CMYK
● HSV / HSL
● L*a*b
COLOR MODELS/SPACES
RGB
The RGB model is a triple of numbers expressing the red, green, and blue
(primary color) combination of a pixel. The primary colors at full intensities (255, 255, 255)
give white, and at zero intensities (0, 0, 0) give black.
This is an additive model: the more of each color we add, the brighter the
pixel becomes.
Since each channel ranges from 0 to 255, we have a
combination of 256x256x256 = 16,777,216 (16M) colors
that can be created.
COLOR MODELS/SPACES
CMY(K)
The CMY(K) model is a triple (or quadruple) of numbers expressing the cyan, magenta, and
yellow (secondary color) combination of a pixel, in addition to the black color (K). It
is mainly used in color print production. It is a subset of the RGB model.
It is a subtractive model: cyan is white minus red (cyan = 255 - red),
and similarly yellow is the complement of blue and magenta the complement of green.
COLOR MODELS/SPACES
HSV/HSL
The HSV color model transforms the RGB model from a cube into a
cylinder. As we have seen in RGB, the lightness (white) of a color is an
additive combination of red, green, and blue, while in HSV or HSL it has its own
dimension.
Hue represents the color itself as an angle, where 0° represents red, 120° represents
green, and 240° represents blue.
Saturation represents the purity (chroma) of the color: 0 is white,
while 1 (100%) is the fully saturated pure color.
Value / Lightness represents the luminance of the color: 0 means black (no light),
while 1 represents full light.
Note: 1 in Value means the color is at full brightness, while 1 in Lightness means the color is
white due to too much light. To get the full-brightness color in HSL, we use a Lightness of 1/2.
COLOR MODELS/SPACES
COLOR MODELS/SPACES
L*a*b
The L*a*b color model is a triple of numbers that represents the following:
L is the lightness, where the highest value is white and the lowest value is black, with gray
at the center (similar to grayscale).
a originates from the center of L, where one side represents green and the other
represents red.
b originates from the center of L, where one side represents blue and the other
represents yellow.
L*a*b is not as intuitive as RGB, CMYK, and HSV/HSL; however, it is heavily used in many
computer vision applications.
COLOR MODELS/SPACES
COLOR MODELS/SPACES
The question that should be in your mind right now is: what is the importance of the
color models other than RGB, which we all know?
The CMYK model is not important in the computer vision field; however, it is the main
model used in printers, since RGB colors result from transmitted light while CMYK
colors result from reflected light, as in printing, which is why we need inks
of those colors. (If you did not get this paragraph, no problem at all.)
HSV/HSL is mainly used when we want to track a specific color range, as it is far
easier to define a range using HSV than using RGB.
The L*a*b color model is mainly used to overcome various lighting-condition problems.
In RGB and HSV it has no meaning to measure the distance between two colors, as
the numeric difference between colors is not perceptual. Unlike RGB and HSV/HSL,
the difference between colors (for example, the Euclidean distance) is meaningful in L*a*b:
if the difference between 2 colors is small, this means that these 2 colors are actually
close to each other perceptually.
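A hedged sketch of color-model conversions and HSV-based color tracking with OpenCV; the file name "scene.jpg" and the HSV range below (a rough "blue-ish" band) are illustrative assumptions, and note that OpenCV scales Hue to 0-179:

```python
import cv2
import numpy as np

bgr = cv2.imread("scene.jpg")                 # hypothetical file, loaded as BGR
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)

# Tracking a color range is far easier in HSV: one hue interval
# instead of a 3-D box in RGB space.
lower = np.array([100, 50, 50])
upper = np.array([130, 255, 255])
mask = cv2.inRange(hsv, lower, upper)         # 255 where the pixel is in range
```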
HISTOGRAM
A histogram is a bar chart (horizontal or vertical) showing the frequency of
occurrence of some event, where the x-axis has the event (value) you want to count while
the y-axis contains the frequency.
HISTOGRAM
What is histogram in images?
A histogram of an image shows the frequency of pixel intensity values, where the x-axis
represents the pixel value while the y-axis represents the frequency of that value.
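A minimal sketch of computing an image histogram with OpenCV or NumPy; "photo.jpg" is a hypothetical file:

```python
import cv2
import numpy as np

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)

# OpenCV: 256 bins over the range [0, 256)
hist = cv2.calcHist([gray], [0], None, [256], [0, 256])

# Equivalent NumPy version
hist_np, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))
```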
Let us now look at the two terms brightness and contrast.
BRIGHTNESS
Brightness is a relative term and a subjective property: the more light a source reflects
into the image, the higher the pixel intensities.
Note: Subjective means based on the personal perspective or preference of a person. Some
people can say that an image is bright while others see that it is dark.
Note: To calculate brightness (or contrast) for a colored image, we either convert it to
grayscale and then calculate the value, or calculate each channel on its own and report
a value per channel, or even average the channels to get a single value.
CONTRAST
Do you think that the difference between the maximum and minimum values is a good
representation of the contrast of an image?
This methodology is not good in the presence of outliers, but what do outliers mean in an
image? Outliers here means values with very low frequencies near the minimum and
maximum, which can give you a wrong quantification of contrast.
Example:
Assume an image where one of its pixels is black and one of its pixels is white.
Max-minus-min would report full contrast regardless of all the other pixels. Hence we
should take the histogram of the image into consideration and ignore a certain
percentage of pixels to account for outliers.
One of the famous approaches to use as an indicator of contrast
is the root mean square method, calculated from the standard
deviation of the histogram. (Contrast = 2σ)
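A hedged sketch of both measures: the mean intensity as a simple brightness indicator and twice the standard deviation as the RMS-based contrast from the slide (Contrast = 2σ); "photo.jpg" is a hypothetical file:

```python
import cv2

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)

brightness = float(gray.mean())       # higher mean intensity -> brighter image
rms_contrast = 2 * float(gray.std())  # 2 * standard deviation of the intensities
```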
BRIGHTNESS AND CONTRAST
To change the brightness and contrast of an image, you can simply apply
the alpha-beta rule, g(x) = α·f(x) + β, on its array of pixels as suggested by OpenCV,
where α scales the contrast and β shifts the brightness:
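A minimal sketch of the alpha-beta rule with OpenCV; the values 1.5 and 30 are arbitrary illustrative choices and "photo.jpg" is a hypothetical file:

```python
import cv2

img = cv2.imread("photo.jpg")
# alpha > 1 stretches contrast, beta > 0 raises brightness; results are clipped to [0, 255]
adjusted = cv2.convertScaleAbs(img, alpha=1.5, beta=30)
```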
Worked example (used for the PMF and CDF below): a 5x5 image with 3 bpp (levels 0-7):

2 5 2 1 3
3 4 1 0 0
2 5 7 5 7
3 2 7 7 2
1 2 3 1 3

Gray level | Count | PMF   | CDF
0          | 2     | 2/25  | 2/25
1          | 4     | 4/25  | 6/25
2          | 6     | 6/25  | 12/25
3          | 5     | 5/25  | 17/25
4          | 1     | 1/25  | 18/25
5          | 3     | 3/25  | 21/25
6          | 0     | 0/25  | 21/25
7          | 4     | 4/25  | 25/25
PMF AND CDF
The PMF (probability mass function) is the normalized histogram: the probability of each
gray level. The CDF (cumulative distribution function) is its running sum. The PMF is not
monotonically increasing, while the CDF is monotonically increasing; if we graph any CDF
it will look something like the curve shown.
As we can see, the CDF is monotonically increasing, which is important in our next
topic, histogram equalization. Computing the PMF and CDF are the initial steps of histogram
equalization, where we then use the obtained CDF to equalize.
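A minimal sketch reproducing the 5x5 / 3 bpp worked example above: count each gray level, divide by the number of pixels to get the PMF, then take the cumulative sum to get the CDF:

```python
import numpy as np

img = np.array([[2, 5, 2, 1, 3],
                [3, 4, 1, 0, 0],
                [2, 5, 7, 5, 7],
                [3, 2, 7, 7, 2],
                [1, 2, 3, 1, 3]])

counts = np.bincount(img.ravel(), minlength=8)   # frequency of each level 0..7
pmf = counts / img.size                          # e.g. level 0 -> 2/25
cdf = np.cumsum(pmf)                             # monotonically increasing, ends at 1
```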
HISTOGRAM EQUALIZATION
Histogram equalization is mainly a technique to enhance the contrast of an image by
approximately flattening its histogram over all values.
HISTOGRAM EQUALIZATION
Equalization algorithm steps (the example is 3 bpp):
1- Calculate the PMF of the image.
2- Calculate the CDF from the PMF of the image.
3- Multiply the CDF of each pixel by the number of levels we have minus 1 ⇒ (mostly 255).
4- Replace each pixel intensity by the floor of its corresponding new value from step 3.
A sketch of these steps in code follows.
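A minimal sketch of the four steps above for a 3 bpp image (levels 0..7); for standard 8-bit images you would use 256 levels, or simply call cv2.equalizeHist(gray):

```python
import numpy as np

def equalize(img, levels=8):
    counts = np.bincount(img.ravel(), minlength=levels)
    pmf = counts / img.size                      # step 1
    cdf = np.cumsum(pmf)                         # step 2
    mapping = np.floor(cdf * (levels - 1))       # steps 3 and 4 (flooring)
    return mapping[img].astype(img.dtype)        # replace each pixel by its new value
```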
HISTOGRAM EQUALIZATION
ADAPTIVE THRESHOLDING
Adaptive thresholding means that rather than concentrate on the image as whole
(global), we gonna concentrate on each part on its own so that we can improve the local
contrast rather than the global one. This is done by gridding the image (MxN grids) and
apply same histogram equalization as before on each grid. The neighboring grids are
then combined using bilinear interpolation to remove the artificial boundaries.
CLAHE
CLAHE stands for Contrast Limited Adaptive Histogram Equalization, which is similar to
adaptive equalization except that it limits the contrast of each grid cell to a certain value to
avoid over-amplification of contrast. The excess values over the limit are clipped and
redistributed among the remaining histogram bins that are not over the limit.
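A hedged sketch of CLAHE in OpenCV; clipLimit and tileGridSize correspond to the contrast limit and the MxN grid described above, and the values here (2.0 and 8x8) are illustrative defaults, not prescribed by the slides:

```python
import cv2

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical file
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(gray)
```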
CLAHE
GEOMETRIC TRANSFORMATION
Geometric transformations are essential in all computer vision systems. We will
concentrate here on spatial affine transformations.
Types of affine transformations:
● Translation
● Resizing (magnification)
● Rotation
● Shearing
We will come across the homography transformation as well later in the session.
GEOMETRIC TRANSFORMATION
An affine transformation in general is any transformation that preserves collinearity
and ratios of distances.
Preserving collinearity means that all points lying on a line initially still lie on a line after
the transformation.
Preserving ratios of distances means the midpoint of a line segment is still the midpoint after
the transformation.
Let us assume we have the (x, y) coordinates of a pixel before the transformation, and (u, v) as the
coordinates of the same pixel after the transformation:
u = c11·x + c12·y + c13
v = c21·x + c22·y + c23
Can you guess, for translation, resizing, shearing, and rotation, which coefficients affect
them?
GEOMETRIC TRANSFORMATION
For translation: c13 and c23 are responsible (noting that c11 = c22 = 1 and c12 = c21 = 0).
For resizing: c11 and c22 are responsible (noting that c12 = c21 = c13 = c23 = 0).
For shearing and rotation: all of the coefficients are responsible.
GEOMETRIC TRANSFORMATION
The previous equations can be represented with an affine matrix as shown:

[u]   [c11  c12  c13]   [x]
[v] = [c21  c22  c23] · [y]
[1]   [ 0    0    1 ]   [1]
Explain visually (Important): link
GEOMETRIC TRANSFORMATION
Example of combined transformations:
Translate to origin ⇒ Rotate by 23° ⇒ Translate back to the original location
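A minimal sketch of this combined transformation with OpenCV: cv2.getRotationMatrix2D builds the 2x3 affine matrix that already folds in "translate to origin → rotate → translate back" around the given center. The 23° angle comes from the example; the file name is hypothetical:

```python
import cv2

img = cv2.imread("photo.jpg")
h, w = img.shape[:2]

M = cv2.getRotationMatrix2D(center=(w / 2, h / 2), angle=23, scale=1.0)
rotated = cv2.warpAffine(img, M, (w, h))
```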
HOMOGRAPHY
Homography (perspective transform) is a way to transform an image through matrix
multiplication, where we map points from one image to corresponding points in
another image using a 3x3 homography matrix.
Consider the following 2 images, where the red dot represents the same physical point in
both images:
HOMOGRAPHY
We can notice that the coordinates in the two images are not the same, as the book was
captured from a different perspective (angle) each time. So if we have a matrix that, when
multiplied, maps the coordinates of the first image to the coordinates
of the second image, we can say that this matrix is the homography matrix.
H here is the homography matrix, but the question is: how is the homography matrix
constructed? To answer this question we need to understand 2 terms:
Homogeneous Coordinates and Projective Space
HOMOGRAPHY
Homogeneous coordinates are a system of coordinates used in projective space,
where you can think of projective space as a plane located at Z = 1 in 3D space.
The coordinates we are used to dealing with are the Cartesian coordinates (xc, yc), while
the homogeneous coordinates are (xh, yh, wh), where xc = xh/wh and yc = yh/wh.
Note: the Cartesian coordinates are the same as the homogeneous ones when wh = 1.
Example:
The point (0.62, 1.39) is the same as (0.62, 1.39, 1),
or (1.24, 2.78, 2),
or (0.31, 0.695, 0.5),
or even (620, 1390, 1000).
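A hedged sketch of estimating and applying a homography with OpenCV, assuming we already have at least 4 corresponding points between the two views; the point coordinates, output size, and file name below are made up for illustration:

```python
import cv2
import numpy as np

# 4 corresponding points in image 1 (source) and image 2 (destination)
pts_src = np.array([[141, 131], [480, 159], [493, 630], [64, 601]], dtype=np.float32)
pts_dst = np.array([[318, 256], [534, 372], [316, 670], [73, 473]], dtype=np.float32)

H, mask = cv2.findHomography(pts_src, pts_dst, cv2.RANSAC)

img_src = cv2.imread("book_view1.jpg")                  # hypothetical file
warped = cv2.warpPerspective(img_src, H, (800, 800))    # map view 1 into view 2's frame
```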
● Application of homography: stitching of images (figure)
LAB 1
CONVOLUTION
Convolution is the most important topic in image processing. It is a mathematical
operation done between 2 arrays of numbers. The arrays can be 1D, 2D, or 3D; for
images they are mostly 2D arrays, where one array is the image pixels and the
other array is called the kernel (filter). Both arrays should have the same number of
dimensions (the kernel is typically much smaller than the image).
Note: Filters can be thought of as stacks of kernels, but we will use the terms interchangeably.
The convolution operation in image processing can be used for many purposes, such as
taking derivatives, detecting edges, applying blurs, noise reduction, etc.
The kernel values (matrix) and their pattern define the functionality, and the kernel is mostly
n x n where n is an odd number: (3x3), (5x5), (7x7), (9x9), (11x11), etc.
CONVOLUTION
The convolution operator is *, so we can say output_image = input_image * kernel.
Basic example (figure). The "Valid" and "Same" labels refer to the padding mode: valid
uses no padding (the output is smaller than the input), while same pads the input so the
output keeps the input size, as sketched below.
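A minimal sketch of 2-D convolution with SciPy; the 3x3 averaging (blur) kernel and the random 5x5 input are illustrative, and the "valid"/"same" modes match the padding labels above:

```python
import numpy as np
from scipy.signal import convolve2d

image = np.random.randint(0, 256, size=(5, 5)).astype(np.float32)
kernel = np.ones((3, 3), dtype=np.float32) / 9.0   # simple averaging kernel

out_valid = convolve2d(image, kernel, mode="valid")   # no padding  -> 3x3 output
out_same = convolve2d(image, kernel, mode="same")     # zero padding -> 5x5 output
```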
PADDING/STRIDES AND OUTPUT SIZES
We have noticed in the previous 2 slides that both padding and stride affect the
output size of the convolution. Let us formulate this output size: for an input of size W,
a filter of size F, padding P, and stride S, the output size along each dimension is
floor((W - F + 2P) / S) + 1.
Note: Although the bilateral filter is more powerful in many cases, it is much slower, as it
computes edges before assigning weight values to the kernel (filter).
EDGE (DERIVATIVE) FILTERS
Edges are one of the basic features of any image. Edges are generally horizontal,
vertical, or diagonal. By finding where the edges are in an image, we
can sharpen the image and make it clearer. Moreover, edges can be combined
to form curves and shapes that act as features of an image.
Common filter types (assuming a 3x3 filter, but it can be of any size):
Note: The Prewitt and Sobel filters below detect vertical edges and can detect horizontal
edges by transposing the filter. The Laplacian shown is the positive form and becomes the
negative form by multiplying all values by -1.

Prewitt filter:
-1  0  1
-1  0  1
-1  0  1

Sobel filter:
-1  0  1
-2  0  2
-1  0  1

Laplacian filter:
 0  1  0
 1 -4  1
 0  1  0
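A hedged sketch applying the three kernels above with cv2.filter2D (which technically computes correlation, but for edge-detection purposes these kernels behave the same); "photo.jpg" is a hypothetical file:

```python
import cv2
import numpy as np

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

prewitt = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float32)
sobel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float32)

edges_v = cv2.filter2D(gray, -1, sobel)       # vertical edges
edges_h = cv2.filter2D(gray, -1, sobel.T)     # horizontal edges (transposed kernel)
edges_l = cv2.filter2D(gray, -1, laplacian)   # Laplacian responds in all directions
```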
EDGE (DERIVATIVE) FILTERS
Edge types: (figure)
The two 3x3 kernels below detect diagonal edges (the second is the first rotated by 90°):

 0 -1 -2        2  1  0
 1  0 -1        1  0 -1
 2  1  0        0 -1 -2

We normally convert RGB images to grayscale before applying the Canny edge detector.
CANNY EDGE DETECTOR
Step 1: Noise Reduction
Noise reduction should be applied because noise affects the gradients that we build our
edge detection on, so we apply a Gaussian blur using any kernel size (3x3, 5x5, 7x7, ...).
Mostly a 5x5 filter is used.
Note: A bilateral filter can be used instead, but with more computation and time.
Remember: The Gaussian equation used to create the filter is
G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))
CANNY EDGE DETECTOR
Step 2: Compute the gradients
As we said before, we apply filters that act as gradient filters to detect changes in pixel
intensity (edges). This is done by convolving the image twice with an edge filter such as the
Sobel filter: once for the horizontal direction and once for the vertical direction. This results
in 2 images, Ix and Iy, containing the vertical and horizontal edges
respectively.
(Figure: vertical and horizontal edge responses)
CANNY EDGE DETECTOR
Now let us compute the magnitude and direction of the gradient (two output images):
magnitude = sqrt(Ix² + Iy²) and direction θ = arctan2(Iy, Ix).
The result looks as expected, but the problem here is that some edges are thick
while others are thin, so we will use non-maximum suppression to thin the edges by
suppressing the thick ones. Ideally the image should have thin edges for better detection.
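A minimal sketch of this step with OpenCV: Sobel derivatives in x and y combined into magnitude and direction. "photo.jpg" is a hypothetical file, and the 5x5 Gaussian kernel follows the suggestion from step 1:

```python
import cv2
import numpy as np

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)           # step 1: noise reduction

Ix = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)    # derivative along x (vertical edges)
Iy = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)    # derivative along y (horizontal edges)

magnitude = np.hypot(Ix, Iy)                          # sqrt(Ix^2 + Iy^2)
direction = np.arctan2(Iy, Ix)                        # gradient angle per pixel
```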
CANNY EDGE DETECTOR
Step 3: Non-maximum Suppression. Each pixel has an intensity and an angle from what we
calculated in the gradient step, so let us zoom in, for example, on the upper-left red
box in the next slide.
CANNY EDGE DETECTOR
We can notice that the pixel (i, j) has a direction of -π (horizontal, from right to left). The
algorithm checks the neighbouring pixels in the same direction, which are pixels (i,
j-1) and (i, j+1), annotated by blue boxes. If one of those two pixels is more intense
than the one being processed, then only the more intense one is kept.
CANNY EDGE DETECTOR
By checking the two blue-box pixels, we can notice that pixel (i, j-1) is the most intense, as
it has a value of 255 (white pixel); hence the intensity of pixel (i, j) will be set to 0. If no
pixel in the edge direction has a more intense value, then the value of the current
pixel is kept.
CANNY EDGE DETECTOR
Now it is your turn to solve: what do you expect to happen with the following pixel?
CANNY EDGE DETECTOR
Since pixel (i-1, j+1) is more intense, the intensity of pixel (i, j) should be set to 0 as well.
We keep iterating over all pixels, checking the pixels in the same direction and
either setting them to 0 or keeping them as they are based on the values of the neighbours.
CANNY EDGE DETECTOR
The result will now be as shown below (all edges are thin now):
CANNY EDGE DETECTOR
Step 4: Double thresholding
The double thresholding step aims at identifying 3 kinds of pixels: strong, weak,
and non-relevant.
Here we set two thresholds, called the high and low thresholds, as hyper-parameters, where:
● Strong pixel if the pixel value is above the high threshold.
● Non-relevant pixel if the pixel value is lower than the low threshold.
● Weak pixel if the pixel value is between the low and high threshold. The weak pixel
will be identified in the next step as either strong or non-relevant based on the
hysteresis mechanism.
The strong pixels will be set to 255. The non-relevant pixels will be set to 0. The weak
pixels will be set to a value in between (25 or 50, or 100, ..).
CANNY EDGE DETECTOR
The output should be something like this (before and after):
Note: it is not the same image we were working on but this image shows the same idea.
CANNY EDGE DETECTOR
Step5: Hysteresis Tracking Mechanism
Step 5: Hysteresis Tracking Mechanism
The weak pixels in the previous image will be set either to strong (255) or
non-relevant (0) to decide whether each one is an edge or not.
A weak pixel is considered strong if and only if at least one of the pixels around it
is a strong one, or it is connected via weak pixels to a strong pixel.
Assuming an image of 3x3 pixels, hysteresis is as shown:
CANNY EDGE DETECTOR
The final output now will be as follows:
CANNY EDGE DETECTOR
What are the main parameters affecting the Canny edge detector operation?
● The size of the Gaussian kernel used in the smoothing phase.
● The upper threshold of the double thresholding phase.
● The lower threshold of the double thresholding phase.
What happens if the smoothing filter size increases?
It will detect larger-scale edges and ignore finer edges. If the kernel size is small,
it will be able to detect fine features.
CANNY EDGE DETECTOR
What happens if the upper threshold decreases (is set too low)?
This will increase the number of undesirable edge fragments, since nearby noise
may be treated as edges.
What happens if the lower threshold increases (is set too high)?
Edges that exist but contain some noise will not be considered edges.
A tight range means the lower threshold is high and the upper threshold is low, so the
range is small; vice versa for a wide range.
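A hedged sketch of the full pipeline with OpenCV's built-in Canny; the 5x5 kernel and the 50/150 thresholds are illustrative hyper-parameter choices, not values prescribed by the slides, and "photo.jpg" is a hypothetical file:

```python
import cv2

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)                 # step 1: noise reduction
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)   # steps 2-5 handled internally
```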
MORPHOLOGY
The word ‘Morphology’ generally represents a branch of biology that deals with the form
and structure of animals and plants. Morphological operations are simple
transformations that are applied to shapes and structures in binary or grayscale images.
Applications in morphology:
● Removing unwanted effects in post-processing
○ Remove small objects that can be assumed as noise.
○ Smoothing edges of larger objects.
○ Fill some holes in objects or regions.
○ Link objects together.
● Object description and analysis
○ Locating objects with certain structure.
○ Locating patterns in image.
○ Locating boundary of an object
MORPHOLOGY
There is a slight overlap between Morphology and Image Segmentation. Morphology
consists of methods that can be used to pre-process the input data of Image
Segmentation or to post-process the output of the Image Segmentation stage. In other
words, once the segmentation is complete, morphological operations can be used to
remove imperfections in the segmented image and deliver information on the shape
and structure of the image as shown in the following figure.
MORPHOLOGY
Morphology in mathematics
Thinking of mathematical morphology takes us to set theory, where sets in
mathematical morphology correspond to objects in an image. How are objects
represented in binary and grayscale images?
● In binary images, the elements of the set (object) are the (x, y) coordinates of the pixels
belonging to the object (Z²).
● In grayscale images, the elements of the set (object) are the (x, y) coordinates of the pixels
belonging to the object together with their gray level (Z³).
Have we seen set theory before? (Remember primary school.)
Yes, we have: sets of numbers and the operations on them, such as
union, intersection, complement, and difference, which we used to represent with
Venn diagrams. Check the following slides.
In this part we will look at them from the image perspective.
MORPHOLOGY
MORPHOLOGY
From the image perspective:
Structuring Element (SE): a small set or subimage used to probe an image under
study for properties of interest. An SE is a matrix with a specific dimension and a
pattern of 0s and 1s (in binary images) or 0-255 values (in grayscale images) that specifies the
shape of the structuring element. For example, consider the following 3x3 SEs:
What are we going to do with these structuring elements? Let us discuss the 3
terms we will use with structuring elements (Fit, Hit, and Miss).
MORPHOLOGY
The structuring element is said to be:
Fit: if all pixels of value 1 in the SE match pixels of value 1 in the image.
Hit: if at least one pixel of value 1 in the SE matches a pixel of value 1 in the image.
Miss: if none of the pixels of value 1 in the SE match a pixel of value 1 in the image.
Notes:
● Zero-valued pixels in the SE are ignored.
● For grayscale images, instead of looking for the value 1 in the image, we look for any
non-zero value to decide whether it is a hit or a fit. We will cover this part again at the
end of the morphology section.
MORPHOLOGY
MORPHOLOGY
Now we are ready to cover the 2 main concepts of morphology.
Erosion is the removal of structures of certain shapes and sizes that are given by the SE
(a process of shrinking) ⇒ keep only the pixels where the SE fits.
MORPHOLOGY
Dilation is the filling of structures of certain shapes and sizes that are given by the
SE (a process of growing) ⇒ keep the pixels where the SE hits.
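A minimal sketch of erosion and dilation with OpenCV on a binary image, using a 3x3 rectangular structuring element; "photo.jpg" is a hypothetical file:

```python
import cv2

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

se = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))   # the structuring element
eroded = cv2.erode(binary, se, iterations=1)             # shrinks objects (fit)
dilated = cv2.dilate(binary, se, iterations=1)           # grows objects (hit)
```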
MORPHOLOGY
Importance of Erosion:
● Shrinks or thins objects in images.
● Removes image components.
● Strips away extrusions.
● Splits joined objects.
Importance of Dilation:
● Grows or thickens objects in images.
● Bridges gaps between objects.
● Repairs intrusions.
● Fills small holes in objects.
MORPHOLOGY
Dilation Examples
MORPHOLOGY
Erosion Examples
MORPHOLOGY
Combining Erosion and Dilation
Opening is erosion followed by dilation with the same SE.
Closing is dilation followed by erosion with the same SE.
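A minimal sketch of opening and closing with the same structuring element via OpenCV's morphologyEx; "photo.jpg" is a hypothetical file:

```python
import cv2

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
se = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, se)    # erosion then dilation
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, se)   # dilation then erosion
```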
MORPHOLOGY
Opening is idempotent which means that once an image has been opened, subsequent
openings with the same structuring element have no further effect on that image.
Opening is called so because it can open up a gap between objects connected by a thin
bridge of pixels. Any regions that have survived the erosion are restored to their original
size by the dilation.
Similarly Closing is idempotent which means that once an image has been closed,
subsequent closings with the same structuring element have no further effect on that
image.
Closing is so called because it can fill holes in the regions while keeping the initial
region sizes.
MORPHOLOGY