Image and Video Processing — All Slides
Lecture Week: 6th Jan to 10th Jan 2025
Motivation
Application of Image Processing
1. Remote sensing.
2. Medical domain.
3. Industrial automation.
4. Film and entertainment industry: adding special effects and creating artificial environments.
Application of Image Processing
[Example image slides: general applications; Medical Domain; Remote Sensing]
CSET344 - Syllabus
Course Overview: Module 1
1. Analog-to-Digital Image Conversion: Sampling and Quantization
2. Histogram Equalization: a histogram processing method that redistributes pixel intensities to achieve an approximately uniform distribution.
3. Convolution: a fundamental operation in image processing in which a kernel (filter) is slid over the image and each output pixel is computed as a weighted sum of the input pixels within the kernel's region.
4. Image Smoothing: Mean Filter, Median Filter, Gaussian Filter
5. Edge Detection: Prewitt Operator, Sobel Operator, Laplacian Operator, Laplacian of Gaussian (LoG) Operator, and Canny Edge Detector
Course Overview: Module 2
1. Line and Circle Detection using the Hough Transform
2. Corner Detection: a technique for identifying image locations where two edges intersect, forming a sharp corner.
3. Color Models: mathematical representations of color, defining how colors are represented numerically. Common examples include RGB, HSV, and CIELAB.
   Color Transforms: algorithms for converting colors between different color models, enabling tasks like color correction, image segmentation, and color analysis.
4. Morphological Operations
5. Optical Flow: a technique for estimating the apparent motion of objects between two consecutive image frames by calculating the pixel-wise displacement vectors that represent the motion of objects in the scene.
Course Overview: Module 4
1. Different Methods of Face Detection
2. PCA for Dimensionality Reduction and other Feature Extractors like HOG, SIFT
CSET344 – Course Evaluation (Tentative)
1. Mid-Semester: 20 marks
2. End-Semester: 40 marks
3. Project Work: 20 marks
1. Presentation and Q&A (Individual Student): 10 marks
2. Functionality and Working Condition: 10 marks
4. Laboratory Continuous Assessment: 20 marks
5. Programming Environment: All experiments will be conducted using the Python programming language
with OpenCV on the Google Colab platform or Visual Studio Code.
6. Module Coverage:
1. Before the Mid-Semester: Modules 1 and 2 will be completed.
2. After the Mid-Semester: Modules 3 and 4 will be covered.
7. Question Design: All questions will emphasize logical reasoning and problem-solving.
EM Spectrum
Refer: https://www.lumitex.com/blog/visible-light-spectrum
Image, Intensity or Grey Level, and Pixel
An image is a two-dimensional function f(x, y), where x and y are the spatial coordinates and the amplitude at a particular coordinate (x, y) is the intensity or grey level at that point.
[Figure: common display resolutions — HD 1280 × 720, Full HD 1920 × 1080, Ultra HD (4K) 3840 × 2160]
Type of Images
Binary image (pixel values 0 or 1):
0 0 0 0 0
0 1 1 1 0
0 0 0 1 0
0 1 1 1 0
0 0 0 1 0
0 1 1 1 0
0 0 0 0 0
Grayscale image (8-bit, values 0 to 255): the same pattern with 255 in place of 1.
Color image: three such channels (R, G, B) per pixel.
CSET344/CMCA544
Image and Video Processing
Module 1
Dr. Gaurav Kumar Dashondhi
Ph.D. IIT Bombay
Image Acquisition: Scanners, Cameras
Image Storage: Hard Drives, Solid State Drives (SSD)
Image Processing: Image Enhancement, Image Restoration, Image Segmentation, Image Analysis
Image Sampling and Quantization
Sampling: discretizing the spatial coordinates (the x and y axes).
Quantization: discretizing the amplitude (intensity) values.
Representation of Digital Images
An image is represented as a matrix of intensity values: a two-dimensional function f(x, y), where x and y are the spatial coordinates and the amplitude at a particular coordinate is the intensity or grey level (see the binary and grayscale matrices above).
Basic Terminologies
Pixel (dot): the smallest unit of a digital image.
Basic Terminologies
Spatial Domain
Let the image size be M × N with k bits per pixel. Total bits required to store the image = M × N × k.
Number of intensity levels = 2^k; for an 8-bit image, 2^8 = 256 intensity levels or grey levels (0 to 255).
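A quick numeric check of these formulas, as a minimal Python sketch (the 1024 × 1024 size is an arbitrary example):

```python
# Bits needed for an M x N image with k bits per pixel.
M, N, k = 1024, 1024, 8

total_bits = M * N * k           # M x N x k
levels = 2 ** k                  # number of grey levels

print(total_bits // 8, "bytes")  # 1048576 bytes = 1 MiB for this example
print(levels, "grey levels")     # 256 (0 to 255)
```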
Basic Terminologies
Dynamic Range, Contrast and Contrast Ratio
Dynamic Range: the dynamic range of an image processing system is the ratio of the maximum measurable intensity to the minimum detectable intensity.
Dynamic range of an image (image contrast, or contrast ratio): the difference between the highest and lowest intensity levels in the image.
Resolution is expressed in dots per inch (DPI) or pixels per inch (PPI) and directly relates to the clarity of the image: the higher the resolution, the more information or detail can be identified.
Basic Terminologies
Spatial Resolution
Spatial Resolution: the capability of a sensor to distinguish between two closely spaced objects.
Higher spatial resolution: the pixel size is small and one can see more details.
Lower spatial resolution: the pixel size is large and one cannot distinguish between two closely spaced objects.
Basic Terminologies
Intensity Resolution
Basic Relationship Between Pixels
4-neighbors of p, N4(p): the horizontal and vertical neighbors (x−1, y), (x+1, y), (x, y−1), (x, y+1).
Diagonal neighbors of p, ND(p): (x−1, y−1), (x−1, y+1), (x+1, y−1), (x+1, y+1).
8-neighbors of p: N8(p) = N4(p) ∪ ND(p).
Distance Measures
1. Euclidean Distance between p = (x, y) and q = (s, t):
De(p, q) = [(x − s)² + (y − t)²]^(1/2)
Basic Relationship Between Pixels
2 2 2 2 2 2
2 1 2 2 1 1 1 2
2 1 0 1 2 2 1 0 1 2
2 1 2 2 1 1 1 2
2 2 2 2 2 2
32
Image Enhancement in the Spatial Domain
Spatial Domain
Intensity Transformation
Intensity Transformation: Image Negative
Motivation: this transformation is used to enhance grey-level information embedded in the dark regions of an image, or when the black areas dominate in size compared to the white regions.
s = L − 1 − r
where L = number of intensity levels, r = input intensity level, s = output intensity level.
For an 8-bit image (0 to 255), L = 256, so s = 255 − r.
Example: r = 10, 20, 30, 40 → s = 245, 235, 225, 215.
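A minimal NumPy/OpenCV sketch of the negative transform; the file name is a placeholder:

```python
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # 8-bit grey image
negative = 255 - img                                 # s = L - 1 - r, L = 256
cv2.imwrite("negative.png", negative)
```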
Intensity Transformation: Log Transformations
Motivation: this transformation expands the dark pixel values in an image while compressing the higher-level values.
s = c log(1 + r)
where r = input intensity level, s = output intensity level, and c is a scaling constant: c = (L − 1) / log(1 + r_max).
Example (8-bit image, L = 256, r_max = 255): calculate the s value for r = 0, 1, 5, 200, 220, 240.
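A short sketch evaluating the table above (natural log is assumed; a different log base only changes the constant c):

```python
import numpy as np

L, r_max = 256, 255
c = (L - 1) / np.log(1 + r_max)        # c = (L-1) / log(1 + r_max)

for r in [0, 1, 5, 200, 220, 240]:
    s = c * np.log(1 + r)
    print(f"r = {r:3d} -> s = {s:6.2f}")
```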
Intensity Transformation: Power-Law Transformations (Gamma Correction)
Motivation: the visual quality of an image may be degraded by illumination conditions or wrong camera-sensor settings. To rectify this, one can use the power-law transformation (gamma correction). The basic idea is to raise each normalized pixel value to a certain power to improve the overall brightness and contrast of the image.
s = c r^γ
where r = image(x, y) / 255 is the normalized input intensity and c = 255 rescales the output.
3 × 3 input image:
10 200 150
20 100  90
70  50 220
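A sketch applying the power-law transform to the 3 × 3 example; γ = 0.5 is an assumed illustrative value:

```python
import numpy as np

img = np.array([[10, 200, 150],
                [20, 100,  90],
                [70,  50, 220]], dtype=np.float64)

gamma, c = 0.5, 255.0
r = img / 255.0                       # normalize input intensities
s = c * r ** gamma                    # s = c * r^gamma
print(np.round(s).astype(np.uint8))   # brightens the dark pixels
```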
Piecewise Linear Transformation Function: Contrast Stretching
Motivation: a low-contrast image can result from poor illumination, lack of dynamic range in the imaging sensor, or even a wrong lens-aperture setting during image acquisition.
Contrast stretching expands the intensity range of an image to utilize the full available dynamic range.
Piecewise Linear Transformation Function: Contrast Stretching
s = (r − Imin) × (Omax − Omin) / (Imax − Imin) + Omin
where s = output intensity level, r = input intensity level, Omax/Omin = maximum/minimum output levels, Imax/Imin = maximum/minimum input levels.
Input image (before transformation):
10   5 150
20 100  90
70  50  30
Apply contrast stretching for r = 10, with Omax = 255, Omin = 0, Imax = 150, Imin = 5:
s = (10 − 5) × (255 − 0) / (150 − 5) + 0 = 5 × 255 / 145 ≈ 8.79 ≈ 9.
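A sketch stretching the whole 3 × 3 example to [0, 255]; the r = 10 entry reproduces the hand calculation:

```python
import numpy as np

img = np.array([[10,   5, 150],
                [20, 100,  90],
                [70,  50,  30]], dtype=np.float64)

i_min, i_max = img.min(), img.max()        # 5 and 150
o_min, o_max = 0.0, 255.0

s = (img - i_min) * (o_max - o_min) / (i_max - i_min) + o_min
print(np.round(s).astype(np.uint8))        # r = 10 maps to ~9
```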
Piecewise Linear Transformation Function: Thresholding
A limiting case of contrast stretching (output intensity s vs input intensity r): if r1 = r2, the transformation becomes a thresholding function, with s1 = 0 and s2 = L − 1.
CSET344/CMCA544
Image and Video Processing
Histogram: for an M × N image, the normalized histogram is p(rk) = h(rk) / MN, where h(rk) is the number of pixels with intensity rk.
Applications:
1. Image Enhancement
2. Image Thresholding
3. Image Segmentation
4. Image Analysis
Histogram Examples
Histogram Example
For a 3-bit image of size 3 × 3:
1 2 6
6 1 3
1 6 6
Intensity (grey level) | Frequency | Normalized histogram
1 | 3 | 3/9
2 | 1 | 1/9
3 | 1 | 1/9
6 | 4 | 4/9
Histogram Equalization
A technique used in image processing to improve the contrast of an image. It works by redistributing the intensity values of the pixels so that the histogram becomes more uniform.
sk = T(rk) = (L − 1) Σ_{j=0}^{k} p_r(rj), k = 0, 1, 2, …, L − 1
where L = number of possible intensity levels, sk = output intensity level, rk = input intensity level.
Why is it required? To enhance the contrast of an image, especially when the pixel intensity values are concentrated in a narrow range (e.g., very dark or very bright images).
Histogram Equalization
Histogram Equalization Example
Consider a 3-bit image of size 64 × 64 (4096 pixels) with the intensity distribution shown in the table below. Calculate the equalized histogram.
sk = T(rk) = (L − 1) Σ_{j=0}^{k} p_r(rj); for example, s0 = (L − 1) p(r0) = 7 × 0.19 = 1.33.
Histogram Equalization Example
rk | nk | p(rk) | sk | Rounded sk | Updated nk | Updated p(rk)
0 | 790 | 0.19 | 1.33 | 1 | 790 | 790/4096
1 | 1023 | 0.25 | 3.08 | 3 | 1023 | 1023/4096
2 | 850 | 0.21 | 4.55 | 5 | 850 | 850/4096
3 | 656 | 0.16 | 5.67 | 6 | 656 + 329 = 985 | 985/4096
4 | 329 | 0.08 | 6.23 | 6 | (merged into level 6) |
5 | 245 | 0.06 | 6.65 | 7 | 245 + 122 + 81 = 448 | 448/4096
6 | 122 | 0.03 | 6.86 | 7 | (merged into level 7) |
7 | 81 | 0.02 | 7.00 | 7 | (merged into level 7) |
sk = T(rk) = (L − 1) Σ_{j=0}^{k} p_r(rj)
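A NumPy check of the table (the decimals differ slightly from the slide, which works with probabilities rounded to two places):

```python
import numpy as np

L = 8                                      # 3-bit image
nk = np.array([790, 1023, 850, 656, 329, 245, 122, 81])
pk = nk / nk.sum()                         # p(r_k) = n_k / MN, MN = 4096

sk = (L - 1) * np.cumsum(pk)               # s_k = (L-1) * sum_{j<=k} p(r_j)
print(np.round(sk, 2))
print(np.rint(sk).astype(int))             # rounded levels: [1 3 5 6 6 7 7 7]
```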
Concept of Kernel, Filter, or Convolution Mask
Each output value is a weighted sum of the inputs covered by the mask: for inputs x1, x2 with weights w1, w2, the response is w1 x1 + w2 x2.
Spatial Correlation and Convolution. Padding size: (M−1)/2 or (N−1)/2 for an M × N kernel.
Input image (5 × 5, a single impulse at the centre, zero padded):
0 0 0 0 0
0 0 0 0 0
0 0 1 0 0
0 0 0 0 0
0 0 0 0 0
Kernel:
1 2 3
4 5 6
7 8 9
Correlation result (kernel rotated by 180°):
0 0 0 0 0
0 9 8 7 0
0 6 5 4 0
0 3 2 1 0
0 0 0 0 0
Convolution result (copy of the kernel at the impulse):
0 0 0 0 0
0 1 2 3 0
0 4 5 6 0
0 7 8 9 0
0 0 0 0 0
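A quick check of both results with SciPy, which provides correlation and convolution directly:

```python
import numpy as np
from scipy.ndimage import correlate, convolve

img = np.zeros((5, 5))
img[2, 2] = 1                                    # unit impulse at the centre
kernel = np.arange(1, 10).reshape(3, 3)          # [[1,2,3],[4,5,6],[7,8,9]]

print(correlate(img, kernel, mode='constant'))   # 180-degree rotated kernel
print(convolve(img, kernel, mode='constant'))    # copy of the kernel
```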
Smoothing Spatial Filters
Smoothing spatial filter types:
• Box Car (Mean) Filter
• Weighted Average Filter
• Gaussian Filter
• Median Filter
• Max Filter
• Min Filter
Image Smoothing: Box Car Filter
A filter that computes the average of the pixels in a neighborhood blurs an image; computing an average is analogous to integration. Equivalently, a filter that reduces sharp transitions in intensity is called a smoothing or low-pass filter.
Convolving a smoothing kernel with an image blurs it, and the amount of blurring depends on the size of the kernel.
Kernel and normalized kernel:
        1 1 1       0.11 0.11 0.11
1/9 ×   1 1 1   =   0.11 0.11 0.11
        1 1 1       0.11 0.11 0.11
A 4 × 4 all-ones kernel is normalized by 1/16:
         1 1 1 1
1/16 ×   1 1 1 1
         1 1 1 1
         1 1 1 1
Image Smoothing: Box Car Filter
Comparison of outputs between normalized and non-normalized kernels.
Image:     Normalized kernel:
1 2 3      0.11 0.11 0.11
4 5 6      0.11 0.11 0.11
7 8 9      0.11 0.11 0.11
Response at the centre: 0.11 + 0.22 + 0.33 + 0.44 + 0.55 + 0.66 + 0.77 + 0.88 + 0.99 = 4.95 (approximately the mean, 5); the non-normalized kernel would instead give 45 and brighten the image.
Working Example. Padding size: (M−1)/2 or (N−1)/2.
Input image:
1 2 5 3 4
5 6 7 8 9
2 3 4 5 6
3 6 8 4 2
1 5 6 8 7
Kernels: the normalized 3 × 3 mean kernel (all entries 0.11) and the 3 × 3 weighted average kernel
1/16 ×  1 2 1
        2 4 2
        1 2 1
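A sketch filtering the example image with both kernels using cv2.filter2D; BORDER_CONSTANT reproduces the zero padding:

```python
import cv2
import numpy as np

img = np.array([[1, 2, 5, 3, 4],
                [5, 6, 7, 8, 9],
                [2, 3, 4, 5, 6],
                [3, 6, 8, 4, 2],
                [1, 5, 6, 8, 7]], dtype=np.float32)

mean_k = np.ones((3, 3), np.float32) / 9
weighted_k = np.array([[1, 2, 1],
                       [2, 4, 2],
                       [1, 2, 1]], np.float32) / 16

print(cv2.filter2D(img, -1, mean_k, borderType=cv2.BORDER_CONSTANT))
print(cv2.filter2D(img, -1, weighted_k, borderType=cv2.BORDER_CONSTANT))
```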
Image Smoothing: Gaussian Filter
G(x, y) = (1 / 2πσ²) e^(−(x² + y²) / 2σ²)
With σ = 1, at x = 0, y = 0: G(0, 0) = 1 / 2π ≈ 0.159.
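A sketch that samples this formula to build a normalized Gaussian kernel (size 3 and σ = 1 are example choices):

```python
import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    # Sample G(x, y) = exp(-(x^2 + y^2) / (2 sigma^2)) / (2 pi sigma^2).
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()                 # normalize so the weights sum to 1

print(gaussian_kernel())               # centre value before normalizing: 1/2pi
```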
CSET344/CMCA544
Image and Video Processing
Image Smoothening: Gaussian Filter
Calculate the Gaussian Kernel for size
Image Smoothing: Gaussian Filter
Outputs with σ = 1, σ = 2, and σ = 3 (larger σ gives stronger blur).
Smoothing Spatial Filters: Max Filter, Min Filter, Median Filter
Examples : Max Filter, Min Filter. Assume Kernel size is 3 x 3
3 7 17 18 13 0 0 0 0 0 0 0
10 5 2 20 5 0 3 7 17 18 13 0
9 8 13 1 7 0 10 5 2 20 5 0
16 8 7 20 19 0 9 8 13 1 7 0
14 19 3 30 10 0 16 8 7 20 19 0
0 14 19 3 30 10 0
0 0 0 0 0 0 0
Input Image After Padding Zeros
Second Step
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 3 7 17 18 13 0 0 3 7 17 18 13 0
0 10 5 2 20 5 0 0 10 5 2 20 5 0
0 9 8 13 1 7 0 0 9 8 13 1 7 0
0 16 8 7 20 19 0 0 16 8 7 20 19 0
0 14 19 3 30 10 0 0 14 19 3 30 10 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 66
Examples: Max Filter, Min Filter, Median Filter (3 × 3 kernel, zero padding)
Min filter output:
0 0 0 0 0
0 2 1 1 0
0 2 1 1 0
0 3 1 1 0
0 0 0 0 0
Median filter output (first two rows):
0 3 5 5 0
5 8 8 13 5
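A SciPy check of these outputs; mode='constant' with cval=0 reproduces the zero padding:

```python
import numpy as np
from scipy.ndimage import minimum_filter, median_filter, maximum_filter

img = np.array([[ 3,  7, 17, 18, 13],
                [10,  5,  2, 20,  5],
                [ 9,  8, 13,  1,  7],
                [16,  8,  7, 20, 19],
                [14, 19,  3, 30, 10]])

print(minimum_filter(img, size=3, mode='constant', cval=0))
print(median_filter(img, size=3, mode='constant', cval=0))
print(maximum_filter(img, size=3, mode='constant', cval=0))
```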
Comparison Between Max, Min and Median Filter Outputs
Image Denoising
Edge Detection
Edge detectors: Prewitt, Sobel, Laplacian.
Edge: edge pixels are pixels at which the intensity of an image changes abruptly.
Line: a line may be viewed as a thin edge segment where the intensity of the background on either side of the line is either much higher or much lower.
Sample scan line across a step edge: 0 0 0 7 7 7 0 0 0
[Plot: step edge, its first derivative, its second derivative, and the zero crossing at the edge]
Edge: Observations
1. The second-order derivative has a stronger response to fine detail such as points, thin lines, and noise.
2. The second-order derivative produces a double-edge response at ramp and step transitions in intensity.
3. The sign of the second derivative can be used to determine whether a transition into an edge is from light to dark or from dark to light.
Edge
[Panels: original image, first derivative, second derivative; the gradient vector]
Edge Detection: Image Gradient and its Properties
Edge: an abrupt or sudden change in intensity, which helps to identify edges.
Example 3 × 3 neighborhood values and the corresponding pixel labels:
0 1 1     z1 z2 z3
0 0 1     z4 z5 z6
0 0 0     z7 z8 z9
Edge Detectors: Prewitt, Sobel, Laplacian
Prewitt kernels (horizontal and vertical):
-1 -1 -1      -1 0 1
 0  0  0      -1 0 1
 1  1  1      -1 0 1
Sobel kernels (horizontal and vertical):
-1 -2 -1      -1 0 1
 0  0  0      -2 0 2
 1  2  1      -1 0 1
Example image region (constant 50s with a vertical step to 150):
50 50 150
50 50 150
Note: the Sobel operator has better noise-suppression capability than Prewitt.
Sobel Edge Detector
[Panels: original image; gradient in the x direction]
Thresholding helps to suppress the minor edges.
Note: when the interest lies in highlighting the principal edges while maintaining as much connectivity as possible, it is common practice to use both smoothing and thresholding.
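A hedged OpenCV sketch of smoothing, Sobel gradients, and thresholding (the file name and threshold value are placeholders):

```python
import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
img = cv2.GaussianBlur(img, (3, 3), 0)             # smooth first

gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)     # gradient in x
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)     # gradient in y
mag = cv2.magnitude(gx, gy)                        # gradient magnitude

edges = (mag > 100).astype(np.uint8) * 255         # keep the principal edges
cv2.imwrite("sobel_edges.png", edges)
```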
Laplacian of Gaussian (LoG)
Limitations of the Prewitt and Sobel operators:
1. They can only detect edges in the horizontal and vertical directions.
2. For different types of images, different kernel sizes are needed.
Desirable properties:
1. The second derivative can localize edges better than the first derivative.
2. An edge detector should be isotropic, i.e., it should detect edges in all directions.
3. A larger kernel should detect blurry edges, and a small kernel should detect sharply focused fine detail.
LoG: the Gaussian function first smooths the image, and the Laplacian then finds the second derivative of the Gaussian, which directly helps to locate edges.
Laplacian of Gaussian (LoG)
LoG: the Gaussian function first smooths the image, and the Laplacian finds the second derivative of the Gaussian-smoothed image, which directly helps to locate edges.
[Plots: Gaussian function, Laplacian function, Laplacian of Gaussian]
Laplacian of Gaussian (LoG): Working Example
Image:
10  10  10  10  10
10  10  10  10  10
100 100 100 100 100
100 100 100 100 100
10  10  10  10  10
3 × 3 Laplacian kernel:
-1 -1 -1
-1  8 -1
-1 -1 -1
Convolving the kernel with the image gives the output image; the zero crossings in the output mark the edges between the 10-valued and 100-valued regions.
CSET344/CMCA544
Image and Video Processing
Module 2
Dr. Gaurav Kumar Dashondhi
Ph.D. IIT Bombay
Canny Edge Detector
1. Good Detection: find all the real edges while ignoring noise and artifacts.
2. Good Localization: detected edges should be as close as possible to the true edges.
3. Single Response: return only one point for each true edge point.
4. Hysteresis Thresholding on the gradient magnitude M, with a high threshold H and a low threshold L:
If M > H: edge
elif L < M < H: edge only if connected to a strong edge
else: no edge
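In OpenCV the whole pipeline, including hysteresis, is one call; the two thresholds below are illustrative values for L and H:

```python
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, threshold1=50, threshold2=150)   # L = 50, H = 150
cv2.imwrite("canny_edges.png", edges)
```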
Canny Edge Detector Example
[Panels: Laplacian of Gaussian output vs Canny Edge Detector output, for two test images]
Comparison of all Edge Detectors
Feature | Prewitt | Sobel | Laplacian of Gaussian (LoG) | Canny
Gradient Operator | Simple difference | Weighted difference (more robust) | Second derivative (finds zero crossings) | Optimized gradient & sophisticated criteria
Noise Sensitivity | High | Moderate | High (but Gaussian blur helps) | Low (due to Gaussian blur)
Computational Cost | Low | Slightly higher than Prewitt | Higher (due to convolution) | Highest
Typical Use Cases | Simple image analysis, basic edge detection | General-purpose edge detection | Images with low noise, blob detection | High-quality edge detection, computer vision
Harris Corner Detector
Applications:
• Image Stitching: finding corresponding corners in multiple images to align and stitch them together.
• Object Tracking: tracking objects in videos by identifying and following their corners.
• 3D Reconstruction: using corners as features to reconstruct 3D models from multiple images.
Harris Corner Detector: Working Example (M is the Harris matrix)
Image patch:
0  0  1  4  9
1  0  5  7 11
1  4  9 12 16
3  8 11 14 16
8 10 15 16 20
Change in the x direction: kernel [-1 0 1]. Change in the y direction: kernel [-1 0 1] transposed.
Gradient magnitudes over the 3 × 3 window:
Ix:        Iy:
4 7 6      4 8 8
8 8 7      8 6 7
8 6 5      6 6 4
Σ Ix² = 4² + 7² + 6² + 8² + 8² + 7² + 8² + 6² + 5² = 403
Σ Iy² = 4² + 8² + 8² + 8² + 6² + 7² + 6² + 6² + 4² = 381
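A hedged OpenCV sketch of Harris detection on a full image (blockSize, ksize, and k are typical values, not taken from the slide):

```python
import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Response R = det(M) - k * trace(M)^2 from the per-window Harris matrix M.
response = cv2.cornerHarris(img, blockSize=2, ksize=3, k=0.04)

corners = response > 0.01 * response.max()     # threshold the response map
print(np.count_nonzero(corners), "corner pixels")
```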
Hough Transform: Line Detection
A point in image space becomes a line in parameter space.
Image space: y = mx + c, where m = slope and c = intercept. A line from a non-collinear point misses the common intersection.
Parameter space: c = −mx + y, so every image point votes along a line in the (m, c) plane.
[Accumulator example: each image point votes along a line in the parameter-space grid; the cell where the votes intersect (count 3) gives the parameters of the detected line]
Hough Transform: Line Detection. Working Example
Are the points (1, 2) and (2, 3) collinear?
For (1, 2): c = −m + 2. For (2, 3): c = −2m + 3.
The two parameter-space lines intersect where −m + 2 = −2m + 3, giving m = 1 and c = 1; both points lie on y = x + 1, so they are collinear.
Hough Transform: Line Detection. Polar Form
Issue: the slope m is unbounded for vertical lines, so the (m, c) parameter space is unbounded.
Solution: use the polar (normal) form x cos θ + y sin θ = ρ, where ρ (rho) is the distance of the line from the origin and θ (theta) is the angle of its normal.
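An OpenCV sketch of both Hough detectors (all thresholds and radii are illustrative):

```python
import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150)

# Lines in polar form: each result row is (rho, theta).
lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=100)

# Circles: each result is (a, b, r), i.e. centre and radius.
circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                           param1=150, param2=40, minRadius=5, maxRadius=100)
print(0 if lines is None else len(lines), "lines")
print(0 if circles is None else circles.shape[1], "circles")
```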
Hough Transform: Circle Detection
(x − a)² + (y − b)² = r², where (a, b) is the centre and r is the radius of the circle.
For each edge point in the image plane there is a circle in the (a, b) parameter space, so the accumulator array fills in a circular pattern.
If r is known, the parameter space contains only the two parameters a and b.
Accumulator array example (votes for candidate centres; the cell with 3 votes is the detected centre):
0 1 1 1 0
1 0 0 0 1
1 0 3 0 1
1 0 0 0 1
1 1 1 1 0
Problems with the Hough Transform
1. Computational Cost: voting over every edge point and every parameter cell is expensive, and the cost grows quickly with the number of parameters.
2. Parameter Sensitivity: results depend strongly on the accumulator bin size and vote threshold.
3. Memory Usage: the accumulator array grows rapidly with the number of parameters.
Morphological Operations
1. Morphological operations are a set of techniques used in image processing to analyze and modify the shape and structure of objects within an image.
2. Morphological operations are defined in terms of sets. In image processing, morphology uses two sets of pixels: the image (foreground) and the structuring element. A structuring element can be specified in terms of both foreground and background pixels.
Dilation
Dilation generally grows objects or increases their thickness. Rule as the structuring element slides over the image: full match = 1, partial match = 1, no match = 0.
Example with a vertical 3 × 1 structuring element of ones:
Input:          Output:
0 0 0 0 0 0     0 0 1 1 0 0
0 0 1 1 0 0     0 1 1 1 1 0
0 1 1 1 1 0     0 1 1 1 1 0
0 0 1 1 0 0     0 1 1 1 1 0
0 0 0 0 0 0     0 0 1 1 0 0
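A quick OpenCV check of the example; the kernel is the vertical 3 × 1 column of ones:

```python
import cv2
import numpy as np

img = np.array([[0, 0, 0, 0, 0, 0],
                [0, 0, 1, 1, 0, 0],
                [0, 1, 1, 1, 1, 0],
                [0, 0, 1, 1, 0, 0],
                [0, 0, 0, 0, 0, 0]], dtype=np.uint8)

kernel = np.ones((3, 1), np.uint8)      # vertical structuring element
print(cv2.dilate(img, kernel))          # grows the shape (matches the slide)
print(cv2.erode(img, kernel))           # erosion shrinks it instead
```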
Dilation Examples
Applications of dilation
Erosion
Erosion generally shrinks objects or decreases their thickness. Rule: full match = 1, partial match = 0, no match = 0.
Example with a vertical 3 × 1 structuring element of ones (zero padding at the border):
Input:          Output:
1 1 1 1 1 1     0 0 0 0 0 0
1 1 0 0 1 1     1 0 0 0 0 1
1 0 0 0 0 1     1 0 0 0 0 1
1 1 0 0 1 1     1 0 0 0 0 1
1 1 1 1 1 1     0 0 0 0 0 0
Erosion Examples
Applications of erosion:
1. Noise removal.
2. Object separation and feature extraction.
Opening and Closing
Opening: erosion followed by dilation; it smooths object contours, breaks thin connections, and removes small protrusions.
Closing: dilation followed by erosion; it fills small gaps and holes and joins narrow breaks.
[Worked example: the input image is eroded with the structuring element and the result is then dilated, giving the opened image]
CSET344/CMCA544
Image and Video Processing
Hit-or-Miss Transform (HMT)
Detecting a shape D: a structuring element shaped like D completely fits at several locations, so erosion alone cannot isolate the shape.
Q. Instead of using two structuring elements, can we detect the same shape by using a single structuring element?
Solution: use a structuring element B of exactly the same shape as D, with an additional border of background elements one pixel thick.
Basic Morphological Algorithms
[Worked example: starting from X0, the iteration X1, X2, …, X8 is repeated until convergence; the final result is X8 ∪ A]
Basic Morphological Algorithms
Connected Components: generally used to extract all the connected components in the original image A.
Erosion / Shrinking: makes objects thinner or highlights the core part of an object.
Noise Removal: eliminates noise along the edges of objects.
Texture Analysis
Techniques: Statistical Approaches or Spectral Approaches.
Grey Level Co-occurrence Matrix (GLCM)
GLCM Example: construct a GLCM (horizontal right-neighbor pairs, offset (0, 1)) for the matrix
1 7 2 1
2 3 4 5
7 2 1 7
8 2 1 7
GLCM (rows = reference grey level, columns = neighbor grey level, levels 1 to 8; non-zero entries):
(1, 7) = 3, (2, 1) = 3, (2, 3) = 1, (3, 4) = 1, (4, 5) = 1, (7, 2) = 2, (8, 2) = 1
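A sketch with scikit-image's graycomatrix; levels=9 lets the grey levels 0-8 index the matrix directly:

```python
import numpy as np
from skimage.feature import graycomatrix

img = np.array([[1, 7, 2, 1],
                [2, 3, 4, 5],
                [7, 2, 1, 7],
                [8, 2, 1, 7]], dtype=np.uint8)

# Distance 1, angle 0 rad: horizontal right-neighbor pairs.
glcm = graycomatrix(img, distances=[1], angles=[0], levels=9)
print(glcm[:, :, 0, 0])      # e.g. entry [1, 7] = 3 and entry [2, 1] = 3
```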
CSET344/CMCA544
Image and Video Processing
Pseudocolor processing: assigning colors to grey-scale intensities or ranges of intensities. Full-color processing: images acquired using a full-color sensor, such as a digital camera or color scanner.
Color Fundamentals
1. The color that humans and other animals perceive in an object is determined by the nature of the light reflected from the object.
2. Chromatic light spans the electromagnetic spectrum from approximately 400 nm to 700 nm.
Three basic quantities are used to describe the quality of a chromatic light source: radiance, luminance, and brightness.
Color Fundamentals
Radiance: the total amount of energy that flows from a light source, measured in watts (W).
Luminance: the amount of energy an observer perceives from the light source, measured in lumens (lm). For example, light from an infrared source may have high radiance but almost zero luminance.
Cones are the sensors in the human eye responsible for color vision.
Saturation example:
High saturation: pure red paint is fully saturated.
Low saturation: gradually mixing white paint into the red changes it from pure red to a faded red or pink.
Brightness: refers to how light or dark a color appears.
Tristimulus values: the amounts of red, green, and blue required to form any particular color, denoted by X, Y, and Z.
A color model should define a subspace within the system such that each color in the model is represented by a single point contained in it.
Motivation: humans can differentiate between far more colors and color intensities than shades of grey.
Pseudocolor Image Processing: Intensity Slicing and Color Coding
[Panels: grey-scale X-ray image of a weld; intensity slicing using 8 colors; color-coded image of the weld, where one color is assigned to intensity level 255 and another color to all other intensity levels]
Regions that appear to have constant intensity in the grey-scale image are quite variable in the color-sliced image. This makes the quality inspector's job easier, resulting in a lower error rate.
Pseudocolor Image Processing
Intensity-to-color transformations: better than the simple slicing techniques.
[Panels: NIR band of Landsat; RGB color composite using IR, G, B; RGB color composite using R, IR, B]
Basics of Full-Color Image Processing
Two approaches: process each grey-scale component image individually and then form a composite color image, or work with the color pixels directly.
Color Transformations
Tone Correction
Image Negative
Color Complements
CSET344/CMCA544
Image and Video Processing
Module 3
Dr. Gaurav Kumar Dashondhi
Ph.D. IIT Bombay
6. Homomorphic Filtering
7. Selective filters.
Frequency Domain Fundamentals
1. Fourier Series: any periodic function can be expressed as a sum of sines and/or cosines of different frequencies, each multiplied by a different coefficient. For a function f(t) of a continuous variable t that is periodic with period T:
f(t) = Σ_{n=−∞}^{∞} c_n e^(j 2π n t / T), with c_n = (1/T) ∫_{−T/2}^{T/2} f(t) e^(−j 2π n t / T) dt
2. The Fourier transform pair gives the forward and inverse Fourier transforms:
F(μ) = ∫ f(t) e^(−j 2π μ t) dt and f(t) = ∫ F(μ) e^(j 2π μ t) dμ
Frequency Domain Fundamentals
Working Example
Frequency Domain Fundamentals
Sampling Theorem: A continuous bandlimited signal can be recovered
completely from the set of its samples if the samples are acquired at a Example -
rate exceeding twice the highest frequency content of the function. m(t) = sin2Πt + sin3Πt + sin4Πt
Fs < 2fm (Under sampling or aliasing effect) Similarly f = 1.5 and 2 for other two
cases respectively.
Where, Fs = sampling frequency and fm = highest frequency present in
the signal
161
Frequency Domain Fundamentals
Aliasing: It’s a phenomena where different signals are indistinguishable from Anti Aliasing :
Aliasing can be reduced by smoothening (Low
one another after sampling. Pass filter) input function to attenuate the higher
Fs < 2fm (Under sampling or aliasing effect) frequencies.
163
1-D DFT and Inverse DFT working example
Frequency Domain Filters
Low Pass Filters (LPF) High Pass Filters (HPF) Homomorphic Filtering Selective Filtering
Frequency Domain Filters: Ideal Low-Pass Filter (ILPF)
A 2-D filter that passes without attenuation all frequencies within a circle of radius D0 from the origin and cuts off (attenuates) all frequencies outside this circle: H(u, v) = 1 if D(u, v) ≤ D0, else 0.
(a) Ideal LPF transfer function plot; (b) function displayed as an image; (c) radial cross-section.
Test pattern image with circles of radii 10, 30, 60, 160, 460.
Frequency Domain Filters: Ideal Low-Pass Filter (ILPF)
(a) Original image; (b)-(f) ILPF results with the cutoff frequency set at radius 10, 30, 60, 160, and 460.
Frequency Domain Filters: Gaussian Low-Pass Filter (GLPF)
It is specified by the transfer function H(u, v) = e^(−D²(u, v) / 2D0²).
(a) Gaussian LPF transfer function plot; (b) function displayed as an image; (c) radial cross-sections for various values of D0.
Frequency Domain Filters: Gaussian Low-Pass Filter (GLPF)
(a) Original image; (b)-(f) GLPF results with the cutoff frequency set at radius 10, 30, 60, 160, and 460.
Frequency Domain Filters: Butterworth Low-Pass Filter (BLPF)
It is specified by the transfer function H(u, v) = 1 / (1 + [D(u, v) / D0]^(2n)), where n is the filter order.
(a) Butterworth LPF transfer function plot; (b) function displayed as an image; (c) radial cross-sections for orders 1 to 4.
Frequency Domain Filters: Butterworth Low-Pass Filter (BLPF)
(a) Original image; (b)-(f) BLPF results with the cutoff frequency set at radius 10, 30, 60, 160, and 460.
Comparative Analysis Between ILPF, GLPF and BLPF
Image sharpening can be achieved in the frequency domain by passing the high-frequency components (edges and other sharp intensity transitions) and attenuating the low-frequency components.
A high-pass transfer function follows from the corresponding low-pass one as H_HP(u, v) = 1 − H_LP(u, v); n again denotes the order in the Butterworth case.
Frequency Domain Filters: Image Sharpening using High-Pass Filters
(a) HPF transfer function plot; (b) function displayed as an image; (c) radial cross-section.
Frequency Domain Filters: Image Sharpening using High-Pass Filters
Filtered with (a) IHPF, (b) GHPF, (c) BHPF with D0 = 60; and with (d) IHPF, (e) GHPF, (f) BHPF with D0 = 160.
CSET344/CMCA544
Image and Video Processing
6. Homomorphic Filtering
Frequency Domain Filters: Homomorphic Filtering
Objective: an image can be modeled as the product of an illumination component and a reflectance component, f(x, y) = i(x, y) r(x, y); the overall objective is to separate the illumination and reflectance components (via the logarithm) so that they can be manipulated independently.
Frequency Domain Filters: Working Examples
1-D Discrete Fourier Transform (DFT) and Inverse Discrete Fourier Transform (IDFT):
DFT: F(u) = Σ_{x=0}^{M−1} f(x) e^(−j 2π u x / M), u = 0, 1, …, M − 1
IDFT: f(x) = (1/M) Σ_{u=0}^{M−1} F(u) e^(j 2π u x / M), x = 0, 1, …, M − 1
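A NumPy check of the pair; np.fft.fft implements the forward sum with this sign convention (the signal values are assumed examples):

```python
import numpy as np

f = np.array([1.0, 2.0, 4.0, 4.0])       # example 1-D signal
F = np.fft.fft(f)                         # forward DFT
f_back = np.fft.ifft(F)                   # inverse DFT recovers f

print(np.round(F, 3))
print(np.round(f_back.real, 3))           # matches the original signal
```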
Spatial Domain and Frequency Domain Analogy
Convolution in the spatial domain corresponds to multiplication in the frequency domain, and vice versa.
Frequency Domain Filtering: Flow Chart
Input image f(x, y) → Fourier Transform → F(u, v) → multiply by the filter function H(u, v) → Inverse Fourier Transform → enhanced image g(x, y).
Frequency Domain Filters: Working Examples
Problem: filter the 4 × 4 input image below with an ideal high-pass filter in the frequency domain.
Step 1: multiply the input image by (−1)^(x+y) to shift the centre of the spectrum from (0, 0) to (2, 2).
Frequency Domain Filters: Working Examples
Sign matrix (−1)^(x+y) for x, y = 0, 1, 2, 3:
 1 -1  1 -1
-1  1 -1  1
 1 -1  1 -1
-1  1 -1  1
Input image × sign matrix (element-wise):
1 0 1 0     1 -1  1 -1     1 0  1 0
1 0 1 0    -1  1 -1  1    -1 0 -1 0
1 0 1 0  ×  1 -1  1 -1  =  1 0  1 0
1 0 1 0    -1  1 -1  1    -1 0 -1 0
Step 2: compute the DFT of the shifted image as W f W, with the 4 × 4 DFT matrix
W =
1  1  1  1
1 -j -1  j
1 -1  1 -1
1  j -1 -j
DFT =
 0 0  0 0
 0 0  0 0
16 0 16 0
 0 0  0 0
Ideal high-pass filter with D0 = 0.5 (any distance from the centre greater than 0.5 gives 1, else 0):
H(u, v) =
1 1 1 1
1 1 1 1
1 1 0 1
1 1 1 1
Frequency Domain Filters: Working Examples
Step 3, 4: G(u, v) = F(u, v) × H(u, v) (element-wise):
 0 0  0 0     1 1 1 1      0 0 0 0
 0 0  0 0     1 1 1 1      0 0 0 0
16 0 16 0  ×  1 1 0 1  =  16 0 0 0
 0 0  0 0     1 1 1 1      0 0 0 0
Frequency Domain Filters: Working Examples
Step 5: compute the IDFT of G(u, v) as (1/16) W* G W*:
1  1  1  1      0 0 0 0     1  1  1  1
1  j -1 -j      0 0 0 0     1  j -1 -j
1 -1  1 -1  ×  16 0 0 0  ×  1 -1  1 -1
1 -j -1  j      0 0 0 0     1 -j -1  j
         16  16  16  16      1  1  1  1
1/16 ×  -16 -16 -16 -16  =  -1 -1 -1 -1
         16  16  16  16      1  1  1  1
        -16 -16 -16 -16     -1 -1 -1 -1
Step 6: multiply by (−1)^(x+y) to undo the centring shift:
 1  1  1  1     1 -1  1 -1
-1 -1 -1 -1    -1  1 -1  1
 1  1  1  1  ×  1 -1  1 -1
-1 -1 -1 -1    -1  1 -1  1
Final output =
1 -1 1 -1
1 -1 1 -1
1 -1 1 -1
1 -1 1 -1
4. Huffman Coding
5. Lossy Compression
7. Zigzag Coding
Key Points
1. Data are the means by which information is conveyed; data may be redundant.
2. Data compression is the process of reducing the amount of data required to represent a given quantity of information.
3. Redundant data: data that contain the same or repeated information.
Image Compression Fundamentals
Compression ratio C = b / b′, where b and b′ are the number of bits in the original and compressed representations. If C = 10, the larger representation carries 10 bits of data for every 1 bit of data in the smaller representation.
Image Compression Fundamentals
Redundancy: the presence of unnecessary bits used to represent the image data.
Spatial Redundancy: pixels that are close to each other often have similar values.
Temporal Redundancy: there is very little difference between two successive video frames.
Irrelevant Information: data that can be discarded without significantly affecting the perceived quality of the image.
Image Compression Models
The input is an image f(x, y); for videos it is a sequence of frames f(x, y, t), where the discrete parameter t specifies time.
Overall objective: the input image is fed to the encoder, which creates a compressed representation of it; this compressed data is fed to the decoder, which reconstructs the original data or image.
Image Compression Fundamentals
Encoding or compression process: mapper → quantizer → symbol coder.
Note: the operations performed by the mapper and the symbol coder are reversible; the operation performed by the quantizer is not reversible.
Image Compression Fundamentals
Decoding or decompression process: it contains mainly two components, a symbol decoder followed by an inverse mapper, which perform exactly the inverse operations of the encoder's symbol coder and mapper.
Image Compression Fundamentals
Parameters: average length of the code, total bits to be transmitted, entropy, and how much space is saved.
Huffman Coding Working Example
Problem statement:
1. Consider an image of size 10 × 10 (5-bit) with symbols of different frequencies:
a2 = 40, a6 = 30, a1 = 10, a4 = 10, a3 = 6, a5 = 4 (probabilities: a2 = 40/100, a6 = 30/100, a1 = 10/100, a4 = 10/100, a3 = 6/100, a5 = 4/100).
Source reduction: the two least probable symbols are merged repeatedly (reduction columns 1-4); the table starts from a5 with probability 0.04.
Huffman Coding Working Example
Encoded string: 010100111100
Decoding: a3 a1 a2 a2 a6
From the completed source reduction, the code assigned to a5 (probability 0.04) is 01011.
Image Compression Fundamentals: Parameter Calculation
Average length of the code: L_avg = 0.4×1 + 0.3×2 + 0.1×3 + 0.1×4 + 0.06×5 + 0.04×5 = 2.2 bits/symbol
Entropy: H = −Σ p log2 p
= −0.4 log2 0.4 − 0.3 log2 0.3 − 0.1 log2 0.1 − 0.1 log2 0.1 − 0.06 log2 0.06 − 0.04 log2 0.04 ≈ 2.14 bits/symbol
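A sketch verifying both numbers from the probabilities and code lengths:

```python
import numpy as np

p = np.array([0.4, 0.3, 0.1, 0.1, 0.06, 0.04])    # symbol probabilities
code_len = np.array([1, 2, 3, 4, 5, 5])           # Huffman code lengths

print("L_avg   =", np.sum(p * code_len))          # 2.2 bits/symbol
print("Entropy =", -np.sum(p * np.log2(p)))       # ~2.14 bits/symbol
```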
Image Compression Fundamentals
Apply Huffman encoding to the 4 × 4 image below:
1 3 1 3
5 4 5 4
4 3 5 5
3 1 4 3
Run-Length Encoding
Repeating intensities along the rows and columns of an image can often be compressed by representing runs of identical intensities, where each run-length pair specifies a new intensity and the number of consecutive pixels that have that intensity.
Example: 11111000000001111111110011111100000111111111 (44 bits in total)
Runs: (1, 5 times), (0, 8 times), (1, 9 times), (0, 2 times), (1, 6 times), (0, 5 times), (1, 9 times)
Run-Length Encoding
Each run is encoded as 1 intensity bit followed by a 4-bit count (8-4-2-1 binary):
(1, 5 times) → 10101
(0, 8 times) → 01000
(1, 9 times) → 11001
(0, 2 times) → 00010
(1, 6 times) → 10110
(0, 5 times) → 00101
(1, 9 times) → 11001
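A small sketch of this exact scheme, 1 intensity bit plus a 4-bit run length per run (runs must be shorter than 16):

```python
def rle_encode(bits: str) -> str:
    # Encode each run as <intensity bit><4-bit run length>.
    out, i = [], 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1
        out.append(bits[i] + format(j - i, '04b'))   # e.g. '1' + '0101'
        i = j
    return ' '.join(out)

print(rle_encode("11111000000001111111110011111100000111111111"))
# 10101 01000 11001 00010 10110 00101 11001
```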
CSET344/CMCA544
Image and Video Processing
4. Huffman Coding
5. Lossy Compression
Lossy Compression
Lossless compression: after recovery, the image is exactly the same as it was before the compression technique was applied.
Lossy compression: after performing the inverse transformation, we cannot recover exactly the same image as the original; some image quality is lost in exchange for much higher compression.
Lossy Compression: Discrete Cosine Transform (DCT)
3. In the DCT, most of the significant information (signal or image energy) is concentrated in a small number of coefficients near the origin, while the remaining frequencies carry very little information and can be stored using very few bits.
So, in the (P, Q) plane, by coding a small number of coefficients we can represent most of the signal or image energy.
Lossy Compression: DCT
4. DCT coefficients are real-valued while DFT coefficients are complex; therefore, hardware implementation of the DCT is easier than the DFT.
Video compression standards: H.261, MJPEG, MPEG-1, H.262 (MPEG-2), H.265 (HEVC), WebM, etc.
Lossy Compression: DCT
Issues with the DCT:
1. A common issue with DCT compression in digital media is blocky compression artifacts caused by the independent DCT blocks; the algorithm produces block-based artifacts when heavy compression is applied.
Lossy Compression: DCT
2-D Forward Discrete Cosine Transform (FDCT) of an M × N image f(x, y):
C(u, v) = α(u) α(v) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) cos[(2x+1)uπ / 2M] cos[(2y+1)vπ / 2N]
2-D Inverse Discrete Cosine Transform (IDCT):
f(x, y) = Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} α(u) α(v) C(u, v) cos[(2x+1)uπ / 2M] cos[(2y+1)vπ / 2N]
where α(0) = √(1/M) and α(u) = √(2/M) for u > 0 (similarly for α(v) with N).
Lossy Compression
2D DCT Basis Function
Lossy Compression
1. This method is suitable for small image segments such as 8 × 8 or 16 × 16. A DCT transformation matrix T is first computed for an M × M segment:
T(i, j) = √(1/M) for i = 0, and √(2/M) cos[(2j+1)iπ / 2M] for i > 0.
2. Once the transformation matrix T is computed, the DCT of an image segment f(x, y) is D = T f Tᵀ.
3. Since T is real and orthonormal, its inverse is equal to its transpose; the inverse DCT is therefore f = Tᵀ D T.
This formulation utilizes the FFT structure for speedy computation of the DCT, hence it is suitable for large input images.
Lossy Compression: DCT
[Example: an 8 × 8 input image block and its DCT coefficients]
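A hedged sketch of an 8 × 8 block DCT with OpenCV; the block values are random stand-ins for the slide's example:

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)
block = rng.integers(0, 256, (8, 8)).astype(np.float32)   # stand-in 8x8 block

coeffs = cv2.dct(block)            # forward 2-D DCT
recon = cv2.idct(coeffs)           # inverse DCT recovers the block

print(np.round(coeffs[:3, :3]))    # energy concentrates near the origin
print(np.allclose(block, recon, atol=1e-3))
```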
Color Image Compression: Flow Chart
[Diagrams: compression and decompression pipelines]
Text Recognition: Optical Character Recognition (OCR): Flow Chart
[Flow chart: preprocessing, feature extraction, post-processing]
Motion Detection
1. Motion detection in image and video processing is a critical technique used to identify changes in a sequence of images or video frames. Essentially, it aims to determine if and where movement has occurred within a scene.
2. Motion detection focuses on analyzing temporal changes in pixel values across consecutive frames.
Main approaches: Frame Differencing, Background Subtraction, and Optical Flow.
Motion Detection
Frame Differencing: subtracts one frame from another; the resulting difference highlights the areas where changes have occurred.
Background Subtraction: builds a model of the static background and subtracts it from each new frame; the remaining pixels represent moving objects.
Optical Flow: estimates the apparent motion of objects between frames by analyzing the movement of pixels; it provides a more detailed understanding of motion, including direction and velocity.
Optical Flow Field Example
A crowded sequence in which one group of pixels moves in one direction and another group of pixels moves in a different direction.
Optical Flow Applications
1. Motion-Based Segmentation: to identify which objects in a video are moving, compute the optical flow; where the flow is significant the objects are moving, and where it is not significant the objects are stationary.
2. Video Compression.
Optical Flow: Brightness Constancy
F(x, y, t) = F(x + dx, y + dy, t + dt)
A pixel's intensity is assumed to stay constant as it moves by (dx, dy) over time dt; a first-order expansion gives the optical flow constraint Ix u + Iy v + It = 0.
Optical Flow : Lucas Kanade Method
Optical Flow: Lucas-Kanade Method
Rewriting the constraint for every pixel in a window in matrix form gives A d = b, where d = (u, v) is unknown. A is not square, so to minimize the squared error we differentiate and set the result to zero, which yields the pseudo-inverse (least-squares) solution d = (AᵀA)⁻¹ Aᵀ b.
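A hedged OpenCV sketch of sparse Lucas-Kanade tracking between two frames; file names and parameters are placeholders:

```python
import cv2

prev = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# Pick corners to track, then solve the LK normal equations per window.
pts = cv2.goodFeaturesToTrack(prev, maxCorners=100, qualityLevel=0.3,
                              minDistance=7)
new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None,
                                                winSize=(15, 15))

flow = (new_pts - pts)[status.ravel() == 1]    # per-point (u, v) displacement
print(flow[:5])
```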
Optical Flow: Lucas-Kanade Method: Comments
CSET344/CMCA544
Image and Video Processing
Module 4
Dr. Gaurav Kumar Dashondhi
Ph.D. IIT Bombay
Face Detection
3. Provide these features to a classifier and then take a decision: face or non-face.
Face Detection
Feature vectors raise two questions:
Features: which features represent a face well?
Classifiers: how do we construct a face model and efficiently classify features as face or non-face?
Viola-Jones Method for Face Detection: Face or Not Face
Overall flow: an efficient method that scans an image using simple features and a cascade structure to locate faces in real time.
Viola-Jones Method for Face Detection: Haar Filters
Haar filters are based on Haar wavelets.
[Filter patterns: edge-, line-, and Laplacian-like rectangle features]
Feature value for a filter placed at window location [i, j]:
VA[i, j] = Sum(pixel intensities in the white area) − Sum(pixel intensities in the black area)
Viola-Jones Method for Face Detection: Integral Image Formation
For example, an original image I and an integral image II are given: II(i, j) is the sum of all pixels of I above and to the left of (i, j).
Viola-Jones Method for Face Detection: Integral Image Formation
Computing a rectangle sum from four corner values P, Q, R, S of the integral image:
Step 1. P = sum of all the values above and to the left.
Step 2. Subtract Q from P.
Step 3. Subtract S from P.
Step 4. R has been subtracted twice in the overall process, so add it back: sum = P − Q − S + R.
Note:
1. The overall computational cost is 3 additions.
2. This computational cost is independent of the size of the rectangle.
Viola-Jones Method for Face Detection: Haar Response using the Integral Image
Note: the integral image is computed once per test image and allows fast computation of Haar features.
Example response: (2061 − 329 + 98 − 584) − (3490 − 576 + 329 − 2061) = 64, using 7 additions in total.
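A sketch of constant-time rectangle sums via cv2.integral (the image and rectangle coordinates are arbitrary examples):

```python
import cv2
import numpy as np

img = np.arange(36, dtype=np.uint8).reshape(6, 6)
ii = cv2.integral(img)             # (7 x 7); ii[y, x] = sum of img[:y, :x]

def rect_sum(y0, x0, y1, x1):
    # Sum of img[y0:y1, x0:x1] from four corner reads: P - Q - S + R.
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

print(rect_sum(1, 1, 4, 4), img[1:4, 1:4].sum())   # identical results
```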
Viola-Jones Method for Face Detection: AdaBoost (Adaptive Boosting)
AdaBoost is a machine learning algorithm that performs the feature-selection step of the Viola-Jones algorithm. There are very many Haar features; the primary role of AdaBoost is to select the most effective ones from this very large pool. This selection is crucial for both the accuracy and the speed of face detection.
Each Haar-like feature is treated as a weak classifier, meaning a single Haar feature together with a threshold. "Weak" means it can only provide a rough estimate of whether a sub-window contains a face.
In each round, AdaBoost focuses on the training samples that were misclassified, evaluates all the weak classifiers, and identifies the features that perform well on the training data; features that perform well receive higher weight.
Viola-Jones Method for Face Detection: AdaBoost (Adaptive Boosting)
Strong classifier: a combination of the many weak classifiers (Haar features) that AdaBoost identifies as most effective.
Viola-Jones Method for Face Detection: Cascading
Face Recognition, PCA, and the Concept of Eigenfaces
Eigenface
Face Recognition Process Flow Diagram
Template matching motivation: say you want to recognize a template in a rich 2-D (high-resolution) image. If you use template matching:
1. You need to create many templates with different orientations and scales, because the size and orientation of the template inside the image may differ.
2. Apart from this, the template may be partially hidden in the 2-D image, covered by other objects. The solution in this scenario is to construct many small templates and match all of them. Overall, the process is time-consuming and computationally inefficient.
Scale Invariant Feature Transform (SIFT): Motivation
Instead of doing template matching, one can extract important descriptive features, known as interest points, from the template and match them inside the original image.
Feature Detection for Machine Learning: SIFT: Flow Diagram
SIFT is a robust algorithm designed to identify and describe local features in images, invariant to scale, rotation, and illumination changes. In other words, SIFT can detect the same feature in an image even if the image is resized, rotated, or viewed under different lighting conditions.
Steps in SIFT: Scale-Space Extrema Detection → Keypoint Localization → Orientation Assignment → Keypoint Descriptor
Feature Detection for Machine Learning: SIFT
1. Scale-Space Extrema Detection
The scale space is a set of progressively blurred images at multiple resolutions, used to detect keypoints that are scale-invariant (they remain detectable even if the image size changes). It helps to find features that are stable and recognizable even when the image is scaled or resized.
Feature Detection for Machine Learning: SIFT
1. Scale-Space Extrema Detection: Gaussian Blur → Difference of Gaussians → Identify Keypoints
First, the algorithm progressively applies a Gaussian blur to the image at different scales (levels), smoothing it by different amounts, so we see the image from clear to blurry:
L(x, y, σ) = G(x, y, σ) * I(x, y)
where L(x, y, σ) is the blurred image at scale σ, G(x, y, σ) is the Gaussian kernel, and I(x, y) is the original image.
The image is also downsampled (reduced in size) after each octave, allowing features to be detected at smaller resolutions as well. An octave is a set of progressively blurred images at one resolution.
Feature Detection for Machine Learning: SIFT
1. Scale-Space Extrema Detection: Difference of Gaussians
D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)
where k is a constant scaling factor between adjacent scales.
Feature Detection for Machine Learning: SIFT
1. Scale-Space Extrema Detection: Identify Keypoints
Each pixel in a DoG image is compared with its 8 neighbours at the same scale and 9 at each adjacent scale (26 in total); local maxima and minima become candidate keypoints.
Feature Detection for Machine Learning: SIFT
2. Keypoint Localization
After building the scale space and finding potential keypoints (local maxima or minima in the DoG images), the locations of detected keypoints are refined to make sure they are accurate. To get a more precise location for each keypoint, a Taylor series expansion is used: a way to zoom in and find the exact point where the keypoint should be, like adjusting the focus of a camera for a sharper image.
The Taylor series expansion approximates a function near a given point; in SIFT it is applied to the DoG function around a potential keypoint to refine its location and scale:
D(x) ≈ D + (∂D/∂x)ᵀ x + ½ xᵀ (∂²D/∂x²) x, with x = (x, y, σ)ᵀ
Feature Detection for Machine Learning: SIFT
2. Keypoint Localization (contd.)
Some keypoints might be located in areas that are too flat or lack variation in brightness (low contrast). These keypoints are not useful because they are easily affected by noise, so the intensity at each keypoint is checked: if it is below a threshold (0.03, according to the SIFT paper), the keypoint is discarded. Only keypoints that are both well-located and have enough contrast are kept.
[Panels: keypoints at different scales; low-contrast keypoints removed; edge-located keypoints removed]
Feature Detection for Machine Learning: SIFT
3. Orientation Assignment
The identified keypoints are now considered stable (they won't change much if the image is modified slightly). Each keypoint is given a direction to make the algorithm resistant to image rotation: a small region around the keypoint is analyzed at its scale, and the gradient magnitudes and orientations in that region are calculated.
[Figure: keypoints and their directions]
Feature Detection for Machine Learning: SIFT
4. Keypoint Descriptor
After the keypoints have been detected and assigned an orientation, the next step is to create a descriptor for each keypoint. This descriptor is a compact representation of the keypoint, capturing the local image information around it (in standard SIFT, a 128-element vector of local gradient-orientation histograms).
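A short OpenCV sketch of the full pipeline (SIFT ships in the main module from OpenCV 4.4 onward; the file name is a placeholder):

```python
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

print(len(keypoints), "keypoints")
print(descriptors.shape)           # (num_keypoints, 128)
```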
Histogram of Oriented Gradients: HOG (Feature Descriptor)
The histogram of oriented gradients method is a feature descriptor technique used in computer vision and image processing for object detection. It focuses on the shape of an object, counting the occurrences of each gradient orientation in each local region, and then generates a histogram using the magnitude and orientation of the gradients.
Feature Detection for Machine Learning: HOG
Histogram: a graphical representation of the frequency distribution of data, in this case the gradient directions.
Oriented: refers to the direction of the gradients.
Gradients: represent changes in pixel intensity values, capturing the edges, textures, and structures in the image.
Steps in HOG (panels a to e): (c) resize the image to 64 × 128; (d) place a grid on the image, 16 rows of 8 × 8 cells. A minimal sketch follows below.
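A hedged sketch with scikit-image's hog using the standard 9-bin, 8 × 8-cell configuration (the input file is a placeholder):

```python
import cv2
from skimage.feature import hog

img = cv2.imread("person.png", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (64, 128))           # standard HOG detection window

features = hog(img,
               orientations=9,             # 9 orientation bins
               pixels_per_cell=(8, 8),     # 8 x 8 cells as in the slide
               cells_per_block=(2, 2))     # 2 x 2 cells per block

print(features.shape)                      # (3780,) for a 64 x 128 window
```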
Salient Object Detection (SOD) in Videos: Steps
Step 1: Input the video frames.
Step 2: Feature Extraction: spatial features such as texture, edge, and color are extracted from each frame.
Step 3: Spatial Saliency Map Generation: generate a saliency map for each frame using the spatial features extracted in the previous step.
Step 4: Temporal Saliency Map Generation: generate a saliency map for each frame based on temporal features extracted from the previous frame.
Salient Object Detection (SOD) in Videos: Steps (contd.)
Fusion of spatial and temporal saliency: fuse the spatial and temporal maps to generate the spatiotemporal saliency map.
Step 5: Saliency Prediction: the final output is the sequence of saliency maps, one for each frame of the input video.
Comparison: Traditional SOD vs Deep Learning Based Techniques
Feature | Traditional Methods | Deep Learning Based Methods
Saliency Prediction | Contrast-based analysis, heuristic rules, basic machine learning classifiers (e.g., SVM) | End-to-end learning of saliency maps using deep neural networks optimized for this task
Comparison: Traditional SOD vs Deep Learning Based Techniques (contd.)
Feature | Traditional Methods | Deep Learning Based Methods
Annotation Dependence | Typically less reliant on large-scale annotated data | Often require large, pixel-wise annotated video saliency datasets for training (though unsupervised and weakly-supervised methods are emerging)
Human Action Recognition (HAR) from Video Sequences
HAR is a technique capable of recognizing and categorizing human actions based on sensor data.
[Diagram: general pipeline for Human Action Recognition]
Role of Spatio-Temporal Features for Action Classification
Spatio-temporal features play a critical and fundamental role in human action recognition from videos.
1. They encode motion and dynamics, which helps to bridge the gap between spatial appearance and temporal evolution.
2. They handle variation in execution, for example the same action performed at various speeds and in various styles.
3. Jointly considering spatial and temporal features helps to improve accuracy and robustness.
End Term Question Paper Pattern
• Section A
• Total 5 Questions (3 marks each) Total 15 marks
• Section B
• Total 3 Questions (5 marks each) Total 15 marks
• Section C
• Total 1 Question (10 marks) Total 10 marks