Image and Video Processing All Slides

The document outlines the CSET344/CMCA544 Image and Video Processing course, coordinated by Dr. Gaurav Kumar Dashondhi, covering various modules from image acquisition to processing techniques. It highlights the motivations for image processing, applications in fields like medical and industrial automation, and includes a detailed syllabus with evaluation criteria. Key concepts such as pixel representation, image enhancement, and distance measures are also discussed.


CSET344/CMCA544

Image and Video Processing

Dr. Gaurav Kumar Dashondhi


Ph.D. From IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 6th Jan to 10th Jan 2025
[email protected]
1
CSET344
Image and Video Processing
Ass_Lec. Ass_Tut. Ass_Lab Ass_Total Faculty Name Faculty mail

4 0 8 12 Mr. Prashant Kapil [email protected]

4 0 0 4 Dr. Gaurav Kumar Dashondhi [email protected]

0 0 10 10 Dr. Kimmi Gupta [email protected]

0 0 10 10 Dr. Shallu Sharma [email protected]

Lect. Week
6th Jan to 10th Jan 2025 2
Motivation

• Three major motivations for processing images:

1. Improvement of pictorial information for human interpretation,
   e.g., Instagram and Snapchat filters.

2. Processing of image data for storage, transmission, and representation
   for autonomous machine perception.

3. Processing is essential to analyze images and extract information.

3
Application of Image Processing

1. Remote sensing.

2. Medical Domain.

3. Security and Surveillance. Object detection and Tracking.

4. Industrial automation.

5. Film and entertainment industry: to add special effects and create artificial environments.

4
Application of Image Processing

Automate the Status of Bottle Filling Inspection

5
Application of Image Processing

Automate the IC Connection Inspection : Some connections are broken.

6
Application of Image Processing

Medical Domain

Original Image Processed Image1 Processed Image 2

7
Application of Image Processing
Remote Sensing

8
CSET344 - Syllabus

9
Course Overview : Module 1
1. Analog-to-Digital Image Conversion:
Sampling and Quantization

2. Spatial Domain Image Enhancement:


Histogram Processing: Techniques for modifying the image histogram to improve contrast and visual
appearance.

Histogram Equalization: A specific histogram processing method that aims to redistribute pixel
intensities to achieve a uniform distribution.

3. Convolution:
A fundamental operation in image processing where a kernel (filter) is slid over the image, and an
output pixel is computed as a weighted sum of the input pixels within the kernel's region.

4. Image Smoothing:
Mean Filter, Median Filter, Gaussian Filter

5. Edge Detection:
Prewitt Operator, Sobel Operator, Laplacian Operator, Laplacian of Gaussian (LoG) Operator,
and Canny Edge Detector
10
Course Overview: Module 2
1.Line and Circle Detection using the Hough Transform:

2. Harris Corner Detector:

Corner Detection: A technique for identifying image locations where two edges intersect,
forming a sharp corner.

3. Color Models and Color Transforms:

Color Models: Mathematical representations of color, defining how colors are represented
numerically. Common examples include RGB, HSV, and CIELAB.

Color Transforms: Algorithms for converting colors between different color models, enabling
tasks like color correction, image segmentation, and color analysis.

4. Morphological Operations

5. Texture analysis using GLCM


11
Course Overview: Module 3
1. Concept of Optical Flow:

A technique for estimating the apparent motion of objects between two consecutive image
frames. It involves calculating the pixel-wise displacement vectors that represent the motion
of objects in the scene.

2. Image Enhancement in the Frequency Domain

3. Image Compression using Lossless and Lossy Techniques

4. Discrete Cosine Transform (DCT):


A mathematical transformation that decomposes a signal into a sum of cosine functions at
different frequencies. It is widely used in image and video compression, particularly in the
JPEG standard, as it concentrates most of the signal energy into a few low-frequency
coefficients.

12
Course Overview: Module 4
1. Different Methods of Face Detection

Viola-Jones: A cascade classifier based on Haar-like features and boosting.


Histogram of Oriented Gradients (HOG)
Scale-Invariant Feature Transform (SIFT)

2. PCA for Dimensionality Reduction and other Feature Extractors like HOG, SIFT

3. Techniques related to Video Processing, Formation, Compression and Coding:

4. Salient Object Detection and Human Action Recognition

5. Details of Depth Cameras


Sensors that can measure the distance to objects in a scene, providing depth information.

13
CSET344 – Course Evaluation (Tentative)
1. Mid-Semester: 20 marks
2. End-Semester: 40 marks
3. Project Work: 20 marks
1. Presentation and Q&A (Individual Student) : 10 marks
2. Functionality and Working Condition: 10 marks
4. Laboratory Continuous Assessment: 20 marks
5. Programming Environment: All experiments will be conducted using the Python programming language
with OpenCV on the Google Colab platform or Visual Studio Code.
6. Module Coverage:
1. Before the Mid-Semester : Modules 1 and 2 will be completed.
2. After the Mid-Semester: Modules 3 and 4 will be covered.
7. Question Design: All questions will emphasize logical reasoning and problem-solving.

14
EM Spectrum

Refer: https://www.lumitex.com/blog/visible-light-spectrum

15
Image, Intensity or grey level and Pixel.

It's a two-dimensional function f(x, y), where x, y are the spatial coordinates and the amplitude at a particular
coordinate is the intensity or grey level.
[Figure: an image with coordinates (x, y), and common display resolutions — HD 1280 x 720, Full HD 1920 x 1080, Ultra HD 4K 3840 x 2160]
16
Type of Images

0 0 0 0 0 0 0 0 0 0
0 1 1 1 0 0 255 255 255 0
0 0 0 1 0 0 0 0 255 0
0 1 1 1 0 0 255 255 255 0
R,G,B
0 0 0 1 0 0 0 0 255 0
0 1 1 1 0 0 255 255 255 0
0 0 0 0 0 0 0 0 0 0

Black and White Image Gray Scale Image Colour Image


Range 0 to 1 Range 0 to 255 Red Channel Range 0 to 255
Green Channel Range 0 to 255
Blue Channel Range 0 to 255

17
CSET344/CMCA544
Image and Video Processing
Module 1
Dr. Gaurav Kumar Dashondhi
Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 13th Jan to 17th Jan 2025
[email protected]
18
Elements of Digital Image Processing Systems

Image Acquisition: Scanners, Cameras

Image Storage: Hard Drives, Solid State Drives (SSD)

Image Processing: Image Enhancement, Image Restoration, Image Segmentation, Image Analysis

Image Communication: Image transmission using different networks like LAN, WAN

Image Display: Monitors, Printers
19
EM Spectrum

Refer: https://www.lumitex.com/blog/visible-light-spectrum

20
Image Sampling and Quantization

The output of the sensor is a continuous voltage waveform whose amplitude and coordinates are continuous.

21
Image Sampling and Quantization
Sampling: Discretizing the spatial axis (coordinates).

Quantization: Discretizing the amplitude axis.

Digital image = Sampling + Quantization

Figure: (A) Image projected on the sensor array; (B) Result of sampling and quantization.
X, Y are the coordinates; Z is the amplitude.
22
Representation of Digital images

0 0 0 0 0 0 0 0 0 0
0 1 1 1 0 0 255 255 255 0
0 0 0 1 0 0 0 0 255 0
0 1 1 1 0 0 255 255 255 0
R,G,B
0 0 0 1 0 0 0 0 255 0
0 1 1 1 0 0 255 255 255 0
0 0 0 0 0 0 0 0 0 0

Black and White Image Gray Scale Image Colour Image


Range 0 to 1 Range 0 to 255 Red Channel Range 0 to 255
Green Channel Range 0 to 255
Blue Channel Range 0 to 255

23
Representation of digital image
It's a two-dimensional function f(x, y), where x, y are the spatial coordinates and the amplitude at a particular
coordinate is the intensity or grey level.

The center of an image of dimension M x N is obtained by dividing M and N by 2 and rounding to the
nearest integer.

24
Basic Terminologies
Pixel (dot) - It's the smallest unit of a digital image.

0    0    0    0    0
0   255  255  255   0
0    0    0   255   0
0   255  255  255   0        F(4,4)
0    0    0   255   0
0   255  255  255   0
0    0    0    0    0

[Figure: common image resolutions — HD 1280 x 720, Full HD 1920 x 1080, Ultra HD 4K 3840 x 2160]
25
Basic Terminologies
Spatial Domain

Spatial means location.

An image is made of pixels, and each pixel has an intensity value.
The spatial domain is the domain in which intensity (magnitude) values are considered as a function of location.

Image Storage and Intensity levels

Let the size of the image be M x N with k bits per pixel. Total bits required = M x N x k.

Example: M = 2048, N = 2048 and k = 16 bit image

Total no of bits = 2048 x 2048 x 16 = 67108864 bits


67108864/8 = 8388608 bytes (1 byte = 8 bits)
8388608/1024 = 8192 Kbytes (1kb = 1024 bytes)

No. of intensity levels = 2^k. if 8 bit image then 2^8 = 256 intensity levels or Grey levels (0 to 255).

26
Basic Terminologies
Dynamic Range, Contrast and Contrast Ratio

Dynamic Range: Dynamic range of any image processing system is the ratio of maximum measurable intensity
to minimum detectable intensity.

Example: highest = 255, lowest = 1, so dynamic range = 20·log10(255/1) ≈ 48 dB

Dynamic range in terms of images or image contrast or contrast ratio: Difference between highest and lowest
intensity levels in an image.

High dynamic range = bright, clear image.

Low dynamic range = dull image.

Dots per inch (Dpi), Resolution

Resolution means how many dots per inch (DPI) or pixels per inch (PPI) an image has. It directly refers to the
clarity of the image: if the resolution is high, more information or detail can be identified.

27
Basic Terminologies

Spatial Resolution

Spatial Resolution: Capability of sensor to distinguish between two closely spaced objects.

Higher Spatial Resolution: Pixel size is small and one can see more details.

Lower Spatial Resolution: Pixel size is big and one can not distinguish between two closely spaced objects.
28
Basic Terminologies
Intensity Resolution

Capability to resolve different intensity or brightness levels or color in color image.

High Intensity Resolution:

Ability to capture wide range of brightness or intensity levels.

For a 16-bit image, 2^16 = 65,536 intensity or grey levels.

Low Intensity Resolution:

Ability to capture small range of brightness or intensity levels.

For 8 bit image 2^8 = 256 intensity or grey levels.

29
Basic Relationship Between Pixels

Four neighbors of P, N4(P):
            (x-1,y)
  (x,y-1)   p(x,y)   (x,y+1)
            (x+1,y)

Four diagonal neighbors of P, ND(P):
  (x-1,y-1)           (x-1,y+1)
             p(x,y)
  (x+1,y-1)           (x+1,y+1)

Eight neighbors of P: N8(P) = N4(P) + ND(P)
  (x-1,y-1)  (x-1,y)  (x-1,y+1)
  (x,y-1)    p(x,y)   (x,y+1)
  (x+1,y-1)  (x+1,y)  (x+1,y+1)

30
Distance Measures

For two pixels P(x,y) and Q(s,t):

1. Euclidean Distance:   D = [(x-s)^2 + (y-t)^2]^(1/2)

2. City Block Distance:  D = |x-s| + |y-t|

3. Chess Board Distance: D = max{|x-s|, |y-t|}

Example points: P(2,2) and Q(4,4).

Exercise: For P(3,3), Q1(3,4), Q2(3,5), Q3(4,4), Q4(5,5),
find the distance between (P,Q1), (P,Q2), (P,Q3), (P,Q4).
31
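A small Python sketch (illustration, not from the slides) computing the three distance measures for the exercise points above:

def euclidean(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def city_block(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def chessboard(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

P = (3, 3)
for Q in [(3, 4), (3, 5), (4, 4), (5, 5)]:
    print(Q, euclidean(P, Q), city_block(P, Q), chessboard(P, Q))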
Basic Relationship Between Pixels

City Block Distance Chess Board Distance

2 2 2 2 2 2
2 1 2 2 1 1 1 2
2 1 0 1 2 2 1 0 1 2
2 1 2 2 1 1 1 2
2 2 2 2 2 2

32
Image Enhancement in the Spatial Domain

Spatial Domain

1. Intensity Transformation: applied on a single pixel P(x,y).
   For example: contrast manipulation, image thresholding, etc.

2. Spatial Filtering: applied on a group or neighborhood of pixels:
   (x-1,y-1)  (x-1,y)  (x-1,y+1)
   (x,y-1)    p(x,y)   (x,y+1)
   (x+1,y-1)  (x+1,y)  (x+1,y+1)
   For example: image smoothing, image sharpening, etc.
33
Intensity Transformation

Intensity transformations are approaches where the result depends only on the intensity at a point:
S = T(r), where S = output intensity, T = transformation function, r = input intensity.

The smallest possible neighbourhood is 1x1, where the output of the transformation function depends only on
a single point (a single pixel).

Different approaches for intensity transformation:

1. Identity Transform (S = T(r) = r), Linear Transformation

2. Image Negative (Linear Transformation)

3. Log Transform and Exponential Transform

4. Power Law Transformation (Gamma Correction)

5. Piecewise Linear Transformation Functions
   5.1 Contrast Stretching
   5.2 Grey scale to binary image using thresholding
34
Intensity Transformation

35
Intensity Transformation: Image Negative
Motivation: This kind of transformation is used to enhance grey-level information embedded in the dark regions of
an image, or when the black areas are dominant in size compared to the white areas.
S = L - 1 - r
where L = number of intensity levels
r = input intensity level
S = output intensity level
For an 8-bit image (0 - 255), L = 256, so S = 255 - r.

r S = L-r S = L-r-1 S = L-r-2 S = L-r-3 S = L-r-4

10
20
30
40
36
Intensity Transformation: Log Transformations
Motivation: This transformation is used to expand the dark pixels in an image while compressing the higher-level
values.
S = c·log(1 + r), where r = input intensity level,
S = output intensity level,
c = constant, with c = (L - 1) / log(1 + rmax).

r S = C log(1+r)
Consider 8-bit image. Where L = 255 and rmax = 255
0
Calculate S value ?
1
5
200
220
240
37
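A small Python sketch (not course-provided) answering the "Calculate S value?" exercise above, assuming an 8-bit image so that c = 255 / log(1 + 255):

import numpy as np

# log transform S = c * log(1 + r); the log base cancels because c uses the same log
c = 255 / np.log(1 + 255)

for r in [0, 1, 5, 200, 220, 240]:
    S = c * np.log(1 + r)
    print(r, round(S, 2))   # dark values are expanded, bright values compressed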
Intensity Transformation: Power Law Transformations (Gamma Correction)
Motivation: The visual quality of an image may be hampered by the illumination conditions or a wrong setting of
the camera sensor. To rectify this, one can utilize the power law transformation, or Gamma Correction.
The basic idea is to raise the pixel value to a certain power to improve the overall brightness and contrast of
the image.

S = c r^y,
where

r = Input Intensity Level


S = Output Intensity Level, c = constant (c = 255 for an 8-bit image with r normalized to [0, 1]).
If y < 1:
    dark values are expanded and the image appears brighter
else:
    dark values are compressed and the image appears darker.
38
Intensity Transformation: Power Law Transformations (Gamma Correction)

3 x 3 Input Image

10 200 150
S = c r^y
20 100 90
70 50 220 C = 255
r = image(x,y)/ 255

Note: Considering 8-bit image.

Y=0.5 Y=1 Y=2

39
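A minimal Python sketch (illustration, not from the slides) of gamma correction on the 3 x 3 example image above, using S = c·r^gamma with c = 255 and r normalized to [0, 1]:

import numpy as np

img = np.array([[10, 200, 150],
                [20, 100,  90],
                [70,  50, 220]], dtype=np.float64)

for gamma in (0.5, 1.0, 2.0):
    r = img / 255.0                       # normalize input intensities
    S = 255.0 * np.power(r, gamma)        # power-law transformation
    print(f"gamma = {gamma}:\n{np.round(S).astype(np.uint8)}")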
Piecewise Linear Transformation Function: Contrast Stretching
Motivation: A low-contrast image can result from poor illumination, a lack of dynamic range in the imaging sensor,
or even the wrong setting of a lens aperture during image acquisition.
Contrast stretching expands the intensity range to utilize the full dynamic range of the sensor.

Min-Max Contrast Stretching


S = (r – Imin) x ((Omax-Omin)/(Imax - Imin)) + Omin

S = Output Intensity Level


r = Input Intensity Level
Omax = Maximum Output
Omin = Minimum Output
Imax = Maximum input
Imin = Minimum input

40
Piecewise Linear Transformation Function: Contrast Stretching
S = (r - Imin) x ((Omax - Omin)/(Imax - Imin)) + Omin

where S = output intensity, r = input intensity,
Omax/Omin = maximum/minimum output, Imax/Imin = maximum/minimum input.

Before Transformation:
10   5  150
20  100  90
70  50   30

Omax = 255, Omin = 0, Imax = 150, Imin = 5

Apply contrast stretching, e.g., for r = 10 and for the minimum and maximum input pixels:

For r = 10:  S = (10 - 5) x ((255 - 0)/(150 - 5)) + 0 ≈ 8.79
For r = 5:   S = (5 - 5) x ((255 - 0)/(150 - 5)) + 0 = 0
For r = 150: S = (150 - 5) x ((255 - 0)/(150 - 5)) + 0 = 255

After Transformation: fill the remaining pixels in the same way.

41
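A small Python sketch (not from the slides) of min-max contrast stretching on the example above, mapping [Imin, Imax] = [5, 150] onto [Omin, Omax] = [0, 255]:

import numpy as np

img = np.array([[10,   5, 150],
                [20, 100,  90],
                [70,  50,  30]], dtype=np.float64)

Imin, Imax = img.min(), img.max()
Omin, Omax = 0.0, 255.0

stretched = (img - Imin) * ((Omax - Omin) / (Imax - Imin)) + Omin
print(np.round(stretched).astype(np.uint8))   # 5 -> 0, 150 -> 255, 10 -> 9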
Piecewise Linear Transformation Function: Thresholding
(Plot: output intensity s versus input intensity r.)

If r1 = r2, S1 = 0 and S2 = L-1, the piecewise linear transformation becomes a thresholding function:
intensities below the threshold map to 0 and intensities above it map to L-1.

42
CSET344/CMCA544
Image and Video Processing

Dr. Gaurav Kumar Dashondhi


Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 20th Jan to 24th Jan 2025
[email protected]
43
Histogram
A histogram is a graphical representation that shows the relationship between gray levels (pixel intensities)
and their corresponding frequencies in an image.
h(rk) = nk,  for k = 0, 1, 2, ..., L-1

where nk is the number of pixels with intensity rk.

P(rk) = h(rk) / MN

P(rk) = probability of intensity level rk occurring in the image;
M, N are the numbers of rows and columns of the image.

The sum of P(rk) over all values of k equals 1.

(Histogram plot: grey levels on the x-axis, frequency on the y-axis.)

Application
1. Image Enhancement
2. Image Thresholding
3. Image Segmentation
4. Image Analysis
44
Histogram Examples

45
Histogram Example
For a 3-bit image of size 3 x 3:

1 2 6
6 1 3
1 6 6

Intensity (Grey Level)   Frequency   Normalized Histogram
1                        3           3/9
2                        1           1/9
3                        1           1/9
6                        4           4/9

For an 8-bit image of size 3 x 3 (exercise):

150 200 115
200 125 150
250 100 250

Intensity (Grey Level)   Frequency   Normalized Histogram
46
Histogram Equalization
It's a technique used in image processing to improve the contrast of an image. It works by redistributing the
intensity values of the pixels in an image so that the histogram becomes more uniform.

Sk = T(rk)

Sk = (L - 1) * sum_{j=0..k} pr(rj),  k = 0, 1, 2, ..., L-1

Where –
L = number of possible intensity levels
Sk = output intensity levels
rk = input intensity levels
Why it is required ?

To enhance the contrast of an image, especially when the pixel intensity values are concentrated in a narrow range
(e.g., very dark or very bright images).

It makes details more visible in poorly contrasted images


47
Histogram Equalization

48
Histogram Equalization

49
Histogram Equalization Example
Consider a 3 bit image of size 64x64 (4096) with intensity distribution shown in the
below table. Calculate the Equalized histogram.

rk   nk     P(rk)       Sk    Approx. value of Sk   Updated nk   Updated P(rk)
0    790    790/4096
1    1023
2    850
3    656
4    329
5    245
6    122
7    81

Sk = T(rk)
S0 = (L-1) x P(r0) = 7 x 0.19 = 1.33
Sk = (L - 1) * sum_{j=0..k} pr(rj),  k = 0, 1, 2, ..., L-1
50
Histogram Equalization Example
Consider a 3 bit image of size 64x64 with intensity distribution shown in the below
table. Calculate the Equalized histogram.
rk   nk     P(rk)   Sk     Approx. value of Sk   Updated nk            Updated P(rk)
0    790    0.19    1.33   1                     790                   790/4096
1    1023   0.25    3.08   3                     1023
2    850    0.21    4.55   5                     850
3    656    0.16    5.67   6                     656 + 329 = 985
4    329    0.08    6.23   6
5    245    0.06    6.65   7                     245 + 122 + 81 = 448
6    122    0.03    6.86   7
7    81     0.02    7.00   7

Sk = T(rk)
Sk = (L - 1) * sum_{j=0..k} pr(rj),  k = 0, 1, 2, ..., L-1
51
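A small Python sketch (not from the slides) reproducing the 3-bit histogram equalization example above, with the counts nk taken from the table:

import numpy as np

L = 8
nk = np.array([790, 1023, 850, 656, 329, 245, 122, 81], dtype=np.float64)
MN = nk.sum()                        # 4096 pixels

p = nk / MN                          # P(rk)
s = (L - 1) * np.cumsum(p)           # Sk = (L-1) * cumulative sum of p(rj)
s_rounded = np.round(s).astype(int)  # approximate output levels

print(np.round(s, 2))    # close to the table values 1.33, 3.08, 4.55, 5.67, 6.23, 6.65, 6.86, 7.00
print(s_rounded)         # [1 3 5 6 6 7 7 7]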
Concept of Kernal or Filter or Convolution Mask

Two inputs X1, X2 with weights W1, W2 give the weighted summation W1X1 + W2X2.

Weights            Inputs            Weighted Summation (element-wise products)
W1 W2 W3           X1 X2 X3          W1X1  W2X2  W3X3
W4 W5 W6     x     X4 X5 X6    =     W4X4  W5X5  W6X6
W7 W8 W9           X7 X8 X9          W7X7  W8X8  W9X9

52
Spatial Correlation and Convolution: Padding size (M-1)/2 or (N-1)/2
Input Image
0 0 0 0 0 0 0
0 0 0 0 0 Kernel
0 0 0 0 0 0 0
0 0 0 0 0 1 2 3 0 0 0 0 0 0 0
0 0 1 0 0 4 5 6 0 0 0 1 0 0 0
0 0 0 0 0 7 8 9 0 0 0 0 0 0 0
0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0

Correlation Convolution
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 9 8 7 0 0 0 0 1 2 3 0 0
0 0 6 5 4 0 0 0 0 4 5 6 0 0
0 0 3 2 1 0 0 0 0 7 8 9 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 53
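A small Python sketch (illustration, not from the slides) of the impulse example above: correlation leaves the kernel unflipped in the sum, so the output shows the kernel rotated by 180 degrees, while convolution reproduces the kernel itself:

import numpy as np
from scipy.ndimage import correlate, convolve

impulse = np.zeros((5, 5))
impulse[2, 2] = 1                     # discrete impulse at the centre

kernel = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print(correlate(impulse, kernel, mode='constant', cval=0))  # 180-degree rotated kernel around the impulse
print(convolve(impulse, kernel, mode='constant', cval=0))   # the kernel itself around the impulse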
Smoothing Spatial Filters
Smoothing Spatial Filter

Linear Filters                          Non-Linear Filters (Order-Statistic Filters)

Median Filters
Box Car or Mean
Filter
Max Filter
Weighted Avg.
Filter
Min Filter

Gaussian Filters

54
Image Smoothening: Box Car Filter

A filter that computes the average of pixels in the neighborhood blurs an image. Computing an average is analogous
to integration.
Or

A filter that reduces sharp transitions in intensity is called a smoothing (low-pass) filter.

Convolving a smoothing kernel with an image results in blurring, and the amount of blurring depends on
the size of the kernel.
          Kernel               Normalized Kernel
          1 1 1                0.11 0.11 0.11
1/9  x    1 1 1        =       0.11 0.11 0.11
          1 1 1                0.11 0.11 0.11

1 1 1 1
1 1 1 1
1/16 x
1 1 1 1
1 1 1 1 55
Image Smoothening: Box Car Filter
Comparison of outputs between Normalized and Non-Normalized Kernel
Image Normalized Kernel
1 2 3 0.11 0.11 0.11
0.11+0.22+0.33+0.44+0.55+0.66+0.77+0.88+0.99 = 4.95
4 5 6 0.11 0.11 0.11
7 8 9 0.11 0.11 0.11

Non-Normalized Kernel


1 2 3 1 1 1
1+2+3+4+5+6+7+8+9 = 45
4 5 6 1 1 1
7 8 9 1 1 1

56
Working Example: Padding size (M-1)/2 or (N-1)/2
Input Image
1 2 5 3 4 Kernel
5 6 7 8 9 0.11 0.11 0.11
2 3 4 5 6 0.11 0.11 0.11
3 6 8 4 2 0.11 0.11 0.11
1 5 6 8 7

Limitation of Box Car Filter


Output is Blurred.
Solution
Circularly Symmetric or isotropic kernel.

Weighted Average Filter

1 2 1
1/16 x 2 4 2
1 2 1
57
Image Smoothening: Gaussian Filter

(x-1,y-1)  (x-1,y)  (x-1,y+1)
(x,y-1)    p(x,y)   (x,y+1)
(x+1,y-1)  (x+1,y)  (x+1,y+1)
58
Image Smoothening: Gaussian Filter

Calculate the gaussian Kernel for 3x3 Kernel

(x-1,y-1) (x-1,y) (x-1,y+1) -1,-1 -1,0 -1,1


(x,y-1) p(x,y) (x,y+1) 0,-1 0,0 0,1
(x+1,y-1) (x+1,y) (x+1,y+1) 1,-1 1,0 1,1

Consider SD (sigma) = 1.

At x = 0, y = 0 with sigma = 1: the constant 1/(2·pi·sigma^2) = 1/(2·pi) ≈ 0.159.

59
CSET344/CMCA544
Image and Video Processing

Dr. Gaurav Kumar Dashondhi


Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 27th Jan to 31st Jan 2025
[email protected]
60
Image Smoothening: Gaussian Filter

(x-1,y-1) (x-1,y) (x-1,y+1)


(x,y-1) p(x,y) (x,y+1)
(x+1,y-1) (x+1,y) (x+1,y+1)

61
Image Smoothening: Gaussian Filter
Calculate the Gaussian Kernel for size

(x-1,y-1) (x-1,y) (x-1,y+1) -1,-1 -1,0 -1,1


(x,y-1) p(x,y) (x,y+1) 0,-1 0,0 0,1
(x+1,y-1) (x+1,y) (x+1,y+1) 1,-1 1,0 1,1

Sigma =1 Sigma =2 Sigma =3

0.3678 0.6065 0.3678 0.7788 0.8825 0.7788 0.8949 0.9464 0.8949


0.6065 1 0.6065 0.8825 1 0.8825 0.9464 1 0.9464
0.3678 0.6065 0.3678 0.7788 0.8825 0.7788 0.8949 0.9464 0.8949

62
Image Smoothening: Gaussian Filter
Sigma =1 Sigma =2 Sigma =3

0.3678 0.6065 0.3678 0.7788 0.8825 0.7788 0.8949 0.9464 0.8949


0.6065 1 0.6065 0.8825 1 0.8825 0.9464 1 0.9464
0.3678 0.6065 0.3678 0.7788 0.8825 0.7788 0.8949 0.9464 0.8949

Sum = 4.8972 Sum = 7.6452 Sum = 8.3652

After Normalization After Normalization After Normalization

0.0751 0.1238 0.0751 0.1018 0.1154 0.1018 0.1069 0.1131 0.1069


0.1238 0.2042 0.1238 0.1154 0.1308 0.1154 0.1131 0.1195 0.1131
0.0751 0.1238 0.0751 0.1018 0.1154 0.1018 0.1069 0.1131 0.1069

63
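A small Python sketch (not from the slides) of the 3 x 3 Gaussian kernel construction above: evaluate exp(-(x^2 + y^2) / (2·sigma^2)) on the offsets {-1, 0, 1} and normalize by the sum:

import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    kernel = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))   # unnormalized values, centre = 1
    return kernel / kernel.sum()                          # normalize so the weights sum to 1

for sigma in (1, 2, 3):
    print(f"sigma = {sigma}:\n{np.round(gaussian_kernel(3, sigma), 4)}")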
Smoothing Spatial Filters
Smoothing Spatial Filter

Linear Filters                          Non-Linear Filters (Order-Statistic Filters)

Box Car or Mean Max Filter


Filter

Weighted Avg. Min Filter


Filter

Gaussian Filters Median Filter

64
Max Filter, Min Filter, Median Filter

65
Examples : Max Filter, Min Filter. Assume Kernel size is 3 x 3

Input Image First Step

3 7 17 18 13 0 0 0 0 0 0 0

10 5 2 20 5 0 3 7 17 18 13 0

9 8 13 1 7 0 10 5 2 20 5 0

16 8 7 20 19 0 9 8 13 1 7 0

14 19 3 30 10 0 16 8 7 20 19 0
0 14 19 3 30 10 0
0 0 0 0 0 0 0
Input Image After Padding Zeros
Second Step
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 3 7 17 18 13 0 0 3 7 17 18 13 0
0 10 5 2 20 5 0 0 10 5 2 20 5 0
0 9 8 13 1 7 0 0 9 8 13 1 7 0
0 16 8 7 20 19 0 0 16 8 7 20 19 0
0 14 19 3 30 10 0 0 14 19 3 30 10 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 66
Examples : Max Filter, Min Filter. Assume Kernel size is 3 x 3

Max. Filter Output


10 17 20 20 20
10 17 20 20 20
Input Image After Padding Zeros

0 0 0 0 0 0 0
0 3 7 17 18 13 0
0 10 5 2 20 5 0 Min. Filter Output
0 9 8 13 1 7 0
0 0 0 0 0
0 16 8 7 20 19 0
0 2 1 1 0
0 14 19 3 30 10 0
0 2 1 1 0
0 0 0 0 0 0 0
0 3 1 1 0
0 0 0 0 0

Mid Point Filter = ½(Min filter + Max Filter)


67
Examples : Mean Filter, Median Filter. Assume Kernel size is 3 x 3

Mean Filter Output

Input Image After Padding Zeros

0 0 0 0 0 0 0
0 3 7 17 18 13 0
0 10 5 2 20 5 0
0 9 8 13 1 7 0
0 16 8 7 20 19 0 Median Filter Output
0 14 19 3 30 10 0 0 3 5 5 0
0 0 0 0 0 0 0 5 8 8 13 5

68
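A small Python sketch (illustration, not from the slides) of the order-statistic filter examples above, using the same 5 x 5 input, a 3 x 3 neighborhood, and zero padding:

import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter, median_filter

img = np.array([[ 3,  7, 17, 18, 13],
                [10,  5,  2, 20,  5],
                [ 9,  8, 13,  1,  7],
                [16,  8,  7, 20, 19],
                [14, 19,  3, 30, 10]])

max_out = maximum_filter(img, size=3, mode='constant', cval=0)
min_out = minimum_filter(img, size=3, mode='constant', cval=0)
med_out = median_filter(img, size=3, mode='constant', cval=0)
mid_out = 0.5 * (max_out + min_out)       # midpoint filter from the slide

print(max_out, min_out, med_out, mid_out, sep='\n\n')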
Comparison Between Max., Min. and Median Filters output

69
Image Denoising

Original Image Median Filter Output Image

70
Image Denoising

Original Image Median Filter Output Image

71
Edge Detection

Prewitt

Sobel
Edge Detector

Laplacian

Laplacian of Gaussian (LoG)

• High Pass Filters

• Unsharp Masking and High Boost Filtering


72
What is an Edge, Line and Point

Edge: Edge pixels are the pixels at which intensity of an image Changes abruptly.

Line: It may be viewed as a thin edge segment where the intensity of the background on either side of the line is
either much higher or much lower.

Point: It may be viewed as a foreground pixel surrounded by background, and vice versa.
73


Types of Edge

Averaging or smoothing is analogous to integration.

Derivatives are able to detect abrupt changes in intensity.


74
Edge

1. Averaging or smoothing is analogous to integration.

2. Derivatives are able to detect abrupt changes in intensity.

3. In the case of digital images, the derivative is defined as a finite difference.

4. Approximations used for the first derivative:

4.1 It must be zero in areas of constant intensity.
4.2 It must be non-zero at the onset of an intensity step or ramp.
4.3 It must be non-zero at points along an intensity ramp.

5. Approximations used for the second derivative:

5.1 It must be zero in areas of constant intensity.
5.2 It must be non-zero at the onset and end of an intensity step or ramp.
5.3 It must be zero at points along an intensity ramp.
75
Edge

Black White Black

Sample Image

0 0 0 7 7 7 0 0 0

Step Edge

First Derivative

Second Derivative
Zero Crossing

76
Edge

77
Edge

Edge     First Order                        Second Order
Ramp     Non-zero along the ramp            Non-zero at the onset and end of the ramp
         (thick edge)                       (thin edge)
Point    Lower magnitude                    Higher magnitude
Line     Lower magnitude                    Higher magnitude

1. For ramp and step edges, the second derivative produces two values of opposite sign.
2. If the second derivative is negative: transition from light to dark.
3. If the second derivative is positive: transition from dark to light.

78
Edge

Observations

1. The first-order derivative produces thicker edges.

2. The second-order derivative has a stronger response to points, thin lines, and noise.

3. The second-order derivative produces a double-edge response at ramp and step transitions in

intensity.

4. The sign of the second derivative can be used to determine whether a transition into an edge is from light

to dark or from dark to light.

79
Edge
Original Image First Derivative Second Derivative

Noise - zero mean and SD = 0.0

Noise - zero mean and SD = 0.1

Noise - zero mean and SD = 1.0


80
CSET344/CMCA544
Image and Video Processing

Dr. Gaurav Kumar Dashondhi


Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 3rd Feb to 07th Feb2025
[email protected]
81
Types of Edge

Averaging or smoothing is analogous to integration.

Derivatives are able to detect abrupt changes in intensity.


82
Edge Detection Image Gradient and its properties

Edge : Magnitude and Direction.

Image Gradient is a tool to identify edge Magnitude and Direction.

Gradient vector: ∇f = [gx, gy] = [∂f/∂x, ∂f/∂y]

Magnitude of gradient vector: M(x,y) = sqrt(gx^2 + gy^2)

Direction of gradient vector: α(x,y) = tan⁻¹(gy / gx)

83
Edge Detection Image Gradient and its properties

Edge : An Abrupt or Sudden change in intensity which will help to identify the edges.

Derivatives are able to detect abrupt changes in intensity.

Horizontal Edge Vertical Edge Right or Principal Diagonal Left Diagonal

10 10 10 10 0 255 0 10 10 255 255 0


0 0 0 10 0 255 255 0 10 255 0 10
255 255 255 10 0 255 255 255 0 0 10 10
84
Edge Detection Image Gradient and its properties

0 1 1 Z1 Z2 Z3
0 0 1 Z4 Z5 Z6
0 0 0 Z7 Z8 Z9

85
Edge Detectors

Prewitt

Sobel
Edge Detector

Laplacian

Laplacian of Gaussian (LoG)

• High Pass Filters

• Unsharp Masking and High Boost Filtering


86
Prewitt and Sobel Edge Detectors
Prewitt Edge Detector

-1 -1 -1 -1 0 1

0 0 0 -1 0 1

1 1 1 -1 0 1

Sobel Edge Detector

-1 -2 -1 -1 0 1

0 0 0 -2 0 2

1 2 1 -1 0 1

Note: The sum of all kernel coefficients is equal to zero.
87


Prewitt and Sobel Edge Detectors example
Find the edge
Prewitt Edge Detector 50 50 50 150 50 50 50
-1 -1 -1 -1 0 1 50 50 50 150 50 50 50
50 50 50 150 50 50 50
0 0 0 -1 0 1 50 50 50 150 50 50 50
50 50 50 150 50 50 50
1 1 1 -1 0 1
50 50 50 150 50 50 50
50 50 50 150 50 50 50

Sobel Edge Detector


50 50 150

-1 -2 -1 -1 0 1 50 50 150

0 0 0 -2 0 2 50 50 150

1 2 1 -1 0 1
Note: Sobel operator has better noise
suppression capability as compared to
Prewitt. 88
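A small Python sketch (not from the slides) of the example above: the vertical bright stripe in the 7 x 7 image produces a strong response from the x-direction Prewitt and Sobel kernels:

import cv2
import numpy as np

img = np.full((7, 7), 50, dtype=np.float32)
img[:, 3] = 150                      # the brighter vertical stripe from the slide

prewitt_x = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float32)
sobel_x   = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)

gx_prewitt = cv2.filter2D(img, -1, prewitt_x)   # correlation with the kernel
gx_sobel   = cv2.filter2D(img, -1, sobel_x)     # equivalently cv2.Sobel(img, cv2.CV_32F, 1, 0)

print(np.abs(gx_prewitt))
print(np.abs(gx_sobel))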
Sobel Edge Detectors
Original Image Gradient in x direction

Gradient Angle Image


Note: The gradient angle image generally does not contain much information by itself,
but it complements the information extracted from the image gradient.
Gradient in y direction Gradient Image
89
Smooth image by 5 by 5 kernel and then apply Sobel Edge Detectors
Original Image Gradient in x direction

Gradient in y direction Gradient Image 90


Combining Gradient with Thresholding
Thresholded image
if pixel value > = 33% of the maximum value of gradient image

Output : White pixel


else:
Output: Black Pixel

Thresholding
helps to avoid
the Minor Edges.

Threshold applied on Gradient image Threshold applied on Smoothed Gradient image

Note: When the interest lies in highlighting the principal edges while maintaining as much connectivity
as possible, it is common practice to use both smoothing and thresholding.

91
Laplacian of Gaussian (LoG)
Limitations of the Prewitt and Sobel Operators

1. They are only able to detect edges in the horizontal and vertical directions.
2. For different types of images, the kernel size will be different.

Salient features required of any edge detector

1. The second derivative can localize edges better than the first derivative.

2. Preprocessing is required before applying an edge detector.

3. The edge detector should be isotropic, i.e., it should be able to detect edges in all directions.

4. A larger kernel should be able to detect blurry edges, and a small kernel should be able to detect sharply focused
fine details.

LoG: The Gaussian function first smooths the image, and the Laplacian takes the second derivative of the Gaussian
function, which directly helps to locate the edges.

92
Laplacian of Gaussian (LoG)
LoG : Gaussian function first smooth the image and Laplacian
will find the second derivative of gaussian function which
directly helps to locate the edge.

Laplacian function: ∇²f = ∂²f/∂x² + ∂²f/∂y²

Laplacian of Gaussian: ∇²G(x,y) = ((x² + y² − 2σ²)/σ⁴) · e^(−(x² + y²)/(2σ²))

93
Laplacian of Gaussian (LoG)

10 10 10 10 10
10 10 10 10 10 -1 -1 -1
100 100 100 100 100 -1 8 -1
100 100 100 100 100 -1 -1 -1
10 10 10 10 10
3 by 3 Kernel
Image

Output Image

94
Laplacian of Gaussian (LoG)

10 10 10 10 10
10 10 10 10 10 -1 -1 -1
100 100 100 100 100 -1 8 -1
100 100 100 100 100 -1 -1 -1
10 10 10 10 10
3 by 3 Kernel
Image

Transition from –ve to +ve, it


-220 -220 -220 directly shows the zero crossing
500 500 500 i.e. Edge Location.
500 500 500

95
CSET344/CMCA544
Image and Video Processing
Module 2
Dr. Gaurav Kumar Dashondhi
Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 10th Feb to 14th Feb 2025
[email protected]
96
Canny Edge Detector
Requirements of a good edge detector:

1. Good Detection: Find all the real edges while ignoring noise or artifacts.

2. Good Localization: Detect edges as close as possible to the true edges.

3. Single Response: Return only one point for each true edge point.

Steps in the Canny Edge Detector:

1. Smooth the image to reduce noise (Gaussian smoothing).

2. Compute the gradient magnitude (M) and the gradient angle.

3. Apply Non-Maximum Suppression (NMS) along the gradient direction.

4. Hysteresis thresholding with a high threshold (H) and a low threshold (L):
If M > H:
    Edge
elif L < M < H:
    Check whether it is connected to a strong edge
else:
    No edge
97
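A small Python/OpenCV sketch of the pipeline above: Gaussian smoothing followed by cv2.Canny, which performs the gradient, NMS, and hysteresis steps. The filename and threshold values are placeholders, not course-provided:

import cv2

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)

blurred = cv2.GaussianBlur(img, (5, 5), sigmaX=1.4)        # step 1: noise reduction
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)  # steps 2-4: gradient, NMS, hysteresis

cv2.imwrite('canny_edges.jpg', edges)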
Canny Edge Detector Example

Original Image Thresholded Image Original Image Thresholded Image

Laplacian of Gaussian Canny Edge Detector Laplacian of Gaussian Canny Edge Detector
98
Comparison of all Edge Detector

Comparison of Prewitt, Sobel, Laplacian of Gaussian (LoG), and Canny:

Gradient Operator
  Prewitt: simple difference | Sobel: weighted difference (more robust) | LoG: second derivative (finds zero crossings) | Canny: optimized gradient & sophisticated criteria

Noise Sensitivity
  Prewitt: high | Sobel: moderate | LoG: high (but Gaussian blur helps) | Canny: low (due to Gaussian blur)

Edge Localization
  Prewitt: less precise | Sobel: more precise | LoG: moderate | Canny: very precise

Spurious Responses
  Prewitt: more | Sobel: fewer | LoG: more (but zero-crossing helps) | Canny: fewer

Edge Thickness
  Prewitt: thicker | Sobel: thinner | LoG: thinner | Canny: very thin

Computational Cost
  Prewitt: low | Sobel: slightly higher than Prewitt | LoG: higher (due to convolution) | Canny: highest

Implementation
  Prewitt: simple | Sobel: relatively simple | LoG: more complex (Gaussian + Laplacian) | Canny: more complex (multiple stages)

Typical Use Cases
  Prewitt: simple image analysis, basic edge detection | Sobel: general-purpose edge detection | LoG: images with low noise, blob detection | Canny: high-quality edge detection, computer vision
99
Harris Corner Detector

Area of Significant changes in


all the directions.

Applications of the Harris Corner Detector

•Image Stitching: Finding corresponding corners in multiple images to align and stitch them together.

•Object Tracking: Tracking objects in videos by identifying and following their corners.

•3D Reconstruction: Using corners as features to reconstruct 3D models from multiple images.
100
Harris Corner Detector

101
Here M is Harris Matrix
Harris Corner Detector

0 0 1 4 9
1 0 5 7 11
1 4 9 12 16
3 8 11 14 16
8 10 15 16 20

Change in X direction
-1 0 1
-1
Change in Y direction 0
1

K is sensitivity factor; small value of K is better to find the corners. 102


Harris Corner Detector

Ix window:  4 7 6       Iy window:  4 8 8
            8 8 7                   8 6 7
            8 6 5                   6 6 4

ΣIx² = 4²+7²+6²+8²+8²+7²+8²+6²+5² = 403

ΣIy² = 4²+8²+8²+8²+6²+7²+6²+6²+4² = 381

ΣIxIy = 4x4 + 7x8 + 6x8 + ... = 385

Harris matrix (H) or M =  403  385
                          385  381

R = Det(H) - k x (Trace)²
  = 403 x 381 - (385)² - 0.04 x (784)² = -19268.24

If R is negative: edge.
If R is small: flat (constant intensity) region.
If R is large: corner.
Note: If k is small, then it detects corners better.
103
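A small Python sketch (not from the slides) reproducing the Harris response of the worked example above, with sensitivity factor k = 0.04:

import numpy as np

Ix = np.array([[4, 7, 6], [8, 8, 7], [8, 6, 5]], dtype=np.float64)
Iy = np.array([[4, 8, 8], [8, 6, 7], [6, 6, 4]], dtype=np.float64)

Sxx = np.sum(Ix * Ix)        # 403
Syy = np.sum(Iy * Iy)        # 381
Sxy = np.sum(Ix * Iy)        # 385

M = np.array([[Sxx, Sxy], [Sxy, Syy]])
k = 0.04
R = np.linalg.det(M) - k * np.trace(M) ** 2
print(M, R)                  # R ≈ -19268.24: a negative response, i.e. an edge rather than a corner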
Hough Transforms: Line Detection
Motivation: Edge Linking, Line Detection, Circle Detection and other generalized shape detection

Let's consider these two points (1,2) and (2,3)

How to connect them ?

If edge pixels are present, how do we link them?

Infinitely many lines can pass through a single point, and a line contains many points.

104
Hough Transforms: Line Detection
Point in Image Space will be a line in Parameter Space.
c
y Line from a non colinear point.

x m
Image Space Parameter Space c = -mx+y
Y = mx+c
m = slope
c = Intercept

Image Space Parameter Space 105


Hough Transforms: Line Detection
c Line Detection Algorithm
0 0 0 0 0
1. Quantize the parameter space.
3 0 0 0 0 0
2 0 0 0 0 0 2. Create an accumulator array A(m,c).
0 0 0 0 0
1 3. Set A(m,c) = 0, for all (m,c).
0 0 0 0 0
m 4. Fill the accumulator array and count
Original Image and 3 colinear Quantize the parameter space the frequency.
points and fill the zeros.

1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1
0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0
0 0 1 0 0 0 0 1 0 0 1 1 1 1 1 1 1 3 1 1
0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0
0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1
106
m
Hough Transforms: Line Detection:
Working Example:

Are the points (1,2) and (2,3) collinear?

Parameter space: c = -mx + y

For (1,2):  c = -m + 2.   If c = 0 then m = 2; if m = 0 then c = 2   (intercepts (2, 0) and (0, 2)).

For (2,3):  c = -2m + 3.  If c = 0 then m = 1.5; if m = 0 then c = 3   (intercepts (1.5, 0) and (0, 3)).

The two parameter-space lines intersect at (m, c) = (1, 1), so both points lie on the line

Y = mx + c
Y = x + 1
107
Hough Transforms: Line Detection: Polar form
Issues:

1. Slope -∞ < m < ∞

2. It directly makes the accumulator array very large.

3.It required more memory and computation cost.

Solution:
Polar form: ρ = x·cos θ + y·sin θ, where ρ (rho) is the perpendicular distance of the line from the origin and θ (theta) is the angle of its normal.

108
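A small Python/OpenCV sketch of line detection with the polar-form Hough transform. The filename and parameter values are placeholders, not course-provided:

import cv2
import numpy as np

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150)

# rho resolution = 1 pixel, theta resolution = 1 degree, accumulator threshold = 100 votes
lines = cv2.HoughLines(edges, 1, np.pi / 180, 100)

if lines is not None:
    for rho, theta in lines[:, 0]:
        print(f"rho = {rho:.1f}, theta = {np.degrees(theta):.1f} deg")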
Hough Transforms: Circle Detection
y b

x a
Image Plane Parameter Plane

For each circle point in the image plane there will be a circle in the parameter space.
The accumulator array filling will therefore be circular.
0 1 1 1 0
(x-a)^2 + (y-b)^2 = r^2 1 0 0 0 1
1 0 3 0 1
a,b = centre of the circle.
r = radius of the circle. 1 0 0 0 1
1 1 1 1 0
If r is known then parameter space contains 2
parameters a and b. a
109
Problems with Hough Transform

1.Computational Cost:

2.Parameter Sensitivity:

3.Memory Usage

4. Loss of Positional Accuracy

5. Handling of Complex Scenes

6. Requirement for Preprocessing

110
Morphological Operations
1. Morphological operations are a set of techniques used in image processing to analyze and modify the shape and
structure of objects within an image.
2. Morphological operations are defined in terms of sets. Specifically in Image Processing, Morphology uses two types of
set of pixels. Structuring element can be specified in terms of both foreground and background pixels.

111
Dilation
Dilation generally grows or increases thickness.          Full match = 1
Partial match = 1
No match = 0

For Example

0 0 0 0 0 0 0 0 1 1 0 0
0 0 1 1 0 0 1 0 1 1 1 1 0
0 1 1 1 1 0 1 0 1 1 1 1 0
0 0 1 1 0 0 1 0 1 1 1 1 0
0 0 0 0 0 0 0 0 1 1 0 0

Input Image Structuring Element Output Image

112
Dilation Examples

Application of dilation

1. Filling holes in connected components.


2. Feature Enhancement

113
Erosion
Erosion generally shrinks or decreases thickness.         Full match = 1
Partial match = 0
No match = 0

For Example

1 1 1 1 1 1 0 0 0 0 0 0
1 1 0 0 1 1 1 1 0 0 0 0 1
1 0 0 0 0 1 1 1 0 0 0 0 1
1 1 0 0 1 1 1 1 0 0 0 0 1
1 1 1 1 1 1 0 0 0 0 0 0

Input Image Structuring Element Output Image

114
Erosion Examples

Application of erosion

1. Noise Removal.
2. Object Separation and feature extraction
115
Opening
Erosion followed by Dilation

0 0 0 0 0 0 0
0 1 0 0 0 1 0
For Example 0 0 0 0 0 0 0

Erosion
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 0 1 1 1
1 1 1 0 1 1 1 1 1 1
1 1 1 0 1 1 1
Input Image Structuring Element 1 1 1 0 1 1 1

Erosion followed by Dilation


Application of opening

1. Small object elimination.


2. Smoothening Contours.
3. Finding the holes or gaps.
116
Closing
Dilation followed by Erosion.
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
For Example 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
1 1 1 0 1 1 1 1 1
Dilation
1 1 1 0 1 1 1 1 1
1 1 1 0 1 1 1 1 1
1 1 1 1 1 1

Input Image 1 1 1 1 1 1
Structuring Element 1 1 1 1 1 1
Application of closing                    Dilation followed by Erosion

1. Filling small holes and gaps and Noise


reduction.

117
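A small Python/OpenCV sketch (not from the slides) of the four morphological operations above on a binary image; the 3 x 3 structuring element and the filename are assumptions for illustration:

import cv2

img = cv2.imread('binary.png', cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

se = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

dilated = cv2.dilate(binary, se)                          # grows the foreground
eroded  = cv2.erode(binary, se)                           # shrinks the foreground
opened  = cv2.morphologyEx(binary, cv2.MORPH_OPEN, se)    # erosion followed by dilation
closed  = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, se)   # dilation followed by erosion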
CSET344/CMCA544
Image and Video Processing

Dr. Gaurav Kumar Dashondhi


Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 17th Feb to 21st Feb 2025
[email protected]
118
Hit or Miss Transform – (HMT)

1. Hit or Miss Transform is a basic tool for shape detection.

2. HMT utilizes two structuring elements.

B1 - for detecting shapes in the foreground.

B2 - for detecting shapes in the background.

119
Hit or Miss Transform – (HMT)

Detecting a shape D

Structuring element
completely fits on
several locations.

Background with one pixel


thick foreground. Structuring element fits on
one location.

120
Hit or Miss Transform – (HMT)
Q. Instead of going for two structuring elements. Can we detect the same shape by using a single structuring element ?

Solution. Use a structuring element B which is exactly same shape as D with additional border of background elements
with a width of one pixel thick.

121
Hit or Miss Transform – (HMT)
One Structuring Element

Detection of single pixel thick hole.

Detection of an upper right corner.

Detection of Multiple Features.

In this case, X is nothing but don’t care. It


may take value 0 or 1. Here don’t care
value 1 is considered.
122
Basic Morphological Algorithms
Boundary Extraction – First Erode the original image with a Structuring element and then perform a set
difference between A and its eroded output.

Pixel Value 1 Pixel Value 0 123


Basic Morphological Algorithms
Hole Filling – A hole is defined as background region surrounded by a connected border of foreground pixels. An
algorithm is developed based on dilation, complementation and Intersection.

124
Basic Morphological Algorithms
[Figure: hole-filling iterations X0, X1, X2, X3, ..., X6, X7, X8 and the final result X8 ∪ A]
125
Basic Morphological Algorithms
Connected Components – It generally used to extract all the connected components in original image A.

Here Bones are


the foreign objects
inside chicken
Comparisons of all Morphological Operations.
Morphological Applications
operations
Dilation Medical Imaging – Enhance the Visibility of Blood Vessels.
Industrial Inspection – Detect Defects or Irregularity in goods.

Erosion Shrinking - Make the objects much thinner or highlight the core part of object.
Remove Noise – Eliminates noise along the edge of the objects.

Opening Image Cleaning/ Noise Removal


Object Isolation

Closing Analyzing Satellite Imagery


Scientists use satellite images to study changes in land cover, such as deforestation or urban
growth. Sometimes, clouds or shadows can obscure parts of the image, creating gaps in the
data.

Hit or Miss Object Localization.


Texture analysis
Texture is a repeating pattern of local variation in image intensity.

Texture Analysis
Statistical Approaches or Spectral Approaches or
Techniques Techniques

It is generally based on the property of


Fourier spectrum.

(a) Smooth (b) Coarse (c) Regular


Characterization of texture as smooth, coarse and grainy.
Texture From Histogram
Grey Level Co-occurrence Matrix (GLCM)
Problems with texture from histogram –
Texture measures computed using histogram-based methods carry no information about the spatial relationship
between pixels.

1. Total Levels present in the input image L = 8. So


the Size of GLCM is 8 by 8.

2. Look for one pixel immediately right.

3. Check the co-occurrence of each pair and fill the


GLCM Matrix.

4. Calculate different statistical descriptor based on


the requirement's.

1 1
5 1
Grey Level Co-occurrence Matrix (GLCM)

mr and mc are mean computed along row and


column.
GLCM Example

Construct a GLCM matrix for the below


matrix

Look for one pixel immediately right.

1 7 2 1
2 3 4 5
7 2 1 7
8 2 1 7
GLCM Example
Construct a GLCM matrix for the below matrix

Look for one pixel immediately right.


Total 8 intensity Levels – Size of GLCM is 8 by 8.

1 7 2 1
2 3 4 5
7 2 1 7
8 2 1 7

GLCM (rows = pixel level, columns = level of the pixel immediately to its right):

     1  2  3  4  5  6  7  8
1    0  0  0  0  0  0  3  0
2    3  0  1  0  0  0  0  0
3    0  0  0  1  0  0  0  0
4    0  0  0  0  1  0  0  0
5    0  0  0  0  0  0  0  0
6    0  0  0  0  0  0  0  0
7    0  2  0  0  0  0  0  0
8    0  1  0  0  0  0  0  0
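A small Python sketch (not from the slides) building this GLCM for the offset "one pixel immediately to the right", with 8 grey levels (1..8):

import numpy as np

img = np.array([[1, 7, 2, 1],
                [2, 3, 4, 5],
                [7, 2, 1, 7],
                [8, 2, 1, 7]])

levels = 8
glcm = np.zeros((levels, levels), dtype=int)

for i in range(img.shape[0]):
    for j in range(img.shape[1] - 1):        # stop one column early
        a, b = img[i, j], img[i, j + 1]      # pixel and its right neighbour
        glcm[a - 1, b - 1] += 1              # levels 1..8 map to indices 0..7

print(glcm)    # e.g. glcm[0, 6] = 3 counts the three (1, 7) pairs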
CSET344/CMCA544
Image and Video Processing

Dr. Gaurav Kumar Dashondhi


Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 17th Feb to 21st Feb 2025
[email protected]
134
Color Image Processing
Color Image Processing

Pseudo-Color Processing Full – Color Processing

Assigning Colors to grey scale intensity Image Acquired using a full color sensor like
or range of intensities. digital camera or color scanner.
Color Fundamentals

1. Basically, the color that humans and other animals perceive in an object is determined by the nature of the light
reflected from the object.

2. Chromatic Light spans the electromagnetic spectrum from approximately 400nm to 700nm.

Three basic quantities used to describe the quality of chromatic light source are :
Radiance.
Luminance.
Brightness.
Color Fundamentals
Radiance: Total amount of energy flows from a light source. It is measured in Watt.

Luminance: Total amount of energy that an observer perceives from the light source. It is measured
in lumens (lm). For example, light in the far infrared has significant radiance but almost zero luminance.

Brightness: it’s a subjective descriptor and difficult to measure.

Cones are the sensors in the human eye responsible for color vision.

The cones in the human eye are divided into three principal sensing categories.

65% - of all cones are sensitive to RED.

33% - of all cones are sensitive to GREEN.

2% - of cones are sensitive to BLUE.


Color Fundamentals

Because of these absorption characteristics,


human eye sees colors as variable combinations of
so called primary colors: Red, Green and Blue.

Absorption of Light By red green and Blue cones in the human


eye as a function of wavelength.
Color Fundamentals
Hue: Dominant color perceived by an observer.

Primary and secondary color of light and pigments.


Color Fundamentals
Saturation: it refers to the purity of color.

Example :
High Saturation: Lets say, if you are using pure red paint then its fully
saturated.

Low Saturation: If you are gradually mixing white paint inside the red then it
changed from pure red to faded red or pink.

Degree of saturation inversely proportional to the amount of white light


added.

Brightness: it refers to how light or dark a color appears.

High Brightness: if blue is very light almost white.

Low Brightness: If blue is very dark almost black.


Color Fundamentals
Chromaticity: Hue and saturation are taken together is called chromaticity.

Any color is characterized by its brightness and chromaticity.

Tristimulus Values – The amount of red green and blue required to form any particular color is known as tristimulus values
and it is denoted by X, Y, Z.

x,y,z are trichromatic coefficient.


Color Models
Requirement of color Models: To facilitate the specification of color in some standard way.

1. It should follow some coordinate system.

2. There should be a subspace within system, such that each color in the model is represented by a single point contained
in it.

RGB: Application domains: computer displays, digital cameras, image scanners. Key characteristics: additive color model (combining light); device-dependent.

CMY: Application domains: older printing technologies (less common now). Key characteristics: subtractive color model (absorbing light); device-dependent.

CMYK: Application domains: color printing (magazines, books, etc.). Key characteristics: subtractive color model; handles black ink; device-dependent.

HSI: Application domains: image editing, color analysis, computer vision. Key characteristics: Hue, Saturation, Intensity; more intuitive for color adjustments.

HSV: Application domains: image editing, color picking, computer graphics. Key characteristics: Hue, Saturation, Value; similar to HSI with a slightly different brightness representation.

YCbCr: Application domains: digital video (DVDs, Blu-ray, digital TV), JPEG compression. Key characteristics: separates luminance (brightness) from chrominance (color) for efficient compression.
compression
Color Models: RGB
Color Models RGB 1. RGB model based on cartesian coordinate system.

2. Primary colors R,G, and B are at the three corners.

3. Secondary colors like Cyan, Magenta and yellow at the


other corners.

4. Black at the origin and white at the corner which is


farthest away from the origin.

5. Assumption is that all R,G and B values are normalized


between 0 to 1.

6. Pixel Depth – No. of bits used to represent each pixel in


RGB space. Each RGB color pixel has a depth of 24bits.

7. let's say it’s an 8-bit image, then the limits of cube


along each axis becomes [0,255].

for example : White will be at a point [255,255,255]


RGB Color Cube
Color Models: HSV
Angle (in degree) Color
0-60 Red
61-120 Yellow
121-180 Green
181-240 Cyan
241-300 Blue
301-360 Magenta

This Color Model is closer to how human perceive the color.

1. Hue: It is a color portion of the model, expressed as a


number between 0 to 360 degree.

2. Saturation ranges between 0 to 1.


0 represent No color.
1 represent the primary color.

3. Value: it represents the brightness; 0 means completely black and 100 is the brightest.
Color Models: YcbCr
It is most widely used in digital video and photography
systems, including DVD, Blu-ray disc, digital TV and JPEG
images.

Y (Luma) - It represent the brightness or luminance of the


image.

Cb (Blue difference chroma) – It represent the difference


between blue color and the luma.

Cr (Red-difference chroma) – It represents the difference between the red color and the luma.

YCbCr – it allows better compression of image and video data.


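A small Python/OpenCV sketch (not from the slides) of converting between the color models above; the filename is a placeholder, and note that OpenCV loads images in BGR channel order:

import cv2

bgr = cv2.imread('photo.jpg')

hsv   = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)     # hue, saturation, value
ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)   # luma and chroma (OpenCV uses the Cr, Cb order)
gray  = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)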
CSET344/CMCA544
Image and Video Processing

Dr. Gaurav Kumar Dashondhi


Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 24th Feb to 28st Feb 2025
[email protected]
146
Pseudo Color Image Processing Pseudo color or false color image processing

Objective: Assigning colors to grey scale values based on a specified criteria.

Motivation: Human can differentiate between different colors and intensities as compared to different shades of grey.

Grey Scale Image

Color Image
Pseudo Image Processing
Intensity Slicing and Color Coding

1. Any pixel whose intensity level above the plane will be


coded in one color and any pixel below the plane will
be coded with another color.

2. Intensity levels those are lies on the plane assigning


either of the color or the third color.
Pseudo Image Processing
Intensity Slicing and Color Coding

Grey Scale Image Intensity Slicing using 8 Color coded image of the weld,
colors X-ray image of the weld where one color assign to 255
intensity levels and another color
to all other intensity levels.
Regions that appear to be of constant intensity in grey scale are quite variable in the color-sliced image.
This makes the quality inspector's job easier and, as a result, lowers the error rate.
Pseudo Image Processing
Intensity to color transformations : it is better than the simple slicing techniques.

Apply three independent transformations on the intensity


of input pixels.

These three results are then fed separately to red, green


and blue channels of color monitors.

The output will be a composite image whose color content


is modulated by the nature of the transformation function.
Pseudo Image Processing
Bag without explosive or with explosive.
Previous transformation applied on single grey scale
image.

The below approach is combining several grey scale


image into single color composite.
Example – Multispectral image processing

Applied different Applied Similar


transformation transformation
function function
Pseudo Image Processing
Example – Multispectral image processing

Red Green Blue

NIR of Landsat RGB Color Composite using IR,G,B RGB Color composite using R,IR,B
Basics of Full Color Image Processing
Full Color Image Processing

Process each Grey scale color Process or work with the color pixel
image components individually directly.
and then form a composite color
image
Color Transformations
Tone Correction

Image Negative

Color Complements
CSET344/CMCA544
Image and Video Processing
Module 3
Dr. Gaurav Kumar Dashondhi
Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 17th March to 21st March 2025
[email protected]
155
Image Enhancement in the Frequency Domain
0. Define Transform ?

1. Frequency domain Fundamentals.

2. 1-D and 2-D Fourier Transform and its inverse

3. Frequency domain filtering fundamentals

4. Ideal Low Pass, Butterworth and Gaussian Filters.

5. Ideal High Pass, Butterworth and Gaussian Filters.

6. Homomorphic Filtering

7. Selective filters.

8. Correspondence between filtering in the spatial domain and frequency domain.

156
Frequency Domain Fundamentals
1. Fourier Series: Any periodic function can be
expressed as a sum of sines and/or cosines of
different frequencies, each multiplied by a different
co-efficient.

Where –
A function f(t) of a continuous variable, t that is
periodic with period T.

This function is the sum of all the four


above functions
157
Frequency Domain Fundamentals
2. Fourier Transform: Functions that are not periodic (but whose area under the curve is finite) can be expressed as
integrals of sines and/or cosines of different frequencies, each multiplied by a weighting function.

Fourier Transform:          F(u) = ∫ f(t) e^(−j2πut) dt
Inverse Fourier Transform:  f(t) = ∫ F(u) e^(j2πut) du

f(t) is a continuous function. The two equations form a Fourier transform pair, indicating the forward and inverse
Fourier transforms.

158
Frequency Domain Fundamentals
Working Example

Note: A function that is expressed by either a Fourier series or a Fourier transform can be reconstructed
(recovered) completely via the inverse process.

(a) A Box function (b) Fourier Transform (c) Spectrum


159
Frequency Domain Fundamentals
Sampling: A continuous functions have to be converted into the sequence of discrete values before they processed in
the computers.

(a) Continuous Function

(b) Train of Impulses used to model sampling.

(c) Sampled function formed as product of a and b.

160
Frequency Domain Fundamentals
Sampling Theorem: A continuous bandlimited signal can be recovered
completely from the set of its samples if the samples are acquired at a Example -

rate exceeding twice the highest frequency content of the function. m(t) = sin2Πt + sin3Πt + sin4Πt

Fs > 2fm (Oversampled)


W = 2Π
Fs = 2fm (Perfect sampling or Nyquist rate)
2Πf = 2Π
A sampling rate exactly equal to twice the highest frequency is called
the Nyquist rate. f=1

Fs < 2fm (Under sampling or aliasing effect) Similarly f = 1.5 and 2 for other two
cases respectively.
Where, Fs = sampling frequency and fm = highest frequency present in
the signal

Nyquist Frequency = Nyquist rate / 2

161
Frequency Domain Fundamentals
Aliasing: It's a phenomenon in which different signals become indistinguishable from one another after sampling.
Fs < 2fm (undersampling or aliasing effect).

Anti-aliasing: Aliasing can be reduced by smoothing (low-pass filtering) the input function to attenuate its higher
frequencies. This has to be done before the function is sampled, because aliasing is a sampling issue that cannot
be "undone after the fact" using computational techniques.

Two different functions whose digitized samples are the same.


162
1-D DFT and Inverse DFT working example

163
1-D DFT and Inverse DFT working example

164
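A small Python sketch of a 1-D DFT / inverse DFT pair with NumPy as a stand-in for the worked example above; the sample sequence is an assumption:

import numpy as np

f = np.array([1.0, 2.0, 4.0, 4.0])

F = np.fft.fft(f)          # forward DFT: F(u) = sum_x f(x) exp(-j 2*pi*u*x / M)
f_back = np.fft.ifft(F)    # inverse DFT recovers the original samples

print(F)
print(np.real_if_close(f_back))   # [1. 2. 4. 4.]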
Frequency Domain Filters

Frequency Domain Filters

Low Pass Filters (LPF) High Pass Filters (HPF) Homomorphic Filtering Selective Filtering

Ideal LPF Ideal HPF Band Pass Filtering

Butterworth LPF Butterworth HPF Band Reject Filtering

Gaussian LPF Gaussian HPF Notch Filtering

165
Frequency Domain Filters: Ideal Low Pass Filter (ILPF)
A 2-D filter that passes all frequencies within a circle of radius D0 from the origin and cuts off (attenuates) all
frequencies outside this circle.

It is specified by the transfer function H(u,v):

H(u,v) = 1  if D(u,v) <= D0
         0  if D(u,v) >  D0

where D0 = a positive constant (the cut-off frequency), and
D(u,v) = distance between a point (u, v) in the frequency domain and the center of the P x Q frequency rectangle.

166
Frequency Domain Filters: Ideal Low Pass Filter (ILPF)

Cut off Frequency

(a) Ideal LPF Transfer function Plot (b) Function displayed as an image (c) Radial Cross Section

167
Test pattern image; circles with radii 10, 30, 60, 160, 460
Frequency Domain Filters: Ideal Low Pass Filter (ILPF)
a b c

a – Original Image
b – ILPF with cut off Frequency set at
radii value - 10
c – ILPF with cut off Frequency set at
radii value - 30
d – ILPF with cut off Frequency set at
radii value - 60
e – ILPF with cut off Frequency set at
radii value - 160
f – ILPF with cut off Frequency set at
radii value - 460

d e f
168
Frequency Domain Filters: Gaussian Low Pass Filter (GLPF)
It is specified by the transfer function H(u,v) = e^(−D²(u,v) / (2·D0²))

(a) Gaussian LPF transfer function plot (b) Function displayed as an image (c) Radial cross section for various values of D0

169
Frequency Domain Filters: Gaussian Low Pass Filter (GLPF)
a b c

a – Original Image
b – GLPF with cut off Frequency set at
radii value - 10
c – GLPF with cut off Frequency set at
radii value - 30
d – GLPF with cut off Frequency set at
radii value - 60
e – GLPF with cut off Frequency set at
radii value - 160
f – GLPF with cut off Frequency set at
radii value - 460

d e f
170
Frequency Domain Filters: Butterworth Low Pass Filter (BLPF)
It is specified by the transfer function H(u,v) = 1 / (1 + [D(u,v)/D0]^(2n)), where n is the filter order.

(a) Butterworth LPF transfer function plot (b) Function displayed as an image (c) Radial cross section with orders 1 to 4.

171
Frequency Domain Filters: Butterworth Low Pass Filter (BLPF)
a b c

a – Original Image
b – BLPF with cut off Frequency set at
radii value - 10
c – BLPF with cut off Frequency set at
radii value - 30
d – BLPF with cut off Frequency set at
radii value - 60
e – BLPF with cut off Frequency set at
radii value - 160
f – BLPF with cut off Frequency set at
radii value - 460

d e f
172
Comparative Analysis Between ILPF, GLBF and BLPF

The shape of the Butterworth filter is controlled by the filter order.

If n is large then Butterworth filter approaches to ILPF.


if n is small then Butterworth filter approaches to GLPF.
173
Frequency Domain Filters: Image Sharpening using High Pass Filter
High Pass Filter

Image sharpening can be achieved in the frequency domain by passing the high-frequency components (i.e., edges
or other sharp transitions) and attenuating the low-frequency components.

A high-pass transfer function can be obtained from the corresponding low-pass one: H_HP(u,v) = 1 − H_LP(u,v),

where D0 = cut-off frequency and

n = order (for the Butterworth case)
174
Frequency Domain Filters: Image Sharpening using High PF

Transfer Function (a) HPF Transfer function Plot (b) Function displayed as an image (c) Radial Cross Section
175
Frequency Domain Filters: Image Sharpening using High PF
a b c

Filtered with (a) IHPF, (b) GHPF, (c) BHPF with D0 =60

Filtered with (d) IHPF, (e) GHPF, (f) BHPF with D0 =160

d e f
176
CSET344/CMCA544
Image and Video Processing

Dr. Gaurav Kumar Dashondhi


Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 24th March to 28th March 2025
[email protected]
177
Image Enhancement in the Frequency Domain
0. Define Transform ?

1. Frequency domain Fundamentals.

2. 1-D and 2-D Fourier Transform and its inverse

3. Frequency domain filtering fundamentals

4. Ideal Low Pass, Butterworth and Gaussian Filters.

5. Ideal High Pass, Butterworth and Gaussian Filters.

6. Homomorphic Filtering

7. Correspondence between filtering in the spatial domain and frequency domain.

178
Frequency Domain Filters: Homomorphic Filtering
Objective – Overall objective is to separate illumination and reflectance components to manipulate them
independently.

It is used to correct uneven illumination and simultaneously enhance contrast.

Flow diagram of homomorphic filtering

Transfer function of homomorphic filtering


179
Frequency Domain Filters: Homomorphic Filtering

Radial Cross Section of a homomorphic filter


transfer function Original Image Image Enhanced using homomorphic filtering

180
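A hedged NumPy sketch of the homomorphic pipeline (log → FFT → high-emphasis transfer function → inverse FFT → exp). The gamma values, the constant c, D0 and the Gaussian-shaped transfer function are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def homomorphic(img, d0=30, gamma_l=0.5, gamma_h=2.0, c=1.0):
    """Separate illumination (low frequencies) and reflectance (high frequencies)
    by filtering the log-image, then recombine with exp."""
    img = img.astype(float) + 1.0                      # avoid log(0)
    Z = np.fft.fftshift(np.fft.fft2(np.log(img)))      # transform of the log-image

    P, Q = img.shape
    u = np.arange(P)[:, None] - P / 2
    v = np.arange(Q)[None, :] - Q / 2
    D2 = u**2 + v**2
    # High-emphasis transfer function: gamma_l applied to low, gamma_h to high frequencies
    H = (gamma_h - gamma_l) * (1 - np.exp(-c * D2 / (d0**2))) + gamma_l

    g = np.real(np.fft.ifft2(np.fft.ifftshift(H * Z)))
    return np.exp(g) - 1.0                             # back to the intensity domain
```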
Frequency Domain Filters: Working Examples

1-D Discrete Fourier Transform (DFT) and Inverse Discrete Fourier Transform (IDFT)

DFT:   F(u) = Σ (x = 0 to M-1)  f(x) e^( -j 2π u x / M ),    u = 0, 1, ..., M-1

IDFT:  f(x) = (1/M) Σ (u = 0 to M-1)  F(u) e^( j 2π u x / M ),    x = 0, 1, ..., M-1

181
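A short NumPy sketch of the two formulas above, checked against np.fft (the example sequence is an arbitrary assumption; the 1/M factor is placed on the inverse transform, matching the convention used in this deck).

```python
import numpy as np

def dft_1d(f):
    """Direct-summation 1-D DFT."""
    M = len(f)
    x = np.arange(M)
    u = x[:, None]
    return np.sum(f * np.exp(-2j * np.pi * u * x / M), axis=1)

def idft_1d(F):
    """Direct-summation 1-D inverse DFT (with the 1/M factor)."""
    M = len(F)
    u = np.arange(M)
    x = u[:, None]
    return np.sum(F * np.exp(2j * np.pi * x * u / M), axis=1) / M

f = np.array([1.0, 2.0, 4.0, 4.0])
F = dft_1d(f)
assert np.allclose(F, np.fft.fft(f))       # same result as the library routine
assert np.allclose(idft_1d(F).real, f)     # IDFT recovers the original sequence
```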
1-D DFT and Inverse DFT working example

182
1-D DFT and Inverse DFT working example

183
Spatial domain and Frequency Domain Filters Analogy

Spatial Domain                                      Frequency Domain

Transfer function h(x,y)                            Transfer function H(u,v)

f(x,y) -> h(x,y) -> g(x,y)                          F(u,v) -> H(u,v) -> G(u,v)

Convolution                                         Multiplication

g(x,y) = f(x,y) * h(x,y)                            G(u,v) = F(u,v) x H(u,v)

184
Frequency domain filtering : Flow Chart

3 4 5

Filter Function
Fourier Transform Inverse Fourier Transform
H(u,v)
F(u,v)

Pre-Processing 2 F(u,v) H(u,v) 6 Post-Processing

F(x,y) G(x,y)
Input Image Enhanced Image
1 7

185
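A hedged NumPy sketch of this pipeline. Padding to 2M x 2N, the Gaussian choice of H and the random test image are illustrative assumptions; the structure follows the flow chart: pre-process (pad and centre), forward transform, multiply by H(u,v), inverse transform, post-process (undo centring and crop).

```python
import numpy as np

def gaussian_lpf(P, Q, d0=60):
    u = np.arange(P)[:, None] - P / 2
    v = np.arange(Q)[None, :] - Q / 2
    return np.exp(-(u**2 + v**2) / (2 * d0**2))

def filter_in_frequency_domain(f, make_H):
    M, N = f.shape
    P, Q = 2 * M, 2 * N                                   # pad to reduce wraparound error
    fp = np.zeros((P, Q)); fp[:M, :N] = f
    y, x = np.mgrid[0:P, 0:Q]
    fp = fp * (-1.0) ** (x + y)                           # pre-processing: centre the spectrum
    F = np.fft.fft2(fp)                                   # forward transform F(u,v)
    G = F * make_H(P, Q)                                  # multiply by the filter H(u,v)
    gp = np.real(np.fft.ifft2(G)) * (-1.0) ** (x + y)     # inverse transform, undo centring
    return gp[:M, :N]                                     # post-processing: crop to input size

g = filter_in_frequency_domain(np.random.rand(128, 128), gaussian_lpf)
```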
Frequency Domain Filters: Working Examples
Problem Question
Input Image

For the spatial domain image, perform the frequency domain 1 0 1 0


filtering using Ideal High pass filter with cut off frequency 0.5. 1 0 1 0
1 0 1 0
1 0 1 0

Step 1: Multiply the input image by (-1)^(x+y) to shift the centre from (0,0) to (2,2).

0,0 0,1 0,2 0,3 0,0 0,1 0,2 0,3


1,0 1,1 1,2 1,3 1,0 1,1 1,2 1,3 Note:
Input image is 4x4. so the centre is row/2,
2,0 2,1 2,2 2,3 2,0 2,1 2,2 2,3 column/2 i.e 4/2 = 2, 4/2 = 2.
3,0 3,1 3,2 3,3 3,0 3,1 3,2 3,3

186
Frequency Domain Filters: Working Examples
0,0 0,1 0,2 0,3      (-1)^(0+0) = 1,  (-1)^(0+1) = -1,  (-1)^(0+2) = 1,  (-1)^(0+3) = -1
1,0 1,1 1,2 1,3      (-1)^(1+0) = -1, (-1)^(1+1) = 1,   (-1)^(1+2) = -1, (-1)^(1+3) = 1
2,0 2,1 2,2 2,3      (-1)^(2+0) = 1,  (-1)^(2+1) = -1,  (-1)^(2+2) = 1,  (-1)^(2+3) = -1
3,0 3,1 3,2 3,3      (-1)^(3+0) = -1, (-1)^(3+1) = 1,   (-1)^(3+2) = -1, (-1)^(3+3) = 1

Input Image (-1)^(x+y)

1 0 1 0 1 -1 1 -1 1 0 1 0
1 0 1 0 -1 1 -1 1 -1 0 -1 0
1 0 1 0 X 1 -1 1 -1 = 1 0 1 0
1 0 1 0 -1 1 -1 1 -1 0 -1 0

Note: Its pixel-to-pixel multiplication NOT a matrix multiplication.


187
Frequency Domain Filters: Working Examples
Step 2: Compute the DFT of the image.

DFT = Kernel x f(x,y) x KernelT

Kernel Input Image Kernel Transpose

1 1 1 1 1 0 1 0 1 1 1 1
1 -j -1 j -1 0 -1 0 1 -j -1 j
X X =
1 -1 1 -1 1 0 1 0 1 -1 1 -1
1 j -1 -j -1 0 -1 0 1 j -1 -j

0 0 0 0
0 0 0 0
DFT =
8 0 8 0
0 0 0 0

Note: Matrix multiplication. Consider only real values. 188


Frequency Domain Filters: Working Examples
Step 3: Construct the filter. Compute the distance of each frequency location from the centre.

D(u,v) = sqrt( (u - 2)^2 + (v - 2)^2 ),   centre (u, v) = (2, 2)

0,0 0,1 0,2 0,3                 2.83 2.24 2 2.24


1,0 1,1 1,2 1,3    D(u,v) =     2.24 1.41 1 1.41
2,0 2,1 2,2 2,3                 2    1    0 1
3,0 3,1 3,2 3,3                 2.24 1.41 1 1.41

Filter Function H(u,v)

           1 1 1 1
H(u,v) =   1 1 1 1        Note: IHPF – any D(u,v) greater than D0 = 0.5 maps to 1,
           1 1 0 1        otherwise to 0.
D0 = 0.5   1 1 1 1
189
Frequency Domain Filters: Working Examples
Step 4: G(u,v) = F(u,v) x H(u,v)

0 0 0 0 1 1 1 1 0 0 0 0
0 0 0 0 1 1 1 1 0 0 0 0
X =
8 0 8 0 1 1 0 1 8 0 0 0
0 0 0 0 1 1 1 1 0 0 0 0

Note: Its pixel-to-pixel multiplication not a matrix multiplication.

190
Frequency Domain Filters: Working Examples
Step 5: Compute the IDFT of the image.

IDFT = 1/4 (Kernel*) x G(u,v) x 1/4 (Kernel*T)     (Kernel* is the conjugate of the DFT kernel)

G(u,v)

1 1 1 1 0 0 0 0 1 1 1 1
1 j -1 -j 0 0 0 0 1 j -1 -j
X X =
1 -1 1 -1 8 0 0 0 1 -1 1 -1
1 -j -1 j 0 0 0 0 1 -j -1 j

8 8 8 8 0.5 0.5 0.5 0.5
-8 -8 -8 -8 = -0.5 -0.5 -0.5 -0.5
1/16 8 8 8 8 0.5 0.5 0.5 0.5
-8 -8 -8 -8 -0.5 -0.5 -0.5 -0.5

Note: Matrix multiplication. 191


Frequency Domain Filters: Working Examples
Step 6: Multiply the output image by (-1)^(x+y) to shift the centre from (2,2) back to (0,0).

(-1)^(x+y)

0.5 0.5 0.5 0.5 1 -1 1 -1
-0.5 -0.5 -0.5 -0.5 -1 1 -1 1
X
0.5 0.5 0.5 0.5 1 -1 1 -1
-0.5 -0.5 -0.5 -0.5 -1 1 -1 1 =

0.5 -0.5 0.5 -0.5
0.5 -0.5 0.5 -0.5
Final Output =
0.5 -0.5 0.5 -0.5
0.5 -0.5 0.5 -0.5

Note: Its pixel-to-pixel multiplication NOT a matrix multiplication. 192
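A small NumPy check of this worked example (an assumed verification script, not part of the original slides): it reproduces the (-1)^(x+y) centring, the DFT, the ideal high pass mask with D0 = 0.5 and the inverse transform on the same 4x4 image.

```python
import numpy as np

f = np.array([[1, 0, 1, 0]] * 4, dtype=float)            # the 4x4 input image

x, y = np.meshgrid(np.arange(4), np.arange(4))
shift = (-1.0) ** (x + y)                                 # (-1)^(x+y) centring factor

F = np.fft.fft2(f * shift)                                # centred DFT (8 at (2,0) and (2,2))

u, v = np.meshgrid(np.arange(4), np.arange(4))
D = np.sqrt((u - 2) ** 2 + (v - 2) ** 2)                  # distance from the centre (2,2)
H = (D > 0.5).astype(float)                               # ideal HPF, D0 = 0.5

g = np.real(np.fft.ifft2(F * H)) * shift                  # inverse DFT + undo centring
print(g)   # rows are [0.5, -0.5, 0.5, -0.5]: the DC term is removed, matching the final output above
```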


CSET344/CMCA544
Image and Video Processing

Dr. Gaurav Kumar Dashondhi


Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 24th March to 28th March 2025
[email protected]
193
Image Compression
1. Image Compression Fundamentals.

2. Lossless Compression Models

3. Run Length Encoding

4. Huffman Coding

5. Lossy Compression

6. Discrete Cosine Transform

7. Zigzag Coding

8. Color Image Compression

9. Text Recognition : OCR

10. Feature detection

11. Integral Image Formation 194


Image Compression Fundamentals

Key Points

1. Compression and Decompression are virtually invisible to the user.

2. Data compression is the process of reducing the amount of data required to represent a given quantity of
information.

Here, data and information are not the same thing.

Data are the means by which information is conveyed. Data may be redundant.

3. Redundant data – data that carries the same information more than once (repeated information).

195
Image Compression Fundamentals

Relative Data Redundancy   R = 1 – 1/C

The same information is represented by two different representations, using b and b′ bits respectively.

C = Compression Ratio = b / b′

If C = 10, the larger representation carries 10 bits of data for every 1 bit of data in the smaller representation.

It means the relative data redundancy is R = 1 – 1/10 = 0.9, i.e. 90% of the data in the larger representation is redundant.

196
Image Compression Fundamentals

Redundancy

Coding Redundancy – the presence of unnecessary bits used to represent the image data.

Spatial and Temporal Redundancy – Spatial redundancy: pixels that are close to each other often have similar values.
Temporal redundancy: there is very little difference between two successive video frames.

Irrelevant Information – data that can be discarded without significantly affecting the perceived quality of the image.

197
Image Compression Models
Image

For videos, the image becomes a function of an additional discrete parameter t, which specifies time.

Overall objective – The input image is fed to the encoder, which creates a compressed representation of it. This
compressed data is then fed to the decoder, which reconstructs the original data or image.
198
Image Compression Fundamentals
Encoding or Compression Process

Mapper  →  Quantizer  →  Symbol Coder

Mapper – A transform is generally used to reduce the spatial and temporal redundancy. In video applications, the
mapper uses previous video frames to facilitate the removal of temporal redundancy.
NOTE: The operation performed by the mapper is reversible.

Quantizer – The objective of this step is to keep irrelevant information out of the compressed representation.
NOTE: The operation performed by the quantizer is NOT reversible.

Symbol Coder – It generates a fixed- or variable-length code to represent the quantizer output.
NOTE: The operation performed by the symbol coder is reversible.

199
Image Compression Fundamentals
Decoding or Decompression Process

It contains mainly two components: a symbol decoder and an inverse mapper. These two components perform, in
reverse order, the inverse of the operations performed by the encoder.

Image Compression Formats, Containers and Compression Standards

Compression Formats – a standard way to organize or store the data.

Containers – similar to a file format, but they can handle multiple types of image data.

Compression Standards – they define the process of compressing and decompressing images. These standards are
widely accepted by image compression technology.

200
Image Compression Fundamentals

Formats shown in black: not sanctioned by the International Organization for Standardization (ISO).

Formats shown in blue: sanctioned by the International Organization for Standardization (ISO). 201
Lossless Compression
    Huffman Encoding
    Run Length Encoding

Huffman Encoding: it is a technique to reduce coding redundancy.

Steps followed in Huffman encoding:

1. Calculate the probability of each symbol and arrange the symbols in descending order.
2. Perform source reduction until only 2 probabilities are left.
3. Assign binary codes to the symbols.

Calculate different parameters:

Average length of the code | Total bits to be transmitted | Entropy | How much space you saved

202
Huffman Coding working example
Problem Statement –

1. Consider an image of size 10 by 10 (5 bit image). Consider some symbols with different frequencies.
a2 = 40 , a6 = 30 , a1 = 10 , a4 = 10 , a3 = 6 , a5 = 4 (Probability – a2 40/100, a6 30/100, a1 = 10/100…..etc)

Source Reduction

Symbol Probability 1 2 3 4

a2 0.4 0.4 0.4 0.4 0.6

a6 0.3 0.3 0.3 0.3 0.4

a1 0.1 0.1 0.2 0.3


a4 0.1 0.1 0.1
a3 0.06 0.1

a5 0.04

203
Huffman Coding working example
Encoded String: 010100111100

Decoding: a3a1a2a2a6

Source Reduction
Symbol   Probability   Code      Reduction 1    Reduction 2    Reduction 3    Reduction 4

a2       0.4           1         0.4  (1)       0.4  (1)       0.4  (1)       0.6  (0)

a6       0.3           00        0.3  (00)      0.3  (00)      0.3  (00)      0.4  (1)

a1       0.1           011       0.1  (011)     0.2  (010)     0.3  (01)

a4       0.1           0100      0.1  (0100)    0.1  (011)

a3       0.06          01010     0.1  (0101)

a5       0.04          01011

204
Image Compression Fundamentals

Parameter Calculation

Average Length of the code L = 0.4x1 + 0.3 x 2 + 0.1 x 3 + 0.1 x 4 + 0.06 x 5 + 0.04 x 5 = 2.2 bits/symbol

Total bits to be transmitted   10 x 10 x 2.2 = 220 bits

Entropy = - Σ p log2(p)
        = -[ 0.4 log2(0.4) + 0.3 log2(0.3) + 0.1 log2(0.1) + 0.1 log2(0.1) + 0.06 log2(0.06) + 0.04 log2(0.04) ]
        ≈ 2.14 bits/symbol

How much space you saved = (10 x 10 x 5 – 10 x 10 x 2.2 )/ 10 x 10 x 5 = 0.56 = 56 %

205
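A compact Python sketch of the same procedure (heap-based source reduction). The implementation is an illustrative assumption; the exact codewords it produces may differ from the table above, but the average length for this example is the same 2.2 bits/symbol.

```python
import heapq

def huffman_codes(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> binary code."""
    # Each heap entry: (probability, tie-breaker, {symbol: code_so_far})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)           # the two smallest probabilities
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"a2": 0.4, "a6": 0.3, "a1": 0.1, "a4": 0.1, "a3": 0.06, "a5": 0.04}
codes = huffman_codes(probs)
avg_len = sum(probs[s] * len(codes[s]) for s in probs)   # 2.2 bits/symbol for this example
```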
Image Compression Fundamentals
Apply Huffman Encoding to below example

1 3 1 3
5 4 5 4
4 3 5 5
3 1 4 3

206
Run Length Encoding
Run Length Encoding :

Repeating intensities along the rows and columns can often be compressed by representing them as runs of identical
intensities, where each run-length pair specifies the new intensity and the number of consecutive pixels that have that
intensity.

Example
11111000000001111111110011111100000111111111     Total bits - 44

Run-length pairs (in order): (1, 5 times) (0, 8 times) (1, 9 times) (0, 2 times) (1, 6 times) (0, 5 times) (1, 9 times)

207
Run Length Encoding
Binary representation: 1 intensity bit followed by a 4-bit count (place values 8 4 2 1)

(1, 5 times)   10101

(0, 8 times)   01000

(1, 9 times)   11001

(0, 2 times)   00010

(1, 6 times)   10110

(0, 5 times)   00101

(1, 9 times)   11001

10101 01000 11001 00010 10110 00101 11001 Total Bits - 35

208
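A short Python sketch of this run-length scheme. The "1 intensity bit + 4 count bits" packing is an assumption that matches the example above; runs longer than 15 would need to be split in a real encoder.

```python
def rle_encode(bits):
    """Encode a binary string as (value, run_length) pairs."""
    runs, i = [], 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1
        runs.append((bits[i], j - i))
        i = j
    return runs

def pack(runs):
    """Pack each run as 1 intensity bit followed by a 4-bit count."""
    return "".join(v + format(n, "04b") for v, n in runs)

data = "11111000000001111111110011111100000111111111"
runs = rle_encode(data)          # [('1', 5), ('0', 8), ('1', 9), ('0', 2), ...]
code = pack(runs)                # 7 runs x 5 bits = 35 bits
```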
CSET344/CMCA544
Image and Video Processing

Dr. Gaurav Kumar Dashondhi


Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 1st April to 4h April 2025
[email protected]
209
Image Compression

1. Image Compression Fundamentals.

2. Lossless Compression Models :

3. Run Length Encoding

4. Huffman Coding

5. Lossy Compression

6. Discrete Cosine Transform

8. Color Image Compression

9. Text Recognition : OCR

210
Lossy Compression
Lossless Compression – In this type of compression, the recovered image is exactly the same as it was before applying
the compression technique.

Lossy Compression – In this type of compression, after performing the inverse transformation we cannot recover exactly the
same image as the original one (the image before compression); some image quality is permanently lost.

Lossy Compression

Discrete Cosine Transform (DCT) Zigzag Coding

211
Lossy Compression : DCT
Discrete Cosine Transform - DCT

Spatial Domain Discrete Cosine Transform Frequency Domain

1. DCT is used for lossy image compression.

2. DCT represents an image as a sum of sinusoids of varying magnitudes and frequencies.

3. In DCT, most of the significant information (signal or image energy) is concentrated in a small number of coefficients
near the origin, while the remaining higher-frequency coefficients carry very little information and can be stored using very
few bits.
So, in the (P,Q) plane, by coding only a few coefficients we can represent most of the signal or image energy.

212
Lossy Compression: DCT

4. DCT coefficients are real valued while DFT coefficients are complex; therefore, hardware implementation of DCT is
easier than that of DFT.

5. DCT has vast application such as –

Image Compression – JPEG, HEIF, WebP, BPG etc.

Video Compression – H.261, MJPEG, MPEG1, H.262 (MPEG2), H.265(HEVC), WebM etc.

Audio Compression – AC3(Dolby), AC4

213
Lossy Compression : DCT
Issues with the DCT

1. A common issue with DCT compression in digital media is blocky compression artifacts, caused by the independent
processing of DCT blocks. The DCT algorithm can cause block-based artifacts when heavy compression is applied.

2. Truncation of the higher spectral coefficients results in blurring of the images, especially where detail is high.

3. Coarse quantization of some of the low spectral coefficients introduces graininess in the smooth portions of the images.

214
Lossy Compression : DCT
2D Forward Discrete Cosine Transform (FDCT)
The two dimensional DCT of an MxN image f(x,y) is defined as follows -

215
Lossy Compression : DCT
2D Inverse Discrete Cosine Transform (IDCT)
The two dimensional inverse DCT of an M x N coefficient array is defined as follows -

216
Lossy Compression
2D DCT Basis Function

First coefficient represent


the constant intensity

Horizontal frequencies are increasing from left to right.

Vertical frequencies are increasing from top to bottom.

217
Lossy Compression

Computing DCT of an Image

Transformation Matrix Approach FFT Based Approach

Transformation Matrix based approach

1. This method is suitable for small image segments such as 8 x 8 or 16 x 16. For an M x M segment, a DCT transformation
matrix T is computed first by the following equation -

T(i, j) = 1/sqrt(M)                                    for i = 0
T(i, j) = sqrt(2/M) cos[ (2j + 1) i π / (2M) ]         for i = 1, ..., M-1   (j = 0, ..., M-1)
218
Lossy Compression
2. Once the transformation matrix T is computed, the DCT of an image segment f can be found by   D = T f Tᵀ

3. Since T is real and orthonormal, its inverse equals its transpose; therefore, the inverse DCT can be found by   f = Tᵀ D T

Fast Fourier Transform (FFT) Based Approach

It utilizes the FFT structure for speedy computation of DCT, hence suitable for large input images.

219
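A small NumPy sketch of the transformation-matrix approach (the 8x8 block size and the random block are illustrative assumptions):

```python
import numpy as np

def dct_matrix(M=8):
    """Build the M x M DCT transformation matrix T described above."""
    T = np.zeros((M, M))
    j = np.arange(M)
    T[0, :] = 1.0 / np.sqrt(M)
    for i in range(1, M):
        T[i, :] = np.sqrt(2.0 / M) * np.cos((2 * j + 1) * i * np.pi / (2 * M))
    return T

T = dct_matrix(8)
block = np.random.randint(0, 256, (8, 8)).astype(float)   # one 8x8 image segment
D = T @ block @ T.T                                        # forward DCT of the block
recovered = T.T @ D @ T                                    # inverse DCT
assert np.allclose(recovered, block)                       # T is orthonormal, so this holds
```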
Lossy Compression : DCT

8 x 8 input image

DCT Coefficients

220
Color Image Compression : Flow Chart
Compression

1. Color space conversion – RGB to YCbCr.
2. Chroma subsampling – reduce the resolution of the chrominance channels.
3. Divide the image into small blocks.
4. DCT – transform each block.
5. Quantization – reduce the precision of the coefficients.
6. Zigzag scanning – arrange the quantized coefficients in zigzag order.
7. Entropy coding – e.g. Huffman coding.

Decompression follows the inverse of these steps.
221
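A hedged OpenCV/NumPy sketch of the first two stages (colour conversion and 4:2:0-style chroma subsampling). OpenCV converts to YCrCb rather than YCbCr, and the file name and downsampling factor are illustrative assumptions.

```python
import cv2

img = cv2.imread("input.jpg")                        # BGR image
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)       # luma + chroma channels
Y, Cr, Cb = cv2.split(ycrcb)

# Chroma subsampling: keep full-resolution luma, half-resolution chroma (4:2:0 style)
Cr_small = cv2.resize(Cr, (Cr.shape[1] // 2, Cr.shape[0] // 2))
Cb_small = cv2.resize(Cb, (Cb.shape[1] // 2, Cb.shape[0] // 2))

# Each 8x8 block of Y would then go through DCT, quantization, zigzag and entropy coding.
```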
Text Recognition : Optical Character Recognition (OCR) : Flow Chart

Image Acquisition – scanners, digital cameras

Preprocessing – noise reduction, binarization, contrast enhancement

Segmentation – isolate the characters that need to be recognized

Feature Extraction

Classification

Post-processing – spell check, contextual analysis, formatting
222
CSET344/CMCA544
Image and Video Processing

Dr. Gaurav Kumar Dashondhi


Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 7th April to 11th April 2025
[email protected]
223
Motion Detection

1.Motion detection in image and video processing is a critical technique used to identify changes in a
sequence of images or video frames. Essentially, it aims to determine if and where movement has
occurred within a scene.
2. Motion detection focuses on analyzing temporal changes in pixel values across consecutive frames

Frame Differencing

Techniques of Motion Detection Background Subtraction

Optical Flow

224
Motion Detection

Frame Differencing
1. This method involves subtracting one frame from another frame.
2. The resulting difference highlights the areas where change has occurred.
3. It is computationally simple but sensitive to noise and lighting variations.

Background Subtraction
1. This technique involves creating a model of the static background and then subtracting it from each new frame.
2. The remaining pixels represent moving objects.
3. Background subtraction is more robust than frame differencing but requires a stable background model.

Optical Flow
1. Optical flow estimates the apparent motion of objects between frames by analyzing the movement of pixels.
2. It provides a more detailed understanding of motion, including direction and velocity.
3. Optical flow is computationally intensive but can handle complex motion scenarios.
225
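A hedged OpenCV sketch of the first two techniques; the video file name and the difference threshold are illustrative assumptions.

```python
import cv2

cap = cv2.VideoCapture("traffic.mp4")
backsub = cv2.createBackgroundSubtractorMOG2()        # background subtraction model

ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Frame differencing: threshold the absolute difference of consecutive frames
    diff = cv2.absdiff(gray, prev_gray)
    _, motion_mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

    # Background subtraction: foreground mask from the learned background model
    fg_mask = backsub.apply(frame)

    prev_gray = gray
```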
Optical flow field example

For each pixel we compute a motion vector.

Based on it, we can identify how much that pixel moved compared to the previous frame.

The displacement vector is nothing but the optical flow vector.

226
Optical Flow field example

At object boundaries it is always difficult to identify the correct vectors, because there is a discontinuity at the boundary.

Every pixel has a vector.
227
Optical Flow field example

Crowded sequence where one group of pixels is moving in one direction and another group of pixels is moving in the
other direction.

228
Optical Flow field example

Color-coded optical flow instead of vectors. Here the color tells you the direction of motion.

Color coded Traffic


Sequence.

229
Optical Flow Applications

1. Motion-based segmentation: in a given video you want to identify which objects are moving and which are not.
Compute the optical flow; where the flow is significant the objects are moving, and where it is not significant the
objects are static.

2. Structure from motion (3D Shape and motion)

3. Video Compression.

230
Optical Flow
Brightness constancy: F(x, y, t) = F(x + dx, y + dy, t + dt)

Two frames are taken at times t and t + dt; the pixel moves by (dx, dy).

Expanding the right-hand side with a first-order Taylor series gives   Fx dx + Fy dy + Ft dt = 0

Dividing the above equation by dt (with u = dx/dt, v = dy/dt) gives the optical flow equation:

Fx u + Fy v + Ft = 0

231
Optical Flow : Lucas Kanade Method

The optical flow equation gives one equation with 2 unknowns (u, v) per pixel.

Lucas-Kanade assumes the flow is constant within a small window (e.g. 3 x 3): each pixel in the window contributes one
equation, so we get 9 equations and 2 unknowns (an over-constrained system).

232
Optical Flow
Rewrite the equations in matrix form:  A [u v]ᵀ = b, where each row of A holds the spatial gradients (Fx, Fy) of one pixel
in the window and b holds the corresponding -Ft values. Here (u, v) is the unknown.

A is not a square matrix, so it cannot be inverted directly. Multiplying both sides by Aᵀ makes the system square:
AᵀA [u v]ᵀ = Aᵀ b.

This is equivalent to minimizing the squared error; setting the derivative of the error to zero gives the least-squares
(pseudo-inverse) solution:

[u v]ᵀ = (AᵀA)⁻¹ Aᵀ b

233
Optical Flow: Lucas Kanade Method

2 equations and 2 unknowns, obtained through the minimization-based (least squares) approach.

234
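A hedged NumPy sketch of the Lucas-Kanade solution for a single pixel. The window size and the central-difference gradient approximations are illustrative assumptions; production implementations such as cv2.calcOpticalFlowPyrLK add image pyramids and good-feature selection.

```python
import numpy as np

def lucas_kanade_at(I1, I2, x, y, w=1):
    """Estimate (u, v) at pixel (x, y) from two grayscale frames using a (2w+1)^2 window."""
    I1 = I1.astype(float); I2 = I2.astype(float)
    Fx = (np.roll(I1, -1, axis=1) - np.roll(I1, 1, axis=1)) / 2.0   # spatial gradients
    Fy = (np.roll(I1, -1, axis=0) - np.roll(I1, 1, axis=0)) / 2.0
    Ft = I2 - I1                                                    # temporal derivative

    win = np.s_[y - w:y + w + 1, x - w:x + w + 1]
    A = np.stack([Fx[win].ravel(), Fy[win].ravel()], axis=1)        # 9 x 2 for a 3x3 window
    b = -Ft[win].ravel()                                            # 9 x 1

    # Least-squares / pseudo-inverse solution: (A^T A)^-1 A^T b
    uv, *_ = np.linalg.lstsq(A, b, rcond=None)
    return uv   # (u, v)
```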
Optical Flow: Lucas Kanade Method

235
Optical Flow: Lucas Kanade Method
Comments

1. Lucas-Kanade works well for small motion.

2. If the object moves faster, the brightness changes rapidly. In this scenario, small 2 x 2 or 3 x 3 masks fail to estimate
the spatiotemporal derivatives.

3. Pyramids can be used to compute large optical flow vectors: at coarser pyramid levels, large motion is reduced to
small motion, which Lucas-Kanade can handle.

236
CSET344/CMCA544
Image and Video Processing
Module 4
Dr. Gaurav Kumar Dashondhi
Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 14th April to 18th April 2025
[email protected]
237
Content

1.Viola Jones method for face detection.

2.Face Recognition, PCA, Concept of Eigen Face

3.Feature Detection for ML Application: SIFT and HOG

4. Salient Object detection.

5. Human Action Recognition from videos.

238
Face Detection

Face detection :

1. Run a window of a fixed or variable size.

2. Extract the different features.

3. Provide these features to a classifier and then take a decision related to face or non face. 239
Face Detection

Feature Vectors.

Features Classifiers

Which features represent the face well. How to construct a face model and efficiently
classify features as face or not ?
240
Face Detection

Identify the different interest points like


corners, edges.

SIFT is mostly utilized to find the


similarity between two images.

Here, our objective is to find face or non


face. So, different features can be
generated from the existing features.

Divide the image into different parts or templates and then match these templates.

But within each template there is a lot of variability, e.g. eyes do not all look the same.
241
Face Detection

1. Features should be able to distinguish between face and non-face.

2. Features should be extremely fast to compute, because the algorithm needs to evaluate millions of windows in an
image.

242
Viola Jones Method for Face Detection : face or not face

243
Viola Jones Method for Face Detection : Overall Flow Diagram
It is an efficient method that scans an image, using simple features and a cascade structure, to locate faces in real
time.

Overall Flow Diagram

Haar Features – simple rectangle filters to detect contrast differences.

Integral Image – a method to speed up the calculation of the Haar features.

Adaboost (Adaptive Boosting) – a machine learning method to select the most relevant Haar features.

Cascade Classifier – a series of increasingly complex classifiers to accept or reject face / non-face regions.

244
Viola Jones Method for Face Detection: Haar Filters
Haar filters are based on Haar wavelets.

A Haar filter is a two-valued filter, which is computationally cheap.

Each filter (like HA) is applied on the image by correlation.

The output of this correlation is a feature VA.

After applying all the filters, we end up with a feature vector.

Note: If you change the scale, you will end up with another feature vector.

Haar Filters Haar Feature Vector


245
Viola Jones Method for Face Detection: Haar Filters

Consider a filter and apply it on


the different images.

Here, filter is nothing but edge


detector.

Haar features are sensitive to the


directionality of the patterns.

246
Viola Jones Method for Face Detection: Haar Filters

Vertical Edge detector

Horizontal Edge detector

Laplacian

247
Viola Jones Method for Face Detection: Haar Filters

Only simple additions and subtractions are performed here, so the computational cost of a Haar filter (Haar feature) is
low.

VA[i,j] = Sum(Pixel Intensity in White Area) – Sum (Pixel Intensity in the Black Area)
248
Viola Jones Method for Face Detection: Haar Filters

Computational Cost = (N x M) – 1 additions per pixel per filter per scale


249
Viola Jones Method for Face Detection: Integral Image Formation

98 +110 + 99+ 110 + 121 +120 = 658

250
Viola Jones Method for Face Detection: Integral Image Formation

For example –
An original image I and its integral image II are given to you.

Q. How do we compute the sum of pixel values within a rectangle?

251
Viola Jones Method for Face Detection: Integral Image Formation
STEP 1. P = sum of all the values to the left of and above the bottom-right corner of the rectangle.
STEP 2. Subtract Q from P.
STEP 3. Subtract S from P.
STEP 4. R has been subtracted twice in the process, so add it back once.

Rectangle sum = P - Q - S + R

NOTE:
1. The overall computational cost is 3 additions/subtractions.
2. This cost is independent of the size of the rectangle.
252
Viola Jones Method for Face Detection Haar Response using Integral Image

Note:
The integral image is computed only once per test image.
Integral image formation allows fast computation of the Haar features.

(2061 - 329 + 98 - 584) – (3490 - 576 + 329 - 2061) = 1246 - 1182 = 64
Total additions = 7
253
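A small NumPy sketch of integral-image formation and the P − Q − S + R rectangle sum; the 24x24 window and the particular two-rectangle feature are illustrative assumptions.

```python
import numpy as np

def integral_image(img):
    """II(r, c) = sum of all pixels above and to the left of (r, c), inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(II, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] using 3 additions/subtractions (P - Q - S + R)."""
    P = II[r1, c1]
    Q = II[r0 - 1, c1] if r0 > 0 else 0
    S = II[r1, c0 - 1] if c0 > 0 else 0
    R = II[r0 - 1, c0 - 1] if (r0 > 0 and c0 > 0) else 0
    return P - Q - S + R

img = np.random.randint(0, 256, (24, 24))
II = integral_image(img)
# A two-rectangle Haar feature: white half minus black half of an 8x16 region
white = rect_sum(II, 0, 0, 7, 7)
black = rect_sum(II, 0, 8, 7, 15)
haar_response = white - black
```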
Viola Jones Method for Face Detection : Adaboost or Adaptive Boosting
AdaBoost is a machine learning algorithm that performs feature selection in the Viola-Jones algorithm.
We have a very large pool of Haar features; the primary role of AdaBoost is to select the most effective ones.
This selection is crucial for both the accuracy and the speed of face detection.

Weak Classifier – each Haar-like feature is treated as a weak classifier. Here a weak classifier means a single Haar
feature together with a threshold. "Weak" means it can only provide a rough estimate of whether a sub-window
contains a face or not.

Iterative Selection – in each round, AdaBoost focuses on the training samples that were misclassified.

Feature Weighting – in each round it evaluates all the weak classifiers and identifies the features that perform well on
the training data; features that perform well get higher weight.

Strong Classifier – the combination of the most effective weak classifiers (described below).
254
Viola Jones Method for Face Detection: Adaboost or Adaptive Boosting
Adaboost or Adaptive Boosting – Strong Classifier: it is a combination of many weak classifiers (Haar features) that
AdaBoost identifies as being the most effective.

Weak Classifier 1

Weak Classifier 5 Weak Classifier 2

Strong Classifier

Weak Classifier 4 Weak Classifier 3

255
Viola Jones Method for Face Detection : Cascading
Cascading

1. It is structured as a series of filtering stages.


2. Each stage consists of a strong classifier, which is a collection of the most important weak classifiers selected by AdaBoost.
3. The stages are arranged in order of increasing complexity.
4. The initial stages are designed to quickly reject the vast majority of non-face sub-windows.
256
Face Recognition, PCA, Concept of eigen faces.

Face Detection Face recognition

Face Verification. Face Identification.


Face or No Face decision Identify whose image it is among all images.

Types of face recognition

Based on Local Region Based on Global Appearance

Local feature analysis Principal Component Analysis (PCA)


Gabor wavelet Independent Component Analysis (ICA)

257
Face Recognition, PCA, Concept of eigen faces.

It is generally used to remove information that is not useful.

It reduces the dimensionality of the data and decomposes the face structure into orthogonal principal components,
which we know as eigenfaces.

Spatial Domain PCA Space

258
Face Recognition, PCA, Concept of eigen faces.

Eigenface

Eigenfaces are the eigenvectors of the covariance matrix of the training face images.

Eigenfaces are also referred to as ghostly images.

Prime reason – to represent the input data efficiently: each individual face can be represented as a linear combination
of eigenfaces.

259
Face Recognition, PCA, Concept of eigen faces.
Face Recognition Process flow diagram

Initialize the face recognition system:
1. Acquire training samples (face images).
2. Find the mean image, then the deviation of each image from the mean (centered images).
3. Compute the eigenfaces (eigenvectors of the covariance matrix).

Recognize unknown face images:
4. Project the testing face image onto the face space and get its eigenface components.
5. Calculate the Euclidean distance between the input face image and the training samples; the closest match gives the
recognized identity.


260
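A hedged NumPy sketch of this pipeline (eigenfaces obtained via SVD of the centered data). The number of components k and the "one flattened face per row" data layout are illustrative assumptions.

```python
import numpy as np

def train_eigenfaces(faces, k=20):
    """faces: N x D matrix, one flattened face image per row."""
    mean = faces.mean(axis=0)
    A = faces - mean                          # centered images
    # Principal directions via SVD; rows of Vt are the eigenfaces
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    eigenfaces = Vt[:k]
    weights = A @ eigenfaces.T                # eigenface components of the training faces
    return mean, eigenfaces, weights

def recognize(test_face, mean, eigenfaces, weights):
    w = (test_face - mean) @ eigenfaces.T     # project the test face onto the face space
    dists = np.linalg.norm(weights - w, axis=1)
    return np.argmin(dists)                   # index of the closest training sample
```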
CSET344/CMCA544
Image and Video Processing

Dr. Gaurav Kumar Dashondhi


Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 21st April to 25th April 2025
[email protected]
261
Scale Invariant Feature Transform (SIFT): Motivation

Let's say you want to recognize this template in the rich 2D image (high resolution).

If you are using template matching for this purpose then -

1. The user needs to create many templates with different orientations and scales, because the size and orientation of
the template inside the image may differ.

2. Apart from this, the template may be partially occluded in the 2D image (covered by other objects). The solution in
that scenario is to construct many little templates and match all of them. Overall, the process is time consuming and
computationally inefficient.
262
Scale Invariant Feature Transform (SIFT): Motivation

Instead of doing template matching, one can extract some important descriptive features, known as interest points, from
the template and match them inside the original image.

263
Feature Detection for Machine Learning : SIFT : Flow Diagram
SIFT: It is a robust algorithm designed to identify and describe local features in images that are invariant to scale, rotation
and illumination changes.
or

SIFT can detect the same feature in an image even if the image is resized, rotated and viewed under different lighting
conditions.

Steps in SIFT

Scale Space Extrema Detection Key point Localization Orientation Assignment Key Point Descriptor

264
Feature Detection for Machine Learning : SIFT
1.Scale Space Extrema Detection

The scale space is the process of creating a set of progressively blurred images at multiple resolutions to detect key points
that are scale-invariant (they remain the same even if the image size changes). It helps to detect features that are stable and
can be recognized even when the image is scaled or resized.

It basically consists of three blocks.

Gaussian Blur Difference of Gaussian Identifies Key Points

265
Feature Detection for Machine Learning : SIFT
Scale Space Extrema Detection Gaussian Blur Difference of Gaussian Identifies Key Points

First, algorithm progressively applies a gaussian blur to the image to blur it at different scales (levels), which smooths it by
different amounts. This means we see the image from clear to blurry.

This can be described mathematically as:


L(x,y,σ)=G(x,y,σ)∗I(x,y)

Where:
•L(x,y,σ) is the blurred image at scale σ.
•G(x,y,σ) is the Gaussian kernel.
•I(x,y) is the original image.
The image is also downsampled (reduced in size) after each octave, allowing features to be detected at smaller resolutions (or
sizes) as well. Here, octave is a set of images at different resolutions.
266
Feature Detection for Machine Learning : SIFT
1. Scale Space Extrema Detection Gaussian Blur Difference of Gaussian Identifies Key Points

The image shows how an image is progressively blurred


across different scales and octaves. Each octave
represents a set of images at progressively lower
resolutions (downsampled). Within each octave, the
image is blurred by different amounts. The leftmost
column shows the images in Octave 1, starting with the
original image at the bottom and becoming more
blurred as you move upward. The middle column
represents Octave 2, where the image has been
downsampled (reduced in size) and similarly blurred at
different scales. The rightmost column shows Octave 3,
where the image is further downsampled and blurred.
267
Feature Detection for Machine Learning : SIFT
1. Scale Space Extrema Detection Gaussian Blur Difference of Gaussian Identifies Key Points

After blurring images, the algorithm subtracts one


blurred image from another, producing
the Difference of Gaussians (DoG) images to identify
keypoints. These highlight regions where pixel
intensity changes significantly. These changes are
potential keypoints. DoG is computed by subtracting
two gaussian-blurred images at different scales using
the formula:

D(x,y,σ)=L(x,y,kσ)−L(x,y,σ)
Where 𝑘 is a constant scaling factor.

268
Feature Detection for Machine Learning : SIFT
1.Scale Space Extrema Detection Gaussian Blur Difference of Gaussian Identifies Key Points

Lastly, the algorithm identifies key points by finding maxima and


minima also known as local extrema (either very bright or very dark
spots) over scale and space in the DoG images. The process is
illustrated in the image on the left side. For each pixel in a DoG image,
the algorithm compares it with:
• 8 neighbors in the same scale (the same blurred image).
• 9 pixels in the scale above (the previous blurred image).
• 9 pixels in the scale below (the next blurred image).
This comparison across scales ensures that the algorithm detects
features that appear strong and consistent at a specific scale.

269
Feature Detection for Machine Learning : SIFT
2. Key point Localization

After building the scale-space and finding potential keypoints (local maxima or minima in the DoG images), the locations of
detected keypoints need to be refined to make sure they are accurate. To get a more precise location for each keypoint, a
mathematical method called a Taylor series expansion is used. Think of it as a way to zoom in and find the exact point where
the keypoint should be, like adjusting the focus of a camera for a sharper image.

The Taylor series expansion is used to approximate a function near a given point. In the context of the SIFT algorithm, it's
applied to approximate the DoG function around a potential keypoint to refine its location and scale. The Taylor expansion of
the DoG function, D(x,y,σ) around a candidate keypoint is given by:

270
Feature Detection for Machine Learning : SIFT
2. Key point Localization

Some key points might be located in areas that are too flat or don't have enough variation in brightness (low contrast). These
key points are not useful because they can be easily affected by noise. Therefore, the intensity (brightness) of each key point
is checked. If the intensity is below a certain value (0.03, according to the SIFT paper), that key point is discarded. This means
that only key points that are both well-located and have enough contrast are kept.

Key point at different scale Key point removes ( low contrast) Key point removes (located on edges)
271
Feature Detection for Machine Learning : SIFT
3.Orientation Assignment Now, the identified keypoints are considered stable (they won’t change much if the
image is modified slightly).

To make these keypoints invariant to rotation, a direction or orientation is given.


This helps the algorithm understand that if the image is rotated, it can still find the
keypoint in the same spot because it knows which way it’s pointing. This is
important because, without it, the keypoint wouldn’t be able to handle rotated
images well.

Each keypoint is given a direction to make the algorithm resistant to image rotation.
A small region around the keypoint is analyzed based on its scale, and the
magnitude and gradients in the image are calculated.
Key points and their direction

272
Feature Detection for Machine Learning : SIFT
4. Key Point Descriptor
After the keypoints have been detected and assigned an orientation,
the next step is to create a descriptor for each keypoint. This
descriptor is a compact representation of the keypoint, capturing
the local image information around it.

A key point descriptor is a way to describe keypoints in an image. To


create it, we look at a 16x16 area around each keypoint and divide
this area into 16 smaller blocks, each 4x4 in size. For each small
block, we create an 8-bin histogram to capture the directions of
features. This results in a total of 128 values that make up the
keypoint descriptor. The descriptor is scale-invariant, rotation-
invariant, and robust to changes in lighting and viewpoint.
Computing key point descriptor
273
Feature Detection for Machine Learning : SIFT

SIFT Output SIFT ROI Output

274
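A hedged OpenCV sketch of using SIFT to match a template against a scene, following the four-step pipeline above. The file names and the 0.75 ratio threshold (Lowe's ratio test) are illustrative assumptions; cv2.SIFT_create is available in OpenCV 4.4+.

```python
import cv2

img1 = cv2.imread("template.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)     # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Match descriptors and keep matches passing the ratio test
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
```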
Histogram of oriented gradient: HOG (Feature Descriptor)
The histogram of oriented gradients method is a feature descriptor technique used in computer vision and
image processing for object detection. It focuses on the shape of an object, counting the
occurrences of gradient orientation in each local region. It then generates a histogram using the
magnitude and orientation of the gradient.

275
Feature Detection for Machine Learning : HOG
Histogram: A graphical representation of the frequency distribution of data — in this case, the gradients’ directions.
Oriented: Refers to the direction of the gradients.
Gradients: Represent changes in pixel intensity values, capturing the edges, textures, and structures in the image.

Steps in HOG:

1. Preprocessing – resize the image and perform normalization to reduce illumination effects.
2. Calculate gradients – compute the gradient magnitude and orientation at each pixel.
3. Divide into cells – divide the image into small connected regions called cells (cell size 8 x 8 or 16 x 16).
4. Compute the HOG in each cell – create a 1-D orientation histogram for each cell.
5. Group cells into larger blocks and normalize.
6. Collect the normalized HOG feature descriptors from all the blocks into the final feature vector.
276
Feature Detection for Machine Learning : HOG

a b c d e

a – Input image, b – Crop the image with a 1:2 ratio (150 x 300),

c – Resize the image to 64 x 128, d – overlay a grid of 8 x 8 cells on the image (16 rows of cells),

e – one pixel is shown here.


277
Feature Detection for Machine Learning : HOG
For a single pixel (value 60):

Gradient magnitude: compute the differences Gx and Gy in the x and y directions; magnitude = sqrt(Gx^2 + Gy^2).

Gradient direction: the angle θ = arctan(Gy / Gx).
Gradient Direction : Calculate the angle.

278
Feature Detection for Machine Learning : HOG

Bins (gradient directions): 0 20 40 60 80 100 120 140 160

Bin 20: a pixel with gradient direction 20 degrees contributes its full magnitude; here bin 20 accumulates 60 + 30 + 45 = 135.
Bin 40: similarly accumulates 60 + 30 + 15 = 105.
A pixel with direction 30 degrees and magnitude 60: 30 lies exactly between bins 20 and 40, so the 60 is split equally (30 to each bin).
A pixel with direction 25 degrees: it is closer to 20 than to 40, so its magnitude is split in the proportion 3/4 to bin 20 (i.e. 45) and 1/4 to bin 40 (i.e. 15).
279
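A hedged NumPy sketch of the per-cell histogram with the proportional vote splitting described above. Unsigned 0–180° orientations, 9 bins and the random gradients are illustrative assumptions; block normalization is omitted.

```python
import numpy as np

def cell_histogram(mag, ang, bins=9):
    """9-bin orientation histogram (0-180 deg) for one cell, with proportional vote splitting."""
    hist = np.zeros(bins)
    width = 180.0 / bins                       # 20 degrees per bin
    for m, a in zip(mag.ravel(), ang.ravel() % 180):
        b = int(a // width)                    # lower bin index
        frac = (a - b * width) / width         # how far towards the next bin
        hist[b % bins] += m * (1 - frac)
        hist[(b + 1) % bins] += m * frac
    return hist

# Gradients of one 8x8 cell (gx, gy stand in for precomputed pixel differences)
gx = np.random.randn(8, 8)
gy = np.random.randn(8, 8)
mag = np.sqrt(gx**2 + gy**2)
ang = np.degrees(np.arctan2(gy, gx))
hist = cell_histogram(mag, ang)                # one 9-value HOG cell descriptor
```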
CSET344/CMCA544
Image and Video Processing

Dr. Gaurav Kumar Dashondhi


Ph.D. IIT Bombay

Overall Course Coordinator- Lect. Week


Dr. Gaurav Kumar Dashondhi 28th April to 02 May 2025
[email protected]
280
Salient Object Detection (SOD) in videos Salient or prominent or important object detection.

Salient object detection (or salient object segmentation) consists of two stages:

1. Detecting the most salient object.


2. Segmenting the actual region of that object.

281
Salient Object Detection (SOD) in videos Steps in salient object detection

Overall Flow Diagram of SOD


Input Video Analysis Output

Feature Extraction Saliency Prediction Saliency map refinement

STEP1

Frame Extraction The Video is broken into smaller frames.

Input Video Analysis

Optical Flow Computation It helps to find out the motion between


consecutive frames.
282
Salient Object Detection (SOD) in videos Steps in salient object detection

STEP 2
Feature Extraction
    Spatial feature extraction – features such as texture, edges and colour are extracted from each frame.
    Temporal feature extraction – features that capture motion and inter-frame relationships are extracted.

STEP3
Generate a saliency map for each frame
using the spatial feature extracted in the
Spatial Saliency Map Generation previous step.

Saliency Prediction
Generate a saliency map for each frame based
Temporal Saliency Map Generation on temporal feature extracted from the
previous frame.
283
Salient Object Detection (SOD) in videos Steps in salient object detection

STEP 4 Smoothening and Boundary Refinement


Post Processing

Saliency Map Refinement

Fusion of spatial and temporal Fuse the spatial and temporal maps to generate the
saliency Spatiotemporal saliency map.

STEP 5

The final output is the sequence of saliency maps one from each frame of the
Saliency Prediction
input video.

284
Comparison Traditional SOD Vs Deep learning-based techniques
Feature Extraction
  Traditional: hand-crafted features (color, texture, contrast, motion - often based on optical flow).
  Deep learning: learned hierarchical features directly from raw video frames using CNNs (2D, 3D), RNNs (LSTMs, GRUs), and Transformers.

Temporal Modeling
  Traditional: explicitly engineered motion features, frame differencing, simple integration rules.
  Deep learning: implicit learning of temporal dependencies through recurrent layers, 3D convolutions, attention mechanisms, and spatiotemporal networks.

Saliency Prediction
  Traditional: contrast-based analysis, heuristic rules, basic machine learning classifiers (e.g., SVM).
  Deep learning: end-to-end learning of saliency maps using deep neural networks optimized for this task.

Spatial Context
  Traditional: limited understanding of high-level object context.
  Deep learning: stronger ability to learn and utilize contextual information through deep architectures and large datasets.

Motion Understanding
  Traditional: relies heavily on the accuracy of optical flow algorithms.
  Deep learning: learns motion representations directly or in conjunction with optical flow, more robust to challenging motion scenarios.
285
Comparison Traditional SOD Vs Deep learning-based techniques
Handling Complex Scenes
  Traditional: struggles with cluttered backgrounds, camouflaged objects, and complex motion.
  Deep learning: more robust to complex scenarios due to the learning capacity of deep models.

Generalization
  Traditional: often limited generalization to unseen data or different video types.
  Deep learning: better generalization due to learning from large and diverse datasets.

Computational Cost
  Traditional: generally lower during inference.
  Deep learning: can be higher due to the complexity of deep networks, but efficiency is an active research area.

Annotation Dependence
  Traditional: typically less reliant on large-scale annotated data.
  Deep learning: often requires large, pixel-wise annotated video saliency datasets for training (though unsupervised and weakly-supervised methods are emerging).

Flexibility & Adaptability
  Traditional: less flexible and harder to adapt to new challenges.
  Deep learning: more flexible and can be adapted to specific requirements through network design and training data.

Performance
  Traditional: generally lower accuracy compared to modern deep learning methods.
  Deep learning: achieves state-of-the-art performance on various video saliency benchmarks.

286
Human action recognition or activity recognition from video sequences
Human Action Recognition (HAR)                 General Pipeline for Human Action Recognition (HAR)
HAR is a technique that recognizes and categorizes human actions based on sensor data.

Recognizing the human and categorizing the action, e.g. walking or jumping in place. 287
Human action recognition or activity recognition from video sequences
How does human activity recognition work ?

288
Role of Spatio-Temporal feature for action classification
Spatio-temporal features play a critical and fundamental role in human action recognition from videos.

1. They capture the essence of an action by using spatial and temporal features together.

2. They encode motion and dynamics, which helps to bridge the gap between spatial appearance and temporal
evolution.

3. They distinguish similar actions.

4. They handle variation in execution, e.g. the same action performed with different speed and style.

5. Jointly considering spatial and temporal features improves accuracy and robustness.

289
End Term Question Paper Pattern
• Section A
• Total 5 Questions (3 marks each) Total 15 marks

• Section B
• Total 3 Questions (5 marks each) Total 15 marks

• Section C
Total 1 Question (10 marks each) Total 10 marks

Note: All Modules (1,2,3,4) will be part of end term
