
Open CV Notes

The document provides a comprehensive guide on image processing using OpenCV, detailing a basic pipeline for image manipulation and analysis through Python scripts. It covers essential operations such as reading images, resizing, color conversion, and various smoothing techniques, along with their definitions and use cases. The content is structured to facilitate understanding for both practitioners and students in the field of computer vision.

Uploaded by

khansamaira395

Open CV

June 19, 2025

1 Image Processing in OpenCV: Exhaustive Breakdown of a Basic Pipeline with Definitions, Code, and Explanations
1.1 Introduction
In the domain of computer vision and image analysis, preprocessing an image is a fundamental step.
OpenCV, an open-source computer vision library, provides a wide array of tools to perform tasks such
as image reading, scaling, filtering, edge detection, and morphological transformations.
This section contains an extremely detailed walkthrough of a Python script built using OpenCV. The
goal is to understand the purpose and operation of each line of code, and to define every technical term
and function employed, using formal descriptions, relevant examples, and analytical commentary.

1.2 Python Script Overview


Before dissecting the code, let us present the complete script to establish context:

import cv2 as cv

img = cv.imread('Photo/Varanasi.jpg')

def resacleFrame(frame, scale=0.25):
    # frame.shape is (height, width, channels); cv.resize expects (width, height)
    width = frame.shape[1] * scale
    height = frame.shape[0] * scale
    dimension = (int(width), int(height))
    return cv.resize(frame, dimension, interpolation=cv.INTER_AREA)

img = resacleFrame(img)
cv.imshow('images', img)

gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
cv.imshow('Gray', gray)

# The third positional argument is sigmaX; cv.BORDER_DEFAULT has the value 4
blur = cv.GaussianBlur(img, (19, 19), cv.BORDER_DEFAULT)
cv.imshow('Blur', blur)

canny = cv.Canny(img, 125, 175)
cv.imshow('Canny', canny)

dilated = cv.dilate(canny, (7, 7), iterations=3)
cv.imshow('dilated', dilated)

eroded = cv.erode(dilated, (3, 3), iterations=1)
cv.imshow('eroded', eroded)

resized = cv.resize(img, (500, 500), interpolation=cv.INTER_CUBIC)
cv.imshow('resized', resized)

# NumPy slicing: rows (y) first, then columns (x)
cropped = img[50:400, 250:400]
cv.imshow('cropped', cropped)

cv.waitKey(0)

1.3 Line-by-Line Analysis with Definitions and Explanations


Line 1: Importing OpenCV
import cv2 as cv
Definition: OpenCV (Open Source Computer Vision Library) is a software library of programming
functions mainly aimed at real-time computer vision. It is written in C++ and has bindings for Python,
Java, and MATLAB.
Usage: We import the cv2 module using the alias cv to simplify notation.

Line 2: Reading an Image from Disk


img = cv.imread('Photo/Varanasi.jpg')
Function: cv.imread(filepath) reads an image from the specified file.
• Returns a NumPy array representing the image.
• Reads the image in BGR format (Blue, Green, Red), not RGB.
• If the image is not found, it returns None.
Use Case: Reading an image into memory for processing.

Line 3–7: Defining a Rescaling Function


def resacleFrame(frame, scale=0.25):
    width = frame.shape[1] * scale
    height = frame.shape[0] * scale
    dimension = (int(width), int(height))
    return cv.resize(frame, dimension, interpolation=cv.INTER_AREA)
Definition: shape is a property of NumPy arrays; for a color image it returns the tuple (rows, cols, channels). Here:
• frame.shape[0] = height
• frame.shape[1] = width
cv.resize(): Changes the size of an image. The target size is given as (width, height), the reverse of
the shape order. The interpolation method defines how pixel values are calculated:
• cv.INTER_AREA: Preferred for shrinking images.
• cv.INTER_LINEAR: The default; fast and adequate for enlarging.
• cv.INTER_CUBIC: Higher-quality enlarging, but slower.

Line 8: Applying the Rescale Function


img = resacleFrame(img)
Reduces the image size to 25% of the original. Important for reducing computational cost during
testing or on devices with limited resources.

Line 9: Displaying the Rescaled Image


cv.imshow('images', img)
cv.imshow(): Opens a GUI window to display the image.
• First argument: Window title.
• Second argument: Image to display.

Line 10–11: Grayscale Conversion
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
cv.imshow('Gray', gray)

cv.cvtColor(): Converts images from one color space to another.


Definition of Grayscale:
• An image format where each pixel carries intensity information only (0-255).
• Reduces memory and computation.
Use Case: Essential preprocessing step for edge detection, segmentation, thresholding.

Line 12–13: Gaussian Blur


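blur = cv.GaussianBlur(img, (19, 19), cv.BORDER_DEFAULT)
cv.imshow('Blur', blur)

Definition: Gaussian blur is a smoothing technique using a Gaussian function. It reduces noise and
detail.
Parameters:
• Kernel size (19, 19): Width and height must be odd. The larger the kernel, the smoother the result.
• cv.BORDER_DEFAULT: The third positional parameter of cv.GaussianBlur is sigmaX (the Gaussian
standard deviation), not the border mode. Because cv.BORDER_DEFAULT is the integer 4, this call blurs
with sigmaX = 4. To choose the border handling, pass borderType=cv.BORDER_DEFAULT by keyword.
Use Case: Applied before edge detection to avoid false edges from noise.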

Line 14–15: Canny Edge Detection


canny = cv.Canny(img, 125, 175)
cv.imshow('Canny', canny)

Definition: Canny edge detection is a multi-stage algorithm to detect a wide range of edges.
How It Works:
1. Apply Gaussian Blur
2. Compute Gradient Magnitude and Direction
3. Non-Maximum Suppression
4. Hysteresis Thresholding with two thresholds
Thresholds (125, 175): Pixels whose gradient magnitude is below 125 are discarded. Pixels above 175
are kept as strong edges. Pixels in between are kept only if connected to strong edges.
Note that the script passes the unblurred img to cv.Canny; passing blur instead would suppress edges
caused by noise.

Line 16–17: Dilation


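dilated = cv.dilate(canny, (7, 7), iterations=3)
cv.imshow('dilated', dilated)

Definition: Dilation is a morphological operation that expands the white regions (foreground) in a
binary image.
Kernel: The structuring element determines the neighborhood for expansion. cv.dilate expects it as an
array; the tuple (7, 7) is not converted into a 7×7 block of ones, so for a true 7×7 neighborhood pass
np.ones((7, 7), np.uint8) or cv.getStructuringElement(cv.MORPH_RECT, (7, 7)).
Iterations: Repeats the dilation three times.
Use Case: Thickens detected edges to fill gaps or connect components.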

Line 18–19: Erosion


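eroded = cv.erode(dilated, (3, 3), iterations=1)
cv.imshow('eroded', eroded)

Definition: Erosion shrinks the white regions. It is the counterpart of dilation rather than a strict
inverse: eroding a dilated image does not, in general, restore the original.
Use Case: Removes small specks of noise and reduces object size.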

Line 20–21: Resizing Using Cubic Interpolation
resized = cv.resize(img, (500, 500), interpolation=cv.INTER_CUBIC)
cv.imshow('resized', resized)
Definition: Resizing to a fixed dimension (500,500) using cubic interpolation.
Cubic Interpolation: Uses 16 neighboring pixels to compute each new pixel value — produces
smooth results.

Line 22–23: Cropping


cropped = img[50:400, 250:400]
cv.imshow('cropped', cropped)
Definition: Cropping is the process of extracting a sub-region from an image.
• img[y1:y2, x1:x2] selects rows 50 to 399 and columns 250 to 399 (the end indices are exclusive).
• Slicing does not alter the original image, but the result is a NumPy view that shares its memory;
call .copy() if an independent array is needed.
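Example (a sketch on synthetic data): The snippet below shows the shape produced by the crop and the difference between a slice (a view) and a copy. The all-black 400×600 image is an assumption made for illustration.

```python
import numpy as np

img = np.zeros((400, 600, 3), np.uint8)

cropped = img[50:400, 250:400]       # rows 50..399, columns 250..399
independent = cropped.copy()         # separate memory

# Writing into the view also changes the original image.
cropped[0, 0] = (255, 255, 255)
```

After the write, img[50, 250] is white as well, while independent is unchanged.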

Line 24: Holding Windows Open


cv.waitKey(0)
Definition: cv.waitKey() waits for a key event. Argument 0 means wait indefinitely.
Use Case: Prevents the image windows from closing immediately after display.

1.4 Conclusion of the Breakdown


Each function and operation in the above script serves a critical role in the image preprocessing pipeline.
The program covers an extensive range of fundamental operations (reading, resizing, converting,
filtering, edge detection, morphological transformations, and region extraction), each of which is
indispensable for real-world computer vision tasks such as:
• Object detection and recognition
• Image segmentation
• Preprocessing for machine learning and deep learning models
• Real-time surveillance and motion tracking
The modularity and clarity of this script make it an excellent starting point for any computer vision
practitioner or student.

2 Image Smoothing Techniques Using OpenCV: An In-Depth Line-by-Line Analysis and Conceptual Overview
2.1 Introduction
Image smoothing, also known as image blurring, is a fundamental technique in computer vision and image
processing. It involves reducing image noise and detail by averaging pixel values with their neighbors.
Smoothing serves as a preprocessing step for edge detection, feature extraction, and denoising.
This section comprehensively analyzes a Python script that demonstrates four major smoothing
techniques using OpenCV:
• Averaging (Box Filter)
• Gaussian Blur
• Median Blur
• Bilateral Filter
We define each concept, explain its role in image processing, and offer a line-by-line commentary on
the implementation.

2.2 Complete Python Script Context
import cv2 as cv
import numpy as np

img = cv.imread('Photo\\Varanasi.jpg')

def resacleFrame(frame, scale=0.15):
    width = frame.shape[1] * scale
    height = frame.shape[0] * scale
    dimension = (int(width), int(height))
    return cv.resize(frame, dimension, interpolation=cv.INTER_AREA)

img = resacleFrame(img)
cv.imshow('cv', img)

average = cv.blur(img, (3, 3))
cv.imshow('Average', average)

gaus_Avg = cv.GaussianBlur(img, (3, 3), 0)
cv.imshow('Gaussian Blur', gaus_Avg)

median = cv.medianBlur(img, 3)
cv.imshow('Median', median)

bilateral = cv.bilateralFilter(img, 5, 15, 15)
cv.imshow('Bilateral', bilateral)

cv.waitKey(0)

2.3 Detailed Explanation and Functionality of Each Line


Line 1–2: Importing Libraries
import cv2 as cv
import numpy as np

cv2: OpenCV library for image processing.


NumPy (np): Fundamental library for matrix operations; essential as OpenCV images are NumPy
arrays.

Line 3: Reading the Image


img = cv.imread('Photo\\Varanasi.jpg')

Reads the image from the path. The double backslash \\ is an escaped backslash inside the Python
string, so the path uses the Windows separator. The image is loaded as a NumPy array in BGR format.

Line 4–8: Image Rescaling Function


def resacleFrame(frame, scale=0.15):
    width = frame.shape[1] * scale
    height = frame.shape[0] * scale
    dimension = (int(width), int(height))
    return cv.resize(frame, dimension, interpolation=cv.INTER_AREA)

Purpose: Reduces image size to 15% of original to optimize display and performance.

• frame.shape[1] = width
• frame.shape[0] = height
• cv.resize(...) scales the image using INTER_AREA interpolation (ideal for shrinking).

Line 9: Apply Rescaling
img = resacleFrame(img)
Why scale down? Speeds up display and processing, useful in real-time applications or GUI
visualizations.

Line 10: Displaying Original Image


cv.imshow('cv', img)
Displays the original (rescaled) image with the label "cv".

2.4 Image Smoothing Techniques


We now analyze four distinct smoothing methods applied in the script. Each is unique in how it handles
neighboring pixel values and noise.

2.4.1 1. Averaging (Box Blur)


average = cv.blur(img, (3, 3))
cv.imshow('Average', average)
Definition: Averaging replaces each pixel’s value with the average of its surrounding pixels defined
by the kernel size.
Mathematical Operation:
\[
I'(x, y) = \frac{1}{mn} \sum_{i=-a}^{a} \sum_{j=-b}^{b} I(x + i,\; y + j)
\]
where m × n is the kernel size (e.g., 3 × 3), with m = 2a + 1 and n = 2b + 1.


Effect: Softens the image uniformly; blurs both noise and edges.
Use Case: Basic noise reduction.

2.4.2 2. Gaussian Blur


gaus_Avg = cv.GaussianBlur(img, (3, 3), 0)
cv.imshow('Gaussian Blur', gaus_Avg)
Definition: Gaussian blur uses a weighted average where closer pixels have more influence, based
on a Gaussian distribution.
Mathematical Form:
\[
G(x, y) = \frac{1}{2\pi\sigma^{2}} \, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}}
\]
Parameters:
• (3, 3): Kernel size.
• 0: sigmaX. A value of 0 tells OpenCV to compute the standard deviation from the kernel size.
Effect: More natural blur compared to averaging; for a given kernel size it smears edges less than a
box filter, although it still blurs them.
Use Case: Preprocessing before edge detection or for photographic effects.

2.4.3 3. Median Blur


median = cv.medianBlur(img, 3)
cv.imshow('Median', median)
Definition: Median filtering replaces each pixel with the median of neighboring pixels.
Why Median?
• Effective against salt-and-pepper noise.
• Better preserves edges than mean-based methods.
Kernel: 3 refers to a 3x3 neighborhood.
Use Case: Cleaning binary images, smoothing text, reducing salt-and-pepper artifacts.

2.4.4 4. Bilateral Filter
bilateral = cv.bilateralFilter(img, 5, 15, 15)
cv.imshow('Bilateral', bilateral)

Definition: Bilateral filtering smooths images while preserving edges using both:
• Spatial distance (how close pixels are)
• Intensity difference (how similar pixels are)

Parameters:
• 5: Diameter of pixel neighborhood.
• 15: sigmaColor. Larger values mean pixels with larger intensity differences will be mixed.
• 15: sigmaSpace. Larger values mean more distant pixels will influence the blur.
Effect: Removes noise while retaining sharp edges. Of the four methods it preserves edges best, and it
is typically the slowest.
Use Case: Medical imaging, cartoonizing, HDR photography, skin smoothing in portraits.

Final Step: Hold Display Windows Open


cv.waitKey(0)

Definition: Waits for a key event indefinitely to keep the image windows open.
Use Case: Prevents automatic closing of image display windows until user interaction.

2.5 Comparative Summary of Smoothing Techniques


| Method | Edge Preservation | Noise Reduction | Use Case |
| --- | --- | --- | --- |
| Averaging | Poor | Moderate | Basic smoothing |
| Gaussian Blur | Moderate | Good | Preprocessing for detection |
| Median Blur | Good | Excellent (salt-pepper) | Text, binary images |
| Bilateral Filter | Excellent | Excellent | Facial smoothing, HDR, medical |

2.6 Real-World Applications


• Autonomous Vehicles: Preprocess road scenes before edge detection.

• Medical Imaging: Denoise X-ray, MRI, or CT scan images.


• Surveillance: Enhance visibility in low-quality security footage.
• Face Detection: Smoothing images to improve face detection accuracy.

3 Color Space Conversions in OpenCV: A Comprehensive Theoretical and Practical Overview
3.1 Introduction
Color spaces define how color information is represented in an image. Although digital images are often
stored in the RGB (Red, Green, Blue) format, alternative color spaces can be better suited for specific
image processing tasks such as segmentation, enhancement, tracking, and analysis.
In this section, we explore a Python script that performs multiple color space transformations using
OpenCV. Each conversion is explained in depth with definitions, mathematical foundations, code
explanation, and real-world use cases.

3.2 Complete Python Script Overview
import cv2 as cv

img = cv.imread('Photo\\Varanasi.jpg')

def resacleFrame(frame, scale=0.15):
    width = frame.shape[1] * scale
    height = frame.shape[0] * scale
    dimension = (int(width), int(height))
    return cv.resize(frame, dimension, interpolation=cv.INTER_AREA)

img = resacleFrame(img)
cv.imshow('Varanasi', img)

gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
cv.imshow('gray', gray)

hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV)
cv.imshow('HSV', hsv)

lab = cv.cvtColor(img, cv.COLOR_BGR2LAB)
cv.imshow('LAB', lab)

# cv.imshow interprets arrays as BGR, so this window shows red and blue swapped
rgb = cv.cvtColor(img, cv.COLOR_BGR2RGB)
cv.imshow('RGB', rgb)

hsv_bgr = cv.cvtColor(hsv, cv.COLOR_HSV2BGR)
cv.imshow('HSV to BGR', hsv_bgr)

lab_bgr = cv.cvtColor(lab, cv.COLOR_LAB2BGR)
cv.imshow('LAB to BGR', lab_bgr)

cv.waitKey(0)

3.3 Line-by-Line Explanation and Theoretical Insights


Lines 1–2: Importing and Reading Image
import cv2 as cv
img = cv.imread('Photo\\Varanasi.jpg')

cv2: The OpenCV Python binding.


cv.imread(): Loads an image from disk in BGR format by default.
Note: OpenCV loads images in BGR, not RGB. This is important for correct color interpretation.

Lines 3–8: Rescaling Function


def resacleFrame(frame, scale=0.15):
    width = frame.shape[1] * scale
    height = frame.shape[0] * scale
    dimension = (int(width), int(height))
    return cv.resize(frame, dimension, interpolation=cv.INTER_AREA)

Reduces image size for faster processing and display.

cv.INTER_AREA: Recommended interpolation method for image shrinking.

Line 9: Rescaling the Image


img = resacleFrame(img)

Purpose: Reduces image dimensions to 15% of original size.

Line 10: Display Original Image
cv.imshow('Varanasi', img)

Shows the rescaled BGR image in a window titled "Varanasi".

3.4 Color Space Conversions


The next part of the script performs several transformations between color spaces.

3.4.1 1. BGR to Grayscale


gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
cv.imshow('gray', gray)

Definition: Grayscale images store luminance (brightness) only, reducing the image to one channel.
Mathematical Concept:
Y = 0.299R + 0.587G + 0.114B
Use Case: Ideal for edge detection, thresholding, and simplifying input for deep learning.

3.4.2 2. BGR to HSV (Hue, Saturation, Value)


hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV)
cv.imshow('HSV', hsv)

HSV:

• Hue (H): Color type (0–179 for 8-bit images in OpenCV, i.e. the angle in degrees divided by 2)
• Saturation (S): Vividness (0–255)
• Value (V): Brightness (0–255)
Why HSV? It separates color (H) from intensity (V), which is useful in:

• Color tracking
• Skin detection
• Lighting-invariant operations

3.4.3 3. BGR to LAB (CIE L*a*b*)


lab = cv.cvtColor(img, cv.COLOR_BGR2LAB)
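cv.imshow('LAB', lab)

CIE L*a*b*:
• L*: Lightness (0 is black, 100 is white; for 8-bit images OpenCV rescales this to 0–255)
• a*: Green-Red axis (stored with an offset of 128 for 8-bit images)
• b*: Blue-Yellow axis (stored with an offset of 128 for 8-bit images)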

Use Case: LAB is designed to be approximately perceptually uniform: equal numeric color differences
correspond roughly to equal perceived differences. It is often used for:
• Image enhancement
• Histogram equalization

• Professional photo editing

3.4.4 4. BGR to RGB
rgb = cv.cvtColor(img, cv.COLOR_BGR2RGB)
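cv.imshow('RGB', rgb)

Definition: RGB is the channel order expected by most image libraries and viewers, unlike OpenCV's BGR.
Because cv.imshow() interprets arrays as BGR, the 'RGB' window shows red and blue swapped.
Use Case: Displaying images using matplotlib, which assumes RGB.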

3.4.5 5. HSV to BGR


hsv_bgr = cv.cvtColor(hsv, cv.COLOR_HSV2BGR)
cv.imshow('HSV to BGR', hsv_bgr)

Conversion: Reconstructs the original image (approximately) from HSV.

Use Case: After processing in HSV (e.g., masking), conversion back to BGR is needed for
saving/display.

3.4.6 6. LAB to BGR


lab_bgr = cv.cvtColor(lab, cv.COLOR_LAB2BGR)
cv.imshow('LAB to BGR', lab_bgr)

Purpose: Converts LAB image back to BGR format after processing.


Caution: Due to rounding errors and color approximations, the reverse transformation may not be
pixel-perfect.

Final Step: Wait for Key Event


cv.waitKey(0)

Function: Keeps all GUI windows open until a key is pressed.

3.5 Summary of Color Spaces


| Color Space | Channels | Advantages | Applications |
| --- | --- | --- | --- |
| Grayscale | 1 | Simple, low memory | Edge detection, thresholding |
| HSV | 3 | Separates color/intensity | Color tracking, segmentation |
| LAB | 3 | Perceptually uniform | Image enhancement, color correction |
| RGB | 3 | Natural display order | Visualization, graphics |

3.6 Real-World Use Cases


• Self-Driving Cars: HSV for lane detection under varying light.

• Medical Imaging: LAB for contrast enhancement.


• Augmented Reality: RGB conversion for visualization.
• Photography: LAB and HSV for color manipulation in editing tools.

3.7 Conclusion
This script demonstrates the color space conversion capabilities of OpenCV. Each color model is suited
to specific tasks, and choosing the right one can noticeably improve the performance of your vision
system. Understanding these spaces is essential for image processing pipelines, especially in computer
vision, robotics, and graphics applications.

4 Geometric Transformations in OpenCV: Translation, Rotation, and Flipping
4.1 Introduction
Geometric transformations are fundamental operations in computer vision, used to alter the spatial
configuration of an image without changing its content. These include operations such as:
• Translation: Shifting the image along x and/or y axes.
• Rotation: Rotating the image about a defined point.
• Flipping: Mirroring the image across an axis.
Such transformations are essential in data augmentation, robot vision, image registration, and
graphical applications. This section analyzes a Python script that implements these transformations
using OpenCV.

4.2 Complete Script Overview


import cv2 as cv
import numpy as np
img = cv.imread('Photo/download.jpg')
cv.imshow('image', img)

Explanation:
• cv2 is the OpenCV module.
• numpy is used for numerical matrix operations.
• The image is read in BGR format using cv.imread() and displayed using cv.imshow().

4.3 Image Translation


4.3.1 Definition
Translation refers to shifting the image in the horizontal (x-axis) and vertical (y-axis) directions.
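Mathematical Form:
\[
\begin{bmatrix} x' \\ y' \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \end{bmatrix}
\cdot
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]
Where:
• t_x: shift in x-direction
• t_y: shift in y-direction
• x, y: original coordinates
• x', y': translated coordinates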

4.3.2 Code and Explanation


def Translate(img, x, y):
    transMat = np.float32([[1, 0, x], [0, 1, y]])
    dimension = (img.shape[1], img.shape[0])
    return cv.warpAffine(img, transMat, dimension)

translated = Translate(img, 100, -100)
# cv.imshow('translated', translated)

Function Description:

• transMat: The transformation matrix for shifting.
• img.shape[1] is width, img.shape[0] is height.
• cv.warpAffine() applies the affine transformation.

• Positive x shifts right and positive y shifts down, so with (100, -100) the image is shifted 100
pixels right and 100 pixels upward.
Use Case: Shifting images for data augmentation in machine learning, or aligning objects in robotics.

4.4 Image Rotation


4.4.1 Definition
Rotation involves rotating an image around a fixed point, usually the center of the image.
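Mathematical Form:
\[
\text{RotMatrix} =
\begin{bmatrix}
\cos\theta & -\sin\theta & (1 - \cos\theta)\,x_0 + \sin\theta\,y_0 \\
\sin\theta & \cos\theta & (1 - \cos\theta)\,y_0 - \sin\theta\,x_0
\end{bmatrix}
\]
where (x_0, y_0) is the center of rotation.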

4.4.2 Code and Explanation


def Rotate(img, angle, rotPoint=None):
    (h, w) = img.shape[:2]
    if rotPoint is None:
        rotPoint = (w // 2, h // 2)
    rotMat = cv.getRotationMatrix2D(rotPoint, angle, 1.0)
    dimension = (w, h)
    return cv.warpAffine(img, rotMat, dimension)

rotated = Rotate(img, -45)
# cv.imshow('rotated', rotated)

rot_rotated = Rotate(rotated, -45)
# cv.imshow('rotated twice', rot_rotated)

Function Breakdown:
• angle: Rotation angle in degrees (negative = clockwise).
• rotPoint: Center of rotation. Defaults to image center.
• cv.getRotationMatrix2D(): Returns a 2x3 affine rotation matrix.

• cv.warpAffine(): Applies the rotation.


Use Case:
• Aligning or re-orienting images.

• Simulating camera or object movement.


• Augmenting training data in classification tasks.

4.5 Image Flipping


4.5.1 Definition
Flipping reverses the pixels of an image either horizontally, vertically, or both.

4.5.2 Code and Modes
flip = cv.flip(img, 0)  # Vertical Flip
# cv.imshow('flipped', flip)

flip2 = cv.flip(img, 1)  # Horizontal Flip
# cv.imshow('flipped2', flip2)

flip3 = cv.flip(img, -1)  # Both Horizontal and Vertical
# cv.imshow('flipped3', flip3)

cv.flip() Axis Codes:

• 0: Flip vertically (around the x-axis; top and bottom swap).
• 1: Flip horizontally (around the y-axis; left and right swap).
• -1: Flip around both axes.
Use Case:

• Horizontal flip for facial recognition symmetry.


• Vertical flip for mirrored environments.
• Combined flip for geometric data augmentation.

4.6 Final Step: Holding Windows Open


cv.waitKey(0)

Purpose: Pauses the execution and keeps image windows open until a key is pressed.

4.7 Summary of Geometric Transformations

| Transformation | Function | Matrix Involved | Typical Use Cases |
| --- | --- | --- | --- |
| Translation | cv.warpAffine() | [[1, 0, x], [0, 1, y]] | Object tracking, registration |
| Rotation | cv.getRotationMatrix2D() | 2x3 rotation matrix | Image orientation, augmentation |
| Flipping | cv.flip() | N/A (built-in logic) | Data augmentation, symmetry correction |

4.8 Real-World Applications


• Augmented Reality: Reorient virtual objects in the user view.
• Medical Imaging: Standardize direction of X-rays or MRIs.

• Machine Learning: Geometric augmentations increase dataset diversity.


• Autonomous Vehicles: Adjust camera inputs to align with trajectory.

4.9 Conclusion
This script demonstrates how spatial transformations such as translation, rotation, and flipping can be
implemented in OpenCV. These operations are critical for real-time applications, data preprocessing,
and robust model training. Mastering them provides a strong foundation for more advanced computer
vision workflows.

5 Channel Splitting and Merging in OpenCV: Theory, Code, and Applications
5.1 Introduction
In digital image processing, color images are typically represented using multiple channels — each channel
encodes intensity values for a primary color component. In the case of the BGR format (used by
OpenCV), these components are:
• B: Blue Channel
• G: Green Channel

• R: Red Channel
Manipulating these channels independently enables various applications such as color filtering,
enhancement, and object detection based on specific spectral properties.
This section provides an in-depth explanation of how to split and merge color channels using OpenCV,
based on the provided Python script.

5.2 Complete Python Script Overview


import cv2 as cv
import numpy as np

img = cv.imread('Photo\\Varanasi.jpg')

def resacleFrame(frame, scale=0.15):
    width = frame.shape[1] * scale
    height = frame.shape[0] * scale
    dimension = (int(width), int(height))
    return cv.resize(frame, dimension, interpolation=cv.INTER_AREA)

img = resacleFrame(img)
cv.imshow('Varanasi', img)

# Single-channel black image used to zero out the other two channels
blank = np.zeros(img.shape[:2], dtype='uint8')

b, g, r = cv.split(img)

blue = cv.merge([b, blank, blank])
cv.imshow('Blue', blue)

green = cv.merge([blank, g, blank])
cv.imshow('Green', green)

red = cv.merge([blank, blank, r])
cv.imshow('Red', red)

cv.waitKey(0)

5.3 Line-by-Line Explanation and Theoretical Background


Lines 1–2: Importing Libraries
import cv2 as cv
import numpy as np

These are the standard imports for OpenCV and NumPy.

Line 3: Reading the Image
img = cv.imread('Photo\\Varanasi.jpg')
Loads the image in BGR format. The image is stored as a 3-dimensional NumPy array of shape
(height, width, 3).

Lines 4–8: Rescaling the Image


def resacleFrame(frame, scale=0.15):
    width = frame.shape[1] * scale
    height = frame.shape[0] * scale
    dimension = (int(width), int(height))
    return cv.resize(frame, dimension, interpolation=cv.INTER_AREA)
Purpose: Downscale the image to reduce memory and computation.
cv.INTER_AREA: Preferred interpolation method for reducing image size.

Line 9: Applying Rescaling


img = resacleFrame(img)

Line 10: Displaying the Rescaled Image


cv.imshow('Varanasi', img)
Shows the complete rescaled image for reference.

Line 11: Creating a Blank Image


blank = np.zeros(img.shape[:2], dtype='uint8')
Explanation:
• img.shape[:2] returns (height, width) — for a single-channel blank image.
• uint8 specifies pixel values from 0 to 255.
• Used as placeholder for zeroing unused color channels.

5.4 Channel Splitting


b, g, r = cv.split(img)
cv.split(): Decomposes a 3-channel BGR image into its individual blue, green, and red components.
After Splitting:
• b, g, and r are 2D arrays (grayscale images).
• Each pixel in these arrays corresponds to the intensity of that color in the original image.

5.5 Channel Merging and Visualization


To visualize each channel separately in color, we merge the target channel with two blank matrices.

5.5.1 Blue Channel Visualization


blue = cv.merge([b, blank, blank])
cv.imshow('Blue', blue)
Interpretation:
• b is retained in the blue channel.
• Green and red are zeroed out.
• Result: Pure blue intensities in the image.

5.5.2 Green Channel Visualization
green = cv.merge([blank, g, blank])
cv.imshow('Green', green)

5.5.3 Red Channel Visualization


red = cv.merge([blank, blank, r])
cv.imshow('Red', red)

5.6 Final Display Hold


cv.waitKey(0)
Prevents all OpenCV display windows from closing until a key is pressed.

5.7 Why Split and Merge Channels?


5.7.1 Applications of Splitting
• Feature Detection: Extract features from a specific color component.
• Masking: Apply masks only on the green or red channel.
• Analysis: Measure distribution and histogram of color intensities.

5.7.2 Applications of Merging


• Channel Manipulation: Modify brightness/contrast of one channel.
• Color Emphasis: Highlight or isolate certain colors in the image.
• Image Reconstruction: Combine edited channels to form the final image.

5.8 Matrix Insight


A pixel at row y and column x of the original image has (NumPy indexes rows first):
img[y, x] = [B, G, R]
After splitting:
b[y, x] = B, g[y, x] = G, r[y, x] = R
After merging with blank channels:
New Image[y, x] = [B, 0, 0] (for Blue Visualization)

5.9 Conclusion
Channel splitting and merging is a powerful low-level operation in image processing. It provides the
flexibility to perform channel-specific transformations, filtering, and analysis. Understanding this concept
is essential for tasks in color science, machine vision, and neural network preprocessing.

6 Contour Detection in OpenCV: Theory, Code Dissection, and Applications
6.1 Introduction
Contour detection is a fundamental operation in computer vision. A contour can be defined as a curve
joining all continuous points along a boundary which share the same color or intensity. In binary images,
this typically means tracing the edges of white regions.
This section provides a comprehensive, line-by-line analysis of a Python script that detects and
draws contours using OpenCV. Concepts covered include grayscale conversion, thresholding, binary
segmentation, contour retrieval modes, and drawing functions.

6.2 Complete Script Overview
import cv2 as cv
import numpy as np

img = cv.imread('Photo\\Varanasi.jpg')

def resacleFrame(frame, scale=0.25):
    width = frame.shape[1] * scale
    height = frame.shape[0] * scale
    dimension = (int(width), int(height))
    return cv.resize(frame, dimension, interpolation=cv.INTER_AREA)

img = resacleFrame(img)
cv.imshow('JPG', img)

# Black canvas with the same shape as the image, used for drawing contours
blank = np.zeros(img.shape, dtype='uint8')
cv.imshow('Blank', blank)

gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
cv.imshow('Gray', gray)

ret, thresh = cv.threshold(gray, 125, 255, cv.THRESH_BINARY)
cv.imshow('Thresh', thresh)

contours, hierarchies = cv.findContours(thresh, cv.RETR_LIST, cv.CHAIN_APPROX_NONE)
print(f'{len(contours)}')

cv.drawContours(blank, contours, -1, (0, 0, 255), 1)
cv.imshow('Contours', blank)

cv.waitKey(0)

6.3 Step-by-Step Code Explanation


Lines 1–3: Imports and Image Loading
import cv2 as cv
import numpy as np
img = cv.imread('Photo\\Varanasi.jpg')

• cv2: OpenCV library for image processing.


• numpy: For matrix and numerical operations.
• The image is loaded in BGR format.

Lines 4–9: Image Rescaling


def resacleFrame(frame, scale=0.25):
    ...
img = resacleFrame(img)

Scales the image to 25% of its original dimensions for better display and processing efficiency.

Line 10: Display Original Image


cv.imshow('JPG', img)

Line 11: Creating a Blank Image
blank = np.zeros(img.shape, dtype='uint8')
cv.imshow('Blank', blank)

Purpose: An empty canvas of the same shape as the original image, used to draw contours.

Line 12–13: Grayscale Conversion


gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
cv.imshow('Gray', gray)

Why Grayscale? Most contour detection techniques operate on single-channel images where pixel
intensities define object boundaries.

6.4 Thresholding for Binary Conversion


ret, thresh = cv.threshold(gray, 125, 255, cv.THRESH_BINARY)
cv.imshow('Thresh', thresh)

cv.threshold(): Converts grayscale to binary:

\[
\text{dst}(x, y) =
\begin{cases}
\text{maxVal}, & \text{if } \text{src}(x, y) > \text{thresh} \\
0, & \text{otherwise}
\end{cases}
\]

Parameters:
• gray: Source image
• 125: Threshold value
• 255: Maximum value assigned to pixels above threshold
• cv.THRESH_BINARY: Thresholding method


Use Case: Converts image to black and white, necessary for contour extraction.

6.5 Finding Contours


contours, hierarchies = cv.findContours(thresh, cv.RETR_LIST, cv.CHAIN_APPROX_NONE)

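Definition:
cv.findContours() retrieves contours from a binary image; non-zero pixels are treated as foreground.
Returns:
• contours: A sequence of contours, each a NumPy array of (x, y) boundary points
• hierarchies: Array describing the nesting relationships among contours
(OpenCV 3.x returns three values, with the image first; OpenCV 4.x returns the two shown here.)

Modes:
• cv.RETR_LIST: Retrieves all contours without establishing parent-child relationships.
• cv.RETR_TREE: Retrieves all contours and builds a full hierarchy tree.
• cv.RETR_EXTERNAL: Retrieves only the outermost contours.
Contour Approximation:
• cv.CHAIN_APPROX_NONE: Stores all contour points.
• cv.CHAIN_APPROX_SIMPLE: Compresses horizontal, vertical, and diagonal segments, keeping only their end points.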

6.6 Drawing Contours
cv.drawContours(blank, contours, -1, (0, 0, 255), 1)
cv.imshow('Contours', blank)

Parameters:
• blank: Destination canvas
• contours: List of contours
• -1: Draw all contours (pass a non-negative index to draw a single contour)
• (0, 0, 255): Red color in BGR
• 1: Thickness of the contour lines in pixels (a negative value fills the contour interior)

6.7 Alternative Method: Canny + Contours (commented)


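# blur = cv.GaussianBlur(gray, (5, 5), 0)
# canny = cv.Canny(blur, 125, 175)
# cv.imshow('Canny', canny)

Explanation:
• Blurring reduces noise before edge detection. The third argument of cv.GaussianBlur is the Gaussian sigma; 0 lets OpenCV derive it from the kernel size (a border-mode constant such as cv.BORDER_DEFAULT does not belong in that position).
• Canny edge detection produces a binary edge map.
• Contours can then be found by passing this edge map to cv.findContours in place of thresh.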

6.8 Summary of Key Functions


| Function | Purpose | Category |
|---|---|---|
| cv.cvtColor() | Convert BGR to grayscale | Preprocessing |
| cv.threshold() | Convert grayscale to binary | Segmentation |
| cv.findContours() | Extract contours from binary image | Feature Extraction |
| cv.drawContours() | Draw contours on an image | Visualization |

6.9 Real-World Applications


• Object Detection: Localize objects based on boundary shapes.
• Image Segmentation: Divide image into regions.

• Shape Matching: Compare contour structures for object classification.


• Medical Imaging: Outline tumors, tissues in X-rays/MRIs.
• Robotics: Detect obstacles or follow lines based on contours.

6.10 Conclusion
Contour detection is a foundational technique in image analysis. Through grayscale conversion and thresholding (or, alternatively, Canny edge detection), contours can be extracted and visualized. Understanding these functions builds a strong base for more advanced tasks in computer vision such as shape analysis, object tracking, and real-time robotic perception.

7 Bitwise Operations in OpenCV: Theory, Logic, and Visual Image Manipulation
7.1 Introduction
Bitwise operations apply Boolean logic bit by bit to pixel values. AND, OR, and XOR combine the corresponding pixels of two images of the same size and type, while NOT inverts the bits of a single image.
These operations are extremely useful in:
• Image masking
• Region of Interest (ROI) extraction
• Image blending
• Set-theoretic shape operations
In this section, we explore a Python script implementing all major bitwise operations using simple
geometric shapes: a rectangle and a circle.

7.2 Complete Script Overview


import cv2 as cv
import numpy as np

img = cv.imread('Photo\\Varanasi.jpg')

def resacleFrame(frame, scale=0.15):
    width = frame.shape[1] * scale
    height = frame.shape[0] * scale
    dimension = (int(width), int(height))
    return cv.resize(frame, dimension, interpolation=cv.INTER_AREA)

img = resacleFrame(img)

blank = np.zeros((400, 400), dtype='uint8')

rectangle = cv.rectangle(blank.copy(), (30, 30), (370, 370), 255, -1)
circle = cv.circle(blank.copy(), (200, 200), 200, 255, -1)

binand = cv.bitwise_and(rectangle, circle)
cv.imshow('Bitwise AND', binand)

binor = cv.bitwise_or(rectangle, circle)
cv.imshow('Bitwise OR', binor)

binxor = cv.bitwise_xor(rectangle, circle)
cv.imshow('Bitwise XOR', binxor)

binnot = cv.bitwise_not(rectangle)
cv.imshow('Bitwise NOT', binnot)

cv.waitKey(0)

7.3 Line-by-Line Code Explanation


Lines 1–3: Importing and Reading
import cv2 as cv
import numpy as np
img = cv.imread('Photo\\Varanasi.jpg')

• cv2: OpenCV library.
• numpy: Matrix operations.
• img: The photo is read and rescaled, but it takes no part in the bitwise operations below.

Lines 4–9: Rescaling Image


def resacleFrame(frame, scale=0.15): ...
img = resacleFrame(img)

Downscales image for preview or context — not used in logic processing here.

Line 10: Creating Blank Image


blank = np.zeros((400, 400), dtype='uint8')

Creates a 400 × 400 single-channel (grayscale) image filled with 0, i.e. a black background.

7.4 Drawing Shapes on Blank Canvas


rectangle = cv.rectangle(blank.copy(), (30, 30), (370, 370), 255, -1)
circle = cv.circle(blank.copy(), (200, 200), 200, 255, -1)

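• cv.rectangle(): Draws a filled white square from (30, 30) to (370, 370).
• cv.circle(): Draws a filled white circle of radius 200 centered at (200, 200), so it reaches the edges of the canvas.
• 255: White in an 8-bit grayscale image.
• -1: A negative thickness fills the shape.
• blank.copy(): The drawing functions modify their input in place, so each shape is drawn on a copy and the original blank image stays black.

Result: Two binary images (rectangle and circle) ready for bitwise operations.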

7.5 Bitwise AND


binand = cv.bitwise_and(rectangle, circle)
cv.imshow('Bitwise AND', binand)

Logical Operation:
Result(x, y) = Rectangle(x, y) ∧ Circle(x, y)
Interpretation: Only the intersection of the two shapes is white; the rest is black.

7.6 Bitwise OR
binor = cv.bitwise_or(rectangle, circle)
cv.imshow('Bitwise OR', binor)

Logical Operation:
Result(x, y) = Rectangle(x, y) ∨ Circle(x, y)
Interpretation: Union of the shapes is white; only where both are black is the result black.

7.7 Bitwise XOR


binxor = cv.bitwise_xor(rectangle, circle)
cv.imshow('Bitwise XOR', binxor)

Logical Operation:
Result(x, y) = Rectangle(x, y) ⊕ Circle(x, y)
Interpretation: Pixels covered by exactly one of the two shapes are white; the overlap and the shared background are black.

7.8 Bitwise NOT
binnot = cv.bitwise_not(rectangle)
cv.imshow('Bitwise NOT', binnot)

Logical Operation:
Result(x, y) = ¬Rectangle(x, y)
Interpretation: Inverts every pixel of the rectangle image: white (255) becomes black (0) and black becomes white.

7.9 Visual Comparison Summary


| Operation | Region Highlighted | Application |
|---|---|---|
| AND | Intersection | Mask intersection or overlap detection |
| OR | Union | Combining ROIs or masks |
| XOR | Non-overlapping parts | Detect change regions |
| NOT | Inversion | Mask inversion or background change |

7.10 Applications of Bitwise Operations


• Masking: Use AND to apply a mask over an image.
• Segmentation: Use XOR to isolate unique regions.
• ROI Manipulation: Use OR to combine multiple regions.
• Inversion Tasks: Use NOT to invert binary masks or background.

7.11 Conclusion
Bitwise operations are efficient, low-level operations that form the backbone of image masking, segmentation, and blending workflows. When combined with contour extraction or thresholding, these operations unlock powerful tools for image analysis, robotics vision, and medical imaging.

8 Image Masking in OpenCV: Theoretical and Practical Exploration
8.1 Introduction
Masking is a fundamental operation in image processing where certain regions of an image are selected
for processing while others are ignored. A mask is a binary matrix (black-and-white image) that acts as
a filter to specify which parts of the original image should be preserved or altered.
In OpenCV, masking is often implemented using bitwise operations in combination with binary masks.
This section explains a Python script that uses OpenCV to create a circular mask and apply it
to a resized image of Varanasi. The explanation includes visual logic, data structures, and pixel-level
implications of masking.

8.2 Complete Python Script Overview


import cv2 as cv
import numpy as np

img = cv.imread('Photo\\Varanasi.jpg')

def resacleFrame(frame, scale=0.15):
    width = frame.shape[1] * scale
    height = frame.shape[0] * scale
    dimension = (int(width), int(height))
    return cv.resize(frame, dimension, interpolation=cv.INTER_AREA)

img = resacleFrame(img)

blank = np.zeros(img.shape[:2], dtype='uint8')
cv.imshow('blank', blank)

mask = cv.circle(blank, (img.shape[1] // 2, img.shape[0] // 2), 100, 255, -1)
cv.imshow('mask', mask)

masked = cv.bitwise_and(img, img, mask=mask)
cv.imshow('masked', masked)

cv.waitKey(0)

8.3 Step-by-Step Code Explanation


Lines 1–3: Importing and Reading Image
import cv2 as cv
import numpy as np
img = cv.imread('Photo\\Varanasi.jpg')

• cv2: OpenCV for image processing.


• numpy: For matrix operations.
• img: Loaded in BGR format.

Lines 4–9: Rescaling Function and Execution


def resacleFrame(...):
...
img = resacleFrame(img)

Resizes the image to 15% of its original width and height using cv.INTER_AREA interpolation, which is well suited to downscaling.

Line 10: Creating a Blank Image


blank = np.zeros(img.shape[:2], dtype='uint8')

Explanation:
• img.shape[:2] gives (height, width), so the result is a single-channel (grayscale) image of the same size as img.
• All pixels are initialized to 0 (black).
Purpose: Acts as the canvas on which the circular mask is drawn.

Line 11: Displaying Blank Canvas


cv.imshow('blank', blank)

8.4 Creating the Mask


mask = cv.circle(blank, (img.shape[1] // 2, img.shape[0] // 2), 100, 255, -1)
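cv.imshow('mask', mask)

Explanation:
• Draws a filled white circle (255) at the center of the image on the blank canvas.
• Radius = 100 pixels.
• The result is a binary mask with a white circle on a black background.
• cv.circle draws in place, so blank itself now contains the circle; pass blank.copy() instead if the empty canvas is needed again.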

Mathematical Description:
\[
\mathrm{mask}(x, y) =
\begin{cases}
255, & \text{if } (x - c_x)^2 + (y - c_y)^2 < r^2 \\
0, & \text{otherwise}
\end{cases}
\]

Where (c_x, c_y) is the center and r is the radius.
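The formula can be evaluated directly with NumPy, without any drawing call. The sketch below builds the mask from the inequality for an assumed 300 × 400 image; the rasterized circle produced by cv.circle may differ from it by a pixel along the boundary.

```python
import numpy as np

h, w = 300, 400                    # assumed image height and width
cx, cy, r = w // 2, h // 2, 100    # center and radius as in the script

# Open coordinate grids: y runs down the rows, x across the columns
y, x = np.ogrid[:h, :w]

# 255 where the squared distance to the center is below r squared, else 0
inside = (x - cx) ** 2 + (y - cy) ** 2 < r ** 2
mask = np.where(inside, 255, 0).astype('uint8')
```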

8.5 Applying the Mask


masked = cv.bitwise_and(img, img, mask=mask)
cv.imshow('masked', masked)

Bitwise AND Operation with Mask


\[
\mathrm{Output}(x, y) =
\begin{cases}
\mathrm{img}(x, y), & \text{if } \mathrm{mask}(x, y) = 255 \\
0, & \text{otherwise}
\end{cases}
\]

• The original image is preserved where the mask is white.


• All other regions are turned to black.

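cv.bitwise_and(...):
• First argument: input image
• Second argument: the same image again; a pixel ANDed with itself is unchanged
• mask=mask: optional 8-bit single-channel mask; the operation is applied only where the mask is non-zero, and the remaining output pixels stay 0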

8.6 Visual Logic


• Left: Original image

• Middle: Binary mask (circle)


• Right: Output image with only circular region preserved

8.7 Applications of Masking


• Object Isolation: Select circular features like eyes, dials, fruits.
• ROI Extraction: Focus analysis only on region of interest.
• Blurring Specific Zones: Apply filters selectively.
• Medical Imaging: Highlight anatomical zones like tumors or vessels.

8.8 Conclusion
Masking is a vital technique in computer vision that provides spatial selectivity in image processing tasks.
Whether used for object isolation, attention-based filtering, or ROI-focused analytics, masks guide how
and where image operations are applied. This script serves as a minimal yet powerful example of how
to implement masks using NumPy arrays and OpenCV’s bitwise operations.

9 Image Histograms in OpenCV: A Deep Dive into Pixel Distribution and Analysis
9.1 Introduction
An image histogram is a graphical representation of the distribution of pixel intensities in a digital
image. It plots the number of pixels for each intensity value. Histograms are crucial for understanding
image contrast, brightness, dynamic range, and for preprocessing tasks like thresholding, equalization,
and segmentation.
This chapter dissects a script that computes and visualizes both grayscale and color histograms, with
and without masking, using OpenCV and matplotlib. The discussion covers every line in technical
detail and explains underlying principles with mathematical rigor.

9.2 Script Overview


import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
• cv2: OpenCV library for image manipulation.
• numpy: Matrix and numerical computations.
• matplotlib.pyplot: For histogram plotting.

Image Reading and Rescaling


img = cv.imread('Photo\\Varanasi.jpg')
def resacleFrame(frame, scale=0.25):
...
img = resacleFrame(img)
cv.imshow('JPG', img)
• The image is loaded in BGR format.
• Rescaled to 25% of its original size using cv.INTER_AREA.

Blank Canvas for Masking


blank = np.zeros((img.shape[:2]), dtype='uint8')
Creates a single-channel image of the same height and width as the input, filled with zeros (black).

9.3 Part A — Grayscale Histogram (Commented)


The following section, though commented out, is worth analyzing theoretically.

Grayscale Conversion
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
cv.imshow('gray', gray)
Converts the image from 3-channel BGR to a single-channel grayscale using luminance-weighted
average:
Y = 0.299 · R + 0.587 · G + 0.114 · B

Mask Creation for Grayscale


circle = cv.circle(blank, (img.shape[1]//2, img.shape[0]//2), 200, 255, -1)
mask = cv.bitwise_and(gray, gray, mask=circle)
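• Draws a filled white circle of radius 200 pixels centered on the image; this single-channel array, circle, is the binary mask.
• cv.bitwise_and then applies it to gray, so the array named mask is the masked grayscale image: original intensities inside the circle and 0 outside.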

Grayscale Histogram Computation
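gray_hist = cv.calcHist([gray], [0], circle, [256], [0, 256])

Explanation:
• First arg: List of images.
• Second arg: Channel index (0 for grayscale).
• Third arg: Binary mask; only pixels where it is non-zero are counted. The binary circle is passed rather than the masked image, because the masked image is 0 wherever a pixel inside the circle is itself black, and such pixels would be left out of the count.
• Fourth arg: Number of bins (256 for an 8-bit image).
• Fifth arg: Intensity range [0, 256); the upper bound is exclusive, so values 0 to 255 are covered.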
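Mathematical Interpretation: Let I(x, y) be the pixel intensity and M(x, y) the mask value at pixel (x, y), for an image of width W and height H (the H in the summation limit is the height, not the histogram):
\[
H(i) = \sum_{x=0}^{W-1} \sum_{y=0}^{H-1} \delta\big(I(x, y) = i\big) \cdot \delta\big(M(x, y) = 255\big)
\]
Here δ(·) equals 1 when its condition holds and 0 otherwise.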

Grayscale Histogram Plotting


plt.figure()
plt.title('Grayscale Histogram')
plt.xlabel('bins')
plt.ylabel('# of pixels')
plt.plot(gray_hist)
plt.xlim([0, 256])
plt.show()

9.4 Part B — Color Histogram (Active)


Circular Mask for Color Histogram
mask = cv.circle(blank, (img.shape[1]//2, img.shape[0]//2), 100, 255, -1)
masked = cv.bitwise_and(img, img, mask=mask)
cv.imshow('Mask', masked)

• Creates a circular mask with radius 100.
• Applies it to the 3-channel image: each of the B, G, and R channels is kept inside the circle and set to 0 outside.

Color Histogram Calculation


colors = ('b', 'g', 'r')
for i, col in enumerate(colors):
    hist = cv.calcHist([img], [i], None, [256], [0, 256])
    plt.plot(hist, color=col)
...

• Iterates through the channels: blue (0), green (1), red (2).
• Computes the histogram for each channel.
• A mask of None means the histogram covers the entire image; passing mask instead would restrict the count to the circular region.

Histogram Plotting for Each Color Channel


plt.title('Color Histogram')
plt.xlabel('bins')
plt.ylabel('# of pixels')
plt.xlim([0, 256])
plt.show()

9.5 Histogram Interpretation and Analysis
• X-axis (bins): Intensity values from 0 (black) to 255 (white).
• Y-axis: Count of pixels for each intensity.
• Peak at high intensities: Brighter image.
• Spread across range: High contrast.
• Narrow spike: Low contrast or under/overexposure.

9.6 Mathematical Foundation: Histogram Function


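\[
H_c(i) = \sum_{x=0}^{W-1} \sum_{y=0}^{H-1} \delta\big(I_c(x, y) = i\big)
\]

Where:
• c ∈ {B, G, R} is the color channel.
• i ∈ [0, 255] is the bin index.
• W and H in the summation limits are the image width and height.
• δ(·) is an indicator function (1 if the condition is true, else 0).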

9.7 Use Cases of Histograms


• Contrast Enhancement: Detect if histogram is clustered at one end.
• Equalization: Flatten the histogram to improve visibility.
• Image Comparison: Use histogram correlation as similarity metric.
• Segmentation: Intensity-based region separation.
• Camera Feedback: Auto exposure and lighting adjustment.

9.8 Conclusion
Histograms are essential for analyzing the tonal and color distribution of images. Through OpenCV’s
calcHist and Python’s matplotlib, we can visualize and interpret image data at a statistical level.
Whether working on segmentation, enhancement, or machine learning preprocessing, histograms provide
powerful insight into the underlying pixel structure of images.
This script illustrates both grayscale and color histogram construction, as well as spatially restricted
analysis using masks—offering a practical toolbox for any researcher in computer vision or digital image
processing.

10 Thresholding in OpenCV: Fixed and Adaptive Methods with Binary Segmentation
10.1 Introduction
Thresholding is a fundamental technique in image processing used to segment images by converting
grayscale images into binary images. In its simplest form, thresholding sets all pixels above a certain
intensity to one value (usually white) and all below to another (usually black).
This section deeply explores fixed (global) and adaptive thresholding using OpenCV. These are often
used in:
• Document scanning (binarization),
• Object segmentation,
• Optical character recognition (OCR),
• Industrial quality inspection.

10.2 Script Overview
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
img = cv.imread('Photo\\Varanasi.jpg')

• cv2: OpenCV for image processing.


• numpy: For image shape and numerical computations.
• matplotlib.pyplot: Optional (not used in this particular script).

Image Rescaling
def resacleFrame(frame, scale=0.25):
...
img = resacleFrame(img)

Purpose: Reduce the image to 25% of its original size for faster processing and display. cv.INTER_AREA is used for downsampling.

Original Image and Grayscale Conversion


cv.imshow('JPG', img)
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
cv.imshow('gray', gray)

• Converts the 3-channel BGR image to a 1-channel grayscale image.
• A single-channel input gives one intensity per pixel to compare against the threshold, and cv.adaptiveThreshold accepts only 8-bit single-channel images.

10.3 Binary Thresholding


threshold, thresh = cv.threshold(gray, 150, 255, cv.THRESH_BINARY)
cv.imshow('thresh', thresh)

Definition:
cv.threshold() applies a global fixed threshold:
\[
I'(x, y) =
\begin{cases}
\mathrm{maxVal} = 255, & \text{if } I(x, y) > 150 \\
0, & \text{otherwise}
\end{cases}
\]

• gray: input image
• 150: threshold value
• 255: maximum value (white)
• cv.THRESH_BINARY: operation mode

Interpretation: Segments the image into two parts based on intensity; this works well when the lighting is uniform.

10.4 Inverse Binary Thresholding


threshold, thresh_inv = cv.threshold(gray, 150, 255, cv.THRESH_BINARY_INV)
cv.imshow('thresh_inv', thresh_inv)

Definition:
\[
I'(x, y) =
\begin{cases}
0, & \text{if } I(x, y) > 150 \\
\mathrm{maxVal} = 255, & \text{otherwise}
\end{cases}
\]
Use Case: Useful when the foreground is darker than the background (e.g., dark text on white
paper).

10.5 Adaptive Thresholding


adaptive_thresh = cv.adaptiveThreshold(gray, 255, cv.ADAPTIVE_THRESH_MEAN_C,
                                       cv.THRESH_BINARY, 21, 3)
cv.imshow('adaptive', adaptive_thresh)

Concept:
Fixed thresholding fails under non-uniform lighting. Adaptive thresholding calculates the threshold value
for a pixel based on a small neighborhood around it.

Mathematical Formulation:
\[
I'(x, y) =
\begin{cases}
255, & \text{if } I(x, y) > T(x, y) \\
0, & \text{otherwise}
\end{cases}
\]
Where T(x, y) is the mean (or Gaussian-weighted mean) of the pixel intensities in the blockSize × blockSize window around (x, y), minus the constant C.

Parameters Explained:
• gray: Input image.
• 255: Maximum value.
• cv.ADAPTIVE_THRESH_MEAN_C: Uses the mean of the block.
• cv.THRESH_BINARY: Binary threshold.
• 21: Block size (must be odd and greater than 1).
• 3: Constant C subtracted from the mean.

Modes:
• cv.ADAPTIVE_THRESH_MEAN_C: Mean of the neighborhood.
• cv.ADAPTIVE_THRESH_GAUSSIAN_C: Gaussian-weighted mean of the neighborhood.

10.6 Visual Comparisons


• Fixed Binary: Sharp cutoff at 150, sensitive to lighting.
• Inverse Binary: Inverts result; useful for dark-on-light features.
• Adaptive: Locally adjusted, ideal for scanned documents or scenes with shadows.

10.7 Applications of Thresholding Techniques


• Document Scanning and OCR: Adaptive thresholding improves text clarity.
• Edge Detection Preprocessing: Binary images simplify edge analysis.
• Medical Imaging: Segment tumors or tissues.
• License Plate Recognition: Helps isolate characters.
• Fingerprint Recognition: Prepares binary ridge maps.

10.8 Conclusion
Thresholding transforms grayscale images into binary masks that are crucial for further analysis. While fixed thresholding is computationally cheaper and sufficient under consistent lighting, adaptive thresholding is significantly more robust in real-world scenarios with variable illumination. Both techniques form the foundation for many high-level vision tasks in industry and research.
