OpenCV Notes
import cv2 as cv
img = cv.imread('Photo/Varanasi.jpg')
img = resacleFrame(img)
cv.imshow('images', img)
cv.imshow('cropped', cropped)
cv.waitKey(0)
Line 10–11: Grayscale Conversion
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
cv.imshow('Gray', gray)
Definition: Gaussian blur is a smoothing technique using a Gaussian function. It reduces noise and
detail.
Parameters:
• Kernel size (19, 19): The larger the kernel, the smoother the result.
• cv.BORDER_DEFAULT: Pads the borders during convolution.
Use Case: Applied before edge detection to avoid false edges from noise.
Definition: Canny edge detection is a multi-stage algorithm to detect a wide range of edges.
How It Works:
1. Apply Gaussian Blur
2. Compute Gradient Magnitude and Direction
3. Non-Maximum Suppression
4. Hysteresis Thresholding with two thresholds
Thresholds (125, 175): Pixels with gradient magnitude below 125 are discarded. Pixels above 175 are
kept as strong edges. Pixels in between are kept only if connected to strong edges.
Definition: Dilation is a morphological operation that expands the white regions (foreground) in a
binary image.
Kernel: The (7,7) matrix determines the neighborhood for expansion.
Iterations: Repeats the dilation three times.
Use Case: Thickens detected edges to fill gaps or connect components.
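Definition: Erosion shrinks the white (foreground) regions of a binary image. It is the counterpart of dilation, though not an exact inverse: eroding a dilated image does not generally restore the original.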
Use Case: Removes noise and reduces object size.
Line 20–21: Resizing Using Cubic Interpolation
resized = cv.resize(img, (500, 500), interpolation=cv.INTER_CUBIC)
cv.imshow('resized', resized)
Definition: Resizing to a fixed dimension (500,500) using cubic interpolation.
Cubic Interpolation: Uses 16 neighboring pixels to compute each new pixel value — produces
smooth results.
2.2 Complete Python Script Context
import cv2 as cv
import numpy as np
img = cv.imread('Photo\\Varanasi.jpg')
img = resacleFrame(img)
cv.imshow('cv', img)
median = cv.medianBlur(img, 3)
cv.imshow('Median', median)
cv.waitKey(0)
Purpose: Reduces image size to 15% of original to optimize display and performance.
• frame.shape[1] = width
• frame.shape[0] = height
• cv.resize(...) scales the image using INTER_AREA interpolation (ideal for shrinking).
Line 9: Apply Rescaling
img = resacleFrame(img)
Why scale down? Speeds up display and processing, useful in real-time applications or GUI
visualizations.
2.4.4 4. Bilateral Filter
bilateral = cv.bilateralFilter(img, 5, 15, 15)
cv.imshow('Bilateral', bilateral)
Definition: Bilateral filtering smooths images while preserving edges by weighting each neighbor using both:
• Spatial distance (how close pixels are)
• Intensity difference (how similar pixel values are)
Parameters:
• 5: Diameter of the pixel neighborhood.
• 15: σcolor: Larger values mean pixels with larger intensity differences will be mixed.
• 15: σspace: Larger values mean more distant pixels will influence the blur.
Effect: Removes noise while retaining sharp edges. It is designed specifically for edge-preserving smoothing, and is typically slower than Gaussian blur.
Use Case: Medical imaging, cartoonizing, HDR photography, skin smoothing in portraits.
Definition: Waits for a key event indefinitely to keep the image windows open.
Use Case: Prevents automatic closing of image display windows until user interaction.
3.2 Complete Python Script Overview
import cv2 as cv
img = cv.imread('Photo\\Varanasi.jpg')
img = resacleFrame(img)
cv.imshow('Varanasi', img)
cv.waitKey(0)
Line 10: Display Original Image
cv.imshow('Varanasi', img)
Definition: Grayscale images store luminance (brightness) only, reducing the image to one channel.
Mathematical Concept:
Y = 0.299R + 0.587G + 0.114B
Use Case: Ideal for edge detection, thresholding, and simplifying input for deep learning.
HSV:
• Color tracking
• Skin detection
• Lighting-invariant operations
CIE L*a*b*:
• L*: Lightness (0 is black, 100 is white)
• a*: Green-Red axis
• b*: Blue-Yellow axis
Use Case: LAB is designed to be approximately perceptually uniform: numerical color differences
roughly match perceived differences. It is often used for:
• Image enhancement
• Histogram equalization
3.4.4 4. BGR to RGB
rgb = cv.cvtColor(img, cv.COLOR_BGR2RGB)
cv.imshow('RGB', rgb)
Definition: RGB is the standard format for most image viewers, unlike OpenCV’s BGR.
Use Case: Displaying images using matplotlib, which assumes RGB.
3.7 Conclusion
This script demonstrates OpenCV's color space conversion capabilities. Each color model suits
specific tasks, and choosing the right one can substantially improve the performance of a vision
system. Understanding these spaces is important for image processing pipelines in computer vision,
robotics, and graphics applications.
4 Geometric Transformations in OpenCV: Translation, Rotation, and Flipping
4.1 Introduction
Geometric transformations are fundamental operations in computer vision, used to alter the spatial
configuration of an image without changing its content. These include operations such as:
• Translation: Shifting the image along x and/or y axes.
• Rotation: Rotating the image about a defined point.
• Flipping: Mirroring the image across an axis.
Such transformations are essential in data augmentation, robot vision, image registration, and
graphical applications. This section analyzes a Python script that implements these transformations
using OpenCV.
Explanation:
• cv2 is the OpenCV module.
• numpy is used for numerical matrix operations.
• The image is read in BGR format using cv.imread() and displayed using cv.imshow().
Function Description:
• transMat: The transformation matrix for shifting.
• img.shape[1] is width, img.shape[0] is height.
• cv.warpAffine() applies the affine transformation.
• The image is shifted 100 pixels right and 100 pixels upward.
Use Case: Shifting images for data augmentation in machine learning, or aligning objects in robotics.
Function Breakdown:
• angle: Rotation angle in degrees (negative = clockwise).
• rotPoint: Center of rotation. Defaults to image center.
• cv.getRotationMatrix2D(): Returns a 2x3 affine rotation matrix.
4.5.2 Code and Modes
flip = cv.flip(img, 0) # Vertical Flip
# cv.imshow('flipped', flip)
• 0: Flip vertically.
• 1: Flip horizontally.
• -1: Flip both axes.
Use Case:
Purpose: Pauses the execution and keeps image windows open until a key is pressed.
4.9 Conclusion
This script demonstrates how spatial transformations such as translation, rotation, and flipping can be
implemented in OpenCV. These operations are critical for real-time applications, data preprocessing,
and robust model training. Mastering them provides a strong foundation for more advanced computer
vision workflows.
5 Channel Splitting and Merging in OpenCV: Theory, Code,
and Applications
5.1 Introduction
In digital image processing, color images are typically represented using multiple channels — each channel
encodes intensity values for a primary color component. In the case of the BGR format (used by
OpenCV), these components are:
• B: Blue Channel
• G: Green Channel
• R: Red Channel
Manipulating these channels independently enables various applications such as color filtering,
enhancement, and object detection based on specific spectral properties.
This section provides an in-depth explanation of how to split and merge color channels using OpenCV,
based on the provided Python script.
img = resacleFrame(img)
cv.imshow('Varanasi', img)
cv.waitKey(0)
Line 3: Reading the Image
img = cv.imread('Photo\\Varanasi.jpg')
Loads the image in BGR format. The image is stored as a 3-dimensional NumPy array of shape
(height, width, 3).
5.5.2 Green Channel Visualization
green = cv.merge([blank, g, blank])
cv.imshow('Green', green)
5.9 Conclusion
Channel splitting and merging is a powerful low-level operation in image processing. It provides the
flexibility to perform channel-specific transformations, filtering, and analysis. Understanding this concept
is essential for tasks in color science, machine vision, and neural network preprocessing.
6.2 Complete Script Overview
import cv2 as cv
import numpy as np
img = cv.imread('Photo\\Varanasi.jpg')
img = resacleFrame(img)
cv.imshow('JPG', img)
cv.waitKey(0)
Scales the image to 25% of its original dimensions for better display and processing efficiency.
Line 11: Creating a Blank Image
blank = np.zeros(img.shape, dtype='uint8')
cv.imshow('Blank', blank)
Purpose: An empty canvas of the same shape as the original image, used to draw contours.
Why Grayscale? Most contour detection techniques operate on single-channel images where pixel
intensities define object boundaries.
Parameters:
• gray: Source image
• 125: Threshold value
• 255: Maximum value assigned to pixels above threshold
Definition:
cv.findContours() retrieves contours from a binary image.
Returns:
• contours: List of contours, each an array of (x, y) boundary points
Retrieval Modes:
• cv.RETR_TREE: Retrieves all contours and builds a full hierarchy tree.
• cv.RETR_EXTERNAL: Retrieves only the outermost contours.
Contour Approximation:
• cv.CHAIN_APPROX_NONE: Stores all contour points.
6.6 Drawing Contours
cv.drawContours(blank, contours, -1, (0, 0, 255), 1)
cv.imshow('Contours', blank)
Parameters:
• blank: Destination canvas
Explanation:
• Blurring reduces noise before edge detection.
• Canny edge detection produces binary edges.
• Contours can be found using these edges as input.
6.10 Conclusion
Contour detection is a foundational technique in image analysis. Through operations like grayscale
conversion, thresholding, and morphological processing, contours can be reliably extracted and visualized.
Understanding these functions builds a strong base for more advanced tasks in computer vision such as
shape analysis, object tracking, and real-time robotic perception.
7 Bitwise Operations in OpenCV: Theory, Logic, and Visual
Image Manipulation
7.1 Introduction
Bitwise operations are logical manipulations applied at the binary level between two images. Each pixel
in the resulting image is computed by applying binary logic (AND, OR, XOR, NOT) to the corresponding
pixels in the input images.
These operations are extremely useful in:
• Image masking
• Region of Interest (ROI) extraction
• Image blending
• Set-theoretic shape operations
In this section, we explore a Python script implementing all major bitwise operations using simple
geometric shapes: a rectangle and a circle.
img = cv.imread('Photo\\Varanasi.jpg')
img = resacleFrame(img)
binnot = cv.bitwise_not(rectangle)
cv.imshow('Bitwise NOT', binnot)
cv.waitKey(0)
• cv2: OpenCV library.
• numpy: Matrix operations.
• img: The actual image isn’t used for logic, but is read and scaled.
Downscales image for preview or context — not used in logic processing here.
Result: Two binary images (rectangle and circle) ready for bitwise operations.
Logical Operation:
Result(x, y) = Rectangle(x, y) ∧ Circle(x, y)
Interpretation: Only the intersection of the two shapes is white; rest is black.
7.6 Bitwise OR
binor = cv.bitwise_or(rectangle, circle)
cv.imshow('Bitwise OR', binor)
Logical Operation:
Result(x, y) = Rectangle(x, y) ∨ Circle(x, y)
Interpretation: Union of the shapes is white; only where both are black is the result black.
Logical Operation:
Result(x, y) = Rectangle(x, y) ⊕ Circle(x, y)
Interpretation: Only regions where the shapes do not overlap are white.
7.8 Bitwise NOT
binnot = cv.bitwise_not(rectangle)
cv.imshow('Bitwise NOT', binnot)
Logical Operation:
Result(x, y) = ¬Rectangle(x, y)
Interpretation: Inverts all pixels — white becomes black, black becomes white.
7.11 Conclusion
Bitwise operations are efficient, low-level operations that form the backbone of image masking,
segmentation, and blending workflows. When combined with contour extraction or thresholding, these
operations unlock powerful tools for image analysis, robotics vision, and medical imaging.
img = cv.imread('Photo\\Varanasi.jpg')
img = resacleFrame(img)
cv.waitKey(0)
Resizes the image to 15% of its original dimensions using cv.INTER_AREA, which is effective for
downscaling.
Explanation:
• img.shape[:2] gives (height, width) — creating a single-channel (grayscale) image.
• All pixels initialized to 0 (black).
Purpose: Acts as the canvas on which the circular mask is drawn.
Explanation:
• Draws a filled white circle (255) at the center of the image on the blank canvas.
• Radius = 100 pixels.
• The result is a binary mask with a white circle on black background.
Mathematical Description:
\[
\text{mask}(x, y) =
\begin{cases}
255 & \text{if } (x - c_x)^2 + (y - c_y)^2 < r^2 \\
0 & \text{otherwise}
\end{cases}
\]
cv.bitwise_and(...):
• First argument: input image
• Second argument: same image (bitwise operation on itself)
• mask=mask: optional mask to restrict effect
8.8 Conclusion
Masking is a vital technique in computer vision that provides spatial selectivity in image processing tasks.
Whether used for object isolation, attention-based filtering, or ROI-focused analytics, masks guide how
and where image operations are applied. This script serves as a minimal yet powerful example of how
to implement masks using NumPy arrays and OpenCV’s bitwise operations.
9 Image Histograms in OpenCV: A Deep Dive into Pixel Distribution and Analysis
9.1 Introduction
An image histogram is a graphical representation of the distribution of pixel intensities in a digital
image. It plots the number of pixels for each intensity value. Histograms are crucial for understanding
image contrast, brightness, dynamic range, and for preprocessing tasks like thresholding, equalization,
and segmentation.
This chapter dissects a script that computes and visualizes both grayscale and color histograms, with
and without masking, using OpenCV and matplotlib. The discussion covers every line in technical
detail and explains underlying principles with mathematical rigor.
Grayscale Conversion
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
cv.imshow('gray', gray)
Converts the image from 3-channel BGR to a single-channel grayscale using luminance-weighted
average:
Y = 0.299 · R + 0.587 · G + 0.114 · B
Grayscale Histogram Computation
gray_hist = cv.calcHist([gray], [0], mask, [256], [0, 256])
Explanation:
• First arg: List of images.
• Second arg: Channel index (0 for grayscale).
• Third arg: Binary mask.
• Fourth arg: Number of bins (256 for 8-bit image).
• Fifth arg: Intensity range [0, 256]; the upper bound is exclusive, so values 0 to 255 are counted.
Mathematical Interpretation: Let I(x, y) be pixel intensities and M (x, y) the mask:
\[
H(i) = \sum_{x=0}^{W} \sum_{y=0}^{H} \delta(I(x, y) = i) \cdot \delta(M(x, y) = 255)
\]
9.5 Histogram Interpretation and Analysis
• X-axis (bins): Intensity values from 0 (black) to 255 (white).
• Y-axis: Count of pixels for each intensity.
• Peak at high intensities: Brighter image.
• Spread across range: High contrast.
• Narrow spike: Low contrast or under/overexposure.
Where:
• c ∈ {B, G, R} is the color channel.
• i ∈ [0, 255] is the bin index.
• δ(·) is the Kronecker delta (1 if true, else 0).
9.8 Conclusion
Histograms are essential for analyzing the tonal and color distribution of images. Through OpenCV’s
calcHist and Python’s matplotlib, we can visualize and interpret image data at a statistical level.
Whether working on segmentation, enhancement, or machine learning preprocessing, histograms provide
powerful insight into the underlying pixel structure of images.
This script illustrates both grayscale and color histogram construction, as well as spatially restricted
analysis using masks—offering a practical toolbox for any researcher in computer vision or digital image
processing.
10.2 Script Overview
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
img = cv.imread('Photo\\Varanasi.jpg')
Image Rescaling
def resacleFrame(frame, scale=0.25):
...
img = resacleFrame(img)
Purpose: Reduce the size of the image to 25% for faster processing and display. cv.INTER_AREA is
used for downsampling.
Definition:
cv.threshold() applies a global fixed threshold:
\[
I'(x, y) =
\begin{cases}
\text{maxVal} = 255 & \text{if } I(x, y) > 150 \\
0 & \text{otherwise}
\end{cases}
\]
Interpretation: Segments the image into two parts based on intensity — good when lighting is
uniform.
Definition:
\[
I'(x, y) =
\begin{cases}
0 & \text{if } I(x, y) > 150 \\
\text{maxVal} = 255 & \text{otherwise}
\end{cases}
\]
Use Case: Useful when the foreground is darker than the background (e.g., dark text on white
paper).
Concept:
Fixed thresholding fails under non-uniform lighting. Adaptive thresholding calculates the threshold value
for a pixel based on a small neighborhood around it.
Mathematical Formulation:
\[
I'(x, y) =
\begin{cases}
255 & \text{if } I(x, y) > T(x, y) \\
0 & \text{otherwise}
\end{cases}
\]
Where T(x, y) is the mean or Gaussian-weighted mean of the neighboring pixel intensities in a window (block size), minus a constant C.
Parameters Explained:
• gray: Input image.
• 255: Maximum value.
• cv.ADAPTIVE_THRESH_MEAN_C: Uses mean of block.
• cv.THRESH_BINARY: Binary threshold.
• 21: Block size (must be odd).
• 3: Constant subtracted from the mean.
Modes:
• cv.ADAPTIVE_THRESH_MEAN_C: Mean of neighborhood.
• cv.ADAPTIVE_THRESH_GAUSSIAN_C: Weighted Gaussian mean.
10.8 Conclusion
Thresholding transforms grayscale images into binary masks that are crucial for further analysis. While
fixed thresholding is computationally cheaper and sufficient under consistent lighting, adaptive
thresholding is significantly more robust in real-world scenarios with variable illumination. Both techniques
form the foundation for many high-level vision tasks in industry and research.