End Sem

1. Object Detection using Sliding Window and Region Proposal

Sliding Window Technique:

 A fixed-size window slides across the image (horizontally and vertically).

 At each location, the sub-region is passed to a classifier (e.g., SVM, CNN) to determine
whether it contains the object.

 Drawbacks: high computational cost, slow runtime, and poor handling of scale changes (see the sketch below).
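A minimal sketch of the window loop in Python; `clf` is a hypothetical classifier that scores each patch:

def sliding_windows(img, win=(64, 64), step=16):
    """Yield (x, y, patch) for every window position in the image."""
    h, w = img.shape[:2]
    for y in range(0, h - win[1] + 1, step):
        for x in range(0, w - win[0] + 1, step):
            yield x, y, img[y:y + win[1], x:x + win[0]]

# Each patch would be passed to a classifier (SVM, CNN, ...):
# for x, y, patch in sliding_windows(image):
#     score = clf.predict(patch)   # clf is a hypothetical classifier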

Region Proposal Methods:

 Instead of exhaustive sliding, these methods propose regions likely to contain objects.

 Selective Search: Groups similar regions based on color, texture, size.

 Edge Boxes: Generates boxes based on edge information.

 These proposals are then passed to CNNs for classification and bounding box regression.
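For illustration, OpenCV's contrib package exposes Selective Search; a sketch assuming opencv-contrib-python is installed and a hypothetical input file "scene.jpg":

import cv2

img = cv2.imread("scene.jpg")                    # hypothetical input file
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()                 # speed/recall trade-off
rects = ss.process()                             # (x, y, w, h) region proposals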

Viola-Jones for Face Detection:

 Uses Haar-like features computed with an integral image.

 A cascade of classifiers quickly eliminates non-face regions.

 Adaboost is used to select the best features.
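OpenCV ships pretrained Haar cascades, so Viola-Jones detection is a few lines; a sketch (input file "photo.jpg" is hypothetical):

import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
img = cv2.imread("photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# The cascade quickly rejects non-face windows at multiple scales
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)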

Deep Learning Models for Face Detection:

 MTCNN, RetinaFace, and others use CNNs to detect faces at different scales.

 Provide higher accuracy and robustness to pose, illumination, and occlusion variations.

2. Comparison of YOLO, SSD, and Faster R-CNN; Harris and Shi-Tomasi Corner Detection

Feature    YOLO            SSD              Faster R-CNN
Type       Single-shot     Single-shot      Two-stage
Speed      Very fast       Fast             Slower
Accuracy   Moderate-high   Moderate-high    High
Pipeline   Unified CNN     Multi-scale CNN  RPN + Detection

Harris Corner Detector:

 Measures intensity change in all directions.

 Uses the second-moment matrix.

 Corner response: R = det(M) − k·(trace(M))²

Shi-Tomasi:

 Improves Harris by using the minimum eigenvalue of the matrix M.


 A point is a good corner if the smallest eigenvalue is above a threshold.
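Both detectors are available in OpenCV; a sketch assuming a hypothetical grayscale input "scene.jpg":

import cv2
import numpy as np

gray = np.float32(cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE))

# Harris response R = det(M) - k*(trace(M))^2 with k = 0.04
harris = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
corners = harris > 0.01 * harris.max()           # threshold the response map

# Shi-Tomasi: keeps points whose minimum eigenvalue is large
pts = cv2.goodFeaturesToTrack(gray, maxCorners=100,
                              qualityLevel=0.01, minDistance=10)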

3. Hough Transform and Morphological Operations

Hough Transform:

 Used to detect lines and shapes (e.g., circles).

 Transforms each point to parameter space (e.g., lines in polar form: ρ = x·cosθ + y·sinθ).

 Peaks in accumulator space indicate lines.
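A sketch of line detection with OpenCV's standard Hough transform (input file "shapes.png" is hypothetical):

import cv2
import numpy as np

gray = cv2.imread("shapes.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 50, 150)

# Each returned line is (rho, theta); threshold = minimum accumulator votes
lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=100)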

Morphological Operations:

 Applied to binary images.

 Based on structuring elements.

Erosion:

 Shrinks white regions.

 Removes small noise and separates objects.

Dilation:

 Expands white regions.

 Fills small holes.
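Both operations are single OpenCV calls; a sketch assuming a hypothetical binary image "mask.png":

import cv2
import numpy as np

binary = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((5, 5), np.uint8)      # 5x5 structuring element

eroded = cv2.erode(binary, kernel)      # shrinks white regions
dilated = cv2.dilate(binary, kernel)    # expands white regions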

Sobel Edge Detection:

 Uses gradient filters in X and Y directions.

 Highlights edges based on intensity change.

Canny Edge Detection:

 Steps: Gaussian blur → Gradient → Non-maximum suppression → Hysteresis thresholding.

 Produces clean and continuous edges.
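Both detectors are one-liners in OpenCV; a sketch assuming a hypothetical grayscale input "image.png":

import cv2

gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)   # horizontal gradient
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)   # vertical gradient
edges = cv2.Canny(gray, 50, 150)                  # hysteresis thresholds 50/150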

4. Image Segmentation Techniques

Thresholding:

 Converts grayscale to binary using a threshold value.

Global Thresholding:

 Single threshold for the whole image.

Adaptive Thresholding:

 Threshold computed locally for different regions.


 Useful for uneven lighting.

Region-Based Segmentation:

 Groups pixels with similar properties.

 Region growing starts from a seed and includes neighboring pixels.
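A minimal region-growing sketch in NumPy; the tolerance `tol` and 4-connectivity are illustrative choices:

import numpy as np
from collections import deque

def region_grow(img, seed, tol=10):
    """Grow from `seed` (row, col), adding 4-connected neighbours
    whose intensity is within `tol` of the seed value."""
    h, w = img.shape
    seed_val = int(img[seed])
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    q = deque([seed])
    while q:
        r, c = q.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w and not mask[nr, nc]
                    and abs(int(img[nr, nc]) - seed_val) <= tol):
                mask[nr, nc] = True
                q.append((nr, nc))
    return mask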

Opening:

 Erosion followed by dilation.

 Removes small objects.

Closing:

 Dilation followed by erosion.

 Fills small holes.
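Opening and closing compose erosion and dilation; in OpenCV each is a single call (continuing the "mask.png" example above):

import cv2
import numpy as np

binary = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((5, 5), np.uint8)

opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # removes specks
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # fills holes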

5. SIFT, SURF, Viola-Jones, and Panorama Matching

SIFT (Scale-Invariant Feature Transform):

 Detects keypoints invariant to scale, rotation.

 Steps: Scale-space extrema → Keypoint localization → Orientation → Descriptor.

SURF (Speeded-Up Robust Features):

 Faster than SIFT using integral images and box filters.

 Less accurate but computationally efficient.

Viola-Jones:

 As explained earlier, uses Haar features, Adaboost, and cascade classifiers.

Keypoint Matching in Panorama Creation:

 Detect keypoints using SIFT/SURF.

 Match descriptors between images.

 Estimate transformation (homography).

 Warp and blend images to create panorama.
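A sketch of the matching stage with SIFT and RANSAC (image files are hypothetical; warping and blending are omitted):

import cv2
import numpy as np

img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test filters ambiguous matches
bf = cv2.BFMatcher()
good = [m for m, n in bf.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
# cv2.warpPerspective(img1, H, ...) would then align img1 onto img2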

6. CNN Architectures and Harris Corner Detection

YOLO:

 Predicts bounding boxes and class probabilities directly from the image in one pass.

 Very fast; used in real-time applications.

Faster R-CNN:
 Region Proposal Network (RPN) suggests regions.

 These are classified and refined by the detector head.
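As one illustration, torchvision ships a pretrained Faster R-CNN (the exact weights argument depends on the torchvision version):

import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = torch.rand(3, 480, 640)          # stand-in for a real image tensor
with torch.no_grad():
    out = model([img])[0]              # dict with 'boxes', 'labels', 'scores'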

Harris Corner Detector (Revisited):

 Computes gradient matrix for each pixel.

 Uses eigenvalues of matrix to detect corners.

7. Fundamentals of Computer Vision

Computer Vision:

 Field enabling machines to interpret and understand visual information.

 Applications: object detection, facial recognition, autonomous vehicles, etc.

Pixels:

 Smallest element of an image.

 Each pixel stores color/intensity values.

Resolution:

 Number of pixels in width × height.

 Higher resolution = more detail.

Image Representation:

 Grayscale: one value per pixel.

 RGB: three values (Red, Green, Blue).

 Stored as 2D or 3D arrays.
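A quick check of these representations with OpenCV/NumPy (file name hypothetical; note that OpenCV loads color images in BGR order):

import cv2

img = cv2.imread("photo.jpg")          # 3D array: (height, width, 3), BGR
print(img.shape, img.dtype)            # e.g. (480, 640, 3) uint8

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print(gray.shape)                      # 2D array: (height, width)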

Image Formation:

 Through lenses projecting scene onto a sensor.

 Pinhole model, perspective projection, and lens distortions affect the image.

Brightness: Intensity level (e.g., dark vs. bright image).

Contrast: Difference between darkest and brightest regions.

Hue: Type of color (e.g., red, blue).

Saturation: Intensity or purity of color (gray = low saturation).
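These properties map directly onto the HSV color space; a short sketch (file name hypothetical):

import cv2

img = cv2.imread("photo.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)   # hue, saturation, value (brightness) channels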

✅ 1. Filtering in Image Processing

➤ Definition:
Filtering is the process of modifying or enhancing an image by emphasizing or removing certain
features like noise, edges, or textures.

➤ Types of Filters:

A. Linear Filters:

Apply a linear transformation to pixel values.

 Mean Filter (Averaging): Reduces noise by replacing each pixel with the average of its
neighbors.

 Gaussian Filter: Applies a weighted average using a Gaussian kernel. Smooths the image while
preserving edges better than mean filtering.

B. Non-linear Filters:

 Median Filter: Replaces pixel value with the median of its neighborhood. Very effective in
removing salt-and-pepper noise.

➤ Example:

An image with salt-and-pepper noise can be cleaned using a median filter, which removes outliers
(black or white dots) while preserving edges.
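The three filters above in OpenCV (input file hypothetical; the kernel sizes are typical choices):

import cv2

noisy = cv2.imread("noisy.png")

mean_f = cv2.blur(noisy, (5, 5))                  # averaging filter
gauss_f = cv2.GaussianBlur(noisy, (5, 5), 1.5)    # sigma = 1.5
median_f = cv2.medianBlur(noisy, 5)               # best for salt-and-pepper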

✅ 2. Convolution in Image Processing

➤ Definition:

Convolution is a mathematical operation used to apply filters to images.

➤ How It Works:

 A kernel (filter matrix) is slid across the image.

 At each location, the sum of the element-wise product of the kernel and the overlapping
image region is computed.

 This value replaces the central pixel.

➤ Mathematical Expression:

G(x, y) = Σᵢ Σⱼ I(x+i, y+j) · K(i, j),   where i and j run from −k to k

Where:

 I is the input image

 K is the kernel

 G is the output image

➤ Example:

Using a 3×3 sharpening kernel:


[  0  -1   0
  -1   5  -1
   0  -1   0 ]

applied via convolution enhances edges in an image.
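The same kernel applied with OpenCV (filter2D actually computes correlation, which coincides with convolution for this symmetric kernel):

import cv2
import numpy as np

img = cv2.imread("photo.jpg")                     # hypothetical input file
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=np.float32)
sharpened = cv2.filter2D(img, -1, kernel)         # -1: keep input bit depth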

✅ 3. Edge Detection

Edge detection helps identify object boundaries by detecting changes in intensity.

➤ A. Sobel Edge Detection

➤ How it works:

 Applies two 3×3 kernels: one for horizontal (Gx) and one for vertical (Gy) gradients.

 Combined gradient magnitude:

G = √(Gx² + Gy²)

➤ Kernels:

Gx = [ -1  0  1        Gy = [ -1  -2  -1
       -2  0  2                0   0   0
       -1  0  1 ]              1   2   1 ]

➤ Example:

Apply Sobel to detect roads in satellite images by emphasizing edges in horizontal and vertical
directions.
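Applying the two kernels manually reproduces what cv2.Sobel does internally; a sketch (input file "satellite.png" hypothetical):

import cv2
import numpy as np

gray = np.float32(cv2.imread("satellite.png", cv2.IMREAD_GRAYSCALE))

kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
ky = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float32)

gx = cv2.filter2D(gray, -1, kx)
gy = cv2.filter2D(gray, -1, ky)
magnitude = np.sqrt(gx ** 2 + gy ** 2)   # G = sqrt(Gx^2 + Gy^2)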

➤ B. Canny Edge Detection

Canny is a multi-stage edge detection algorithm:

1. Noise Reduction: Gaussian blur

2. Gradient Calculation: Sobel-like operation

3. Non-Maximum Suppression: Thins edges

4. Double Thresholding: Strong and weak edges

5. Edge Tracking by Hysteresis: Connects weak edges to strong ones


➤ Example:

Used in medical imaging (e.g., MRI, X-rays) to detect boundaries of tissues or bones accurately.

✅ 4. Image Transformations

➤ A. Fourier Transform (FT)

➤ Definition:

Transforms an image from the spatial domain to the frequency domain; useful for analyzing frequency content.

➤ How it works:

F(u, v) = Σₓ Σᵧ f(x, y) · e^(−j2π(ux/M + vy/N)),   where M × N is the image size

➤ Use Cases:

 Filtering (e.g., low-pass to remove high-frequency noise)

 Image compression (JPEG uses DCT, a related concept)

 Pattern recognition

➤ Example:

A fingerprint image with periodic noise can be denoised by applying Fourier Transform, masking
high-frequency components, and applying Inverse Fourier Transform.
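A NumPy sketch of this pipeline (a random array stands in for the fingerprint image; the 30-pixel cutoff radius is an arbitrary choice):

import numpy as np

img = np.random.rand(256, 256)               # stand-in for a grayscale image

F = np.fft.fftshift(np.fft.fft2(img))        # DC component moved to center

h, w = img.shape
y, x = np.ogrid[:h, :w]
mask = (y - h // 2) ** 2 + (x - w // 2) ** 2 <= 30 ** 2   # keep low frequencies

filtered = np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))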

➤ B. Hough Transform

➤ Definition:

Used to detect geometric shapes (lines, circles) in images.

➤ For Lines:

A line can be expressed as:

ρ = x·cosθ + y·sinθ

Each edge point votes in the accumulator space for possible lines passing through it.

➤ For Circles:

Circle equation: (x − a)² + (y − b)² = r²

➤ Example:

Used in license plate detection or lane detection in autonomous driving by detecting straight lines
on the road.
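For lane detection, the probabilistic variant that returns finite segments is the usual choice; a sketch (input file and thresholds are hypothetical):

import cv2
import numpy as np

gray = cv2.imread("road.jpg", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 50, 150)

# Returns line segments as (x1, y1, x2, y2)
segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                           threshold=50, minLineLength=40, maxLineGap=10)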
