
Contents

Pixel level operations
  Point processing and transformation
    Grey level transformations
    Brightness/Contrast Adjustment
    Applications
  Image representation
    Sampling and Reconstruction
    Aliasing
    Image Subsampling (Downsampling)
Geometric transformations
  Image warping/registration
  Image morphing
Composition & blending
  Alpha blending
  Image pyramids
  Multi-Resolution blending
Image gradients
  Partial derivatives
  Gradient
  Sobel & Prewitt operators
Filtering (general concept)
  Linear filtering (Convolution with kernels)
    Properties
    Mean/Box Filter (moving average)
    Gaussian Filter
  Non-Linear Filtering
    Median filter
  Cross-correlation
Edge detection
  Canny edge detector
  Laplacian of Gaussian (LoG)
Feature detection
  Corner detection
    Harris corner detector
    FAST (Features from Accelerated Segment Test)
  Blob detection
    Methods
  Boundary detection using Hough transform
    Preprocessing Edge Images
    Edge Tracking Methods
    Fitting Lines and Curves to Edges
    The Hough Transform
    Parameterization problem
Feature description
  Local Binary Patterns (LBP)
  Histogram of Oriented Gradients (HOG)
Neural networks (general concept)
  Feedforward networks
  Backpropagation
  Activation functions
Convolutional neural networks (CNNs)
  Architecture
  Feature learning
  Popular architectures
Generative models
  Variational autoencoders (VAEs)
    Encoder
    Latent Space
    Decoder
    Training Process
    Key Advantages of VAEs
  Generative adversarial networks (GANs)
    Generator (G)
    Discriminator (D)
    Loss Functions
    Training Dynamics
Image segmentation
  Thresholding-based
    Global Thresholding
    Adaptive Thresholding
    Multiple Thresholds (Intensity Range Thresholding)
    Histogram
    Non-Uniform Illumination
  Region-based
    Region Growing
    Region Splitting
    Region Merging
    Watershed Segmentation Algorithm
  Semantic segmentation
  Instance segmentation
Object detection
  Sliding window
  R-CNN family (R-CNN, Fast R-CNN...)
  Single-shot detectors (e.g., YOLO)
Image classification
  Basic classifiers (e.g., SVM, Random forest, ...)
  Deep learning classifiers (e.g., CNNs)
Morphological image processing
  Erosion
  Dilation
  Compound operations
    Opening
    Closing
  Hit and miss transform
  Morphological filtering
  Greyscale morphology
    Erosion
    Dilation
    Opening
    Closing
    Important Concepts
    Applications in Computer Vision
Foundation
Pixel level operations
Point processing and transformation
Modifies individual pixel values independently

Grey level transformations

Modify pixel intensity values.

Types:

• Negative: Inverts the image.
• Log/Inverse Log: Adjusts contrast, especially for images with high dynamic range.
• nth Power/Root: Gamma correction; adjusts brightness.
• Identity: No change to the image.

Brightness/Contrast Adjustment
Linear transformations of pixel intensities.

• Contrast stretching: Expands the range of intensity levels to improve image contrast.
• Power-Law transformations: Generalization of contrast stretching, allowing fine-tuning of image appearance.
• Histogram equalization: Redistributes pixel values to enhance contrast based on the image's intensity distribution.
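
A minimal sketch of these point operations in Python, assuming an 8-bit greyscale image img as a NumPy array (the alpha, beta, and gamma values are illustrative):

import numpy as np

def brightness_contrast(img, alpha=1.2, beta=10.0):
    # Linear point transform: out = alpha * in + beta, clipped to the valid range
    return np.clip(alpha * img.astype(np.float32) + beta, 0, 255).astype(np.uint8)

def gamma_correct(img, gamma=0.5):
    # Power-law (nth power/root) transform on intensities normalized to [0, 1]
    return np.uint8(255 * (img.astype(np.float32) / 255.0) ** gamma)

def histogram_equalize(img):
    # Map each grey level through the normalized cumulative histogram
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(np.float32)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())
    return np.uint8(255 * cdf[img])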

Applications
• Image enhancement
• Basic color correction
• Color transfer: Applies the color characteristics of one image to another.

Image representation
Images can be mathematically represented as a function from R^2 (2D spatial coordinates) to R (intensity) or R^3 (color).

Sampling and Reconstruction


Continuous functions (like images) are represented by discrete samples.

Reconstruction aims to "fill in the gaps" between samples.

Aliasing
Undersampling: Taking too few samples can lead to aliasing, where high-frequency information is lost or
misinterpreted as lower frequencies.

In images, aliasing can cause visual artifacts like moiré patterns.

Anti-Aliasing: Combating aliasing by increasing the sampling rate or smoothing the image before sampling (low-pass filtering).

Image Subsampling (Downsampling)


Reduces image size by removing rows and columns.

To avoid aliasing, filter the image first (usually with a low-pass filter) before downsampling.

Geometric transformations
Image warping/registration
Purpose: Aligns images, corrects distortions, or matches images from different sources.

Techniques:

Affine Transformations: Scaling, rotation, translation, shearing.

Projective Transformations: More complex warps that maintain straight lines.

Applications: Medical image alignment, panorama stitching, image stabilization.

Image morphing
Purpose: Smoothly transitions between two images.

Techniques:

Affine Morphing: Interpolates corresponding points based on affine transformations.

Non-Rigid Morphing: More complex transformations using control points or mesh warping.

Applications: Special effects, animation, facial expression analysis.

Composition & blending


Alpha blending
Purpose: Combines multiple images with transparency.

Method: Pixel values are weighted by their alpha channel (transparency level).

Formula: Blended Pixel = (Alpha * Foreground Pixel) + ((1 - Alpha) * Background Pixel)

Applications: Image overlays, transparency effects, compositing.
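
As a sketch of the formula above, assuming fg and bg are NumPy arrays of the same shape and alpha is a scalar or an array broadcastable to that shape:

import numpy as np

def alpha_blend(fg, bg, alpha):
    # Blended pixel = alpha * foreground + (1 - alpha) * background
    out = alpha * fg.astype(np.float32) + (1.0 - alpha) * bg.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)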

Image pyramids
Purpose: Multi-scale representation of images.

Gaussian Pyramids: Created by repeatedly blurring and downsampling. Used for blending, noise removal, and
coarse-to-fine processing.

Laplacian Pyramids: Created by subtracting Gaussian levels. Capture detail at different scales, used for edge
detection, blending, and image reconstruction.

Multi-Resolution blending
Purpose: Seamlessly blend images with different resolutions or content.

Technique: Uses Gaussian/Laplacian pyramids to blend images at different frequency bands.

Applications: Creating panoramas, merging images with smooth transitions.
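
A rough sketch of the technique using OpenCV's pyramid primitives, assuming float32 greyscale inputs imgA, imgB and a float mask in [0, 1], all the same size (the number of levels is illustrative):

import cv2
import numpy as np

def gaussian_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels):
        pyr.append(cv2.pyrDown(pyr[-1]))  # blur + downsample
    return pyr

def laplacian_pyramid(img, levels):
    gauss = gaussian_pyramid(img, levels)
    lap = [gauss[i] - cv2.pyrUp(gauss[i + 1], dstsize=gauss[i].shape[1::-1])
           for i in range(levels)]
    return lap + [gauss[-1]]  # keep the coarsest Gaussian level

def multiresolution_blend(imgA, imgB, mask, levels=4):
    # Blend each frequency band separately, using a blurred mask per level
    la, lb = laplacian_pyramid(imgA, levels), laplacian_pyramid(imgB, levels)
    gm = gaussian_pyramid(mask, levels)
    bands = [m * a + (1 - m) * b for a, b, m in zip(la, lb, gm)]
    out = bands[-1]
    for band in reversed(bands[:-1]):
        out = cv2.pyrUp(out, dstsize=band.shape[1::-1]) + band  # collapse coarse-to-fine
    return out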

Feature extraction and analysis


Image gradients
Calculate the direction and magnitude of intensity changes in an image.

Partial derivatives
Represent the rate of change in the horizontal (x) and vertical (y) directions.

Gradient
A vector combining the partial derivatives, indicating the direction of the steepest ascent (strongest edge
direction).

Sobel & Prewitt operators


Convolution kernels that approximate partial derivatives for efficient gradient computation.
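
For instance, a sketch with SciPy, assuming a greyscale image img (the kernels below are the standard Sobel masks):

import numpy as np
from scipy.signal import convolve2d

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)  # approximates the horizontal derivative
SOBEL_Y = SOBEL_X.T                                 # approximates the vertical derivative

def image_gradient(img):
    gx = convolve2d(img, SOBEL_X, mode="same", boundary="symm")
    gy = convolve2d(img, SOBEL_Y, mode="same", boundary="symm")
    return np.hypot(gx, gy), np.arctan2(gy, gx)  # gradient magnitude and direction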

Filtering (general concept)


Modifies images to enhance certain features or remove noise.

Linear filtering (Convolution with kernels)


A technique to modify signals (images, audio) by applying a linear operation to local neighborhoods of
pixels/samples.

Mechanism: Slides a kernel (small matrix of numbers) over the image, computing a weighted sum of pixel values
at each position.

Written as: g(i, j) = Σ_k Σ_l h(k, l) * f(i - k, j - l), where f is the input image, h is the kernel, and the sum runs over the kernel support.

Properties

Associativity: (f * h1) * h2 = f * (h1 * h2); filtering with h1 and then with h2 is equivalent to filtering once with the convolution h1 * h2.

Linearity: filter(f + g) = filter(f) + filter(g)

Shift Invariance: Filter's effect is the same regardless of input position.

Mean/Box Filter (moving average)

A basic filter that averages pixel values over a sliding window, smoothing the image.

Gaussian Filter
A smoothing filter with a bell-shaped kernel; it smooths the image with fewer artifacts than a mean filter of the same size, because nearby pixels are weighted more heavily than distant ones.

Removes high-frequency components (low-pass filter).

Convolving a Gaussian with itself results in another Gaussian.
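
A quick sketch of both filters with SciPy, assuming a greyscale image img (the window size and sigma are illustrative):

from scipy.ndimage import uniform_filter, gaussian_filter

box_smoothed = uniform_filter(img, size=5)         # 5x5 moving average (box filter)
gauss_smoothed = gaussian_filter(img, sigma=1.5)   # Gaussian low-pass filter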

Non-Linear Filtering
Mechanism: Pixel values are modified based on complex operations, not just weighted sums.

Median filter
Replaces each pixel with the median value of its neighborhood, excellent for removing salt-and-pepper noise.
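
For example, a sketch using SciPy's built-in median filter on a noisy image noisy_img:

from scipy.ndimage import median_filter

# Each output pixel is the median of its 3x3 neighborhood; isolated impulse
# ("salt-and-pepper") outliers never dominate the median, so they are removed.
denoised = median_filter(noisy_img, size=3)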

Cross-correlation
A measure of similarity between two signals or image regions.

A convolution operation is a cross-correlation where the filter is flipped both horizontally and vertically before
being applied to the image.

Sliding Dot Product: The kernel slides over the image, calculating the dot product at each position.

Uses: Template matching, object detection, motion estimation.
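
A template-matching sketch using OpenCV's normalized cross-correlation, assuming greyscale uint8 arrays image and template:

import cv2
import numpy as np

scores = cv2.matchTemplate(image, template, cv2.TM_CCORR_NORMED)  # sliding dot product
y, x = np.unravel_index(np.argmax(scores), scores.shape)          # top-left of best match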

Edge detection
Locates significant changes in intensity (edges) within an image.

Canny edge detector


Steps: Noise reduction, gradient calculation, non-maximum suppression, hysteresis thresholding.

Advantages: Good localization, low error rate, single response to edges.
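
In practice the whole pipeline is available as a single call, e.g. in OpenCV (assuming a greyscale image gray; the blur parameters and hysteresis thresholds are illustrative):

import cv2

blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)               # noise reduction
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)   # gradient, NMS, hysteresis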

Laplacian of Gaussian (LoG)


Steps: Gaussian smoothing followed by Laplacian filtering (second derivative).

Advantages: Multi-scale edge detection, less sensitive to noise than Laplacian alone.

Feature detection
Finds distinct points or regions in an image.

Corner detection
Identifies interest points in an image where edges meet at significant angles.

Applications: Object recognition, image registration, camera calibration, tracking, 3D reconstruction.

Harris corner detector


Finds corners using eigenvalue analysis of local image patches.

Measures the intensity variation in a local window when shifted in different directions.

Corners exhibit strong variation in all directions.

Robust to rotation, but not scale-invariant.
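
A sketch using OpenCV's implementation, assuming a greyscale image gray (blockSize, ksize, and k are common illustrative defaults):

import cv2
import numpy as np

response = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
corner_mask = response > 0.01 * response.max()   # keep only strong corner responses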

FAST (Features from Accelerated Segment Test)


Checks intensity differences along a circle around a candidate pixel.

Very fast, suitable for real-time applications.

Blob detection
Identifies regions with similar intensity values; detects regions in an image that are brighter or darker than their surroundings.

Applications: Object detection, tracking, medical image analysis.

Methods
Laplacian of Gaussian (LoG): Finds blobs at local extrema of the LoG response over position and scale; the response is strongest where the blob size matches the filter scale.

Difference of Gaussians (DoG): Approximates LoG by subtracting two Gaussian-filtered images with different
scales.

Determinant of Hessian (DoH): Finds blobs based on the intensity changes in multiple directions.

Boundary detection using Hough transform


A technique used to detect lines, curves, and objects in an image using the concept of parameter space.

Preprocessing Edge Images


Goal: Prepare the edge image for boundary detection by improving edge quality and reducing noise.

Techniques:

Thresholding: Convert grayscale edge image to binary by selecting a threshold value. This separates strong edges
from weak ones.

Thinning (Skeletonization): Reduce thick edges to single-pixel width for more accurate boundary representation.

Noise Reduction: Apply filters like Gaussian blur to smooth out noise while preserving edge structure.

Morphological Operations: Dilation and erosion can be used to connect broken edges or remove small gaps.

Edge Tracking Methods


Goal: Connect individual edge pixels into continuous contours or boundaries.

Local Tracking: Search for strong edges along normals to an approximate boundary, then fit a curve (e.g., a polynomial) to the strong edges.

• Hysteresis Thresholding: Used in Canny edge detection; connects strong edges and includes weaker ones if they are linked to strong edges.
• Edge Linking: Follows a connected path of edge pixels based on local gradient direction and magnitude.

Global Tracking:

• Graph-Based Methods: Construct a graph where nodes represent edge pixels and edges represent connections. Find optimal paths in the graph to extract boundaries.
• Challenges: Dealing with noisy edges, gaps in contours, and junctions where multiple edges meet.

Fitting Lines and Curves to Edges
Goal: Represent detected edges with mathematical models (lines, curves) for further analysis and interpretation.

Line Fitting:

• Least Squares: Find the line that minimizes the sum of squared distances to the edge points.
• RANSAC (Random Sample Consensus): Robust method that handles outliers by repeatedly fitting lines to randomly sampled subsets of points.

Curve Fitting:

• Polynomial Fitting: Represent curves using polynomial functions of varying degrees.
• Splines: Create smooth curves by connecting piecewise polynomial segments.

Applications: Finding object boundaries, lane detection in autonomous vehicles, shape analysis.
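
A minimal RANSAC line-fitting sketch in NumPy, assuming points is an (N, 2) float array of edge coordinates (the iteration count and inlier tolerance are illustrative):

import numpy as np

def ransac_line(points, iters=200, tol=2.0):
    # Returns (point_on_line, unit_direction) of the line with the most inliers
    best_inliers, best_model = 0, None
    rng = np.random.default_rng(0)
    for _ in range(iters):
        p1, p2 = points[rng.choice(len(points), 2, replace=False)]
        d = p2 - p1
        norm = np.linalg.norm(d)
        if norm == 0:
            continue  # degenerate sample
        n = np.array([-d[1], d[0]]) / norm       # unit normal of the candidate line
        dist = np.abs((points - p1) @ n)         # perpendicular distances to the line
        inliers = np.sum(dist < tol)
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (p1, d / norm)
    return best_model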

The Hough Transform


Goal: Robustly detect specific shapes (lines, circles, ellipses) in images; each edge point votes for the model parameters that could explain it.

Parameter Space: Transforms the image space (x, y) into a parameter space (e.g., m, c for lines, or center and
radius for circles).

Accumulator: Each edge point in the image space votes for possible parameters in the parameter space. Peaks in
the accumulator indicate the most likely parameters of the shapes.

Advantages:

• Robust to noise and occlusions; can detect multiple instances of a shape.
• Edges need not be connected.
• The complete object need not be visible.

Limitations: Computational complexity increases with the number of parameters, less effective for complex
shapes.

Variants: Generalized Hough Transform (GHT) for detecting arbitrary shapes.
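
A sketch using OpenCV's line Hough transform on an edge image, assuming a greyscale input gray (the accumulator resolutions and vote threshold are illustrative):

import cv2
import numpy as np

edges = cv2.Canny(gray, 50, 150)
# 1-pixel rho bins, 1-degree theta bins, at least 100 votes per reported line
lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=100)
if lines is not None:
    for rho, theta in lines[:, 0]:
        print(f"line: rho={rho:.1f}, theta={np.degrees(theta):.1f} deg")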

Parameterization problem
The Hough Transform maps points from the image space (x, y coordinates) into a parameter space (e.g., slope 'm' and y-intercept 'b' for lines).

The choice of parameterization is crucial. The standard slope-intercept form (y = mx + b) is problematic: for a vertical line the slope 'm' is undefined (infinite), and 'm' is unbounded in general, so the accumulator cannot cover all lines.

The recommended alternative, the same parameterization used for computing the minimum moment of inertia, is the normal form of the line equation:

ρ = x * cos(θ) + y * sin(θ)

where:

• ρ (rho) is the perpendicular distance of the line from the origin.
• θ (theta) is the angle the line's normal makes with the positive x-axis.

This parameterization works well for all line orientations, including vertical lines, and keeps both parameters bounded.

The error function 'E' measures how well a candidate line fits the data points, and it must be formulated carefully. With the normal parameterization, E is the sum of squared perpendicular distances from the points to the line:

E = Σ_i (x_i * cos(θ) + y_i * sin(θ) - ρ)^2

Feature description
Creates a numerical representation (descriptor) of each detected feature for matching or comparison.

Local Binary Patterns (LBP)


Encodes local texture patterns as a binary string.

Histogram of Oriented Gradients (HOG)


Describes local shape by the distribution of gradient orientations.
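
Both descriptors are available in scikit-image; a sketch on a greyscale image gray (the parameter choices are illustrative):

import numpy as np
from skimage.feature import local_binary_pattern, hog

# LBP: encode each pixel's 8-neighbor ring as a binary pattern, then histogram
lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
lbp_hist, _ = np.histogram(lbp, bins=np.arange(11), density=True)  # 10 uniform patterns

# HOG: histograms of gradient orientations over 8x8-pixel cells, block-normalized
hog_vec = hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))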

Machine learning methods


Neural networks (general concept)
Feedforward networks
Information flows only in one direction (input to output).

Backpropagation
Algorithm for training neural networks by adjusting weights based on error signals.

Activation functions
Introduce non-linearity to allow complex decision boundaries.
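
A toy sketch in NumPy tying these ideas together (layer shapes and the choice of ReLU are illustrative):

import numpy as np

def relu(x):
    # Activation function: introduces the non-linearity between layers
    return np.maximum(0.0, x)

def forward(x, layers):
    # Feedforward pass: information flows one way, from input to output
    for W, b in layers:  # layers = [(W1, b1), (W2, b2), ...]
        x = relu(W @ x + b)
    return x

# Backpropagation would compute d(loss)/dW and d(loss)/db for each layer via
# the chain rule and adjust the weights against those gradients.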

Convolutional neural networks (CNNs)


Architecture
Convolutional Layers: Learn hierarchical features using filters.

Pooling Layers: Reduce dimensionality and provide spatial invariance.

Feature learning
Learns features directly from data, eliminating manual feature engineering.

Popular architectures
AlexNet (2012): Deeper and wider than LeNet-5, with 5 convolutional layers, 3 pooling layers, and 3 fully connected layers; it also introduced the use of ReLU activation functions for faster training. Impact: won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012, significantly outperforming traditional methods; popularized the use of deep CNNs for image classification; showed the importance of using GPUs for training large models.

VGGNet (2014): Known for its simplicity; features a consistent architecture with small 3x3 convolutional filters and max-pooling layers, with configurations (VGG16, VGG19) that vary in depth. Impact: showed that increasing depth can significantly improve performance; demonstrated the effectiveness of using very small (3x3) filters throughout the network.

InceptionNet (GoogLeNet) (2014): Introduced inception modules, parallel branches with different filter sizes that let the network learn features at multiple scales simultaneously. Impact: improved computational efficiency compared to simply increasing depth or width; achieved state-of-the-art performance on ImageNet; inception modules have been incorporated into other CNN architectures.

ResNet (2015): Introduced residual connections, where the output of a layer is added to the input of a later layer, allowing much deeper networks (e.g., ResNet50, ResNet101, ResNet152) to be trained. Impact: solved the vanishing gradient problem, enabling the training of very deep networks; achieved state-of-the-art performance on ImageNet and other tasks; residual connections have become a standard building block for many CNN architectures.

DenseNet (2016): Uses dense blocks, where each layer is connected to all subsequent layers in the block, creating direct paths for gradient flow and strengthening feature propagation. Impact: further mitigated the vanishing gradient problem; achieved state-of-the-art performance on several benchmarks; showed that dense connections can be beneficial for feature reuse.

MobileNet (2017): Designed for mobile and embedded vision applications; uses depthwise separable convolutions, which dramatically reduce the number of parameters and computations compared to traditional convolutions. Impact: demonstrated the effectiveness of depthwise separable convolutions for lightweight models; achieves a good balance between accuracy and efficiency, making it suitable for mobile devices.

EfficientNet (2019): A family of models that scales network width, depth, and resolution in a balanced way to achieve optimal efficiency. Impact: provides a systematic approach for scaling CNNs to achieve better accuracy and efficiency; EfficientNets achieve state-of-the-art performance with fewer parameters and computations than other models.

RegNet (2020): A simple, regular network structure that achieves competitive results with a reduced design space. Impact: challenges the notion that complex network design is always necessary for high performance; shows that well-designed regular architectures can be surprisingly effective.

Vision Transformer (ViT) (2020): A transformer-based architecture originally designed for natural language processing (NLP), adapted for computer vision; divides the image into patches, treats them as tokens, and uses self-attention mechanisms to learn relationships between patches. Impact: introduced a new paradigm for image recognition based on transformers; achieves state-of-the-art performance on ImageNet and other tasks; opens up new possibilities for combining vision and language models.

Swin Transformer (2021): Extends the Vision Transformer with hierarchical representation learning through shifted windows, allowing efficient modeling of long-range dependencies in images. Impact: achieved state-of-the-art performance on various vision tasks, including image classification, object detection, and semantic segmentation; furthered the integration of Transformer architectures into computer vision.

ConvNeXt (2022): A modern reinterpretation of the ResNet architecture, focusing on simple, modular design principles. Impact: shows that with modern training techniques and careful architectural choices, a simple model can rival more complex ones; emphasizes the importance of empirical evaluation and thorough ablation studies.

Generative models
Learn the underlying distribution of data to generate new samples.

Variational autoencoders (VAEs)


Learn a latent representation of data using encoder and decoder networks.

Encoder
Purpose: Takes an input (e.g., an image) and compresses it into a lower-dimensional representation called the
latent space.

Mechanism: Typically consists of neural network layers (e.g., convolutional layers for images) that progressively
reduce the dimensionality of the data.

Output: Instead of a single point in the latent space, the encoder outputs parameters of a probability
distribution (usually a Gaussian distribution). This distribution represents the encoder's belief about the most
likely values in the latent space that could have generated the input.

Latent Space
Purpose: A compressed representation of the input data that captures its essential features or patterns.

Representation: Each point in the latent space corresponds to a different possible configuration of the input
data.

Distribution: The encoder outputs a probability distribution over the latent space, capturing the uncertainty
about the exact representation.

Decoder
Purpose: Takes a sample from the latent space distribution and reconstructs it into an output that resembles the
original input.

Mechanism: Typically consists of neural network layers (e.g., transpose convolutional layers for images) that
progressively increase the dimensionality of the data.

Output: A reconstruction of the input data.

Training Process
1. Encoding: The input data is passed through the encoder, producing a distribution in the latent space.
2. Sampling: A sample is drawn from the latent space distribution.
3. Decoding: The sample is passed through the decoder to generate a reconstruction of the input data.
4. Loss Calculation: The loss function combines two terms:
• Reconstruction Loss: Measures how well the decoder's output matches the original input.
• KL Divergence: Measures the difference between the encoder's output distribution and a standard normal distribution. This term regularizes the latent space and encourages it to follow a simple, well-behaved distribution.
5. Backpropagation: The gradients of the loss function are calculated and used to update the weights of both the encoder and decoder through backpropagation.
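
A minimal sketch of the sampling and loss steps above (steps 2, 4, and 5), assuming PyTorch and an encoder that outputs the Gaussian parameters mu and log_var:

import torch
import torch.nn.functional as F

def reparameterize(mu, log_var):
    # Sampling step: z = mu + sigma * eps keeps sampling differentiable
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps

def vae_loss(x, x_recon, mu, log_var):
    # Reconstruction term plus KL divergence to a standard normal prior
    recon = F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl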

Key Advantages of VAEs


Generative Capabilities: VAEs can generate new samples by sampling from the latent space distribution and
decoding them.

Continuous Latent Space: The latent space is continuous, allowing smooth interpolation between points and
generation of new samples with smooth variations.

Regularization: The KL divergence term helps prevent overfitting and encourages a meaningful latent space
representation.

Generative adversarial networks (GANs)


Two networks (generator and discriminator) compete to create realistic fake samples.

Generator (G)
Purpose: The generator is a neural network designed to create synthetic data that resembles real data. It takes
random noise as input and transforms it into increasingly realistic samples through a series of layers and
operations.

Architecture: The generator's architecture depends on the type of data it's generating. For images, it typically
uses transpose convolutions (also known as deconvolutions) to upsample the noise and produce images. For
other data types, it might use recurrent neural networks (RNNs) or other specialized architectures.

Training: The generator is trained to fool the discriminator. It starts by generating poor quality samples, but
gradually improves as it learns to produce more convincing fakes that the discriminator struggles to distinguish
from real data.

Discriminator (D)
Purpose: The discriminator is another neural network that acts as a binary classifier. It takes either real data or
synthetic data from the generator and tries to classify it as real or fake.

Architecture: The discriminator's architecture also depends on the data type. For images, it usually uses
convolutional layers to extract features and make predictions about whether the image is real or generated.

Training: The discriminator is trained to correctly classify real data as real and generated data as fake. It learns
to identify the subtle differences between real and synthetic samples, providing feedback to the generator.

Loss Functions
Purpose: Loss functions quantify how well the generator and discriminator are performing during training. Each network is trained to minimize its own loss, which simultaneously makes its opponent's task harder.

Types:

Adversarial Loss: The most common loss function for GANs.

For the discriminator, it's usually based on binary cross-entropy loss, encouraging it to make correct predictions.

For the generator, it's often the inverse of the discriminator's loss, encouraging it to generate samples that fool
the discriminator.

Additional Losses: Depending on the application, other losses might be used to improve the quality of the
generated samples. These could include:

Feature Matching Loss: Encourages the generator to produce samples with similar feature distributions as real
data.

Perceptual Loss: Encourages the generator to produce samples that are perceptually similar to real data, even if
they don't match pixel-for-pixel.

Training Dynamics
The training of a GAN is a dynamic, adversarial process. The generator and discriminator are trained simultaneously in a minimax game: the generator tries to minimize the adversarial objective (fool the discriminator), while the discriminator tries to maximize it (correctly classify real and fake samples). This back-and-forth competition drives both networks to improve.
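
A sketch of one training step, assuming PyTorch modules G and D where D ends in a sigmoid, with optimizers opt_G and opt_D (the latent size z_dim is illustrative):

import torch
import torch.nn.functional as F

def gan_step(G, D, real, opt_G, opt_D, z_dim=100):
    fake = G(torch.randn(real.size(0), z_dim, device=real.device))

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0
    d_real, d_fake = D(real), D(fake.detach())
    d_loss = (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator update: make the discriminator label generated samples as real
    g_out = D(fake)
    g_loss = F.binary_cross_entropy(g_out, torch.ones_like(g_out))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()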

Applications

Image segmentation
Divides images into meaningful regions (objects, foreground/background).

Group similar components (such as pixels in an image, or image frames in a video) to obtain a compact representation.

Applications: Finding tumors, veins, etc. in medical images, finding targets in satellite/aerial images, finding
people in surveillance images, summarizing video, etc.

Thresholding-based
Partition an image into distinct regions (foreground/background) based on pixel intensity.

Pixels above the threshold are typically assigned to the foreground, while those below are assigned to the
background.

Global Thresholding
A single threshold value is used for the entire image.

Suitable when:

Image has a clear bimodal histogram (two distinct peaks for foreground and background).

Lighting conditions are uniform across the image.

Limitations:

Fails when lighting conditions vary significantly across the image.

Struggles with images that don't have distinct intensity distributions.

Adaptive Thresholding
Threshold value varies across different image regions.

Types:

• Local Thresholding: A threshold value is calculated for a specific region or neighborhood within the image.
• Global Thresholding with Adaptive Window Size: The window size for calculating the threshold adapts to local image characteristics.

Suitable when:

Image has non-uniform illumination.

Foreground and background intensities vary across the image.

Advantages:

Handles variations in lighting conditions.

Adapts to local image characteristics.

Provides more accurate segmentation results in many cases.
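
A sketch of both variants with OpenCV, assuming a greyscale uint8 image gray (Otsu's method picks the global threshold automatically; the block size and offset are illustrative):

import cv2

# Global: one threshold for the whole image, chosen from the histogram by Otsu's method
t, global_bin = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Adaptive: threshold computed per 31x31 neighborhood, offset by a constant C = 5
adaptive_bin = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY, 31, 5)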

Multiple Thresholds (Intensity Range Thresholding)


Uses multiple thresholds to segment an image into more than two regions.

Suitable when:

You need to identify objects with specific intensity ranges.

You want to segment an image into multiple classes.

Histogram
A graphical representation of the distribution of pixel intensities in an image.

Helps in:

Choosing an appropriate threshold value.

Determining if global or adaptive thresholding is suitable.

Identifying the optimal thresholds for multiple thresholding.

Non-Uniform Illumination
Occurs when the lighting conditions vary across different parts of the image.

Makes global thresholding challenging.

Solutions:

Adaptive thresholding (adjusts threshold locally).

Preprocessing techniques like illumination correction or normalization.

Region-based
Group pixels into meaningful regions based on shared properties (e.g., intensity, color, texture).

Unlike thresholding, which relies solely on pixel intensity, region-based methods consider spatial relationships
and similarity between neighboring pixels.

Region Growing
Process:

Starts with seed points (pixels selected as representative of different regions).

Iteratively expands each region by adding neighboring pixels that are similar to the region's properties.

Growth continues until no more similar pixels can be added.

Advantages:

Simple and intuitive.

Can adapt to variations within regions.

Limitations:

Sensitive to the choice of seed points and similarity criteria.

Can be computationally expensive.
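
A minimal region-growing sketch in NumPy (4-connectivity; similarity is measured against the seed intensity with an illustrative tolerance):

import numpy as np
from collections import deque

def region_grow(img, seed, tol=10.0):
    # seed is a (row, col) tuple; returns a boolean mask of the grown region
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = float(img[seed])
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        if not (0 <= y < h and 0 <= x < w) or mask[y, x]:
            continue  # out of bounds or already in the region
        if abs(float(img[y, x]) - seed_val) <= tol:
            mask[y, x] = True
            queue.extend([(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)])
    return mask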

Region Splitting
Process:

Starts with the entire image as one region.

Recursively divides the region into smaller ones based on a homogeneity criterion (e.g., intensity variance within
the region).

Splitting continues until all sub-regions are homogeneous.

Advantages:

Can handle complex image structures.

Doesn't require seed points.

Limitations:

Can over-segment the image (create too many small regions).

Requires careful choice of homogeneity criterion.

Region Merging
Process:

Starts with an over-segmented image (e.g., result of region splitting).

Iteratively merges adjacent regions based on their similarity.

Merging continues until no more similar regions can be merged.

Advantages:

Can correct over-segmentation caused by other methods.

Combines the advantages of region growing and splitting.

Limitations:

Requires an initial over-segmentation.

Can be sensitive to the similarity criterion.

Watershed Segmentation Algorithm


Analogy: Imagine the image as a topographic map, where intensity represents elevation. Rainwater flows
downhill and accumulates in basins (regions).

Process:

Finds "catchment basins" (regions) and "watershed lines" (boundaries between regions).

Can be sensitive to noise and over-segment the image.

Often used with markers (seed points) to control the segmentation.

Advantages:

Good for separating touching or overlapping objects.

Can be used with other segmentation methods (e.g., region growing) as a post-processing step.

Limitations:

Can be sensitive to noise and over-segment the image.
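
A marker-controlled sketch with OpenCV, assuming binary uint8 masks sure_fg and unknown were produced beforehand (e.g., by thresholding and morphology) and bgr_image is the 3-channel input:

import cv2

# Label sure-foreground components; background becomes 1, the unknown region 0
n, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1
markers[unknown > 0] = 0

markers = cv2.watershed(bgr_image, markers)  # watershed lines are labeled -1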

Semantic segmentation
Assigns a class label to every pixel (e.g., road, person, sky), without distinguishing separate objects of the same class. A simple classical approach is to cluster pixels by feature similarity, e.g., K-means clustering on color or intensity.

Instance segmentation
Additionally distinguishes individual object instances of the same class, producing a separate mask for each object.

Object detection
Locates and classifies objects within an image, typically outputting bounding boxes with class labels.

Sliding window
Runs a classifier over a window slid across the image at multiple positions and scales. Simple, but computationally expensive because every location must be evaluated.

R-CNN family (R-CNN, Fast R-CNN...)
Two-stage detectors: first generate candidate regions, then classify each region with a CNN. Later variants (Fast R-CNN, Faster R-CNN) share convolutional computation across regions and learn the region proposals, greatly improving speed.

Single-shot detectors (e.g., YOLO)
Predict bounding boxes and class probabilities for the whole image in a single forward pass, trading some accuracy for real-time speed.

Image classification
Assigns a label (e.g., cat, dog) to an entire image.

Basic classifiers (e.g., SVM, Random forest, ...)


Support Vector Machines (SVM): Find the optimal hyperplane separating classes.

Random Forest: Ensemble of decision trees, robust to overfitting.

k-Nearest Neighbors (k-NN): Classifies based on the majority class of the k closest training samples.

Naive Bayes: Simple probabilistic classifier based on Bayes' theorem with strong independence assumptions.
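
A sketch with scikit-learn, assuming descriptor vectors (e.g., HOG features) already extracted into X_train/X_test with labels y_train/y_test:

from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

svm = SVC(kernel="rbf").fit(X_train, y_train)                            # max-margin hyperplane
forest = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)  # tree ensemble

print(svm.score(X_test, y_test), forest.score(X_test, y_test))           # held-out accuracy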

Deep learning classifiers (e.g., CNNs)


Convolutional Neural Networks (CNNs): Learn hierarchical features from images automatically. Powerful but
computationally expensive.

Advantages: State-of-the-art performance on many tasks, automatic feature learning.

Disadvantages: Requires large labeled datasets, computationally expensive to train.

Morphological image processing

Morphological image processing is a collection of non-linear operations related to the shape or morphology of features in an image. Morphological operations rely only on the relative ordering of pixel values, not on their numerical values, and are therefore especially suited to the processing of binary images. They can also be applied to greyscale images whose light transfer functions are unknown, where absolute pixel values are consequently of no or minor interest.

Morphological techniques probe an image with a small shape or template called a structuring element. The
structuring element is positioned at all possible locations in the image and it is compared with the corresponding
neighbourhood of pixels. Some operations test whether the element "fits" within the neighbourhood, while
others test whether it "hits" or intersects the neighbourhood:

A morphological operation on a binary image creates a new binary image in which a pixel has a non-zero value only if the test is successful at that location in the input image.

The structuring element is a small binary image, i.e. a small matrix of pixels, each with a value of zero or one:

• The matrix dimensions specify the size of the structuring element.
• The pattern of ones and zeros specifies the shape of the structuring element.
• An origin of the structuring element is usually one of its pixels, although generally the origin can be outside the structuring element.

A common practice is to have odd dimensions of the structuring matrix, with the origin defined as the centre of the matrix. Structuring elements play the same role in morphological image processing as convolution kernels play in linear image filtering.

When a structuring element is placed in a binary image, each of its pixels is associated with the corresponding pixel of the neighbourhood under the structuring element. The structuring element is said to fit the image if, for each of its pixels set to 1, the corresponding image pixel is also 1. Similarly, a structuring element is said to hit, or intersect, an image if, for at least one of its pixels set to 1, the corresponding image pixel is also 1.

Erosion
The erosion of a binary image f by a structuring element s produces a new binary image with ones at all locations of the structuring element's origin at which s fits f, and zeros elsewhere. Erosion shrinks objects, removes small details, and breaks thin connections.
Dilation
The dilation of f by s produces a new binary image with ones at all locations of the structuring element's origin at which s hits f, and zeros elsewhere. Dilation expands objects, fills small holes, and connects nearby components.
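
A sketch of both operations with SciPy, assuming a boolean array binary_img and a 3x3 square structuring element:

import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

se = np.ones((3, 3), dtype=bool)                      # 3x3 square structuring element
eroded = binary_erosion(binary_img, structure=se)     # 1 where the SE fits the foreground
dilated = binary_dilation(binary_img, structure=se)   # 1 where the SE hits the foreground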
Compound operations
Compound operations combine erosion and dilation in sequence with the same structuring element; the two most important are opening and closing.
Opening
Opening (erosion followed by dilation with the same structuring element) generally smoothes the contour of an object, breaks narrow isthmuses, and eliminates thin protrusions.

Closing
Closing (dilation followed by erosion with the same structuring element) also tends to smooth sections of contours but, as opposed to opening, it generally fuses narrow breaks and long thin gulfs, eliminates small holes, and fills gaps in the contour.

Hit and miss transform
Uses a pair of disjoint structuring elements: one that must fit the foreground and one that must fit the background. An output pixel is set only where both tests succeed, so the transform detects specific configurations of foreground and background pixels; it is the basis of operations such as thinning and shape detection.

Morphological filtering
Morphological filtering of a binary image is conducted by considering compound operations like opening and closing as filters. They may act as filters of shape: for example, opening with a disc structuring element smooths corners from the inside, and closing with a disc smooths corners from the outside. These operations can also filter out of an image any details that are smaller than the structuring element; opening, for instance, filters the binary image at a scale defined by the structuring element's size. Only those portions of the image that fit the structuring element are passed by the filter; smaller structures are blocked and excluded from the output image. The size of the structuring element should therefore be chosen large enough to eliminate noisy details, but not so large as to damage objects of interest.

Greyscale morphology
Extends morphological operations (originally designed for binary images) to grayscale images.

Modifies image structure based on intensity values, using a structuring element (kernel) to probe the image.

Erosion
Effect: Shrinks bright regions, removes small bright details, breaks narrow connections.

Implementation: For each pixel, replaces its value with the minimum value found within the structuring
element's neighborhood.

Applications:

• Removing noise
• Separating objects
• Finding intensity valleys

Dilation
Effect: Expands bright regions, fills small holes, connects nearby objects.

Implementation: For each pixel, replaces its value with the maximum value found within the structuring
element's neighborhood.

Applications:

• Closing gaps
• Connecting broken lines
• Finding intensity peaks

Opening
Sequence: Erosion followed by dilation.

Effect: Removes small bright objects while preserving the overall shape of larger bright regions.

Applications:

Smoothing object outlines

Filtering out noise

Closing
Sequence: Dilation followed by erosion.

Effect: Fills small holes and gaps while preserving the overall shape of larger bright regions.

Applications:

• Filling small cavities
• Connecting nearby objects
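
All four greyscale operations are available in SciPy's ndimage module; a sketch, assuming a greyscale image gray (the 3x3 neighbourhood is illustrative):

from scipy.ndimage import grey_erosion, grey_dilation, grey_opening, grey_closing

eroded = grey_erosion(gray, size=(3, 3))     # local minimum over each neighbourhood
dilated = grey_dilation(gray, size=(3, 3))   # local maximum over each neighbourhood
opened = grey_opening(gray, size=(3, 3))     # erosion followed by dilation
closed = grey_closing(gray, size=(3, 3))     # dilation followed by erosion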

Important Concepts
Structuring Element (SE): A small binary or grayscale image that defines the neighborhood of pixels considered
during morphological operations. The shape and size of the SE affect the results.

Duality: Erosion and dilation are dual operations: dilating an image is equivalent to inverting the image, eroding it with the reflected structuring element, and inverting the result again. This property simplifies implementation.

Granulometry (Sieving): A series of openings with increasing SE sizes can be used to analyze the size distribution
of objects in an image.

Applications in Computer Vision


Preprocessing: Removing noise, smoothing object boundaries, enhancing contrast.

Segmentation: Watershed segmentation often uses morphological operations as a preprocessing step.

Feature Extraction: Identifying specific shapes or textures.

Object Recognition: Can be used in combination with other techniques to identify objects based on their shape
or structure.

