Lec 2
Image Segmentation
Image segmentation is a fundamental technique in
digital image processing and computer vision.
It involves partitioning a digital image into multiple
segments (regions or objects) to simplify and analyze an
image by separating it into meaningful components,
which makes image processing more efficient by
focusing on specific regions of interest.
A typical image segmentation task goes through the
following steps:
1. Groups pixels in an image based on shared characteristics like colour, intensity, or texture. It's like separating ingredients in a dish.

Why do we need Image Segmentation?
By isolating objects ("things") and backgrounds ("stuff"), image analysis becomes more efficient and accurate.
Types of Image Segmentation

3. Panoptic Segmentation
• Combines semantic + instance segmentation (labels "stuff" + "things").

4. Interactive Segmentation
• User-guided (e.g., clicks/scribbles refine masks).

5. Real-Time Segmentation
• Optimized for speed (e.g., autonomous driving).
Semantic image segmentation
In semantic image segmentation, we categorize image pixels based on their semantic meaning, not just their visual properties. This classification system often uses two main categories: Things and Stuff.

• Things: Things refer to countable objects or distinct entities in an image with clear boundaries, like people, flowers, cars, and animals. The segmentation of "Things" aims to label individual pixels in the image with specific classes by delineating the boundaries of individual objects within the image.

• Stuff: Stuff refers to specific regions or areas in an image, such as the background or repeating patterns of similar materials that cannot be counted, like road, sky, and grass. These regions may not have clear boundaries, but they play a crucial role in understanding the overall context of an image. The segmentation of "Stuff" involves grouping pixels in an image into clearly identifiable regions based on common properties like colour, texture, or context.
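As a concrete illustration (not part of the lecture), the short sketch below runs an off-the-shelf semantic segmentation model, DeepLabV3 from torchvision, and produces exactly one class label per pixel, covering both "things" and "stuff" but with no notion of individual instances. The weights argument assumes torchvision 0.13 or newer, and "street.jpg" is a placeholder filename.

import torch
from torchvision import models, transforms
from PIL import Image

# Pretrained DeepLabV3; the "DEFAULT" weights string assumes torchvision >= 0.13
model = models.segmentation.deeplabv3_resnet50(weights="DEFAULT")
model.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("street.jpg").convert("RGB")   # placeholder image path
batch = preprocess(img).unsqueeze(0)            # shape: [1, 3, H, W]

with torch.no_grad():
    out = model(batch)["out"]                   # shape: [1, num_classes, H, W]

# Every pixel gets exactly one semantic class (person, car, background, ...),
# with no distinction between separate object instances.
mask = out.argmax(dim=1).squeeze(0)             # shape: [H, W]
print(mask.shape, mask.unique())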
• Instance segmentation inverts the priorities of
semantic segmentation: whereas semantic
segmentation algorithms predict only semantic
classification of each pixel (with no regard for
individual instances), instance segmentation
delineates the exact shape of each separate
object instance.
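For contrast, a minimal sketch with an instance segmentation model such as Mask R-CNN (taken from torchvision purely as an illustration; the filename and score threshold are placeholders) returns a separate mask, label, and score for each detected object rather than a single per-pixel label map.

import torch
from torchvision import models, transforms
from PIL import Image

# Pretrained Mask R-CNN; the "DEFAULT" weights string assumes torchvision >= 0.13
model = models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = transforms.ToTensor()(Image.open("street.jpg").convert("RGB"))

with torch.no_grad():
    pred = model([img])[0]   # one prediction dict per input image

# Each detected object ("thing") gets its own soft mask of shape [1, H, W]
for score, label, mask in zip(pred["scores"], pred["labels"], pred["masks"]):
    if score > 0.5:                       # keep confident detections only
        binary_mask = mask[0] > 0.5       # threshold the soft mask
        print(f"class {label.item()}: {int(binary_mask.sum())} pixels")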
Computational Complexity
Some segmentation techniques, particularly
those based on deep learning models, require
substantial computational resources and
processing time. Ensuring efficient and
real-time segmentation in resource-constrained
environments is an ongoing challenge.
Traditional Segmentation techniques

Clustering-based segmentation
One of the most commonly used clustering algorithms is k-means. Here, the k represents the number of clusters (not to be confused with k-nearest neighbor). Let's understand how k-means works (a minimal sketch follows the steps below):
1. First, randomly select k initial clusters.
2. Randomly assign each data point to any one of the k clusters.
3. Calculate the centers of these clusters.
4. Calculate the distance of all the points from the center of each cluster.
5. Depending on this distance, the points are reassigned to the nearest cluster.
6. Calculate the centers of the newly formed clusters.
7. Finally, repeat steps (4), (5) and (6) until either the centers of the clusters do not change or we reach the set number of iterations.
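A minimal NumPy sketch of colour-based k-means segmentation is shown below. It initialises the centres from k random pixels (a common shortcut for steps 1-3) and then repeats steps 4-6; the image filename, k = 4, and the iteration limit are illustrative choices rather than values from the lecture.

import numpy as np
from PIL import Image

img = np.asarray(Image.open("photo.jpg").convert("RGB"), dtype=np.float64)
pixels = img.reshape(-1, 3)                    # each pixel is a point in RGB space

k, max_iters = 4, 20
rng = np.random.default_rng(0)
centers = pixels[rng.choice(len(pixels), size=k, replace=False)]  # initial centres

for _ in range(max_iters):                     # repeat steps (4), (5), (6)
    # distance of every pixel to every cluster centre
    dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)              # reassign each pixel to the nearest cluster
    new_centers = np.array([pixels[labels == i].mean(axis=0) if (labels == i).any()
                            else centers[i] for i in range(k)])
    if np.allclose(new_centers, centers):      # stop when the centres no longer move
        break
    centers = new_centers

# Replace each pixel with its cluster centre to visualise the segmentation
segmented = centers[labels].reshape(img.shape).astype(np.uint8)
Image.fromarray(segmented).save("segmented.png")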
Difference between Semantic Segmentation (SS) and Instance Segmentation (IS)
U-Net
U-Net is a convolutional neural network (CNN) architecture that was specifically designed for biomedical image segmentation tasks. Developed in 2015, U-Net has become one of the go-to architectures for various segmentation tasks due to its effectiveness and efficiency. You can find the original paper online for more details.
The U-Net architecture is characterized by its U-shaped structure, which gives it its name. It consists of an encoding path and a decoding path.
U-Net architecture
Expansive Path (Decoder):
• The expansive path, or decoder, involves upsampling the feature maps to gradually recover the spatial information lost during the downsampling stage. This path typically includes transposed convolutions or upsampling layers.
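For instance, in PyTorch a transposed convolution with stride 2 doubles the spatial resolution of a feature map; a tiny illustration (the tensor shapes are arbitrary):

import torch
import torch.nn as nn

x = torch.randn(1, 128, 32, 32)                 # [batch, channels, H, W]
up = nn.ConvTranspose2d(in_channels=128, out_channels=64,
                        kernel_size=2, stride=2)
print(up(x).shape)                              # torch.Size([1, 64, 64, 64])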
Why Convolutions Before Pooling?
Feature Hierarchies:
• Convolutions extract localized features (edges, textures, patterns) at the current resolution.
• Stacking multiple convolutions (e.g., 3×3 kernels) allows the network to capture increasingly complex features (e.g., from edges to object parts).
Non-Linearity (ReLU):
• ReLU activations introduce non-linearity, enabling the network to model complex relationships.
Channel Expansion:
• Convolutions often increase the number of channels (feature maps) while reducing spatial dimensions, balancing the trade-off between resolution and depth.
Why Max-Pooling After Convolutions?
• Spatial Invariance: Max-pooling (e.g., 2×2 with stride 2) downsamples feature maps, making the network invariant to small translations (useful for object localization).
• Dimensionality Reduction: Reduces computational cost and memory usage by shrinking the spatial size.
• Highlight Dominant Features: Max-pooling retains the most activated features, suppressing noise and irrelevant details.
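Putting the two ideas together, a typical encoder stage stacks 3×3 convolutions with ReLU before a 2×2 max-pool. A small PyTorch sketch of this pattern (channel counts and input size chosen only for illustration):

import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),   # local features, keeps H x W
    nn.ReLU(inplace=True),                        # non-linearity
    nn.Conv2d(64, 64, kernel_size=3, padding=1),  # stacked conv: larger receptive field
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),        # downsample to H/2 x W/2
)

x = torch.randn(1, 3, 128, 128)
print(block(x).shape)                             # torch.Size([1, 64, 64, 64])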
U-Net architecture
• Role of Skip Connections: The key innovation in the U-Net architecture is the use of skip connections that connect the contracting path with the corresponding layers in the expansive path. These skip connections help in preserving fine-grained details and spatial information during the upsampling process.
Loss Function:
• The network is trained using a loss function such as cross-entropy loss or Dice coefficient loss, which compares the predicted segmentation masks with the ground-truth masks to optimize the network parameters.
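A compact, illustrative PyTorch sketch of a U-Net-style network, with skip connections implemented via torch.cat and a cross-entropy training loss, is shown below; the depth, channel sizes, and class count are simplified and are not those of the original paper.

import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, as in each U-Net stage
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.enc1 = double_conv(3, 32)
        self.enc2 = double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = double_conv(128, 64)        # 128 = 64 (upsampled) + 64 (skip)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = double_conv(64, 32)         # 64 = 32 (upsampled) + 32 (skip)
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)                        # contracting path
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)                     # per-pixel class logits

model = TinyUNet(num_classes=2)
x = torch.randn(1, 3, 128, 128)                  # dummy image
target = torch.randint(0, 2, (1, 128, 128))      # dummy ground-truth mask
logits = model(x)                                # shape: [1, 2, 128, 128]
loss = nn.CrossEntropyLoss()(logits, target)
print(logits.shape, loss.item())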
Evaluating Segmentation Performance with the DICE Metric
When it comes to assessing the performance of a segmentation model, it's crucial to have reliable metrics that can quantify the accuracy and quality of the segmentation results.

One widely used metric for this purpose is the DICE coefficient, also known as the Dice similarity coefficient or Dice index.

The DICE metric provides a measure of the similarity between two sets, in this case the predicted segmentation and the ground-truth segmentation. It calculates the overlap between the two sets, taking into account both the false positives and false negatives.
Mathematically, the DICE score is defined as:
DICE score = 2 * |A ∩ B| / (|A| + |B|)
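A small NumPy sketch of this formula for binary masks is shown below; the epsilon term is a common practical safeguard against division by zero when both masks are empty, not part of the definition above.

import numpy as np

def dice_score(pred, truth, eps=1e-7):
    # DICE = 2 * |A ∩ B| / (|A| + |B|) for two binary masks
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps)

# Toy example: two 4x4 masks with 3 foreground pixels each, overlapping on 2
pred = np.array([[1, 1, 0, 0],
                 [1, 0, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
truth = np.array([[1, 1, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 0]])
print(dice_score(pred, truth))   # 2 * 2 / (3 + 3) ≈ 0.667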