Lec 2 (Image Segmentation)

Image segmentation is a key technique in digital image processing that divides an image into meaningful segments for easier analysis. It includes various types such as semantic, instance, and panoptic segmentation, each serving different purposes in applications like medical imaging, autonomous systems, and augmented reality. Challenges in image segmentation include ambiguity, over- and under-segmentation, and computational complexity, with techniques like U-Net architecture and the DICE metric used for effective segmentation and performance evaluation.

Mathematics for ML

Lec 2

Image Segmentation
Image segmentation is a fundamental technique in digital image processing and computer vision.
It involves partitioning a digital image into multiple segments (regions or objects) to simplify and analyze the image by separating it into meaningful components. This makes image processing more efficient by focusing on specific regions of interest.
A typical image segmentation task goes through the following steps:
1. Groups pixels in an image based on shared characteristics like colour, intensity, or texture.
2. Assigns a label to each pixel, indicating that it belongs to a specific segment or object.
3. The resulting output is a segmented image, often visualized as a mask or overlay highlighting the different segments.
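To make these steps concrete, here is a minimal sketch (added for illustration, not from the slides) that groups pixels by intensity with a simple threshold and labels each connected region, producing a per-pixel mask. The toy image, the 0.5 threshold, and the use of scipy.ndimage are illustrative assumptions.

    import numpy as np
    from scipy import ndimage

    # Toy grayscale image: two bright blobs on a dark background
    img = np.zeros((8, 8), dtype=float)
    img[1:3, 1:3] = 0.9   # first object
    img[5:7, 4:7] = 0.8   # second object

    # Step 1: group pixels by a shared characteristic (intensity > threshold)
    mask = img > 0.5

    # Step 2: assign a label to each pixel (0 = background, 1..N = objects)
    labels, num_objects = ndimage.label(mask)

    # Step 3: the labelled array is the segmented output (a mask/overlay)
    print(num_objects)    # -> 2
    print(labels)         # per-pixel segment labels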
Why do we need Image Segmentation?
Image segmentation is crucial in computer vision tasks because it breaks down complex images into manageable pieces. It's like separating ingredients in a dish. By isolating objects (things) and backgrounds (stuff), image analysis becomes more efficient and accurate. This is essential for tasks like self-driving cars identifying objects or medical imaging analyzing tumours.
Types of Image Segmentation

1. Semantic Segmentation
• Classifies each pixel into a category (e.g., "car," "road").
• Output: Single mask per class.

2. Instance Segmentation
• Distinguishes between individual objects of the same class (e.g., "car1," "car2").
• Output: Unique masks for each object.

3. Panoptic Segmentation
• Combines semantic + instance segmentation (labels "stuff" + "things").

4. Interactive Segmentation
• User-guided (e.g., clicks/scribbles refine masks).

5. Real-Time Segmentation
• Optimized for speed (e.g., autonomous driving).
Semantic image segmentation
In semantic image segmentation, we categorize image pixels based on their semantic meaning, not just their visual properties. This classification system often uses two main categories: Things and Stuff.
• Things: Things refer to countable objects or distinct entities in an image with clear boundaries, like people, flowers, cars, and animals. The segmentation of "Things" aims to label individual pixels in the image with specific classes by delineating the boundaries of individual objects within the image.
• Stuff: Stuff refers to regions or areas in an image, such as the background or repeating patterns of similar materials, which cannot be counted, like road, sky, and grass. These regions may not have clear boundaries but play a crucial role in understanding the overall context of an image. The segmentation of "Stuff" involves grouping pixels in an image into clearly identifiable regions based on common properties like colour, texture, or context.
Instance segmentation
• Instance segmentation inverts the priorities of semantic segmentation: whereas semantic segmentation algorithms predict only the semantic classification of each pixel (with no regard for individual instances), instance segmentation delineates the exact shape of each separate object instance.
• Instance segmentation isolates things from stuff (which it ignores) and can thus be understood as an evolved form of object detection that outputs a precise segmentation mask instead of an approximate bounding box.
• It's a more difficult task than semantic segmentation: even when things of the same class are touching or even overlapping one another, instance segmentation models must be able to separate and determine the shape of each one, whereas semantic segmentation models can simply lump them together.
• Consider, for example, how the two different models treat the parked cars in this image of a city street.
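As a hedged illustration (not part of the lecture), a pretrained instance segmentation model such as Mask R-CNN from torchvision can return one mask per detected object. The model constructor arguments and the input file name are assumptions and may differ across torchvision versions.

    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor
    from PIL import Image

    # Load a pretrained Mask R-CNN (argument style depends on torchvision version)
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    # "street.jpg" is a hypothetical input image of a city street
    img = to_tensor(Image.open("street.jpg").convert("RGB"))

    with torch.no_grad():
        out = model([img])[0]          # one result dict per input image

    # Each detected instance gets its own mask, class label, and score
    masks = out["masks"] > 0.5         # (num_instances, 1, H, W) boolean masks
    labels = out["labels"]             # class id per instance (e.g., several cars)
    scores = out["scores"]             # confidence per instance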
Panoptic segmentation
Panoptic segmentation aims to merge semantic and instance segmentation, providing a comprehensive understanding of both "stuff" classes (e.g., sky, road) and "thing" classes (e.g., people, cars) in an image. It assigns a unique label to each pixel while distinguishing between object instances. Hybrid approaches combining deep learning-based semantic segmentation and instance segmentation techniques are commonly used for panoptic segmentation.
Applications of image segmentation

Object Recognition and Tracking
Image segmentation is fundamental to object recognition and tracking tasks. By segmenting objects from the background, it becomes easier to extract relevant features and classify objects accurately. Object tracking algorithms utilize segmentation to track objects over time in video sequences.

Medical Imaging
Image segmentation plays a critical role in medical imaging applications, such as tumor detection, organ segmentation, and disease diagnosis. Accurate segmentation of anatomical structures and abnormalities assists in surgical planning, treatment assessment, and computer-aided diagnosis.

Autonomous Systems
Segmentation is essential in autonomous systems, including autonomous vehicles and robots. It helps in scene understanding, obstacle detection, and navigation. By segmenting objects and the surrounding environment, autonomous systems can make informed decisions and navigate safely.

Augmented Reality
Image segmentation enables the integration of virtual content with real-world scenes in augmented reality applications. By segmenting objects or regions of interest, augmented reality systems can overlay digital information precisely on the appropriate areas, enhancing user experiences.

Image Editing and Manipulation
Image segmentation is a vital component of image editing tools. It allows for precise selection and isolation of objects or regions for various editing tasks, such as background removal, object replacement, and image compositing.
Challenges of image segmentation

Ambiguity and Noise
Image segmentation can be challenging when dealing with ambiguous boundaries or regions with similar characteristics. Variations in lighting conditions, noise, and texture can affect the accuracy of segmentation algorithms.

Over-Segmentation and Under-Segmentation
Over-segmentation occurs when an image is divided into excessive regions, while under-segmentation happens when multiple objects are grouped into a single region. Balancing the trade-off between fine-grained segmentation and merging similar objects is a common challenge.

Computational Complexity
Some segmentation techniques, particularly those based on deep learning models, require substantial computational resources and processing time. Ensuring efficient and real-time segmentation in resource-constrained environments is an ongoing challenge.
Traditional Segmentation techniques
Clustering-based method
Clustering-based segmentation
One of the most commonly used clustering algorithms is k-means. Here, the k represents the number of clusters (not to be confused with k-nearest neighbor). Let's understand how k-means works:
1. First, randomly select k initial clusters.
2. Randomly assign each data point to any one of the k clusters.
3. Calculate the centers of these clusters.
4. Calculate the distance of all the points from the center of each cluster.
5. Depending on this distance, the points are reassigned to the nearest cluster.
6. Calculate the centers of the newly formed clusters.
7. Finally, repeat steps (4), (5) and (6) until either the centers of the clusters do not change or we reach the set number of iterations.
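A minimal sketch of these steps applied to image pixels, written in plain NumPy (added for illustration). Using raw intensities as the only feature, k = 2, and a fixed iteration cap are simplifying assumptions.

    import numpy as np

    def kmeans_segment(img, k=2, n_iters=10, seed=0):
        """Cluster pixel intensities with k-means and return a label map."""
        rng = np.random.default_rng(seed)
        pixels = img.reshape(-1, 1).astype(float)        # one feature: intensity

        # Steps 1-3: pick k initial clusters and compute their centers
        labels = rng.integers(0, k, size=len(pixels))
        centers = np.array([pixels[labels == c].mean() for c in range(k)])

        for _ in range(n_iters):                          # step 7: repeat
            # Steps 4-5: distance to each center, reassign to the nearest
            dists = np.abs(pixels - centers.reshape(1, -1))
            new_labels = dists.argmin(axis=1)
            # Step 6: recompute the centers of the newly formed clusters
            new_centers = np.array([pixels[new_labels == c].mean()
                                    if np.any(new_labels == c) else centers[c]
                                    for c in range(k)])
            if np.array_equal(new_labels, labels):        # assignments stopped changing
                break
            labels, centers = new_labels, new_centers

        return labels.reshape(img.shape)

    # Usage: segment a toy grayscale image into 2 clusters
    toy = np.array([[0.1, 0.2, 0.9], [0.1, 0.8, 0.9], [0.2, 0.9, 0.8]])
    print(kmeans_segment(toy, k=2))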
Semantic segmentation
Difference between SS and IS
UNet architecture
U-Net is a convolutional neural network (CNN) architecture that was specifically designed for biomedical image segmentation tasks. Developed in 2015, U-Net has become one of the go-to architectures for various segmentation tasks due to its effectiveness and efficiency. You can find the original paper online.
The U-Net architecture is characterized by its U-shaped structure, which gives it its name. It consists of an encoding path and a decoding path.
• Encoding Path: This part of the network captures the context of the input image by using a series of convolutional and max-pooling layers to downsample the spatial dimensions. It "contracts" the original images.
• Decoding Path: The decoding path uses upsampling and convolutional layers to produce a segmentation map that has the same spatial dimensions as the input image. It "expands" the contracted images.
UNet architecture
U-Net's strength in segmentation comes from its use of skip connections (grey arrows in Figure 1), which connect the encoding and decoding paths by merging features. This helps retain spatial details lost during downsampling, preserving the image's local and global context. By maintaining this spatial information, U-Net achieves more accurate segmentation masks. The skip connections assist the network in grasping the relationships between image parts, leading to improved segmentation results.
Convolutions for downsampling
What Are Skip Connections?
Skip connections are a neural network design technique that creates shortcuts between layers, allowing data to bypass certain operations (like convolutions or pooling). They were popularized by architectures like ResNet (Residual Networks) and U-Net, addressing key challenges in deep learning.
• Definition: A skip connection (or "shortcut") forwards the output of an earlier layer directly to a later layer, skipping intermediate operations.
• Mechanism:
• In U-Net, they connect encoder layers to decoder layers via concatenation.
• In ResNet, they use element-wise addition to combine features.
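A small PyTorch sketch (added for illustration, not from the slides) contrasting the two mechanisms: U-Net-style concatenation along the channel dimension versus ResNet-style element-wise addition. The tensor shapes are made up for the example.

    import torch

    # Two feature maps of the same spatial size: (batch, channels, H, W)
    encoder_feat = torch.randn(1, 64, 32, 32)   # earlier (encoder) layer output
    decoder_feat = torch.randn(1, 64, 32, 32)   # later (decoder) layer output

    # U-Net-style skip: concatenate along the channel dimension
    unet_skip = torch.cat([encoder_feat, decoder_feat], dim=1)
    print(unet_skip.shape)    # torch.Size([1, 128, 32, 32])

    # ResNet-style skip: element-wise addition (shapes must match exactly)
    resnet_skip = encoder_feat + decoder_feat
    print(resnet_skip.shape)  # torch.Size([1, 64, 32, 32])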
UNet architecture

Contracting Path (Encoder):
• The network consists of a contracting path that resembles a typical CNN encoder. This path involves a series of convolutional and pooling layers that progressively downsample the input image to extract high-level features.

Expansive Path (Decoder):
• The expansive path, or decoder, involves upsampling the feature maps to gradually recover the spatial information lost during the downsampling stage. This path typically includes transposed convolutions or upsampling layers.
Why Convolutions Before Pooling?

Feature Hierarchies:
• Convolutions extract localized features (edges, textures, patterns) at the current resolution.
• Stacking multiple convolutions (e.g., 3×3 kernels) allows the network to capture increasingly complex features (e.g., from edges to object parts).

Non-Linearity (ReLU):
• ReLU activations introduce non-linearity, enabling the network to model complex relationships.

Channel Expansion:
• Convolutions often increase the number of channels (feature maps) while reducing spatial dimensions, balancing the trade-off between resolution and depth.
Why Max-Pooling After Convolutions?
• Spatial Invariance:
• Max-pooling (e.g., 2×2 with stride 2) downsamples feature maps, making the network invariant to small translations (useful for object localization).
• Dimensionality Reduction:
• Reduces computational cost and memory usage by shrinking the spatial size.
• Highlight Dominant Features:
• Max-pooling retains the most activated features, suppressing noise and irrelevant details.
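To make the shape bookkeeping concrete, here is a short PyTorch sketch (an illustrative addition): two 3×3 convolutions expand the channels while keeping the resolution, then a 2×2 max-pool halves the spatial size. The channel counts and input size are assumptions.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 1, 64, 64)                       # (batch, channels, H, W)

    conv_block = nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=3, padding=1),     # extract features, expand channels
        nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, padding=1),    # stack convs for richer features
        nn.ReLU(),
    )
    pool = nn.MaxPool2d(kernel_size=2, stride=2)        # downsample after the convolutions

    feats = conv_block(x)
    print(feats.shape)        # torch.Size([1, 64, 64, 64]) - same resolution, more channels
    print(pool(feats).shape)  # torch.Size([1, 64, 32, 32]) - half the spatial size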
UNet architecture
• Role of Skip Connections: The key innovation in the U-Net architecture is the use of skip connections that connect the contracting path with the corresponding layers in the expansive path. These skip connections help in preserving fine-grained details and spatial information during the upsampling process.

Skip Connection Mechanism:
• At each level of the encoder (contracting path), the feature maps are concatenated with the corresponding feature maps in the decoder (expansive path). This enables the network to directly access and reuse the low-level features from the encoder while processing higher-level features in the decoder.
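The sketch below (an added illustration, not code from the lecture) puts these pieces together in PyTorch: one downsampling level, one upsampling level, and a skip connection that concatenates encoder features into the decoder. The channel sizes and single-level depth are simplifying assumptions.

    import torch
    import torch.nn as nn

    class TinyUNet(nn.Module):
        """One-level U-Net-style model with a concatenation skip connection."""
        def __init__(self, in_ch=1, num_classes=2):
            super().__init__()
            self.enc = nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU())
            self.pool = nn.MaxPool2d(2)
            self.bottleneck = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
            self.up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
            # decoder sees 64 (upsampled) + 64 (skip) = 128 input channels
            self.dec = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU())
            self.head = nn.Conv2d(64, num_classes, kernel_size=1)  # pixel-wise class scores

        def forward(self, x):
            e = self.enc(x)                      # contracting path features
            b = self.bottleneck(self.pool(e))    # downsampled representation
            u = self.up(b)                       # expansive path: upsample back
            u = torch.cat([e, u], dim=1)         # skip connection: reuse encoder features
            return self.head(self.dec(u))        # per-pixel logits

    model = TinyUNet()
    out = model(torch.randn(1, 1, 64, 64))
    print(out.shape)    # torch.Size([1, 2, 64, 64])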
Benefits of Skip Connections:
• Gradient Flow: Skip connections facilitate better gradient flow during training by providing shortcuts for gradients to propagate through the network. This helps in mitigating the vanishing gradient problem and enables faster convergence.
• Preservation of Spatial Information: By directly connecting low-level features to higher-level features, skip connections aid in preserving spatial details and contextual information crucial for accurate segmentation.
UNet architecture

Final Layer:
• The final layer of the U-Net typically consists of a convolutional layer with a softmax activation function to produce pixel-wise segmentation masks, where each pixel is classified into different classes or categories.

Loss Function:
• The network is trained using a loss function such as cross-entropy loss or Dice coefficient loss, which compares the predicted segmentation masks with the ground truth masks to optimize the network parameters.
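As a hedged illustration of what such losses can look like in PyTorch (the function names, foreground-class choice, and smoothing constant are my assumptions, not from the slides): cross-entropy applied to per-pixel logits, and a simple soft Dice loss on a binary mask.

    import torch
    import torch.nn.functional as F

    logits = torch.randn(1, 2, 64, 64)                 # per-pixel class scores from the network
    target = torch.randint(0, 2, (1, 64, 64))          # ground truth class per pixel

    # Pixel-wise cross-entropy loss (softmax is applied internally)
    ce_loss = F.cross_entropy(logits, target)

    def soft_dice_loss(pred_probs, target_mask, eps=1e-6):
        """1 - Dice coefficient between a predicted probability mask and a binary mask."""
        inter = (pred_probs * target_mask).sum()
        return 1 - (2 * inter + eps) / (pred_probs.sum() + target_mask.sum() + eps)

    # Probability of the foreground class for each pixel
    fg_prob = F.softmax(logits, dim=1)[:, 1]
    dice_loss = soft_dice_loss(fg_prob, target.float())
    print(ce_loss.item(), dice_loss.item())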
Evaluating Segmentation Performance with the DICE Metric
When it comes to assessing the performance of a segmentation model, it's crucial to have reliable metrics that can quantify the accuracy and quality of the segmentation results. One widely used metric for this purpose is the DICE coefficient, also known as the Dice similarity coefficient or Dice index.
The DICE metric provides a measure of the
similarity between two sets, in this case,
the predicted segmentation and the
ground truth segmentation. It calculates
the overlap between the two sets, taking into
account both the false positives and false
negatives.
Mathematically, the DICE score is defined as:

DICE score = 2 * |A ∩ B| / (|A| + |B|)

And can be understood as:

Dice score = 2 * (number of common elements) / (number of elements in set A + number of elements in set B)

DICE score
The DICE coefficient ranges from 0 to 1, where a value closer to 1 indicates a higher degree of overlap and thus better segmentation performance. A DICE score of 1 would mean a perfect overlap between the predicted and ground truth segmentations, while a score of 0 would indicate no overlap at all.
DICE score
In our case of segmentation, we are comparing two matrices.
• Consider matrix A as representing the predicted mask, which is one-dimensional since it has only one channel. This matrix contains elements that are either 0 or 1.
• Matrix B, the reference mask, likewise contains elements that are either 0 or 1.
• When matrix A is multiplied element-wise by matrix B, the resulting matrix will have a value of 1 at position (i, j) only if both matrix A and matrix B have a value of 1 at that same position (i, j).
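A minimal NumPy sketch of this computation (added here for illustration): the element-wise product of the two binary masks gives the overlap, from which the DICE score follows directly. The example masks are made up.

    import numpy as np

    def dice_score(pred, ref):
        """DICE = 2 * |A ∩ B| / (|A| + |B|) for binary masks of 0s and 1s."""
        intersection = (pred * ref).sum()      # 1 only where both masks are 1
        return 2 * intersection / (pred.sum() + ref.sum())

    A = np.array([[1, 1, 0],                   # predicted mask
                  [0, 1, 0],
                  [0, 0, 0]])
    B = np.array([[1, 0, 0],                   # reference (ground truth) mask
                  [0, 1, 0],
                  [0, 0, 1]])

    print(dice_score(A, B))    # 2 * 2 / (3 + 3) = 0.666...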
Thank you
