
Image Segmentation using Deep Learning

Sudip Bhattacharya, BIT Durg


July 2024

1 Image Segmentation using Deep Learning


Image segmentation using deep learning is a technique where an image is divided
into multiple segments or regions, each corresponding to a different object or
part of the object within the image. The goal is to assign a label to every pixel
in the image, effectively partitioning the image into meaningful regions. Deep
learning, particularly using convolutional neural networks (CNNs), has become
the standard approach for this task due to its ability to learn complex features
directly from data.

Figure 1: Computer Vision Tasks

2 Types of Image Segmentation


• Semantic Segmentation: Every pixel in the image is classified into a
predefined category. The categories represent different classes, but all
instances of the same class share the same label.
• Instance Segmentation: Similar to semantic segmentation, but each instance
of an object is identified separately.

• Panoptic Segmentation: Combines semantic and instance segmentation.
It labels all pixels in the image, distinguishing between different instances
of objects, and also segments background classes that don’t have individual
instances.

3 Semantic Segmentation
The difference between semantic, instance, and panoptic segmentation lies in
how they treat the things (countable objects) and the stuff (uncountable
regions such as road or sky) in the image.
Semantic segmentation treats everything in an image as uncountable stuff. It
analyzes each pixel and assigns it a class label based on the texture or
category it represents. For example, in the first cell of the figure for
Semantic Segmentation, the image contains two cars, three pedestrians, a road,
and the sky. The two cars belong to the same category, as do the three
pedestrians.
Semantic segmentation assigns one class label to each of these categories.
However, its output cannot differentiate or count the two cars or the three
pedestrians separately. Commonly used semantic segmentation techniques include
SegNet, U-Net, DeconvNet, and FCNs.
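
As a minimal sketch of the per-pixel labeling described above, the snippet
below picks the highest-scoring class at every pixel. The class names, the
score values, and the function name semantic_labels are invented for
illustration; in a real model the scores would come from the network's output
layer.

```python
# Hypothetical class list and scores, invented for illustration only.
CLASSES = ["sky", "road", "car"]


def semantic_labels(scores):
    """Turn per-class score maps into one class index per pixel.

    scores[c][y][x] is the score of class c at pixel (y, x).
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# A 2x4 toy image: sky on top, a road with two separate cars below.
scores = [
    [[0.9, 0.8, 0.9, 0.7], [0.1, 0.1, 0.1, 0.1]],  # sky
    [[0.1, 0.1, 0.0, 0.2], [0.2, 0.7, 0.8, 0.3]],  # road
    [[0.0, 0.1, 0.1, 0.1], [0.7, 0.2, 0.1, 0.6]],  # car
]
label_map = semantic_labels(scores)
```

Both cars in the bottom row receive the same index (the one for "car"), so the
label map alone cannot tell how many cars are present.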

Figure 2: Semantic Segmentation

4 Instance Segmentation
Instance segmentation typically deals with countable things. It detects each
object, or instance, of a class present in an image and assigns it a separate
mask or bounding box with a unique identifier.
For example, instance segmentation would identify the two cars in the previous
example separately as, say, car1 and car2. Commonly used instance segmentation
techniques are Mask R-CNN (which extends the Faster R-CNN object detector),
PANet, and YOLACT.
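
The sketch below illustrates the idea of one mask and one bounding box per
identifier. It assumes a simple encoding chosen for this example only: an
instance map that stores a positive id for each object and 0 elsewhere. The
function name instance_boxes is hypothetical.

```python
def instance_boxes(instance_map):
    """Return {instance_id: (x_min, y_min, x_max, y_max)} for ids > 0.

    instance_map[y][x] holds a unique positive id per object and 0 for
    pixels that belong to no instance (an encoding assumed for this sketch).
    """
    boxes = {}
    for y, row in enumerate(instance_map):
        for x, inst in enumerate(row):
            if inst == 0:
                continue
            if inst not in boxes:
                boxes[inst] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[inst]
                boxes[inst] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


# Two cars (ids 1 and 2) of the same class, kept apart by their identifiers.
instance_map = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 2, 2],
    [1, 1, 0, 0, 2],
]
boxes = instance_boxes(instance_map)
num_cars = len(boxes)
```

Unlike a semantic label map, this representation makes counting trivial: the
number of distinct ids is the number of objects.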

Figure 3: Instance Segmentation

5 Panoptic Segmentation
Panoptic segmentation combines the two previous approaches. It presents a
unified image segmentation approach where each pixel in a scene is assigned a
semantic label (as in semantic segmentation) and a unique instance identifier
(as in instance segmentation).
Panoptic segmentation assigns each pixel exactly one pair of a semantic label
and an instance identifier. However, predicted objects can have overlapping
pixels. In this case, panoptic segmentation resolves the discrepancy by
favoring the object instance, as the priority is to identify each thing rather
than stuff. Many panoptic segmentation models build on the Mask R-CNN method.
Panoptic architectures include UPSNet, FPSNet, EPSNet, and VPSNet.
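
A minimal sketch of this merging step follows. It is not taken from any of the
networks named above. It assumes that instance masks arrive ordered from most
to least confident, that an earlier instance keeps contested pixels, and that
instance id 0 marks stuff; all three are assumptions of this example, as is
the function name merge_panoptic.

```python
def merge_panoptic(semantic_map, instances):
    """Combine a semantic label map with instance masks.

    semantic_map[y][x] is a class index. `instances` is a list of
    (class_index, mask) pairs ordered from most to least confident, where
    mask[y][x] is True for pixels of that instance. Returns a map of
    (class_index, instance_id) pairs; instance_id 0 marks stuff pixels.
    """
    height, width = len(semantic_map), len(semantic_map[0])
    panoptic = [[(semantic_map[y][x], 0) for x in range(width)] for y in range(height)]
    for instance_id, (class_index, mask) in enumerate(instances, start=1):
        for y in range(height):
            for x in range(width):
                # Things override stuff; an earlier instance keeps contested pixels.
                if mask[y][x] and panoptic[y][x][1] == 0:
                    panoptic[y][x] = (class_index, instance_id)
    return panoptic


ROAD, CAR = 1, 2  # hypothetical class indices
semantic_map = [[ROAD, CAR, CAR, CAR, ROAD]]
car_a = [[False, True, True, False, False]]
car_b = [[False, False, True, True, False]]  # overlaps car_a at x = 2
panoptic = merge_panoptic(semantic_map, [(CAR, car_a), (CAR, car_b)])
```

Every pixel ends up with exactly one (label, id) pair, and the pixel claimed by
both cars goes to the first, more confident instance.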

6 Datasets - Pascal VOC and CityScapes


Some important benchmark datasets used for image segmentation are:
• PASCAL VOC: 11,530 images, 20 categories; general object detection
• COCO: 330,000 images, 80 categories; complex object detection
• Cityscapes: 5,000 finely annotated images (plus 20,000 coarsely annotated
images), 30 classes (19 used for semantic segmentation, including vehicles,
pedestrians, and road); urban scene understanding

Figure 4: Panoptic Segmentation

7 Evaluation Metrics
Commonly used evaluation metrics for this task include:
• Intersection over Union (IoU)
• Mean Pixel Accuracy (mPA)
• Pixel Accuracy
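
These three metrics can all be derived from a confusion matrix of per-pixel
counts. The sketch below uses the usual definitions: pixel accuracy is the
share of correctly labeled pixels, mean pixel accuracy averages the per-class
accuracy, and the IoU of a class is TP / (TP + FP + FN). The function names and
the toy label maps are made up for illustration.

```python
def confusion_matrix(truth, pred, num_classes):
    """counts[t][p] = number of pixels with true class t predicted as p."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            counts[t][p] += 1
    return counts


def pixel_accuracy(counts):
    correct = sum(counts[c][c] for c in range(len(counts)))
    total = sum(sum(row) for row in counts)
    return correct / total


def mean_pixel_accuracy(counts):
    """Average per-class accuracy, skipping classes absent from the truth."""
    per_class = [
        counts[c][c] / sum(counts[c]) for c in range(len(counts)) if sum(counts[c]) > 0
    ]
    return sum(per_class) / len(per_class)


def iou_per_class(counts):
    """IoU = TP / (TP + FP + FN) per class; None if the class never occurs."""
    n = len(counts)
    ious = []
    for c in range(n):
        tp = counts[c][c]
        fp = sum(counts[t][c] for t in range(n)) - tp
        fn = sum(counts[c]) - tp
        union = tp + fp + fn
        ious.append(tp / union if union else None)
    return ious


# Toy example: class 1 covers only two pixels, and one of them is missed.
truth = [[0, 0, 0, 0], [0, 0, 1, 1]]
pred = [[0, 0, 0, 0], [0, 0, 0, 1]]
counts = confusion_matrix(truth, pred, num_classes=2)
pa = pixel_accuracy(counts)
mpa = mean_pixel_accuracy(counts)
ious = iou_per_class(counts)
```

In this toy case pixel accuracy stays high because the large class dominates,
while mean pixel accuracy and the IoU of the small class drop noticeably. This
is why IoU and mPA are usually reported alongside plain pixel accuracy.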

8 U-Net and Variants


U-Net is a convolutional neural network (CNN) architecture developed primarily
for biomedical image segmentation. It was introduced by Olaf Ronneberger,
Philipp Fischer, and Thomas Brox in their 2015 paper titled "U-Net:
Convolutional Networks for Biomedical Image Segmentation." The architecture is
designed to perform pixel-wise segmentation, making it highly effective for
tasks where precise delineation of object boundaries is crucial, such as in
medical imaging. Key features of U-Net are:

• Symmetric Architecture: a symmetric "U" shape, consisting of a contracting
path (encoder) and an expansive path (decoder)
• Skip Connections: allow the network to combine high-level semantic
information with low-level spatial details, improving segmentation accuracy
• Fully Convolutional
• Data Augmentation: extensive data augmentation, including elastic
deformations, rotations, and shifts
Several variants of U-Net have been developed to address different challenges
and improve performance in specific tasks. Some of them are Attention U-Net,
ResUNet, and U-Net++.

Figure 5: U-net architecture (example for 32x32 pixels in the lowest resolution).
Each blue box corresponds to a multi-channel feature map. The number of
channels is denoted on top of the box. The x-y-size is provided at the lower
left edge of the box. White boxes represent copied feature maps. The arrows
denote the different operations.
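
The feature-map sizes in such a diagram follow from simple arithmetic. The
sketch below traces the side length through the network under the layout of
the original U-Net paper, which is assumed here rather than stated in this
text: two unpadded 3x3 convolutions per level, 2x2 max pooling on the way down,
and 2x2 up-convolutions on the way up. The 572-pixel input is likewise taken
from the original paper's figure, and the function name unet_sizes is
hypothetical.

```python
def unet_sizes(input_size, depth=4):
    """Trace the feature-map side length through a U-Net.

    Assumes two unpadded 3x3 convolutions per level (each trims 2 pixels),
    2x2 max pooling in the encoder and 2x2 up-convolutions in the decoder.
    Returns (bottleneck_input, output_size, skip_crops), where skip_crops
    lists (encoder_size, cropped_size) pairs from the top level down.
    """
    size = input_size
    encoder_sizes = []
    for _ in range(depth):
        size -= 4                    # two unpadded 3x3 convolutions
        encoder_sizes.append(size)   # map copied across the skip connection
        if size % 2:
            raise ValueError("odd size before pooling; pick another input size")
        size //= 2                   # 2x2 max pooling
    bottleneck_input = size
    size -= 4                        # two convolutions at the bottom of the U
    skip_crops = []
    for encoder_size in reversed(encoder_sizes):
        size *= 2                    # 2x2 up-convolution
        skip_crops.append((encoder_size, size))  # encoder map is cropped to fit
        size -= 4                    # two unpadded 3x3 convolutions
    return bottleneck_input, size, skip_crops[::-1]


bottleneck, output, crops = unet_sizes(572)
```

With these assumptions a 572-pixel input reaches the lowest level at 32x32,
matching the caption of Figure 5, and the output is smaller than the input
because every unpadded convolution trims the border. The crop pairs show why
the copied encoder maps must be cropped before concatenation.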

9 Real world applications of Image Segmentation
• Analyzing medical scans
• Autonomous vehicles (self-driving cars)
• Satellite or aerial imagery
• Surveillance
• Robotics

References
[1] https://cs231n.stanford.edu/slides/2023/lecture_11.pdf.
[2] https://pyimagesearch.com/2022/06/29/semantic-vs-instance-vs-panoptic-segmentation/.
[3] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. SegNet: A deep
convolutional encoder-decoder architecture for image segmentation, 2016.
[4] Daan de Geus, Panagiotis Meletis, and Gijs Dubbelman. Fast panoptic
segmentation network, 2019.
[5] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN,
2018.
