1 Image Segmentation Using Deep Learning
• Panoptic Segmentation: Combines semantic and instance segmentation.
It labels all pixels in the image, distinguishing between different instances
of objects, and also segments background classes that don’t have individual
instances.
3 Semantic Segmentation
The difference between semantic vs. instance vs. panoptic segmentation lies in
how they process the things and stuff in the image.
Semantic segmentation studies the uncountable stuff in an image. It analyzes
each image pixel and assigns it a class label based on the texture it
represents. For example, in the first cell of the figure for Semantic Segmentation,
the image contains two cars, three pedestrians, a road, and the sky. The two
cars represent the same texture, as do the three pedestrians.
Semantic segmentation would assign a distinct class label to each of these
textures or categories. However, its output cannot differentiate or count
the two cars or three pedestrians separately. Commonly used semantic
segmentation techniques include SegNet, U-Net, DeconvNet, and FCNs.
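As a minimal sketch of the per-pixel labeling described above, the snippet below assumes a model has already produced one score per class for every pixel and picks the highest-scoring class at each pixel. The class list, the `scores` grid, and the function name are made up for illustration; nested lists stand in for image arrays.

```python
# Hypothetical class list; the index is the class label.
CLASSES = ["sky", "road", "car", "pedestrian"]

def semantic_labels(scores):
    """Return a label map: for each pixel, the index of the highest-scoring class.

    `scores[r][c]` is a list with one score per class (assumed model output).
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]

# Toy 2x3 "image" with made-up scores: one list of 4 class scores per pixel.
scores = [
    [[0.9, 0.0, 0.1, 0.0], [0.8, 0.1, 0.1, 0.0], [0.7, 0.1, 0.1, 0.1]],
    [[0.1, 0.2, 0.7, 0.0], [0.0, 0.9, 0.1, 0.0], [0.1, 0.1, 0.8, 0.0]],
]

label_map = semantic_labels(scores)
for row in label_map:
    print([CLASSES[i] for i in row])
```

The two car pixels in the bottom row receive the same label, so this output alone cannot say whether they belong to one car or two.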
4 Instance Segmentation
Instance segmentation typically deals with tasks related to countable things. It
detects each object or instance of a class present in an image and assigns it
a different mask or bounding box with a unique identifier.
For example, instance segmentation would identify the two cars in the previous
example separately as, say, car1 and car2. Commonly used instance
segmentation techniques are Mask R-CNN (which extends the Faster R-CNN
object detector with a mask branch), PANet, and YOLACT.
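To illustrate what "a different mask with a unique identifier" means, the sketch below assigns an id to each separate blob in a binary class mask. Connected-component labeling is used here only as a simple stand-in: models such as Mask R-CNN predict instance masks directly, and this stand-in would merge objects that touch. The function name and the toy mask are assumptions for the example.

```python
from collections import deque

def label_instances(mask):
    """Assign a distinct positive id to each 4-connected region of 1s in `mask`.

    Returns (id_map, count); background pixels keep id 0.
    """
    rows, cols = len(mask), len(mask[0])
    ids = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or ids[r][c] != 0:
                continue  # background, or already part of an instance
            count += 1
            ids[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of this region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and ids[ny][nx] == 0:
                        ids[ny][nx] = count
                        queue.append((ny, nx))
    return ids, count

# Toy binary mask for the "car" class containing two separate cars.
car_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
instance_ids, num_cars = label_instances(car_mask)
print(num_cars)
for row in instance_ids:
    print(row)
```

Unlike a semantic label map, the id map makes the cars countable: every pixel of the first car carries id 1 and every pixel of the second carries id 2.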
5 Panoptic Segmentation
Panoptic segmentation is the best of both worlds. It presents a unified image
segmentation approach where each pixel in a scene is assigned a semantic label
(as in semantic segmentation) and a unique instance identifier (as in instance
segmentation).
Panoptic segmentation assigns each pixel only one pair of a semantic label
and an instance identifier. However, predicted segments can overlap on some
pixels. In this case, panoptic segmentation resolves the conflict by favoring
the object instance, as the priority is to identify each thing rather than
stuff. Many panoptic segmentation models build on the Mask R-CNN method.
Example architectures include UPSNet, FPSNet, EPSNet, and VPSNet.
Figure 4: Panoptic Segmentation
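The merging rule for panoptic segmentation (one pair of semantic label and instance identifier per pixel, with object instances favored over stuff) can be sketched as follows. The data layout, the function name, and the tie-break between two overlapping instances (the first one listed wins) are assumptions made for this example, not the procedure of any particular model.

```python
def merge_panoptic(semantic, instances):
    """Combine a semantic label map with instance masks.

    semantic:  2D grid of class names (assumed output of a semantic branch).
    instances: list of (instance_id, class_name, mask) with 2D 0/1 masks
               (assumed output of an instance branch).
    Each output pixel is exactly one (class_name, instance_id) pair;
    stuff pixels get instance_id 0.
    """
    panoptic = [[(label, 0) for label in row] for row in semantic]
    for instance_id, class_name, mask in instances:
        for r, row in enumerate(mask):
            for c, covered in enumerate(row):
                # Favor the object instance over the stuff label. If two
                # instances claim a pixel, keep the first one listed
                # (an assumption of this sketch).
                if covered and panoptic[r][c][1] == 0:
                    panoptic[r][c] = (class_name, instance_id)
    return panoptic

semantic = [
    ["sky", "sky", "sky"],
    ["car", "road", "car"],
]
instances = [
    (1, "car", [[0, 0, 0], [1, 1, 0]]),  # covers a pixel labeled "road"
    (2, "car", [[0, 0, 0], [0, 1, 1]]),  # overlaps instance 1 in the middle
]

panoptic = merge_panoptic(semantic, instances)
for row in panoptic:
    print(row)
```

In the bottom row, the middle pixel is labeled "road" by the semantic map but claimed by both cars; the result keeps it as part of car 1, so every pixel still carries exactly one pair.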
7 Evaluation Metrics
Commonly used evaluation metrics for this task include:
• Intersection over Union (IoU)
• Mean Pixel Accuracy (mPA)
• Pixel Accuracy
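The three metrics listed above can be computed directly from a predicted label map and a ground-truth label map. The sketch below uses their usual definitions: IoU for a class is the number of pixels where both maps show that class divided by the number where either does, pixel accuracy is the fraction of correctly labeled pixels, and mean pixel accuracy averages the per-class accuracy over the classes present in the ground truth. Function names and the toy maps are illustrative assumptions.

```python
def _pairs(pred, truth):
    """Flatten two same-shaped label maps into (predicted, true) pairs."""
    return [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]

def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    pairs = _pairs(pred, truth)
    return sum(p == t for p, t in pairs) / len(pairs)

def mean_pixel_accuracy(pred, truth):
    """Average of per-class accuracy over the classes present in `truth`."""
    pairs = _pairs(pred, truth)
    per_class = []
    for cls in sorted({t for _, t in pairs}):
        total = sum(t == cls for _, t in pairs)
        correct = sum(p == cls and t == cls for p, t in pairs)
        per_class.append(correct / total)
    return sum(per_class) / len(per_class)

def class_iou(pred, truth, cls):
    """Intersection over Union for a single class label."""
    pairs = _pairs(pred, truth)
    intersection = sum(p == cls and t == cls for p, t in pairs)
    union = sum(p == cls or t == cls for p, t in pairs)
    return intersection / union if union else float("nan")

# Toy label maps with two classes (0 and 1).
truth = [[0, 0, 1],
         [0, 1, 1]]
pred = [[0, 1, 1],
        [0, 1, 0]]

print(pixel_accuracy(pred, truth))
print(mean_pixel_accuracy(pred, truth))
print(class_iou(pred, truth, 1))
```

In this toy case four of six pixels are correct, and for class 1 the two maps agree on two pixels out of the four that either map assigns to it.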
• Fully Convolutional
• Data Augmentation: relies on extensive data augmentation, including
elastic deformations, rotations, and shifts
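For segmentation, any geometric augmentation has to be applied identically to the image and its mask so that the labels stay aligned with the pixels. The following is a rough sketch of that idea using only rotations and shifts; it restricts rotations to multiples of 90 degrees and omits elastic deformations for brevity. All function names and parameter ranges are assumptions for the example.

```python
import random

def rotate90(grid):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def shift_right(grid, dx, fill=0):
    """Shift each row right by dx pixels (left if negative), padding with `fill`."""
    width = len(grid[0])
    out = []
    for row in grid:
        if dx >= 0:
            out.append([fill] * min(dx, width) + row[:max(width - dx, 0)])
        else:
            out.append(row[-dx:] + [fill] * min(-dx, width))
    return out

def augment(image, mask, rng):
    """Apply the same random rotation and shift to an image and its mask."""
    turns = rng.randrange(4)   # 0, 90, 180, or 270 degrees
    dx = rng.randint(-1, 1)    # horizontal shift of at most one pixel
    for _ in range(turns):
        image, mask = rotate90(image), rotate90(mask)
    return shift_right(image, dx), shift_right(mask, dx)

image = [[5, 0],
         [0, 7]]
mask = [[1, 0],
        [0, 1]]

aug_image, aug_mask = augment(image, mask, random.Random(0))
print(aug_image)
print(aug_mask)
```

Because both grids go through the same transform, every nonzero image pixel is still covered by the mask after augmentation.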
Several variants of U-Net have been developed to address different challenges
and improve performance in specific tasks. Some of them are Attention U-Net,
ResUNet, and U-Net++.
Figure 5: U-net architecture (example for 32x32 pixels in the lowest resolution).
Each blue box corresponds to a multi-channel feature map. The number of
channels is denoted on top of the box. The x-y-size is provided at the lower
left edge of the box. White boxes represent copied feature maps. The arrows
denote the different operations.
• Surveillance
• Robotics
References
[1] https://cs231n.stanford.edu/slides/2023/lecture_11.pdf.
[2] https://pyimagesearch.com/2022/06/29/semantic-vs-instance-vs-panoptic-segmentation/.
[3] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. SegNet: A deep
convolutional encoder-decoder architecture for image segmentation, 2016.
[4] Daan de Geus, Panagiotis Meletis, and Gijs Dubbelman. Fast panoptic
segmentation network, 2019.
[5] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN,
2018.