Image Segmentation — A Beginner’s Guide - Medium
Image Segmentation — A Beginner’s Guide
The essentials of Image Segmentation + implementation in TensorFlow
For example, in a street scene, all pixels belonging to cars might be labeled
with one color, while those belonging to the road might be labeled with
another.
https://medium.com/@raj.pulapakura/image-segmentation-a-beginners-guide-0ede91052db7 1/22
23/9/24, 18:23 Image Segmentation — A Beginner’s Guide | Medium
Boring Classifiers
One approach is to draw a bounding box around the dog, which is called
Object Detection.
If that’s all you want, then you’re done! But if you want to know exactly where
the dog is, at the pixel level, then you’ll need something better. That’s where
image segmentation comes into play.
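To make the difference concrete, here is a tiny sketch in NumPy. The 6x6 mask is made up for illustration (it is not from a real image): it compares how many pixels a tight bounding box covers with how many pixels actually belong to the object.

```python
import numpy as np

# Hypothetical 6x6 binary mask: 1 marks "dog" pixels, 0 marks background.
mask = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
], dtype=np.uint8)

def bounding_box(mask):
    """Return (row_min, row_max, col_min, col_max) of the nonzero region."""
    rows, cols = np.nonzero(mask)
    return rows.min(), rows.max(), cols.min(), cols.max()

r0, r1, c0, c1 = bounding_box(mask)
box_area = (r1 - r0 + 1) * (c1 - c0 + 1)  # pixels inside the bounding box
object_area = int(mask.sum())             # pixels that really are the object

print(f"box covers {box_area} pixels, object covers {object_area}")
```

The box is only a coarse summary: some of the pixels inside it are background, which is exactly the information a segmentation mask keeps.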
Image Segmentation
There are a couple of ways to segment an image, such as thresholding and
clustering, but deep learning (my fav) really takes the spotlight when it
comes to image segmentation.
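For a feel of the classical approach, here is a minimal thresholding sketch in NumPy. The tiny grayscale array and the cutoff of 128 are assumptions picked for illustration; real images usually need a carefully chosen (or automatically estimated) threshold.

```python
import numpy as np

def threshold_segment(gray, threshold):
    """Label pixels brighter than `threshold` as 1 (foreground), others as 0."""
    return (gray > threshold).astype(np.uint8)

# Hypothetical 3x3 grayscale image with a bright region in the top-right corner.
gray = np.array([
    [10, 200, 220],
    [15,  30, 210],
    [12,  18,  25],
], dtype=np.uint8)

mask = threshold_segment(gray, threshold=128)
print(mask)
```

This works when foreground and background differ clearly in brightness, and it breaks down quickly on cluttered scenes, which is one reason learned models tend to do better.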
U-Net
The U-Net architecture was initially designed for medical image
segmentation, but it has since been adapted for many other use cases.
The encoder is used to compress the input image into a latent space
representation through convolutions and downsampling.
The long gray arrows running across the “U” are skip connections, and they
serve two main purposes:
1. During the forward pass, they enable the decoder to access information
from the encoder.
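Here is a rough sketch of how the encoder, decoder and a skip connection could fit together in Keras. It is a one-level toy, not the full architecture: the filter counts, the 64x64 input and the helper bodies are assumptions for illustration (only the name `conv_block` and its `n_filters` argument appear in this article's code).

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, n_filters):
    """Two 3x3 convolutions with ReLU; 'same' padding keeps height and width."""
    x = layers.Conv2D(n_filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(n_filters, 3, padding="same", activation="relu")(x)
    return x

def encoder_block(x, n_filters):
    """Return (skip, pooled): features before pooling and the downsampled map."""
    skip = conv_block(x, n_filters)
    pooled = layers.MaxPooling2D(2)(skip)
    return skip, pooled

def decoder_block(x, skip, n_filters):
    """Upsample, concatenate the encoder features (the skip connection), convolve."""
    x = layers.Conv2DTranspose(n_filters, 2, strides=2, padding="same")(x)
    x = layers.Concatenate()([x, skip])
    return conv_block(x, n_filters)

inputs = layers.Input(shape=(64, 64, 3))
s1, p1 = encoder_block(inputs, 16)   # s1: 64x64x16, p1: 32x32x16
bridge = conv_block(p1, 32)          # 32x32x32
d1 = decoder_block(bridge, s1, 16)   # back to 64x64x16
tiny_unet = tf.keras.Model(inputs, d1)
```

The `Concatenate` call is the skip connection: the decoder sees both the upsampled bottleneck features and the higher-resolution encoder features from the same level.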
The output of the model has the same width and height as the input, but the
number of channels equals the number of classes we are segmenting.
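To see what that output shape means in practice, here is a small NumPy sketch. The random scores stand in for a model's raw output (an assumption, no real model is involved): a softmax over the channel axis gives per-pixel class probabilities, and an argmax collapses them into a single label map.

```python
import numpy as np

height, width, num_classes = 4, 4, 5
rng = np.random.default_rng(0)

# Stand-in for raw model scores: one score per pixel per class.
logits = rng.normal(size=(height, width, num_classes))

# Softmax over the channel axis turns the scores into per-pixel probabilities.
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

# Argmax over the channel axis picks the most likely class for each pixel.
label_map = probs.argmax(axis=-1)

print(probs.shape)      # (height, width, num_classes)
print(label_map.shape)  # (height, width)
```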
Code it up
If you’re keen to code, let’s implement the U-Net architecture for semantic
segmentation in TensorFlow.
U-Net Architecture
Defining the model architecture is rather straightforward.
# Bottleneck
bridge = conv_block(p5, n_filters=1024) # bridge=32x32x1024
return model
model.compile(
    loss="categorical_crossentropy",
)
Before we can train the model, we need a dataset. The dataset should
contain (image, mask) pairs, where the image (x) is of shape (512x512x3) and
the mask (y) is of shape (512x512x5), with one channel per class.
Each pixel can only belong to one class, so it contains a “1” in one of the class
channels, and a “0” in the other channels. You can think of each pixel as a
one-hot vector (because that’s what it is).
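If your masks are stored as integer class IDs (a common format, but an assumption about your data), a sketch like the following converts them to the one-hot layout described above. The 2x2 label map is made up for illustration.

```python
import numpy as np

def to_one_hot(label_map, num_classes):
    """Turn an (H, W) array of class IDs into an (H, W, num_classes) one-hot mask."""
    return np.eye(num_classes, dtype=np.float32)[label_map]

# Hypothetical 2x2 label map with class IDs in the range 0..4.
label_map = np.array([
    [0, 2],
    [4, 1],
])

mask = to_one_hot(label_map, num_classes=5)
print(mask.shape)   # (2, 2, 5)
print(mask[0, 1])   # the pixel with class ID 2
```

Taking `mask.argmax(axis=-1)` goes the other way and recovers the original class IDs.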
model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=10,
)
Of course, this code would not be enough to train a successful model. If you
actually want to implement this, you need to consider preprocessing,
rescaling, batching, etc.
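As a starting point, here is one way a `train_ds` could be put together with `tf.data`. Everything in it is a stand-in: the random arrays replace real images and masks, the 64x64 size is only there to keep the example light (the article uses 512x512), and `preprocess` is a hypothetical helper.

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 5
IMG_SIZE = 64  # kept small for the sketch; the article uses 512

# Random stand-ins for real data: uint8 RGB images and integer class-ID masks.
images = np.random.randint(0, 256, size=(8, IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8)
labels = np.random.randint(0, NUM_CLASSES, size=(8, IMG_SIZE, IMG_SIZE), dtype=np.int32)

def preprocess(image, label):
    """Rescale pixels to [0, 1] and one-hot encode the class IDs."""
    image = tf.cast(image, tf.float32) / 255.0
    mask = tf.one_hot(label, depth=NUM_CLASSES)
    return image, mask

train_ds = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(8)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

for x, y in train_ds.take(1):
    print(x.shape, y.shape)
```

A validation dataset can be built the same way, minus the shuffle.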
Final Notes
Class Imbalance: Often in image segmentation, there is severe class
imbalance. For example, in an average street view image, cars and
buildings take up a lot of pixels, but stop signs take up very few pixels.
The model sees far fewer stop-sign pixels during training, so it tends to
perform poorly when segmenting stop signs. To mitigate this, you can use
Focal Categorical Cross Entropy and class weights, which place more
emphasis on minority classes.
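To show the idea behind the arithmetic, here is a NumPy sketch of a class-weighted focal cross entropy. The function name, the `gamma=2.0` default and the example numbers are assumptions for illustration; in a real training run you would use a TensorFlow loss rather than this NumPy version.

```python
import numpy as np

def categorical_focal_loss(y_true, y_pred, class_weights, gamma=2.0, eps=1e-7):
    """Mean per-pixel focal cross entropy with per-class weights.

    y_true: one-hot targets, shape (..., num_classes)
    y_pred: predicted probabilities, same shape as y_true
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    cross_entropy = -y_true * np.log(y_pred)
    focal_factor = (1.0 - y_pred) ** gamma  # shrinks the loss of confident predictions
    per_pixel = np.sum(class_weights * focal_factor * cross_entropy, axis=-1)
    return per_pixel.mean()

# Two hypothetical pixels, three classes: an easy pixel and a hard one.
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
y_pred = np.array([[0.9, 0.05, 0.05], [0.6, 0.3, 0.1]])
uniform = np.ones(3)

plain_ce = categorical_focal_loss(y_true, y_pred, uniform, gamma=0.0)
focal = categorical_focal_loss(y_true, y_pred, uniform, gamma=2.0)
weighted = categorical_focal_loss(y_true, y_pred, np.array([1.0, 1.0, 3.0]))

print(plain_ce, focal, weighted)
```

With `gamma=0` and uniform weights this reduces to ordinary categorical cross entropy. Raising `gamma` shrinks the contribution of pixels the model already gets right, and a larger weight on a rare class (class 2 here) scales up the loss for mistakes on that class.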