Learning-Based Segmentation
ITS 69204 Computer Vision and NLP
Prepared By: Dr. Toh Leow Bin
Learning Outcomes
Traditional Methods:
Thresholding, Edge Detection, Clustering, Region-Based Methods.
Depend on handcrafted features; limited scalability for complex tasks.
Deep Learning-Based Methods:
Learn hierarchical features directly from data.
Robust and adaptable for complex patterns.
Traditional Image Segmentation Techniques
Deep Learning-based Method
Semantic Segmentation
Semantic segmentation models output segment maps corresponding to the inputs they are fed.
These segment maps are often n-channeled, with n being the number of classes the model is supposed to segment.
Each of these n channels is binary in nature, with object locations "filled" with ones and empty regions consisting of zeros.
The ground truth map is a single-channel integer array of the same size as the input, with values ranging over the n classes; each segment is "filled" with the index of its corresponding class (classes are indexed from 0 to n-1).
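As a minimal sketch of this indexing convention (assuming NumPy; the to_one_hot helper is illustrative, not from any particular library), converting a ground-truth integer map into the n-channel binary format might look like this:

```python
import numpy as np

def to_one_hot(label_map: np.ndarray, num_classes: int) -> np.ndarray:
    """Convert an (H, W) integer ground-truth map into an
    (num_classes, H, W) binary (one-hot) segment map."""
    one_hot = np.zeros((num_classes, *label_map.shape), dtype=np.uint8)
    for c in range(num_classes):
        one_hot[c] = (label_map == c).astype(np.uint8)  # fill class c with ones
    return one_hot

# Example: a 3x3 ground truth map with classes 0..2 (n = 3)
gt = np.array([[0, 1, 1],
               [2, 2, 0],
               [0, 0, 1]])
print(to_one_hot(gt, num_classes=3).shape)  # (3, 3, 3)
```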
Deep Learning-based Method
Semantic Segmentation
The model output in this "n-channel" binary format is also known as a two-dimensional one-hot encoded representation of the predictions.
Neural networks that perform segmentation typically use an encoder-decoder structure, where the encoder is followed by a bottleneck and a decoder, or by upsampling layers applied directly to the bottleneck (as in the FCN).
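A minimal sketch of the second variant (assuming PyTorch; TinyFCN and its layer sizes are illustrative, not the original FCN architecture): the decoder is a single upsampling step applied directly to the bottleneck features.

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Illustrative FCN-style model: encoder -> bottleneck scores -> upsample."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.encoder = nn.Sequential(                       # downsample by 4x
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(64, num_classes, 1)     # per-pixel class scores
        self.upsample = nn.Upsample(scale_factor=4, mode="bilinear",
                                    align_corners=False)    # back to input size

    def forward(self, x):
        return self.upsample(self.classifier(self.encoder(x)))

logits = TinyFCN(num_classes=3)(torch.randn(1, 3, 64, 64))
print(logits.shape)  # torch.Size([1, 3, 64, 64])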
Convolutional Encoder-Decoder Architecture
Encoder-decoder architectures for semantic segmentation became popular with works like SegNet (Badrinarayanan et al., 2015).
SegNet proposes a combination of convolutional and downsampling blocks to squeeze information into a bottleneck and form a representation of the input.
The decoder then reconstructs the input information to form a segment map, highlighting regions of the input and grouping them under their classes.
Finally, the decoder ends with a sigmoid activation that squeezes the output into the range (0, 1).
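A simplified sketch of this pipeline (assuming PyTorch; note the real SegNet unpools using the max-pooling indices saved by the encoder, for which plain transposed convolutions stand in here):

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Simplified SegNet-style sketch: symmetric encoder/decoder, sigmoid output."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.encoder = nn.Sequential(   # conv + downsampling blocks -> bottleneck
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.decoder = nn.Sequential(   # reconstruct a full-resolution segment map
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, 2, stride=2),
            nn.Sigmoid(),               # squeeze each channel into (0, 1)
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

out = TinyEncoderDecoder(num_classes=3)(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```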
Convolutional Encoder-Decoder Architecture
SegNet was accompanied by another independent segmentation work released at the same time, U-Net (Ronneberger et al., 2015), which popularized skip connections as a solution for the information loss observed in the downsampling layers of typical encoder-decoder networks.
Skip connections are connections that go from the encoder directly to the decoder without passing through the bottleneck.
In other words, feature maps at various levels of the encoded representation are captured and concatenated to feature maps in the decoder. This helps reduce the information lost to aggressive pooling and downsampling in the encoder blocks of an encoder-decoder architecture, as sketched below.
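A minimal sketch of one skip connection (assuming PyTorch; TinySkipNet is illustrative, not the original U-Net): the encoder feature map bypasses the bottleneck and is concatenated with the upsampled features in the decoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySkipNet(nn.Module):
    """One U-Net-style skip: encoder features concatenated into the decoder."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.enc = nn.Conv2d(3, 32, 3, padding=1)           # encoder block
        self.bottleneck = nn.Conv2d(32, 64, 3, padding=1)   # after pooling
        self.dec = nn.Conv2d(64 + 32, num_classes, 3, padding=1)

    def forward(self, x):
        skip = F.relu(self.enc(x))                          # high-res feature map
        b = F.relu(self.bottleneck(F.max_pool2d(skip, 2)))  # low-res bottleneck
        up = F.interpolate(b, scale_factor=2, mode="bilinear",
                           align_corners=False)             # upsample decoder side
        return self.dec(torch.cat([up, skip], dim=1))       # concatenate the skip

print(TinySkipNet(3)(torch.randn(1, 3, 64, 64)).shape)  # [1, 3, 64, 64]
```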
U-Net explanation
Why Deep Learning for Segmentation?
Advantages:
Robust performance in noisy and variable conditions.
End-to-end learning with hierarchical features.
Scalable for diverse datasets.
Examples:
Medical Imaging: Automated cancer detection.
Autonomous Vehicles: Precise road marking detection.
Applications of Image Segmentation
Robotics (Machine Vision)
Aids machine perception and locomotion by pointing out objects in a robot's path of motion, enabling it to change paths effectively and understand the context of its environment.
Medical Imaging
Helps doctors identify potentially malignant features in images quickly and accurately.
Examples: X-ray, CT scan, dental, and pathology cell images.
Applications of Image Segmentation
Smart Cities
CCTV cameras for real-time monitoring of pedestrians, traffic, and crime.
Pedestrian detection, traffic analytics, license plate detection, and video surveillance.
Self-Driving/Autonomous Cars
Route planning and movement depend heavily on segmentation.
Drivable-surface semantic segmentation, car and pedestrian instance segmentation, in-vehicle object detection (items left behind by passengers), and pothole detection and segmentation.
Challenges in Deep Learning-Based Segmentation
Data Dependency:
Large annotated datasets are required.
Annotation is expensive and time-consuming.
Computational Requirements:
High-performance GPUs or TPUs are required for training.
Model Complexity:
Risk of overfitting with insufficient data.
Class Imbalance:
Small objects or regions may be overshadowed by larger ones in loss functions; see the weighted-loss sketch after this list.
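One common mitigation, shown here as a sketch (assuming PyTorch; the per-class pixel counts are hypothetical), is to weight the loss inversely to class frequency so that small classes are not drowned out by large ones:

```python
import torch
import torch.nn as nn

# Hypothetical pixel counts per class; rarer classes receive larger weights.
pixel_counts = torch.tensor([9.0e6, 8.0e5, 5.0e4])  # class 0 dominates
weights = pixel_counts.sum() / (len(pixel_counts) * pixel_counts)

criterion = nn.CrossEntropyLoss(weight=weights)      # class-weighted loss

logits = torch.randn(1, 3, 64, 64)                   # (N, classes, H, W) scores
target = torch.randint(0, 3, (1, 64, 64))            # (N, H, W) integer labels
print(criterion(logits, target).item())
```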
Summary