
Introduction to Deep Learning-Based Segmentation
ITS 69204 Computer Vision and NLP
Prepared By: Dr. Toh Leow Bin
Learning Outcomes

 Define and explain the concept of image segmentation.
 Differentiate between traditional and deep learning-based approaches.
 Identify key deep learning architectures used for segmentation.
 Evaluate challenges and benefits of using deep learning for segmentation.
Introduction to Image Segmentation

 Image segmentation is an extension of image classification where, in addition to classification, we perform localization.
 Image segmentation is thus a superset of image classification, with the model pinpointing where a corresponding object is present by outlining the object’s boundary.
Introduction to Image Segmentation

 Most image segmentation models consist of an encoder-decoder network, as compared to the single encoder network used in classifiers.
 The encoder encodes a latent-space representation of the input, which the decoder decodes to form segment maps: maps outlining each object’s location in the image.
 A typical segment map assigns a class index to every pixel of the input image.
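As an illustration (not from the slides), a tiny segment map can be built directly with NumPy; each pixel holds the index of the class it belongs to, here a made-up 6x6 scene with one object class:

```python
import numpy as np

# Toy 6x6 segment map: 0 = background, 1 = "object".
# Each pixel stores the class index of the object it belongs to.
seg_map = np.zeros((6, 6), dtype=np.int64)
seg_map[2:5, 1:4] = 1  # a 3x3 "object" region

print(seg_map)
print("classes present:", np.unique(seg_map))  # [0 1]
```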
Traditional vs. Deep Learning-Based Segmentation

 Traditional Methods:
   Thresholding, edge detection, clustering, region-based methods.
   Depend on handcrafted features; limited scalability for complex tasks.
 Deep Learning-Based Methods:
   Learn hierarchical features directly from data.
   Robust and adaptable to complex patterns.
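To make the contrast concrete, here is a minimal sketch of one traditional method, global thresholding, in NumPy. The threshold value 128 and the synthetic image are arbitrary choices for illustration; the point is that the rule is handcrafted, not learned:

```python
import numpy as np

def threshold_segment(image, thresh=128):
    """Binary segmentation by a fixed global threshold:
    a handcrafted rule, not learned from data."""
    return (image > thresh).astype(np.uint8)

# Synthetic grayscale image: dark background, one bright square.
img = np.full((8, 8), 50, dtype=np.uint8)
img[2:6, 2:6] = 200

mask = threshold_segment(img)
print(mask.sum(), "foreground pixels")  # 16
```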
Traditional Image Segmentation Techniques
Deep Learning-based Method
Semantic Segmentation
 Semantic segmentation models provide segment maps as outputs corresponding to the inputs they are fed.
 These segment maps are often n-channeled, with n being the number of classes the model is supposed to segment.
 Each of these n channels is binary in nature, with object locations “filled” with ones and empty regions consisting of zeros.
 The ground truth map is a single-channel integer array the same size as the input, with each segment “filled” with the index of its class (classes are indexed from 0 to n-1).
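The relationship between the single-channel integer ground truth and the n-channel binary representation described above can be sketched with NumPy (the 3x3 label map is invented for illustration):

```python
import numpy as np

n_classes = 3
# Single-channel ground truth, values in [0, n_classes - 1].
gt = np.array([[0, 0, 1],
               [0, 2, 1],
               [2, 2, 1]])

# n-channel binary (one-hot) version: channel k is 1 where gt == k.
one_hot = np.eye(n_classes, dtype=np.uint8)[gt]  # shape (H, W, n), channels last
one_hot = np.transpose(one_hot, (2, 0, 1))       # shape (n, H, W), channels first

print(one_hot[1])  # binary mask for class 1
```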
Deep Learning-based Method
Semantic Segmentation

 The model output in this “n-channel” binary format is also known as a two-dimensional one-hot encoded representation of the predictions.
 Neural networks that perform segmentation typically use an encoder-decoder structure, where the encoder is followed by a bottleneck and a decoder, or by upsampling layers applied directly to the bottleneck (as in the FCN).
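Going the other way, an n-channel model output is turned back into a single-channel segment map by taking the per-pixel argmax over channels. A sketch with made-up scores for a 2x2 image:

```python
import numpy as np

# Fake model output: scores for n=3 classes over a 2x2 image, shape (n, H, W).
logits = np.array([
    [[0.90, 0.10], [0.20, 0.10]],  # class 0 scores
    [[0.05, 0.80], [0.30, 0.20]],  # class 1 scores
    [[0.05, 0.10], [0.50, 0.70]],  # class 2 scores
])

# Per-pixel winning class gives the final segment map.
pred_map = logits.argmax(axis=0)
print(pred_map)
# [[0 1]
#  [2 2]]
```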
Convolutional Encoder-Decoder Architecture

 Encoder-decoder architectures for semantic segmentation became popular with works like SegNet (Badrinarayanan et al., 2015).
 SegNet proposes a combination of convolutional and downsampling blocks that squeeze information into a bottleneck, forming a representation of the input.
 The decoder then reconstructs the input information to form a segment map, highlighting regions of the input and grouping them under their classes.
 Finally, the decoder ends with a sigmoid activation that squeezes the output into the range (0, 1).
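The sigmoid mentioned above maps any real-valued decoder output into (0, 1); a quick numeric check (sample logit values chosen arbitrarily):

```python
import numpy as np

def sigmoid(x):
    """Squash real-valued scores into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([-4.0, 0.0, 4.0])
probs = sigmoid(logits)
print(probs)  # approximately [0.018, 0.5, 0.982]
```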
Convolutional Encoder-Decoder Architecture

 SegNet was released around the same time as another independent segmentation work, U-Net (Ronneberger et al.), which introduced skip connections as a solution to the loss of information observed in the downsampling layers of typical encoder-decoder networks.
 Skip connections are connections that go from the encoder directly to the decoder without passing through the bottleneck.
 In other words, feature maps at various levels of the encoded representation are captured and concatenated to feature maps in the decoder. This helps reduce the data loss caused by the aggressive pooling and downsampling in the encoder blocks of an encoder-decoder architecture.
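At the tensor level, a U-Net-style skip connection is a channel-wise concatenation of an encoder feature map with the decoder feature map at the same spatial resolution. A shape-only sketch with NumPy (the channel and spatial sizes here are made up for illustration):

```python
import numpy as np

H, W = 16, 16
enc_features = np.random.rand(64, H, W)  # feature map from one encoder level
dec_features = np.random.rand(64, H, W)  # upsampled decoder feature map

# Skip connection: concatenate along the channel axis, bypassing the
# bottleneck so fine spatial detail survives the downsampling.
merged = np.concatenate([enc_features, dec_features], axis=0)
print(merged.shape)  # (128, 16, 16)
```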
U-Net explanation
Why Deep Learning for Segmentation?

 Advantages:
   Robust performance in noisy and variable conditions.
   End-to-end learning with hierarchical features.
   Scalable to diverse datasets.
 Examples:
   Medical Imaging: Automated cancer detection.
   Autonomous Vehicles: Precise road marking detection.
Applications of Image Segmentation

 Robotics (Machine Vision)
   Aids machine perception and locomotion by pointing out objects in the robot’s path of motion,
   enabling it to change paths effectively and understand the context of its environment.
 Medical Imaging
   Helps doctors identify possibly malignant features in images quickly and accurately.
   X-ray, CT scan, dental, pathology cells
Applications of Image Segmentation

 Smart Cities
   CCTV cameras for real-time monitoring of pedestrians, traffic, and crime.
   Pedestrian detection, traffic analytics, license plate detection, and video surveillance.
 Self-Driving Cars / Autonomous Vehicles
   Route planning and movement depend heavily on segmentation.
   Drivable-surface semantic segmentation, car and pedestrian instance segmentation, in-vehicle object detection (items left behind by passengers), and pothole detection and segmentation.
Challenges in Deep Learning-Based Segmentation

 Data Dependency:
   Large annotated datasets are required.
   Annotation is expensive and time-consuming.
 Computational Requirements:
   High-performance GPUs or TPUs are required for training.
 Model Complexity:
   Risk of overfitting with insufficient data.
 Class Imbalance:
   Small objects or regions may be overshadowed by larger ones in loss functions.
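One common mitigation for the class-imbalance point, included here as an illustrative sketch rather than slide material, is to weight each class in the loss inversely to its pixel frequency:

```python
import numpy as np

# Ground-truth map dominated by background (class 0).
gt = np.zeros((10, 10), dtype=np.int64)
gt[4:6, 4:6] = 1  # a small 2x2 object

counts = np.bincount(gt.ravel(), minlength=2)    # pixels per class: [96, 4]
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency class weights
print(weights)  # the rare class gets the larger weight
```

Such weights are typically passed to the per-class term of a cross-entropy loss so small objects are not drowned out by the background.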
Summary

 Image segmentation is key to AI applications needing fine-grained analysis.
 Deep learning provides scalable, robust solutions (e.g., FCN, U-Net, Mask R-CNN).
 Challenges include data requirements, computational costs, and generalization.
