Unit 4
Introduction to Segmentation
Image segmentation refers to the task of assigning a class label to different groups of pixels. While the
input is an image, the output is a mask that marks the regions of the objects in that image. Image segmentation
has wide applications in domains such as medical image analysis, self-driving cars, satellite image analysis,
etc. There are different types of image segmentation techniques, such as semantic segmentation and instance
segmentation. To summarize, the key goal of image segmentation is to recognize and understand what is
in an image at the pixel level.
Image segmentation is the technique of subdividing an image into constituent sub-regions or distinct objects.
The level of detail to which subdivision is carried out depends on the problem being solved. That is,
segmentation should stop when the objects or the regions of interest in an application have been detected.
Segmentation of non-trivial images is one of the most difficult tasks in image processing. Segmentation
accuracy determines the eventual success or failure of computerized analysis procedures. Segmentation
procedures are usually done using two approaches – detecting discontinuities in an image and linking edges to
form regions (known as edge-based segmentation), and detecting similarity among pixels based on intensity
levels (known as threshold-based segmentation).
Image segmentation is a fundamental technique in digital image processing and computer vision. It involves
partitioning a digital image into multiple segments (regions or objects) to simplify and analyze the image by
separating it into meaningful components, which makes image processing more efficient by focusing on specific
regions of interest. A typical image segmentation task goes through the following steps:
Group pixels in an image based on shared characteristics like colour, intensity, or texture.
Assign a label to each pixel, indicating the segment or object it belongs to.
Produce a segmented image as output, often visualized as a mask or overlay highlighting the different segments.
Image segmentation is crucial in computer vision tasks because it breaks down complex images into manageable
pieces. It's like separating ingredients in a dish. By isolating objects (things) and backgrounds (stuff), image analysis
becomes more efficient and accurate. This is essential for tasks like self-driving cars identifying objects or medical
imaging analyzing tumours. Understanding the image's content at this granular level unlocks a wider range of
applications in computer vision.
Semantic Classes in Image Segmentation: Things and Stuff
In semantic image segmentation, we categorize image pixels based on their semantic meaning, not just their visual
properties. This classification system often uses two main categories: Things and Stuff.
Things: Things refer to countable objects or distinct entities in an image with clear boundaries, like people, flowers,
cars, animals, etc. The segmentation of "Things" aims to label individual pixels in the image with specific classes by
delineating the boundaries of individual objects within the image.
Stuff: Stuff refers to regions or areas in an image, such as the background or repeating patterns of similar material,
which cannot be counted, like road, sky and grass. These regions may not have clear boundaries, but they play a
crucial role in understanding the overall context of an image. The segmentation of "Stuff" involves grouping pixels
in an image into clearly identifiable regions based on common properties like colour, texture or context.
Semantic segmentation
Semantic Segmentation is one of the different types of image segmentation, where a class label is assigned to image
pixels using a deep learning (DL) algorithm. In semantic segmentation, collections of pixels in an image are identified
and classified by assigning a class label based on their characteristics such as colour, texture and shape. This provides
a pixel-wise map of an image (a segmentation map) that enables more detailed and accurate image analysis.
For example, all pixels belonging to a 'tree' would be given the same label without distinguishing between
individual trees. As another example, a group of people in an image would be labelled as a single object,
'persons', instead of identifying individual people.
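As a rough illustration (not part of the method described above), the sketch below runs a pretrained DeepLabV3 model from torchvision to produce such a pixel-wise segmentation map. The file name "street.jpg" and the choice of model are assumptions made only for the example.

```python
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50
from PIL import Image

model = deeplabv3_resnet50(weights="DEFAULT").eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("street.jpg").convert("RGB")           # placeholder file name
batch = preprocess(img).unsqueeze(0)                    # (1, 3, H, W)

with torch.no_grad():
    out = model(batch)["out"]                           # (1, num_classes, H, W) per-pixel class scores

seg_map = out.argmax(dim=1).squeeze(0)                  # (H, W) label map: one class id per pixel
```

Every pixel labelled with the same class id belongs to the same semantic category, without separating individual object instances.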
Instance segmentation
Instance segmentation is a more sophisticated computer vision task which involves identifying and delineating each
individual object within an image. Instance segmentation therefore goes beyond just identifying objects in an image;
it also delineates the exact boundaries of each individual instance of an object.
The key focus of instance segmentation is to differentiate between separate objects of the same class. For
example, if there are many cats in an image, instance segmentation would identify and outline each specific cat. The
segmentation map is created at the level of individual pixels, and separate labels are assigned to specific object
instances, for example by giving each 'cat' in the group a differently coloured label.
Instance segmentation is useful in autonomous vehicles to identify individual objects like pedestrians, other vehicles
and any objects along the navigation route. In medical imaging, analysing scan images to detect specific
abnormalities is useful for early detection of cancer and other organ conditions.
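A minimal sketch of instance segmentation with a pretrained Mask R-CNN from torchvision is shown below. The file name, the 0.5 score cut-off and the choice of model are assumptions, not part of the text above.

```python
import torch
from torchvision import transforms
from torchvision.models.detection import maskrcnn_resnet50_fpn
from PIL import Image

model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

img = transforms.ToTensor()(Image.open("cats.jpg").convert("RGB"))  # (3, H, W) in [0, 1]

with torch.no_grad():
    pred = model([img])[0]          # one result dict per input image

# Each detected instance gets its own mask, class label and confidence score.
keep = pred["scores"] > 0.5         # score cut-off is an assumption
masks = pred["masks"][keep]         # (N, 1, H, W) soft masks, one per instance
labels = pred["labels"][keep]
print(f"{masks.shape[0]} instances kept")
```

Unlike semantic segmentation, each cat in the image would receive its own mask here.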
The traditional image segmentation techniques, which formed the foundation of modern image segmentation
methods based on deep learning algorithms, use thresholding, edge detection, region-based segmentation, clustering
algorithms and watershed segmentation. These techniques rely on principles of image processing, mathematical
operations and heuristics to separate an image into meaningful regions.
Thresholding: This method involves selecting a threshold value and classifying image pixels as foreground or
background based on their intensity values.
Edge Detection: Edge detection methods identify abrupt changes in intensity or discontinuities in the image, using
algorithms like the Sobel, Canny or Laplacian edge detectors.
Region-based segmentation: This method segments the image into smaller regions and iteratively merges them
based on predefined attributes such as colour, intensity and texture, which helps handle noise and irregularities in the image.
Clustering Algorithms: This method uses algorithms like K-means or Gaussian mixture models to group object pixels in an
image into clusters based on similar features like colour or texture (a K-means sketch is shown after this list).
Watershed Segmentation: Watershed segmentation treats the image like a topographical map, where the
watershed lines are identified based on pixel intensity and connectivity, like water flowing down different valleys.
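The sketch below illustrates the clustering approach from the list above using OpenCV's K-means on raw colour values. The file name and the choice of k = 4 clusters are assumptions made only for illustration.

```python
import cv2
import numpy as np

img = cv2.imread("scene.jpg")                     # placeholder file name
pixels = img.reshape(-1, 3).astype(np.float32)    # one row per pixel, BGR colour features

k = 4                                             # number of clusters (assumed)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, k, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

# Replace every pixel by its cluster centre to visualise the segmentation.
segmented = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)
cv2.imwrite("segmented.jpg", segmented)
```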
Below is a list of different use cases of image segmentation in image processing:
Autonomous Vehicles: Image segmentation helps autonomous vehicles identify and segment objects such as
road lanes (detected in real time), vehicles, pedestrians and traffic signs for safe navigation.
Medical Imaging Analysis: Image segmentation is used for segmenting organs, tumours and other anatomical
structures from medical images like X-rays, MRIs and CT scans, helping in diagnosis and treatment planning.
Satellite Image Analysis: Used in analysing satellite images for land-cover classification, urban planning, and
monitoring environmental changes.
Object Detection and Tracking: Segmenting different objects in images or video for tasks like person
detection, anomaly detection, and detecting different activities in security systems.
Content Moderation: Used in monitoring and segmenting inappropriate content from images or videos on social
media platforms.
Smart Agriculture: Image segmentation methods are used by farmers and agronomists for crop health monitoring,
estimating yield and detecting plant diseases from images and videos.
Industrial Inspection: Image segmentation helps in manufacturing processes for quality control and detecting defects in
products.
4.1 Thresholding
Thresholding is one of the segmentation techniques that generates a binary image (a binary image is one
whose pixels have only two values – 0 and 1 and thus requires only one bit to store pixel intensity) from a
given grayscale image by separating it into two regions based on a threshold value. Hence pixels having
intensity values greater than the said threshold will be treated as white or 1 in the output image and the
others will be black or 0.
4.1.2 Global Thresholding
When the intensity distribution of objects and background are sufficiently distinct, it is possible to use a
single or global threshold applicable over the entire image. The basic global thresholding algorithm
iteratively finds the best threshold value for segmenting, as follows:
1. Select an initial estimate for the global threshold T.
2. Segment the image using T to form two groups G1 and G2: G1 consists of all pixels with intensity values >
T, and G2 consists of all pixels with intensity values ≤ T.
3. Compute the average intensity values m1 and m2 for groups G1 and G2.
4. Compute a new threshold value T = (m1 + m2) / 2.
5. Repeat steps 2 through 4 until the difference between successive values of T is smaller than a pre-defined value
δ.
This algorithm works well for images that have a clear valley in their histogram. The larger the value of δ,
the smaller will be the number of iterations. The initial estimate of T can be made equal to the average pixel
intensity of the entire image.
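A minimal NumPy sketch of this iterative algorithm (assuming a grayscale image in which both groups stay non-empty) could look like this:

```python
import numpy as np

def iterative_global_threshold(gray, delta=0.5):
    """Basic iterative global thresholding, following the steps above."""
    gray = gray.astype(np.float64)
    T = gray.mean()                       # step 1: initial estimate = average image intensity
    while True:
        g1 = gray[gray > T]               # step 2: split pixels into two groups
        g2 = gray[gray <= T]
        m1, m2 = g1.mean(), g2.mean()     # step 3: group means (assumes both groups non-empty)
        T_new = 0.5 * (m1 + m2)           # step 4: new threshold halfway between the means
        if abs(T_new - T) < delta:        # step 5: stop when the change falls below delta
            return T_new
        T = T_new

# Usage: binary = (gray > iterative_global_threshold(gray)).astype(np.uint8)
```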
4.1.3 Otsu Thresholding
In its simplest form, the algorithm returns a single intensity threshold that separates pixels into two classes,
foreground and background. This threshold is determined by minimizing the intra-class intensity variance, or
equivalently, by maximizing the inter-class variance.
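Otsu's threshold is available directly in libraries such as OpenCV; a small sketch (the file name is a placeholder) is:

```python
import cv2

gray = cv2.imread("coins.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# Passing THRESH_OTSU makes OpenCV ignore the supplied threshold (0 here)
# and compute the one that maximizes the between-class variance.
t_otsu, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("Otsu threshold:", t_otsu)
```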
4.1.4 Variable/Adaptive Thresholding
Variable (adaptive) thresholding is a method where the threshold value is calculated for smaller regions of the
image. This leads to different threshold values for different regions with respect to changes in lighting. There are
broadly two different approaches to local thresholding. One approach is to partition the image into non-overlapping
rectangles and then apply global thresholding or Otsu's method to each of the sub-images. Hence, in the
image-partitioning technique, the methods of global thresholding are applied to each sub-image rectangle by
assuming that each such rectangle is a separate image in itself.
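A sketch of this image-partitioning approach, applying Otsu's method to each non-overlapping tile (a tile size of 64 pixels is assumed), might look like:

```python
import cv2
import numpy as np

def tiled_otsu(gray, tile=64):
    """Partition a uint8 grayscale image into non-overlapping tiles and
    threshold each one independently with Otsu's method."""
    out = np.zeros_like(gray)
    h, w = gray.shape
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            block = gray[y:y + tile, x:x + tile]
            # Otsu is computed separately for every tile, so the threshold
            # adapts to local lighting conditions.
            _, out[y:y + tile, x:x + tile] = cv2.threshold(
                block, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return out
```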
4.1.5 Adaptive Gaussian Thresholding
In the second approach, the threshold value for each pixel is calculated by using a weighted sum of the pixel values
in a local neighborhood. The weights form a Gaussian window, which means that pixels closer to the center of the
region have a greater influence. Using the two-dimensional Gaussian function
G(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²)),
a Gaussian kernel of any size can be calculated by sampling it with appropriate values; for example, a 3×3 Gaussian
kernel approximation (two-dimensional) with standard deviation σ = 1 can be obtained in this way.
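OpenCV exposes this Gaussian-weighted local thresholding directly; a minimal sketch (the file name, block size 11 and constant C = 2 are assumptions) is:

```python
import cv2

# Grayscale input image; the file name is a placeholder.
gray = cv2.imread("page.jpg", cv2.IMREAD_GRAYSCALE)

# Each pixel is compared with the Gaussian-weighted mean of its 11x11
# neighborhood minus the constant 2, which copes with uneven illumination.
binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 11, 2)
```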
4.2 Edge-Based Segmentation
Edge-based segmentation techniques work by identifying areas in an image where there is a rapid
change in intensity or color. These changes often mark the edges of objects or regions within the image.
Techniques such as gradient-based methods (like Sobel or Prewitt operators) detect changes in intensity,
while other methods like Canny edge detection apply more sophisticated filtering to get clearer, more
defined edges.
So, when you apply edge-based segmentation to an image, you’re looking for the points where there’s a
sudden jump in brightness or color, marking a transition from one region to another.
The core of edge detection revolves around the concept of gradients.
1. Gradient Calculation
A gradient measures how quickly image intensity changes at a given pixel. The greater the change, the more likely
the pixel is on an edge. These gradients are typically calculated using filters (or kernels) like Sobel or Prewitt,
which estimate the horizontal and vertical derivatives Gx = ∂I/∂x and Gy = ∂I/∂y, where I is the intensity of the image.
2. Edge Magnitude
The gradient magnitude G = √(Gx² + Gy²) gives the strength of the edge at each pixel, with larger values indicating
stronger edges.
3. Edge Direction
Once the magnitude is calculated, the direction of the edge can also be determined using θ = arctan(Gy / Gx).
4. Thresholding
After calculating the gradient magnitude and direction, the next step is to apply thresholding. This step helps
in identifying only the strong edges by filtering out weak gradient values.
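The gradient, magnitude, direction and thresholding steps above can be sketched with OpenCV and NumPy as follows (the file name and the threshold value of 100 are assumptions):

```python
import cv2
import numpy as np

gray = cv2.imread("building.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# 1-2. Gradients and magnitude: Sobel kernels estimate Gx and Gy.
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
magnitude = np.sqrt(gx**2 + gy**2)          # edge strength at each pixel

# 3. Edge direction (radians).
direction = np.arctan2(gy, gx)

# 4. Thresholding: keep only the strong edges (threshold value assumed).
edges = (magnitude > 100).astype(np.uint8) * 255

# For comparison, Canny adds smoothing, non-maximum suppression and hysteresis.
canny_edges = cv2.Canny(gray, 100, 200)
```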
4.5 Markov Random Fields
Markov random fields (MRFs) are a powerful tool in vision: the MRF can be treated both as a model of
image data and, using recently developed algorithms, as a means of making inferences about images.
These inferences concern underlying image and scene structure as well as solutions to such problems as
image reconstruction, image segmentation, 3D vision, and object labeling.
4.6 Graph Cut Segmentation
Graph cut is a semi-automatic segmentation technique that you can use to segment an image into
foreground and background elements. Graph cut segmentation does not require good initialization: you
draw lines on the image, called scribbles, to identify what you want in the foreground and what you want in
the background. The graph cut technique applies graph theory to image processing to achieve fast
segmentation. The technique creates a graph of the image where each pixel is a node connected by weighted
edges; the higher the probability that pixels are related, the higher the weight. The algorithm cuts along
weak edges, achieving the segmentation of objects in the image. Graph cuts thus divide an image into
background and foreground segments. The framework consists of two parts. First, a network flow graph is
built based on the input image. Then a max-flow algorithm is run on the graph in order to find the min-cut,
which produces the optimal segmentation.
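The scribble-based workflow described above is not reproduced here, but OpenCV's GrabCut uses the same min-cut/max-flow idea, initialized with a rectangle around the foreground; the file name and the rectangle below are assumptions:

```python
import cv2
import numpy as np

img = cv2.imread("portrait.jpg")                    # placeholder file name
mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)           # internal model state used by the algorithm
fgd_model = np.zeros((1, 65), np.float64)

rect = (50, 50, 300, 400)                           # assumed rectangle around the foreground
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Pixels marked as definite or probable foreground form the segmented object.
fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
result = img * fg[:, :, None]
```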
4.7 The Gabor filter
The Gabor filter, named after Dennis Gabor, is a linear filter used in myriad image processing applications
for edge detection, texture analysis, feature extraction, etc. These filters have been shown to possess optimal
localization properties in both spatial and frequency domains and thus are well-suited for texture
segmentation problems. Gabor filters are special classes of bandpass filters, i.e., they allow a certain ‘band’ of
frequencies and reject the others. A Gabor filter can be viewed as a sinusoidal signal of particular frequency
and orientation, modulated by a Gaussian wave. In practice, to analyze texture or obtain features from an
image, a bank of Gabor filters with a number of different orientations is used.
The filter has a real and an imaginary component representing orthogonal directions. The two components
may be formed into a complex number or used individually. The equations are shown below:
Real part: g(x, y; λ, θ, ψ, σ, γ) = exp(−(x′² + γ²y′²) / (2σ²)) · cos(2π x′/λ + ψ)
Imaginary part: g(x, y; λ, θ, ψ, σ, γ) = exp(−(x′² + γ²y′²) / (2σ²)) · sin(2π x′/λ + ψ)
where x′ = x cos θ + y sin θ and y′ = −x sin θ + y cos θ.
In the above equations,
λ — the wavelength of the sinusoidal component; controls the width of the strips of the Gabor function.
Decreasing the wavelength produces thinner stripes.
θ — the orientation of the normal to the parallel stripes of the Gabor function.
ψ — the phase offset of the sinusoidal component.
σ — the standard deviation of the Gaussian envelope.
γ — the spatial aspect ratio; specifies the ellipticity of the support of the Gabor function and controls the
height of the Gabor filter. If the gamma value is large, the height of the Gabor function reduces, and if the
gamma value is small, the height increases.
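A small Gabor filter bank can be built with OpenCV; the kernel size and the values of σ, λ, γ and ψ below are assumptions chosen only for illustration:

```python
import cv2
import numpy as np

gray = cv2.imread("fabric.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# A small bank of Gabor filters at four orientations; other parameters assumed.
responses = []
for theta in np.arange(0, np.pi, np.pi / 4):
    kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                lambd=10.0, gamma=0.5, psi=0)
    responses.append(cv2.filter2D(gray, cv2.CV_32F, kernel))

# Stacking the responses gives a per-pixel texture feature vector that can be
# fed to a clustering or classification step for texture segmentation.
features = np.stack(responses, axis=-1)
```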
4.8 Discrete Wavelet Transform (DWT)
Wavelets are functions that are concentrated in time and frequency around a certain point. This
transformation technique is used to overcome the drawbacks of the Fourier method. The Fourier transform,
although it deals with frequencies, does not provide temporal details. According to Heisenberg's uncertainty
principle, we can either have high frequency resolution and poor temporal resolution, or vice versa. The
wavelet transform finds its most appropriate use in non-stationary signals. This transformation achieves good
frequency resolution for low-frequency components and high temporal resolution for high-frequency
components.
This method starts with a mother wavelet such as Haar, Morlet, Daubechies, etc. The signal is then essentially
translated into scaled and shifted versions of the mother wavelet.
Wavelet analysis is used to divide the information present in an image (signal) into two discrete components:
approximations and details (sub-signals).
A signal is passed through two filters, a high-pass and a low-pass filter. The image is then decomposed into
high-frequency components (details) and low-frequency components (approximation). At every level, we get 4
sub-signals. The approximation shows an overall trend of pixel values, and the details show the horizontal, vertical
and diagonal components.
If these details are insignificant, they can be set to zero without significant impact on the image, thereby
enabling image compression and denoising.
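A single-level 2D DWT with the Haar mother wavelet can be computed with the PyWavelets package; the file name is a placeholder:

```python
import pywt
import cv2

gray = cv2.imread("scan.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# Single-level 2D DWT with the Haar mother wavelet: one approximation sub-band
# and three detail sub-bands (horizontal, vertical, diagonal).
cA, (cH, cV, cD) = pywt.dwt2(gray, "haar")

# Zeroing small detail coefficients changes the image very little, which is the
# basis of wavelet-based compression and denoising.
print(cA.shape, cH.shape, cV.shape, cD.shape)
```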