CV Unit 4
Gradients are mathematical functions used in computer vision to measure the intensity changes
of an image. There are various types of gradient operators used for image processing and
analysis, such as Sobel, Scharr, Prewitt, and Laplacian of Gaussian (LoG) operators. Each of
these operators has its own set of properties and advantages, making them suitable for different
applications, such as edge detection, feature extraction, and image segmentation.
Understanding the different types of gradient operators is crucial for effectively analyzing and
processing images in computer vision.
1. Sobel Operator
The Sobel operator is a common gradient operator used for edge detection. It uses a 3x3 kernel
to calculate the partial derivatives of an image in the horizontal and vertical directions.
2. Scharr Operator
The Scharr operator is a gradient operator with better rotational accuracy than the Sobel
operator. It uses a 3x3 kernel with different weightings to calculate the partial derivatives.
3. Prewitt Operator
The Prewitt operator is another gradient operator used for edge detection. It also uses a 3x3
kernel to calculate the partial derivatives but with different weightings than the Sobel operator.
4. Laplacian of Gaussian (LoG) Operator
The LoG operator is a second-order derivative operator used for detecting edges and blobs in an
image. It first applies a Gaussian filter to the image to smooth it and then applies the Laplacian
operator to highlight the edges and blobs.
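As a rough sketch, the operators above can be applied with a plain NumPy convolution. The toy step-edge image and the naive convolution routine below are assumptions for illustration; real code would normally use a library routine such as OpenCV's cv2.Sobel:

```python
import numpy as np

def convolve2d(img, kernel):
    """Naive 'valid' 2-D convolution (flips the kernel, as convolution requires)."""
    kh, kw = kernel.shape
    k = np.flipud(np.fliplr(kernel))
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

# Standard 3x3 horizontal-derivative kernels; the vertical kernel is the transpose.
SOBEL_X   = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
PREWITT_X = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
SCHARR_X  = np.array([[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]], dtype=float)

# Toy image: a vertical step edge (dark left half, bright right half).
img = np.zeros((5, 6))
img[:, 3:] = 1.0

for name, kx in [("Sobel", SOBEL_X), ("Prewitt", PREWITT_X), ("Scharr", SCHARR_X)]:
    gx = convolve2d(img, kx)        # horizontal gradient
    gy = convolve2d(img, kx.T)      # vertical gradient
    mag = np.sqrt(gx**2 + gy**2)    # gradient magnitude
    print(name, "max response:", mag.max())
```

All three operators respond only at the step; the different kernel weightings just scale the response differently.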
First Order and Second Order Derivative Filters
First and second-order derivative filters are used to compute image gradients, which are
commonly used in computer vision for edge detection, feature extraction, and image
segmentation.
First-order derivative filters, such as Sobel and Prewitt operators, calculate the gradient of an
image by computing the difference between the pixel values in adjacent rows and columns.
These filters are relatively simple and easy to compute, but they may not accurately capture the
full range of edge information in an image.
On the other hand, second-order derivative filters, such as the Laplacian of Gaussian (LoG)
operator, compute the second derivative of the image intensity, so edges appear as
zero-crossings rather than as peaks in the response. These filters are more complex and
computationally intensive, but they can capture finer details of edges and other features in
an image.
In summary, first-order derivative filters are simpler and faster but may not capture all the details
in an image, while second-order derivative filters are more complex and slower but can capture
finer details. The choice of which filter to use depends on the specific task and the trade-off
between accuracy and computational efficiency.
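The difference between the two filter families is easiest to see on a 1-D ramp edge: the first derivative peaks on the edge, while the second derivative produces a zero-crossing across it. A minimal NumPy sketch (the signal values are an arbitrary example):

```python
import numpy as np

# 1-D signal with a ramp edge between two flat regions.
signal = np.array([0, 0, 0, 1, 2, 3, 3, 3], dtype=float)

# First-order derivative (central difference): peaks ON the edge.
first = np.convolve(signal, [1, 0, -1], mode="same") / 2.0

# Second-order derivative (discrete Laplacian): zero-crossing ACROSS the edge,
# positive at the start of the ramp, negative at the end.
second = np.convolve(signal, [1, -2, 1], mode="same")

print("signal :", signal)
print("1st der:", first)   # note: the outermost samples are boundary artifacts
print("2nd der:", second)
```

Locating an edge as a gradient peak (first order) or as a zero-crossing (second order) is exactly the trade-off described above.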
Applications beyond traditional computer vision: Gradient computation has many
potential applications beyond traditional computer vision tasks, such as in medical
imaging and natural language processing. Future research may explore these
applications and develop new methods to address the unique challenges they present.
EDGE DETECTION
Edge detection is used to detect the location and presence of edges from changes in
the intensity of an image. Different operators are used in image processing to detect edges;
they respond to variations in grey level, but they also respond strongly to noise. Edge
detection is a very important task in image processing and is the main tool in pattern
recognition, image segmentation and scene analysis. It is a type of filter applied to extract the
edge points of an image. Sudden changes in brightness occur at the contours of objects in the
image.
In image processing, edges are interpreted as a single class of singularity. In a function, a
singularity is characterized as a discontinuity at which the gradient approaches infinity.
Since image data is in discrete form, the edges of an image are defined as the local maxima of
the gradient.
Edges mostly exist between object and object, between primitive and primitive, and between
object and background; the light reflected back from objects is discontinuous at these
boundaries. Edge detection methods study the change in grey level at individual pixels of an image.
Edge detection is mostly used for the measurement, detection and location of changes in the
grey levels of an image. Edges are a basic feature of an image. The clearest parts of an object
are its edges and lines, and an object's structure is known from its edges and lines. That is why
extracting edges is a very important technique in graphics processing and feature extraction.
The basic idea behind edge detection is as follows:
1. Use an edge enhancement operator to highlight local edges.
2. Define the edge strength and set the edge points.
NOTE: Edge detection performs poorly when the image is noisy or blurred.
The Sobel edge detection operator consists of a pair of 3x3 convolution kernels. Gx is one
kernel and Gy is the same kernel rotated by 90°:

Gx:             Gy:
-1  0  +1       +1  +2  +1
-2  0  +2        0   0   0
-1  0  +1       -1  -2  -1

These kernels are applied separately to the input image, so that a separate gradient
measurement is produced in each orientation, i.e. Gx and Gy.
The gradient magnitude is then given by:

|G| = sqrt(Gx² + Gy²)

which is often approximated as |G| ≈ |Gx| + |Gy|.
3. Laplacian of Gaussian
The Laplacian of Gaussian is a 2-D isotropic measure of the second spatial derivative of an
image. The Laplacian highlights regions of rapid intensity change and is therefore used for edge
detection. The Laplacian is applied to an image that has first been smoothed with a Gaussian
filter in order to reduce its sensitivity to noise. The operator takes a single grey-level image as
input and produces a single grey-level image as output.
The Laplacian L(x, y) of an image with pixel intensity values I(x, y) is given by:

L(x, y) = ∂²I/∂x² + ∂²I/∂y²

Since the input image is represented as a set of discrete pixels, a discrete convolution kernel
that approximates the second derivatives in this definition must be found. Three commonly
used small kernels are:

 0  1  0        1  1  1       -1  2  -1
 1 -4  1        1 -8  1        2  -4   2
 0  1  0        1  1  1       -1  2  -1

The 2-D LoG function centred on zero, with Gaussian standard deviation σ, is:

LoG(x, y) = -(1/(πσ⁴)) · [1 − (x² + y²)/(2σ²)] · e^(−(x² + y²)/(2σ²))
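As an illustrative sketch, the 2-D LoG function can be sampled into a discrete kernel with NumPy. The 9x9 size and sigma = 1.4 are arbitrary choices for this example; in practice the kernel size should grow with sigma:

```python
import numpy as np

def log_kernel(size=9, sigma=1.4):
    """Sample the 2-D Laplacian-of-Gaussian function into a discrete kernel."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x**2 + y**2
    k = (-1.0 / (np.pi * sigma**4)
         * (1 - r2 / (2 * sigma**2))
         * np.exp(-r2 / (2 * sigma**2)))
    # The continuous LoG integrates to zero; re-center the sampled kernel so it
    # sums to exactly zero and gives no response on flat image regions.
    return k - k.mean()

k = log_kernel()
print("kernel sum (should be ~0):", k.sum())
print("center value:", k[4, 4])
```

With this sign convention the kernel has a negative center surrounded by a positive ring; some texts flip the sign, which only negates the response.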
4. Prewitt operator
The Prewitt operator is a differentiation operator used for calculating the approximate
gradient of the image intensity function. At each point in an image, the Prewitt operator
produces a gradient vector or its normal vector. The image is convolved in the horizontal and
vertical directions with a small, separable and integer-valued filter, so it is inexpensive in terms
of computation.
Output of object detection: bounding boxes around detected objects, each with a single label or
category.
Output of image segmentation: pixel-wise segmentation masks.
In semantic image segmentation, we categorize image pixels based on their semantic meaning,
not just their visual properties. This classification system often uses two main categories: Things
and Stuff.
Things: Things refer to countable objects or distinct entities in an image with clear
boundaries, like people, flowers, cars and animals. The segmentation of “Things”
aims to label individual pixels in the image with specific classes by delineating the
boundaries of individual objects within the image.
Stuff: Stuff refers to regions or areas in an image, such as the background or repeating
patterns of similar materials, which cannot be counted: road, sky and grass, for example.
These regions may not have clear boundaries but play a crucial role in understanding the
overall context of an image. The segmentation of “Stuff” involves grouping pixels in an
image into clearly identifiable regions based on common properties like colour, texture
or context.
Semantic segmentation
Semantic segmentation is a type of image segmentation in which a class label is assigned to
every image pixel, typically using a deep learning (DL) algorithm. In semantic segmentation,
collections of pixels in an image are identified and classified by assigning a class label based on
their characteristics, such as colour, texture and shape. This provides a pixel-wise map of the
image (segmentation map), enabling more detailed and accurate image analysis.
For example, all pixels belonging to a ‘tree’ would be given the same label without
distinguishing between individual trees. Similarly, a group of people in an image would be
labelled as a single ‘person’ class, rather than identifying individual people.
Instance segmentation
Instance segmentation is a more sophisticated computer vision task that involves identifying
and delineating each individual object within an image. Instance segmentation goes beyond
just identifying objects: it also delineates the exact boundaries of each individual instance of
that object.
The key focus of instance segmentation is to differentiate between separate objects of the
same class. For example, if there are many cats in an image, instance segmentation would
identify and outline each specific cat. The segmentation map is created pixel by pixel, and
separate labels are assigned to specific object instances, often visualized with differently
coloured masks representing the different cats in the image.
Instance segmentation is useful in autonomous vehicles for identifying individual objects such
as pedestrians, other vehicles and obstacles along the navigation route. In medical imaging,
analysing scan images to detect specific abnormalities is useful for the early detection of
cancer and other organ conditions.
Panoptic segmentation
Panoptic segmentation goes a step further in image segmentation by combining the features
and processes of the semantic and instance segmentation techniques. A panoptic
segmentation algorithm produces a comprehensive image analysis by simultaneously
classifying every pixel and identifying distinct object instances of the same class.
For example, given an image of multiple cars and pedestrians at a traffic signal, panoptic
segmentation would label all ‘pedestrians’ and ‘cars’ (semantic segmentation), segment each
individual person and car (instance segmentation), and also classify the surrounding scene:
road signs, traffic lights and all other buildings or background. Panoptic segmentation thus
detects and interprets everything within a given image.
Panoptic segmentation leverages the strengths of fully convolutional networks (FCN) for
semantic context and Mask R-CNN for instance-specific details, which gives a combined output
for achieving a more holistic and nuanced understanding of visual data.
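A minimal sketch of the idea: a panoptic result can be encoded as a single per-pixel id that combines the semantic class with the instance number. The toy maps and the *1000 offset below are illustrative assumptions, not a fixed standard, though similar encodings appear in real panoptic tooling:

```python
import numpy as np

# Toy 4x4 scene: semantic labels (0 = road "stuff", 1 = car "thing").
semantic = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [1, 1, 0, 0],
                     [1, 1, 0, 0]])

# Instance ids: 0 for "stuff", 1..N for each separate "thing".
instance = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [2, 2, 0, 0],
                     [2, 2, 0, 0]])

# Panoptic id per pixel: every pixel carries a class AND, for "things",
# an instance identity (the *1000 offset is an illustrative convention).
panoptic = semantic * 1000 + instance

print(np.unique(panoptic))  # road, plus two distinct car instances
```

The road pixels collapse to one segment (semantic behaviour), while the two cars keep separate ids (instance behaviour), which is exactly what panoptic segmentation combines.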
Traditional image segmentation techniques
The traditional image segmentation techniques, which formed the foundation of modern deep
learning segmentation methods, include thresholding, edge detection, region-based
segmentation, clustering algorithms and watershed segmentation. These techniques rely on
principles of image processing, mathematical operations and heuristics to separate an image
into meaningful regions.
Thresholding: This method involves selecting a threshold value and classifying image
pixels as foreground or background based on their intensity values.
Edge Detection: Edge detection methods identify abrupt changes in intensity or
discontinuities in the image, using algorithms such as the Sobel, Canny or Laplacian edge
detectors.
Region-based segmentation: This method segments the image into smaller regions
and iteratively merges them based on predefined attributes such as colour, intensity and
texture, to handle noise and irregularities in the image.
Clustering Algorithms: This method uses algorithms like K-means or Gaussian mixture
models to group pixels in an image into clusters based on similar features like colour or
texture.
Watershed Segmentation: Watershed segmentation treats the image like a
topographical map, where watershed lines are identified based on pixel intensity and
connectivity, like water flowing down different valleys.
These traditional methods offer basic techniques of image segmentation with limitations, but
provide foundation for more advanced methods.
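As a concrete example of the thresholding technique above, here is a NumPy sketch of Otsu's method, which picks the threshold that best separates a bimodal intensity histogram. The synthetic test image is an assumption for the example:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()      # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0          # background mean
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1     # foreground mean
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Synthetic bimodal image: dark background (~30) and a bright object (~200).
rng = np.random.default_rng(0)
img = rng.normal(30, 10, (32, 32))
img[8:24, 8:24] = rng.normal(200, 10, (16, 16))
img = np.clip(img, 0, 255)

t = otsu_threshold(img)
mask = img >= t          # foreground/background classification
print("threshold:", t)
```

The threshold lands between the two intensity modes, so the mask cleanly separates the bright object from the dark background.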
Deep learning image segmentation models
Deep learning image segmentation models are a powerful class of techniques that leverage
neural network architectures to automatically divide an image into different segments, extracting
features from images for accurate analysis and segmentation.
Below are some of the popular deep learning models used for image segmentation:
U-Net: This model uses a U-shaped network to efficiently segment medical images. It is
very efficient when working with small amounts of data and provides precise
segmentation.
Fully Convolutional Network (FCN): This model can process images of any size and
output spatial maps. This is achieved by replacing the fully connected layers in a
conventional CNN with convolutional layers, which allows the entire image to be
segmented pixel by pixel.
SegNet: This model uses an encoder-decoder network for tasks like scene
understanding and object recognition. The encoder captures the context of the
image, and the decoder uses that context to perform precise localization and
segmentation of objects.
DeepLab: The key feature of DeepLab is the use of atrous convolutions used to capture
multi-scale context with multiple parallel filters.
Mask R-CNN: This model extends the Faster R-CNN object detection framework by
adding a branch for predicting segmentation masks alongside bounding box regression.
Vision Transformer (ViT): A newer model that applies transformers to image
segmentation. The image is divided into patches, which are processed as a sequence of
tokens so the model can capture the global context of the image.
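The FCN idea of replacing fully connected layers with convolutions can be illustrated with a 1x1 convolution, which acts as a per-pixel classifier over feature channels. This is a toy NumPy sketch with random features and weights (a real model would learn these from data):

```python
import numpy as np

def conv1x1(features, weights, bias):
    """A 1x1 convolution: a per-pixel linear classifier over channels.

    features: (H, W, C_in) feature map
    weights : (C_in, C_out) classifier weights
    returns : (H, W, C_out) per-pixel class scores
    """
    return features @ weights + bias

# Toy feature map (4x4 spatial grid, 3 channels) and a 2-class head.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 4, 3))
W = rng.normal(size=(3, 2))
b = np.zeros(2)

scores = conv1x1(feats, W, b)
labels = scores.argmax(axis=-1)   # dense (pixel-wise) prediction
print(scores.shape, labels.shape)
```

Because the same weights are applied at every spatial position, the output keeps the input's spatial layout regardless of image size, which is what lets an FCN segment an entire image pixel by pixel.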
Applications of Image segmentation
Below is a list of different use cases of image segmentation in image processing:
Autonomous Vehicles: Image segmentation helps autonomous vehicles in identifying
and segmenting objects like real time road lane detections, vehicles, pedestrians, traffic
signs for safe navigation.
Medical Imaging Analysis: Image segmentation is used for segmenting organs, tumours
and other anatomical structures in medical images such as X-rays, MRIs and CT scans,
helping in diagnosis and treatment planning.
Satellite Image Analysis: Used in analysing satellite images for land-cover
classification, urban planning, and monitoring environmental changes.
Object Detection and Tracking: Segmenting different objects in images or video for
tasks such as person detection, anomaly detection, and detecting different activities
in security systems.
Content Moderation: Used in monitoring and segmenting inappropriate content from
images or videos for social media platforms.
Smart Agriculture: Image segmentation methods are used by farmers and agronomists
for crop health monitoring, yield estimation and detecting plant diseases from images and
videos.
Industrial Inspection: Image segmentation helps in manufacturing processes with quality
control and detecting defects in products.
Conclusion:
In this article on image segmentation in image processing, we have discussed one of the
key computer vision tasks and how it supports image processing and analysis in many
different fields, including medical image analysis for diagnosis and better treatment planning.
The article also covered traditional image segmentation techniques and how advanced deep
learning models are used today for image processing and segmentation tasks.