CV-Unit 3-Feature Detection

Computer Vision (Gujarat Technological University)

Unit 3

Feature Detection: edge detection, corner detection, line and curve detection, active contours, SIFT and HOG descriptors, shape context descriptors, Morphological operations

What is a Feature?

In computer vision and image processing, a feature is a piece of information about the content of an image; typically it indicates that a certain region of the image has certain properties.
Features may be specific structures in the image such as points, edges or objects. Features may also
be the result of a general neighborhood operation or feature detection applied to the image.

Feature detection

● In computer vision and image processing, the concept of feature detection refers to methods that compute abstractions of image information and make a local decision at every image point about whether an image feature of a given type is present at that point or not.
● The resulting features will be subsets of the image domain, often in the form of isolated
points, continuous curves or connected regions.

Feature extraction

In pattern recognition and in image processing, feature extraction is a special form of dimensionality reduction.
When the input data to an algorithm is too large to be processed, the input data is transformed into a reduced representation set of features (also called a feature vector). Transforming the input data into the set of features is called feature extraction.

Feature detection = how to find some interesting points (features) in the image. (For
example, find a corner, find a template, and so on.)

Feature extraction = how to represent the interesting points we found to compare them with
other interesting points (features) in the image.

Practical example: You can find a corner with the Harris corner method, but you can describe it with any method you want (histograms, HOG, or local orientation in the 8-neighbourhood, for instance).

Edge Detection

● Edges are significant local changes of intensity in a digital image. An edge can be defined as
a set of connected pixels that forms a boundary between two disjoint regions. There are
three types of edges:
○ Horizontal edges
○ Vertical edges
○ Diagonal edges

● Edge Detection is a method of segmenting an image into regions of discontinuity. It is a widely used technique in digital image processing, with applications such as
○ pattern recognition


○ image morphology
○ feature extraction
● Edge detection allows users to observe the features of an image where there is a significant change in the gray level. This discontinuity indicates the end of one region in the image and the beginning of another.
● It reduces the amount of data in an image and preserves the structural properties of an
image.

Edge detection operators are of two types: gradient-based (first-derivative) operators such as Sobel, Prewitt and Roberts, and Gaussian-based (second-derivative) operators such as Laplacian of Gaussian (LoG) and Canny.

Sobel Operator:
● It is a discrete differentiation operator.
● The Sobel edge detection operator extracts all the edges of an image, without worrying
about the directions. The main advantage of the Sobel operator is that it provides a
differencing and smoothing effect.
● The Sobel edge detection operator is implemented as the combination of two directional edge responses, and the resulting image highlights the edge outlines of the original image.

● Sobel Edge detection operator consists of 3x3 convolution kernels. Gx is a simple kernel
and Gy is rotated by 90°
● These Kernels are applied separately to the input image because separate measurements can
be produced in each orientation i.e Gx and Gy.
● Advantages:
○ Simple and time efficient computation
○ Very easy at searching for smooth edges
● Limitations:
○ Diagonal direction points are not preserved always


○ Highly sensitive to noise
○ Not very accurate in edge detection
○ Detected edges are thick and rough, which does not give appropriate results
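As a rough illustration of the Sobel operator described above, here is a minimal sketch (assuming OpenCV and NumPy are installed; the file name is only a placeholder) that applies the two 3×3 kernels Gx and Gy separately and combines the directional responses into a gradient magnitude image:

```python
import cv2
import numpy as np

# Load a grayscale image (path is illustrative).
img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# 3x3 Sobel kernels: Gx responds to vertical edges, Gy (Gx rotated by 90 degrees)
# responds to horizontal edges.
Gx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=np.float32)
Gy = Gx.T

# Apply each kernel separately, then combine the two directional responses.
grad_x = cv2.filter2D(img, -1, Gx)
grad_y = cv2.filter2D(img, -1, Gy)
magnitude = np.sqrt(grad_x ** 2 + grad_y ** 2)

cv2.imwrite("sobel_edges.png", np.clip(magnitude, 0, 255).astype(np.uint8))
```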

Prewitt Operator:
● This operator is similar to the Sobel operator.
● It also detects vertical and horizontal edges of an image, and it is one of the best ways to estimate the orientation and magnitude of edges in an image.
● It uses the kernels or masks –

● Advantages:
○ Good performance on detecting vertical and horizontal edges
○ Best operator to detect the orientation of an image
● Limitations:
○ The magnitude of coefficient is fixed and cannot be changed
○ Diagonal direction points are not preserved always

Roberts Operator:

● This gradient-based operator computes the sum of squares of the differences between diagonally adjacent pixels in an image through discrete differentiation.
● The Roberts cross operator is used to perform a 2-D spatial gradient measurement on an image; it is simple and quick to compute. In the Roberts cross operator, the output pixel values represent the estimated absolute magnitude of the spatial gradient of the input image at that point.
● The Roberts cross operator consists of 2x2 convolution kernels. Gx is a simple kernel and Gy is rotated by 90°.

● Advantages:
○ Detection of edges and orientation are very easy
○ Diagonal direction points are preserved
● Limitations:
○ Very sensitive to noise
○ Not very accurate in edge detection

Marr-Hildreth Operator or Laplacian of Gaussian (LoG):

● It is a gaussian-based operator which uses the Laplacian to take the second derivative of an
image. This really works well when the transition of the grey level seems to be abrupt.


● It works on the zero-crossing method, i.e., where the second-order derivative crosses zero, that location corresponds to a maximum of the first derivative and is marked as an edge location.
● Here the Gaussian operator reduces the noise and the Laplacian operator detects the sharp edges.
● Since the input image is represented as a set of discrete pixels, the Laplacian is approximated by discrete convolution kernels; three such discrete approximations are commonly used in Laplacian filters.

● The Gaussian function is defined by the formula:

G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))

and the LoG operator is computed as its Laplacian:

LoG(x, y) = ∇²G(x, y) = ((x² + y² − 2σ²) / σ⁴) · G(x, y)

● Advantages:
○ Easy to detect edges and their various orientations
○ There is fixed characteristics in all directions

● Limitations:
○ Very sensitive to noise
○ The localization error may be severe at curved edges
○ It generates noisy responses that do not correspond to edges, so-called “false edges”

Canny Operator:
Reference: https://towardsdatascience.com/canny-edge-detection-step-by-step-in-python-computer-vision-b49c3a2d8123

● It is a Gaussian-based operator for detecting edges. This operator is much less susceptible to noise, and it extracts image features without affecting or altering the features.
● Canny edge detectors have advanced algorithms derived from the previous work of
Laplacian of Gaussian operators.
● It is widely used as an optimal edge detection technique. It detects edges based on three
criteria:
○ Low error rate
○ Edge points must be accurately localized
○ There should be just one single edge response
● Algorithm:
1. The input image is smoothed using a Gaussian low-pass filter, with a specified value of σ
2. The local gradient (intensity and direction) is computed for each point in the smoothed
image.


3. The edge points at the output of step 2 result in wide ridges. The algorithm thins those
ridges, leaving only the pixels at the top of each ridge.
4. The ridge pixels are then thresholded using two thresholds Tlow and Thigh: ridge pixels
with values greater than Thigh are considered strong edge pixels; ridge pixels with values
between Tlow and Thigh are said to be weak pixels. This process is known as hysteresis
thresholding.
5. The algorithm performs edge linking, aggregating weak pixels that are 8-connected to the strong pixels.
● Advantages:
○ It has good localization
○ It extracts image features without altering the features
○ Less sensitive to noise
● Limitations:
○ False zero crossings can occur
○ Complex computation and time consuming
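A short sketch of the algorithm above using OpenCV (assuming opencv-python is installed; the file name, the Gaussian kernel size/σ and the two hysteresis thresholds are illustrative values only):

```python
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

# Step 1: Gaussian smoothing with a chosen sigma.
blurred = cv2.GaussianBlur(img, (5, 5), sigmaX=1.4)

# Steps 2-5: gradient computation, ridge thinning (non-maximum suppression),
# hysteresis thresholding with (Tlow, Thigh) and edge linking are all
# performed inside cv2.Canny.
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

cv2.imwrite("canny_edges.png", edges)
```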

Convolution Process in edge detection:

● Example: take the mask matrices of the Prewitt operator. Each 3×3 kernel is slid over the image, and at every position the sum of the element-wise products of the kernel and the underlying pixel neighbourhood gives one output pixel (see the sketch below).
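A minimal NumPy/SciPy sketch of this convolution process (the Prewitt masks are the standard ones; the small input patch is made up purely for illustration):

```python
import numpy as np
from scipy.ndimage import convolve

# Standard Prewitt masks for the x and y directions.
prewitt_x = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=np.float32)
prewitt_y = prewitt_x.T

# A made-up patch containing a vertical intensity step from 10 to 80.
patch = np.array([[10, 10, 80, 80],
                  [10, 10, 80, 80],
                  [10, 10, 80, 80],
                  [10, 10, 80, 80]], dtype=np.float32)

gx = convolve(patch, prewitt_x)   # strong response at the vertical edge
gy = convolve(patch, prewitt_y)   # ~0 everywhere: no horizontal edges here
magnitude = np.hypot(gx, gy)
print(magnitude)
```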

● Some Real-world Applications of Image Edge Detection:


○ medical imaging, study of anatomical structure
○ locate an object in satellite images
○ automatic traffic controlling systems,
○ face recognition, and fingerprint recognition

Corner Detection

A corner can be defined as the intersection of two edges. A corner can also be defined as a point for which there are two dominant and different edge directions in a local neighbourhood of the point.

TYPES OF CORNER DETECTORS

1 Template based corner detection:


Template based corner detection methods use different representative templates to match the image. Correlations between the templates and the image are used to detect corners. The detection performance highly depends on the choice of appropriate templates.
After the correlations between the templates and the image are determined, an appropriate threshold should be carefully chosen to determine the existence of corners.

2 Contour based corner detection

Contour based corner detection methods are based on edge detection. In this category of methods,
edges in the image are detected first. Then, the corner is detected along the contour.

3 Direct corner detection methods

Direct corner detection methods use mathematical computations to detect the corner. This category
of methods usually applies some statistical operations to the image first. Then, corners are detected
based on statistical information.

Moravec detector

The principle of this detector is to observe if a sub-image, moved around one pixel in all directions,
changes significantly. If this is the case, then the considered pixel is a corner.

Fig. Principle of the Moravec detector, from left to right: on a flat area, small shifts of the sub-image (in red) do not cause any change; on a contour, we observe changes in only one direction; around a corner there are significant changes in all directions.

● Mathematically, the change is characterized at each pixel (m,n) of the image by Em,n(x,y), which represents the difference between the sub-images for an offset (x,y):

Em,n(x, y) = Σ(u,v)∈wm,n [ f(u + x, v + y) − f(u, v) ]²

where:
x and y represent the offsets in the four directions: (x,y) ∈ {(1,0),(1,1),(0,1),(−1,1)},
wm,n is a rectangular window around pixel (m,n),
f(u+x, v+y) − f(u,v) is the difference between the sub-image f(u,v) and the offset patch f(u+x, v+y).


● In each pixel (m,n), the minimum of Em,n(x,y) in the four directions is kept and denoted
Fm,n. Finally, the detected corners correspond to the local maxima of Fm,n, that is, at pixels
(m,n) where the smallest value of Em,n(x,y) is large.

It turns out that the Moravec detector has several limitations.


1. w is a binary window and therefore the detector considers all pixels in the window with the
same weight. When the noise in the image is high, it can lead to false corner detections.
2. Only four directions are considered.
3. The detector remains very sensitive to edges because only the minimum of E is considered.
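A rough NumPy sketch of the Moravec detector as described above (the window size and threshold are arbitrary illustrative choices, and a full implementation would also keep only local maxima of F):

```python
import numpy as np

def moravec(img, window=3, threshold=500.0):
    """Return (row, col) corner candidates using the Moravec measure."""
    img = img.astype(np.float64)
    h, w = img.shape
    r = window // 2
    # The four offsets (x = column shift, y = row shift).
    offsets = [(1, 0), (1, 1), (0, 1), (-1, 1)]
    F = np.zeros_like(img)

    for m in range(r + 1, h - r - 1):
        for n in range(r + 1, w - r - 1):
            patch = img[m - r:m + r + 1, n - r:n + r + 1]
            # E_{m,n}(x, y) for each offset; keep the minimum over directions.
            E = [np.sum((img[m - r + y:m + r + 1 + y,
                             n - r + x:n + r + 1 + x] - patch) ** 2)
                 for (x, y) in offsets]
            F[m, n] = min(E)

    # Corner candidates: pixels where the minimum change is still large.
    return np.argwhere(F > threshold)
```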


Harris Corner Detector

The Harris corner detection algorithm also called the Harris & Stephens corner detector is one of
the simplest corner detectors available.
The idea is to locate interest points where the surrounding neighbourhood shows edges in more than
one direction. The basic idea of the algorithm is to find the difference in intensity for a displacement of (u,v) in all directions, which is expressed as:

E(u, v) = Σx,y w(x, y) [ I(x + u, y + v) − I(x, y) ]²

The window function w(x, y) is either a rectangular window or a Gaussian window, which gives weights to pixels at (x,y). The above equation can be further approximated using a Taylor expansion, which gives us the final formula:

E(u, v) ≈ [u v] M [u v]ᵀ,  where  M = Σx,y w(x, y) [ Ix²  IxIy ; IxIy  Iy² ]

where,

Ix and Iy are the image derivatives in the x and y directions respectively. One can compute the derivatives using the Sobel kernel.

Then we finally find the Harris response R, given by:

R = det(M) − k (trace(M))² = λ1·λ2 − k (λ1 + λ2)²

where k is a small empirically chosen constant and λ1, λ2 (the lambdas) are the eigenvalues of M; in the Harris-Stephens notation M = [A C; C B], where A, B and C are the products Ix², Iy² and Ix·Iy smoothed by the window w.

We find the corners using the value of R.


Process of Harris Corner Detection Algorithm

1. Color image to Grayscale conversion


2. Spatial derivative calculation
3. Structure tensor setup
4. Harris response calculation
5. Find edges and corners using R
Reference: https://www.ijedr.org/papers/IJEDR1404047.pdf
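A compact OpenCV sketch of these five steps (the blockSize, ksize, k and the 0.01 threshold are typical illustrative values, not values prescribed by the notes):

```python
import cv2
import numpy as np

# Step 1: colour image to grayscale conversion.
img = cv2.imread("input.png")
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# Steps 2-4: spatial derivatives, structure tensor M and the response
# R = det(M) - k * trace(M)^2 are all computed inside cv2.cornerHarris.
R = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

# Step 5: mark pixels whose response is a large fraction of the maximum as corners.
img[R > 0.01 * R.max()] = [0, 0, 255]
cv2.imwrite("harris_corners.png", img)
```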

Line and Curve Detection

Hough Transform

Prior to applying Hough transform:

• Compute edge magnitude from input image.


• As always with edge detection simple lowpass filtering can be applied first.
• Threshold the gradient magnitude image. Thus, we have n pixels that may partially describe the boundaries of some objects.
• We wish to find sets of pixels that make up straight lines.
• Regard a point (xi, yi) and a straight line yi = a·xi + b
– There are many lines passing through the point (xi, yi).
– Common to them is that they satisfy the equation for some set of parameters (a, b).

This equation can obviously be rewritten as b = −xi·a + yi.

• We now consider x and y as parameters and a and b as variables.


• This is a line in (a,b) space parameterized by x and y. So a single point in xy-space gives a line in (a,b) space.
• All points on the line defined by (x,y) and (z,k) in (x,y)-space will parameterize lines that intersect at (a′,b′) in (a,b)-space.
• Points that lie on a line will form a "cluster of crossings" in the (a,b) space.

Accumulator Space:

● Quantize the parameter space (a,b), that is, divide it into cells. This quantized space is often referred to as the accumulator cells.
● amax is the maximum value of a and amin is the minimum value of a, etc. Count the number of times a line intersects a given cell.
● For each point (x,y) with value 1 in the binary image, find the values of (a,b) in the range [[amin,amax],[bmin,bmax]] defining the line corresponding to this point. Increase the value of the accumulator for these (a′,b′) points. Then proceed with the next point in the image.


● Cells receiving a minimum number of “votes” are assumed to correspond to lines in (x,y)
space. Lines can be found as peaks in this accumulator space.

polar representation of lines:

● The polar (also called normal) representation of straight lines is x cosθ + y sinθ = ρ
● Each point (xi, yi) in the xy-plane gives a sinusoid in the ρ-θ plane.
● M collinear points lying on the line xi cosθ + yi sinθ = ρ will give M curves that intersect at (ρi, θj) in the parameter plane.
● Local maxima give significant lines.

● The intersection point (ρ0, θ0) corresponds to the line that passes through the two points (x1, y1) and (x2, y2).
● A horizontal line will have θ = 0 and ρ equal to the intercept with the y axis.


● A vertical line will have θ=90 and ρ equal to the intercept with the x axis.

Algorithm Hough Transform:


Step 1: Partition the ρθ-plane into accumulator cells A[ρ,θ], ρ∈[ρmin, ρmax]; θ∈[θmin, θmax]

Step 2: The range of θ is ±90°


i. Horizontal lines have θ=0°, ρ≥0
ii. Vertical lines have θ=90°, ρ≥0

● The range of ρ is ±N√2 if the image is of size N×N


● The discretization of θ and ρ must happen with values δθ and δρ giving acceptable precision and sizes of the parameter space.
● The cell (i,j) corresponds to the square associated with parameter values (θj, ρi).

Step 3:Initialize all cells with value 0.

Step 4: For each foreground point (xk,yk) in the thresholded edge image
Let θj equal all the possible θ-values

Step 5: Solve for ρ using ρ = x cos θj + y sin θj

Round ρ to the closest cell value ρq and increment the accumulator cell corresponding to (ρq, θj).

Output: After this procedure, A(i,j) = P means that P points in the xy-space lie on the line ρi = x cos θj + y sin θj

Find line candidates where A(i,j) is above a suitable threshold value.
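A brief OpenCV sketch of the accumulator-based procedure above (the Canny thresholds, the 1-pixel/1-degree cell sizes and the vote threshold of 150 are illustrative choices):

```python
import cv2
import numpy as np

img = cv2.imread("input.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Thresholded edge image: its foreground points vote in the accumulator.
edges = cv2.Canny(gray, 50, 150)

# Accumulator cells of 1 pixel (rho) by 1 degree (theta); cells with at least
# 150 votes are returned as (rho, theta) line candidates.
lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=150)

if lines is not None:
    for rho, theta in lines[:, 0]:
        # Convert the (rho, theta) parameters back into two points on the line.
        a, b = np.cos(theta), np.sin(theta)
        x0, y0 = a * rho, b * rho
        p1 = (int(x0 + 1000 * (-b)), int(y0 + 1000 * a))
        p2 = (int(x0 - 1000 * (-b)), int(y0 - 1000 * a))
        cv2.line(img, p1, p2, (0, 0, 255), 2)

cv2.imwrite("hough_lines.png", img)
```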

Advantages:
● Conceptually simple.
● Easy implementation.
● Handles missing and occluded data very gracefully.
● Can be adapted to many types of forms, not just lines.
Disadvantages:
● Computationally complex for objects with many parameters.
● Looks for only one single type of object.
● Can be “fooled” by “apparent lines”.
● The length and the position of a line segment cannot be determined.
● Collinear line segments cannot be separated


Active Contours

● Active contour is a type of segmentation technique which can be defined as the use of energy forces and constraints to segregate the pixels of interest from the image.
● Contours are boundaries designed for the area of interest required in an image. A contour is a collection of points that undergoes an interpolation process. The interpolation can be linear, spline-based, or polynomial, describing the curve in the image.
● The main application of active contours in image processing is to define smooth shape in the
image and form a closed contour for the region.
● Energy functional is always associated with the curve defined in the image.
○ External energy is defined as the combination of forces which is used to control the
positioning of the contour onto the image
○ internal energy, to control the deformable changes.
● The desired contour is obtained by defining the minimum of the energy functional.
Deforming of the contour is described by a collection of points that finds a contour.

1. Snake model

● The model mainly works to identify and outline the target object considered for
segmentation. It uses a certain amount of prior knowledge about the target object contour
especially for complex objects.


● The active snake model is configured by a spline, focused on minimising the energy subject to various forces governing the image. A spline is a mathematical expression of a set of polynomials used to derive geometric figures such as curves.
● The energy-minimising spline is guided by constraint forces and by internal and external image forces based on appropriate contour features.
● The snake works efficiently with complex target objects by breaking the figure down into various smaller targets.
● The parametric form of the curve is exploited in the snake model, which has more advantages than using implicit or explicit curve forms:

v(s, t) = (x(s, t), y(s, t))

where x and y are the coordinates of the two-dimensional curve, v is the spline parameterization, s is the linear parameter ∈ [0,1] and t is the time parameter ∈ [0, ∞).

Step 1: Initialize the boundary curve (the active contour)


● Initialization of the spline can be done automatically, manually, or semi-automatically.
● Some snake algorithms require initialization entirely inside or outside of the object.
● It is usually best to initialize on the “cleaner” side of the boundary.
○ Marks 1 or more points inside the object
○ Marks 1 or more boundary points
—and/or—
○ Possibly draws a simple curve, such as an ellipse

Step 2: The contour moves. For moving the contour there are two common philosophies:
● Energy minimization
○ “Ad-hoc” energy equation describes how good the curve looks, and how well it
matches the image
○ “Visible” image boundaries represent a low energy state for the active contour
○ The curve is (typically) represented as a set of sequentially connected points. Each point is connected to its 2 neighboring points.
Two terms: Internal Energy + External Energy
○ External energy: also called image energy; designed to capture desired image features
○ Internal energy: also called shape energy; designed to reduce extreme curvature and prevent outlier points
○ The total energy of active snake model is a summation of three types of energy
namely
(i) internal energy (Ei) which depends on the degree of the spline relating to the
shape of the target image;
(ii) external energy (Ee) which includes the external forces given by the user and
also energy from various other factors;
(iii) energy of the image under consideration (EI) which conveys valuable data on
the illumination of the spline representing the target object.
○ The total energy defined for the contour formation in the snake model is given by

E_snake = ∫ [ Ei(v(s)) + Ee(v(s)) + EI(v(s)) ] ds

○ E_internal describes the internal energy, which defines piecewise smoothness constraints on the contour, where α decides how far the snake will be extended and the elasticity possible for the snake, and β decides the rigidity level of the snake.


Eexternal energy constraints are mainly used to define the snake near the required local minimum. It
may be described using high level interpretation and interaction.

The contour of the target object is shown above, where w1 is called the line coefficient and w2 is called the edge coefficient. For higher values of w1 and w2, the snake aligns itself to darker pixel regions when the value is positive, and it progresses towards the bright pixels when the value is negative.

● Numerically optimize the curve: partial differential equations (PDEs)

○ The "active" contour looks like a wiggling "snake"
○ A different method for moving the active contour's points
○ Used by "Level Sets"; operates on discrete "time steps"
○ Snake points move normal to the curve at each "time step"; the points move a distance determined by their speed.
○ Speed is usually a product of internal and external terms:
■ s(x,y) = sI(x,y)·sE(x,y)
○ Internal (shape) speed: sI(x,y) = 1 − ε·k(x,y), where k(x,y) measures the snake's curvature at (x,y)
○ External (image) speed: sE(x,y) = 1 / (1 + D(x,y)), where D(x,y) measures the image's edginess at (x,y)

Step 3: The contour stops moving when many/most points on the contour line up with edge pixels

● Snake model used for segmentation of various types of images.


● The applications of active snake model are increasing in a tremendous manner especially in
the various imaging fields.
● The traditional active snake model has several inefficiencies, such as sensitivity to noise and false contour detection in highly complex objects, which are solved in advanced versions of contour methods.
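As an illustration of steps 1-3, the following sketch uses scikit-image's active_contour function (assuming scikit-image is installed; the circular initialization and the α, β, γ values are arbitrary example settings):

```python
import numpy as np
from skimage import data, filters
from skimage.segmentation import active_contour

img = data.astronaut()[..., 0]              # any grayscale image
smoothed = filters.gaussian(img, sigma=3)   # smooth so edges attract the snake

# Step 1: initialize the contour as a circle around the region of interest.
s = np.linspace(0, 2 * np.pi, 200)
init = np.column_stack([100 + 100 * np.sin(s),   # rows
                        220 + 100 * np.cos(s)])  # columns

# Steps 2-3: iteratively deform the contour; alpha controls elasticity,
# beta controls rigidity, gamma is the time step. Iteration stops when the
# contour settles on nearby image boundaries.
snake = active_contour(smoothed, init, alpha=0.015, beta=10.0, gamma=0.001)
print(snake.shape)   # (200, 2) array of final contour points
```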

2. Gradient vector flow model

● Gradient vector flow model is an extended and well-defined technique of snake or active
contour models.
● The traditional snake model has two limitations: poor convergence of the contour into concave boundaries, and poor convergence when the snake curve is initialized far from the minimum.


● Gradient vector flow (GVF) field is determined based on the following steps.
Step 1 : The primary step is to detect the edge mapping function f(x, y) from the image I(x, y). Edge
mapping function for binary images is described by

● G is a 2D gaussian function with the statistical parameter, standard deviation σ.


● The edge map function for grey-scale images is given by

f(x, y) = |∇[Gσ(x, y) * I(x, y)]|²

where ∇ is the gradient operator.

Step 2 : Gradient vector flow field is the equilibrium solution that reduces the functional energy.
● The functional energy possesses two different terms such as
○ smoothing term and
○ data term which depends on the parameter μ.
● The parameter value is based on the noise level in the image, that is if the noise level is high
then the parameter has to be increased.
● Limitation is an increase in the value of μ that reduces the rounding of edges but weakens
the smoothing condition of the contour to a certain extent.
● The gradient vector flow field g(x, y) = (u(x, y), v(x, y)) is defined as the field that minimises the energy functional

E = ∬ μ (ux² + uy² + vx² + vy²) + |∇f|² |g − ∇f|² dx dy

● In this equation, g describes the gradient vector flow, which can be derived from the corresponding Euler equations.

● Computational solutions to calculate fx and fy in the equation are obtained by using common gradient operators such as Sobel, Prewitt, or isotropic operators.
● Based on these parameters the gradient vector flow field is defined. After the determination
of the GVF field g(x,y) it is used to replace the energy constraints in the traditional snake
model.
● Extended version of snake in the form of gradient vector field is used in all medical image
processing applications.

3. Balloon model

If a snake is initialized smaller than the minimum contour, it will not find the minimum and will continue to shrink. To overcome this limitation of the snake model, the balloon model was introduced, in which an inflation term is added to the forces acting on the snake.
The additional inflation force is given by F = k1·n(s), where n(s) is the unit vector normal to the curve at the point v(s).
Here k1 should have a magnitude similar to that of the image normalisation vector k.

Algorithm:
● It will locate an area in the volume, then place an icosahedron in that area such that it
contains no points. Expand (or) subdivide the icosahedron according to force.
● Starts with a small icosahedron inside the object.


● The major two forces act on each vertex.


○ Inflation force is used to push the vertices out
○ spring force is calculated based upon one ring neighbourhood of each vertex.
● Expansion algorithm is a set of instructions used to create a front of the icosahedron which
has all the faces.
○ First, insert the front section into the instructions queue.
○ For each vertex in the front, it is used to calculate the spring force and inflation
force.
○ Now, compute the new location. Compute the nearest point from the dataset. Then, update the coordinates. The expanding triangles will reconstruct the surface less accurately due to their large size. While expanding, the spring forces between vertices become very large.
○ Subdivide triangles to reduce force. A triangle becomes anchored once it reaches the
surface of the point cloud. If a triangle is anchored it no longer moves, all the
vertices are stationary.

● Geometric contours can be obtained based on regions and edges in the curvature of the
image. Edge-based geometric active contours define a geometric flow curve evolution
depending on the gradients of edges or boundaries in the image that undergoes contour
segmentation.
● Edge-based geometric models possess fast computation speed and can simultaneously
segment different regions of different intensities. In some regions, penetration of the gap
in-between the curvature occurs due to large gradient magnitudes.
● Region-based geometric contour models are based on either the variance inside and outside
contour or the squared difference between average intensities inside and outside the contours
along with the total contour length.
● Geometric active contours are mainly employed in medical image computing in
image-based segmentation.
● In general, active contour models possess different extended versions with change either in
the form of energy constraints or forces. New contour models are designed for the
segmentation of absolute details of the image.

SIFT descriptors

● The scale-invariant feature transform (SIFT) is an algorithm used to detect and describe
local features in digital images.
○ It locates certain key points
○ then furnishes them with quantitative information (so-called descriptors)
○ It will be used for object recognition.

There are mainly four steps involved in the SIFT algorithm.


Step 1 : Scale-space peak Selection
● Real world objects are meaningful only at a certain scale.
● The scale space of an image is a function L(x,y,σ) that is produced from the convolution of a
Gaussian kernel(Blurring) at different scales with the input image.
● Scale-space is separated into octaves
● The number of octaves and scale depends on the size of the original image. Each octave’s
image size is half the previous one.
● Blurring:
○ Mathematically, “blurring” is referred to as the convolution of the Gaussian operator
and the image.


○ Gaussian blur has a particular expression or "operator" that is applied to each pixel:

L(x, y, σ) = G(x, y, σ) * I(x, y)    (blurred image)

where G is the Gaussian blur operator, I is the image, x and y are the location coordinates, and σ is the "scale" parameter (the greater its value, the greater the blur), with

G(x, y, σ) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))    (Gaussian blur operator)

● DOG(Difference of Gaussian kernel)

● Blurred images are used to generate another set of images, the Difference of
Gaussians (DoG). These DoG images are great for finding out interesting key points
in the image.
● The difference of Gaussians is obtained as the difference of the Gaussian blurring of an image at two different scales σ and kσ:

D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)


● Finding keypoints

○ One pixel in an image is compared with its 8 neighbors as well as 9 pixels in the next scale and 9 pixels in the previous scale, so a total of 26 checks are made. If it is a local extremum, it is a potential keypoint; this basically means that the keypoint is best represented at that scale.

Step 2 Keypoint Localization

● The previous step generates a lot of keypoints, and some of them are not as useful as features.
● A Taylor series expansion of the scale space is used to get a more accurate location of each extremum, and if the intensity at this extremum is less than a threshold value (0.03), it is rejected.
● A 2×2 Hessian matrix (H) is used to compute the principal curvature and reject edge-like keypoints.

Step 3 Orientation Assignment

● An orientation is assigned to each keypoint to make it rotation invariant.
● A neighborhood is taken around the keypoint location depending on the scale, and the gradient magnitude and direction are calculated in that region.
● An orientation histogram with 36 bins covering 360 degrees is created.
○ Let's say the gradient direction at a certain point (in the “orientation collection
region”) is 18.759 degrees, then it will go into the 10–19-degree bin.


○ The “amount” that is added to the bin is proportional to the magnitude of the
gradient at that point.

● From the bins, a histogram is generated. The highest peak in the histogram is taken, and any peak above 80% of it is also used to calculate the orientation. This creates keypoints with the same location and scale but different directions.

Step 4 Keypoint descriptor

● At this point, each keypoint has a location, scale, orientation. Next is to compute a
descriptor for the local image region about each keypoint that is highly distinctive and
invariant as possible to variations such as changes in viewpoint and illumination.
● To do this, a 16x16 window around the keypoint is taken. It is divided into 16 sub-blocks of
4x4 size.

For each sub-block, an 8 bin orientation histogram is created.

So 4×4 sub-block descriptors over a 16×16 sample array are used in practice; 4 × 4 sub-blocks × 8 directions give 128 bin values. These are represented as a feature vector to form the keypoint descriptor.

1. Rotation dependence: The feature vector uses gradient orientations; clearly, if you rotate the image, everything changes. To achieve rotation independence, the keypoint's rotation is subtracted from each orientation, so that each gradient orientation is relative to the keypoint's orientation.
2. Illumination dependence: If we threshold large values, we can achieve illumination independence: any number (of the 128) greater than 0.2 is changed to 0.2, and the resulting feature vector is normalized again. This gives an illumination-independent feature vector.

Step 5 Keypoint Matching


Keypoints between two images are matched by identifying their nearest neighbors. In some cases, the second-closest match may be very near to the first; in that case, the ratio of the closest distance to the second-closest distance is taken, and if it is greater than 0.8 the match is rejected. This eliminates around 90% of false matches.
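A minimal OpenCV sketch tying these steps together (assuming a recent opencv-python build where SIFT is exposed as cv2.SIFT_create; the file names are placeholders and the 0.8 ratio follows the text):

```python
import cv2

img1 = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()

# Steps 1-4: scale-space extrema detection, keypoint localization,
# orientation assignment and 128-dimensional descriptor computation.
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Step 5: match descriptors by nearest neighbours and apply the ratio test:
# keep a match only if closest distance < 0.8 * second-closest distance.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.8 * n.distance]
print(f"{len(good)} matches survive the ratio test")
```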

HOG Descriptors
HOG, or Histogram of Oriented Gradients, is a feature descriptor that is often used to extract
features from image data.
● The HOG descriptor focuses on the structure or the shape of an object. In the case of edge
features, we only identify if the pixel is an edge or not. HOG is able to provide the edge
direction as well. This is done by extracting the gradient and orientation (magnitude and
direction) of the edges


● These orientations are calculated in ‘localized’ portions. This means that the complete
image is broken down into smaller regions and for each region, the gradients and orientation
are calculated.
● Finally the HOG would generate a Histogram for each of these regions separately. The
histograms are created using the gradients and orientations of the pixel values, hence the
name ‘Histogram of Oriented Gradients’

To put a formal definition to this:

The HOG feature descriptor counts the occurrences of gradient orientation in localized
portions of an image.
Process of Calculating the Histogram of Oriented Gradients (HOG)

Step 1: Preprocess the Data (64 x 128)

● We need to preprocess the image and bring down the width to height ratio to 1:2. The image
size should preferably be 64 x 128.
● This is because we will be dividing the image into 8×8 and 16×16 patches to extract the features.

Step 2: Calculating Gradients (direction x and y)

The next step is to calculate the gradient for every pixel in the image. Gradients are the small
changes in the x and y directions.

● Get the pixel values for this patch (the matrix shown here is only used as an example).
● Consider the pixel with value 85. To determine the gradient in the x-direction, subtract the value on the left from the pixel value on the right; to calculate the gradient in the y-direction, subtract the pixel value below from the pixel value above the selected pixel.
● Hence the resultant gradients in the x and y directions for this pixel are:
○ Change in X direction (Gx) = 89 − 78 = 11
○ Change in Y direction (Gy) = 8 (the difference of the pixel values above and below)


● This process will give us two new matrices – one storing gradients in the x-direction and the
other storing gradients in the y direction.
● The next step would be to find the magnitude and orientation using these values.

Step 3: Calculate the Magnitude and Orientation

Using the gradients, we determine the magnitude and direction for each pixel value. For this step, we use the Pythagorean theorem.

The gradients are basically the base and perpendicular of a right triangle here. So, for the example above with Gx = 11 and Gy = 8, the total gradient magnitude is:

Total Gradient Magnitude = √[(Gx)² + (Gy)²] = √[(11)² + (8)²] ≈ 13.6

Next, calculate the orientation (or direction) for the same pixel:

tan(Φ) = Gy / Gx, hence Φ = atan(Gy / Gx)

The orientation comes out to be approximately 36 degrees when we plug in the values. This way, for every pixel value we have the total gradient (magnitude) and the orientation (direction). We need to generate the histogram using these gradients and orientations.
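A tiny sketch of this per-pixel calculation, using the Gx and Gy values from the example above:

```python
import numpy as np

gx, gy = 11.0, 8.0

magnitude = np.sqrt(gx ** 2 + gy ** 2)          # ~13.6
orientation = np.degrees(np.arctan2(gy, gx))    # ~36 degrees

print(round(magnitude, 1), round(orientation, 1))
```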

Different Methods to Create Histograms using Gradients and Orientation

We take the angle or orientation on the x-axis and the frequency on the y-axis.

Method 1: We take each pixel, find the orientation of the pixel, and update the frequency table. For example, consider the highlighted pixel (85): since the orientation of this pixel is 36, we add a count against the angle value 36, denoting the frequency:


This frequency table can be used to generate a histogram with angle values on the x-axis and the
frequency on the y-axis.

Method 2: Here we use a bin size of 20 degrees, so the number of buckets is 9. For each pixel, the orientation is accumulated into the frequency count of the corresponding bin, giving a 9 × 1 matrix. Plotting this gives the histogram:

Method 3: Here is another way to generate the histogram – instead of using the frequency, we can use the gradient magnitude to fill the values in the matrix.

Here we are using an orientation value of 30, and updating bin 20 only. Additionally, we should give some weight to the other bin as well.


Method 4:Let’s make a small modification to the above method. Here, we will add the contribution
of a pixel’s gradient to the bins on either side of the pixel gradient. Remember, the higher
contribution should be to the bin value which is closer to the orientation.

Step 4: Calculate Histogram of Gradients in 8×8 cells (9×1)

The histograms created in the HOG feature descriptor are not generated for the whole image. Instead, the image is divided into 8×8 cells, and the histogram of oriented gradients is computed for each cell. If we divide the image into 8×8 cells and generate the histograms, we get a 9×1 matrix for each cell. This matrix is generated using method 4 described above.

Once we have generated the HOG for the 8×8 cells in the image, the next step is to normalize the histograms.

Step 5: Normalize gradients in 16×16 cell (36×1)

The gradients of the image are sensitive to the overall lighting. This means that for a particular
picture, some portion of the image would be very bright as compared to the other portions.

But we can reduce this lighting variation by normalizing the gradients by taking 16×16 blocks. Here
is an example that can explain how 16×16 blocks are created:


Here, we combine four 8×8 cells to create a 16×16 block. We already know that each 8×8 cell has a 9×1 histogram, so a block has four 9×1 matrices, i.e. a single 36×1 vector. To normalize this vector V = [a1, a2, a3, …, a36], we first compute the root of the sum of squares:

k = √[(a1)² + (a2)² + (a3)² + … + (a36)²]

and then divide all the values in the vector V by this value k. The result is a normalized vector of size 36×1.

Step 6: Features for the complete image

Now we combine all the 16×16 blocks to get the features for the complete image.

For a 64×128 image we get 105 (7×15) blocks of 16×16. Each of these 105 blocks has a 36×1 vector of features, so the total number of features for the image is 105 × 36 = 3780.
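The whole pipeline (steps 1-6) is available in scikit-image; a minimal sketch with the cell and block sizes used above (assuming scikit-image and OpenCV are installed; the input file name is a placeholder):

```python
import cv2
from skimage.feature import hog

# Step 1: resize to 64 x 128 (width x height), i.e. a 1:2 aspect ratio.
img = cv2.imread("person.png", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (64, 128))

# Steps 2-6: per-pixel gradients, 9-bin histograms per 8x8 cell, 16x16 block
# (2x2 cells) L2 normalization, and concatenation into one feature vector.
features = hog(img,
               orientations=9,
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2),
               block_norm="L2")
print(features.shape)   # (3780,) for a 64 x 128 input
```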

Shape context descriptors


● Shape context is a feature descriptor used in object recognition.
● The basic idea is to pick n points on the contours of a shape.
● For each point pi on the shape, consider the n − 1 vectors obtained by connecting pi to all other points. The set of all these vectors is a rich description of the shape localized at that point.
● For the point pi, the coarse histogram of the relative coordinates of the remaining n − 1 points,

hi(k) = #{ q ≠ pi : (q − pi) ∈ bin(k) },

is defined to be the shape context of pi.
● The bins are normally taken to be uniform in log-polar space. The shape contexts of two different versions of the letter "A" are shown.


○ (a) and (b) are the sampled edge points of the two shapes.
○ (c) is the diagram of the log-polar bins used to compute the shape context.
○ (d) is the shape context for the point marked with a circle in (a), (e) is that for the
point marked as a diamond in (b), and (f) is that for the triangle. As can be seen,
since (d) and (e) are the shape contexts for two closely related points, they are quite
similar, while the shape context in (f) is very different.
● For a feature descriptor to be useful, it needs to have certain invariances.
○ Translational invariance comes naturally to shape context.
○ Scale invariance is obtained by normalizing all radial distances by the mean distance
α between all the point pairs in the shape .

Step 1: Finding a list of points on shape edges

● The shape of an object is essentially captured by a finite subset of the points on the internal
or external contours on the object. These can be simply obtained using the Canny edge
detector and picking a random set of points from the edges.

Step 2: Computing the shape context

● For each point pi on the shape, consider the n − 1 vectors obtained by connecting pi to all
other points. The set of all these vectors is a rich description of the shape localized at that
point.
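A rough NumPy sketch of this computation (the 5 radial and 12 angular bins are a common choice rather than something fixed by the text; radii are normalized by the mean pairwise distance to obtain the scale invariance mentioned earlier):

```python
import numpy as np

def shape_context(points, n_radial=5, n_angular=12):
    """Compute a log-polar shape-context histogram for each of n sampled points.

    points: (n, 2) array of edge points. Returns (n, n_radial * n_angular).
    """
    n = len(points)
    diff = points[None, :, :] - points[:, None, :]        # vectors q - p_i
    dist = np.linalg.norm(diff, axis=2)
    angle = np.arctan2(diff[..., 1], diff[..., 0])         # in (-pi, pi]

    # Scale invariance: normalize all radial distances by the mean distance.
    r = dist / dist[dist > 0].mean()

    # Log-polar binning: radial bin edges spaced uniformly in log space.
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_radial + 1)
    r_bin = np.digitize(r, r_edges) - 1
    a_bin = ((angle + np.pi) / (2 * np.pi) * n_angular).astype(int) % n_angular

    hist = np.zeros((n, n_radial * n_angular))
    for i in range(n):
        for j in range(n):
            if i != j and 0 <= r_bin[i, j] < n_radial:
                hist[i, r_bin[i, j] * n_angular + a_bin[i, j]] += 1
    return hist
```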

Step 3: Computing the cost matrix


● Consider two points p and q that have normalized K-bin histograms (i.e. shape contexts) g(k) and h(k). As shape contexts are represented as histograms, it is natural to use the χ² test statistic as the "shape context cost" of matching the two points:

CS(p, q) = ½ Σk=1..K [ g(k) − h(k) ]² / [ g(k) + h(k) ]

The values of this cost range from 0 to 1.

● In addition to the shape context cost, an extra cost based on the appearance can be added.

Its values also range from 0 to 1.

Now the total cost of matching the two points could be a weighted-sum of the two costs:

● Now for each point pi on the first shape and a point qj on the second shape, calculate the
cost as described and call it Ci,j. This is the cost matrix.

Step 4: Finding the matching that minimizes total cost

● Now, a one-to-one matching is sought that matches each point pi on shape 1 to a point qj on shape 2 and minimizes the total cost of matching.

Step 5: Modeling transformation

● Given the set of correspondences between a finite set of points on the two shapes, a
transformation can be estimated to map any point from one shape to the other.
● There are several choices for this transformation, described below.

Affine

An affine transformation has the standard form T(p) = A·p + o, where the matrix A and the translational offset vector o are estimated from the set of corresponding point pairs (for example, by least squares).

Step 6: Computing the shape distance

● Now, measure a shape distance between two shapes P and Q. This distance is a weighted sum of three potential terms:
○ Shape context distance: the symmetric sum of shape context matching costs over best matching points.
○ Appearance cost: the sum of squared brightness differences in Gaussian windows around corresponding image points, where Ip and Iq are intensities and G is a Gaussian windowing function.
○ Transformation cost: the final cost measures how much transformation is necessary to bring the two images into alignment.

Morphological operations
All morphological processing operations are based on the following terms.

Structuring Element: It is a matrix or a small-sized template that is used to traverse an image. The
structuring element is positioned at all possible locations in the image, and it is compared with the
connected pixels. It can be of any shape.

● Fit: When all the pixels in the structuring element cover the pixels of the object, we call it
Fit.
● Hit: When at least one of the pixels in the structuring element covers the pixels of the object, we call it Hit.
● Miss: When no pixel in the structuring element covers the pixels of the object, we call it Miss.

Figure shows the visualization of terminologies used in morphological image processing.


Morphological Operations

The structuring element is moved across every pixel in the original image to give a pixel in a new
processed image. The value of this new pixel depends on the morphological operation performed.

1. Erosion

Erosion shrinks the image pixels, i.e. it removes pixels on object boundaries. First, we traverse the structuring element over the image object to perform the erosion operation, as shown in the figure. The output image (the eroded image, written f − s) is calculated using the following rule:

Pixel (output) = 1 {if FIT}, Pixel (output) = 0 {otherwise}

An example of erosion is shown in the figure: (a) represents the original image, while (b) and (c) show the processed images after erosion using 3×3 and 5×5 structuring elements respectively.

Properties:

1. It can split apart joint objects.


2. It can strip away extrusions.

2. Dilation


Dilation expands the image pixels, i.e. it adds pixels on object boundaries. First, we traverse the structuring element over the image object to perform the dilation operation, as shown in the figure. The output image (the dilated image, written f + s) is calculated using the following rule:

Pixel (output) = 1 {if HIT}, Pixel (output) = 0 {otherwise}

An example of dilation is shown in the figure: (a) represents the original image, while (b) and (c) show the processed images after dilation using 3×3 and 5×5 structuring elements respectively.

Properties:

1. It can repair breaks


2. It can repair intrusions

Compound Operations

Most morphological operations are not performed using either dilation or erosion; instead, they are
performed by using both. Two most widely used compound operations are:

(a) Closing, computed by first performing dilation and then erosion: (f + s) − s, and

(b) Opening, denoted by f * s, computed by first performing erosion and then dilation:

f * s = (f − s) + s

Figure shows both compound operations on a single object.


Application: Edge Extraction of an Object

Extracting the boundary is an important process to gain information and understand the feature of
an image. It is the first process in preprocessing to present the image’s characteristics. This
process can help the researcher to acquire data from the image. We can perform boundary extraction
of an object by following the below steps.

Step 1. Create an image (E) by erosion process; this will shrink the image slightly. The kernel size
of the structuring element can be varied accordingly.

Step 2. Subtract image E from the original image. By performing this step, we get the boundary of
our object.
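A brief OpenCV sketch of erosion, dilation, the two compound operations, and the boundary-extraction steps above (the 3×3 structuring element and file name are illustrative):

```python
import cv2
import numpy as np

binary = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)   # binary object mask
kernel = np.ones((3, 3), np.uint8)                      # structuring element

eroded  = cv2.erode(binary, kernel)       # f - s : shrinks object boundaries
dilated = cv2.dilate(binary, kernel)      # f + s : expands object boundaries
opened  = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # (f - s) + s
closed  = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # (f + s) - s

# Boundary extraction: Step 1 erode, Step 2 subtract the eroded image
# from the original to keep only the object boundary.
boundary = cv2.subtract(binary, eroded)
cv2.imwrite("boundary.png", boundary)
```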
