06 Features

The document discusses features that can be extracted from images and videos. It describes different types of features including global, region-based, and local features. It also discusses key-points and interest points that can be used for tasks like detection, recognition, and tracking. The document covers techniques for finding features, including histogram of oriented gradients (HOG) and scale-invariant feature transform (SIFT).


Features

Lecture 6
Outline
• Features introduction
• Key-points
• Histogram of Oriented gradients (HOG)
• Scale-Invariant Feature Transform (SIFT)
What is a Feature?
• Information extracted from an image/video.
• Hand-crafted
• Learned
• We can define a function
• Takes an image/video as an input
• Produces one or more numbers as output
• Hand-crafted features
• Feature engineering
• Learned features
• Automatically learned
Types of Features
• Global features
• Extracted from the entire image
• Examples: template (the image itself), HOG, etc.
• Region-based features
• Extracted from a smaller window of the image.
• Applying global method to a specific image region.
• Local features
• Describe a pixel, and the vicinity around a specific pixel.
• Local feature always refer to a specific pixel location.
Uses of Features
• Features can be used for many computer vision problems.
• Detection.
• Recognition.
• Tracking.
• Stereo estimation.
• Different types of features for different problems,
• Different assumptions about the images.
• That is why there are many different types of features.
Uses of Features: Matching

Credit: Fei Fei Li
Uses of Features: structure from motion
Uses of Features: panorama stitching
• Given two images
• How do we overlay them?
Finding Features in Videos

• Complex actions can be recognized on the basis of 'point-light displays':
• Facial expressions
• Sign language
• Arm movements
• Various full-body actions
Finding Features in Videos
Characteristics of good features
• Distinctiveness
Each feature can be uniquely identified
• Repeatability
The same feature can be found in several images despite transformations:
- geometric (translation, rotation, scale, perspective)
- photometric (reflectance, illumination)
• Compactness and efficiency
- Many fewer features than image pixels
- Can be computed independently per image

Kristen Grauman
Compactness and Efficiency
• We want the representation to be as small and as fast as possible
- Much smaller than a whole image
• We'd like to be able to run the detection procedure independently per image
- Match just the compact descriptors for speed.
- Difficult! We don't get to see 'the other image' at match time, e.g., object detection.
Kristen Grauman
Key-points
Choosing interest points

Where would you tell your friend to meet you?

Slide Credit: James Hays
What is an interest point?
• Expressive texture
• The point at which the direction of the boundary of object changes abruptly
• Intersection point between two or more edge segments
Properties of Interest Points
• Detect all (or most) true interest points
• No false interest points
• Well localized
• Robust with respect to noise
• Efficient detection
Possible approaches: corner detection
• Based on brightness of images
• Usually image derivatives
• Based on boundary extraction
• First step edge detection
• Curvature analysis of edges
Goals for KeyPoints

Detect points that are repeatable and distinctive


Application: KeyPoint Matching
1. Find a set of distinctive key-points.
2. Define a region around each key-point.
3. Extract and normalize the region content.
4. Compute a local descriptor (fA, fB) from the normalized region.
5. Match local descriptors: d(fA, fB) < T.

K. Grauman, B. Leibe
• Corner point can be
recognized in a window
• Shifting a window in any
direction should give a large
change in intensity
• LOCALIZING and
UNDERSTANDING shapes…
Basic Idea in Corner Detection
• Recognize corners by looking at a small window.
• Shifting the window in any direction should give a large change in intensity.

"Flat" region: no change in all directions.
"Edge": no change along the edge direction.
"Corner": significant change in all directions.

A. Efros
Template Matching

A complete set of eight corner templates can be generated by successive 90-degree rotations.

Why is the summation of the filter 0? This makes the response insensitive to absolute changes in intensity!
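A minimal sketch (not from the slides) of why a zero-sum template is insensitive to absolute intensity: adding a constant to the patch adds constant * sum(h) = 0 to the response. The template values here are made up for illustration.

```python
import numpy as np

# A corner-like template whose entries sum to zero.
h = np.array([[-4, -4, -4],
              [-4,  5,  5],
              [-4,  5,  5]], dtype=float)
assert h.sum() == 0

patch = np.array([[10, 10, 10],
                  [10, 50, 50],
                  [10, 50, 50]], dtype=float)

resp1 = (patch * h).sum()          # correlation response
resp2 = ((patch + 100) * h).sum()  # same patch, 100 units brighter

print(resp1, resp2)  # identical: the brightness offset cancels out
```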
Correlation - revisit

f : image window, h : kernel, both 3x3 with entries f1..f9 and h1..h9:

f ⊙ h = f1·h1 + f2·h2 + f3·h3 + f4·h4 + f5·h5 + f6·h6 + f7·h7 + f8·h8 + f9·h9
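The sum-of-products above can be sketched in a couple of lines (the values of f and h are made up):

```python
import numpy as np

f = np.arange(1, 10, dtype=float).reshape(3, 3)  # f1..f9 = 1..9
h = np.ones((3, 3))                              # h1..h9 = 1

corr = (f * h).sum()  # f1*h1 + f2*h2 + ... + f9*h9
print(corr)  # 45.0
```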
Corner Detection by Auto-correlation
Change in appearance of window w(x,y) for shift [u,v]:

Window function w(x,y) = or

1 in window, 0 outside Gaussian


Source: R. Szeliski
Corner Detection by Auto-correlation
Change in appearance of window w(x,y) for shift [u,v]:

E(u, v) = Σ_{x,y} w(x, y) [ I(x + u, y + v) − I(x, y) ]²

Figure: the image I(x, y), the window w(x, y), and the error surface E(u, v), here evaluated at E(0, 0).
Corner Detection by Auto-correlation
Change in appearance of window w(x,y) for shift [u,v]:

E(u, v) = Σ_{x,y} w(x, y) [ I(x + u, y + v) − I(x, y) ]²

Figure: the image I(x, y), the window w(x, y), and the error surface E(u, v), here evaluated at E(3, 2).
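A direct (naive) implementation of the sum above can be sketched as follows, assuming a box window (w = 1 inside the window, 0 outside) and a toy image:

```python
import numpy as np

def auto_corr_error(I, x0, y0, u, v, half=2):
    """Sum of squared differences between the window centered at
    (x0, y0) and the same window shifted by (u, v)."""
    E = 0.0
    for y in range(y0 - half, y0 + half + 1):
        for x in range(x0 - half, x0 + half + 1):
            E += (I[y + v, x + u] - I[y, x]) ** 2
    return E

I = np.zeros((20, 20))
I[8:12, 8:12] = 1.0  # a small bright square

print(auto_corr_error(I, 10, 10, 0, 0))  # 0.0: zero shift, zero change
print(auto_corr_error(I, 10, 10, 3, 2))  # > 0: shifting changes the window content
```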
Corner detection
Three different cases of E(u, v), visualized as a surface.
Corner Detection by Auto-correlation
Change in appearance of window w(x,y) for shift [u,v]:

We want to discover how E behaves for small shifts

But this is very slow to compute naively:

O(window_width² × shift_range² × image_width²)
= O(11² × 11² × 600²) ≈ 5.2 billion operations
≈ 14.6k ops per image pixel
Corner Detection by Auto-correlation
Change in appearance of window w(x,y) for shift [u,v]:

We want to discover how E behaves for small shifts

But we know the response in E that we are looking


for – strong peak.
Corner Detection: strategy
Approximate E(u,v) locally by a quadratic surface


Recall: Taylor series expansion
• A function f can be represented by an infinite series of its derivatives at a single point a:

f(x) = Σ_{n=0}^{∞} f⁽ⁿ⁾(a)/n! · (x − a)ⁿ

Wikipedia

Since we care about the window center, we set a = 0 (Maclaurin series).

Figure: approximations of f(x) = eˣ centered at x = 0.
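The Maclaurin idea can be checked numerically: partial sums of xⁿ/n! converge quickly to eˣ near 0.

```python
import math

def exp_maclaurin(x, terms):
    # Partial sum of the Maclaurin series of e^x: sum of x^n / n!
    return sum(x**n / math.factorial(n) for n in range(terms))

x = 0.5
for terms in (1, 2, 4, 8):
    print(terms, exp_maclaurin(x, terms))
print(math.exp(x))  # the partial sums approach this value
```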
Corner Detection: Mathematics
The quadratic approximation simplifies to

E(u, v) ≈ [u v] M [u v]ᵀ

where M is a second moment matrix computed from image derivatives:

Corners as distinctive interest points

M = Σ_{x,y} w(x, y) [ Ix·Ix  Ix·Iy ]
                    [ Ix·Iy  Iy·Iy ]

a 2 x 2 matrix of image derivatives (averaged in a neighborhood of a point).

Notation: Ix = ∂I/∂x, Iy = ∂I/∂y, Ix·Iy = (∂I/∂x)(∂I/∂y)

James Hays
Harris corner detection
1) Compute M matrix for each window to recover a cornerness
score 𝐶.
Note: We can find M purely from the per-pixel image derivatives!
2) Threshold to find pixels which give large corner response
𝐶 > threshold.
3) Find the local maxima pixels,
i.e., non-maximal suppression.

C. Harris and M. Stephens. "A Combined Corner and Edge Detector." Proceedings of the 4th Alvey Vision Conference: pages 147-151, 1988.
0. Input image I. We want to compute M at each pixel.
1. Compute image derivatives Ix, Iy (optionally, blur first).
2. Compute the components of M as products of derivatives: Ix², Iy², Ixy = Ix·Iy.
3. Smooth each component with a Gaussian: g(Ix²), g(Iy²), g(Ix ∘ Iy).
4. Compute the cornerness response R from the smoothed components.
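The per-pixel pipeline above can be sketched with numpy/scipy. The Sobel derivatives, Gaussian width, and k = 0.05 are illustrative assumptions; R = det(M) − k·trace(M)² is the standard Harris cornerness score.

```python
import numpy as np
from scipy import ndimage

def harris_response(I, sigma=1.0, k=0.05):
    # 1. image derivatives
    Ix = ndimage.sobel(I, axis=1)
    Iy = ndimage.sobel(I, axis=0)
    # 2. products of derivatives
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    # 3. Gaussian-weighted sums g(.) -> entries of M at every pixel
    Sxx = ndimage.gaussian_filter(Ixx, sigma)
    Syy = ndimage.gaussian_filter(Iyy, sigma)
    Sxy = ndimage.gaussian_filter(Ixy, sigma)
    # 4. cornerness from det(M) and trace(M)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace * trace

I = np.zeros((40, 40))
I[10:30, 10:30] = 1.0  # a white square: four corners

R = harris_response(I)
y, x = np.unravel_index(np.argmax(R), R.shape)
print(y, x)  # the strongest response lies near one of the square's corners
```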
Harris Detector: Steps
1. Compute corner response C.
2. Find points with large corner response: C > threshold.
3. Take only the points of local maxima of C.
Histogram of
Oriented Gradients
Edges
HOG: Human Detection
Histogram - revisit

Figure: a small image with pixel values in 0..6, and its histogram (the count of pixels at each value 0..6).
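An intensity histogram like the one in the figure can be computed in one line; the 4x4 image here is made up for illustration.

```python
import numpy as np

img = np.array([[0, 1, 1, 2],
                [2, 1, 0, 0],
                [5, 2, 0, 0],
                [1, 1, 2, 4]])

# Count how many pixels take each value 0..6.
hist = np.bincount(img.ravel(), minlength=7)
print(hist)  # [5 5 4 0 1 1 0]
```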
Image Histogram - revisit
Histograms of Oriented Gradients
• Given an image I, and a pixel location (i,j).
• We want to compute the HOG feature for that pixel.
• The main operations can be described as a sequence of five steps.

Pixel (i,j)
Histograms of Oriented Gradients
• Step 1: Extract a square window (called “block”) of some size.

Pixel (i,j)

Block
Histograms of Oriented Gradients
• Step 2: Divide block into a square grid of sub-blocks
(called “cells”) (2x2 grid in our example, resulting in four cells).

Pixel (i,j)

Block
Histograms of Oriented Gradients
• Step 3: Compute orientation histogram of each cell.

Pixel (i,j)

Block
Histograms of Oriented Gradients
• Step 4: Concatenate the four histograms.

Pixel (i,j)

Block
Histograms of Oriented Gradients
Let vector v be concatenation of the four histograms from step 4.
• Step 5: normalize v. Here we have three options for how to do it:
• Option 1: Divide v by its Euclidean norm.

Pixel (i,j)

Block
Histograms of Oriented Gradients
Let vector v be concatenation of the four histograms from step 4.
• Step 5: normalize v. Here we have three options for how to do it:
• Option 2: Divide v by its L1 norm (the L1 norm is the sum of all absolute values of v).

Pixel (i,j)

Block
Histograms of Oriented Gradients
Let vector v be concatenation of the four histograms from step 4.
• Option 3:
• Divide v by its Euclidean norm.
• In the resulting vector, clip any value over 0.2
• Then, renormalize the resulting vector by dividing again by its Euclidean norm.

Pixel (i,j)

Block
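The three normalization options above can be sketched on a made-up concatenated histogram vector v:

```python
import numpy as np

v = np.array([3.0, 0.0, 8.0, 1.0, 0.5, 2.0])
eps = 1e-12  # guard against division by zero

# Option 1: divide by the Euclidean (L2) norm.
v1 = v / (np.linalg.norm(v) + eps)

# Option 2: divide by the L1 norm (sum of absolute values).
v2 = v / (np.abs(v).sum() + eps)

# Option 3: L2-normalize, clip values above 0.2, then L2-normalize again.
v3 = v / (np.linalg.norm(v) + eps)
v3 = np.minimum(v3, 0.2)
v3 = v3 / (np.linalg.norm(v3) + eps)

print(np.linalg.norm(v1))  # 1.0
print(np.abs(v2).sum())    # 1.0
```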
Histogram of Oriented Gradients
Image gradients: dI/dx (horizontal) and dI/dy (vertical)
Image gradients
Histogram of Oriented Gradients
Summary of HOG Computation
• Step 1: Extract a square window (called “block”) of some size around
the pixel location of interest.
• Step 2: Divide block into a square grid of sub-blocks (called “cells”) (2x2
grid in our example, resulting in four cells).
• Step 3: Compute orientation histogram of each cell.
• Step 4: Concatenate the four histograms.
• Step 5: Normalize the concatenated vector v using one of the three options described previously.
Histograms of Oriented Gradients
• Parameters and design options:
• Angles range from 0 to 180 or from 0 to 360 degrees?
• In the Dalal & Triggs paper, a range of 0 to 180 degrees is used,
• and HOGs are used for detection of pedestrians.
• Number of orientation bins.
• Usually 9 bins, each bin covering 20 degrees.
• Cell size.
• Cells of size 8x8 pixels are often used.
• Block size.
• Blocks of size 2x2 cells (16x16 pixels) are often used.
• Usually a HOG feature has 36 dimensions.
• 4 cells * 9 orientation bins.
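Steps 1-5 with the typical parameters above (2x2 cells of 8x8 pixels, 9 unsigned-orientation bins over 0-180 degrees) can be sketched as follows. The gradient operator and binning details are illustrative assumptions, not the exact Dalal & Triggs implementation.

```python
import numpy as np

def hog_feature(I, i, j, cell=8, bins=9):
    # Step 1: extract a 2*cell x 2*cell block around pixel (i, j).
    block = I[i - cell:i + cell, j - cell:j + cell].astype(float)
    gy, gx = np.gradient(block)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation

    hists = []
    for ci in (0, cell):          # Step 2: 2x2 grid of cells
        for cj in (0, cell):
            m = mag[ci:ci + cell, cj:cj + cell]
            a = ang[ci:ci + cell, cj:cj + cell]
            # Step 3: orientation histogram per cell, weighted by magnitude.
            h, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            hists.append(h)

    v = np.concatenate(hists)     # Step 4: concatenate the four histograms
    return v / (np.linalg.norm(v) + 1e-12)  # Step 5, option 1

I = np.random.default_rng(0).random((64, 64))
f = hog_feature(I, 32, 32)
print(f.shape)  # (36,) = 4 cells x 9 orientation bins
```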
HOG
SIFT
Scale Invariant Feature Transform (SIFT)
• Lowe, D. 2004, IJCV (cited > 70K)
Scale Invariant Feature Transform (SIFT)
• Image content is transformed into local feature coordinates
• Invariant to
• translation
• rotation
• scale, and
• other imaging parameters
Scale Invariant Feature Transform (SIFT)
• Image content is transformed into local feature coordinates
Overall Procedure at a High Level
1. Scale-Space Extrema Detection: search over multiple scales and image locations.
2. KeyPoint Localization: fit a model to determine location and scale; select keypoints based on a measure of stability.
3. Orientation Assignment: compute the best orientation(s) for each keypoint region.
4. KeyPoint Description: use local image gradients at the selected scale and rotation to describe each keypoint region.
Automatic Scale Selection

f(I_{i1..im}(x, σ)) = f(I'_{i1..im}(x', σ'))

How do we find patch sizes at which the f response is equal?

What is a good f?
Automatic Scale Selection
• Function responses for increasing scale (scale signature): plot the response of some function f(I_{i1..im}(x, σ)) as the scale σ increases, for corresponding patches in two images.
What Is A Useful Signature Function f ?

• 1st derivative of Gaussian
• 2nd derivative of Gaussian (Laplacian of Gaussian)

Earl F. Glynn
What Is A Useful Signature Function f ?
"Blob" detector is common for corners
• Laplacian (2nd derivative) of Gaussian (LoG)
• Scale space: plot the function response against image blob size.

Find local maxima in position-scale space: find maxima/minima of the response → a list of (x, y, s).

CAP5415 - Lecture 9
Alternative kernel

Approximate LoG with Difference-of-Gaussian (DoG):
1. Blur the image with a Gaussian kernel of width σ.
2. Blur the image with a Gaussian kernel of width kσ.
3. Subtract 2 from 1.
Scale-space

Find local maxima in position-scale space of DoG: blur the input image at a sequence of increasing scales (σ, kσ, …), subtract adjacent levels to form the DoG stack, then find maxima/minima across position and scale → a list of (x, y, s).
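The DoG stack in steps 1-3 above can be sketched as follows; the base σ = 1.6, k = √2, and the toy blob are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def dog_stack(I, sigma=1.6, k=2 ** 0.5, levels=4):
    """Return a list of DoG images at increasing scale."""
    blurred = [ndimage.gaussian_filter(I, sigma * k ** n)
               for n in range(levels + 1)]
    # each DoG level = (blur at k^(n+1) sigma) - (blur at k^n sigma)
    return [blurred[n + 1] - blurred[n] for n in range(levels)]

I = np.zeros((64, 64))
I[28:36, 28:36] = 1.0  # a blob roughly 8 pixels across

dogs = dog_stack(I)
responses = [abs(d[32, 32]) for d in dogs]
print(responses)  # the magnitude peaks at a scale matching the blob size
```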
Results: Difference-of-Gaussian
• Larger circles = larger scale
• Descriptors with maximal scale response
SIFT Orientation estimation
• Compute gradient orientation histogram
• Select dominant orientation ϴ
SIFT Orientation Normalization
• Compute gradient orientation histogram
• Select dominant orientation ϴ
• Normalize: rotate to fixed orientation

(orientation histogram over 0 to 2π)

T. Tuytelaars, B. Leibe
SIFT descriptor formation
• Compute on local 16 x 16 window around detection.
• Rotate and scale window according to discovered
orientation ϴ and scale σ (gain invariance).
• Compute gradients weighted by a Gaussian of variance
half the window (for smooth falloff).

Actually 16x16, only showing 8x8

James Hays
SIFT descriptor formation
• 4x4 array of gradient orientation histograms weighted by
gradient magnitude.
• Bin into 8 orientations x 4x4 array = 128 dimensions.
Showing only 2x2 here but is 4x4

James Hays
SIFT Descriptor Extraction

Gradient magnitude and orientation → 8-bin histogram: add up the magnitude amounts per orientation bin.

Utkarsh Sinha
Reduce effect of illumination
• 128-dim vector normalized to 1
• Threshold gradient magnitudes to avoid excessive
influence of high gradients
• After normalization, clamp gradients > 0.2
• Renormalize

James Hays
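The normalize / clamp at 0.2 / renormalize scheme above can be sketched on a made-up 128-dimensional descriptor:

```python
import numpy as np

def normalize_sift(v, clamp=0.2):
    v = v / (np.linalg.norm(v) + 1e-12)   # unit length
    v = np.minimum(v, clamp)              # limit influence of large gradients
    return v / (np.linalg.norm(v) + 1e-12)  # renormalize

d = np.abs(np.random.default_rng(1).normal(size=128))
d[0] = 50.0  # one dominant gradient, e.g. from a strong highlight

dn = normalize_sift(d)
print(np.linalg.norm(dn))  # 1.0: unit length again
print(dn[0])               # the clamped entry carries much less relative weight
```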
Review: Local Descriptors
• Most features can be thought of as
• templates,
• histograms (counts),
• or combinations
• The ideal descriptor should be
– Robust and Distinctive
– Compact and Efficient
• Most available descriptors focus on edge/gradient information
– Capture texture information
– Color rarely used
