06 Features

The document discusses features that can be extracted from images and videos. It describes different types of features including global, region-based, and local features. It also discusses key-points and interest points that can be used for tasks like detection, recognition, and tracking. The document covers techniques for finding features, including histogram of oriented gradients (HOG) and scale-invariant feature transform (SIFT).


Features

Lecture 6
Outline
• Features introduction
• Key-points
• Histogram of Oriented gradients (HOG)
• Scale-Invariant Feature Transform (SIFT)
What is a Feature?
• Information extracted from an image/video.
• Hand-crafted
• Learned
• We can define a function
• Takes an image/video as an input
• Produces one or more numbers as output
• Hand-crafted features
• Feature engineering
• Learned features
• Automatically learned
Types of Features
• Global features
• Extracted from the entire image
• Examples: template (the image itself), HOG, etc.
• Region-based features
• Extracted from a smaller window of the image.
• Applying global method to a specific image region.
• Local features
• Describe a pixel, and the vicinity around a specific pixel.
• Local feature always refer to a specific pixel location.
Uses of Features
• Features can be used for many computer vision problems.
• Detection.
• Recognition.
• Tracking.
• Stereo estimation.
• Different types of features for different problems,
• Different assumptions about the images.
• That is why there are many different types of features.
Uses of Features: Matching

Credit: Fei Fei Li
Uses of Features: structure from motion
Uses of Features: panorama stitching
• Given two images
• How do we overlay them?
Finding Features in Videos

• Complex actions can be recognized on the basis of 'point-light displays':
• Facial expressions
• Sign language
• Arm movements
• Various full-body actions
Finding Features in Videos
Characteristics of good features
• Distinctiveness
Each feature can be uniquely identified
• Repeatability
The same feature can be found in several images despite transformations:
- geometric (translation, rotation, scale, perspective)
- photometric (reflectance, illumination)
• Compactness and efficiency
- Many fewer features than image pixels
- Can be computed independently per image

Kristen Grauman
Compactness and Efficiency
• We want the representation to be as small and as fast as possible
- Much smaller than a whole image
• We'd like to be able to run the detection procedure independently per image
- Match just the compact descriptors for speed.
- Difficult! We don't get to see 'the other image' at match time, e.g., object detection.
Kristen Grauman
Key-points
Choosing interest points

Where would you tell your friend to meet you?

Slide Credit: James Hays
What is an interest point?
• Expressive texture
• The point at which the direction of the boundary of object changes abruptly
• Intersection point between two or more edge segments
Properties of Interest Points
• Detect all (or most) true interest points
• No false interest points
• Well localized
• Robust with respect to noise
• Efficient detection
Possible approaches: corner detection
• Based on brightness of images
• Usually image derivatives
• Based on boundary extraction
• First step edge detection
• Curvature analysis of edges
Goals for KeyPoints

Detect points that are repeatable and distinctive


Application: KeyPoint Matching
1. Find a set of distinctive key-points.
2. Define a region around each key-point.
3. Extract and normalize the region content.
4. Compute a local descriptor (fA, fB) from the normalized region.
5. Match local descriptors: d(fA, fB) < T.

K. Grauman, B. Leibe
• Corner point can be
recognized in a window
• Shifting a window in any
direction should give a large
change in intensity
• LOCALIZING and
UNDERSTANDING shapes…
Basic Idea in Corner Detection
• Recognize corners by looking at a small window.
• Shifting the window in any direction should give a large change in intensity.

"Flat" region: no change in all directions.
"Edge": no change along the edge direction.
"Corner": significant change in all directions.

A. Efros
Template Matching

A complete set of eight corner templates can be generated by successive 90-degree rotations.

Why is the summation of the filter 0? This makes the response insensitive to absolute changes in intensity!
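A minimal sketch (not from the slides) of why a zero-sum template is insensitive to absolute intensity: adding a constant to the patch adds constant * sum(h) = 0 to the response. The template values here are made up for illustration.

```python
import numpy as np

# A corner-like template whose entries sum to zero.
h = np.array([[-4, -4, -4],
              [-4,  5,  5],
              [-4,  5,  5]], dtype=float)
assert h.sum() == 0

patch = np.array([[10, 10, 10],
                  [10, 50, 50],
                  [10, 50, 50]], dtype=float)

resp1 = (patch * h).sum()          # correlation response
resp2 = ((patch + 100) * h).sum()  # same patch, 100 units brighter

print(resp1, resp2)  # identical: the brightness offset cancels out
```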
Correlation - revisit

f : image window, h : kernel, both 3x3 with entries f1..f9 and h1..h9:

f ⊙ h = f1·h1 + f2·h2 + f3·h3 + f4·h4 + f5·h5 + f6·h6 + f7·h7 + f8·h8 + f9·h9
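The sum-of-products above can be sketched in a couple of lines (the values of f and h are made up):

```python
import numpy as np

f = np.arange(1, 10, dtype=float).reshape(3, 3)  # f1..f9 = 1..9
h = np.ones((3, 3))                              # h1..h9 = 1

corr = (f * h).sum()  # f1*h1 + f2*h2 + ... + f9*h9
print(corr)  # 45.0
```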
Corner Detection by Auto-correlation
Change in appearance of window w(x,y) for shift [u,v]:

Window function w(x,y) = or

1 in window, 0 outside Gaussian


Source: R. Szeliski
Corner Detection by Auto-correlation
Change in appearance of window w(x,y) for shift [u,v]:

E(u, v) = Σ_{x,y} w(x, y) [ I(x + u, y + v) − I(x, y) ]²

Figure: the image I(x, y), the window w(x, y), and the error surface E(u, v), here evaluated at E(0, 0).
Corner Detection by Auto-correlation
Change in appearance of window w(x,y) for shift [u,v]:

E(u, v) = Σ_{x,y} w(x, y) [ I(x + u, y + v) − I(x, y) ]²

Figure: the image I(x, y), the window w(x, y), and the error surface E(u, v), here evaluated at E(3, 2).
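A direct (naive) implementation of the sum above can be sketched as follows, assuming a box window (w = 1 inside the window, 0 outside) and a toy image:

```python
import numpy as np

def auto_corr_error(I, x0, y0, u, v, half=2):
    """Sum of squared differences between the window centered at
    (x0, y0) and the same window shifted by (u, v)."""
    E = 0.0
    for y in range(y0 - half, y0 + half + 1):
        for x in range(x0 - half, x0 + half + 1):
            E += (I[y + v, x + u] - I[y, x]) ** 2
    return E

I = np.zeros((20, 20))
I[8:12, 8:12] = 1.0  # a small bright square

print(auto_corr_error(I, 10, 10, 0, 0))  # 0.0: zero shift, zero change
print(auto_corr_error(I, 10, 10, 3, 2))  # > 0: shifting changes the window content
```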
Corner detection
Three different cases of E(u, v), visualized as a surface.
Corner Detection by Auto-correlation
Change in appearance of window w(x,y) for shift [u,v]:

We want to discover how E behaves for small shifts

But this is very slow to compute naively:

O(window_width² × shift_range² × image_width²)
= O(11² × 11² × 600²) ≈ 5.2 billion operations
≈ 14.6k ops per image pixel
Corner Detection by Auto-correlation
Change in appearance of window w(x,y) for shift [u,v]:

We want to discover how E behaves for small shifts

But we know the response in E that we are looking


for – strong peak.
Corner Detection: strategy
Approximate E(u,v) locally by a quadratic surface


Recall: Taylor series expansion
• A function f can be represented by an infinite series of its derivatives at a single point a:

f(x) = Σ_{n=0}^{∞} f⁽ⁿ⁾(a)/n! · (x − a)ⁿ

Wikipedia

Since we care about the window center, we set a = 0 (Maclaurin series).

Figure: approximations of f(x) = eˣ centered at x = 0.
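The Maclaurin idea can be checked numerically: partial sums of xⁿ/n! converge quickly to eˣ near 0.

```python
import math

def exp_maclaurin(x, terms):
    # Partial sum of the Maclaurin series of e^x: sum of x^n / n!
    return sum(x**n / math.factorial(n) for n in range(terms))

x = 0.5
for terms in (1, 2, 4, 8):
    print(terms, exp_maclaurin(x, terms))
print(math.exp(x))  # the partial sums approach this value
```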
Corner Detection: Mathematics
The quadratic approximation simplifies to

E(u, v) ≈ [u v] M [u v]ᵀ

where M is a second moment matrix computed from image derivatives:

Corners as distinctive interest points

M = Σ_{x,y} w(x, y) [ Ix·Ix  Ix·Iy ]
                    [ Ix·Iy  Iy·Iy ]

a 2 x 2 matrix of image derivatives (averaged in a neighborhood of a point).

Notation: Ix = ∂I/∂x, Iy = ∂I/∂y, Ix·Iy = (∂I/∂x)(∂I/∂y)

James Hays
Harris corner detection
1) Compute M matrix for each window to recover a cornerness
score 𝐶.
Note: We can find M purely from the per-pixel image derivatives!
2) Threshold to find pixels which give large corner response
𝐶 > threshold.
3) Find the local maxima pixels,
i.e., non-maximal suppression.

C. Harris and M. Stephens. "A Combined Corner and Edge Detector." Proceedings of the 4th Alvey Vision Conference: pages 147-151, 1988.
0. Input image I. We want to compute M at each pixel.
1. Compute image derivatives Ix, Iy (optionally, blur first).
2. Compute the components of M as products of derivatives: Ix², Iy², Ixy = Ix·Iy.
3. Smooth each component with a Gaussian: g(Ix²), g(Iy²), g(Ix ∘ Iy).
4. Compute the cornerness response R from the smoothed components.
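The per-pixel pipeline above can be sketched with numpy/scipy. The Sobel derivatives, Gaussian width, and k = 0.05 are illustrative assumptions; R = det(M) − k·trace(M)² is the standard Harris cornerness score.

```python
import numpy as np
from scipy import ndimage

def harris_response(I, sigma=1.0, k=0.05):
    # 1. image derivatives
    Ix = ndimage.sobel(I, axis=1)
    Iy = ndimage.sobel(I, axis=0)
    # 2. products of derivatives
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    # 3. Gaussian-weighted sums g(.) -> entries of M at every pixel
    Sxx = ndimage.gaussian_filter(Ixx, sigma)
    Syy = ndimage.gaussian_filter(Iyy, sigma)
    Sxy = ndimage.gaussian_filter(Ixy, sigma)
    # 4. cornerness from det(M) and trace(M)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace * trace

I = np.zeros((40, 40))
I[10:30, 10:30] = 1.0  # a white square: four corners

R = harris_response(I)
y, x = np.unravel_index(np.argmax(R), R.shape)
print(y, x)  # the strongest response lies near one of the square's corners
```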
Harris Detector: Steps
1. Compute corner response C.
2. Find points with large corner response: C > threshold.
3. Take only the points of local maxima of C.
Histogram of
Oriented Gradients
Edges
HOG: Human Detection
Histogram - revisit

Figure: a small image with pixel values in 0..6, and its histogram (the count of pixels at each value 0..6).
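An intensity histogram like the one in the figure can be computed in one line; the 4x4 image here is made up for illustration.

```python
import numpy as np

img = np.array([[0, 1, 1, 2],
                [2, 1, 0, 0],
                [5, 2, 0, 0],
                [1, 1, 2, 4]])

# Count how many pixels take each value 0..6.
hist = np.bincount(img.ravel(), minlength=7)
print(hist)  # [5 5 4 0 1 1 0]
```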
Image Histogram - revisit
Histograms of Oriented Gradients
• Given an image I, and a pixel location (i,j).
• We want to compute the HOG feature for that pixel.
• The main operations can be described as a sequence of five steps.

Pixel (i,j)
Histograms of Oriented Gradients
• Step 1: Extract a square window (called “block”) of some size.

Pixel (i,j)

Block
Histograms of Oriented Gradients
• Step 2: Divide block into a square grid of sub-blocks
(called “cells”) (2x2 grid in our example, resulting in four cells).

Pixel (i,j)

Block
Histograms of Oriented Gradients
• Step 3: Compute orientation histogram of each cell.

Pixel (i,j)

Block
Histograms of Oriented Gradients
• Step 4: Concatenate the four histograms.

Pixel (i,j)

Block
Histograms of Oriented Gradients
Let vector v be concatenation of the four histograms from step 4.
• Step 5: normalize v. Here we have three options for how to do it:
• Option 1: Divide v by its Euclidean norm.

Pixel (i,j)

Block
Histograms of Oriented Gradients
Let vector v be concatenation of the four histograms from step 4.
• Step 5: normalize v. Here we have three options for how to do it:
• Option 2: Divide v by its L1 norm (the L1 norm is the sum of all absolute values of v).

Pixel (i,j)

Block
Histograms of Oriented Gradients
Let vector v be concatenation of the four histograms from step 4.
• Option 3:
• Divide v by its Euclidean norm.
• In the resulting vector, clip any value over 0.2
• Then, renormalize the resulting vector by dividing again by its Euclidean norm.

Pixel (i,j)

Block
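The three normalization options above can be sketched on a made-up concatenated histogram vector v:

```python
import numpy as np

v = np.array([3.0, 0.0, 8.0, 1.0, 0.5, 2.0])
eps = 1e-12  # guard against division by zero

# Option 1: divide by the Euclidean (L2) norm.
v1 = v / (np.linalg.norm(v) + eps)

# Option 2: divide by the L1 norm (sum of absolute values).
v2 = v / (np.abs(v).sum() + eps)

# Option 3: L2-normalize, clip values above 0.2, then L2-normalize again.
v3 = v / (np.linalg.norm(v) + eps)
v3 = np.minimum(v3, 0.2)
v3 = v3 / (np.linalg.norm(v3) + eps)

print(np.linalg.norm(v1))  # 1.0
print(np.abs(v2).sum())    # 1.0
```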
Histogram of Oriented Gradients
Image gradients: dI/dx (horizontal) and dI/dy (vertical)
Image gradients
Histogram of Oriented Gradients
Summary of HOG Computation
• Step 1: Extract a square window (called “block”) of some size around
the pixel location of interest.
• Step 2: Divide block into a square grid of sub-blocks (called “cells”) (2x2
grid in our example, resulting in four cells).
• Step 3: Compute orientation histogram of each cell.
• Step 4: Concatenate the four histograms.
• Step 5: Normalize the concatenated vector v using one of the three options described previously.
Histograms of Oriented Gradients
• Parameters and design options:
• Angles range from 0 to 180 or from 0 to 360 degrees?
• In the Dalal & Triggs paper, a range of 0 to 180 degrees is used,
• and HOGs are used for detection of pedestrians.
• Number of orientation bins.
• Usually 9 bins, each bin covering 20 degrees.
• Cell size.
• Cells of size 8x8 pixels are often used.
• Block size.
• Blocks of size 2x2 cells (16x16 pixels) are often used.
• Usually a HOG feature has 36 dimensions.
• 4 cells * 9 orientation bins.
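Steps 1-5 with the typical parameters above (2x2 cells of 8x8 pixels, 9 unsigned-orientation bins over 0-180 degrees) can be sketched as follows. The gradient operator and binning details are illustrative assumptions, not the exact Dalal & Triggs implementation.

```python
import numpy as np

def hog_feature(I, i, j, cell=8, bins=9):
    # Step 1: extract a 2*cell x 2*cell block around pixel (i, j).
    block = I[i - cell:i + cell, j - cell:j + cell].astype(float)
    gy, gx = np.gradient(block)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation

    hists = []
    for ci in (0, cell):          # Step 2: 2x2 grid of cells
        for cj in (0, cell):
            m = mag[ci:ci + cell, cj:cj + cell]
            a = ang[ci:ci + cell, cj:cj + cell]
            # Step 3: orientation histogram per cell, weighted by magnitude.
            h, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            hists.append(h)

    v = np.concatenate(hists)     # Step 4: concatenate the four histograms
    return v / (np.linalg.norm(v) + 1e-12)  # Step 5, option 1

I = np.random.default_rng(0).random((64, 64))
f = hog_feature(I, 32, 32)
print(f.shape)  # (36,) = 4 cells x 9 orientation bins
```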
HOG
SIFT
Scale Invariant Feature Transform (SIFT)
• Lowe, D. 2004, IJCV (cited > 70K)
Scale Invariant Feature Transform (SIFT)
• Image content is transformed into local feature coordinates
• Invariant to
• translation
• rotation
• scale, and
• other imaging parameters
Scale Invariant Feature Transform (SIFT)
• Image content is transformed into local feature coordinates
Overall Procedure at a High Level
1. Scale-Space Extrema Detection: search over multiple scales and image locations.
2. KeyPoint Localization: fit a model to determine location and scale; select keypoints based on a measure of stability.
3. Orientation Assignment: compute the best orientation(s) for each keypoint region.
4. KeyPoint Description: use local image gradients at the selected scale and rotation to describe each keypoint region.
Automatic Scale Selection

f(I_{i1..im}(x, σ)) = f(I'_{i1..im}(x', σ'))

How do we find patch sizes at which the f response is equal?

What is a good f?
Automatic Scale Selection
• Function responses for increasing scale (scale signature): plot the response of some function f(I_{i1..im}(x, σ)) as the scale σ increases, for corresponding patches in two images.
What Is A Useful Signature Function f ?

• 1st derivative of Gaussian
• 2nd derivative of Gaussian (Laplacian of Gaussian)

Earl F. Glynn
What Is A Useful Signature Function f ?
"Blob" detector is common for corners
• Laplacian (2nd derivative) of Gaussian (LoG)
• Scale space: plot the function response against image blob size.

Find local maxima in position-scale space: find maxima/minima of the response → a list of (x, y, s).

CAP5415 - Lecture 9
Alternative kernel

Approximate LoG with Difference-of-Gaussian (DoG):
1. Blur the image with a Gaussian kernel of width σ.
2. Blur the image with a Gaussian kernel of width kσ.
3. Subtract 2 from 1.
Scale-space

Find local maxima in position-scale space of DoG: blur the input image at a sequence of increasing scales (σ, kσ, …), subtract adjacent levels to form the DoG stack, then find maxima/minima across position and scale → a list of (x, y, s).
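The DoG stack in steps 1-3 above can be sketched as follows; the base σ = 1.6, k = √2, and the toy blob are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def dog_stack(I, sigma=1.6, k=2 ** 0.5, levels=4):
    """Return a list of DoG images at increasing scale."""
    blurred = [ndimage.gaussian_filter(I, sigma * k ** n)
               for n in range(levels + 1)]
    # each DoG level = (blur at k^(n+1) sigma) - (blur at k^n sigma)
    return [blurred[n + 1] - blurred[n] for n in range(levels)]

I = np.zeros((64, 64))
I[28:36, 28:36] = 1.0  # a blob roughly 8 pixels across

dogs = dog_stack(I)
responses = [abs(d[32, 32]) for d in dogs]
print(responses)  # the magnitude peaks at a scale matching the blob size
```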
Results: Difference-of-Gaussian
• Larger circles = larger scale
• Descriptors with maximal scale response
SIFT Orientation estimation
• Compute gradient orientation histogram
• Select dominant orientation ϴ
SIFT Orientation Normalization
• Compute gradient orientation histogram
• Select dominant orientation ϴ
• Normalize: rotate to fixed orientation

(orientation histogram over 0 to 2π)

T. Tuytelaars, B. Leibe
SIFT descriptor formation
• Compute on local 16 x 16 window around detection.
• Rotate and scale window according to discovered
orientation ϴ and scale σ (gain invariance).
• Compute gradients weighted by a Gaussian of variance
half the window (for smooth falloff).

Actually 16x16, only showing 8x8

James Hays
SIFT descriptor formation
• 4x4 array of gradient orientation histograms weighted by
gradient magnitude.
• Bin into 8 orientations x 4x4 array = 128 dimensions.
Showing only 2x2 here but is 4x4

James Hays
SIFT Descriptor Extraction

Gradient magnitude and orientation → 8-bin histogram: add up the magnitude amounts per orientation bin.

Utkarsh Sinha
Reduce effect of illumination
• 128-dim vector normalized to 1
• Threshold gradient magnitudes to avoid excessive
influence of high gradients
• After normalization, clamp gradients > 0.2
• Renormalize

James Hays
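The normalize / clamp at 0.2 / renormalize scheme above can be sketched on a made-up 128-dimensional descriptor:

```python
import numpy as np

def normalize_sift(v, clamp=0.2):
    v = v / (np.linalg.norm(v) + 1e-12)   # unit length
    v = np.minimum(v, clamp)              # limit influence of large gradients
    return v / (np.linalg.norm(v) + 1e-12)  # renormalize

d = np.abs(np.random.default_rng(1).normal(size=128))
d[0] = 50.0  # one dominant gradient, e.g. from a strong highlight

dn = normalize_sift(d)
print(np.linalg.norm(dn))  # 1.0: unit length again
print(dn[0])               # the clamped entry carries much less relative weight
```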
Review: Local Descriptors
• Most features can be thought of as
• templates,
• histograms (counts),
• or combinations
• The ideal descriptor should be
– Robust and Distinctive
– Compact and Efficient
• Most available descriptors focus on edge/gradient information
– Capture texture information
– Color rarely used
