Motion Estimation
Yao Wang
Tandon School of Engineering, New York University
[Figure: example videos: moving scene with moving camera; static camera with moving scene and changing lighting]
From http://courses.cs.washington.edu/courses/cse576/16sp/Slides/15_Flow.pdf
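[Figure: 3-D motion vector (MV) and the corresponding 2-D MV]
Anchor frame: $\psi_1(\mathbf{x})$
Target frame: $\psi_2(\mathbf{x})$
Motion parameters: $\mathbf{a}$
Motion vector at a pixel in the anchor frame: $\mathbf{d}(\mathbf{x})$
Motion field: $\mathbf{d}(\mathbf{x};\mathbf{a}),\ \mathbf{x} \in \Lambda$
Mapping function: $\mathbf{w}(\mathbf{x};\mathbf{a}) = \mathbf{x} + \mathbf{d}(\mathbf{x};\mathbf{a}),\ \mathbf{x} \in \Lambda$
$\psi_1(\mathbf{x}) = \psi_2(\mathbf{w}(\mathbf{x};\mathbf{a})) = \psi_2(\mathbf{x} + \mathbf{d}(\mathbf{x};\mathbf{a}))$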
https://en.wikipedia.org/wiki/Barberpole_illusion
Key assumptions
• color constancy: a point in frame H looks the same in frame I
– For grayscale images, this is brightness constancy
• small motion: points do not move very far
Courtesy of Rob Fergus, http://cs.nyu.edu/~fergus/teaching/vision/5_6_Fitting_Matching_Opticalflow.pdf
Yao Wang, 2021 ECE-GY 6123: Image and Video Processing 13
Optical flow constraints
(Assuming grayscale images and small motion)
$I_t(x, y) = I(x, y) - H(x, y)$
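Combining the two assumptions gives the optical flow constraint. The following is the standard first-order Taylor derivation, written with $I_t = I - H$ and assuming the point at (x, y) in H moves by (u, v) to I:

```latex
\begin{aligned}
H(x, y) &= I(x + u,\, y + v) \\
        &\approx I(x, y) + I_x(x, y)\,u + I_y(x, y)\,v \\
\Rightarrow\quad 0 &\approx I_t(x, y) + I_x(x, y)\,u + I_y(x, y)\,v
        = I_t + \nabla I \cdot [u,\ v]^T
\end{aligned}
```

Each pixel gives one equation in two unknowns (u, v), so the constraint alone does not determine the flow; extra assumptions, such as a window of pixels sharing the same motion, are needed.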
• Global: the entire motion field is represented by a few global parameters (affine, homography).
• Pixel-based: one MV at each pixel, with some smoothness constraint between adjacent MVs.
• Block-based: the entire frame is divided into blocks, and the motion in each block is characterized by a few parameters (e.g., a constant MV).
– a = MV for each block
• Region-based: the entire frame is divided into regions, each corresponding to an object or sub-object with consistent motion, represented by a few parameters.
– a = motion parameters for each region
Motion Representation: Mesh-Based
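• Notation: $\mathbf{x} \in \Lambda$; $\mathbf{a}$ is the motion parameter vector that defines the entire motion field; p = 1: MAD; p = 2: MSE
• To satisfy the optical flow equation:
$E_{OF}(\mathbf{a}) = \sum_{\mathbf{x} \in \Lambda} \left| \left(\nabla \psi_2(\mathbf{x})\right)^T \mathbf{d}(\mathbf{x};\mathbf{a}) + \psi_2(\mathbf{x}) - \psi_1(\mathbf{x}) \right|^p \to \min$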
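The optical flow criterion is a linearization of the displaced frame difference (DFD), $\sum_{\mathbf{x} \in \Lambda} |\psi_2(\mathbf{x} + \mathbf{d}(\mathbf{x};\mathbf{a})) - \psi_1(\mathbf{x})|^p$. A minimal NumPy sketch of both criteria, assuming the simplest motion model (one translation d = (dy, dx) shared by all pixels, integer for the DFD) and central-difference gradients for E_OF; `dfd_error` and `of_error` are illustrative names, not a library API:

```python
import numpy as np

def dfd_error(psi1, psi2, d, p=1):
    """E_DFD for one integer translation d = (dy, dx) applied to every pixel.
    Pixels whose displaced position falls outside psi2 are not counted."""
    psi1 = np.asarray(psi1, dtype=float)
    psi2 = np.asarray(psi2, dtype=float)
    dy, dx = d
    H, W = psi1.shape
    ys, xs = np.mgrid[0:H, 0:W]
    ok = (ys + dy >= 0) & (ys + dy < H) & (xs + dx >= 0) & (xs + dx < W)
    r = psi2[ys[ok] + dy, xs[ok] + dx] - psi1[ok]
    return np.sum(np.abs(r) ** p)

def of_error(psi1, psi2, d, p=1):
    """E_OF for a constant motion d = (dy, dx): the DFD linearized with the
    spatial gradient of psi2 (np.gradient, central differences)."""
    psi1 = np.asarray(psi1, dtype=float)
    psi2 = np.asarray(psi2, dtype=float)
    gy, gx = np.gradient(psi2)      # derivative along rows (y), then columns (x)
    r = gx * d[1] + gy * d[0] + psi2 - psi1
    return np.sum(np.abs(r) ** p)
```

On a linear ramp the linearization is exact, so both criteria vanish at the true motion; on real images E_OF is only accurate for small d, which is why the small-motion assumption matters.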
Courtesy of Rob Fergus, http://cs.nyu.edu/~fergus/teaching/vision/5_6_Fitting_Matching_Opticalflow.pdf
[Figures: error surface of Ad − b at an edge; actual vs. estimated shift; coarse-to-fine pyramids of images H and I, where a shift of u = 10 pixels at full resolution becomes u = 5, 2.5, and 1.25 pixels at successively coarser levels]
Courtesy of Ali Farhadi. From http://courses.cs.washington.edu/courses/cse576/16sp/Slides/15_Flow.pdf
Challenge for Optical Flow Estimation
http://vision.middlebury.edu/flow/
• Detect feature points in the first image where both eigenvalues of the
moment matrix are large (similar to Harris corner detector)
• For each feature point in the first image, estimate its motion between the first and second frames with the Lucas-Kanade flow estimation method, using a small window surrounding the point.
• Repeat between frames 2 and 3, starting from the feature points tracked in frame 2.
• Verify that the corresponding features between non-adjacent frames
satisfy a global affine mapping. Features that are outliers are dropped.
• References:
– Bruce D. Lucas and Takeo Kanade. An Iterative Image Registration Technique with an Application to Stereo Vision. International Joint Conference on Artificial Intelligence, pages 674–679, 1981.
– Carlo Tomasi and Takeo Kanade. Detection and Tracking of Point Features. Carnegie Mellon University Technical Report CMU-CS-91-132, April 1991.
– Jianbo Shi and Carlo Tomasi. Good Features to Track. IEEE Conference on Computer Vision and Pattern Recognition, pages 593–600, 1994.
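The per-feature step of this tracker is a Lucas-Kanade solve over a small window. A minimal NumPy sketch, assuming H(x, y) = I(x + u, y + v) as on the optical flow slides, central-difference gradients of I (np.gradient) rather than simple adjacent-pixel differences, and rejection of windows whose moment matrix has a tiny smallest eigenvalue (the Shi-Tomasi score; the 1e-6 threshold is an arbitrary illustrative value). `lucas_kanade_window` is a hypothetical name:

```python
import numpy as np

def lucas_kanade_window(H, I, cy, cx, half=1, min_eig_thresh=1e-6):
    """Estimate the flow (u, v) of pixel (row cy, col cx) from frame H to frame I
    using a (2*half+1)^2 window; also return the smallest eigenvalue of the
    moment matrix."""
    H = np.asarray(H, dtype=float)
    I = np.asarray(I, dtype=float)
    Iy, Ix = np.gradient(I)              # np.gradient: d/d(row) first, then d/d(col)
    It = I - H                           # temporal gradient
    win = (slice(cy - half, cy + half + 1), slice(cx - half, cx + half + 1))
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)   # one row per pixel
    b = -It[win].ravel()                 # I_x u + I_y v = -I_t for every pixel
    M = A.T @ A                          # 2x2 moment (structure) matrix
    min_eig = np.linalg.eigvalsh(M)[0]   # eigvalsh returns eigenvalues in ascending order
    if min_eig < min_eig_thresh:
        raise ValueError("moment matrix is (nearly) singular: aperture problem")
    u, v = np.linalg.solve(M, A.T @ b)   # least-squares solution of A [u v]^T = b
    return u, v, min_eig
```

A window that contains only a straight edge has one zero eigenvalue, so the sketch raises instead of returning an arbitrary flow; a feature detector keeps only points where both eigenvalues are large.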
• Overview:
– Assume all pixels in a block undergo a translation, denoted by a single
MV
– Estimate the MV for each block independently, by minimizing the DFD
error over this block
• Minimizing function:
$E_{DFD}(\mathbf{d}_m) = \sum_{\mathbf{x} \in B_m} \left| \psi_2(\mathbf{x} + \mathbf{d}_m) - \psi_1(\mathbf{x}) \right|^p \to \min$
• Optimization method:
– Exhaustive search (feasible as one only needs to search one MV at a
time), using MAD criterion (p=1)
– Fast search algorithms
– Integer vs. fractional pel accuracy search
% Inputs (assumed): f1 = anchor frame, f2 = target frame (grayscale, same size),
% N = block size, R = search range; height and width are multiples of N.
f1=double(f1); f2=double(f2);
[height,width]=size(f1);
fp=zeros(height,width); mvx=zeros(height/N,width/N); mvy=zeros(height/N,width/N);
for i=1:N:height-N+1
  for j=1:N:width-N+1 %for every block in the anchor frame
    MAD_min=256*N*N; dx=0; dy=0;
    for k=-R:R
      for l=-R:R %for every search candidate
        if i+k<1 || j+l<1 || i+k+N-1>height || j+l+N-1>width
          continue; %skip candidates whose block falls outside the target frame
        end
        MAD=sum(sum(abs(f1(i:i+N-1,j:j+N-1)-f2(i+k:i+k+N-1,j+l:j+l+N-1))));
        % calculate MAD for this candidate
        if MAD<MAD_min
          MAD_min=MAD; dy=k; dx=l;
        end
      end
    end
    fp(i:i+N-1,j:j+N-1)=f2(i+dy:i+dy+N-1,j+dx:j+dx+N-1);
    %put the best matching block in the predicted image
    iblk=floor((i-1)/N)+1; jblk=floor((j-1)/N)+1; %block index
    mvx(iblk,jblk)=dx; mvy(iblk,jblk)=dy; %record the estimated MV
  end
end
Note: this version skips any candidate whose matching block extends outside the image boundary. An alternative is to keep such candidates and exclude only the out-of-bound pixels from the MAD. The script illustrates the main operations and is not optimized for speed.
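The same integer-pel EBMA can be sketched in NumPy, assuming 0-based indexing, frame dimensions that are multiples of N, and the rule of skipping candidates that leave the frame. `ebma` and its parameters are illustrative names; `p` selects the criterion as in E_DFD (p = 1: sum of absolute differences, p = 2: sum of squared differences):

```python
import numpy as np

def ebma(f1, f2, N=16, R=16, p=1):
    """Integer-pel exhaustive block matching (sketch).
    f1: anchor frame, f2: target frame (2-D arrays of equal shape).
    Returns per-block MVs (mvy, mvx) and the predicted anchor frame."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    height, width = f1.shape
    nby, nbx = height // N, width // N
    mvy = np.zeros((nby, nbx), dtype=int)
    mvx = np.zeros((nby, nbx), dtype=int)
    pred = np.zeros_like(f1)
    for bi in range(nby):
        for bj in range(nbx):
            i, j = bi * N, bj * N
            block = f1[i:i + N, j:j + N]
            best, dy, dx = np.inf, 0, 0
            for k in range(-R, R + 1):
                for l in range(-R, R + 1):
                    # skip candidates whose block falls outside the target frame
                    if i + k < 0 or j + l < 0 or i + k + N > height or j + l + N > width:
                        continue
                    cand = f2[i + k:i + k + N, j + l:j + l + N]
                    err = np.sum(np.abs(block - cand) ** p)
                    if err < best:
                        best, dy, dx = err, k, l
            mvy[bi, bj], mvx[bi, bj] = dy, dx
            pred[i:i + N, j:j + N] = f2[i + dy:i + dy + N, j + dx:j + dx + N]
    return mvy, mvx, pred
```

For example, if f1 is f2 shifted so that f1[i, j] = f2[i + 3, j − 2], interior blocks get the MV (dy, dx) = (3, −2) and are predicted exactly.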
Complexity of Integer-Pel EBMA
• Assumption
– Image size: MxM
– Block size: NxN
– Search range: (-R,R) in each dimension
– Search stepsize: 1 pixel (assuming integer MV)
• Operation counts (1 operation=1 “-”, 1 “+”, 1 “*”):
– Each candidate position: N^2
– Each block going through all candidates: (2R+1)^2 N^2
– Entire frame: (M/N)^2 (2R+1)^2 N^2=M^2 (2R+1)^2
• Independent of block size!
• Example: M=512, N=16, R=16, 30 fps
– Total operation count ≈ 2.85x10^8/frame ≈ 8.56x10^9/second
• Regular structure suitable for VLSI implementation
• Challenging for software-only implementation
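The operation count can be checked in a few lines of Python. This is a sketch under the assumptions listed above (M a multiple of N, one operation per pixel per candidate); `ebma_op_count` is a hypothetical helper name:

```python
def ebma_op_count(M, N, R):
    """Operations for integer-pel EBMA on an MxM frame with NxN blocks and
    search range -R..R (1 operation = one "-", "+", "*" as above)."""
    per_candidate = N * N                      # one operation per pixel in the block
    per_block = (2 * R + 1) ** 2 * per_candidate
    num_blocks = (M // N) ** 2                 # assumes M is a multiple of N
    return num_blocks * per_block

ops_per_frame = ebma_op_count(512, 16, 16)
ops_per_second = 30 * ops_per_frame
```

For M = 512, N = 16, R = 16 this gives 285,474,816 operations per frame and 8,564,244,480 per second at 30 fps; calling it with N = 8 returns the same per-frame count, since the block size cancels out.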
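[Figure: bilinear interpolation for half-pel search; integer pixels such as (x,y+1), (x+1,y+1) and positions such as (2x,2y+1), (2x+1,2y+1) on the upsampled grid]
O[2x,2y]=I[x,y]
O[2x+1,2y]=(I[x,y]+I[x+1,y])/2
O[2x,2y+1]=(I[x,y]+I[x,y+1])/2
O[2x+1,2y+1]=(I[x,y]+I[x+1,y]+I[x,y+1]+I[x+1,y+1])/4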
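A NumPy sketch of this 2x bilinear upsampling, assuming the array is indexed I[x, y] exactly as in the formulas and that the output stops at the last integer pixel (shape (2X−1) x (2Y−1)), so no border extrapolation is needed. `half_pel_upsample` is an illustrative name:

```python
import numpy as np

def half_pel_upsample(I):
    """Bilinear 2x upsampling used for half-pel search; O[2x, 2y] = I[x, y]."""
    I = np.asarray(I, dtype=float)
    X, Y = I.shape
    O = np.zeros((2 * X - 1, 2 * Y - 1))
    O[0::2, 0::2] = I                                    # O[2x, 2y]
    O[1::2, 0::2] = (I[:-1, :] + I[1:, :]) / 2           # O[2x+1, 2y]
    O[0::2, 1::2] = (I[:, :-1] + I[:, 1:]) / 2           # O[2x, 2y+1]
    O[1::2, 1::2] = (I[:-1, :-1] + I[1:, :-1]
                     + I[:-1, 1:] + I[1:, 1:]) / 4       # O[2x+1, 2y+1]
    return O
```

For example, `half_pel_upsample([[0, 4], [8, 12]])` returns `[[0, 2, 4], [4, 6, 8], [8, 10, 12]]`. A half-pel MV d in the original frame then corresponds to the integer displacement 2d on this grid.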
[Figure: anchor frame; predicted anchor frame (29.86 dB); motion field]
• Suppose you use a block size BxB, your search range is -R to R in both directions, and you only search for integer MVs. How many additions and multiplications do you need for each block?
• Assumption
– Image size: MxM; Block size: NxN at every level; Levels: L
– Search range:
• 1st level: R/2^(L-1) (equivalent to R at the L-th level)
• Other levels: R/2^(L-1) (can be smaller)
• Operation counts for EBMA
– image size M, block size N, search range R
– # operations: $M^2 (2R+1)^2$
• Operation counts at the l-th level (image size: M/2^(L-l)):
$\left(M/2^{L-l}\right)^2 \left(2R/2^{L-1} + 1\right)^2$
• Total over all levels:
$\sum_{l=1}^{L} \left(M/2^{L-l}\right)^2 \left(2R/2^{L-1} + 1\right)^2 \approx \frac{1}{3}\, 4^{-(L-2)}\, 4 M^2 R^2$
• Saving factor:
$3 \cdot 4^{L-2} = 3\ (L = 2);\ 12\ (L = 3)$
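The approximation can be compared with the exact sums. A sketch assuming, as above, that level l = 1 is the coarsest (image size M/2^(L-1)) and that every level searches ±R/2^(L-1); `ebma_ops` and `hbma_ops` are hypothetical helper names:

```python
def ebma_ops(M, R):
    """EBMA operation count M^2 (2R+1)^2."""
    return M ** 2 * (2 * R + 1) ** 2

def hbma_ops(M, R, L):
    """HBMA operation count: sum over levels l = 1..L of
    (M / 2^(L-l))^2 * (2R / 2^(L-1) + 1)^2."""
    r = R / 2 ** (L - 1)                       # search range used at every level
    return sum((M / 2 ** (L - l)) ** 2 * (2 * r + 1) ** 2
               for l in range(1, L + 1))

saving = ebma_ops(512, 16) / hbma_ops(512, 16, 3)
```

With M = 512, R = 16, and L = 3 the exact saving is about 10.2 rather than 12: the approximation drops the +1 inside the squared search-range term and replaces the finite geometric sum over levels by 4/3.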
http://insy.ewi.tudelft.nl/content/image-and-video-compression-learning-tool-vcdemo
Use the ME tool to show the motion estimation results with different parameter choices
[PWCnet] Deqing Sun, Xiaodong Yang, Ming-Yu Liu, Jan Kautz. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. CVPR 2018.
https://arxiv.org/abs/1709.02371
• http://vision.middlebury.edu/flow/data/
• http://people.csail.mit.edu/celiu/motionAnnotation/
• More recent
• MPI Sintel flow dataset (animation video)
– http://sintel.is.tue.mpg.de/
• KITTI flow dataset 2015
– http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=flow
1) Consider the two successive frames shown below. Use the Lucas-Kanade method to determine the optical flow vector of the center pixel. To determine the horizontal and vertical gradient images, you can simply take the difference of two horizontally and vertically adjacent pixels. Your solution should show the horizontal, vertical, and temporal gradient images, the moment matrix, and the final solution. Use a 3x3 neighborhood block surrounding the center pixel.
Frame 1            Frame 2
0  0  10 10 10     0  0  0  10 10
0  0  10 10 10     0  0  0  10 10
10 10 10 10 10     10 10 10 10 10
10 10 10 10 10     10 10 10 10 10
2) Would you be able to derive the optical flow using the LK method, if the two frames
look like the following? Why? What if you use a block matching method?
Frame 1            Frame 2
0  0  10 10 10     0  0  0  10 10
0  0  10 10 10     0  0  0  10 10
0  0  10 10 10     0  0  0  10 10
0  0  10 10 10     0  0  0  10 10
3) Consider half-pel accuracy exhaustive search for motion estimation. Assume the video frame size is MxN, block size is NxN, and motion search range is –R to R in both horizontal and vertical directions. You use the sum of absolute errors as your matching criterion. What is the total number of multiplications and additions that you need to estimate the motion between every two frames? You can ignore the computation for frame interpolation.
4) Consider HBMA. Suppose you use 3 resolution levels. Original image size is MxN. You use a block size of NxN at all levels.
a) Suppose you want the effective search range to be –R to R at the original image resolution. What should the search range be at the top level?
b) Suppose that at the middle level you use a search range of -2 to 2, and at the bottom level a search range of -1 to 1. If you use integer-accuracy search at all levels, what will the complexity be?
What if different regions move differently?
Deformable Image Registration
Diffeomorphism Mapping
From: http://campar.in.tum.de/twiki/pub/DefRegTutorial/WebHome/MICCAI_2010_Tutorial_Def_Reg_Darko.pdf
Many ways to impose regularization → Different methods
From: http://campar.in.tum.de/twiki/pub/DefRegTutorial/WebHome/MICCAI_2010_Tutorial_Def_Reg_Darko.pdf
Example of Deformable Registration
From [Szeliski2010]
Demons Algorithm
• Basic idea
– Iteratively update the motion vector at each pixel based on the image gradient (the image is re-warped after each iteration)
– Smooth the new motion field using a Gaussian kernel
Python for Demons’ Registration Algorithm
• Package: SimpleITK
• Install: conda install -c simpleitk simpleitk=0.10.0
• Details: http://insightsoftwareconsortium.github.io/SimpleITK-Notebooks/66_Registration_Demons.html