CV III-Unit Notes
• The most general algorithms for structure from motion make no prior assumptions
about the objects or scenes that they are reconstructing.
• Lines and planes can provide information complementary to interest points and also
serve as useful building blocks for 3D modeling and visualization.
• In man-made scenes, many lines and planes are either parallel or orthogonal to each other.
Line-based techniques
• When lines are visible in three or more views, the trifocal tensor can be used to
transfer lines from a pair of images into a third. The trifocal tensor can also be
computed on the basis of line matches alone.
• Another technique matches 2D lines based on the average of 15 × 15 pixel
correlation scores evaluated at all pixels along their common line segment
intersection.
• An alternative to grouping lines into coplanar subsets is to group lines by parallelism.
Whenever three or more 2D lines share a common vanishing point, there is a good
likelihood that they are parallel in 3D. By finding multiple vanishing points in an
image and establishing correspondences between such vanishing points in different
images, the relative rotations between the various images (and often the camera
intrinsics) can be directly estimated (see the sketch at the end of this subsection).
• Other techniques first find lines and group them by common vanishing points in
each image. The vanishing points are then used to calibrate the camera, i.e., to
perform a "metric upgrade". Lines corresponding to common vanishing points are
then matched using both appearance and trifocal tensors, and these matches are used
to infer planes and a block-structured model for the scene.
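To make the vanishing-point idea concrete, here is a minimal NumPy sketch (not from the notes; the helper names line_through and vanishing_point and the example lines are illustrative assumptions). It estimates a vanishing point as the least squares intersection of a set of homogeneous 2D lines:

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(lines):
    """Least-squares vanishing point for a set of homogeneous lines.

    Minimizes sum_i (l_i . v)^2 subject to ||v|| = 1, i.e. the right
    singular vector of the stacked lines with the smallest singular value.
    """
    L = np.asarray(lines, dtype=float)
    L /= np.linalg.norm(L[:, :2], axis=1, keepdims=True)  # normalize line directions
    _, _, Vt = np.linalg.svd(L)
    return Vt[-1]  # homogeneous vanishing point (x, y, w)

# Three image lines that are parallel in 3D converge at one image point:
lines = [line_through((0, 0), (100, 10)),
         line_through((0, 50), (100, 58)),
         line_through((0, 100), (100, 106))]
v = vanishing_point(lines)
print(v[:2] / v[2])  # inhomogeneous vanishing point, here (2500, 250)
```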
Plane-based techniques
• In scenes that are rich in planar structures, it is possible to directly estimate
homographies between different planes, using either feature-based or intensity-
based methods. In principle, this information can be used to simultaneously infer the
camera poses and the plane equations, i.e., to compute plane-based structure from
motion.
• A better approach is to hallucinate virtual point correspondences within the areas
from which each homography was computed and to feed them into a standard
structure from motion algorithm (see the sketch below).
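A minimal sketch of the virtual-correspondence idea, assuming a plane-to-plane homography H has already been estimated (e.g., with OpenCV's cv2.findHomography on matches inside the plane); the H values and the grid extent are made up for illustration:

```python
import numpy as np
import cv2

# Hypothetical homography mapping plane pixels from image 0 to image 1.
H = np.array([[1.01, 0.02, 5.0],
              [-0.01, 0.99, -3.0],
              [1e-5, 2e-5, 1.0]])

# "Hallucinate" virtual correspondences: sample a grid of points inside the
# region the homography was computed from, then transfer them through H.
xs, ys = np.meshgrid(np.linspace(100, 300, 5), np.linspace(50, 250, 5))
pts0 = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
pts1 = cv2.perspectiveTransform(pts0.reshape(-1, 1, 2), H).reshape(-1, 2)

# pts0 <-> pts1 are exact (noise-free) matches on the plane and can be
# appended to the track list of a standard structure-from-motion pipeline.
print(np.hstack([pts0, pts1])[:3])
```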
Dense Motion Estimation: Translational alignment
• The simplest way to establish an alignment between two images or image patches is
to shift one image relative to the other. Given a template image I0(x) sampled at
discrete pixel locations xi = (xi, yi), we wish to find where it is located in image I1(x). A
least squares solution to this problem is to find the minimum of the sum of squared
differences (SSD) function

E_SSD(u) = Σ_i [I1(xi + u) − I0(xi)]^2 = Σ_i e_i^2

• where u = (u, v) is the displacement and e_i = I1(xi + u) − I0(xi) is called the residual
error (a brute-force search sketch appears below). Here the assumption that
corresponding pixel values remain the same in the two images is often called the
brightness constancy constraint.
• Color images can be processed by summing differences across all three color
channels, although it is also possible to first transform the images into a different
color space or to only use the luminance channel.
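A brute-force NumPy sketch of translational alignment by minimizing the SSD error above; the exhaustive integer-displacement search is for illustration only (practical implementations use coarse-to-fine search or Fourier-domain methods):

```python
import numpy as np

def ssd(template, image, u, v):
    """SSD between template I0 and the window of I1 at shift (u, v);
    the residual is e_i = I1(x_i + u) - I0(x_i)."""
    h, w = template.shape
    e = image[v:v + h, u:u + w].astype(float) - template.astype(float)
    return np.sum(e ** 2)

def best_shift(template, image):
    """Exhaustive search over all integer displacements (u, v)."""
    h, w = template.shape
    H, W = image.shape
    scores = {(u, v): ssd(template, image, u, v)
              for v in range(H - h + 1) for u in range(W - w + 1)}
    return min(scores, key=scores.get)

# Tiny synthetic test: the template is cut out of the image at (u, v) = (7, 3).
rng = np.random.default_rng(0)
image = rng.random((40, 40))
template = image[3:19, 7:23].copy()
print(best_shift(template, image))  # -> (7, 3)
```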
• Robust error metrics: we can make the above error metric more robust to outliers by
replacing the squared error terms with a robust function ρ(e_i).
• The robust norm ρ(e) is a function that grows less quickly than the quadratic penalty
associated with least squares.
• One such function, sometimes used in motion estimation for video coding because of
its speed, is the sum of absolute differences (SAD) metric,
E_SAD(u) = Σ_i |I1(xi + u) − I0(xi)| = Σ_i |e_i| (see the sketch below).
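A short sketch of the robust variants: SAD from the notes, plus the Geman-McClure penalty as one common choice of saturating robust norm (an assumption, since the notes name no robust function other than SAD):

```python
import numpy as np

def residuals(template, image, u, v):
    """Residual image e = I1(x + u) - I0(x) for an integer shift (u, v)."""
    h, w = template.shape
    return image[v:v + h, u:u + w].astype(float) - template.astype(float)

def sad(e):
    """Sum of absolute differences: rho(e) = |e| grows linearly, so large
    outlier residuals are penalized less than under the quadratic SSD."""
    return np.sum(np.abs(e))

def geman_mcclure(e, a=1.0):
    """A smoothly saturating robust penalty, rho(e) = e^2 / (1 + e^2/a^2);
    the scale a (an assumed tuning parameter) sets where saturation begins."""
    return np.sum(e ** 2 / (1.0 + e ** 2 / a ** 2))
```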
Parametric Motion
Parametric motion is used to model and analyze the movement of objects within images or video
sequences. This can involve several applications and techniques:
1. Motion Estimation: One of the primary applications is to estimate the motion of objects
between consecutive frames of a video. Parametric models can describe how the position of
an object changes over time. Common methods include:
o Optical Flow: Estimates the motion of objects by analyzing the pattern of apparent
motion of brightness patterns in an image. Optical flow algorithms often use
parametric models to estimate velocity fields.
2. Object Tracking: In tracking, parametric motion models help predict the future position of an
object based on its past positions. Different models can be used depending on the
complexity of the motion:
o Constant Velocity Model: Assumes the object moves at a constant speed and
direction. Useful for simple tracking scenarios.
o Kalman Filter: A recursive algorithm that uses a parametric motion model to predict
the state of a moving object and then updates the prediction with each new
observation. It is particularly useful for tracking objects in noisy environments (see
the sketch after this list).
3. Camera Motion: Parametric models are also used to understand and compensate for the
motion of the camera itself. This is crucial in applications such as video stabilization.
4. Model-Based Tracking: When tracking specific objects, such as faces or vehicles, parametric
models can represent the shape and motion of these objects. For example, models based on
the appearance of a face (like Active Appearance Models) can be used to track facial
expressions and movements.
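A minimal constant-velocity Kalman filter sketch in NumPy, as referenced from item 2 above; the noise covariances Q and R and the synthetic measurements are illustrative tuning assumptions:

```python
import numpy as np

# Constant-velocity model: state s = [x, y, vx, vy], measurement z = [x, y].
dt = 1.0
F = np.array([[1, 0, dt, 0],   # state transition: position += velocity * dt
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
Hm = np.array([[1, 0, 0, 0],   # we only observe position
               [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)           # process noise (tuning assumption)
R = 1.0 * np.eye(2)            # measurement noise (tuning assumption)

s = np.zeros(4)                # initial state estimate
P = 10.0 * np.eye(4)           # initial uncertainty

def kf_step(s, P, z):
    # Predict with the constant-velocity model.
    s = F @ s
    P = F @ P @ F.T + Q
    # Update with the new observation z.
    y = z - Hm @ s                          # innovation
    S = Hm @ P @ Hm.T + R
    K = P @ Hm.T @ np.linalg.inv(S)         # Kalman gain
    s = s + K @ y
    P = (np.eye(4) - K @ Hm) @ P
    return s, P

rng = np.random.default_rng(3)
for t in range(5):                          # noisy diagonal motion
    z = np.array([t, 2.0 * t]) + rng.normal(0, 0.3, 2)
    s, P = kf_step(s, P, z)
print(s)  # estimated [x, y, vx, vy], roughly [4, 8, 1, 2]
```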
In summary, parametric motion models in computer vision provide a framework for understanding
and predicting the movement of objects and cameras, enabling a range of applications from tracking
and stabilization to 3D reconstruction and beyond.
Spline-based Motion
Spline-based motion in computer vision refers to using spline functions to model and analyze motion
or trajectories of objects within images or video sequences. Splines are flexible mathematical
functions used to create smooth curves and surfaces, and they can be particularly useful for
describing complex, smooth motion paths.
Key Concepts:
1. Splines:
o Cubic Splines: These are piecewise cubic polynomials that ensure smoothness at the
points where the pieces connect (known as knots). They are widely used in motion
modeling due to their smoothness and flexibility.
o B-Splines (Basis Splines): These provide a way to represent curves and surfaces with
a set of control points. B-splines are particularly useful for their local control and
smoothness properties.
o Bézier Curves: Defined by control points, these curves are used in graphics and
animation to model smooth paths.
2. Applications:
o Object Tracking: Splines can fit a smooth motion path through an object's
detected positions, supporting prediction and correction (expanded in the
workflow below).
o Motion Estimation:
Optical Flow: Splines can be used to model the flow of pixels between
frames, helping to estimate the motion field. For instance, cubic splines
might be used to model the displacement of points across a sequence of
images.
o 3D Reconstruction: Spline curves and surfaces can represent smooth shapes
and camera paths recovered from image sequences.
3. Advantages:
o Smoothness: Splines ensure smooth transitions between points, which is important
for accurately modeling continuous motion.
o Flexibility: They can represent a wide variety of shapes and motion patterns, making
them suitable for complex scenarios.
o Local Control: In B-splines, changes to control points affect only a local portion of the
curve, allowing for precise adjustments.
4. Challenges:
o Parameter Tuning: Choosing the right type of spline and tuning its parameters can
require careful consideration to balance smoothness with the fidelity of the
representation.
A typical spline-based tracking workflow:
1. Keypoint Detection: Detect and match the object's keypoints in each frame.
2. Trajectory Fitting: Use splines to fit a smooth curve through the detected keypoints,
representing the object's motion path.
3. Motion Prediction: Use the spline to predict future positions of the object, improving
tracking accuracy.
4. Motion Correction: Adjust the tracking results based on the spline model to account for any
deviations or noise (a minimal sketch of steps 2–4 follows).
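A minimal SciPy sketch of steps 2–4 on synthetic keypoints; the smoothing factor s is an assumed tuning parameter that trades smoothness against fidelity, exactly the balance noted under Challenges:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical tracked keypoint positions at frames 0..9 (a noisy arc).
rng = np.random.default_rng(1)
t = np.arange(10, dtype=float)
x = 3.0 * t + rng.normal(0, 0.5, 10)
y = 0.2 * (t - 5.0) ** 2 + rng.normal(0, 0.5, 10)

# 2. Trajectory fitting: one smoothing cubic spline per coordinate,
#    parameterized by frame index.
sx = UnivariateSpline(t, x, k=3, s=2.0)
sy = UnivariateSpline(t, y, k=3, s=2.0)

# 3. Motion prediction: evaluate slightly past the last observed frame.
print("predicted frame 10.5:", sx(10.5), sy(10.5))

# 4. Motion correction: replace noisy detections with smoothed positions.
smoothed = np.column_stack([sx(t), sy(t)])
print(smoothed[:3])
```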
Optical flow
• Optical flow algorithms compute an independent motion estimate for each pixel, i.e.,
the number of flow vectors computed is equal to the number of input pixels. The
general optical flow analog to the SSD error above can thus be written as

E({u_i}) = Σ_i [I1(xi + u_i) − I0(xi)]^2

where each pixel i has its own displacement u_i.
• It is also possible to combine ideas from local and global flow estimation into a single
framework by using a locally aggregated (as opposed to single-pixel) Hessian as the
brightness constancy term.
• Another extension to the basic optical flow model is to use a combination of global
(parametric) and local motion models. For example, if we know that the motion is
due to a camera moving in a static scene (rigid motion), we can reformulate the
problem as the estimation of a per-pixel depth along with the parameters of the
global camera motion.
Assumptions:
• Brightness Constancy: The brightness of a point remains constant over time.
• Spatial Coherence: Neighboring pixels tend to have similar motion.
Applications:
• Object Tracking: Following moving objects in a scene.
• Video Compression: Reducing data by encoding motion instead of individual
frames.
• Scene Reconstruction: Understanding 3D structures from 2D video data.
• Robotics: Navigating and understanding the environment.
Algorithms:
• Lucas-Kanade Method: Assumes a constant flow in a small neighborhood and solves
for motion vectors using least squares (see the OpenCV sketch after this list).
• Horn-Schunck Method: A global approach that considers smoothness of the flow
field and minimizes a cost function.
• Farneback Method: A dense optical flow algorithm that computes flow at all points
using polynomial expansion.
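A short OpenCV sketch contrasting sparse Lucas-Kanade tracking with dense Farneback flow; the synthetic frames and all parameter values are illustrative assumptions:

```python
import numpy as np
import cv2

# Two consecutive grayscale frames (here synthetic; normally read from video).
rng = np.random.default_rng(0)
prev = (rng.random((100, 100)) * 255).astype(np.uint8)
prev = cv2.GaussianBlur(prev, (5, 5), 1.5)   # smooth so gradients are stable
nxt = np.roll(prev, 3, axis=1)               # translate everything 3 px right

# Sparse flow (Lucas-Kanade): track selected corner points only.
pts = cv2.goodFeaturesToTrack(prev, maxCorners=20, qualityLevel=0.01,
                              minDistance=5)
new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev, nxt, pts, None)
print((new_pts - pts)[status.ravel() == 1][:3])   # per-point shift, ~(3, 0)

# Dense flow (Farneback): one (u, v) vector for every pixel.
flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)
print(flow.shape)                                  # (100, 100, 2)
```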
Types:
• Dense Optical Flow: Estimates flow for every pixel in the image, providing a
comprehensive view of motion.
• Sparse Optical Flow: Estimates flow for selected key points, often used for tracking
specific objects.
Challenges
• Occlusion: When objects overlap, the flow can become ambiguous.
• Lighting Changes: Variations in lighting can affect the brightness constancy
assumption.
• Large Motion: Rapid movements can lead to errors if the flow exceeds the
assumption of small displacements.
Layered Motion
• In many situations, visual motion is caused by the movement of a small number of
objects at different depths in the scene. In such situations, the pixel motions can be
described more succinctly (and estimated more reliably) if pixels are grouped into
appropriate objects or layers
• Layered motion representations not only lead to compact representations but they
also exploit the information available in multiple video frames, as well as accurately
modeling the appearance of pixels near motion discontinuities. This makes them
particularly suited as a representation for image-based rendering
• To compute a layered representation of a video sequence, one approach first estimates
affine motion models over a collection of non-overlapping patches and then clusters
these estimates using k-means (a toy version is sketched at the end of this
subsection). The method then alternates between assigning pixels to layers and
re-computing motion estimates for each layer using the assigned pixels.
• Once the parametric motions and pixel-wise layer assignments have been computed
for each frame independently, layers are constructed by warping and merging the
various layer pieces from all of the frames together. Median filtering is used to
produce sharp composite layers that are robust to small intensity variations, as well
as to infer occlusion relationships between the layers.
• Visualizations of this process show both the initial and final layer assignments for a
frame, as well as the composite flow and the alpha-matted layers with their
corresponding flow vectors overlaid.
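A toy sketch of the initialization step described above: fit affine motion parameters over non-overlapping patches of a (here synthetic) dense flow field, then cluster the parameter vectors with k-means to obtain initial layer assignments. The patch size, flow field, and cluster count are illustrative assumptions:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def affine_params(flow_patch, x0, y0):
    """Least-squares affine motion [a0..a5] for one patch of a dense flow
    field: u = a0 + a1*x + a2*y,  v = a3 + a4*x + a5*y."""
    h, w = flow_patch.shape[:2]
    ys, xs = np.mgrid[y0:y0 + h, x0:x0 + w]
    A = np.column_stack([np.ones(h * w), xs.ravel(), ys.ravel()])
    au, *_ = np.linalg.lstsq(A, flow_patch[..., 0].ravel(), rcond=None)
    av, *_ = np.linalg.lstsq(A, flow_patch[..., 1].ravel(), rcond=None)
    return np.concatenate([au, av])

# Hypothetical dense flow for a 64x64 frame: left half static, right half
# translating 2 px to the right (two motion layers).
flow = np.zeros((64, 64, 2))
flow[:, 32:, 0] = 2.0

# Fit affine motion over non-overlapping 16x16 patches...
params = [affine_params(flow[y:y + 16, x:x + 16], x, y)
          for y in range(0, 64, 16) for x in range(0, 64, 16)]

# ...then cluster the parameter vectors into layers with k-means.
centroids, labels = kmeans2(np.array(params), 2, minit='++', seed=0)
print(labels.reshape(4, 4))   # initial patch-to-layer assignment
```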
Frame interpolation
• Frame interpolation is a widely used application of motion estimation, often
implemented in hardware to match an incoming video to a monitor’s actual refresh
rate, where information in novel in-between frames needs to be interpolated from
preceding and subsequent frames. The best results can be obtained if an accurate
motion estimate can be computed at each unknown pixel’s location.
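A simplified sketch of motion-compensated frame interpolation, assuming a dense flow field from frame 0 to frame 1 is available (e.g., from cv2.calcOpticalFlowFarneback); real interpolators additionally reason about occlusions and flow confidence:

```python
import numpy as np
import cv2

def interpolate_frame(f0, f1, flow01, t=0.5):
    """Synthesize the in-between frame at time t from frames f0, f1 and the
    dense flow f0 -> f1, by backward-warping each source frame and
    cross-fading (a deliberately simplified scheme)."""
    h, w = flow01.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    u = flow01[..., 0].astype(np.float32)
    v = flow01[..., 1].astype(np.float32)
    # A pixel at x in the new frame was at x - t*flow in f0 ...
    w0 = cv2.remap(f0, xs - t * u, ys - t * v, cv2.INTER_LINEAR)
    # ... and will be at x + (1 - t)*flow in f1.
    w1 = cv2.remap(f1, xs + (1 - t) * u, ys + (1 - t) * v, cv2.INTER_LINEAR)
    blended = (1 - t) * w0.astype(np.float32) + t * w1.astype(np.float32)
    return blended.astype(f0.dtype)

# Usage with a dense flow estimate, e.g. from Farneback:
# flow = cv2.calcOpticalFlowFarneback(f0, f1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
# mid = interpolate_frame(f0, f1, flow, t=0.5)
```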
Transparent layers and reflections
• A special case of layered motion that occurs quite often is transparent motion, which
is usually caused by reflections seen in windows and picture frames.
• If the motions of the individual layers are known, recovering the individual layers
is a simple constrained least squares problem: the individual layer images are
constrained to be non-negative, and saturated pixels provide an inequality constraint
on the summed values (see the toy sketch below).
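A toy 1-D sketch of the constrained least squares recovery, assuming the per-frame layer motions are known and modeled here as circular shifts to keep the example self-contained; scipy.optimize.lsq_linear enforces the non-negativity constraint:

```python
import numpy as np
from scipy.optimize import lsq_linear

# Toy 1-D transparent-layer recovery with known motions: frame k observes
# I_k(x) = L0(x) + L1(x + k), i.e. layer 1 translates one pixel per frame.
n, frames = 8, 4
rng = np.random.default_rng(2)
L0, L1 = rng.random(n) * 0.5, rng.random(n) * 0.5   # ground-truth layers

def shift_matrix(k, n):
    """Circular-shift operator as an n x n matrix (stand-in for a warp)."""
    return np.roll(np.eye(n), k, axis=1)

A = np.vstack([np.hstack([np.eye(n), shift_matrix(k, n)])
               for k in range(frames)])
b = A @ np.concatenate([L0, L1])                    # stacked observed frames

# Constrained least squares: layer intensities must be non-negative
# (saturated pixels would additionally bound the summed values).
res = lsq_linear(A, b, bounds=(0.0, np.inf))
print("residual:", np.linalg.norm(A @ res.x - b))   # ~0: data fully explained
# The split is only unique up to a constant exchanged between the layers,
# so compare differences rather than raw values:
print(np.allclose(np.diff(res.x[:n]), np.diff(L0), atol=1e-6))
```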