Lecture 9.1
Motion & Video Analysis in Computer Vision (2025)
Applying Deep Learning Algorithms for Motion & Video Analysis
Motion Analysis
Global Motion: Movement that affects the entire scene, often due to camera
motion (e.g., panning, tilting, rotating). Global motion is typically consistent
across all pixels and is represented by a coherent flow in one direction.
Motion Detection
Motion detection involves identifying moving objects in a scene, typically from a sequence of images or video frames. Motion analysis builds upon this by determining the characteristics of movement, such as direction, speed, and trajectory. Together, these concepts allow computers to interpret dynamic scenes, making them essential for real-time applications.
Motion Detection
1. Background Subtraction
2. Feature Matching
3. Template Matching
4. Optical Flow
https://ijssaggu.github.io/mog/
Motion detection techniques
Background Subtraction
• Background subtraction is a method where a static background image is subtracted
from each frame to isolate moving objects, effective when the background remains
unchanged.
• This method can be implemented with basic image processing techniques.
https://www.coursera.org/learn/object-tracking-and-motion-computer-vision/lecture/MkMFh/detecting-motion
Motion detection techniques
Background Subtraction
Simple approach:
1. Estimate background for time t
2. Subtract estimated background from current input frame
3. Apply a threshold to the absolute difference to get the foreground mask
But of course, the question is: what is a good estimate of the background in a video sequence? The simplest assumption is that the background is whatever was in the previous frame, which is the frame-differencing method described next.
https://www.coursera.org/learn/object-tracking-and-motion-computer-vision/lecture/MkMFh/detecting-motion
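A minimal sketch of these three steps in Python with OpenCV, using the previous frame as the background estimate (the video file name and the threshold value are illustrative assumptions):

import cv2

cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Step 1: background estimate = the previous frame
    # Step 2: absolute difference between the current frame and the estimate
    diff = cv2.absdiff(gray, prev_gray)

    # Step 3: threshold the difference to obtain a binary foreground mask
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)  # 25 is an assumed threshold

    cv2.imshow("foreground mask", mask)
    prev_gray = gray
    if cv2.waitKey(30) & 0xFF == 27:  # press Esc to quit
        break

cap.release()
cv2.destroyAllWindows()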
Motion detection techniques
Background Subtraction (Frame Difference)
Frame differencing is a basic method for detecting motion by comparing the current frame to the previous frame in a video sequence.
• This technique isn't true background subtraction; instead, it assumes the background is whatever was in the frame immediately before. By subtracting (differencing) the current image from the last one, frame differencing highlights areas where there is movement, making it effective for capturing recent changes.
• This method is simple and useful for identifying motion in real-time applications, though it is limited when dealing with gradual or slow movements.
- Pixel values in the current frame are compared with those in the previous frame to identify differences. This provides a straightforward method for detecting motion or changes between consecutive frames.
- To determine whether a difference is substantial, a threshold is set: a pixel is considered changed if the difference in its value exceeds this threshold. The threshold is usually chosen based on the specific application and the level of sensitivity required.
- When significant changes occur in the video sequence, such as moving objects like cars, these changes are detected as motion. This is useful for applications like video surveillance or tracking moving objects.
- To make motion detection more robust, more sophisticated techniques can be used. For example, building a background model from the average of the first K frames filters out static elements and provides a better reference for detecting real changes.
Limitations
- It tends to detect any change, including background noise, flickering, or minor variations in lighting.
- This can lead to false positives, such as leaves waving in the background being flagged as motion.
- In practice, simple frame differencing is a quick and easy way to detect motion, but it may require additional processing and filtering to reduce false alarms and improve accuracy, especially in complex video scenes.
Motion detection techniques
Background Subtraction: Background Models
• Histogram
• Average of previous K frames
• Median of previous K frames
• Moving median of previous K frames
• Gaussian Mixture Model (GMM)
Average of previous K frames
- A background modelling technique that uses the average of the first K frames in a video sequence as the background image.
- This provides a better reference for detecting changes in subsequent frames than the simple frame-difference approach above.
- The first step is to compute a background image by averaging the pixel values of the first K frames of the video sequence. The resulting image represents the static background scene without moving objects.
- In subsequent frames, each pixel's value is compared to the corresponding pixel in the background image. The idea is to identify any substantial differences, which could indicate the presence of moving objects or changes in the scene.
Limitations of Average Background
- Sensitivity to changes: while the average background improves on simple frame differencing, it is still sensitive to changes in lighting, shadows, or variations in the scene. For example, leaves waving in the background or minor changes in illumination may be detected as motion, leading to false positives.
- Static pixel model: the approach uses a static model for each pixel, based on the initial average, so any significant later change in lighting or scene elements will not be effectively captured.
Median of previous K frames
- The first K frames of the video are used to compute the median value at each pixel.
- The median is more robust than the average because it represents the middle value of a data set, making it less sensitive to outliers and extreme values, such as abrupt changes in lighting or pixel-value spikes.
- It is less affected by individual pixel-value variations and better captures the central tendency of the background.
- Compared to the average, median background modelling offers more stability: it is less prone to false positives caused by minor variations in pixel values, changes in lighting, or waving leaves.
Moving median of previous K frames
- Despite the improvement provided by the median, an ideal background model should be adaptive: it has memory and can adjust to changing scenes over time. This adaptability is important for accurately distinguishing background from foreground, especially in dynamic environments where lighting, objects, or environmental conditions may vary.
- The moving median is designed to create a more adaptive and robust background model. Unlike the static median computed from the first few frames, it computes the median over the most recent K frames, allowing the background model to adapt slowly to changes in the scene.
- This adaptability helps the model adjust to variations such as moving objects and changes in lighting, reducing false positives.
Ref: Lecture 11: Object Tracking, Sejong University
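A minimal sketch of a moving-median background model in Python with OpenCV and NumPy (the buffer length K, the threshold, and the video file name are illustrative assumptions):

import cv2
import numpy as np
from collections import deque

K = 25          # assumed number of recent frames in the buffer
THRESH = 30     # assumed foreground threshold

cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input video
buffer = deque(maxlen=K)               # holds the most recent K grayscale frames

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    buffer.append(gray)

    # Background = per-pixel median over the most recent K frames
    background = np.median(np.stack(buffer), axis=0).astype(np.uint8)

    # Foreground = pixels that differ substantially from the background
    diff = cv2.absdiff(gray, background)
    _, mask = cv2.threshold(diff, THRESH, 255, cv2.THRESH_BINARY)

    cv2.imshow("moving-median foreground", mask)
    if cv2.waitKey(30) & 0xFF == 27:
        break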
Motion detection techniques
Background Subtraction (Gaussian Mixture Model)
Gaussian Mixture Model (GMM)
- Simple change detection algorithms, like using the average or median, can effectively detect changes in pixel values over time.
- However, they may not be very resilient to uninteresting changes, such as those caused by raindrops or noise.
- To handle complex scenes and distinguish between interesting and uninteresting changes, more sophisticated models are required.
- A Gaussian mixture model (GMM) can be used to model the variation of intensity or colour at each pixel in the image.
- Consider a scenario of counting passing cars or identifying cars violating traffic rules in a street scene. Complicating factors, such as a window with raindrops, snow, and bad weather, add complexity to the scene.
Ref: Lecture 11: Object Tracking, Sejong University
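In practice, a GMM background subtractor is available in OpenCV (MOG2); a minimal sketch, reusing the same hypothetical input video (parameter values are illustrative assumptions):

import cv2

cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input video

# GMM-based background subtractor: each pixel is modelled as a mixture of Gaussians
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500,        # number of recent frames that influence the model
    varThreshold=16,    # squared Mahalanobis distance threshold for foreground
    detectShadows=True  # shadows are marked with a separate grey label (127)
)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)  # updates the pixel models and returns the mask
    cv2.imshow("GMM foreground", mask)
    if cv2.waitKey(30) & 0xFF == 27:
        break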
Motion detection techniques
Background Subtraction (Gaussian Mixture Model)
- Focus on the representation of a single pixel within the red window and monitor it over a period of time; this means tracking changes in the pixel's intensity values to detect interesting events.
- The pixel's intensity histogram contains several peaks, and these variations in intensity can be attributed to three main factors:
  - Static background (road): intensity variations of the static background, which may change over time due to factors like illumination changes.
  - Noise: variations in intensity caused by image noise and fluctuations due to snow passing through the pixel.
  - Moving objects (e.g., cars): occasional moving objects pass through the pixel, resulting in distinctive histogram peaks.
[Figure: foreground results compared for the median of previous frames, the moving median of previous frames, and the Gaussian Mixture Model (GMM)]
Motion Detection
1. Background Subtraction
2. Feature Matching
3. Template Matching
4. Optical Flow
Motion detection techniques
Feature Matching
Rather than focusing on every pixel, these methods track specific features that are more
likely to remain consistent over time.
It works similarly to image registration. You detect and extract features from an object
in one frame, then match those features in later frames. By doing so, the translation and
rotation of an object can be computed.
Examples include:
• SIFT (Scale-Invariant Feature Transform) and SURF (Speeded-Up Robust
Features): These are methods for detecting and matching key points in consecutive
frames, useful for tracking rigid or deformable objects.
• KLT (Kanade-Lucas-Tomasi) Tracker: A popular algorithm in object tracking that
selects and follows feature points with strong gradients over a sequence of frames.
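A minimal sketch of KLT-style point tracking in Python with OpenCV (the corner and flow parameters, and the video file name, are illustrative assumptions):

import cv2
import numpy as np

cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Select feature points with strong gradients (Shi-Tomasi corners)
points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                 qualityLevel=0.01, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Track the points into the new frame with pyramidal Lucas-Kanade
    new_points, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)

    # Keep only successfully tracked points and draw their motion
    for p0, p1 in zip(points[status == 1], new_points[status == 1]):
        x0, y0 = p0.ravel().astype(int)
        x1, y1 = p1.ravel().astype(int)
        cv2.line(frame, (x0, y0), (x1, y1), (0, 255, 0), 2)

    cv2.imshow("KLT tracks", frame)
    prev_gray = gray
    points = new_points[status == 1].reshape(-1, 1, 2)
    if cv2.waitKey(30) & 0xFF == 27:
        break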
Motion Detection
1. Background Subtraction
2. Feature Matching
3. Template Matching
4. Optical Flow
https://theailearner.com/2020/12/12/template-matching-using-opencv/
Motion detection techniques
Template Matching
In template matching, you select a portion of an image and search the following frames for that pattern of pixels. This method is especially useful for stabilizing jittery video, where orientation and lighting are consistent between frames. You determine the motion by keeping track of the template location in each frame.
https://www.coursera.org/learn/object-tracking-and-motion-computer-vision/lecture/MkMFh/detecting-motion
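A minimal sketch using OpenCV's template matching (the template coordinates, matching method, and video file name are illustrative assumptions):

import cv2

cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input video
ok, first = cap.read()

# Assume the pattern of interest occupies this region of the first frame
x, y, w, h = 100, 100, 60, 60
template = cv2.cvtColor(first[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Slide the template over the frame; normalized correlation scores each position
    scores = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, top_left = cv2.minMaxLoc(scores)  # location of the best match

    cv2.rectangle(frame, top_left, (top_left[0] + w, top_left[1] + h), (0, 255, 0), 2)
    cv2.imshow("template match", frame)
    if cv2.waitKey(30) & 0xFF == 27:
        break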
Motion Detection
1. Background Subtraction
2. Feature Matching
3. Template Matching
4. Optical Flow
Motion detection techniques
Optical Flow
Optical flow is a powerful technique to determine motion. It uses the differences between subsequent video frames
and the gradient of those frames to estimate a velocity vector for every pixel.
Motion detection techniques
Optical Flow
• Thus, you don't need to first identify an object or a static background. You can then annotate the video by adding the velocity vectors to each frame.
• Objects moving right or left are easy to distinguish by the large arrows pointing in the direction of motion.
• Velocity arrows indicating motion towards or away from the camera are less obvious: an object moving away from the camera has an outline that shrinks, so its edge velocities converge, while an object moving towards the camera has an outline that grows, so its edge velocities diverge.
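A minimal sketch of dense optical flow with OpenCV's Farnebäck method, drawing a velocity arrow on a coarse pixel grid (the grid step, flow parameters, and video file name are illustrative assumptions):

import cv2
import numpy as np

cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Dense flow: a (u, v) velocity vector is estimated for every pixel
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

    # Annotate the frame with velocity arrows on a 16-pixel grid
    for yy in range(0, flow.shape[0], 16):
        for xx in range(0, flow.shape[1], 16):
            u, v = flow[yy, xx]
            cv2.arrowedLine(frame, (xx, yy), (int(xx + u), int(yy + v)), (0, 255, 0), 1)

    cv2.imshow("optical flow", frame)
    prev_gray = gray
    if cv2.waitKey(30) & 0xFF == 27:
        break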
Motion detection techniques
Optical Flow
• There is a key constraint with optical flow:
  - The illumination of the scene must be approximately constant. Because optical flow uses the difference in pixel intensities between frames, a shadow or change in lighting could appear as motion.
  - This affects the other approaches to motion detection as well, but it is still possible to match features or a template with some changes in illumination.
Motion Estimation
Motion estimation is the process of determining the Motion Vectors (MV) of objects or regions in a scene between successive frames. A motion vector represents the change in the position of a pixel or an object between frames, describing both the magnitude and direction of movement. These motion vectors form a motion field that helps to describe the overall dynamics of a scene.
Motion Estimation
General Methodologies in Motion Estimation
Ref: Yao Wang, Tandon School of Engineering, New York University, 2021. ECE-GY 6123: Image and Video Processing
Motion Estimation
Motion Representation
Ref: Yao Wang, Tandon School of Engineering, New York University, 2021. ECE-GY 6123: Image and Video Processing
Motion Estimation
Motion Estimation Criterion
A motion estimation criterion is the mathematical and algorithmic framework used to evaluate and optimize the estimation of motion vectors between consecutive frames of a video or image sequence. These criteria form the basis for selecting the most accurate motion vectors that minimize errors while capturing the actual displacement of objects or pixels.
Goals of a Motion Estimation Criterion
– Accuracy: ensure that the estimated motion vector matches the true motion as closely as possible.
– Efficiency: achieve the goal with minimal computational resources, especially for real-time applications.
– Robustness: handle variations in lighting, noise, occlusions, and complex motion patterns.
– Compactness: enable efficient storage and transmission, especially in video compression.
Ref: Yao Wang, Tandon School of Engineering, New York University, 2021. ECE-GY 6123: Image and Video Processing
Motion Estimation
Motion Estimation Criterion
Common motion estimation criteria
A. Matching Criteria evaluate the similarity between blocks or pixels in two frames to find the best correspondence.
1. Sum of Absolute Differences (SAD): measures the absolute difference in pixel intensities between a block B in the current frame I_t and a candidate block in the reference frame I_{t-1}, displaced by (d_x, d_y).
Formula: \mathrm{SAD}(d_x, d_y) = \sum_{(x,y) \in B} \left| I_t(x, y) - I_{t-1}(x + d_x, y + d_y) \right|
Advantages: simple to compute; works well for translational motion.
Disadvantages: sensitive to lighting variations and noise.
2. Sum of Squared Differences (SSD): squares the differences between pixel intensities, emphasizing larger errors.
Formula: \mathrm{SSD}(d_x, d_y) = \sum_{(x,y) \in B} \left( I_t(x, y) - I_{t-1}(x + d_x, y + d_y) \right)^2
Advantages: penalizes larger intensity differences more than SAD; reduces the effect of minor errors.
Disadvantages: computationally heavier than SAD.
3. Normalized Cross-Correlation (NCC): measures the correlation between the pixel intensities of the two blocks after subtracting each block's mean (\bar{I}_t and \bar{I}_{t-1} below).
Formula: \mathrm{NCC}(d_x, d_y) = \frac{\sum_{(x,y) \in B} \left(I_t(x,y) - \bar{I}_t\right)\left(I_{t-1}(x + d_x, y + d_y) - \bar{I}_{t-1}\right)}{\sqrt{\sum_{(x,y) \in B} \left(I_t(x,y) - \bar{I}_t\right)^2} \sqrt{\sum_{(x,y) \in B} \left(I_{t-1}(x + d_x, y + d_y) - \bar{I}_{t-1}\right)^2}}
Advantages: invariant to global intensity variations (lighting changes).
Disadvantages: computationally expensive due to normalization.
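A minimal NumPy sketch of these three matching criteria for two equally sized grayscale blocks (hypothetical helper functions, not from the lecture):

import numpy as np

def sad(block_a, block_b):
    # Sum of Absolute Differences: L1 distance between the two blocks
    return np.sum(np.abs(block_a.astype(float) - block_b.astype(float)))

def ssd(block_a, block_b):
    # Sum of Squared Differences: penalizes large errors more than SAD
    return np.sum((block_a.astype(float) - block_b.astype(float)) ** 2)

def ncc(block_a, block_b):
    # Normalized Cross-Correlation: mean-subtracted, variance-normalized similarity
    a = block_a.astype(float) - block_a.mean()
    b = block_b.astype(float) - block_b.mean()
    return np.sum(a * b) / (np.sqrt(np.sum(a ** 2)) * np.sqrt(np.sum(b ** 2)) + 1e-12)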
Motion Estimation
Motion Estimation Criterion
Common motion estimation criteria
B. Gradient-Based Criteria rely on optical flow principles and use image gradients to estimate motion.
1. Optical Flow Constraint (Brightness Constancy): assumes that pixel intensity remains constant between frames, I(x, y, t) = I(x + u, y + v, t + 1). A first-order Taylor expansion yields the optical flow constraint equation
I_x u + I_y v + I_t = 0,
where I_x and I_y are the spatial gradients, I_t is the temporal gradient, and (u, v) is the motion vector to be solved for.
2. Horn-Schunck Method: adds a smoothness constraint to ensure neighbouring pixels have similar motion, minimizing the energy
E(u, v) = \iint \left[ (I_x u + I_y v + I_t)^2 + \alpha^2 \left( \|\nabla u\|^2 + \|\nabla v\|^2 \right) \right] dx \, dy,
where \alpha controls the weight of the smoothness term.
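Since the constraint gives one equation for two unknowns per pixel, a common alternative (the Lucas-Kanade approach underlying the KLT tracker above) assumes constant motion within a small window and solves the resulting least-squares system. A minimal NumPy sketch, assuming (x, y) lies away from the image border (window size is an illustrative assumption):

import numpy as np

def lucas_kanade_at(frame_prev, frame_next, x, y, half_win=7):
    """Estimate the motion vector (u, v) at pixel (x, y) from two grayscale frames."""
    f0 = frame_prev.astype(float)
    f1 = frame_next.astype(float)

    # Spatial gradients I_x, I_y (central differences) and temporal gradient I_t
    Iy, Ix = np.gradient(f0)
    It = f1 - f0

    # Collect the constraint I_x u + I_y v = -I_t over a small window
    win = (slice(y - half_win, y + half_win + 1), slice(x - half_win, x + half_win + 1))
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)  # one row per pixel
    b = -It[win].ravel()

    # Least-squares solve of the over-determined system A [u, v]^T = b
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v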
C. Block Matching Algorithm (BMA): a popular approach in video compression standards like MPEG and H.264. It evaluates candidate motion vectors by comparing blocks in the current frame to blocks in the reference frame using matching criteria such as SAD or SSD; a sketch follows below.
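A minimal sketch of exhaustive (full-search) block matching with the SAD criterion, reusing the sad helper defined above (block size and search range are illustrative assumptions):

import numpy as np

def block_match(ref, cur, bx, by, block=16, search=8):
    """Find the motion vector for the block whose top-left corner is (bx, by)."""
    target = cur[by:by + block, bx:bx + block]
    best_cost, best_mv = np.inf, (0, 0)

    # Exhaustively test every displacement within the search window
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = by + dy, bx + dx
            if y0 < 0 or x0 < 0 or y0 + block > ref.shape[0] or x0 + block > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            candidate = ref[y0:y0 + block, x0:x0 + block]
            cost = sad(target, candidate)  # SAD matching criterion from above
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv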
D. Deep-Learning-Based Methods
1. Optical Flow Networks: networks like FlowNet and PWC-Net use CNNs trained on large datasets to estimate optical flow directly, achieving high accuracy and robustness to noise and complex motions. These networks perform dense motion estimation by learning from real-world scenarios, making them highly effective for applications like autonomous driving.
2. Recurrent Neural Networks (RNNs): RNNs and LSTMs are used to capture temporal dependencies, allowing better predictions of motion patterns over time. They are particularly useful for sequential data and can help improve the accuracy of motion predictions by learning historical patterns.
3. Attention Mechanisms and Transformers: attention-based networks can focus on specific parts of a scene, enabling accurate motion estimation even in crowded or cluttered environments. These mechanisms help networks prioritize areas of interest, which can improve efficiency and accuracy in tracking applications.
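As one concrete example, a pretrained RAFT optical flow network ships with torchvision; a minimal sketch, assuming torchvision >= 0.12 and frame sizes divisible by 8 (the random tensors stand in for real video frames):

import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

# Load a pretrained dense optical flow network
weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval()

# Two consecutive RGB frames, batched; placeholder data for illustration only
img1 = torch.rand(1, 3, 360, 640)
img2 = torch.rand(1, 3, 360, 640)
img1, img2 = weights.transforms()(img1, img2)  # normalize as the weights expect

with torch.no_grad():
    flow_predictions = model(img1, img2)  # list of iteratively refined flow fields
    flow = flow_predictions[-1]           # final estimate, shape (1, 2, H, W)
print(flow.shape)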
Motion Estimation
Applications