Lecture 10 Introduction To Video Processing & Applications: CSE 489-02 & CSE 589-02 Multimedia Processing
Spring 2009
06/25/21 1
Digital Video Processing
Digital Video
Digitized
A sequence of images along the temporal axis
Processing
A running program or computing operation
Driven by real-world applications (e.g.,
compression, filtering, retrieval)
A Brief History
Birth of television (1920s)
Cable TV system (1968)
Video games (1970s)
All-digital HDTV (1990s)
Video streaming (2000s)
Everyday video transmission over the Internet
and wireless networks (20??)
Why Video?
The magic of Tele-Vision
Our vision capability is extended in space
Why Video? (Cont’d)
Our vision capability is extended in time
If time could be reversed, I would not need a
gigabyte hard drive to store the moments of
a baby growing up
Importance of Motion
The human visual system (HVS) routinely perceives and
interprets motion (neurobiology)
Gait-based biometrics
The characteristics of an individual’s walk
Diversity of Motion
Motion in Video
Video is not an arbitrary concatenation of
images, but a sequence of images carrying
a coherent interpretation of a natural scene
Ordering is important
Sampling rate is important
The role of a single frame is less important
due to the masking effect of the HVS
How to Understand Video?
Understand the source
How to model the motion of a camera?
(relatively easy)
How to model the motion in the real
world? (notoriously difficult)
Understand the mechanism of time-varying
image formation
Two sides: geometric and photometric
Camera Motion
How many scene changes?
Within each scene, what kind of camera
motion do you see?
camera panning
zoom in/out
combination
2-D Motion Corresponding to
Camera Motion
Real-world Motion
Every motion you observe for a day
Can you classify them into a few simple
classes?
Rigid motion vs. deformable motion
If you observe multiple motions at the
same time, what is the spatial
relationship among the different moving
objects?
Overlapping vs. non-overlapping
Rigid Object Motion
Flexible Object Motion
Two ways to describe
Decompose into multiple but connected rigid
sub-objects
Global motion plus local motion in sub-objects
Example: the human body consists of many parts, each
undergoing a rigid motion
An Example
Geometric Image Formation Models
Photometric Image Formation Models
Photometric Image Formation Models
Why Is Video Hard?
The daunting modeling complexity
Scene geometry, lighting condition,
object/camera motion, sensor
characteristics
We have to rely on digital computers
to process video
Limited memory and computational
resources
Fundamental question about computing
Example: 2D Motion Estimation
Fundamental Assumption
[Figure: frames I_{n-1}(x, y) and I_n(x, y), related by the motion vector (v_x, v_y)]
I_n(x, y) = I_{n-1}(x - v_x, y - v_y)
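The assumption can be checked numerically on a synthetic pair of frames. The sketch below is only an illustration and assumes the relation reads I_n(x, y) = I_{n-1}(x - v_x, y - v_y); the helper name `shift_frame`, the 4-by-4 test frame, and the wrap-around border handling are assumptions made for brevity, not part of the lecture.

```python
def shift_frame(prev, vx, vy):
    """Return a frame whose pixel (x, y) equals prev at (x - vx, y - vy).

    Frames are lists of rows, indexed frame[y][x]. Coordinates wrap around
    at the borders (an assumption that keeps the example short).
    """
    h, w = len(prev), len(prev[0])
    return [[prev[(y - vy) % h][(x - vx) % w] for x in range(w)]
            for y in range(h)]

# Hypothetical 4x4 previous frame I_{n-1} with distinct pixel values.
prev = [[10 * y + x for x in range(4)] for y in range(4)]
vx, vy = 1, 2
curr = shift_frame(prev, vx, vy)  # plays the role of I_n

# Check I_n(x, y) == I_{n-1}(x - vx, y - vy) at every pixel.
h, w = len(prev), len(prev[0])
holds = all(curr[y][x] == prev[(y - vy) % h][(x - vx) % w]
            for y in range(h) for x in range(w))
```

In real video the equality holds only approximately, because of noise, lighting changes, and occlusion, which is why motion estimation minimizes a matching error instead of solving the equation exactly.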
Overview of Video Processing
[Diagram: video acquisition, video manipulation, video compression, video transmission, video display, video database, and video analysis, together with the related fields of computer graphics and computer vision]
Video Acquisition
computer-generated
Acquisition-related Problems
Video camera
What if camera is not kept still?
Why is it difficult to improve the spatial resolution of
video cameras?
VHS digitization
What if VHS contains some scratches?
How to handle interlaced video?
Computer-generated
How is this type of video different? Shouldn’t we have a
separate coding algorithm for this type of video?
Video Manipulation
Why?
Compensate for non-ideal video acquisition (e.g.,
analog heritage, film scratches, limited
resolution) or a non-ideal transmission environment
Create new and artificial video content (e.g.,
spatio-temporal interpolation,
background/foreground modification)
Video Dejittering
Video Inpainting
Error Concealment
HR sequence
Post-processing
Deblocking: suppress block artifacts in video
Video Matting
Video Games
Video Dynamosaics
Dynamosaics Result
Source: http://www.vision.huji.ac.il/dynmos/
Video Coding Overview
The grand challenge
We still face insufficient storage space for
video data even with Gigabyte hard disks
Video transmission through limited
bandwidth channels
Existing approaches
Three-dimensional waveform coding
Motion-compensated hybrid coding
Model-based coding
Video coding standards
Three-dimensional Waveform Coding*
Image and video coding
Sub-band/wavelet coding of 2D signals
Wavelet works because of its good localization
property in both space and frequency
SPIHT and SPIHT3
http://www.cipr.rpi.edu/research/SPIHT/
Motion-compensated
Predictive Coding
Basic idea
DPCM coding in the temporal domain
To reduce the overhead of the motion field, one motion vector is
assigned to each block instead of each pixel
After block-wise motion compensation, code the motion-compensated
residuals like still images
Variations: variable block size, fractional-pel
accuracy, overlapped block motion compensation
(OBMC)
All existing video coding standards, from H.261 to
the latest H.264, fall into this category
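The basic idea can be sketched for a single block: form the prediction from the reference frame at the displaced position, then keep only the residual. This is a minimal illustration, assuming the convention that the block at (x0, y0) in the current frame is predicted from (x0 - vx, y0 - vy) in the reference frame; the function names and the synthetic frames are hypothetical.

```python
def predict_block(ref, x0, y0, mv, size):
    """Motion-compensated prediction of a size-by-size block.

    The block at (x0, y0) is predicted from the reference frame at
    (x0 - vx, y0 - vy). Frames are indexed frame[y][x].
    """
    vx, vy = mv
    return [[ref[y0 - vy + j][x0 - vx + i] for i in range(size)]
            for j in range(size)]

def residual_block(cur, ref, x0, y0, mv, size):
    """Prediction residual: current block minus its prediction."""
    pred = predict_block(ref, x0, y0, mv, size)
    return [[cur[y0 + j][x0 + i] - pred[j][i] for i in range(size)]
            for j in range(size)]

# Synthetic reference frame and a current frame shifted right by one pixel.
ref = [[7 * y + 3 * x for x in range(6)] for y in range(6)]
cur = [[ref[y][max(x - 1, 0)] for x in range(6)] for y in range(6)]

good = residual_block(cur, ref, 2, 2, (1, 0), 2)  # correct motion vector
bad = residual_block(cur, ref, 2, 2, (0, 0), 2)   # no motion compensation
```

With the correct motion vector the residual is all zeros, so only the vector itself needs to be coded; without compensation every residual sample is nonzero.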
Model-based Coding
Object-based coding
Attempt to replace blocks by objects
Its success remains uncertain due to
difficulty of segmentation
Knowledge-based coding
Explicitly build 3D wireframe models to
represent moving objects
Limited success in videophone
applications
Video Coding Standards
ISO                  ITU
MPEG-1 (1992)        H.261 (1990)
1.5 Mbps, VCD        p × 64 Kbps
Video Transmission
Downloading
Pro: you can have your own copy and can
watch it offline
Con: you have to wait!!!
Streaming
Pro: no need to store (we seldom watch a
movie again and again)
Con: you have to have a good network
connection and pray for less traffic
Video Transmission Through Networks
Networking protocols
Transmission Control Protocol (TCP)
User Datagram Protocol (UDP)
Real-time Transport Protocol (RTP) and VDP
Real Time Streaming Protocol (RTSP)
Resource ReSerVation Protocol (RSVP)
TCP is not well suited to video streaming
because
TCP imposes its own flow control and windowing
schemes on the data stream, effectively
destroying the temporal relations between video
frames
Reliable message delivery is unnecessary for video:
losses are tolerable, and TCP retransmission
causes further jitter and skew
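Once retransmission is given up, the receiver needs its own way to restore frame timing and to notice losses. RTP does this by carrying a sequence number and a timestamp in a fixed 12-byte header. The sketch below packs and parses that header with the standard `struct` module. It is an illustration only: the function names, the dynamic payload type 96, the SSRC value, and the 90 kHz timestamp step are assumptions, and sequence-number wrap-around is not handled.

```python
import struct

RTP_VERSION = 2

def pack_rtp_header(seq, timestamp, ssrc, payload_type=96, marker=False):
    """Build the 12-byte fixed RTP header (no padding, extension, or CSRCs)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def unpack_rtp_header(data):
    """Parse the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc}

def missing_sequence_numbers(received):
    """List the gaps in a set of received sequence numbers (no wrap-around)."""
    seen = set(received)
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]

# One header per frame at 30 frames/s with a 90 kHz clock: 3000 ticks per frame.
header = pack_rtp_header(seq=7, timestamp=7 * 3000, ssrc=0x1234)
fields = unpack_rtp_header(header)
lost = missing_sequence_numbers([1, 2, 4, 5, 7])
```

The receiver can schedule playback from the timestamps and conceal the frames listed in `lost`, instead of stalling while a retransmission arrives.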
Security issues
Video is unique
high data rate, power hungry, time
constrained, loss-tolerant, content with varying
importance
Content access control
Cryptographic approaches
Digital video scrambling techniques
Piracy and malicious attacks
Video watermarking
Video Content Protection by
Watermarking Techniques
Signature insertion
Signature extraction
Research Ideas
Distributed video coding for error resilience
Further extension of multiple descriptions
Motion estimation/compensation is performed
at the decoder instead of encoder
Power-constrained transmission
Sensor network applications and handheld
devices
Authentication in networked transmission
Transmission errors vs. malicious attacks
Transcoding distortions vs. intentional attacks
Video Analysis
Motion segmentation
In contrast to image segmentation, motion
offers valuable clues for separating different
objects
Motion tracking
Track the same object across video frames
Motion interpretation
Easy for the HVS, difficult for a computer (e.g.,
summarize a 6-hour baseball video in 30 minutes)
Motion Segmentation
Motion Tracking
Motion Interpretation
Scene change detection
Where motion tracking fails
Cut, dissolve, wipe classification
Those are artificial features added by video
editing staff
Analyze each video segment
Camera motion: panning or zooming or still
Object motion: shape, direction, speed, etc.
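One common way to detect hard cuts is to compare gray-level histograms of consecutive frames and flag a scene change when they differ strongly. The sketch below illustrates that idea only; the lecture does not prescribe a detector, and the bin count, the threshold of 0.5, and the function names are assumptions. Gradual transitions such as dissolves and wipes would need more than this frame-to-frame test.

```python
def histogram(frame, bins=8, levels=256):
    """Gray-level histogram of a frame given as a list of rows of ints."""
    counts = [0] * bins
    for row in frame:
        for p in row:
            counts[p * bins // levels] += 1
    return counts

def histogram_distance(h1, h2):
    """Normalized L1 distance in [0, 1] between two equal-mass histograms."""
    total = sum(h1)
    return sum(abs(a - b) for a, b in zip(h1, h2)) / (2 * total)

def detect_cuts(frames, threshold=0.5):
    """Return indices i where a cut occurs between frame i-1 and frame i."""
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(frames))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]

# Three dark frames followed by two bright frames (tiny 2x2 frames).
dark = [[20, 20], [20, 20]]
bright = [[200, 200], [200, 200]]
cuts = detect_cuts([dark, dark, dark, bright, bright])
```

Here the only cut is reported at index 3, where the content switches from dark to bright.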
Application (I): Video Summarization
Application (II): Video-based Lifeguard
Source: http://www.wisdom.weizmann.ac.il/~vision/Irregularities.html
Application (III): Irregularity
Detection
Source: http://www.wisdom.weizmann.ac.il/~vision/Irregularities.html
Video Database Management
Database management
Indexing, parsing, browsing, querying
Retrieval
What is special about video?
Formidable amount of data
Difficulty with query (content-based)
Inherent uncertainty and imprecision
Content-Based Video Retrieval
(CBVR)
How to provide a compact and complete
video sequence representation?
Spatial analysis (histogram, color, texture)
Temporal analysis (cut, dissolve, wipe)
How to provide easy-to-use and efficient
query interface to user
Video browsing (slide vs. 3D)
Video querying (example-based, text-based)
Compressed-domain Video Analysis
Since video data often exist in compressed
form, it is preferable to analyze the bit
stream directly rather than the decoded pixel values
Examples: caption detection, shot detection, etc.
The key issue is how to exploit the
information contained in the bit stream
It costs little computation
It is constrained by the adopted compression
technique and is never perfect (e.g., block motion
field)
Two-Dimensional
Motion Estimation
Motion vs. Optical Flow
General Consideration
Motion Representation
Notations
Motion Estimation Criterion
Optimization Methods
Block-Based Motion Estimation
Block-Matching Algorithm
Exhaustive Block Matching Algorithm
(EBMA)
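A minimal sketch of exhaustive block matching follows. It assumes the sum of absolute differences (SAD) as the matching criterion, integer-pel accuracy, and the convention that the current block at (x0, y0) is matched against the reference block at (x0 - vx, y0 - vy); the function names and the synthetic frames are hypothetical.

```python
def sad(cur, ref, x0, y0, vx, vy, size):
    """SAD between the current block at (x0, y0) and the reference block
    at (x0 - vx, y0 - vy). Frames are indexed frame[y][x]."""
    return sum(abs(cur[y0 + j][x0 + i] - ref[y0 - vy + j][x0 - vx + i])
               for j in range(size) for i in range(size))

def ebma(cur, ref, x0, y0, size, search_range):
    """Test every integer candidate within +/- search_range; keep the best."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, float("inf")
    for vy in range(-search_range, search_range + 1):
        for vx in range(-search_range, search_range + 1):
            rx, ry = x0 - vx, y0 - vy
            if rx < 0 or ry < 0 or rx + size > w or ry + size > h:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, x0, y0, vx, vy, size)
            if cost < best_cost:
                best_mv, best_cost = (vx, vy), cost
    return best_mv, best_cost

# Synthetic 8x8 reference frame; the current frame is the reference
# displaced by (vx, vy) = (1, 2), with zeros where no data is available.
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
cur = [[ref[y - 2][x - 1] if y >= 2 and x >= 1 else 0 for x in range(8)]
       for y in range(8)]

mv, cost = ebma(cur, ref, x0=3, y0=3, size=2, search_range=2)
```

For this pair the search recovers the displacement (1, 2) with zero matching error, at the price of testing all 25 candidates in the window.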
Complexity of Integer-Pel EBMA
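A rough operation count for integer-pel EBMA can be worked out as follows. The symbols are assumptions introduced here, not the slide's notation: a frame of W-by-H pixels, blocks of B-by-B pixels, and a search range of ±R pixels, counting one operation per pixel comparison.

```latex
\begin{align*}
\text{candidates per block} &= (2R+1)^2 \\
\text{operations per candidate} &= B^2 \\
\text{blocks per frame} &= \frac{WH}{B^2} \\
\text{operations per frame} &= \frac{WH}{B^2}\,(2R+1)^2\,B^2 = WH\,(2R+1)^2
\end{align*}
% Example: W = 720, H = 480, R = 16
% 720 * 480 * 33^2 = 345600 * 1089 = 376,358,400 operations per frame,
% about 1.13 * 10^{10} operations per second at 30 frames/s.
```

The total does not depend on the block size, and it grows quadratically with the search range, which is the motivation for the fast search patterns.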
Fast BMA (1): 3-Step-Search
Searches 9 + 8 + 8 = 25 points
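The point count can be reproduced with a short sketch of the three-step pattern: evaluate the center and its 8 neighbors at step size 4, move to the best point, then repeat with step sizes 2 and 1. The code assumes the matching error is supplied as a callable `cost(vx, vy)`, a hypothetical interface chosen so the search logic stands alone; the quadratic error surface in the example is synthetic.

```python
def three_step_search(cost, start_step=4):
    """Minimize cost(vx, vy) with the three-step search pattern.

    Returns (best_mv, number_of_points_evaluated). With start_step=4 the
    steps are 4, 2, 1, which covers a search range of +/- 7 pixels.
    """
    center = (0, 0)
    best_cost = cost(0, 0)
    evaluated = 1
    step = start_step
    while step >= 1:
        best = center
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue  # the center was already evaluated
                cand = (center[0] + dx, center[1] + dy)
                c = cost(*cand)
                evaluated += 1
                if c < best_cost:
                    best, best_cost = cand, c
        center = best
        step //= 2
    return center, evaluated

# Synthetic error surface with a single minimum at (3, -2).
mv, n_points = three_step_search(lambda vx, vy: (vx - 3) ** 2 + (vy + 2) ** 2)
```

An exhaustive search over the same ±7 range would test 15 × 15 = 225 points. The fast search is only guaranteed to find the minimum when the error surface decreases monotonically toward it, as in this synthetic example.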
Fast BMA (2): 2D-Log Search
Searches at most 5 + 4 + 2 + 3 + 2 + 8 = 24 points
Fast BMA (3): Orthogonal Search
Searches at most 2 × (3 + 2 + 2 + 2 + 2 + 2) = 26 points
Fast BMA (4): Cross Search
When the step size has decreased to one, a
(+) cross search pattern (shown in the
lower-left side of the figure) is used if the
minimum block distortion measure (BDM) point of the previous
step is the center, upper-left, or
lower-right checking point. Otherwise, an
(X) cross search pattern (shown in the
upper-right side of the figure) is used.
Searches at most 5 + 4 + 4 + 4 = 17 points
Fast BMA (5): New 3-Step Search
New 3-Step Search: Examples
Fast BMA (6): 4-Step Search
[Flowchart: Search the 9 checking points in a 5-by-5 window. Is the point with the minimum distortion at the center? (Y/N) If not, is it at a corner or not? (Y/N)]
Multi-resolution Representation of Images
[Figure: image pyramid with levels of size M-by-N, M/2-by-N/2, and M/4-by-N/4]
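A multi-resolution representation like this can be built by repeatedly halving the image. The sketch below averages each 2-by-2 neighborhood, which is one simple choice of low-pass filter and an assumption here (the slides do not state which filter is used); the function names and the 8-by-8 test image are hypothetical.

```python
def downsample_by_2(img):
    """Halve width and height by averaging each 2x2 neighborhood."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]

def build_pyramid(img, levels=3):
    """Return [level 0 (full size), level 1 (half size), level 2, ...]."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample_by_2(pyramid[-1]))
    return pyramid

# Hypothetical 8x8 image; levels have sizes 8x8, 4x4, and 2x2.
img = [[x + y for x in range(8)] for y in range(8)]
pyramid = build_pyramid(img, levels=3)
sizes = [(len(level), len(level[0])) for level in pyramid]
```

A displacement of d pixels at full resolution appears as d/2 at the next level and d/4 at the one after, so a small search range at the coarsest level covers a large motion, and a vector found there is doubled before it is refined at the next finer level.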
Why does Hierarchical Strategy Help?
[Figure: Level-2 → ME result → Level-1 → ME result → Level-0]
Hierarchical Block Matching Algorithm
(HBMA)
Example: Three-level HBMA
Fast BMA (7): Hierarchical Search
Summary
Why do we care about fast BMA?
Driven by the application demands of video
coding
Can we go beyond BMA?
The block-based constraint is simple but cannot
account for the arbitrary shapes
of moving objects
Integer-pel accuracy is not sufficient to
account for the continuous nature of motion
Fractional Accuracy EBMA
Why Do We Need Fraction-pel?
Fractional-pel BMA
[Figure: an M-by-N frame is enlarged to 2M-by-2N by linear interpolation]
Half-pel BMA
[Figure: current frame, showing integer-pel positions (x,y+1), (x+1,y+1) and half-pel positions (2x,2y+1), (2x+1,2y+1) on the interpolated grid]
O[2x,2y]=I[x,y]
O[2x+1,2y]=(I[x,y]+I[x+1,y])/2
O[2x,2y+1]=(I[x,y]+I[x,y+1])/2
O[2x+1,2y+1]=(I[x,y]+I[x+1,y]+I[x,y+1]+I[x+1,y+1])/4
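The four interpolation rules translate directly into code. The sketch below keeps the slide's indexing, so `I[x][y]` and `O[x][y]` are addressed as in the formulas. The output size of (2M-1)-by-(2N-1), which avoids reading past the last row and column, and the function name are assumptions.

```python
def half_pel_interpolate(I):
    """Upsample I by 2 in each direction with (bi)linear interpolation.

    I is indexed I[x][y] as in the formulas; the result O satisfies
    O[2x][2y] = I[x][y], with half-pel samples in between.
    """
    M, N = len(I), len(I[0])
    O = [[0.0] * (2 * N - 1) for _ in range(2 * M - 1)]
    for x in range(M):
        for y in range(N):
            O[2 * x][2 * y] = float(I[x][y])
            if x + 1 < M:
                O[2 * x + 1][2 * y] = (I[x][y] + I[x + 1][y]) / 2
            if y + 1 < N:
                O[2 * x][2 * y + 1] = (I[x][y] + I[x][y + 1]) / 2
            if x + 1 < M and y + 1 < N:
                O[2 * x + 1][2 * y + 1] = (I[x][y] + I[x + 1][y]
                                           + I[x][y + 1] + I[x + 1][y + 1]) / 4
    return O

I = [[0, 4],
     [8, 12]]
O = half_pel_interpolate(I)
```

An integer-pel search on the interpolated reference `O` then yields motion vectors in units of half a pixel.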
Hierarchical Strategy for
Half-pel BMA
Integer-pel
Half-pel
Generalizations of BMA
Variable block-size matching algorithms
Widely used by various video coding standards
H.264 supports seven block sizes, from 16-by-16
down to 4-by-4 (16×16, 16×8, 8×16, 8×8, 8×4, 4×8, 4×4)
Fractional-pel accuracy BMA
Half-pel: MPEG-1/2/4, H.263/H.263+
Quarter-pel: H.264 (even 1/8-pel for chroma)
Tradeoff between overhead on motion and
motion-compensated prediction (MCP) efficiency
Variable Block-size BMA
BMA Strategy Adopted by H.263
16-by-16 8-by-8
BMA Strategy Adopted by H.264
Overview of DBMA (deformable block matching algorithm)
Three steps:
Partition the anchor frame into regular blocks
Model the motion in each block by a more
complex motion model than pure translation
The 2-D motion caused by a flat surface patch
undergoing rigid 3-D motion can be approximated well
by a projective mapping
A projective mapping can in turn be approximated by an affine
mapping or a bilinear mapping
Estimate the motion parameters block by block,
independently
The discontinuity problem across block boundaries still
remains
Affine and Bilinear Model
Affine (6 parameters):
Good for mapping triangles to triangles
d_x(x, y) = a_0 + a_1 x + a_2 y
d_y(x, y) = b_0 + b_1 x + b_2 y
Bilinear (8 parameters):
Good for mapping blocks to quadrangles
d_x(x, y) = a_0 + a_1 x + a_2 y + a_3 xy
d_y(x, y) = b_0 + b_1 x + b_2 y + b_3 xy
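Both models are simple to evaluate once the parameters are known. The sketch below computes the displacement (d_x, d_y) at a pixel for each model; the function names and the example parameter values are hypothetical, and estimating the parameters is a separate problem not shown here.

```python
def affine_displacement(x, y, a, b):
    """Affine model: a = (a0, a1, a2), b = (b0, b1, b2)."""
    return (a[0] + a[1] * x + a[2] * y,
            b[0] + b[1] * x + b[2] * y)

def bilinear_displacement(x, y, a, b):
    """Bilinear model: a = (a0, a1, a2, a3), b = (b0, b1, b2, b3)."""
    return (a[0] + a[1] * x + a[2] * y + a[3] * x * y,
            b[0] + b[1] * x + b[2] * y + b[3] * x * y)

# Pure translation: the displacement is the same at every pixel.
translation = affine_displacement(10, 20, (2, 0, 0), (-1, 0, 0))

# Scaling about the origin: the displacement grows with distance from it.
scaling = affine_displacement(10, 20, (0, 0.5, 0), (0, 0, 0.5))

# With a3 = b3 = 0 the bilinear model reduces to the affine one.
same = bilinear_displacement(10, 20, (0, 0.5, 0, 0), (0, 0, 0.5, 0))
```

Translation is the special case used by ordinary block matching, where only a_0 and b_0 are nonzero.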
Mesh Based Estimation
The computation of a motion vector is affected by the
neighboring vectors.
Step 1: The current frame is divided into picture elements
(which may be any polygon) so that a mesh, or control
grid, is formed.
Step 2: The nodes of each mesh element are searched for in the
previous (reference) frame.
Step 3: Once the displacement vectors of the nodes
of a picture element are known, the displacement vectors of the rest
of its pixels are obtained by interpolating the known nodal motion
vectors.
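Step 3 can be sketched for the simplest case, an axis-aligned rectangular element whose four nodal motion vectors are already known. Bilinear interpolation is assumed as the interpolation rule, which is one common choice; the function name, argument order, and example values are hypothetical.

```python
def interpolate_mv(x, y, x0, y0, x1, y1, mv00, mv10, mv01, mv11):
    """Bilinearly interpolate the motion vector at pixel (x, y).

    The element spans [x0, x1] x [y0, y1]; mv00, mv10, mv01, mv11 are the
    nodal motion vectors at (x0, y0), (x1, y0), (x0, y1), (x1, y1).
    """
    s = (x - x0) / (x1 - x0)  # horizontal position inside the element, 0..1
    t = (y - y0) / (y1 - y0)  # vertical position inside the element, 0..1
    return tuple((1 - s) * (1 - t) * p00 + s * (1 - t) * p10
                 + (1 - s) * t * p01 + s * t * p11
                 for p00, p10, p01, p11 in zip(mv00, mv10, mv01, mv11))

# Element with corners (0, 0) and (8, 8) and four different nodal vectors.
nodes = dict(mv00=(0, 0), mv10=(4, 0), mv01=(0, 4), mv11=(4, 4))
at_center = interpolate_mv(4, 4, 0, 0, 8, 8, **nodes)
at_node = interpolate_mv(8, 0, 0, 0, 8, 8, **nodes)
```

Along an edge the result depends only on the two nodes of that edge, so neighboring elements that share nodes produce a motion field that is continuous across element boundaries.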
Node Search Technique
1. Hierarchical mesh based matching algorithm (HMMA).
2. Hierarchical block based matching algorithm (HBMA).
Disadvantage:
In terms of computational complexity, the BMAs certainly
have an edge over mesh-based ME
Mesh-Based Motion Estimation
A control grid is used to partition a
frame into non-overlapping polygon
elements. The nodal motion is
constrained so that a feasible mesh
is still formed with the motion.
Mesh-based vs Block-based
(a) block-based ME
(b) mesh-based ME
Example: BMA vs. Mesh-based
Target
Anchor
Frame #1 Frame #2