Lecture 10 Introduction To Video Processing & Applications: CSE 489-02 & CSE 589-02 Multimedia Processing

This document provides an introduction to video processing and applications. It discusses how digital video is a sequence of digitized images along the temporal axis. Video processing involves running programs to perform operations like compression, filtering, and retrieval that are driven by real-world applications. The document then provides a brief history of video technology and discusses why video is important by extending our vision capabilities in space and time. It also discusses the diversity and importance of motion in video and challenges in understanding and modeling video.

Uploaded by

pranay

CSE 489-02 & CSE 589-02 Multimedia Processing

Lecture 10 Introduction to Video Processing & Applications

Spring 2009

New Mexico Tech

06/25/21 1
Digital Video Processing
 Digital Video
 Digitized
 A sequence of images along the temporal axis
 Processing
 A running program or computing operation
 Driven by the real-world applications (e.g.,
compression, filtering, retrieval)

06/25/21 2
A Brief History
 Birth of television (1920s)
 Cable TV system (1968)
 Video games (1970s)
 All-digital HDTV (1990s)
 Video streaming (2000s)
 Everyday video transmission through the Internet
and wireless networks (20??)

06/25/21 3
Why Video?
 The magic of Tele-Vision
 Our vision capability is extended in space

You don’t need to travel to the North Pole to watch polar bears

06/25/21 4
Why Video? (Cont’d)
 Our vision capability is extended in time
 If time could be reversed, I would not need a
gigabyte hard drive to store the moments of
a baby growing up

 The fundamental interplay between time and motion
 We measure time by the motion of material
things
 Motion offers a new horizon for us to
understand the world

06/25/21 5
Importance of Motion
 Our human visual system (HVS) routinely perceives and
interprets motion (neurobiology)

 Functional MRI (fMRI)


 By measuring the increase in blood flow to the
local vasculature that accompanies neural
activity in the brain, fMRI studies brain
function instead of anatomy

 Gait-based biometrics
 The characteristics of an individual’s walk
06/25/21 6
Diversity of Motion

06/25/21 7
Motion in Video
 It is not an arbitrary concatenation of
images, but a sequence of images carrying
a coherent interpretation of a natural scene
 Ordering is important
 Sampling rate is important
 The role of a single frame is less important
due to the masking effect of the HVS

06/25/21 8
How to Understand Video?
 Understand the source
 How to model the motion of a camera?
(relatively easy)
 How to model the motion in the real
world? (notoriously difficult)
 Understand the mechanism of the time-varying
image formation model
 Two sides: geometric and photometric

06/25/21 9
Camera Motion
 How many scene changes?
 Within each scene, what kind of camera
motion do you see?
 camera panning
 zoom in/out
 combination

06/25/21 10
2-D Motion Corresponding to
Camera Motion

06/25/21 11
Real-world Motion
 Every motion you observe for a day
 Can you classify them into a few simple
classes?
 Rigid motion vs. deformable motion
 If you observe multiple motions at the
same time, what is the spatial
relationship among the different moving
objects?
 Overlapping vs. non-overlapping

06/25/21 12
Rigid Object Motion

06/25/21 13
Flexible Object Motion
 Two ways to describe
 Decompose into multiple but connected rigid
sub-objects
 Global motion plus local motion in sub-objects
 Ex.: the human body consists of many parts, each
undergoing a rigid motion

06/25/21 14
An Example

06/25/21 15
Geometric Image Formation Models

06/25/21 16
Photometric Image Formation Models

 Modeling surface reflectance function

 Modeling illumination condition


 Light source location and intensity

 Modeling the photometric impact of 3D motion

06/25/21 17
Photometric Image Formation Models

06/25/21 18
Why Is Video Hard?
 The daunting modeling complexity
 Scene geometry, lighting condition,
object/camera motion, sensor
characteristics
 We have to rely on digital computers
to process video
 Limited memory and computation
resources
 Fundamental question about computing

06/25/21 19
Example: 2D Motion Estimation

1st frame 2nd frame

06/25/21 20
Fundamental Assumption

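[Figure: a pixel at (x, y) in the (n-1)-th frame, I_{n-1}(x, y), is linked to the n-th frame, I_n(x, y), by the motion vector (v_x, v_y)]

Image intensity field is smooth along the motion trajectory:

\[ I_n(x, y) \approx I_{n-1}(x - v_x, y - v_y) \]

where (v_x, v_y) is taken as the displacement from the (n-1)-th frame to the n-th frame.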
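
The assumption above can be checked numerically on a toy example. The following is a minimal sketch, not the lecture's own code: it assumes that (vx, vy) is the displacement from frame n-1 to frame n, so that I_n(x, y) = I_{n-1}(x - vx, y - vy), and it uses plain Python lists (in practice an array library would be the usual choice). The names `shift_frame` and `constancy_error` are hypothetical helpers introduced for illustration.

```python
def shift_frame(frame, vx, vy, fill=0):
    """Return a copy of `frame` whose content is moved by (vx, vy) pixels.

    frame[y][x] is the intensity at column x, row y. Pixels that would come
    from outside the frame are set to `fill`.
    """
    h, w = len(frame), len(frame[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - vx, y - vy  # position this pixel came from
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = frame[sy][sx]
    return out


def constancy_error(prev, curr, vx, vy):
    """Mean of |I_n(x, y) - I_{n-1}(x - vx, y - vy)| over pixels with a valid source."""
    h, w = len(curr), len(curr[0])
    total, count = 0, 0
    for y in range(h):
        for x in range(w):
            sx, sy = x - vx, y - vy
            if 0 <= sx < w and 0 <= sy < h:
                total += abs(curr[y][x] - prev[sy][sx])
                count += 1
    return total / count if count else 0.0


# Synthetic previous frame and a current frame translated by (2, 1).
prev_frame = [[(3 * x + 5 * y) % 17 for x in range(8)] for y in range(6)]
curr_frame = shift_frame(prev_frame, vx=2, vy=1)

print(constancy_error(prev_frame, curr_frame, 2, 1))  # error along the true trajectory
print(constancy_error(prev_frame, curr_frame, 0, 0))  # error when motion is ignored
```

Along the true trajectory the error vanishes for this synthetic pair; real video only satisfies the relation approximately because of noise, occlusion, and lighting changes.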

06/25/21 21
Overview of Video Processing
[Diagram: video acquisition, video manipulation, video compression, video transmission, video database, video analysis, and video display, with links to computer graphics and computer vision]

06/25/21 22
Video Acquisition

[Figure: three video sources: video camera, VHS digitization, computer-generated]
06/25/21 23
Acquisition-related Problems
 Video camera
 What if camera is not kept still?
 Why is it difficult to improve the spatial resolution of
video cameras?
 VHS digitization
 What if VHS contains some scratches?
 How to handle interlaced video?
 Computer-generated
 How is this type of video different? Shouldn’t we have a
separate coding algorithm for this type of video?

06/25/21 24
Video Manipulation
 Why?
 Fight against a non-ideal video acquisition (e.g.,
analog heritage, film scratches, limited
resolution) or transmission environment
 Create new and artificial video content (e.g.,
spatio-temporal interpolation,
background/foreground modification)

06/25/21 25
Video Dejittering

PDE-based approach by Jackie Shen


http://epubs.siam.org/sam-bin/dbq/article/41869

06/25/21 26
Video Inpainting

Cool application: remove the annoying text added by various video conversion software

06/25/21 27
Error Concealment

Before: some blocks are corrupted due to channel errors
After: corrupted blocks are recovered from surrounding neighbors in space and time

06/25/21 28
Deinterlacing

Interlaced: fields alternate odd, even, odd, even
Progressive: frames n-2, n-1, n, n+1
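
As a rough illustration of what a deinterlacer has to do, here is a minimal sketch of two textbook strategies that are not taken from the slides: "weave" (interleave two fields, suitable for static content) and line averaging within one field (suitable when the two fields differ because of motion). The function names, and the assumption that the first field holds frame rows 0, 2, 4, ..., are illustrative only.

```python
def weave(first_field, second_field):
    """Interleave two fields into one frame.

    Assumes first_field holds frame rows 0, 2, 4, ... and second_field
    holds rows 1, 3, 5, ...; both have the same number of rows.
    """
    frame = []
    for row_a, row_b in zip(first_field, second_field):
        frame.append(list(row_a))
        frame.append(list(row_b))
    return frame


def line_average(field):
    """Build a full frame from one field by averaging vertically adjacent rows."""
    frame = []
    for i, row in enumerate(field):
        frame.append(list(row))
        # The last missing row has no neighbor below, so the field row is repeated.
        below = field[i + 1] if i + 1 < len(field) else row
        frame.append([(a + b) / 2 for a, b in zip(row, below)])
    return frame


odd_field = [[1, 1], [3, 3]]
even_field = [[2, 2], [4, 4]]
print(weave(odd_field, even_field))
print(line_average(odd_field))
```

Weaving keeps full vertical resolution but produces comb artifacts on moving objects; line averaging avoids the combs at the cost of vertical detail.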
06/25/21 29
Superresolution
LR sequence

… …

… …

HR sequence

06/25/21 30
Post-processing
Deblocking: suppress block artifacts in video

Before: decoded video frame at very low bit rate
After: processed video frame after deblocking

06/25/21 31
Video Matting

06/25/21 32
Video Games

06/25/21 33
Video Dynamosaics

06/25/21 34
Dynamosaics Result

Source: http://www.vision.huji.ac.il/dynmos/

06/25/21 35
Video Coding Overview
 The grand challenge
 We still face insufficient storage space for
video data even with Gigabyte hard disks
 Video transmission through limited
bandwidth channels
 Existing approaches
 Three-dimensional waveform coding
 Motion-compensated hybrid coding
 Model-based coding
 Video coding standards

06/25/21 36
Three-dimensional Waveform Coding*
 Image and video coding
 Sub-band/wavelet coding of 2D signals
 Wavelet works because of its good localization
property in both space and frequency
 SPIHT and 3-D SPIHT

http://www.cipr.rpi.edu/research/SPIHT/

06/25/21 37
Motion-compensated
Predictive Coding
 Basic idea
 DPCM coding in temporal domain
 To reduce overhead on motion field, motion vector is
assigned to each block instead of each pixel
 After block-wise motion compensation, code motion-
compensated residues like still images
 Variations: variable block size, fractional-pel
accuracy, overlapped block motion compensation
(OBMC)
 All existing video coding standards, from H.261 to
the latest H.264, fall into this category
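
The basic idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a codec: one motion vector per block, integer-pel accuracy, and the convention that (vx, vy) is the displacement from the reference frame to the current frame, so the prediction at (x, y) is the reference at (x - vx, y - vy). `predict_frame` and `residual` are hypothetical helper names.

```python
def predict_frame(reference, motion_vectors, block=4):
    """Motion-compensated prediction of the current frame.

    motion_vectors[by][bx] = (vx, vy) applies to every pixel of the block whose
    top-left corner is (bx * block, by * block). Source coordinates are clamped
    at the frame border.
    """
    h, w = len(reference), len(reference[0])
    pred = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vx, vy = motion_vectors[y // block][x // block]
            sx = min(max(x - vx, 0), w - 1)
            sy = min(max(y - vy, 0), h - 1)
            pred[y][x] = reference[sy][sx]
    return pred


def residual(current, prediction):
    """Pixel-wise difference that a hybrid coder would go on to code like a still image."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, prediction)]


reference = [[10 * y + x for x in range(8)] for y in range(8)]
vectors = [[(1, 0), (1, 0)], [(1, 0), (1, 0)]]  # one vector per 4x4 block
prediction = predict_frame(reference, vectors, block=4)
```

Only the block motion vectors and the residual need to be transmitted; when the motion model fits well, the residual is close to zero and cheap to code.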

06/25/21 38
Model-based Coding
 Object-based coding
 Attempt to replace blocks by objects
 Its success remains uncertain due to
difficulty of segmentation
 Knowledge-based coding
 Explicitly build 3D wireframe models to
represent moving objects
 Limited success in videophone
applications

06/25/21 39
Video Coding Standards
ISO                                   ITU
MPEG-1 (1992)                         H.261 (1990)
  1.5 Mbps, VCD                         p×64 Kbps
MPEG-2 (1996)                         H.263
  2-10 Mbps, DVD                        8-64 Kbps, videophone
MPEG-4 (2000)                         H.263+/++
  8-1024 Kbps, videophone               8-64 Kbps, videophone
Digital cinema (ongoing)              H.264/AVC (2003)

Others: Windows Media Player (Microsoft), RealPlayer (RealNetworks), Skype Video??
06/25/21 40
Transcoding Problem
 How to translate a piece of MPEG2 (DVD)
video into WMV format?
 Straightforward approach: decode it with an MPEG2
decoder and then encode it with the MSC MPEG4
encoder
 Transcoding approach: achieve the same goal
with reduced computational cost
 When spatial or temporal resolution
changes, the goal of complexity reduction
becomes more difficult to achieve in
transcoding
06/25/21 41
New Directions in Video Coding
 Distributed video coding for sensor
networks
 How to shift MC from encoder to decoder?
 Video coding for cartoon sequences
 Existing techniques work terribly on them
 Video coding inspired by studies of HVS
 You have seen the impact of motion masking
 There also exist other properties of the HVS that
can be exploited

06/25/21 42
Video Transmission
 Downloading
 Pro: you can have your own copy and can
watch it offline
 Con: you have to wait!!!
 Streaming
 Pro: no need to store (we seldom watch a
movie again and again)
 Con: you have to have a good network
connection and pray for less traffic

06/25/21 43
Video Transmission Through Networks
 Networking protocols
 Transmission Control Protocol (TCP)
 User Datagram Protocol (UDP)
 Real-time Transport Protocol (RTP) and VDP
 Real Time Streaming Protocol (RTSP)
 Resource ReSerVation Protocol (RSVP)
 Transmission Control Protocol is not suitable
for video streaming because
 TCP imposes its own flow control and windowing
schemes on the data stream, effectively
destroying temporal relations between video
frames
 Reliable message delivery is unnecessary for video
- losses are tolerable and TCP retransmission
causes further jitter and skew.
06/25/21 44
Security issues
 Video is unique
 high data rate, power hungry, time
constrained, loss-tolerant, content with varying
importance
 Content access control
 Cryptographic approaches
 Digital video scrambling techniques
 Piracy and malicious attacks
 Video watermarking

06/25/21 45
Video Content Protection by
Watermarking Techniques

Signature insertion

Signature extraction

06/25/21 46
Research Ideas
 Distributed video coding for error resilience
 Further extension of multiple descriptions
 Motion estimation/compensation is performed
at the decoder instead of encoder
 Power-constrained transmission
 Sensor network applications and handheld
devices
 Authentication in networked transmission
 Transmission errors vs. malicious attacks
 Transcoding distortions vs. intentional attacks

06/25/21 47
Video Analysis
 Motion segmentation
 In contrast to image segmentation, motion
offers valuable clues for separating different
objects
 Motion tracking
 Track the same object across video frames
 Motion interpretation
 Easy for HVS, difficult for a computer (e.g.,
summarize a 6-hr. baseball video into 30min.)

06/25/21 48
Motion Segmentation

06/25/21 49
Motion Tracking

06/25/21 50
Motion Interpretation
 Scene change detection
 Where motion tracking fails
 Cut, dissolve, wipe classification
 Those are artificial features added by video
editing staff
 Analyze each video segment
 Camera motion: panning or zooming or still
 Object motion: shape, direction, speed, etc.
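
A common first step for scene change detection is to compare intensity histograms of consecutive frames. The sketch below is an illustrative assumption rather than the method used in the lecture: it detects hard cuts only (dissolves and wipes need a longer temporal window), and the bin count and threshold are arbitrary hypothetical values.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalized intensity histogram of a frame given as a list of rows."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for value in row:
            counts[min(value * bins // max_value, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def detect_cuts(frames, threshold=0.5):
    """Return the indices i at which frame i appears to start a new shot."""
    if not frames:
        return []
    cuts = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        # L1 distance between normalized histograms lies in [0, 2].
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            cuts.append(i)
        prev = cur
    return cuts


dark = [[10] * 4 for _ in range(4)]
bright = [[200] * 4 for _ in range(4)]
print(detect_cuts([dark, dark, bright, bright]))
```

Histograms discard spatial layout, so this detector is insensitive to camera and object motion within a shot, which is useful here because motion should not be mistaken for a cut.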

06/25/21 51
Application (I): Video Summarization

Extract “important” motion pictures such as home-runs

06/25/21 52
Application (II): Video-based Lifeguard

Application in swimming pool monitoring to prevent drowning


06/25/21 53
Application (III): Irregularity Detection

Source: http://www.wisdom.weizmann.ac.il/~vision/Irregularities.html
06/25/21 54
06/25/21 55
Application (III): Irregularity Detection

Source: http://www.wisdom.weizmann.ac.il/~vision/Irregularities.html
06/25/21 56
Video Database Management
 Database management
 Indexing, parsing, browsing, querying
 Retrieval
 What is special about video?
 Formidable amount of data
 Difficulty with query (content-based)
 Inherent uncertainty and imprecision

06/25/21 57
Content-Based Video Retrieval
(CBVR)
 How to provide a compact and complete
video sequence representation?
 Spatial analysis (histogram, color, texture)
 Temporal analysis (cut, dissolve, wipe)
 How to provide an easy-to-use and efficient
query interface to the user?
 Video browsing (slide vs. 3D)
 Video querying (example-based, text-based)

06/25/21 58
Compressed-domain Video Analysis
 Since video data often exist in compressed
format, it is preferred to do analysis with bit
streams rather than pixel values
 Examples: caption detection, shot detection etc.
 The key issue lies in how to exploit the
information contained in the bit stream
 It does not cost much computation
 It is constrained by the adopted compression
techniques and never perfect (e.g., block motion
field)

06/25/21 59
Two-Dimensional Motion Estimation

06/25/21 60
Motion vs. Optical Flow

06/25/21 61
General Consideration

06/25/21 62
Motion Representation

06/25/21 63
Notations

06/25/21 64
Motion Estimation Criterion

06/25/21 65
Optimization Methods

06/25/21 66
Block-Based Motion Estimation

06/25/21 67
Block-Matching Algorithm

06/25/21 68
Exhaustive Block Matching Algorithm
(EBMA)
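
A minimal sketch of integer-pel exhaustive block matching follows. It is an illustration under stated assumptions, not the lecture's implementation: the matching criterion is the sum of absolute differences (SAD), candidates that fall outside the reference frame are skipped, and (vx, vy) is taken as the displacement from the reference frame to the current frame. The names `sad` and `ebma` are hypothetical.

```python
def sad(current, reference, x0, y0, vx, vy, block):
    """SAD between the current block at (x0, y0) and the reference block at (x0 - vx, y0 - vy)."""
    total = 0
    for dy in range(block):
        for dx in range(block):
            total += abs(current[y0 + dy][x0 + dx]
                         - reference[y0 - vy + dy][x0 - vx + dx])
    return total


def ebma(current, reference, x0, y0, block=4, search=2):
    """Test every integer displacement within +/- `search` pixels and keep the best one."""
    h, w = len(reference), len(reference[0])
    best_cost, best_mv = None, (0, 0)
    for vy in range(-search, search + 1):
        for vx in range(-search, search + 1):
            rx, ry = x0 - vx, y0 - vy
            if rx < 0 or ry < 0 or rx + block > w or ry + block > h:
                continue  # candidate block falls outside the reference frame
            cost = sad(current, reference, x0, y0, vx, vy, block)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (vx, vy)
    return best_mv, best_cost


# Synthetic pair: the current frame is the reference translated by (1, 1).
reference = [[x * x + 31 * y * y for x in range(12)] for y in range(12)]
current = [[reference[y - 1][x - 1] if x >= 1 and y >= 1 else 0
            for x in range(12)] for y in range(12)]
print(ebma(current, reference, 4, 4, block=4, search=2))
```

With a search range of +/- R pixels there are (2R+1)^2 candidates per block and one absolute difference per pixel per candidate, which is why the fast search strategies are of interest.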

06/25/21 69
Complexity of Integer-Pel EBMA

06/25/21 70
Fast BMA (1): 3-Step-Search

search 9+8+8 = 25 points
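
The point count above follows from the search pattern: 9 candidates in the first step, then 8 new neighbors around the current best in each of the two remaining steps. Here is a minimal sketch, assuming the usual step sizes 4, 2, 1 (a +/- 7 pixel range), SAD as the matching criterion, and (vx, vy) as the displacement from the reference frame to the current frame. `block_sad` and `three_step_search` are hypothetical names.

```python
def block_sad(current, reference, x0, y0, vx, vy, block):
    """SAD for displacement (vx, vy), or None if the candidate leaves the frame."""
    h, w = len(reference), len(reference[0])
    rx, ry = x0 - vx, y0 - vy
    if rx < 0 or ry < 0 or rx + block > w or ry + block > h:
        return None
    return sum(abs(current[y0 + dy][x0 + dx] - reference[ry + dy][rx + dx])
               for dy in range(block) for dx in range(block))


def three_step_search(current, reference, x0, y0, block=4, step=4):
    """Return (best motion vector, its SAD, number of candidate positions visited)."""
    best_mv = (0, 0)
    best_cost = block_sad(current, reference, x0, y0, 0, 0, block)
    visited = 1
    while step >= 1:
        cx, cy = best_mv  # center of this step's 3x3 pattern
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue  # center was evaluated in the previous step
                visited += 1
                cost = block_sad(current, reference, x0, y0, cx + ox, cy + oy, block)
                if cost is not None and (best_cost is None or cost < best_cost):
                    best_cost, best_mv = cost, (cx + ox, cy + oy)
        step //= 2
    return best_mv, best_cost, visited


reference = [[x * x + 31 * y * y for x in range(24)] for y in range(24)]
current = [[reference[y][max(x - 4, 0)] for x in range(24)] for y in range(24)]
print(three_step_search(current, reference, 8, 8))
```

Unlike exhaustive search, this strategy can settle on a local minimum when the error surface is not unimodal, which is the price paid for visiting 25 positions instead of 225.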

06/25/21 71
Fast BMA (2): 2D-Log Search

search at most 5+4+2+3+2+8 = 24 points

06/25/21 72
Fast BMA (3): Orthogonal Search

search at most 2(3+2+2+2+2+2) = 26 points

06/25/21 73
Fast BMA (4): Cross Search
When the step size has decreased to one, a
(+) cross search pattern (as shown in the
lower-left side of the figure) is used if the
minimum block distortion measure (BDM) point
of the previous step is the center, upper-left, or
lower-right checking point. Otherwise, an
(X) cross search pattern (as shown in the
upper-right side of the figure) is used.

search at most 5+4+4+4 = 17 points

06/25/21 74
Fast BMA (5): New 3-Step Search

06/25/21 75
New 3-Step Search: Examples

06/25/21 76
Fast BMA (6): 4-Step Search
Step 1: Search the 9 checking points located in a 5-by-5 window. Is the point
reaching the minimum distortion found at the center?
  Y: go to the final 3-by-3 search
  N: continue with Step 2
Step 2: Is the minimum-distortion point at a corner or not?
  Y: search 5 additional checking points
  N: search 3 additional checking points
Step 3: Repeat the procedure of Step 2 (the dashed box in the flowchart)
Step 4: Final 3-by-3 search


06/25/21 77
4-Step Search: Examples

06/25/21 78
Multi-resolution Representation of Images
M/4
N/4

M/2
N/2

Multi-resolution representation by pyramid
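
A pyramid with the sizes shown above (full, half, and quarter resolution) can be sketched as follows. This is a minimal illustration that assumes each coarser level is obtained by averaging non-overlapping 2x2 blocks; the slides do not state which low-pass filter is used, and `downsample` and `build_pyramid` are hypothetical names.

```python
def downsample(image):
    """Halve both dimensions by averaging each non-overlapping 2x2 block."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1]
              + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)]
            for y in range(h)]


def build_pyramid(image, levels=3):
    """Return [full resolution, half resolution, quarter resolution, ...]."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


image = [[x + y for x in range(8)] for y in range(8)]
pyramid = build_pyramid(image, levels=3)
print([(len(level), len(level[0])) for level in pyramid])
```

A displacement of d pixels at full resolution shrinks to roughly d/2 and d/4 pixels at the coarser levels, so a small search range at the top of the pyramid covers a large range at the bottom.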

06/25/21 79
Why does Hierarchical Strategy Help?
Level-2
ME result

Level-1

ME result

Level-0

06/25/21 80
Hierarchical Block Matching Algorithm
(HBMA)

06/25/21 81
Example: Three-level HBMA

06/25/21 82
Fast BMA (7): Hierarchical Search

06/25/21 83
Summary
 Why do we care about fast BMA?
 Driven by the application demands of video
coding
 Can we go beyond BMA?
 The block-based constraint is simple but not
appropriate for accounting for the arbitrary shapes
of moving objects
 The integer-pel accuracy is not sufficient to
account for the continuous nature of motion

06/25/21 84
Fractional Accuracy EBMA

06/25/21 85
Why Do We Need Fractional-pel?

06/25/21 86
Fractional-pel BMA
[Figure: the original M-by-N reference frame is linearly interpolated to a 2M-by-2N interpolated reference frame]

06/25/21 87
Half-pel BMA

[Figure: current frame and reference frame; the digits (1) indicate physical distances]
06/25/21 88
Bilinear Interpolation
Original grid: (x,y), (x+1,y), (x,y+1), (x+1,y+1)
Interpolated grid: (2x,2y), (2x+1,2y), (2x,2y+1), (2x+1,2y+1)

O[2x,2y]=I[x,y]
O[2x+1,2y]=(I[x,y]+I[x+1,y])/2
O[2x,2y+1]=(I[x,y]+I[x,y+1])/2
O[2x+1,2y+1]=(I[x,y]+I[x+1,y]+I[x,y+1]+I[x+1,y+1])/4
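
The four formulas translate directly into code. The sketch below is a minimal illustration with two assumptions that are not in the slide: images are stored as lists of rows and indexed `image[y][x]` (the slide writes `I[x,y]`), and the last row and column are replicated at the border so that the output is exactly twice the input size. `upsample_bilinear` is a hypothetical name.

```python
def upsample_bilinear(image):
    """Double the resolution of `image` using the four half-pel formulas."""
    h, w = len(image), len(image[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            x1 = min(x + 1, w - 1)  # replicate the border column
            y1 = min(y + 1, h - 1)  # replicate the border row
            a, b = image[y][x], image[y][x1]
            c, d = image[y1][x], image[y1][x1]
            out[2 * y][2 * x] = float(a)                   # O[2x,2y]
            out[2 * y][2 * x + 1] = (a + b) / 2            # O[2x+1,2y]
            out[2 * y + 1][2 * x] = (a + c) / 2            # O[2x,2y+1]
            out[2 * y + 1][2 * x + 1] = (a + b + c + d) / 4  # O[2x+1,2y+1]
    return out


for row in upsample_bilinear([[0, 2], [4, 6]]):
    print(row)
```

Running integer-pel block matching on the interpolated reference frame, with the current frame sampled at every second position, gives motion vectors with half-pel accuracy.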

06/25/21 89
Hierarchical Strategy for
Half-pel BMA

Integer-pel

Half-pel

06/25/21 90
Generalizations of BMA
 Variable block-size matching algorithms
 Widely used by various video coding standards
 H.264 includes three variable block sizes: 4-
by-4, 8-by-8 and 16-by-16
 Fractional-pel accuracy BMA
 Half-pel : MPEG-1/2/4, H.263/H.263+
 Quarter-pel: H.264 (even 1/8-pel)
 Tradeoff between overhead on motion and
MCP efficiency

06/25/21 91
Variable Block-size BMA

16-by-16 8-by-8 4-by-4

06/25/21 92
BMA Strategy Adopted by H.263

16-by-16 8-by-8

Macroblock level Block level

06/25/21 93
BMA Strategy Adopted by H.264

16-by-16 8-by-16 16-by-8 8-by-8

8-by-8 4-by-8 8-by-4 4-by-4

Note: require overhead to signal which partition is adopted by the encoder


06/25/21 94
Deformable Block Matching Algorithm

06/25/21 95
Overview of DBMA
 Three steps:
 Partition the anchor frame into regular blocks
 Model the motion in each block by a more
complex motion
 The 2-D motion caused by a flat surface patch
undergoing rigid 3-D motion can be approximated well
by projective mapping
 Projective Mapping can be approximated by affine
mapping and bilinear mapping
 Estimate the motion parameters block by block
independently
 The discontinuity problem across block boundaries still
remains
06/25/21 96
Affine and Bilinear Model
 Affine (6 parameters):
 Good for mapping triangles to triangles

\[ \begin{bmatrix} d_x(x, y) \\ d_y(x, y) \end{bmatrix} = \begin{bmatrix} a_0 + a_1 x + a_2 y \\ b_0 + b_1 x + b_2 y \end{bmatrix} \]

 Bilinear (8 parameters):
 Good for mapping blocks to quadrangles

\[ \begin{bmatrix} d_x(x, y) \\ d_y(x, y) \end{bmatrix} = \begin{bmatrix} a_0 + a_1 x + a_2 y + a_3 xy \\ b_0 + b_1 x + b_2 y + b_3 xy \end{bmatrix} \]
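
Both models can be evaluated with one short function each. This is a minimal sketch in which the parameter tuples `a` and `b` hold the coefficients a0, a1, ... and b0, b1, ... of the two models above; the function names are hypothetical.

```python
def affine_displacement(x, y, a, b):
    """Affine model: d_x = a0 + a1*x + a2*y, d_y = b0 + b1*x + b2*y."""
    return (a[0] + a[1] * x + a[2] * y,
            b[0] + b[1] * x + b[2] * y)


def bilinear_displacement(x, y, a, b):
    """Bilinear model: the affine terms plus a3*x*y and b3*x*y."""
    return (a[0] + a[1] * x + a[2] * y + a[3] * x * y,
            b[0] + b[1] * x + b[2] * y + b[3] * x * y)


# Pure translation is the special case where only a0 and b0 are non-zero.
print(affine_displacement(2, 3, (1.5, 0, 0), (-2, 0, 0)))
# A non-zero a1 makes the horizontal displacement grow with x (horizontal stretch).
print(affine_displacement(2, 3, (1, 0.5, 0), (0, 0, -1)))
```

The 6 affine parameters are fixed by the displacements of 3 points, and the 8 bilinear parameters by the displacements of 4 points, which is consistent with the triangle and quadrangle mappings noted above.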

06/25/21 97
Mesh Based Estimation
The computation of a motion vector is affected by the
neighboring vectors.
Step 1: The current frame is divided into picture elements
(which may be any polygon) such that a mesh or control
grid is formed.
Step 2: The nodes of the mesh are then searched for in the
previous reference frame.
Step 3: Once the displacement vectors of the nodes of a picture
element are known, the displacement vectors of the rest
of its pixels are obtained by interpolating the known motion
vectors.
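
Step 3 can be sketched for the simplest case. The example below is an assumption-laden illustration rather than the lecture's method: it takes an axis-aligned rectangular element and interpolates the four nodal vectors with bilinear weights; a triangular mesh would use barycentric weights instead. `interpolate_mv` is a hypothetical name.

```python
def interpolate_mv(x, y, x0, y0, x1, y1, mv00, mv10, mv01, mv11):
    """Displacement at (x, y) inside the rectangle [x0, x1] x [y0, y1].

    mv00, mv10, mv01, mv11 are the (dx, dy) vectors at the nodes
    (x0, y0), (x1, y0), (x0, y1), (x1, y1).
    """
    s = (x - x0) / (x1 - x0)  # horizontal position within the element, 0..1
    t = (y - y0) / (y1 - y0)  # vertical position within the element, 0..1
    weights = ((1 - s) * (1 - t), s * (1 - t), (1 - s) * t, s * t)
    nodes = (mv00, mv10, mv01, mv11)
    dx = sum(wt * mv[0] for wt, mv in zip(weights, nodes))
    dy = sum(wt * mv[1] for wt, mv in zip(weights, nodes))
    return dx, dy


# Nodal vectors of one 4x4 element; the field varies smoothly between them.
corners = ((1, 0), (3, 0), (1, 2), (3, 2))
print(interpolate_mv(0, 0, 0, 0, 4, 4, *corners))  # at a node
print(interpolate_mv(2, 2, 0, 0, 4, 4, *corners))  # at the element center
```

Because neighboring elements share their nodes, the interpolated motion field is continuous across element boundaries, unlike the block-wise constant field of BMA.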

06/25/21 98
Node Search Technique
1. Hierarchical mesh based matching algorithm (HMMA).
2. Hierarchical block based matching algorithm (HBMA).

In HMMA the corners of blocks are taken as nodes, while in
HBMA the centers of blocks are taken as nodes.
In terms of PSNR, the coding gain of HMMA is not significant.
In terms of prediction accuracy, however, mesh-based models tend to
give visually more pleasing predictions, especially in the presence of
non-translational motions such as rotation and turning.
By using HBMA, we can exploit the lower-complexity
advantage of BMAs in mesh-based models as well.
06/25/21 99
Mesh Based Estimation vs. BMAs
ADVANTAGES:
Mesh-based models in general give a more continuous motion field
than BMAs.
In terms of prediction accuracy, mesh-based models can therefore give
visually more pleasing predictions, especially in the presence of
non-translational motions such as head rotation and turning.

DISADVANTAGES:
In terms of computational complexity, BMAs
have an edge over mesh-based ME.

06/25/21 100
Mesh-Based Motion Estimation
A control grid is used to partition a
frame into non-overlapping polygon
elements. The nodal motion is
constrained so that a feasible mesh
is still formed with the motion.

(a) Using a triangular mesh

(b) Using a quadrilateral mesh

06/25/21 101
Mesh-based vs Block-based
(a) block-based ME

(b) mesh-based ME

(c) mesh-based motion tracking

06/25/21 102
Example: BMA vs. Mesh-based
[Figure panels: anchor frame; target frame; predicted frame by EBMA (half-pel), 29.86 dB; predicted frame by mesh-based method, 29.72 dB]

06/25/21 103
Experiment

[Figure panels: Frame #1, Frame #2]

06/25/21 104
