Lecture 20: Video Coding
Sudipta Mahapatra
Video Coding
Resources:
1. Introduction to Multimedia Systems and Processing, NPTEL.
2. H.264 and MPEG-4 Video Compression: Video Coding for
Next-generation Multimedia by I. E. Richardson, Wiley, 2003.
3. Introduction to Data Compression by K. Sayood, Morgan
Kaufmann, 2006.
Introduction
H.261
• Introduced as a standard for teleconferencing
applications.
• Assumes one of two formats: CIF, QCIF.
• Divide an input image (frame) into 8×8 blocks.
• For a given block, subtract the prediction generated by
using the previous frame (might be zero).
• Apply the DCT to the prediction error; quantize the DCT
coefficients; apply a VLC to the quantization labels (see the
sketch below).
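A minimal Python sketch of this per-block pipeline is given below; the DCT basis is standard, but the quantizer step size and the random block data are illustrative assumptions, not values taken from H.261.

import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix (row k, column i).
    c = np.sqrt(2.0 / n) * np.cos(np.pi * np.arange(n)[:, None]
                                  * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

C = dct_matrix(8)

current = np.random.randint(0, 256, (8, 8)).astype(float)    # 8x8 block of the input frame
predicted = np.random.randint(0, 256, (8, 8)).astype(float)  # prediction from the previous frame (may be zero)

error = current - predicted                   # prediction error
coeffs = C @ error @ C.T                      # 2-D DCT of the error block
step = 16.0                                   # illustrative quantizer step size
labels = np.round(coeffs / step).astype(int)  # quantization labels, passed on to the VLC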
Hybrid Video Encoder
[Figure: block diagram of a hybrid video encoder]
Hybrid Video Decoder
[Figure: block diagram of a hybrid video decoder]
Operation
• The current frame is temporally predicted from the
previous one(s).
• The temporal prediction is based on the assumption
that consecutive frames in a video sequence are very
similar, except that objects, or parts of a frame, may be
somewhat displaced in position.
• This assumption is mostly valid, except for frames whose
content changes significantly (e.g., at scene cuts).
Operation (Contd.)
• The predicted frame, generated by exploiting temporal
redundancy, is subtracted from the incoming video frame
pixel by pixel; the difference is the error image.
• The error image will, in general, exhibit considerable spatial
redundancy, so it is passed through a transform block: the DCT
for MPEG-1, MPEG-2, and the ITU-T standards H.261, H.263, etc.
• The ITU-T standard H.264 uses an integer DCT, and
MPEG-4 supports wavelet transforms.
Operation (Contd.)
• The transformed coefficients are quantized
and entropy coded before being added to
the bit stream.
• The encoder has a built-in decoder that
reconstructs the error frame; the reconstruction
is not exact because of the quantizer.
• The reconstructed error frame is added to the
predicted frame to rebuild the current frame, which
is stored as the reference for predicting the next
frame, as the sketch below shows.
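A minimal sketch of this closed loop, assuming zero-motion prediction and a plain scalar quantizer (the transform and the entropy coder are omitted); the input sequence 'frames' is a stand-in for real video.

import numpy as np

step = 8.0                                     # illustrative quantizer step size

def quantize(x):
    return np.round(x / step)                  # labels that would be entropy coded

def dequantize(labels):
    return labels * step                       # lossy reconstruction of the error

frames = [np.full((16, 16), v) for v in (100.0, 104.0, 108.0)]  # stand-in input

reference = np.zeros((16, 16))                 # decoded previous frame
for frame in frames:
    predicted = reference                      # temporal prediction (no motion here)
    error = frame - predicted
    labels = quantize(error)                   # -> entropy coder -> bit stream
    reconstructed_error = dequantize(labels)   # what the decoder also computes
    reference = predicted + reconstructed_error  # reference for the next frame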
Operation (Contd.)
• The motion estimation block determines the
displacement between the current frame and
the stored frame.
• The displacements so computed are applied
to the stored frame in the motion
compensation unit to generate the predicted
frame.
Intra-coded and inter-coded frames
• Video coding exploits two types of
redundancies: Intra-frame and Inter-frame
redundancy.
• The first frame is always intra-coded, i.e., like a
still image.
• The first frame is not the only one to be intra-
coded; intra-coded frames are introduced at
regular intervals to keep prediction error from
accumulating over frames, as sketched below.
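A small sketch of periodic intra refresh; the intra period of 12 frames is an illustrative assumption, not a value mandated by any standard.

INTRA_PERIOD = 12  # illustrative value

def frame_type(frame_index):
    # Intra-code ('I') the first frame and every INTRA_PERIOD-th frame after it.
    return "I" if frame_index % INTRA_PERIOD == 0 else "P"

print([frame_type(i) for i in range(14)])  # 'I' at indices 0 and 12, 'P' elsewhere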
Motion estimation and motion
compensation
• These are the two additional blocks used by a video
coder.
• The motion estimation block computes the
displacement between the current frame and a stored
past frame that is used as the reference.
• Usually, the immediate past frame serves as the
reference, though more recent video coding
standards, such as H.264, offer flexibility in
selecting the reference frames.
Motion Estimation
• Consider a pixel of the current frame, together with its
neighbourhood, as the candidate, and determine its best
matching position in the reference frame.
• The difference in position between the candidate and its match
in the reference frame is defined as the displacement vector or,
more commonly, the motion vector.
Translational Motion Estimation
• Let s(n1, n2, k) denote the intensity of the pixel at spatial
coordinate (n1, n2) on the integer grid of frame k.
• Thus, (n1, n2, k) ∈ Λ³, the 3-D integer lattice. We choose
k as the current frame, and a past frame (k − l) with l > 0 as
the reference frame.
• If (d1, d2) is the motion vector corresponding to the position
(n1, n2), the translational model assumes that

s(n1, n2, k) = s(n1 − d1, n2 − d2, k − l)
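A small numerical check of the model: shift a frame by (d1, d2) and compare samples. np.roll is used only so the toy example needs no border handling; the wrap-around at the edges is an artefact of the toy, not of the model.

import numpy as np

d1, d2 = 3, -2                                   # motion vector
previous = np.random.randint(0, 256, (32, 32))   # frame k - l
current = np.roll(previous, shift=(d1, d2), axis=(0, 1))  # frame k, displaced by (d1, d2)

n1, n2 = 10, 10
assert current[n1, n2] == previous[(n1 - d1) % 32, (n2 - d2) % 32]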
Backward Motion Estimation
• Backward motion estimation requires l in the motion
model above to be greater than zero, i.e., the reference
frame is a past frame.
Forward Motion Estimation
• Forward motion estimation requires l to be smaller
than zero, i.e., the reference frame is a future frame. This
is useful when future frames are buffered and later used
to predict past frames.
• The MPEG-1 and MPEG-2 standards support both
forward and backward motion estimation.
Motion Compensation
The motion compensation unit applies the
displacements to the reference frame to predict the
current frame, as sketched below.
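A minimal sketch, assuming one motion vector per 8×8 block and vectors that keep every displaced block inside the frame; 'motion_vectors' stands in for the output of the motion estimation unit.

import numpy as np

def motion_compensate(reference, motion_vectors, block=8):
    # Build the predicted frame by copying displaced blocks from the reference.
    h, w = reference.shape
    predicted = np.zeros_like(reference)
    for y in range(0, h, block):
        for x in range(0, w, block):
            dy, dx = motion_vectors[(y, x)]   # vector for the block at (y, x)
            predicted[y:y + block, x:x + block] = \
                reference[y + dy:y + dy + block, x + dx:x + dx + block]
    return predicted

reference = np.arange(64.0).reshape(8, 8)
mvs = {(0, 0): (0, 0)}                        # a single 8x8 block with zero motion
predicted = motion_compensate(reference, mvs)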
Basic Approaches to
Motion Estimation
1. Pixel-based motion estimation
Tries to determine a motion vector for every pixel in the image;
referred to as the optical flow method, it rests on the
fundamental assumption that the intensity of a pixel remains
constant when it is displaced.
2. Block-based motion estimation
The candidate frame is divided into non-overlapping blocks (of
size 16×16, 8×8, or even 4×4 pixels in the recent standards).
Pixel Based ME
• However, no unique match for a pixel in the
reference frame is found in the direction normal to
the intensity gradient.
• Due to this, an additional constraint is introduced
in terms of the smoothness of velocity (or
displacement) vectors in the neighbourhood.
• The smoothness constraint makes the algorithm
iterative and demands excessively long computation
times, making it unsuitable for practical, real-time
implementation; a sketch of the underlying gradient
constraint follows below.
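The sketch below makes the point concrete: brightness constancy gives one linear constraint per pixel, Ix·u + Iy·v + It = 0, which cannot fix the motion component normal to the gradient. Pooling the constraint over a small window and solving by least squares (the Lucas-Kanade approach, shown here as one alternative to the global smoothness constraint) resolves the ambiguity wherever the window contains gradients in two directions.

import numpy as np

def flow_at(frame0, frame1, y, x, half=2):
    # Estimate the flow (u, v) at pixel (y, x) from two consecutive frames.
    Iy, Ix = np.gradient(frame0.astype(float))        # spatial gradients
    It = frame1.astype(float) - frame0.astype(float)  # temporal gradient
    win = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
    b = -It[win].ravel()
    (u, v), *rest = np.linalg.lstsq(A, b, rcond=None)  # least-squares solution
    return u, v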
Optical flow method
• Changes between video frames may be caused by:
1. Object motion (rigid object motion, e.g., a moving car, and
deformable object motion, for example a moving arm);
2. Camera motion (panning, tilt, zoom, rotation);
3. Uncovered regions (e.g., a portion of the scene background
uncovered by a moving object);
4. Lighting changes.
• With the exception of uncovered regions and lighting changes,
these differences correspond to pixel movements between
frames.
• It is possible to estimate the trajectory of each pixel between
successive video frames, producing a field of pixel trajectories
known as the optical flow.
Optical flow (Example)
[Figure: Frame 1, Frame 2, and their difference image]
Block based ME and MC
• Compare the M × N block in the current frame with some or all
of the possible M × N regions in the search area (usually a
region centred on the current block position);
• Find the region that gives the best match; this is the block that
gives the minimum residual energy (obtained by subtracting the
candidate region from the current M × N block);
• Finding the best match is known as motion estimation.
• The chosen candidate region becomes the predictor for the
current M×N block; this is subtracted from the current block to
form a residual M×N block (motion compensation).
• The residual block is entropy coded and transmitted, along
with the offset between the current block and the position of the
candidate block (the motion vector); a full-search sketch
follows below.
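A minimal full-search sketch of these steps, scoring candidates with the sum of absolute differences (SAD); the 16×16 block size and the ±7-sample search window are illustrative choices.

import numpy as np

def block_match(current, reference, y, x, block=16, search=7):
    # Full search around (y, x): return the best motion vector, the chosen
    # predictor region, and the residual block.
    cur = current[y:y + block, x:x + block].astype(int)
    best_mv, best_pred, best_sad = None, None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + block > reference.shape[0] \
                    or rx + block > reference.shape[1]:
                continue                      # candidate falls outside the frame
            cand = reference[ry:ry + block, rx:rx + block].astype(int)
            sad = np.abs(cur - cand).sum()    # residual energy measure
            if best_sad is None or sad < best_sad:
                best_mv, best_pred, best_sad = (dy, dx), cand, sad
    return best_mv, best_pred, cur - best_pred

current = np.random.randint(0, 256, (64, 64))
reference = np.roll(current, shift=(2, -3), axis=(0, 1))  # shifted copy
mv, pred, resid = block_match(current, reference, 16, 16)
print(mv, np.abs(resid).sum())                # expect (2, -3) and SAD 0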
Block Based ME
• For each candidate block, the best motion vector is determined
in the reference frame.
• A single motion vector is computed for the entire block; we
make an inherent assumption that the entire block undergoes
translational motion.
• This assumption is reasonably valid except at object
boundaries, and it holds better for smaller blocks.
• Block-based ME is amenable to hardware implementation,
facilitating real-time ME.
Block based ME
[Figure: matching a candidate block of the current frame against the search area in the reference frame]
Important observations
• When the motion compensation block size is
reduced, the residual energy is also reduced.
• But a smaller block size increases complexity, as more
search operations are needed; it also increases the number
of motion vectors that must be transmitted.
• Better performance may be achieved by adapting
the block size to the picture characteristics: use a
bigger block size in homogeneous regions and a
smaller block size in areas of significant detail and
complex motion.
Advantages and Drawbacks
• Block-based ME is relatively straightforward and computationally
tractable; it fits well with rectangular video frames and with
block-based image transforms (e.g., the DCT), and it provides a
reasonably effective temporal model for many video sequences.
However,
• Real objects rarely have neat edges that match rectangular
boundaries; objects often move by a fractional number of pixel
positions between frames, and many types of object motion are
hard to compensate for using block-based methods (e.g.,
deformable objects, rotation and warping, complex motion such
as a cloud of smoke).
Motion Compensated
Prediction of a Macroblock
• The macroblock, corresponding to a 16×16-pixel region of
a frame, is the basic unit for motion compensated
prediction in a number of important visual coding
standards including MPEG-1, MPEG-2, MPEG-4 Visual,
H.261, H.263 and H.264.
• For source video material in 4:2:0 format, a macroblock is
organised as described below.
Motion Compensated
Prediction of a Macroblock
• A 16 × 16-pixel block (region) in the source frame is
represented by 256 luminance samples (arranged in four
8 × 8-sample blocks), 64 blue chrominance samples (one
8 × 8-sample block), and 64 red chrominance samples (one
8 × 8-sample block).
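A small sketch of pulling one macroblock out of a planar 4:2:0 frame: 16 × 16 luma samples plus one 8 × 8 block each of Cb and Cr, since the chroma planes have half the luma resolution in both dimensions. The CIF plane shapes are illustrative.

import numpy as np

h, w = 288, 352                                  # CIF luma resolution
Y = np.zeros((h, w), dtype=np.uint8)             # luminance plane
Cb = np.zeros((h // 2, w // 2), dtype=np.uint8)  # blue chrominance plane
Cr = np.zeros((h // 2, w // 2), dtype=np.uint8)  # red chrominance plane

def macroblock(mb_row, mb_col):
    y0, x0 = 16 * mb_row, 16 * mb_col
    luma = Y[y0:y0 + 16, x0:x0 + 16]                    # 256 luminance samples
    cb = Cb[y0 // 2:y0 // 2 + 8, x0 // 2:x0 // 2 + 8]   # 64 Cb samples
    cr = Cr[y0 // 2:y0 // 2 + 8, x0 // 2:x0 // 2 + 8]   # 64 Cr samples
    return luma, cb, cr

luma, cb, cr = macroblock(0, 0)
print(luma.shape, cb.shape, cr.shape)            # (16, 16) (8, 8) (8, 8)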
Reading Assignment
• Go through Chapter 3 of I. E. Richardson's book on
H.264 and MPEG-4 video compression (Resource 2 above).
What you should be able to do at this point
1. Name at least five major applications of video
compression and coding.
2. Present the block-diagram of a hybrid video codec.
3. Define intra-coded and inter-coded frames.
4. Explain the role of motion estimation in video codecs.
5. Explain the role of motion compensation in video
codecs.
6. Define the translational model of motion estimation.
7. Define backward motion estimation.
8. Name two distinct approaches to motion estimation.
Assignment