Lecture 20: Video Coding

The document discusses video coding and compression, emphasizing the importance of reducing data size for applications such as video conferencing and streaming. It outlines video formats, standards, and the processes of motion estimation and compensation, which are crucial for efficient video encoding. It also highlights the distinction between intra-coded and inter-coded frames and the significance of block-based motion estimation in video coding standards such as H.264 and MPEG-4.

Multimedia Systems and Applications

Sudipta Mahapatra

Video Coding

Resources:
1. Introduction to Multimedia Systems and Processing, NPTEL.
2. H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia by I. E. Richardson, Wiley, 2003.
3. Introduction to Data Compression by K. Sayood, Morgan Kaufmann, 2006.
Introduction

Of the various data sources, video generates by far the largest amount of data. Compression is therefore mandatory for the viability of a number of applications, including the following:
 conversational video, such as video telephony;
 video conferencing over wired and wireless media;
 streaming video, such as video on demand;
 digital TV/HDTV broadcasting;
 image/video database services;
 CD/DVD storage.
Introduction (Contd.)
For compressing video, in addition to spatial redundancy, one
can exploit temporal redundancy between successive video
frames.
 Composition of video signal:
For displaying colour images we need the video signal to have
three components: a red component (R), a green component
(G) and a blue component (B).
Instead of the three colour components, a composite signal may be transmitted (to maintain compatibility with black-and-white systems). This consists of a luminance component (Y) and two colour-difference (chrominance) components, Cb and Cr, such that

Y = 0.299R + 0.587G + 0.114B
Cb = B − Y
Cr = R − Y
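A minimal numpy sketch of this conversion; the unscaled colour differences follow the equations above (broadcast standards additionally scale and offset Cb and Cr):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Split an RGB image (H x W x 3, float) into luminance Y and the
    two colour-difference components, per the equations above."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    Y = 0.299 * R + 0.587 * G + 0.114 * B  # luminance
    Cb = B - Y                             # blue colour difference
    Cr = R - Y                             # red colour difference
    return Y, Cb, Cr
```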
Video Signal Format
For different applications we use different formats to represent
video signal.
CIF: The luminance is represented by an array of size 288×352; the chrominance components are represented by two arrays, each of size 144×176.
QCIF: In the Quarter CIF format we have half the number of pixels in both the rows and the columns.
MPEG-SIF: Luminance 360×240; chrominance 180×120.
 Note: Go through Chapter 2 (Video Formats and Quality) of Ian Richardson's book, H.264 and MPEG-4 Video Compression.
Sampling Rate
• The CCIR (International Radio Consultative Committee) recommended base sampling frequency is 3.375 MHz. Multiples of this frequency make the samples of each line align vertically, producing a rectangular array of pixels, i.e., an image.
• The sampling rate of each component is a multiple (up to 4) of the base sampling frequency.
• The sampling rate is represented as a triple of integers (e.g., 4:2:2); the first corresponds to the sampling of the luminance (Y) component and the remaining two correspond to the chrominance components (Cb and Cr).
Note: The Cb and Cr components may be represented with a lower resolution than Y because the human visual system (HVS) is less sensitive to colour than to luminance.
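A minimal sketch of reducing a chrominance plane to half resolution in each dimension; the 2×2 averaging filter is one illustrative choice of anti-alias filter:

```python
import numpy as np

def subsample_chroma(c):
    """Halve a chrominance plane in both dimensions by averaging
    each 2x2 neighbourhood of samples."""
    h, w = c.shape
    c = c[:h - h % 2, :w - w % 2]  # trim to even dimensions
    return 0.25 * (c[0::2, 0::2] + c[0::2, 1::2] +
                   c[1::2, 0::2] + c[1::2, 1::2])
```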
Standards

Organization | Video-coding standard | Typical bit-rate range | Typical application
ITU-T | H.261 | p×64 Kb/s, p = 1, 2, ..., 30 | ISDN video phone
ISO | MPEG-1 | 1.2 Mb/s | CD-ROM
ISO | MPEG-2 | 4-80 Mb/s | HDTV
ITU-T | H.263 | 64 Kb/s or below | PSTN video phone
ISO | MPEG-4 | 24-1024 Kb/s | Interactive audio/video
ITU-T | H.263 v2 | <64 Kb/s | PSTN video phone
ITU-T | H.26L | <64 Kb/s | Network-friendly packet-based video
ITU-T | H.264 | Low to very high | Low bit-rate streaming to HDTV broadcast
Video Coding Standards

[Figure omitted]
H.261
• Introduced as a standard for teleconferencing
applications.
• Assumes one of two formats: CIF, QCIF.
• Divide an input image (frame) into 8×8 blocks.
• For a given block, subtract the prediction generated by
using the previous frame (might be zero).
• Apply DCT to the prediction error; quantize the DCT
coefficients; apply a VLC to the quantization labels.
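A hedged sketch of this per-block pipeline using scipy's 2-D DCT; the uniform quantizer step size is an illustrative parameter, and the final VLC (entropy-coding) stage is omitted:

```python
import numpy as np
from scipy.fft import dctn, idctn

def code_block(block, prediction, q_step=16):
    """Transform-code one 8x8 block against its prediction.
    Returns the quantization labels and the block as the decoder
    will reconstruct it."""
    residual = block - prediction                   # prediction error
    coeffs = dctn(residual, norm='ortho')           # 2-D DCT
    labels = np.rint(coeffs / q_step).astype(int)   # uniform quantizer
    recon = idctn(labels * q_step, norm='ortho')    # inverse DCT
    return labels, prediction + recon
```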

Hybrid Video Encoder

[Block diagram omitted]
Hybrid Video Decoder

[Block diagram omitted]
Operation
• The current frame is temporally predicted from the
previous one(s).
• The temporal prediction is based on the assumption that consecutive frames in a video sequence are very similar, except that objects, or parts of a frame, may be somewhat displaced in position.
• This assumption is mostly valid, except for frames with significant changes of content.

Operation (Contd.)
• The predicted frame, generated by exploiting temporal redundancy, is subtracted from the incoming video frame pixel by pixel; the difference is the error image.
• The error image will, in general, still exhibit considerable spatial redundancy, so it goes through a transform block: the DCT for MPEG-1, MPEG-2 and the ITU-T standards H.261, H.263, etc.
• The ITU-T standard H.264 uses an integer DCT, and MPEG-4 supports wavelet transforms.

Operation (Contd.)
• The transform coefficients are quantized and entropy coded before being added to the bit stream.
• The encoder has a built-in decoder that reconstructs the error frame; the reconstruction is not exact because of the quantizer.
• The reconstructed error frame is added to the predicted frame to form the reconstructed current frame, which serves as the reference for predicting the next frame.
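The role of the built-in decoder can be made concrete in a few lines: the encoder forms its reference from the reconstructed frame, never the original, so encoder and decoder stay in step. A schematic sketch in which `quantize`, `dequantize` and `predict` are placeholders for the blocks described above:

```python
def encode_sequence(frames, quantize, dequantize, predict):
    """Schematic hybrid-coding loop; `quantize`, `dequantize` and
    `predict` stand in for the transform/quantizer stages and the
    motion-compensated predictor."""
    reference = None
    for frame in frames:
        # Motion estimation needs the incoming frame as well.
        predicted = predict(reference, frame) if reference is not None else 0
        labels = quantize(frame - predicted)  # code the error image
        # Built-in decoder: reconstruct exactly as the decoder will,
        # so the prediction references never drift apart.
        reference = predicted + dequantize(labels)
        yield labels
```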

Operation (Contd.)
• The motion estimation block determines the
displacement between the current frame and
the stored frame.
• The displacements so computed are applied to the stored frame in the motion compensation unit to generate the predicted frame.

Intra-coded and inter-coded frames
• Video coding exploits two types of
redundancies: Intra-frame and Inter-frame
redundancy.
• The first frame is always intra-coded, i.e., like a
still image.
• The first frame is not the only one to be intra-
coded; intra-coded frames are periodically
introduced at regular intervals to prevent
accumulation of prediction error over frames.
Motion estimation and motion
compensation
• These are the two additional blocks used by a video
coder.
• The motion estimation block computes the
displacement between the current frame and a stored
past frame that is used as the reference.
• Usually, the immediately past frame is taken as the reference, though more recent video coding standards, such as H.264, offer flexibility in selecting the reference frames.

Motion Estimation
• Consider a pixel belonging to the current frame, together with its neighbourhood, as the candidate, and determine its best matching position in the reference frame.
• The difference in position between the candidate and its match in the reference frame is defined as the displacement vector or, more commonly, the motion vector.

Translational Motion Estimation
• Let s(n1, n2, k) denote the intensity of the pixel at spatial coordinate (n1, n2) on the integer grid of frame k.
• Thus (n1, n2, k) ∈ Λ³, i.e., the 3-D integer space. We choose k as the current frame and a past frame (k − l), with l > 0, as the reference frame.
• If (d1, d2) is the motion vector corresponding to the position (n1, n2), the translational model assumes that

s(n1, n2, k) = s(n1 − d1, n2 − d2, k − l)
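A small numpy check of the model: circularly shifting a reference frame by (d1, d2) yields a current frame that satisfies the equation at every pixel. The random frame contents and np.roll boundary handling are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
ref = rng.random((64, 64))        # reference frame, index k - l
d1, d2 = 3, -2                    # an assumed motion vector
cur = np.roll(ref, shift=(d1, d2), axis=(0, 1))  # current frame k

# Translational model: s(n1, n2, k) = s(n1 - d1, n2 - d2, k - l)
n1, n2 = 10, 20
assert cur[n1, n2] == ref[n1 - d1, n2 - d2]
```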
Backward Motion Estimation
• Backward motion estimation requires l in the motion estimation equation to be greater than zero, i.e., the reference is a past frame.

Forward Motion Estimation
• Forward motion estimation requires l to be smaller than zero, i.e., the reference is a future frame. This is useful when future frames are buffered and used to predict earlier frames.
• MPEG-1 and MPEG-2 standards support both
forward and backward motion estimation.

Motion Compensation
The motion compensation unit applies the
displacements to the reference frame to predict the
current frame.

Basic Approaches to
Motion Estimation
1. Pixel-based motion estimation
Try to determine motion vectors for every pixel in the image; referred to as the optical flow method, this works on the fundamental assumption that the intensity of a pixel remains constant when it is displaced.
2. Block-based motion estimation
The candidate frame is divided into non-overlapping blocks (of size 16×16, 8×8, or even 4×4 pixels in the recent standards).

Pixel Based ME
• However, no unique match for a pixel in the reference frame can be found in the direction normal to the intensity gradient (the aperture problem).
• An additional constraint is therefore introduced in terms of the smoothness of the velocity (or displacement) vectors in the neighbourhood.
• The smoothness constraint makes the algorithm iterative and requires excessively long computation times, making it unsuitable for practical, real-time implementation.
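Linearising the constancy assumption gives the classical optical-flow constraint Ix·u + Iy·v + It = 0, a single equation in the two unknowns (u, v), which is exactly why the extra smoothness constraint is needed. A minimal sketch of the three gradients it uses; the finite-difference scheme is an illustrative choice:

```python
import numpy as np

def flow_gradients(frame_a, frame_b):
    """Intensity gradients for the optical-flow constraint
    Ix*u + Iy*v + It = 0, from two consecutive frames."""
    Iy, Ix = np.gradient(frame_a)  # axis 0 = rows (y), axis 1 = cols (x)
    It = frame_b - frame_a         # temporal difference
    return Ix, Iy, It
```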

Optical flow method
• Changes between video frames may be caused by:
1. Object motion (rigid object motion, e.g., a moving car, and
deformable object motion, for example a moving arm);
2. Camera motion (panning, tilt, zoom, rotation);
3. Uncovered regions (e.g., a portion of the scene background
uncovered by a moving object);
4. Lighting changes.
• With the exception of uncovered regions and lighting changes,
these differences correspond to pixel movements between
frames.
• It is possible to estimate the trajectory of each pixel between successive video frames, producing a field of pixel trajectories known as the optical flow.
Optical flow (Example)

[Figures omitted: Frame 1, Frame 2, and their difference]
Block based ME and MC
• Compare the M × N block in the current frame with some or all
of the possible M × N regions in the search area (usually a
region centred on the current block position);
• Find the region that gives the best match; this is the block that
gives the minimum residual energy (obtained by subtracting the
candidate region from the current M × N block);
• Finding the best match is known as motion estimation.
• The chosen candidate region becomes the predictor for the
current M×N block; this is subtracted from the current block to
form a residual M×N block (motion compensation).
• The residual block is transmitted after entropy coding along
with the offset between the current block and the position of the
candidate block (motion vector).
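A minimal full-search sketch of the matching step just described, scoring candidates by the sum of absolute differences (SAD); the block size and search range are illustrative parameters:

```python
import numpy as np

def full_search(cur, ref, top, left, N=16, search=7):
    """Motion-estimate the N x N block of `cur` at (top, left) by
    exhaustively testing all candidate positions in `ref` within
    +/- `search` pixels, minimising the SAD residual measure."""
    block = cur[top:top + N, left:left + N]
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + N > ref.shape[0] or x + N > ref.shape[1]:
                continue  # candidate region falls outside the frame
            sad = np.abs(block - ref[y:y + N, x:x + N]).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```

The matched region then serves as the predictor, and the current block minus that region is the residual block that is transform coded.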

Block Based ME
• For each candidate block, the best motion vector is determined
in the reference frame.
• A single motion vector is computed for the entire block; we
make an inherent assumption that the entire block undergoes
translational motion.
• This assumption is reasonably valid, except for the object
boundaries and smaller blocks.
• Block-based ME is amenable to hardware implementation, facilitating real-time ME.

Important observations
• When the motion compensation block size is
reduced, the residual energy is also reduced.
• But, a smaller block size increases the complexity
as more search operations are needed;
• Also, there is an increase in the number of motion
vectors that are to be transmitted.
• Better performance may be achieved by adapting
the block size to the picture characteristics: use a
bigger block size in homogeneous regions and a
smaller block size in areas of significant detail and
complex motion.
Advantages and Drawbacks
• Block-based ME is relatively straightforward and computationally tractable; it fits well with rectangular video frames and with block-based image transforms (e.g., the DCT), and it provides a reasonably effective temporal model for many video sequences.
However,
• Real objects rarely have neat edges that match rectangular
boundaries; objects often move by a fractional number of pixel
positions between frames and many types of object motion are
hard to compensate for using block-based methods (e.g.
deformable objects, rotation and warping, complex motion such
as a cloud of smoke).

Motion Compensated
Prediction of a Macroblock
• The macroblock, corresponding to a 16×16-pixel region of
a frame, is the basic unit for motion compensated
prediction in a number of important visual coding
standards including MPEG-1, MPEG-2, MPEG-4 Visual,
H.261, H.263 and H.264.
• For source video material in 4:2:0 format, a macroblock is organised as shown below:

[Figure omitted: 4:2:0 macroblock structure]
Motion Compensated
Prediction of a Macroblock
• A 16×16-pixel block (region) in the source frame is represented by 256 luminance samples (arranged in four 8×8-sample blocks), 64 blue chrominance samples (one 8×8 block) and 64 red chrominance samples (one 8×8 block), giving a total of six 8×8 blocks.
• An MPEG-4 Visual or H.264 CODEC processes each
video frame in units of a macroblock.
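A small sketch of gathering the six 8×8 blocks of a 4:2:0 macroblock from the three sample planes; the plane and variable names are illustrative:

```python
def macroblock_blocks(Y, Cb, Cr, row, col):
    """Return the six 8x8 blocks of the 4:2:0 macroblock whose
    top-left luma sample is (row, col): four luma blocks covering
    the 16x16 region, plus one Cb and one Cr block."""
    luma = [Y[row + r:row + r + 8, col + c:col + c + 8]
            for r in (0, 8) for c in (0, 8)]               # 256 Y samples
    cb = Cb[row // 2:row // 2 + 8, col // 2:col // 2 + 8]  # 64 samples
    cr = Cr[row // 2:row // 2 + 8, col // 2:col // 2 + 8]  # 64 samples
    return luma + [cb, cr]
```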

Reading Assignment
• Go through Chapter 3 of I. Richardson's book, H.264 and MPEG-4 Video Compression.

What you should be able to do at this point
1. Name at least five major applications of video
compression and coding.
2. Present the block-diagram of a hybrid video codec.
3. Define intra-coded and inter-coded frames.
4. Explain the role of motion estimation in video codecs.
5. Explain the role of motion compensation in video
codecs.
6. Define the translational model of motion estimation.
7. Define backward motion estimation.
8. Name two distinct approaches to motion estimation.
Assignment

