
Unit-Iii: Audio & Video Coding

The document discusses video coding, highlighting its characteristics, the need for compression, and the principles behind video coding techniques. It explains various types of redundancies in video data, the importance of motion estimation, and the different types of frames used in video coding. Additionally, it covers digital video formats, chrominance sub-sampling, and the chronological development of video coding standards.

Uploaded by

ritnainverma123

UNIT-III:

AUDIO & VIDEO CODING

PART-II
VIDEO CODING

ELE-4410: Multimedia Systems & Networks 1


Video = Motion Picture
◼ Frame by frame => image sequence
◼ An image sequence (or video) is a series of 2-D images that are sequentially
ordered in time (a 3-D digital signal).
◼ Video is a sequence of images captured/played at 25/30/60/100 frames/sec.

[Figure: image sequence, frames 0 to 3]


Characteristics of Video
◼ Adjacent frames are similar and changes are
due to object or camera motion



Characteristics of Video: Example
◼ Only the sun has changed position between these two
frames

[Figure: previous frame and current frame]


Need for Video Compression
◼ Uncompressed 1080p high definition (HD) video at 24
frames/second
❑ Pixels per frame: 1920x1080
❑ Bits per pixel: 8 bits x 3 (RGB)
❑ 1.5 hours: 806 GB
❑ Bit-rate: 1.2 Gbits/s

◼ Blu-ray Disc
❑ Capacity: 25 GB (single layer)
❑ Read rate: 36 Mbits/s
◼ Video Streaming or TV Broadcast
❑ 1 Mbits/s to 20 Mbits/s
◼ Requires 30x to 1200x compression
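The figures on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function name is made up, and GB is taken as 10^9 bytes.

```python
def raw_video_stats(width, height, bits_per_pixel, fps, seconds):
    """Return (bits per second, total bytes) for uncompressed video."""
    bits_per_second = width * height * bits_per_pixel * fps
    total_bytes = bits_per_second * seconds // 8
    return bits_per_second, total_bytes

# 1080p, 8 bits x 3 (RGB), 24 frames/second, 1.5 hours
bps, size = raw_video_stats(1920, 1080, 24, 24, 90 * 60)
print(f"bit-rate: {bps / 1e9:.2f} Gbit/s")       # about 1.2 Gbit/s
print(f"size:     {size / 1e9:.0f} GB")          # about 806 GB
print(f"ratio at 36 Mbit/s: {bps / 36e6:.0f}x")  # Blu-ray read rate
print(f"ratio at 1 Mbit/s:  {bps / 1e6:.0f}x")   # low-rate streaming
```

The two ratios, roughly 33x and 1200x, correspond to the 30x to 1200x range quoted above.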
Principles of video coding
◼ Compression is achieved by removing redundant and irrelevant
information from the video sequence
◼ Redundancies in videos:
❑ Spatial redundancy
◼ Neighbouring pixels inside a picture are similar.
❑ Statistical redundancy
◼ Unequal distribution of colour intensities.
◼ Some colours are more dominant than others.
❑ Temporal redundancy
◼ Similarity among the frames.

◼ Irrelevant information in videos:
❑ Minute colour and intensity differences which are imperceptible to the HVS
(psychovisual redundancy).
Image Vs Video Coding
◼ Image coding: uses Spatial and Statistical
redundancy reductions
◼ Video coding: uses Spatial, Statistical AND
Temporal redundancy reductions
[Figure: image coding chain (image data → DCT → Q → VLC), annotated with spatial and statistical redundancy reduction and perceptual redundancy reduction]


Inter-frame Video Coding
[Figure: inter-frame coder block diagram: in → subtractor → DCT → Q → VLC → out, with a feedback loop through IDCT, adder, buffer and ME; annotated with perceptual, spatial, statistical and temporal redundancy reduction]
Inter-frame Video Coding



Key ideas in Video Compression
◼ Predict a new frame from a previous frame and only code
the prediction error
◼ Prediction error will be coded using the Transform
methods (DCT or wavelet).
◼ Prediction errors have smaller energy than the original
pixel values and can be coded with fewer bits
◼ Those regions that cannot be predicted well will be coded
directly using Transform coding (DCT).
◼ Divide each frame (predicted/unpredicted) into smaller
blocks, called macroblocks (MBs).
◼ Work on each MB (16x16 pixels) independently for
reduced complexity
❑ Motion compensation done at the MB level.
❑ DCT coding of error at the block level (8x8 pixels).
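A toy illustration of why the prediction error is cheaper to code than the pixels themselves. The two small "frames" and the helper names are invented for this sketch; the previous frame is used directly as the prediction, with no motion compensation.

```python
def energy(frame):
    """Sum of squared sample values."""
    return sum(v * v for row in frame for v in row)

def difference(current, prediction):
    """Sample-by-sample prediction error."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, prediction)]

previous = [[100, 102, 104, 106],
            [110, 112, 114, 116],
            [120, 122, 124, 126]]
current = [[101, 102, 104, 106],
           [110, 112, 113, 116],
           [120, 122, 124, 128]]

residual = difference(current, previous)
print(energy(current))   # energy of the original pixel values
print(energy(residual))  # energy of the prediction error
```

The residual has only three non-zero samples, so its energy is tiny compared with that of the frame itself.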


Temporal Redundancies

[Figure: Frame 0, Frame 1 and their scaled frame difference]


Key Ideas in Video Coding
[Figure: two panels, "Transform Coding" and "Predictive Coding (DPCM)"]


Motion Compensated Hybrid Video Coding

[Figure: motion compensated hybrid video encoder block diagram, starting from the video input]


Hybrid Video Decoder



Chronological Table of Video Coding Standards
◼ ITU-T
❑ H.261 (1990)
❑ H.263 (1995/96)
❑ H.263+ (1997/98)
❑ H.263++ (2000)
◼ ISO/IEC MPEG
❑ MPEG-1 (1993)
❑ MPEG-4 v1 (1998/99)
❑ MPEG-4 v2 (1999/00)
❑ MPEG-4 v3 (2001)
◼ Joint ITU-T and ISO/IEC
❑ MPEG-2 (H.262) (1994/95)
❑ H.264 (MPEG-4 Part 10) (2002)
❑ H.264/SVC, MVC Extension (2006/2009)
❑ H.265/HEVC (Jan 2013)
❑ H.265/SHVC, MV-HEVC, 3D-HEVC (2016)

[Figure: timeline of the standards above, 1990 to 2016]


Application Scenario

Application                           Bit Rate                        Video Standard
UHDTV                                 50-100 Mbps                     HEVC
Digital TV broadcasting               2…6 Mbps (10…20 Mbps for HD)    MPEG-2, H.264/AVC
DVD video                             6…8 Mbps                        MPEG-2, H.264/AVC
Internet video streaming              20…200 kbps                     H.263, MPEG-4 or H.264
Video conferencing, video telephony   20-320 kbps                     H.261, H.263, H.264/AVC
Video over 3G wireless                20-200 kbps                     H.263, MPEG-4


Digital Video Formats
◼ Composite video
❑ Convert RGB to YIQ (YUV)
❑ Multiplexing YIQ into a single signal
❑ Used in most consumer analog video devices
◼ Each frame is divided into 16x16 non-overlapping blocks,
consisting of luminance and chrominance (YUV or YCbCr)
components. Each such block is called a macroblock (MB).
◼ The luminance and chrominance components share the
same boundary in the image.
◼ The two chrominance components (Cb, Cr) always occur in
pairs.


Pixel Representation
◼ Y,U,V Colour Space
❑ The Human Visual System (HVS) is sensitive to three colour
components.
❑ Colour can be represented by Red, Green and Blue components (RGB).
❑ Transform to YUV or YCbCr, a less correlated representation.

◼ Note:
❑ The two chrominance components (U,V) contain considerably less
information than the luminance component. For this reason,
chrominance is often sub-sampled.
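The conversion matrix shown on the slide did not survive extraction. As an assumption, the sketch below uses the commonly used full-range BT.601 (JFIF) coefficients for 8-bit samples; the course may have used a differently scaled variant.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) RGB -> YCbCr for 8-bit samples (assumed variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

# A neutral grey carries no colour information: Cb and Cr sit at mid-scale.
print(rgb_to_ycbcr(128, 128, 128))
# Pure red: low luminance, Cr pushed to the top of its range.
print(rgb_to_ycbcr(255, 0, 0))
```

For grey inputs all the signal ends up in Y, which is the decorrelation the slide refers to.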


Pixel Representation



Chrominance Sub-sampling
◼ Human vision is relatively insensitive to
chrominance.
❑ For this reason, chrominance is often sub-sampled.
◼ Chrominance sub-sampling is specified as a
three-element ratio.
◼ Depending upon the number of chrominance
samples for every four luminance samples,
different colour formats such as 4:4:4, 4:2:2, 4:1:1
and 4:2:0 are defined.
◼ 4:1:1 and 4:2:0 have the same number of luminance
and chrominance samples as each other; they differ only in
where the chrominance samples are taken.
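A small sketch of how the three-element ratio J:a:b translates into sample counts. It uses the usual reading of the notation (a region J luminance samples wide and 2 rows high, with a chrominance samples in the first row and b in the second); the function names are illustrative only.

```python
def samples_in_region(j, a, b):
    """Samples in a region J pixels wide and 2 rows high."""
    luma = 2 * j
    chroma_per_component = a + b  # row 1 plus row 2
    return luma, chroma_per_component

def bits_per_pixel(j, a, b, bit_depth=8):
    luma, chroma = samples_in_region(j, a, b)
    return (luma + 2 * chroma) * bit_depth / luma

for fmt in [(4, 4, 4), (4, 2, 2), (4, 1, 1), (4, 2, 0)]:
    print(fmt, samples_in_region(*fmt), bits_per_pixel(*fmt))
```

4:1:1 and 4:2:0 both come out at 12 bits per pixel, half of the 24 bits needed for 4:4:4.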


Chrominance Sub-sampling



◼ Each frame is divided into 16x16 non-overlapping blocks,
consisting of luminance and chrominance (YUV or YCbCr)
components.
◼ The luminance and chrominance components share the
same boundary in the image.
◼ The two chrominance samples always occur in pairs.

[Figure: a macroblock in 4:2:0 format]
Digital Video Formats
◼ Common Intermediate Format (CIF):
❑ This format was defined by CCITT for H.261 coding standard
(teleconferencing and videophone).
❑ Several size formats:
◼ SQCIF: 128x96 pixels. QCIF: 176x144 pixels.
◼ CIF: 352x288 pixels. 4CIF: 704x576 pixels.
◼ Non-interlaced (progressive), and chrominance sub-sampling using
4:2:0.
◼ Source Input Format (SIF):
❑ Utilized in MPEG as a compromise with Rec. 601.
❑ Two size formats (similar to CIF):
◼ QSIF: 180x120 or 176x144 pixels at 30 or 25 fps
◼ SIF: 360x240 or 352x288 pixels at 30 or 25 fps
❑ It is assumed that SIF is derived from a Rec. 601 source.
◼ High Definition Television (HDTV):
❑ 1280x720 pixels.
❑ 1920x1080 pixels.


Digital Video Formats



Digital Video Formats



Different Types of Frames in Video Coding
◼ I-frames (Intra-coded frames)
❑ No prediction, coded in the same way as an image.
◼ P-frames (Inter-coded or predictive frames)
❑ Use forward prediction (from a past frame).
❑ Current frame is predicted from a previously coded I- or P-frame.
❑ Do not work well for regions uncovered by object motion.
◼ B-frames (Bi-directionally predicted frames)
❑ Use forward and backward prediction.
❑ Not used for predicting any other frame.
❑ Handle covered/uncovered regions better.
❑ Predicted from the previous as well as the next I- or P-frames.

[Figure: Group of Frames (GOF)]
Motion Estimation & Compensation


Types of Motion
◼ Translation: simple movement of typically rigid objects
❑ Camera pans vs. movement of objects
◼ Rotation: spinning about an axis
❑ Camera versus object rotation
◼ Zooms in/out
❑ Camera zoom vs. object zoom (movement in/out)

[Figure: example frames n, n+1 and n+2 illustrating translation, rotation and zoom]


Describing Motion
◼ Translational
❑ Move (object) from (x,y) to (x+dx,y+dy)
◼ Rotational
❑ Rotate (object) by (r rads) (counter/clockwise)
◼ Zoom
❑ Move (in/out) from (object) to increase its size by
(t times)

Which is easiest? Which are we most likely to encounter?


Motion Estimation
◼ Determining parameters for the motion
descriptions
◼ For some portion of the frame, estimate its
movement between two frames: the current
frame and the reference frame
◼ What is some portion?
❑ Individual pixels (all of them)?
❑ Lines/edges (have to find them first)
❑ Objects (must define them)
❑ Uniform regions (just chop up the frame)



General Idea
◼ For a region PC in the current frame, find a
region PR in the search window in reference
frame so that Error(PR,PC) is minimized
◼ Issues: Error measures, search techniques,
choice of search window, choice of reference
frame, choice of region PC

[Figure: the portion of interest PC in the current frame, and the search window in the reference frame]


Block-based Motion Estimation
◼ PC is a block of pixels (in the current frame)
◼ The search window is a rectangular segment
(in the reference frame)

[Figure: frame at T=1 (reference) and frame at T=2 (current)]
Motion Vectors
◼ A motion vector (MV) describes the offset between
the location of the block being coded (in the current
frame) and the location of the best-match block in
the reference frame

[Figure: frame at T=1 (reference) and frame at T=2 (current)]
Motion Compensation

◼ The blocks being predicted are on a grid
◼ The blocks used for prediction are NOT

[Figure: blocks 1 to 16 on a regular grid in the frame being predicted, next to the displaced, off-grid positions of the 16 blocks used to predict them]
Motion Vector Search
◼ 1. Mean squared error
❑ Select a block in the reference frame to minimize
Σ(b(Bref)-b(Bcurr))²
◼ 2. Mean abs. error
❑ Select block to minimize
Σ|b(Bref)-b(Bcurr)|
◼ Given error measure, how to efficiently determine best-match
block in search window?
❑ Full search: best results, most computation
❑ Logarithmic search: heuristic, faster
❑ Hierarchical motion estimation


Block-Matching Algorithm: Matching Criterion
◼ For an N x N block B, MSE:
MSE = (1/N²) Σ (b(Bref)-b(Bcurr))², summed over the N² pixels of B
◼ MAD:
MAD = (1/N²) Σ |b(Bref)-b(Bcurr)|, summed over the N² pixels of B
◼ The very idea of using a block of pixels and assuming a
common displacement for them in the matching process
corresponds to a local smoothness (coherence) constraint on
the displacement vector field.
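A minimal sketch of full-search block matching with the MAD criterion. Everything here is an assumption made for illustration: frames are lists of rows, a motion vector (dx, dy) means the matching reference block sits at (bx+dx, by+dy), and the test frame is synthetic.

```python
def mad(cur, ref, bx, by, dx, dy, n):
    """Mean absolute difference between the n x n current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
    return total / (n * n)

def full_search(cur, ref, bx, by, n, search_range):
    """Evaluate every displacement in the search window; keep the lowest MAD."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (0 <= bx + dx <= w - n and 0 <= by + dy <= h - n):
                continue  # candidate block would fall outside the frame
            cost = mad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Synthetic data: the current frame is the reference shifted by (3, -2).
W = H = 24
ref = [[y * W + x for x in range(W)] for y in range(H)]
cur = [[ref[y - 2][x + 3] if 0 <= y - 2 < H and 0 <= x + 3 < W else 0
        for x in range(W)] for y in range(H)]

mv, cost = full_search(cur, ref, bx=8, by=8, n=4, search_range=7)
print(mv, cost)  # (3, -2) 0.0
```

With a search range of ±7 this evaluates up to 225 candidate positions per block, which is why faster heuristics are attractive.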
Motion Vector Search
◼ Full search: Evaluate every position in the search window
◼ Logarithmic search: First examine positions marked 1.
❑ Choose best of these (lowest error measure) and examine positions
marked 2 surrounding it
❑ Choose the best of these, and examine the positions marked 3
❑ Final result = best of these
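The stepwise refinement described above can be sketched as a three-step search, one common form of logarithmic search. The step sizes (4, 2, 1), the SAD cost and the data layout are assumptions of this sketch, not taken from the slides. Being a heuristic, it can settle on a local minimum that a full search would avoid.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences for displacement (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def three_step_search(cur, ref, bx, by, n, first_step=4):
    h, w = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    step = first_step
    while step >= 1:
        cx, cy = best_mv  # centre of this round's 3 x 3 pattern
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = cx + dx, cy + dy
                if not (0 <= bx + mx <= w - n and 0 <= by + my <= h - n):
                    continue  # candidate block would leave the frame
                cost = sad(cur, ref, bx, by, mx, my, n)
                if cost < best_cost:
                    best_mv, best_cost = (mx, my), cost
        step //= 2  # halve the spacing: positions "1", then "2", then "3"
    return best_mv, best_cost

# Synthetic data: the current frame is the reference shifted by (4, -4).
W = H = 24
ref = [[y * W + x for x in range(W)] for y in range(H)]
cur = [[ref[y - 4][x + 4] if 0 <= y - 4 < H and 0 <= x + 4 < W else 0
        for x in range(W)] for y in range(H)]

result = three_step_search(cur, ref, bx=8, by=8, n=4)
print(result)  # ((4, -4), 0)
```

Each round tests at most 9 positions, so three rounds cost at most 27 evaluations plus the starting point, against up to 225 for a ±7 full search.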


Hierarchical Motion Estimation
◼ Use an averaging filter on the image, then
downsample by a factor of 2
◼ Conduct a search on the downsampled
image (only ¼ of the size)
◼ Given the results of the search on the
downsampled image, return to the full
resolution image and refine the search there



Motion Compensation
◼ The standards do not specify HOW the
encoder will find the motion vectors (MVs)
◼ The encoder can use exhaustive/fast search,
MSE /MAE/other error metric, etc.
◼ The standard DOES specify
❑ The allowable syntax for specifying the MVs
❑ What the decoder will do with them
◼ What the decoder does is to grab the
indicated block from reference frame, and
glue it in place

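The decoder side ("grab the indicated block and glue it in place") can be sketched as follows. The block size, the dictionary of motion vectors and the sign convention (the reference block sits at the current block position plus the vector) are assumptions for illustration; real standards define their own syntax and conventions.

```python
def motion_compensate(reference, motion_vectors, block=4):
    """Build the motion compensated frame from a reference frame.

    motion_vectors maps the top-left corner (bx, by) of each block on the
    grid to its vector (dx, dy)."""
    h, w = len(reference), len(reference[0])
    predicted = [[0] * w for _ in range(h)]
    for (bx, by), (dx, dy) in motion_vectors.items():
        for j in range(block):
            for i in range(block):
                # Copy from the displaced (generally off-grid) position.
                predicted[by + j][bx + i] = reference[by + dy + j][bx + dx + i]
    return predicted

reference = [[10 * y + x for x in range(8)] for y in range(8)]
mvs = {(0, 0): (0, 0), (4, 0): (-2, 1), (0, 4): (1, -1), (4, 4): (0, 0)}
m = motion_compensate(reference, mvs)
print(m[0][4], m[4][0], m[7][7])  # 12 31 77
```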


Motion Compensation Example

[Figure: Frame n-1, Frame n, and the motion compensated Frame n, with the motion vectors of the blocks:]

(0,0)      (-16,0)   (5,0)       (0,0)
(0,0)      (16,7)    (5,2)       (0,0)
(20,-24)   (0,0)     (-20,-18)   (0,0)


Objects versus Macroblocks
◼ Real moving objects will not coincide with
boundaries of macroblocks

[Figure: a moving object that straddles macroblock boundaries. With no motion vector, the background is well encoded and the moving object produces prediction error. With a motion vector, the moving object is well encoded and the background produces prediction error.]

◼ If encoder sends MV=(MVX,MVY), object well coded,
but background poorly coded
◼ If encoder sends MV=(0,0), background well coded,
but moving object poorly coded
◼ Either approach is valid
Motion Compensation
◼ This glued together frame is called
the motion compensated frame
◼ The encoder can also form the difference between
the motion compensated frame and the actual
frame.
◼ This is called the motion compensated difference
frame
◼ This difference frame formed using MC should have
less correlation between pixels than the difference
frame formed without using MC



Motion Compensated Difference
Frames
◼ Suppose we are doing lossless coding
◼ Encoder has sequence of frames: …, F(n-2), F(n-1)
◼ Next: encode F(n)
◼ Past frames have been losslessly encoded, so the
decoder knows F(n-1) perfectly already
◼ Encoder sends the motion vectors for frame F(n)
relative to frame F(n-1), to form motion
compensated frame M(n)
❑ Encoder knows M(n), Decoder knows M(n)



Motion Compensated Prediction: An Example


Encoding Difference Frames
With motion compensation:
◼ Encoder forms motion compensated diff frame:
MCD(n) = F(n) – M(n)
◼ Encoder losslessly encodes MCD(n)
◼ Decoder can then do F(n) = MCD(n) + M(n)
→ knows F(n) exactly

With no motion compensation:
◼ Encoder could do frame diff:
FD(n) = F(n) – F(n-1)
◼ Encoder losslessly encodes FD(n)
◼ Decoder can then do F(n) = FD(n) + F(n-1)
→ knows F(n) exactly

If successive frames are very similar:
◼ fewer bits to send FD(n) instead of F(n)
◼ fewer bits to send Motion Vectors + MCD(n) instead of FD(n)
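A toy, one-row illustration of the two lossless schemes. The sample values and the single one-pixel shift used to build M(n) are invented for this sketch.

```python
def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

f_prev = [10, 10, 50, 50, 10, 10]   # F(n-1): bright object at positions 2..3
f_cur = [10, 10, 11, 50, 50, 10]    # F(n): object moved one pixel to the right

# M(n): F(n-1) shifted right by one pixel (first sample repeated at the edge).
m = [f_prev[0]] + f_prev[:-1]

fd = sub(f_cur, f_prev)   # plain frame difference FD(n)
mcd = sub(f_cur, m)       # motion compensated difference MCD(n)
print(fd)    # [0, 0, -39, 0, 40, 0]
print(mcd)   # [0, 0, 1, 0, 0, 0]

# Either way the decoder recovers F(n) exactly.
print(add(fd, f_prev) == f_cur, add(mcd, m) == f_cur)  # True True
```

MCD(n) is almost all zeros here, while FD(n) still carries the two large edges of the moving object.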


Motion compensated difference frames

◼ Decoder knows F(n-1) and, once you send the
motion vectors, it knows M(n)

[Figure: from the reference frame F(n-1) and the original frame F(n), two options. Without motion compensation: send the difference image FD(n) = F(n) - F(n-1). With motion compensation: send the motion vectors, which give the motion compensated frame M(n), and send the motion compensated difference image MCD(n) = F(n) – M(n).]


Motion Compensated Difference
Frames
◼ But we are NOT doing lossless coding
◼ Encoder has sequence of frames: …, F(n-2), F(n-1)
◼ Next: encode F(n)
◼ Past frames have been lossy encoded, so the
decoder has versions …, G(n-2), G(n-1)
◼ Encoder knows …, G(n-2), G(n-1) also
◼ Encoder sends the motion vectors for frame F(n)
relative to frame G(n-1), to form motion
compensated frame M(n)

Video Coding Standards
(ITU-T and MPEG standards)


Major Applications of Video Compression



Video Coding
Standardization Organizations
◼ Two organizations dominate video compression
standardization:
❑ ITU-T Video Coding Experts Group (VCEG)
International Telecommunication Union – Telecommunication
Standardization Sector (ITU-T, a United Nations Organization,
formerly CCITT), Study Group 16, Question 6
❑ ISO/IEC Moving Picture Experts Group (MPEG)
International Organization for Standardization and International
Electrotechnical Commission, Joint Technical Committee Number
1, Subcommittee 29, Working Group 11



Dynamics of the Video
Standardization Process
◼ VCEG is older and more focused on conventional (esp. low-
delay) video coding goals (e.g. good compression and
packet-loss/error resilience)
◼ MPEG is larger and takes on more ambitious goals (e.g.
“object oriented video”, “synthetic-natural hybrid coding”, and
digital cinema)
◼ Sometimes the major organizations team up (e.g. ISO, IEC
and ITU teamed up for both MPEG-2 and JPEG)
◼ Relatively little industry consortium activity (DV and
organizations that tweak the video coding standards in minor
ways, such as DVD, 3GPP, 3GPP2, SMPTE, IETF, etc.)
◼ Growing activity for internet streaming media outside of
formal standardization (e.g., Microsoft, Real Networks,
QuickTime)

The Scope of Video Coding Standardization

◼ Only restrictions on the Bitstream, Syntax, and
Decoder are standardized:
❑ Permits the optimization of encoding
❑ Permits complexity reduction for implementability
❑ Provides no guarantees on quality


Standard specifies bit stream
◼ The video compression standards define syntax and
semantics for the bit stream between encoder and decoder
◼ Encoder is not specified by MPEG except that it produces
a compliant bit stream
◼ Compliant decoder must interpret all legal MPEG bit
streams
◼ This allows future encoders of better performance to remain
compatible with existing decoders.
◼ Also allows for commercially secret encoders to be
compatible with standard decoders

[Figure: ENCODER → bit stream → DECODER. The standard defines the bit stream ("this"), not the encoder or the decoder ("not this"). Today's ho-hum encoder, tomorrow's nifty encoder and a very secret encoder all feed today's decoder, which still works.]


Target Applications
◼ Standards
❑ MPEG-1: Video CD
❑ MPEG-2: Digital TV
❑ MPEG-4: Multimedia
❑ H.261: ISDN videophone
❑ H.263: PSTN videophone
❑ H.264 / MPEG-4 part 10: Universal video



Motion Compensated Hybrid Coding
H.261, MPEG-1, MPEG-2, H.263, MPEG-4, H.264/JVT



Requirements of a Successful Video
Coding Standard
◼ Interoperability: should assure that encoders and decoders
from different manufacturers work together seamlessly.
◼ Innovation: should perform significantly better than previous
standard.
◼ Competition: should be flexible enough to allow competition
between manufacturers based on technical merit. Only
standardize bit-stream syntax and reference decoder.
◼ Independence from transmission and storage media:
should be flexible enough to be used for a range of
applications.
◼ Forward compatibility: should decode bit-streams from prior
standard
◼ Backward compatibility: prior generation decoders should
be able to partially decode new bit-streams
ITU-T H.261 Video Coding Standard
◼ International standard (1990) for ISDN picture phones and for video
conferencing/video phone systems with low delay (for real-time,
interactive applications) and with slow motion.
◼ Image format: CIF (352 x 288 Y samples, above 128 kbps) or QCIF
(176 x 144 Y samples, 64-128 kbps), frame rate: 7.5, 10, 15 and 30 fps
◼ Bit-rate: multiple of 64 kbps (= ISDN channel), p x 64 kbps,
p = 1,…,30, typically 128 kbps including audio.
◼ Picture quality: for 128 kbps acceptable with limited motion in the
scene.
◼ Stand-alone videoconferencing system or desk-top video
conferencing system, integrated with PC.
◼ Macroblock Structure:
❑ Macroblock (MB) of 16x16 pixels
❑ Sampling format: 4:2:0 color format
❑ Progressive scanning
❑ MB consists of 4 luminance and
2 chrominance blocks
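A short calculation that ties these numbers together. It assumes 8-bit samples and 4:2:0 sampling (12 bits per pixel on average); the function names are illustrative.

```python
def macroblocks(width, height, mb_size=16):
    """Number of 16x16 macroblocks in one frame."""
    return (width // mb_size) * (height // mb_size)

def raw_bitrate_420(width, height, fps, bit_depth=8):
    """Uncompressed 4:2:0 bit rate: 1.5 samples per pixel on average."""
    return width * height * bit_depth * 3 // 2 * fps

def channel_rate(p):
    """p x 64 kbps, p = 1..30."""
    return p * 64_000

print(macroblocks(352, 288), macroblocks(176, 144))   # 396 99
print(raw_bitrate_420(352, 288, 30))                  # 36495360 bit/s for CIF at 30 fps
print(raw_bitrate_420(176, 144, 30) / channel_rate(2))  # ratio needed for QCIF at 128 kbps
print(channel_rate(30))                               # 1920000 bit/s at p = 30
```

Even QCIF at 30 fps needs roughly a 70:1 reduction to fit into 128 kbps, before any bits are set aside for audio.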


H.261 Video Coding Standards
◼ Motion Compensated prediction
❑ Each MB can be coded in intra- and inter-mode.
❑ Integer-pel accuracy for inter mode.
❑ One displacement vector per macroblock
❑ Maximum displacement vector range +/- 15 horizontally and vertically.
❑ Methods for generating the MVs are not specified in the standard
◼ (Standards only define the bitstream syntax, or the decoder operation)
❑ Differential encoding of motion vectors (DMV).
❑ Encoder and decoder use the decoded MVs to perform motion
compensation
❑ Adaptive loop filter, separable in 1-D horizontal and vertical is used to
suppress propagation of coding noise temporally.
◼ impulse response of separable filter : [¼, ½, ¼]
◼ Loop filter can be turned on or off.

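A sketch of a separable loop filter with impulse response [¼, ½, ¼], applied first along rows and then along columns of an 8x8 block. Leaving the samples on the block edge unfiltered is an assumption of this sketch, and the results are kept as floats rather than rounded to integers.

```python
def filter_1d(values):
    """Apply [1/4, 1/2, 1/4]; the first and last samples are left untouched."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + 2 * values[i] + values[i + 1]) / 4
    return out

def loop_filter(block):
    rows = [filter_1d(r) for r in block]          # horizontal pass
    cols = [filter_1d(c) for c in zip(*rows)]     # vertical pass on the columns
    return [list(r) for r in zip(*cols)]          # transpose back to rows

block = [[0] * 8 for _ in range(8)]
block[3][3] = 16                                  # a single noise spike
out = loop_filter(block)
for row in out[2:5]:
    print(row[2:5])   # the spike is spread over a 3 x 3 neighbourhood
```

The spike of 16 becomes 4 at its centre, with 2 and 1 on its neighbours, which is the smoothing that limits the propagation of coding noise from frame to frame.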


H.261 Encoder



H.261 Standards
◼ Residual Coding:
❑ 8x8 DCT
❑ Quantization
◼ Uniform quantizer (∆=8) for intra-mode DC coefficients
◼ Uniform threshold quantizer with dead-zone (∆=2,4,…,62 (MQUANT)) for
AC coefficients in intra-mode and all coefficients in inter-mode.
❑ Zig-zag scan of DCT coefficients.
❑ Run-level coding for entropy coding : (zero-run, value) symbols.
◼ zero-run: the number of coefficients quantized to zero since the last nonzero
coefficient.
◼ value: the amplitude of the current nonzero coefficient.
◼ Variable Length Coding (VLC)
❑ DCT coefficients are converted into runlength representations and then
coded using VLC (Huffman coding for each pair of symbols).
❑ Other information is also coded using VLC (Huffman coding).
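The zig-zag scan and the (zero-run, value) symbols can be sketched as below. This is an illustration only: it treats every coefficient alike (the special handling of the intra DC coefficient is ignored) and stops at the symbol pairs, without the VLC table lookup.

```python
def zigzag_order(n=8):
    """Positions (row, col) of an n x n block in zig-zag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_level(coefficients):
    """(zero-run, value) pairs; trailing zeros are left to the EOB symbol."""
    pairs, run = [], 0
    for value in coefficients:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs

# A quantized block with three non-zero coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 12, 3, -1

scanned = [block[r][c] for r, c in zigzag_order()]
print(zigzag_order()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(run_level(scanned))   # [(0, 12), (0, 3), (1, -1)]
```

The 64 coefficients shrink to three symbols plus an end-of-block marker, which is what makes run-level coding effective after coarse quantization.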


Parameter Selection and Rate control
in H.261
◼ MTYPE (intra vs. inter, zero vs. non-zero MV in inter)
◼ CBP (which blocks in a MB have non-zero DCT
coefficients)
◼ MQUANT (allow the changes of the quantizer step size at
the MB level)
❑ should be varied to satisfy the rate constraint.
◼ MV (ideally should be determined not only by prediction
error but also the total bits used for coding MV and DCT
coefficients of prediction error)
◼ Loop Filter on/off



H.261 Data Stream
◼ Picture Layer
❑ Picture Start Code (PSC) - 20 bit pattern
❑ Temporal Reference (TR) - 5 bit input frame number
❑ Type Information (PTYPE) - CIF or QCIF selection
❑ Spare bits to be defined in later versions
◼ GOB Layer
❑ Group of Blocks Start Code (GBSC) - 16 bit pattern
❑ Group Number (GN) - 4 bit GOB address
❑ Quantizer information (GQUANT) – Initial quantizer step size normalized to the
range 1 to 31.
◼ At the start QUANT=GQUANT
❑ Spare bits to be defined in later versions
◼ Macroblock (MB) layer
❑ Macroblock address (MBA)
◼ Location of this MB relative to the previously encoded MB inside the GOB.
❑ Type information (MTYPE) - 10 types in total
❑ Quantizer (MQUANT): normalized quantizer step size to be used until the next
MQUANT or GQUANT. (Range 1 to 31)

H.261 Data Stream
◼ Macroblock (MB) layer
❑ Motion Vector Data (MVD)
◼ differential displacement vector
❑ Coded Block Pattern (CBP)
◼ Indicates which blocks in the MB are coded.
◼ Blocks not coded contain zero coefficients.
◼ Block Layer
❑ Lowest layer is the block layer, consisting of
◼ quantized transform coefficients (TCOEFF),
◼ End of block (EOB) symbol
❑ All coded blocks have the EOB symbol.
◼ Types of Coded MB:
❑ Intra - Original Pels are transform Coded
❑ Inter - Frame difference pels (zero-motion vectors) are coded.
◼ Skipped MBs are considered inter by default.
❑ Inter_MC - displaced (nonzero-motion vectors)
❑ Inter_MC with filter - displaced blocks are filtered by loop filter.
◼ Used for very low bit rates.

H.261 Macroblock Type (VLC table)



H.263 Video Coding Standard
◼ International standard for picture phones over analog
subscriber lines (1995)
◼ H.263 is the video coding standard in H.323/H.324,
targeted for visual telephone over PSTN or Internet.
◼ Developed after H.261, can accommodate
computationally more intensive options
❑ Initial version (H.263 baseline): 1995
❑ H.263+: 1997
❑ H.263++: 2000
◼ Image format: usually CIF, QCIF or Sub-QCIF.
◼ Frame rate: usually below 10 fps.


H.263 Video Coding Standard
◼ Goal: Improved quality at lower rates
❑ Bit rate: arbitrary, typically 20 kbps for PSTN
❑ Picture Quality: with new options as good as H.261 (at only half
rate).
◼ Result: Significantly better quality at lower rates
❑ Better video at 18-24 Kbps than H.261 at 64 Kbps
❑ Enable video phone over regular phone lines (28.8 Kbps) or
wireless modem
◼ Software-only PC video phone
or TV set-top box.
◼ Widely used as compression engine
for Internet video streaming.
◼ H.263 is also the compression core
of the MPEG-4 standard.



H.261 Vs H.263
◼ Improved motion estimation and compensation
❑ H.261 (1990): integer-pel accuracy, loop filter, 1 motion vector
per MB
❑ H.263 (1995): half-pel accuracy, no loop filter, 1 motion vector per
MB.
❑ half-pel accuracy motion estimation with bilinear interpolation
filter.
❑ Larger motion search range [-31.5,31], and unrestricted MV at
boundary blocks.
❑ More efficient predictive coding for MVs (median prediction using
three neighbors).
❑ overlapping block motion compensation (option).
❑ variable block size: 16x16 -> 8x8, 4 MVs per MB (option).
❑ use bidirectional temporal prediction (PB picture) (option).
◼ Improved 3-D VLC for DCT coefficients (last, run, level).

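Median prediction of motion vectors from three neighbours can be sketched as follows. Which three neighbouring blocks are used, and how picture edges are handled, is left out; the vectors are illustrative values.

```python
def median_mv(a, b, c):
    """Component-wise median of three motion vectors."""
    return tuple(sorted(component)[1] for component in zip(a, b, c))

def mv_difference(mv, neighbours):
    """Difference between the actual MV and its median predictor."""
    px, py = median_mv(*neighbours)
    return (mv[0] - px, mv[1] - py)

neighbours = [(2, 0), (4, -2), (3, 5)]
print(median_mv(*neighbours))              # (3, 0)
print(mv_difference((4, 1), neighbours))   # (1, 1)
```

Only the small difference is entropy coded. The median also ignores a single outlying neighbour, such as the (3, 5) vector's vertical component here.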


H.261 Vs H.263
◼ Reduced overhead.
◼ Support more picture formats.
◼ Optional features defined in annexes
❑ Unrestricted motion vectors (Annex D)
❑ Syntax-based arithmetic coding (SAC) (Annex E)
◼ 4% savings in bit rate for P-mode, 10% saving for I-mode, at 50%
more computations.
❑ Advanced prediction mode (APM) (Annex F)
◼ Overlapped block motion compensation (OBMC),
◼ Switch between 1 or 4 motion vectors per MB
❑ PB pictures (Annex G).
◼ Additional optional features in H.263++ (H.263 as of 2001).
◼ The options, when chosen properly, can improve the PSNR by
0.5-1.5 dB over default in the 20-70 kbps range.


Performance of H.263 and H.261



Performance of H.263 and H.261



Overlapped Block Motion
Compensation (OBMC) in H.263
◼ Conventional block motion compensation
❑ One best matching block is found from a reference frame
❑ The current block is replaced by the best matching block
◼ OBMC
❑ Each pixel in the current block is predicted by a weighted average of
several corresponding pixels in the reference frame
❑ The corresponding pixels are determined by the MVs of the current as
well as adjacent MBs
❑ The weight for each corresponding pixel depends on the expected
accuracy of the associated MV


Overlapped Block Motion
Compensation (OBMC) in H.263
◼ Idea: superimpose
several prediction
signals, using the
motion vectors from
neighboring blocks
also.



OBMC weights in H.263



OBMC weights in H.263



Performance of H.263 with OBMC



H.263 : PB Pictures (Annex-G)

◼ PB-picture mode codes two pictures as a group. The
second picture (P) is coded first, then the first picture (B) is
coded using both the P-picture and the previously coded
picture. This avoids the reordering of pictures required
in the normal B-mode, but it still incurs more coding
delay than using P-frames only.
◼ In a B-block, forward prediction (predicted from the
previous frame) can be used for all pixels; backward
prediction (from the future frame) is only used for those
pels for which the backward motion vector aligns with pels of the
current MB. Pixels in the “white area” use only forward
prediction.
◼ An improved PB-frame mode was defined in H.263+, which
removes the previous restriction.

H.263 : PB Pictures



Performance of H.263 with PB Mode



Video Coding Standards
(MPEG-1/2 and MPEG-4)



Contents
◼ ISO Standards
❑ MPEG-1
❑ MPEG-2
❑ MPEG-4
❑ MPEG-7 (overview only)



ISO MPEG
◼ MPEG-1 Standard (1991) (ISO/IEC 11172)
❑ Target bit-rate about 1.5 Mbps.
❑ Typical image format CIF, no interlace.
❑ Frame rate 24 ... 30 fps.
❑ Main application: video storage for multimedia (e.g., on CD-ROM).
◼ MPEG-2 Standard (1994) (ISO/IEC 13818)
❑ Extension for interlace, optimized for TV resolution (NTSC: 704 x
480 Pixel).
❑ Image quality similar to NTSC, PAL, SECAM at 4 -8 Mbps
❑ HDTV at 20 Mbps.
◼ MPEG-4 Standard (1999) (ISO/IEC 14496)
❑ Object based coding.
❑ Wide-range of applications, with choices of interactivity, scalability,
error resilience, etc.



MPEG-1 Overview
◼ Audio/video on CD-ROM (1.5 Mbps, SIF: 352x240).
❑ Maximum: 1.856 Mbps, 768x576 pels.
◼ Start late 1988, test in 10/89, Committee Draft 9/90
◼ ISO/IEC 11172-1~5 (Systems, video, audio, compliance,
software).
◼ Prompted explosion of digital video applications: MPEG1
video CD and downloadable video over Internet.
◼ Software only decoding, made possible by the introduction
of Pentium chips, key to the success in the commercial
market.
◼ MPEG-1 Audio
❑ Offers 3 coding options (3 layers); higher layers have higher coding
efficiency with more computations
❑ MP3 = MPEG-1 layer 3 audio.

MPEG-1 Vs H.261
◼ Developed at about the same time.
◼ Must enable random access (Fast forward/rewind).
❑ Using GOP structure with periodic I-picture and P-picture.
◼ Not for interactive applications.
❑ Do not have as stringent delay requirement.
◼ Fixed rate (1.5 Mbps), good quality (VHS equivalent).
❑ SIF video format (similar to CIF).
◼ CIF: 352x288, SIF: 352x240.
❑ Using more advanced motion compensation.
◼ Half-pel accuracy motion estimation, range up to +/- 64.
❑ Using bi-directional temporal prediction.
◼ Important for handling uncovered regions.
❑ Using perceptual-based quantization matrix for I-blocks (same as
JPEG).
◼ DC coefficients coded predictively.
MPEG-1/2 GOP Structure
◼ "Group of Pictures" = “GOP“, GOP structure is very flexible

[Figure: example GOP; frame numbers read from the figure: 1 4 2 3 8 5 6 7]

Coding, Decoding and Display Order

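Because a B-picture needs the following I- or P-picture as a reference, pictures are coded and transmitted in a different order from the one in which they are displayed. The sketch below reorders a display-order sequence into coding order. The frame-type pattern IBBPBBBP is an assumption chosen for illustration, not taken from the slides.

```python
def coding_order(frame_types):
    """Map display-order frame types (e.g. "IBBP") to coding order.

    Returns 1-based display indices in the order they are coded: each
    I- or P-picture is coded before the B-pictures that precede it."""
    order, pending_b = [], []
    for index, frame_type in enumerate(frame_types, start=1):
        if frame_type == "B":
            pending_b.append(index)      # wait for the next anchor picture
        else:
            order.append(index)          # anchor (I or P) goes first
            order.extend(pending_b)      # then the B-pictures it closes
            pending_b = []
    order.extend(pending_b)  # trailing B-pictures would need the next GOP's anchor
    return order

print(coding_order("IBBPBBBP"))  # [1, 4, 2, 3, 8, 5, 6, 7]
```

The decoder reverses this mapping, holding each decoded anchor picture until the B-pictures that precede it in display order have been shown.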


MPEG-1 Encoder



MPEG-1: Coding of I-frames
◼ I-pictures: intra-frame coded
◼ 8x8 DCT
◼ Arbitrary weighting matrix for coefficients
◼ Differential coding of DC-coefficients
◼ Uniform quantization
◼ Zig-zag-scan, run-level-coding
◼ Entropy coding
◼ Unfortunately, not quite JPEG
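The quantization, zig-zag scan, and run-level steps listed above can be sketched as follows. This is a simplified illustration, not the normative MPEG-1 procedure: it assumes a flat quantizer step of 16 instead of a weighting matrix, truncates toward zero, and leaves out the DCT and the entropy coder. All function names and sample values are hypothetical.

```python
def zigzag_indices(n=8):
    """(row, col) pairs of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col, one anti-diagonal at a time
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # even diagonals run bottom-left to top-right, odd ones the reverse
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def quantize_block(block, step=16):
    """Uniform quantization with a single step size (an assumption)."""
    return [[int(c / step) for c in row] for row in block]


def run_level(ac_coeffs):
    """Encode AC coefficients as (zero_run, level) pairs plus an end marker."""
    pairs, run = [], 0
    for c in ac_coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")  # trailing zeros are replaced by end-of-block
    return pairs


# Hypothetical DCT coefficients: only a few low-frequency terms are non-zero.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1] = 240, -48, 33, 17

q = quantize_block(block)
scanned = [q[r][c] for r, c in zigzag_indices()]
dc, pairs = scanned[0], run_level(scanned[1:])
print(dc, pairs)
```

In the standard the DC value is additionally coded as a difference from the previous block's DC, which this sketch omits.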



MPEG-1: Coding of P-pictures
◼ Motion-compensated prediction from an
encoded I-picture or P-picture (DPCM)
◼ Half-pixel accuracy of motion compensation,
bilinear interpolation
◼ One displacement vector per macroblock
◼ Differential coding of displacement vectors
◼ Coding of prediction error with 8x8-DCT, uniform
threshold quantization, zig-zag-scan as in I-pictures
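A small sketch of half-pel motion-compensated prediction with bilinear interpolation. It assumes motion vectors expressed in half-pel units and rounding to the nearest integer; the function name `predict_pel` and the tiny reference frame are hypothetical, and frame-boundary handling is left out.

```python
def predict_pel(ref, y, x, mv_y, mv_x):
    """Predict pel (y, x) from reference frame `ref`.

    mv_y and mv_x are in half-pel units (e.g. 3 means 1.5 pels).
    No bounds checking: the displaced position must lie inside `ref`.
    """
    yy, xx = 2 * y + mv_y, 2 * x + mv_x  # position on the half-pel grid
    y0, x0 = yy // 2, xx // 2            # integer-pel part
    hy, hx = yy % 2, xx % 2              # 1 if a half-pel offset is present
    a = ref[y0][x0]
    b = ref[y0][x0 + hx]
    c = ref[y0 + hy][x0]
    d = ref[y0 + hy][x0 + hx]
    # Averages 1, 2, or 4 distinct pels depending on the half-pel flags.
    return (a + b + c + d + 2) // 4


ref = [[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]]
print(predict_pel(ref, 1, 1, 0, 0))  # integer-pel position
print(predict_pel(ref, 0, 0, 0, 1))  # half-pel horizontally
print(predict_pel(ref, 0, 0, 1, 1))  # half-pel in both directions
```

The prediction error that is DCT-coded is the difference between the current pel and this predicted value.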



MPEG-1: Coding of B-pictures
◼ Motion-compensated prediction from two consecutive P-
or I-pictures.
◼ either
❑ only forward prediction (1 vector/macroblock).
◼ or
❑ only backward prediction (1 vector/macroblock).
◼ or
❑ Average of forward and backward prediction = interpolation (2
vectors/macroblock).
◼ Half-pel accuracy of motion compensation, bilinear
interpolation.
◼ Coding of prediction error with 8x8-DCT, uniform
quantization, zig-zag scan as in I-pictures.
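The three B-picture prediction modes can be compared with a short sketch. It assumes the encoder picks the mode with the smallest sum of absolute differences (SAD), which is a common encoder choice but is not prescribed by the slides; `best_b_mode` and the sample values are hypothetical, and blocks are flattened to 1-D lists for brevity.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_b_mode(cur, fwd, bwd):
    """Pick forward, backward, or interpolated prediction for one block."""
    interp = [(f + b + 1) // 2 for f, b in zip(fwd, bwd)]  # rounded average
    candidates = {"forward": fwd, "backward": bwd, "interpolated": interp}
    mode = min(candidates, key=lambda m: sad(cur, candidates[m]))
    return mode, candidates[mode]


cur = [100, 102, 104, 106]   # block in the B-picture
fwd = [96, 98, 100, 102]     # motion-compensated block from the past anchor
bwd = [104, 106, 108, 110]   # motion-compensated block from the future anchor
mode, pred = best_b_mode(cur, fwd, bwd)
print(mode, pred)
```

Here the current block lies midway between the two anchors, so averaging both predictions removes the error entirely.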



MPEG-2 Overview
◼ Audio/Video broadcast (TV, HDTV, Terrestrial, Cable,
Satellite, High Speed Inter/Intranet) as well as DVD video.
◼ 4~8 Mbps for TV quality, 10~15 Mbps for better quality at SDTV
resolutions (BT.601).
◼ 18-45 Mbps for HDTV applications.
❑ MPEG-2 video Main Profile at High Level (MP@HL) is the video coding
configuration used in HDTV broadcasting.
◼ Test in 11/91, Committee Draft 11/93.
◼ ISO/IEC 13818-1~6 (Systems, video, audio, compliance,
software, DSM-CC).
◼ Consists of various profiles and levels.
◼ Backward compatible with MPEG1.
◼ MPEG-2 Audio
❑ Supports 5.1 channels.
❑ MPEG2 AAC: requires 30% fewer bits than MPEG1 layer 3.



MPEG-2 Vs. MPEG-1 Video
◼ MPEG1 only handles progressive sequences (SIF).
◼ MPEG2 is targeted primarily at interlaced sequences
and at higher resolution (BT.601 = 4CIF).
◼ More sophisticated motion estimation methods
(frame/field prediction mode) are developed to improve
estimation accuracy for interlaced sequences (Motion
compensation with blocks of size 16x8 pels).
◼ Different DCT modes and scanning methods are
developed for interlaced sequences.
◼ MPEG2 has various scalability modes.
◼ MPEG2 has various profiles and levels, each
combination targeted for different application.
◼ Improved coding efficiency by different quantization, VLC
tables.

MPEG-1 Bandwidth Requirements
Example: A digitized video is to be compressed using the MPEG-1 standard,
assuming a frame sequence of I B B P B B P B B P B B I ... and average
compression ratios of 10:1 (I), 20:1 (P), and 50:1 (B). Derive the average bit
rate generated by the encoder for both the NTSC and PAL digitization formats.
Note: The frame size is 352 x 240 for NTSC and 352 x 288 for PAL (luminance, with
two chrominance components at half resolution in each direction), and each
sample is represented by 8 bits.

Solution: Frame sequence = I B B P B B P B B P B B I ...

Hence 1/12 of the frames are I-frames, 3/12 are P-frames, and 8/12 are B-frames.
Average compression ratio = (1x0.1 + 3x0.05 + 8x0.02)/12 = 0.0342, or 29.24:1
NTSC frame size :
Without Compression =(352x240x8)+2(176x120x8)=1.01376Mbits/frame
with compression=1.01376Mbits/frame x 1/29.24=34.67 kbits/frame
Hence bit rate generated at 30 fps =1.040 Mbps



PAL frame size
Without compression =352 x 288 x 8+2 (176 x 144 x 8)=1.216512
Mbits/frame
with compression =1.216512 x 1/29.24 =41.604 Kbits / frame
Hence bit-rate generated at 25 fps=1.040 Mbps
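The arithmetic of this worked example can be checked with a short script. It is only a sketch of the calculation above; the function name `mpeg1_bit_rate` and its parameters are hypothetical. Without the intermediate rounding to 29.24:1, both formats come out at about 1.039 Mbps, which agrees with the 1.040 Mbps quoted above to within rounding.

```python
def mpeg1_bit_rate(width, height, fps, pattern="IBBPBBPBBPBB", ratios=None):
    """Average encoder bit rate in bits/sec for a repeating frame pattern."""
    ratios = ratios or {"I": 10, "P": 20, "B": 50}  # compression ratios N:1
    # One luminance plane plus two chrominance planes at half resolution
    # in each direction, 8 bits per sample.
    raw_bits = (width * height + 2 * (width // 2) * (height // 2)) * 8
    # Average fraction of the raw size that survives compression.
    avg_fraction = sum(1 / ratios[t] for t in pattern) / len(pattern)
    return raw_bits * avg_fraction * fps


ntsc = mpeg1_bit_rate(352, 240, 30)
pal = mpeg1_bit_rate(352, 288, 25)
print(f"NTSC: {ntsc / 1e6:.3f} Mbps, PAL: {pal / 1e6:.3f} Mbps")
```

The two results coincide because the PAL frame is 1.2 times larger while its frame rate is 1.2 times lower.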



MPEG-2 Vs. MPEG-1 Video
◼ MPEG-2 is intended for higher data rates than MPEG-1.
◼ MPEG-2 allows for higher quality source material by supporting 4:2:2
(chroma channels sub-sampled in the horizontal dimension only), and
4:4:4 (no sub-sampling of chroma) formats, in addition to 4:2:0 (Chroma
channels sub-sampled by 2 in both directions.).
◼ MPEG-2 refers to intended display rate, MPEG-1 refers to coded frame
rate.
◼ The Group of Pictures layer is optional in MPEG-2
❑ It is an optional header useful for establishing a SMPTE time code base or
indicating that certain B pictures at the beginning of an edited sequence
comprise a broken link.
❑ In MPEG-1 the Group of Pictures layer is mandatory.
◼ Picture Layer
❑ In MPEG-2, a frame may be coded progressively or interlaced
❑ Interlaced frames may then be coded as either a frame picture or as two
separately coded field pictures.
◼ » Progressive frames are logical for video that originates from film
◼ » Interlace is logical for video cameras.

MPEG-2 Vs. MPEG-1 Video
◼ Repeat_first_field is new to MPEG-2 to signal a field or
frame that is repeated for purposes of frame rate
conversion
❑ This method has been used to convert 24 frames/sec movies to 30
frames/sec video (3:2 pulldown).
◼ Changes in motion estimation:
❑ Motion vectors are now always represented along half-sample grid
❑ Increased flexibility in coding motion vectors
◼ » can change from +/- 16 pixels to +/- 64 pixels without large increase in
overhead.
❑ Restricted vertical motion vector range
◼ » Motion is more prominent across the screen than up or down.
◼ Prediction modes now include field, frame, Dual Prime and
16x8 MC
◼ Combinations for Main Profile and Simple Profile: [table not recovered]
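The 24-to-30 frames/sec conversion signalled by Repeat_first_field can be illustrated with a sketch of the 3:2 cadence. It assumes that every other film frame has its first field repeated and that field parity keeps alternating; the function name `pulldown_32` is hypothetical and the frames are just labels.

```python
def pulldown_32(film_frames):
    """Map film frames to a sequence of (frame, field_parity) pairs."""
    fields = []
    top_first = True
    for i, frame in enumerate(film_frames):
        # Every other frame contributes 3 fields (first field repeated).
        n_fields = 3 if i % 2 == 0 else 2
        parity = ["top", "bottom"] if top_first else ["bottom", "top"]
        for k in range(n_fields):
            fields.append((frame, parity[k % 2]))
        if n_fields == 3:
            # An odd number of fields flips which field comes first next.
            top_first = not top_first
    return fields


fields = pulldown_32("ABCD")
print(len(fields), fields)
```

Four film frames become ten fields, i.e. five interlaced video frames, which is the 24:30 ratio.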
Frame Vs. Field Picture



Motion Compensation for Interlaced
Video
◼ Field prediction for field pictures
◼ Field prediction for frame pictures
◼ Dual prime for P-pictures
◼ 16x8 MC for field pictures

◼ Field Prediction for Field Pictures:


❑ Each field is predicted individually from the reference
fields
◼ A P-field is predicted from one previous field.
◼ A B-field is predicted from two fields chosen from two reference
pictures.

Field Prediction for Field Pictures



Field Prediction for Frame Pictures

◼ The MB to be predicted is split into top field pels and bottom field pels.
◼ Each 16x8 field block is predicted separately with its own
motion vector (P-frame) or two motion vectors (B-frame)



◼ Dual Prime for P-Pictures



DCT Modes in MPEG-2
◼ Two types of DCT and two types of scan pattern:
❑ Frame DCT: divides an MB into 4 blocks for Lum, as usual

❑ Field DCT: reorder pixels in an MB into top and bottom fields.

◼ Zig-zag scan as known from H.261/263 and MPEG-1 is augmented
by alternate scan in MPEG-2, in order to code interlaced blocks that
have more correlation in the horizontal than in the vertical direction.



MPEG-2 Levels
◼ High
❑ 1920 samples/line
❑ 1152 lines per frame
❑ 60 frames/sec
❑ 80 Mbits/s

◼ High 1440
❑ 1440 samples/line
❑ 1152 lines per frame
❑ 60 frames/sec
❑ 60 Mbits/s

◼ Main
❑ 720 samples/line
❑ 576 lines per frame
❑ 30 frames/sec
❑ 15 Mbits/s

◼ Low
❑ 352 samples/line
❑ 288 lines per frame
❑ 30 frames/sec
❑ 4 Mbits/s
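The level limits listed above can be treated as a lookup table. The sketch below finds the lowest level whose limits cover a given format; it uses only the four limits from this slide and ignores the other constraints a real conformance check involves. `LEVELS` and `lowest_level` are hypothetical names.

```python
# (name, samples/line, lines/frame, frames/sec, Mbits/s), lowest level first
LEVELS = [
    ("Low", 352, 288, 30, 4),
    ("Main", 720, 576, 30, 15),
    ("High 1440", 1440, 1152, 60, 60),
    ("High", 1920, 1152, 60, 80),
]


def lowest_level(width, height, fps, mbps):
    """Return the first level whose limits cover the format, else None."""
    for name, max_w, max_h, max_fps, max_mbps in LEVELS:
        if width <= max_w and height <= max_h and fps <= max_fps and mbps <= max_mbps:
            return name
    return None


print(lowest_level(352, 240, 30, 1.5))
print(lowest_level(720, 576, 25, 6))
print(lowest_level(1920, 1080, 30, 19))
```

A format that exceeds every limit, such as 4096x2160, returns `None` because no level on this slide covers it.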



MPEG-2 Algorithms and Profile
◼ MAIN – Non-scalable coding algorithm supporting
functionality for:
❑ Coding interlaced video
❑ Random access
❑ B-picture prediction modes
❑ 4:2:0 YUV representation
◼ Non scalable MPEG-2
❑ Introduces Field and Frame Pictures
◼ Interlace fields and frames are coded separately.
◼ Separate Prediction.
❑ New Motion Compensation modes to exploit temporal
redundancy between fields
◼ Dual Prime prediction.
◼ 16x8 block motion compensation.



MPEG-2 Algorithms and Profile
◼ SIMPLE - Includes all functionality provided by the MAIN profile but:
❑ Does not support B-picture prediction modes.
❑ 4:2:0 YUV representation.
◼ SNR Scalable - Supports all functionality provided by the MAIN
Profile plus an algorithm for:
❑ SNR Scalable Coding (2 Layers Allowed)
◼ » Support receivers with different display capability.
◼ » Based on classical Pyramidal approach for progressive image coding.
❑ 4:2:0 YUV representation.
◼ SPATIAL Scalable - Supports all functionality provided by the SNR
Scalable Profile plus an algorithm for :
❑ Spatial Scalable Coding (2 Layers Allowed)
◼ » Provide interoperability between different services.
◼ » Support receivers with different display capability.
❑ 4:2:0 YUV representation.



MPEG-2 Algorithms and Profile

◼ HIGH - Supports all functionality provided by the SPATIAL
Scalable Profile plus an algorithm for :
❑ 3 layers with the SNR and Spatial Scalable coding modes.
❑ 4:2:2 YUV representation for improved quality requirements.



MPEG-2 Scalability
◼ Data partition
❑ All headers, MVs, first few DCT coefficients in the base layer
❑ Can be implemented at the bit stream level
❑ Simple
◼ SNR scalability
❑ Base layer includes coarsely quantized DCT coefficients
❑ Enhancement layer further quantizes the base layer quantization error
❑ Relatively simple
◼ Spatial scalability
❑ Complex
◼ Temporal scalability
❑ Simple
◼ Drift problem:
❑ Occurs if the encoder’s base layer information for the current frame depends on the
enhancement layer information for a previous frame
❑ Exists in the data partition and SNR scalability modes
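SNR scalability as described above (a coarsely quantized base layer plus an enhancement layer that re-quantizes the base layer's quantization error) can be sketched on a handful of coefficients. The step sizes, sample values, and function names are illustrative assumptions; the real codec applies this to DCT coefficients inside the prediction loop.

```python
import math


def quantize(values, step):
    """Uniform quantizer: index of the nearest multiple of `step`."""
    return [math.floor(v / step + 0.5) for v in values]


def dequantize(levels, step):
    return [q * step for q in levels]


coeffs = [100, -37, 12, 5]        # hypothetical DCT coefficients
BASE_STEP, ENH_STEP = 16, 4       # coarse and fine quantizer steps

# Base layer: coarse quantization of the coefficients themselves.
base_levels = quantize(coeffs, BASE_STEP)
base_rec = dequantize(base_levels, BASE_STEP)

# Enhancement layer: finer quantization of the base layer's error.
residual = [c - r for c, r in zip(coeffs, base_rec)]
enh_levels = quantize(residual, ENH_STEP)
full_rec = [r + e for r, e in zip(base_rec, dequantize(enh_levels, ENH_STEP))]

print("base only:", base_rec)
print("base + enhancement:", full_rec)
```

A receiver that decodes only the base layer gets the coarser reconstruction; adding the enhancement layer reduces the maximum error in this example from 5 to 1.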



SNR Scalable Encoder



Spatial Scalable Codec



Temporal Scalability
(Option 1)
◼ In this option of temporal scalability, only the base layer is
used to predict images in the enhancement layer.
◼ As a result, errors in the enhancement layer do not
propagate over time.



Temporal Scalability
(Option 2)
◼ In this option of temporal scalability, the enhancement
layer may use both the base layer and the enhancement layer for
prediction.
◼ It is used for coding of stereoscopic video.



Profiles and levels in MPEG-2



MPEG-4 Overview
◼ Support highly interactive multimedia applications as well as
traditional applications
◼ Advanced functionalities: interactivity, scalability, error resilience…
◼ Coding of natural and synthetic audio and video, as well as
graphics.
◼ Enable the multiplexing of audiovisual objects and composition in a
scene.

Applications:
➢Video on LANs
➢Internet video
➢Wireless video
➢Video database
➢Interactive home shopping
➢Video e-mail, home movies
➢Virtual reality games, flight simulation, multi-viewpoint training

MPEG-4: Scene with Audio Visual Objects



MPEG-4: Video Coding
◼ Basic video coding
❑ Definition of Video Object (VO), Video Object Layer (VOL), Video
Object Plane (VOP)
❑ Improved coding efficiency vs. MPEG-1/2
◼ Based on H.263 baseline
❑ 3D VLC.
❑ Four MVs and Unrestricted MVs.
❑ OBMC not required.
◼ Global motion compensation
❑ Using 8-parameter projective mapping.
❑ Effective for sequences with large global motion.
◼ Sprites
❑ Code a large background in the beginning of the sequence, plus affine
mappings, which map parts of the background to the displayed scene at
different time instances.
❑ Decoder can vary the mapping to zoom in/out, pan left/right

◼ Quarter pixel motion compensation.


◼ DC and AC prediction: can predict DC and part of AC from either the
previous block or the block above.

MPEG-4: Video Coding
◼ Object-based video coding
❑ Binary shape coding
◼ Run-length coding
◼ Pel-wise coding using context-based arithmetic coding
◼ Quadtree coding
❑ Alpha-map shape coding
◼ Binary alpha map: specifies whether a pel belongs to an object
◼ Gray scale alpha map: a pel belonging to the object can have a
transparency value in the range (0-255).
❑ Padding for block-based DCT of texture
❑ Shape-adaptive DCT
◼ DWT for still texture coding
◼ Mesh animation, face and body animation.



MPEG-4: Sprite Coding
◼ Analyze the video stream to find the static background
❑ Create a still image of the background
❑ Code the moving objects against the background
◼ 8 global motion parameters describing camera motion are coded for
each sequence
❑ Represent an affine transform of the sprite from the first frame
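An 8-parameter global motion model can be sketched as a projective mapping of pel coordinates. The parameter ordering below is an assumption made for illustration and is not the MPEG-4 bitstream syntax; `warp` is a hypothetical name. With the last two parameters set to zero the mapping reduces to an affine transform.

```python
def warp(x, y, p):
    """Map (x, y) through an 8-parameter projective transform.

    p = (a0, a1, a2, a3, a4, a5, a6, a7), assumed to mean
        x' = (a0*x + a1*y + a2) / (a6*x + a7*y + 1)
        y' = (a3*x + a4*y + a5) / (a6*x + a7*y + 1)
    """
    a0, a1, a2, a3, a4, a5, a6, a7 = p
    denom = a6 * x + a7 * y + 1.0
    return (a0 * x + a1 * y + a2) / denom, (a3 * x + a4 * y + a5) / denom


identity = (1, 0, 0, 0, 1, 0, 0, 0)
pan = (1, 0, 5, 0, 1, -3, 0, 0)           # pure translation (camera pan)
zoom = (2, 0, 0, 0, 2, 0, 0, 0)           # zoom in by a factor of 2
perspective = (1, 0, 0, 0, 1, 0, 0.01, 0) # non-zero a6 gives a projective term

for name, params in [("identity", identity), ("pan", pan),
                     ("zoom", zoom), ("perspective", perspective)]:
    print(name, warp(100, 50, params))
```

Coding one such parameter set describes the motion of the whole background, instead of one motion vector per macroblock.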



Object based Video Coding
◼ Entire scene is decomposed into multiple objects
❑ Object segmentation is the most difficult task!
❑ But this does not need to be standardized
◼ Each object is specified by its shape, motion, and texture
(color)
❑ Shape and texture both change over time (specified by motion)
◼ MPEG-4 assumes the encoder has a segmentation map
available, specifies how to code (actually decode!)
shape, motion and texture



Object Description Hierarchy in
MPEG-4
◼ VO: video object
◼ VOL: video object layer
❑ (can be different parts of a VO or different rate/resolution
representations of a VO)
◼ VOP: video object plane



Example of Scene Composition in
MPEG-4
◼ The decoder can compose a scene by including different
VOPs in a VOL.



MPEG-4 Shape Coding
◼ Uses block-based approach (block=MB)
❑ Boundary blocks (blocks containing both the object and background)
❑ Non-boundary blocks: either belong to the object or background
◼ Boundary block’s binary alpha map (binary alpha block) is
coded using context-based arithmetic coding
❑ Intra-mode: context pels within the same frame.
❑ Inter-mode: context pels include previous frame, displaced by MV.
◼ Shape MV separate from texture MV.
◼ Shape MV predictively coded using texture MV.
◼ Grayscale alpha maps are coded using DCT
◼ Texture in boundary blocks coded using
❑ padding followed by conventional DCT
❑ Or shape-adaptive DCT
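The block classification that shape coding starts from can be sketched on a binary alpha map. Only boundary blocks need their binary alpha block coded pel by pel; the others are fully inside or fully outside the object. The function name `classify_blocks` is hypothetical, and the example uses 2x2 blocks on a 4x4 map for readability, whereas the slide's blocks are macroblocks.

```python
def classify_blocks(alpha, block=16):
    """Label each block of a binary alpha map.

    Returns a dict mapping (block_row, block_col) to
    "opaque", "transparent", or "boundary".
    """
    h, w = len(alpha), len(alpha[0])
    labels = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            pels = [alpha[y][x]
                    for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w))]
            if all(pels):
                label = "opaque"        # entirely inside the object
            elif not any(pels):
                label = "transparent"   # entirely background
            else:
                label = "boundary"      # contains both object and background
            labels[(by // block, bx // block)] = label
    return labels


alpha = [[0, 0, 0, 1],
         [0, 0, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]]
print(classify_blocks(alpha, block=2))
```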



MPEG-4 Generic video coding



Video Compression Progress



MPEG-7 Overview
◼ To enable search and browsing of multimedia documents
◼ Defines the syntax for describing the structural and
conceptual content
◼ MPEG-1/2/4 make content available, whereas MPEG-7
allows you to find the content you need!
❑ Enable multimedia document indexing, browsing, and retrieval
❑ Define the syntax for the metadata (e.g. index and summary) attached to
the document
❑ Generation of index and summary is not part of the standard!
◼ Content description in MPEG-7
❑ Descriptor (D): describing low-level features
❑ Description scheme (DS): combining Ds to describe high-level
features/structures
❑ Description definition language (DDL): define how Ds and DSs can be
defined or modified
❑ System tools

MPEG-7 Visual Descriptors
◼ Color
❑ Histogram, dominant color, etc.
◼ Texture
❑ Homogeneity: energy in different orientation and frequency bands
(Gabor transform)
❑ Coarseness, directionality, regularity
❑ Edge orientation histogram
◼ Motion
❑ Camera motion
❑ Motion trajectory of feature points in non-rigid object
❑ Motion parameters of a rigid object
❑ Motion activity
◼ Shape
❑ Boundary-based vs. region-based
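As a rough illustration of a low-level color descriptor, the sketch below builds a normalized color histogram and compares two images by histogram intersection. It is not the normative MPEG-7 descriptor: the RGB color space, the bin count, and the similarity measure are assumptions, and the function names are hypothetical.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Normalized RGB histogram; `pixels` is a list of (r, g, b) in 0..255."""
    step = 256 // bins_per_channel
    hist = [0] * bins_per_channel ** 3
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1
    n = len(pixels)
    return [h / n for h in hist]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


red_image = [(255, 0, 0)] * 4
mixed_image = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2

h_red = color_histogram(red_image)
h_mixed = color_histogram(mixed_image)
print(intersection(h_red, h_red), intersection(h_red, h_mixed))
```

A retrieval system would store such descriptors as metadata and rank documents by similarity to a query descriptor; how the descriptors are extracted is outside the standard, as noted on the MPEG-7 overview slide.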
