
Video Coding Basics

Yao Wang, Polytechnic University, Brooklyn, NY 11201, yao@vision.poly.edu

Outline
- Motivation for video coding
- Basic ideas in video coding
- Block diagram of a typical video codec
- Different modes of operation: I, B, P
- Block DCT coding
  - DCT
  - Quantization
  - Run-length coding
  - Difference between I and P/B blocks

Yao Wang, 2003

EE4414: Video Coding Basics

Why Compress?
Video Format  Y Size        Color Sampling   Frame Rate (Hz)   Raw Data Rate (Mbps)

HDTV over air, cable, satellite (MPEG2 video, 20-45 Mbps)
SMPTE296M     1280x720      4:2:0            24P/30P/60P       265/332/664
SMPTE295M     1920x1080     4:2:0            24P/30P/60I       597/746/746

Video production (MPEG2, 15-50 Mbps)
BT.601        720x480/576   4:4:4            60I/50I           249
BT.601        720x480/576   4:2:2            60I/50I           166

High quality video distribution (DVD, SDTV) (MPEG2, 4-10 Mbps)
BT.601        720x480/576   4:2:0            60I/50I           124

Intermediate quality video distribution (VCD, WWW) (MPEG1, 1.5 Mbps)
SIF           352x240/288   4:2:0            30P/25P           30

Video conferencing over ISDN/Internet (H.261/H.263/MPEG4, 128-384 Kbps)
CIF           352x288       4:2:0            30P               37

Video telephony over wired/wireless modem (H.263/MPEG4, 20-64 Kbps)
QCIF          176x144       4:2:0            30P               9.1
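The raw data rates in the table can be checked with a short calculation. The sketch below assumes 8 bits per sample and counts an interlaced 60I format as 30 full frames per second; both are assumptions, not stated on the slide, and the function name is illustrative.

```python
def raw_data_rate_mbps(width, height, chroma, frames_per_second, bits_per_sample=8):
    """Uncompressed bit rate in Mbps for a Y/Cb/Cr video format."""
    # Average samples per pixel: one luma sample plus the two chroma
    # samples, subsampled according to the color format.
    samples_per_pixel = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}[chroma]
    bits_per_frame = width * height * samples_per_pixel * bits_per_sample
    return bits_per_frame * frames_per_second / 1e6


cif = raw_data_rate_mbps(352, 288, "4:2:0", 30)     # about 36.5, listed as 37
qcif = raw_data_rate_mbps(176, 144, "4:2:0", 30)    # about 9.1
sdtv = raw_data_rate_mbps(720, 480, "4:2:0", 30)    # about 124
print(round(cif, 1), round(qcif, 1), round(sdtv, 1))
```

Comparing these numbers with the compressed rates quoted for each application gives a feel for the compression ratios involved, roughly 100:1 or more for video conferencing.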

Multimedia Communication Standards

Standards       Application                                   Video Format     Raw Data Rate   Compressed Data Rate
H.320 (H.261)   Video conferencing over ISDN                  CIF              37 Mbps         >=384 Kbps
                                                              QCIF             9.1 Mbps        >=64 Kbps
H.323 (H.263)   Video conferencing over Internet              4CIF/CIF/QCIF                    >=64 Kbps
H.324 (H.263)   Video over phone lines/wireless               QCIF             9.1 Mbps        >=18 Kbps
MPEG-1          Video distribution on CD/WWW                  CIF              30 Mbps         1.5 Mbps
MPEG-2          Video distribution on DVD/digital TV          CCIR601 4:2:0    128 Mbps        3-10 Mbps
MPEG-4          Multimedia distribution over Inter/Intranet   QCIF/CIF                         28-1024 Kbps
GA-HDTV         HDTV broadcasting                             SMPTE296/295     <=700 Mbps      18-45 Mbps

Components in a Coding System


Image Coding Revisited


Why can we compress an image?
- Adjacent pixels are correlated (have similar color values)

How to compress (the JPEG way)
- Use a transform to decorrelate the signal (DCT)
- Quantize the DCT coefficients
- Run-length code the quantized indices
  - Zig-zag ordering
  - Huffman coding of each pair (zero run-length, non-zero value)

What is different with video?
- We can apply JPEG to each video frame (Motion-JPEG)
- But we can do more than that to achieve higher compression!

Characteristics of Typical Videos

[Figure: frame t-1 and frame t, each showing a background and moving objects]

Adjacent frames are similar, and changes are due to object or camera motion: temporal correlation.

Example: Two adjacent frames are similar

[Figure: Frame 66 and Frame 69]

Difference between the two frames with and without motion compensation:

[Figure: absolute difference without motion compensation; absolute difference with motion compensation]

Key Ideas in Video Coding


- Predict a new frame from a previous frame and only specify the prediction error (INTER mode)
  - The prediction error is coded using an image coding method (e.g., DCT-based as in JPEG)
  - Prediction errors have smaller energy than the original pixel values and can be coded with fewer bits
- Regions that cannot be predicted well are coded directly using a DCT-based method (INTRA mode)
- Use motion-compensated temporal prediction to account for object motion
- Work on each macroblock (MB) (16x16 pixels) independently for reduced complexity
  - Motion compensation is done at the MB level
  - DCT coding of the error is done at the block level (8x8 pixels)
- Block-based hybrid video coding
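A small numerical sketch of why the prediction error is cheaper to code than the original pixels. The pixel values below are made up for illustration; `predicted` stands for whatever the previous frame supplies as a prediction.

```python
def energy(values):
    """Sum of squared sample values."""
    return sum(v * v for v in values)


# Hypothetical pixel values along one row of a block.
current = [52, 55, 61, 66, 70, 61, 64, 73]
predicted = [50, 56, 60, 67, 69, 62, 66, 72]   # e.g. taken from the previous frame

error = [c - p for c, p in zip(current, predicted)]
print(energy(current), energy(error))
```

When the prediction is good, the error samples are small numbers clustered around zero, so after the DCT and quantization most of them vanish.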

MB Structure in 4:2:0 Color Format

- 4 8x8 Y blocks
- 1 8x8 Cb block
- 1 8x8 Cr block

Encoder Block Diagram of a Typical Block-Based Hybrid Coder

From [Wang02]


Decoder Block Diagram

From [Wang02]


Block Matching Algorithm for Motion Estimation

[Figure: a block in frame t (predicted frame) and its search region in frame t-1 (reference frame), with the motion vector (MV) pointing to the matched block]

Block Matching Algorithm Overview


- For each MB in a new (predicted) frame
  - Search for a block in a reference frame that has the lowest matching error
    - Using the sum of absolute errors between corresponding pels
    - Search range: depends on the anticipated motion range

$$E_{\mathrm{DFD}}(\mathbf{d}_m) = \sum_{\mathbf{x} \in B_m} \left| \psi_2(\mathbf{x} + \mathbf{d}_m) - \psi_1(\mathbf{x}) \right| \rightarrow \min$$

    where B_m is the set of pels in the m-th MB, d_m is its candidate displacement, psi_1 is the current frame and psi_2 is the reference frame

  - The displacement between the current MB and the best matching MB is the MV
  - The current MB is replaced by the best matching MB (motion-compensated prediction or motion compensation)
- This subject will be discussed in more detail in a separate lecture

Temporal Prediction
No motion compensation:
- Works well in stationary regions

$$\hat{f}(t, m, n) = f(t-1, m, n)$$

Uni-directional motion compensation:
- Does not work well for regions uncovered by object motion or for newly appeared objects

$$\hat{f}(t, m, n) = f(t-1, m - d_x, n - d_y)$$

Bi-directional motion compensation:
- Handles covered/uncovered regions better

$$\hat{f}(t, m, n) = w_b f(t-1, m - d_{b,x}, n - d_{b,y}) + w_f f(t+1, m - d_{f,x}, n - d_{f,y})$$
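The three predictors are special cases of one formula, which the following sketch makes explicit. Frames are indexed as frame[m][n]; the function name, default weights and tiny 2x2 frames are illustrative assumptions, and the caller is assumed to keep the displaced indices inside the frame.

```python
def predict_pixel(prev, nxt, m, n, d_b=(0, 0), d_f=(0, 0), w_b=0.5, w_f=0.5):
    """Bi-directional prediction of pixel (m, n) from the previous and next frames."""
    backward = prev[m - d_b[0]][n - d_b[1]]   # f(t-1, m - d_bx, n - d_by)
    forward = nxt[m - d_f[0]][n - d_f[1]]     # f(t+1, m - d_fx, n - d_fy)
    return w_b * backward + w_f * forward


prev = [[10, 20], [30, 40]]   # frame t-1
nxt = [[20, 40], [60, 80]]    # frame t+1

no_mc = predict_pixel(prev, nxt, 1, 1, w_b=1.0, w_f=0.0)                  # copy co-located pixel
uni = predict_pixel(prev, nxt, 1, 1, d_b=(1, 0), w_b=1.0, w_f=0.0)        # one displaced reference
bi = predict_pixel(prev, nxt, 1, 1, d_b=(1, 0), d_f=(0, 1))               # average of two references
print(no_mc, uni, bi)
```

Setting w_f to zero recovers uni-directional prediction, and additionally setting the displacement to zero recovers prediction without motion compensation.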

Different Coding Modes

- Intra: coded directly
- Predictive: predicted from a previous frame
- Bidirectional: predicted from a previous frame and a following frame

Can be done at the block or frame level.

From [Wang02]

DCT-Based Coding Revisited


[Figure: Input Block -> Forward Transform -> Quantizer -> Run-Length Coder -> Channel -> Run-Length Decoder -> Inverse Quantizer -> Inverse Transform -> Output Block. Signal labels: transform coefficients, coefficient indices, coded bitstream, quantized coefficients]

Why do we use DCT?
- To exploit the correlation between adjacent pixels
- Typically only low-frequency DCT coefficients are significant

For I-blocks, DCT is applied to original image values.
For P/B-blocks, DCT is applied to prediction errors.

Basis Images of 8x8 DCT

[Figure: the 64 basis images of the 8x8 DCT, with corners labeled Low-Low, High-Low, Low-High and High-High]

DCT on a Real Image Block


>> imblock = lena256(128:135,128:135) - 128
imblock =
54 47 68 52 71 48 73 75 14 20 73 24 71 20 45 -8
20 -10 -5 -13 -14 -21 -20 -21 -13 -18 -18 -16 -23 -19 -27 -28 -24 -22 -22 -26 -24 -33 -30 -23 -29 -13 17 30 3 -24 -10 -42 -41 50 -5 4 12 10 5 5 -16 26 26 -21 12 -31 -40 23

>> dctblock = dct2(imblock)
dctblock =
31.0000 51.7034 113.5766 1.1673 -24.5837 -12.0000 -25.7508 11.9640 0.8750 23.2873 9.5585 6.9743 -13.9045 43.2054 -6.0959 35.5931 -13.3692 -13.0005
195.5804 10.1395 -8.6657 -2.9380 -28.9833 -7.9396
35.8733 -24.3038 -15.5776 -20.7924 11.6485 -19.1072 -8.5366 0.5125 40.7500 -20.5573 -13.6629 17.0615 -14.2500 22.3828 -4.8940 -11.3606 7.1918 -13.5722 -7.5971 -11.9452 18.2597 -16.2618 -1.4197 -1.4562 -13.3225 -0.8750 -6.7720 -2.8384 4.1187 1.3248 10.3817 16.0762 4.4157 1.1118 10.5527 -2.7348 -3.2327 -3.5087 1.1041 1.5799

Note that most DCT coefficients are close to zero except those in the low-low range.
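The energy compaction noted above can be reproduced without MATLAB. The sketch below is a direct (slow) implementation of the orthonormal 2-D DCT-II, which is the normalization MATLAB's dct2 uses; the input is a made-up smooth ramp rather than the Lena block, so the numbers differ from the session above.

```python
import math

N = 8


def alpha(k):
    """Normalization factor of the orthonormal DCT-II."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for m in range(N):
                for n in range(N):
                    total += (block[m][n]
                              * math.cos((2 * m + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * n + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


# A smooth synthetic block: a gentle diagonal ramp (illustrative values).
block = [[2 * m + n for n in range(N)] for m in range(N)]
coeffs = dct2(block)

energy = sum(c * c for row in coeffs for c in row)
low_energy = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
print(round(coeffs[0][0], 4), round(low_energy / energy, 4))
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the pixels; for a smooth block nearly all of it ends up in the few lowest-frequency coefficients.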

Quantization Matrices
- For I-blocks: non-uniform scaling is used (as in JPEG)
- For P/B blocks: the same step size (8) is used for all coefficients, and this step size can be scaled by a user-selectable parameter (the quantization parameter, QP) that controls the trade-off between bit rate and quality
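A minimal sketch of uniform quantization for P/B blocks. It assumes the step size is simply 8 times QP and that rounding is to the nearest index; real standards define their own scaling and rounding rules, so treat both as illustrative choices.

```python
def quantize_value(value, step):
    """Index of the nearest multiple of step (halves round away from zero)."""
    sign = -1 if value < 0 else 1
    return sign * int(abs(value) / step + 0.5)


def quantize(coeffs, qp, base_step=8):
    """Quantize every coefficient of a block with one step size, base_step * qp."""
    step = base_step * qp
    return [[quantize_value(c, step) for c in row] for row in coeffs]


def dequantize(indices, qp, base_step=8):
    """Reconstruct coefficient values from quantization indices."""
    step = base_step * qp
    return [[i * step for i in row] for row in indices]


coeffs = [[113.6, -24.6, 3.9, -0.9]]   # a few hypothetical DCT coefficients
fine = quantize(coeffs, qp=1)          # step 8
coarse = quantize(coeffs, qp=2)        # step 16
print(fine, coarse, dequantize(fine, qp=1))
```

A larger QP produces smaller indices and more zeros, which lowers the bit rate at the cost of a larger reconstruction error.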

Zig-Zag Ordering

Zig-Zag ordering: converting a 2D matrix into a 1D array, so that the frequency (horizontal+vertical) increases in this order, and the coefficient variance (average of magnitude square) decreases in this order.
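One compact way to generate the zig-zag order is to sort the block positions by anti-diagonal and alternate the direction on each diagonal. The sketch below follows the usual JPEG convention (the scan moves right from the top-left corner first), which is an assumption since the slide's figure is not reproduced here.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    positions = [(r, c) for r in range(n) for c in range(n)]
    # Diagonals are visited in order of r + c. On odd diagonals the scan runs
    # from top-right to bottom-left (increasing row); on even ones the reverse.
    return sorted(positions,
                  key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))


def zigzag_scan(block):
    """Flatten a square block into a 1-D list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


small = [[1, 2, 6],
         [3, 5, 7],
         [4, 8, 9]]
print(zigzag_scan(small))
print(zigzag_order(8)[:6])
```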


Run-length Coding
- Many coefficients are zero after quantization
- Run-length representation:
  - Order the coefficients in zig-zag order
  - Specify how many zeros precede each non-zero value
  - Each symbol = (length-of-zero-run, non-zero value)
  - For I-blocks, the DC coefficient is specified directly
- Code all possible symbols using Huffman coding
  - More frequently appearing symbols are given shorter codewords
  - One can use default Huffman tables or specify one's own tables
- Instead of Huffman coding, arithmetic coding can be used to achieve higher coding efficiency at added complexity
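To see how frequent symbols end up with shorter codewords, the sketch below computes Huffman codeword lengths with a priority queue. The symbol counts are invented for illustration; default tables in real codecs are fixed by the standard rather than built this way at run time.

```python
import heapq


def huffman_code_lengths(freqs):
    """Codeword length of each symbol in a Huffman code built from symbol counts."""
    # Heap entries: (weight, tie-breaker, symbols in this subtree).
    heap = [(weight, i, [symbol]) for i, (symbol, weight) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {symbol: 0 for symbol in freqs}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol inside them.
        for symbol in s1 + s2:
            lengths[symbol] += 1
        heapq.heappush(heap, (w1 + w2, counter, s1 + s2))
        counter += 1
    return lengths


# Hypothetical counts of run-length symbols.
freqs = {"EOB": 5, (0, 1): 2, (0, -1): 1, (1, -1): 1}
lengths = huffman_code_lengths(freqs)
print(lengths)
```

With these counts the most frequent symbol (EOB) gets a 1-bit codeword, while the two rarest symbols get 3 bits each.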

Example of Runlength Coding


Quantized DCT indices for an I-block =

 2  5  0 -2  0 -1  0  0
 9  1 -1  2  0  1  0  0
14  1 -1  0 -1  0  0  0
 3 -1 -1 -1  0  0  0  0
 2 -1  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0

Run-length symbol representation:
{2, (0,5), (0,9), (0,14), (0,1), (1,-2), (0,-1), (0,1), (0,3), (0,2), (0,-1), (0,-1), (0,2), (1,-1), (2,-1), (0,-1), (4,-1), (0,-1), (0,1), EOB}

EOB: end of block, one of the symbols that is assigned a short Huffman codeword.
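The symbol list above can be reproduced with a few lines of code. In this sketch the 8x8 layout of the indices is an assumption: it is arranged so that a JPEG-style zig-zag scan (moving right from the top-left corner first) yields exactly the symbols listed; the function names are illustrative.

```python
EOB = "EOB"


def zigzag_scan(block):
    """Flatten a square block in zig-zag order (JPEG-style, right first)."""
    n = len(block)
    order = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[r][c] for r, c in order]


def run_length_symbols(block):
    """DC value first, then (zero-run, value) pairs, then an end-of-block marker."""
    scan = zigzag_scan(block)
    symbols, run = [scan[0]], 0
    for value in scan[1:]:
        if value == 0:
            run += 1
        else:
            symbols.append((run, value))
            run = 0
    symbols.append(EOB)   # the trailing zeros are implied by EOB
    return symbols


block = [
    [2, 5, 0, -2, 0, -1, 0, 0],
    [9, 1, -1, 2, 0, 1, 0, 0],
    [14, 1, -1, 0, -1, 0, 0, 0],
    [3, -1, -1, -1, 0, 0, 0, 0],
    [2, -1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
symbols = run_length_symbols(block)
print(symbols)
```

The 64 indices collapse into 20 symbols, and the 37 zeros after the last non-zero value cost only the single EOB symbol.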

Macroblock Coding in I-Mode


- Apply the DCT to each 8x8 block
- Quantize the DCT coefficients with properly chosen quantization matrices (different matrices for Y and C)
- The quantized DCT coefficients are zig-zag ordered and run-length coded

Macroblock Coding in P-Mode


- For each macroblock (16x16), find the best matching block in a previous frame, and calculate the prediction errors
- The prediction errors in each of the DCT blocks (8x8) are DCT transformed, quantized (according to the specified QP), zig-zag scanned, and run-length coded
- One motion vector (MV), a pair of horizontal and vertical displacements, also needs to be coded

Macroblock Coding in B-Mode


- Same as for the P-mode, except that a macroblock is predicted from both a previous picture and a following one
- Two motion vectors, one forward (vf) and one backward (vb), need to be coded

[Figure: a macroblock predicted from a previous and a following picture, with motion vectors vf and vb]

Coding Mode Selection


- Which mode should we use for a given MB?
- Frame-level control
  - I-frames use only I-mode
  - P-frames use P-mode, except when prediction does not work (fall back to I-mode)
  - B-frames use B-mode (but can switch to P-mode and I-mode)
- Block-level control
  - An MB is coded using the mode that leads to the lowest bit rate for the same distortion -> rate-distortion optimized mode selection
  - I-mode is used for the first frame, and is inserted periodically in following frames, to stop transmission error propagation
- Mode information is coded in the MB header
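One common way to turn the rate-distortion trade-off into a decision rule is to minimize a Lagrangian cost, distortion plus lambda times rate. The slide does not say which formulation is used, so the sketch below is an assumption, and the distortion and bit counts are invented.

```python
def select_mode(candidates, lagrange_multiplier):
    """Pick the mode with the lowest cost D + lambda * R.

    candidates maps a mode name to (distortion, bits).
    """
    return min(candidates,
               key=lambda mode: candidates[mode][0] + lagrange_multiplier * candidates[mode][1])


# Hypothetical (distortion, bits) measured for one macroblock in each mode.
candidates = {"I": (20.0, 900), "P": (22.0, 300), "B": (25.0, 180)}

print(select_mode(candidates, 0.0),    # only distortion matters
      select_mode(candidates, 0.01),
      select_mode(candidates, 0.1))    # bits are expensive
```

A small multiplier favors quality and a large one favors low rate, so the multiplier is usually tied to the QP in use.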

Rate Control
- For a fixed QP, the bit rate varies from block to block
  - I-mode needs more bits than P- and B-modes
  - Even when the mode is the same, blocks with complex motion and texture require more bits
- To reach a desired bit rate (averaged over a frame or a group of frames), one can adjust
  - QP
  - Encoding frame rate (frame skip)
- The adjustment is controlled by the status of a buffer that stores the bits produced by the encoder
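A toy sketch of buffer-driven rate control. The fullness thresholds, the QP range of 1 to 31 and the skip threshold are illustrative assumptions, not values taken from any standard or from the lecture.

```python
def update_qp(qp, buffer_bits, buffer_size, qp_min=1, qp_max=31):
    """Raise QP when the encoder buffer is filling up, lower it when it drains."""
    fullness = buffer_bits / buffer_size
    if fullness > 0.8:
        qp += 2          # buffer nearly full: quantize much more coarsely
    elif fullness > 0.6:
        qp += 1
    elif fullness < 0.2:
        qp -= 1          # buffer nearly empty: spend more bits on quality
    return max(qp_min, min(qp_max, qp))


def should_skip_frame(buffer_bits, buffer_size, threshold=0.95):
    """Skip encoding the next frame when the buffer is about to overflow."""
    return buffer_bits / buffer_size > threshold


print(update_qp(10, 900, 1000), update_qp(10, 500, 1000), update_qp(10, 100, 1000))
print(should_skip_frame(980, 1000))
```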

Sensitivity to Transmission Errors


- Prediction and variable-length coding make the video stream very sensitive to transmission errors in the bitstream
  - An error in one frame will propagate to subsequent frames
  - Bit errors in one part of the bitstream make the following bits undecodable

[Figure: encoder with its reference, transmission via satellite dish with distortion, decoder with its reference]

Effect of Transmission Errors

[Figure: example reconstructed video frames from an H.263 coded sequence subject to packet losses. Panels: coded with no loss, 3% loss, 5% loss, 10% loss]

Error Resilient Encoding


- To help the decoder resume normal decoding after errors occur, the encoder can
  - Periodically insert INTRA mode (INTRA refresh)
  - Insert resynchronization codewords at the beginning of a group of blocks (GOB)
- More sophisticated error-resilience tools
  - Multiple description coding
- Trade-off between efficiency and error resilience
- Can also use channel coding / retransmission to correct errors

Error Concealment
- With proper error-resilience tools, a packet loss typically leads to the loss of an isolated segment of a frame
- The lost region can be recovered from the received regions by spatial/temporal interpolation -> error concealment
- Decoders on the market differ in their error concealment capabilities

[Figure: decoded frame without concealment and with concealment]
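Two very simple concealment strategies, sketched under stated assumptions: the lost block's position is known, frames are lists of lists, and for the spatial case the rows directly above and below the lost block were received. Real decoders use more elaborate schemes; the function names are hypothetical.

```python
def conceal_temporal(current, previous, top, left, size):
    """Replace a lost block with the co-located block of the previous decoded frame."""
    for r in range(top, top + size):
        for c in range(left, left + size):
            current[r][c] = previous[r][c]


def conceal_spatial(current, top, left, size):
    """Fill a lost block by linear interpolation between the rows above and below it."""
    above, below = top - 1, top + size
    for r in range(top, top + size):
        weight = (r - above) / (below - above)
        for c in range(left, left + size):
            current[r][c] = (1 - weight) * current[above][c] + weight * current[below][c]


# Spatial example: rows 1 and 2 of a 4x2 frame were lost (stored as zeros).
frame = [[10, 10], [0, 0], [0, 0], [40, 40]]
conceal_spatial(frame, top=1, left=0, size=2)

# Temporal example: the pixel at (1, 0) was lost.
previous = [[1, 2], [3, 4]]
damaged = [[1, 2], [0, 4]]
conceal_temporal(damaged, previous, top=1, left=0, size=1)
print(frame, damaged)
```

Temporal copying works well in stationary regions, while spatial interpolation is the fallback when no usable previous frame exists, for example after a scene change.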

Scalable Coding
Motivation
- Real networks are heterogeneous in rate
  - Streaming video from home (56 kbps) using a modem vs. a corporate LAN (10-100 Mbps)

Scalable video coding
- Ideal goal (embedded stream): create a bitstream that can be accessed at any rate
- Practical video coder:
  - Layered coder: the base layer provides basic quality, successive layers refine the quality incrementally
  - Coarse granularity (typically known as layered coder)
  - Fine granularity (FGS)
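The layered idea can be illustrated with a two-layer quality (SNR) scalable quantizer: the base layer quantizes coarsely and the enhancement layer quantizes what the base layer missed. The step sizes and coefficient values are illustrative assumptions, not taken from any standard.

```python
def quantize(value, step):
    """Index of the nearest multiple of step (halves round away from zero)."""
    sign = -1 if value < 0 else 1
    return sign * int(abs(value) / step + 0.5)


def encode_layers(coeffs, base_step=16, enh_step=4):
    """Base layer: coarse indices. Enhancement layer: indices of the base-layer error."""
    base = [quantize(c, base_step) for c in coeffs]
    residual = [c - b * base_step for c, b in zip(coeffs, base)]
    enhancement = [quantize(r, enh_step) for r in residual]
    return base, enhancement


def decode_layers(base, enhancement=None, base_step=16, enh_step=4):
    """Decode the base layer alone, or refine it when the enhancement layer arrived."""
    recon = [b * base_step for b in base]
    if enhancement is not None:
        recon = [r + e * enh_step for r, e in zip(recon, enhancement)]
    return recon


coeffs = [100, -37, 9, 2]                       # hypothetical DCT coefficients
base, enhancement = encode_layers(coeffs)
low_quality = decode_layers(base)               # receiver on a slow link
high_quality = decode_layers(base, enhancement) # receiver on a fast link
print(low_quality, high_quality)
```

A receiver that gets only the base layer still decodes a usable picture, while one that also gets the enhancement layer reduces the reconstruction error, at the cost of some coding efficiency compared with a single-layer coder at the same total rate.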

Bit Stream Scalability


Illustration of Scalable Coding

[Figure: decoded frames at 6.5 kbps, 133.9 kbps, 21.6 kbps and 436.3 kbps, illustrating spatial scalability and quality (SNR) scalability]

What you should know


- What are the principal steps in a video coder?
- What are the three types of information coded?
- You should be able to draw the block diagram of a typical block-based video codec (encoder and decoder) using motion compensation and know the function of each step
- Why do we use motion-compensated prediction?
- What are the differences between I, B, and P modes?
- Why do we use different modes? What may be the problem if we use P-modes only (except for the first frame)?
- What are the basic steps in DCT-based coding? How is it applied to I and P/B blocks?
- Why are error resilience and error concealment important in video encoder and decoder design?
- What is scalable coding? What are the benefits and trade-offs?

References
[Wang02] Y. Wang, J. Ostermann, Y. Q. Zhang, Video Processing and Communications, Prentice Hall, 2002. Chapters 9, 11, 13.
Y. Wang and Q. Zhu, Error control and concealment for video communication: a review, Proceedings of the IEEE, vol. 86, pp. 974-997, May 1998.