Video Coding Basics: Yao Wang, Polytechnic University, Brooklyn, NY 11201, yao@vision.poly.edu
Outline
Motivation for video coding
Basic ideas in video coding
Block diagram of a typical video codec
Different modes of operation: I, B, P
Block DCT coding
  DCT
  Quantization
  Run-length coding
  Difference between I and P/B blocks
Why Compress?
Video Format  Y Size       Color Sampling  Frame Rate (Hz)  Raw Data Rate (Mbps)

HDTV over air, cable, satellite; MPEG2 video, 20-45 Mbps
SMPTE296M     1280x720     4:2:0           24P/30P/60P      265/332/664
SMPTE295M     1920x1080    4:2:0           24P/30P/60I      597/746/746

Video production; MPEG2, 15-50 Mbps
BT.601        720x480/576  4:4:4           60I/50I          249
BT.601        720x480/576  4:2:2           60I/50I          166

High quality video distribution (DVD, SDTV); MPEG2, 4-10 Mbps
BT.601        720x480/576  4:2:0           60I/50I          124

Intermediate quality video distribution (VCD, WWW); MPEG1, 1.5 Mbps
SIF           352x240/288  4:2:0           30P/25P          30

Video conferencing over ISDN/Internet; H.261/H.263/MPEG4, 128-384 Kbps
CIF           352x288      4:2:0           30P              37

Video telephony over wired/wireless modem; H.263/MPEG4, 20-64 Kbps
QCIF          176x144      4:2:0           30P              9.1
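The raw data rates in the table follow from frame size, chroma sampling, and frame rate. A minimal sketch of that arithmetic, assuming 8 bits per sample and counting 60I as 30 full frames per second; the function name raw_rate_mbps and the lookup table are illustrative, not from the slides:

```python
# Bits per pixel for 8-bit samples under each chroma sampling format
# (assumption: 8 bits per Y, Cb, and Cr sample).
BITS_PER_PIXEL = {"4:4:4": 24, "4:2:2": 16, "4:2:0": 12}

def raw_rate_mbps(width, height, frames_per_second, sampling="4:2:0"):
    """Uncompressed data rate in Mbps (1 Mbps = 10**6 bits/s)."""
    bits_per_frame = width * height * BITS_PER_PIXEL[sampling]
    return bits_per_frame * frames_per_second / 1e6

# QCIF at 30 frames/s, 4:2:0: about 9.1 Mbps
print(round(raw_rate_mbps(176, 144, 30), 1))
# BT.601 720x480 at 60I (30 frames/s), 4:4:4: about 249 Mbps
print(round(raw_rate_mbps(720, 480, 30, "4:4:4")))
```

Comparing these figures with the compressed rates quoted for each application gives the compression ratio a codec has to deliver.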
Yao Wang, 2003 EE4414: Video Coding Basics
Standards      Application                                   Video Format    Raw Data Rate       Compressed Data Rate
H.320 (H.261)  Video conferencing over ISDN                  CIF / QCIF      37 Mbps / 9.1 Mbps  >=384 Kbps / >=64 Kbps
H.323 (H.263)  Video conferencing over Internet              4CIF/CIF/QCIF   -                   >=64 Kbps
H.324 (H.263)  Video over phone lines/wireless               QCIF            9.1 Mbps            >=18 Kbps
MPEG-1         Video distribution on CD/WWW                  CIF             30 Mbps             1.5 Mbps
MPEG-2         Video distribution on DVD/digital TV          CCIR601 4:2:0   128 Mbps            3-10 Mbps
MPEG-4         Multimedia distribution over Inter/Intra net  QCIF/CIF        -                   28-1024 Kbps
GA-HDTV        HDTV broadcasting                             SMPTE296/295    <=700 Mbps          18-45 Mbps
[Figure: frame t-1 and frame t, each showing the background and moving objects]
Adjacent frames are similar, and the changes between them are due to object or camera motion (temporal correlation)
[Figure: frame 66 and frame 69]
A macroblock (MB) consists of four 8x8 Y blocks, one 8x8 Cb block, and one 8x8 Cr block
From [Wang02]
From [Wang02]
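[Figure: block matching, showing the search region and the motion vector (MV)]
$$E_{\mathrm{DFD}}(\mathbf{d}_m) = \sum_{\mathbf{x} \in B_m} \left| \psi_2(\mathbf{x} + \mathbf{d}_m) - \psi_1(\mathbf{x}) \right|^p \rightarrow \min$$
(p = 1 gives the sum of absolute differences, p = 2 the sum of squared differences)
The displacement between the current MB and the best matching MB is the MV
The current MB is replaced by the best matching MB (motion-compensated prediction, or motion compensation)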
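The search for the displacement that minimizes EDFD can be sketched as an exhaustive search over the search region. This is a minimal sketch, not the method of any particular standard; it assumes frames are 2-D NumPy arrays, uses the sum of absolute differences as the error, and the names block_match, anchor, target, and the +/-7 pixel search range are illustrative choices:

```python
import numpy as np

def block_match(anchor, target, top, left, block=16, search=7):
    """Exhaustive block matching with a sum-of-absolute-differences error.

    anchor: frame holding the current MB; target: frame that is searched.
    Returns the motion vector (dy, dx) and the minimum error.
    """
    h, w = target.shape
    cur = anchor[top:top + block, left:left + block].astype(np.int64)
    best_mv, best_err = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the searched frame.
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue
            cand = target[y:y + block, x:x + block].astype(np.int64)
            err = int(np.abs(cand - cur).sum())
            if best_err is None or err < best_err:
                best_mv, best_err = (dy, dx), err
    return best_mv, best_err
```

The cost grows with the square of the search range, which is why practical encoders often replace the exhaustive loop with faster search patterns.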
Temporal Prediction
No motion compensation:
  Works well in stationary regions
  $\hat{f}(t, m, n) = f(t-1, m, n)$
Uni-directional motion compensation:
  Does not work well for regions uncovered by object motion or for newly appearing objects
  $\hat{f}(t, m, n) = f(t-1, m - d_x, n - d_y)$
Bi-directional motion compensation:
  Handles covered/uncovered regions better
  $\hat{f}(t, m, n) = w_b f(t-1, m - d_{b,x}, n - d_{b,y}) + w_f f(t+1, m - d_{f,x}, n - d_{f,y})$
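The three predictors can be sketched for a single block as follows. This is an illustrative sketch: frames are assumed to be 2-D NumPy arrays indexed as frame[m, n], the function names and the default 8x8 block size are made up, and equal weights of 0.5 are only a common default:

```python
import numpy as np

def predict_block(ref, m0, n0, d=(0, 0), size=8):
    """Predict the block at (m0, n0) from frame ref displaced by d = (dx, dy).

    With d = (0, 0) this is prediction without motion compensation.
    """
    dx, dy = d
    m, n = m0 - dx, n0 - dy
    if m < 0 or n < 0 or m + size > ref.shape[0] or n + size > ref.shape[1]:
        raise ValueError("displaced block falls outside the reference frame")
    return ref[m:m + size, n:n + size].astype(np.float64)

def predict_bidirectional(prev, nxt, m0, n0, d_b, d_f, w_b=0.5, w_f=0.5, size=8):
    """Weighted sum of a prediction from the previous and the following frame."""
    return (w_b * predict_block(prev, m0, n0, d_b, size)
            + w_f * predict_block(nxt, m0, n0, d_f, size))
```

When a region is visible in only one of the two reference frames, setting the other weight to zero reduces the bi-directional form to a uni-directional prediction.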
Intra (I): coded directly
Predictive (P): predicted from a previous frame
Bidirectional (B): predicted from a previous frame and a following frame
Mode selection can be done at the block or frame level
From [Wang02]
[Figure: block diagram of DCT-based block coding. Encoder: input block, forward transform, quantizer, run-length coder. Channel. Decoder: run-length decoder, inverse quantizer, inverse transform, output block]
For I-blocks, the DCT is applied to the original image values
For P/B-blocks, the DCT is applied to the prediction errors
[Figure: DCT basis images, with corners labeled low-low, high-low, low-high, and high-high]
Example image block:
20 -10 -5 -13 -14 -21 -20 -21 -13 -18 -18 -16 -23 -19 -27 -28 -24 -22 -22 -26 -24 -33 -30 -23 -29 -13 17 30 3 -24 -10 -42 -41 50 -5 4 12 10 5 5 -16 26 26 -21 12 -31 -40 23
DCT coefficients of the block:
35.8733 -24.3038 -15.5776 -20.7924 11.6485 -19.1072 -8.5366 0.5125 40.7500 -20.5573 -13.6629 17.0615 -14.2500 22.3828 -4.8940 -11.3606 7.1918 -13.5722 -7.5971 -11.9452 18.2597 -16.2618 -1.4197 -1.4562 -13.3225 -0.8750 -6.7720 -2.8384 4.1187 1.3248 10.3817 16.0762 4.4157 1.1118 10.5527 -2.7348 -3.2327 -3.5087 1.1041 1.5799
Note that most DCT coefficients are close to zero, except those in the low-low frequency range
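The energy compaction noted above can be reproduced with a small 2-D DCT. This sketch assumes the orthonormal DCT-II and builds the transform matrix directly with NumPy; the function names dct_matrix, dct2, and idct2 are illustrative, and the scaling convention of a particular standard may differ:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix: row k is the k-th 1-D basis function."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def dct2(block):
    """Separable 2-D DCT of a square block (1-D DCT on columns, then rows)."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

def idct2(coeffs):
    """Inverse 2-D DCT; the matrix is orthonormal, so its transpose inverts it."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c
```

For a flat 8x8 block of value 10, only the DC coefficient is non-zero (it equals 80 under this scaling), which is the extreme case of energy concentrating in the low-low corner.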
Quantization Matrices
For I-blocks: non-uniform scaling is used (as in JPEG)
For P/B-blocks: the same step size (8) is used for all coefficients, and this step size can be scaled by a user-selectable parameter (quantization parameter, or QP) that controls the trade-off between bit rate and quality
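Both cases reduce to dividing each coefficient by a step size and rounding. A minimal sketch under stated assumptions: the way QP scales the base step of 8 is simplified to a plain multiplication, and intra_steps builds a made-up matrix that grows with frequency as a stand-in for a JPEG-style table (it is not the JPEG table itself):

```python
import numpy as np

def quantize(coeffs, step):
    """Map coefficients to integer levels; step is a scalar or a matrix."""
    return np.round(np.asarray(coeffs, dtype=float) / step).astype(int)

def dequantize(levels, step):
    """Reconstruct coefficient values from integer levels."""
    return np.asarray(levels) * step

def inter_step(qp_scale=1.0, base_step=8.0):
    """Uniform step for P/B-blocks (hypothetical scaling rule)."""
    return base_step * qp_scale

def intra_steps(n=8, base_step=8.0, slope=4.0):
    """Hypothetical non-uniform matrix: coarser steps at higher frequencies."""
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return base_step + slope * (u + v)
```

A larger step sends more coefficients to zero and lowers the bit rate, at the price of a reconstruction error of up to half a step per coefficient.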
Zig-Zag Ordering
Zig-zag ordering converts a 2D matrix into a 1D array so that the frequency (horizontal + vertical) increases along the scan and the coefficient variance (the average squared magnitude) decreases along it.
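One way to generate the scan is to sort the positions by anti-diagonal (i + j) and alternate the direction on each diagonal. This is a sketch that follows the JPEG-style scan starting to the right of the DC coefficient; the function names are illustrative:

```python
import numpy as np

def zigzag_order(n=8):
    """List of (row, col) positions in zig-zag order for an n x n block."""
    def key(pos):
        i, j = pos
        s = i + j
        # Odd diagonals run top-right to bottom-left, even ones the other way.
        return (s, i if s % 2 else -i)
    return sorted(((i, j) for i in range(n) for j in range(n)), key=key)

def zigzag_scan(block):
    """Flatten a square block into a 1-D list in zig-zag order."""
    block = np.asarray(block)
    return [block[i, j].item() for i, j in zigzag_order(block.shape[0])]
```

After quantization, this ordering tends to push the zeros toward the end of the array, which is what makes the run-length representation compact.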
Run-length Coding
Many coefficients are zero after quantization
Run-length representation:
  Order the coefficients in zig-zag order
  Specify how many zeros precede each non-zero value
  Each symbol = (length-of-zero-run, non-zero-value)
  For I-blocks, the DC coefficient is specified directly
Example quantized coefficients (fragment): 3 -1 -1 -1 2 -1 0 0 0 0 0 0
Run-length symbol representation:
{2, (0,5), (0,9), (0,14), (0,1), (1,-2), (0,-1), (0,1), (0,3), (0,2), (0,-1), (0,-1), (0,2), (1,-1), (2,-1), (0,-1), (4,-1), (0,-1), (0,1), EOB}
EOB: end of block, a symbol that is assigned a short Huffman codeword
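The symbol list above can be produced and inverted with a few lines. A minimal sketch, assuming the input is a zig-zag ordered list of 64 quantized coefficients with the DC value first; the function names and the string "EOB" as the end-of-block marker are illustrative, and the Huffman coding of the symbols is not shown:

```python
def rle_encode(zigzag_coeffs):
    """Return [DC, (run, value), ..., 'EOB'] for a zig-zag ordered block."""
    symbols = [zigzag_coeffs[0]]      # DC coefficient is specified directly
    run = 0
    for value in zigzag_coeffs[1:]:
        if value == 0:
            run += 1                  # extend the current run of zeros
        else:
            symbols.append((run, value))
            run = 0
    symbols.append("EOB")             # trailing zeros are implied by EOB
    return symbols

def rle_decode(symbols, n=64):
    """Rebuild the zig-zag ordered coefficient list from the symbols."""
    coeffs = [symbols[0]]
    for sym in symbols[1:]:
        if sym == "EOB":
            break
        run, value = sym
        coeffs.extend([0] * run)
        coeffs.append(value)
    coeffs.extend([0] * (n - len(coeffs)))
    return coeffs
```

Decoding the symbol list from the example gives 27 explicit values followed by 37 zeros, all of which are covered by the single EOB symbol.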
Quantize the DCT coefficients with properly chosen quantization matrices (different matrices for Y and C)
The quantized DCT coefficients are zig-zag ordered and run-length coded
The prediction errors in each 8x8 block are DCT transformed, quantized (according to the specified QP), zig-zag scanned, and run-length coded
Block-level control
A MB is coded using the mode that leads to the lowest bit rate for the same distortion (rate-distortion optimized mode selection)
I-mode is used for the first frame and is inserted periodically in the following frames to stop the propagation of transmission errors
Rate Control
For a fixed QP, the bit rate varies from block to block
  I-mode needs more bits than P- and B-modes
  Even when the mode is the same, blocks with complex motion and texture require more bits
To reach a desired bit rate (averaged over a frame or a group of frames), one can adjust
  QP
  Encoding frame rate (frame skip)
  The adjustment is controlled by the status of a buffer that stores the bits produced by the encoder
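A buffer-driven controller of this kind can be sketched as follows. Everything numeric here is an assumption for illustration: the fullness thresholds, the step of one QP unit per update, the QP range of 1 to 31, and the function names are not taken from the slides or from any standard's rate control model:

```python
def update_buffer(buffer_bits, produced_bits, drained_bits):
    """Add the bits of the newly coded frame and remove the bits sent out."""
    return max(0, buffer_bits + produced_bits - drained_bits)

def next_qp(qp, buffer_bits, buffer_size, low=0.4, high=0.6, qp_min=1, qp_max=31):
    """Raise QP when the buffer fills up, lower it when the buffer drains."""
    fullness = buffer_bits / buffer_size
    if fullness > high:
        qp += 1          # coarser quantization, fewer bits
    elif fullness < low:
        qp -= 1          # finer quantization, more bits
    return min(qp_max, max(qp_min, qp))

def should_skip_frame(buffer_bits, buffer_size, skip_threshold=0.9):
    """Skip encoding the next frame when the buffer is close to overflowing."""
    return buffer_bits / buffer_size > skip_threshold
```

Changing QP trades quality for rate within a frame, while frame skipping trades temporal resolution for rate; a controller like this typically falls back on skipping only when raising QP is not enough.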
[Figure: encoder and decoder connected by a transmission link (satellite dish), with labels for the reference frames on each side and the resulting distortion]
[Figure: example reconstructed video frames from an H.263 coded sequence subject to packet losses. Panels: coded with no loss, 3% loss, 5% loss, 10% loss]
Trade-off between efficiency and error resilience
Channel coding or retransmission can also be used to correct errors
Error Concealment
With proper error-resilience tools, a packet loss typically leads to the loss of an isolated segment of a frame
The lost region can be recovered from the received regions by spatial/temporal interpolation (error concealment)
Decoders on the market differ in their error concealment capabilities
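Two simple forms of concealment for one lost block can be sketched as follows. This is an illustrative sketch, not what any particular decoder does: it assumes frames are 2-D NumPy arrays, that the rows directly above and below the lost block were received, and the function names and 16x16 default block size are made up:

```python
import numpy as np

def conceal_temporal(frame, prev_frame, top, left, size=16):
    """Replace the lost block with the co-located block of the previous frame."""
    out = frame.copy()
    out[top:top + size, left:left + size] = prev_frame[top:top + size, left:left + size]
    return out

def conceal_spatial(frame, top, left, size=16):
    """Fill the lost block by interpolating between the rows above and below it."""
    out = frame.astype(np.float64).copy()
    above = out[top - 1, left:left + size]
    below = out[top + size, left:left + size]
    for r in range(size):
        w = (r + 1) / (size + 1)      # 0 near the top neighbor, 1 near the bottom
        out[top + r, left:left + size] = (1 - w) * above + w * below
    return out
```

Temporal concealment tends to work well where there is little motion, while spatial interpolation is the fallback when no usable previous frame exists, for example after a scene change.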
[Figure: a decoded frame without concealment and with concealment]
Scalable Coding
Motivation
Real networks are heterogeneous in rate
  e.g., streaming video to a home user over a 56 kbps modem vs. over a corporate LAN (10-100 Mbps)
Spatial scalability
[Figure: example decoded frames at 6.5 kbps, 21.6 kbps, and 133.9 kbps]
References
[Wang02] Y. Wang, J. Ostermann, and Y. Q. Zhang, Video Processing and Communications, Prentice Hall, 2002. Chapters 9, 11, 13.
Y. Wang and Q. Zhu, Error control and concealment for video communication: a review, Proceedings of the IEEE, vol. 86, pp. 974-997, May 1998.