Basic Video Compression Technique
Chapter 10
Basic Video Compression Techniques
10.1 Introduction to Video Compression
10.2 Video Compression with Motion Compensation
10.3 Search for Motion Vectors
10.4 H.261
10.5 H.263
10.6 Further Exploration
w w.jntuworld.com
Motion Compensation
Each image is divided into macroblocks of size N × N.
By default, N = 16 for luminance images. For chrominance images, N = 8 if 4:2:0 chroma subsampling is adopted.
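[Figure: Macroblocks and motion vector in video compression. An N × N macroblock at (x0, y0) in the Target frame is compared with candidate macroblocks at (x, y) inside a (2p + 1) × (2p + 1) search window around (x0, y0) in the Reference frame; the displacement to the matched macroblock is the motion vector MV.]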
The MV search is usually limited to a small immediate neighborhood: both horizontal and vertical displacements lie in the range [−p, p]. This gives a search window of size (2p + 1) × (2p + 1).
$$MAD(i, j) = \frac{1}{N^2} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \left| C(x + k, y + l) - R(x + i + k, y + j + l) \right| \qquad (10.1)$$
where N is the size of the macroblock, k and l are indices for pixels in the macroblock, i and j are the horizontal and vertical displacements, C(x + k, y + l) are pixels in the macroblock in the Target frame, and R(x + i + k, y + j + l) are pixels in the macroblock in the Reference frame.
The goal of the search is to find a vector (i, j) as the motion vector MV = (u, v), such that MAD(i, j) is minimum:
$$(u, v) = [\, (i, j) \mid MAD(i, j) \text{ is minimum},\ i \in [-p, p],\ j \in [-p, p] \,] \qquad (10.2)$$
Sequential Search
Sequential search: sequentially search the whole (2p + 1) × (2p + 1) window in the Reference frame (also referred to as full search). A macroblock centered at each of the positions within the window is compared to the macroblock in the Target frame pixel by pixel, and their respective MAD is then derived using Eq. (10.1). The vector (i, j) that offers the least MAD is designated as the MV (u, v) for the macroblock in the Target frame. The sequential search method is very costly: assuming each pixel comparison requires three operations (subtraction, absolute value, addition), the cost of obtaining a motion vector for a single macroblock is (2p + 1) · (2p + 1) · N² · 3, i.e., O(p²N²).
begin
   min_MAD = LARGE_NUMBER; /* Initialization */
   for i = -p to p
      for j = -p to p
      {
         cur_MAD = MAD(i, j);
         if cur_MAD < min_MAD
         {
            min_MAD = cur_MAD; u = i; v = j; /* Get the coordinates for MV. */
         }
      }
end
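The sequential search can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: frames are assumed to be lists of rows indexed as frame[y][x], (x, y) is taken as the top-left corner of the macroblock (matching the indices k, l = 0..N−1 in Eq. (10.1)), and candidates that would fall outside the Reference frame are simply skipped. The function names are hypothetical.

```python
def mad(target, reference, x, y, i, j, n):
    """Mean absolute difference (Eq. 10.1) between the n x n Target macroblock
    at top-left (x, y) and the Reference macroblock displaced by (i, j)."""
    total = 0
    for l in range(n):
        for k in range(n):
            total += abs(target[y + l][x + k] - reference[y + j + l][x + i + k])
    return total / (n * n)


def sequential_search(target, reference, x, y, n, p):
    """Full search over the (2p+1) x (2p+1) window; returns (u, v, min_mad)."""
    height, width = len(reference), len(reference[0])
    u, v, min_mad = 0, 0, float("inf")
    for i in range(-p, p + 1):
        for j in range(-p, p + 1):
            # Assumption: skip candidates that leave the Reference frame.
            if not (0 <= x + i and x + i + n <= width
                    and 0 <= y + j and y + j + n <= height):
                continue
            cur_mad = mad(target, reference, x, y, i, j, n)
            if cur_mad < min_mad:
                min_mad, u, v = cur_mad, i, j
    return u, v, min_mad
```

The two nested loops over i and j, each evaluating N² pixel differences, are where the O(p²N²) cost per macroblock comes from.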
2D Logarithmic Search
Logarithmic search: a cheaper version that is suboptimal but still usually effective. The procedure for 2D logarithmic search of motion vectors takes several iterations and is akin to a binary search. As illustrated in Fig. 10.2, initially only nine locations in the search window are used as seeds for a MAD-based search; they are marked as "1". After the one that yields the minimum MAD is located, the center of the new search region is moved to it and the step size (offset) is reduced to half. In the next iteration, the nine new locations are marked as "2", and so on.
Li & Drew c Prentice Hall 2003
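[Fig. 10.2: 2D logarithmic search for motion vectors. The search window is centered at (x0, y0) and spans (x0 − p, y0 − p) to (x0 + p, y0 + p); the initial offset is p/2. The locations examined in the first, second, and third iterations are marked 1, 2, and 3, and the final location gives the MV.]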
begin
   offset = ⌈p/2⌉;
   Specify nine macroblocks within the search window in the Reference frame; they are centered at (x0, y0) and separated by offset horizontally and/or vertically;
   while last ≠ TRUE
   {
      Find one of the nine specified macroblocks that yields minimum MAD;
      if offset = 1 then last = TRUE;
      offset = ⌈offset/2⌉;
      Form a search region with the new offset and new center found;
   }
end
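A possible Python rendering of the 2D logarithmic search is sketched below. It assumes frames stored as lists of rows (frame[y][x]), (x, y) as the top-left corner of the macroblock, an initial offset of ⌈p/2⌉, and an infinite cost for candidates that leave the [−p, p] range or the Reference frame. The names are hypothetical.

```python
import math


def mad(target, reference, x, y, i, j, n):
    """Mean absolute difference (Eq. 10.1) for displacement (i, j)."""
    total = 0
    for l in range(n):
        for k in range(n):
            total += abs(target[y + l][x + k] - reference[y + j + l][x + i + k])
    return total / (n * n)


def logarithmic_search(target, reference, x, y, n, p):
    """2D logarithmic search; returns (u, v, mad_at_uv). Suboptimal by design."""
    height, width = len(reference), len(reference[0])

    def cost(i, j):
        # Assumption: out-of-range or out-of-frame candidates are never chosen.
        inside = (abs(i) <= p and abs(j) <= p
                  and 0 <= x + i and x + i + n <= width
                  and 0 <= y + j and y + j + n <= height)
        return mad(target, reference, x, y, i, j, n) if inside else float("inf")

    u, v = 0, 0
    offset = math.ceil(p / 2)
    while True:
        # Nine candidates: the current center and its eight neighbors at `offset`.
        candidates = [(u + di * offset, v + dj * offset)
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)]
        u, v = min(candidates, key=lambda c: cost(*c))
        if offset == 1:
            break
        offset = math.ceil(offset / 2)
    return u, v, cost(u, v)
```

Each iteration evaluates at most nine MADs, so the number of MAD evaluations grows with log p rather than p². The price is that the search can settle on a local minimum when the MAD surface is not well behaved.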
Using the same example as in the previous subsection (720 × 480 video at 30 fps, p = 15), the total number of operations per second drops to about 1.25 × 10^9 (see Table 10.1).
Hierarchical Search
The search can benefit from a hierarchical (multiresolution) approach in which an initial estimate of the motion vector is obtained from images with a significantly reduced resolution. Fig. 10.3 shows a three-level hierarchical search in which the original image is at Level 0, images at Levels 1 and 2 are obtained by downsampling from the previous levels by a factor of 2, and the initial search is conducted at Level 2. Since the size of the macroblock is smaller and p can also be proportionally reduced, the number of operations required is greatly reduced.
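[Fig. 10.3: A three-level hierarchical search for motion vectors. Each level is obtained by downsampling the previous one by a factor of 2; motion estimation is performed at each level, and the motion vectors are passed on to the next finer level, ending at Level 0.]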
Let (x0^k, y0^k) denote the center of the macroblock at Level k in the Target frame. The procedure for hierarchical motion vector search for the macroblock centered at (x0^0, y0^0) in the Target frame can be outlined as follows:
begin
   Obtain an initial estimate of the MV (u^k, v^k) at the coarsest Level k;
   while last ≠ TRUE
   {
      Find one of the nine macroblocks that yields minimum MAD at Level k − 1, centered at
         ( 2(x0^k + u^k) − 1 ≤ x ≤ 2(x0^k + u^k) + 1,  2(y0^k + v^k) − 1 ≤ y ≤ 2(y0^k + v^k) + 1 );
      if k = 1 then last = TRUE;
      k = k − 1;
      Assign (x0^k, y0^k) and (u^k, v^k) with the new center location and MV;
   }
end
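The hierarchical procedure can be illustrated with the following Python sketch. Several choices here are assumptions rather than part of the procedure above: frames are lists of rows (frame[y][x]) whose dimensions are divisible by 2^(levels−1), downsampling averages each 2 × 2 block, (x, y) is the top-left corner of the macroblock, the coarsest level uses a sequential search with p scaled down by the same factor, and all function names are hypothetical.

```python
def mad(target, reference, x, y, i, j, n):
    """Mean absolute difference (Eq. 10.1) for displacement (i, j)."""
    total = 0
    for l in range(n):
        for k in range(n):
            total += abs(target[y + l][x + k] - reference[y + j + l][x + i + k])
    return total / (n * n)


def downsample(frame):
    """Halve the resolution by averaging 2 x 2 blocks (one possible filter)."""
    return [[(frame[r][c] + frame[r][c + 1]
              + frame[r + 1][c] + frame[r + 1][c + 1]) / 4
             for c in range(0, len(frame[0]) - 1, 2)]
            for r in range(0, len(frame) - 1, 2)]


def best_candidate(target, reference, x, y, n, candidates):
    """Return the candidate (i, j) with minimum MAD; out-of-frame ones cost inf."""
    height, width = len(reference), len(reference[0])

    def cost(mv):
        i, j = mv
        inside = (0 <= x + i and x + i + n <= width
                  and 0 <= y + j and y + j + n <= height)
        return mad(target, reference, x, y, i, j, n) if inside else float("inf")

    return min(candidates, key=cost)


def hierarchical_search(target, reference, x, y, n, p, levels=3):
    """Coarse-to-fine MV search; returns (u, v) at Level 0."""
    targets, references = [target], [reference]
    for _ in range(levels - 1):
        targets.append(downsample(targets[-1]))
        references.append(downsample(references[-1]))

    # Initial estimate at the coarsest level with a proportionally reduced p.
    k = levels - 1
    scale = 2 ** k
    pk = -(-p // scale)  # ceiling of p / scale
    window = [(i, j) for i in range(-pk, pk + 1) for j in range(-pk, pk + 1)]
    u, v = best_candidate(targets[k], references[k],
                          x // scale, y // scale, n // scale, window)

    # Refine: at each finer level, test the 3 x 3 neighborhood of (2u, 2v).
    while k > 0:
        k -= 1
        scale = 2 ** k
        around = [(2 * u + di, 2 * v + dj)
                  for di in (-1, 0, 1) for dj in (-1, 0, 1)]
        u, v = best_candidate(targets[k], references[k],
                              x // scale, y // scale, n // scale, around)
    return u, v
```

Only the coarsest level performs a windowed search, on a macroblock that is smaller by a factor of 2^(levels−1) in each dimension; every finer level costs just nine MAD evaluations.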
Table 10.1: Comparison of computational cost of motion vector search methods, based on the examples

                                OPS per second for 720 × 480 at 30 fps
Search method                   p = 15            p = 7
Sequential search               29.89 × 10^9      7.00 × 10^9
2D Logarithmic search           1.25 × 10^9       0.78 × 10^9
3-level Hierarchical search     0.51 × 10^9       0.40 × 10^9
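The sequential-search figures in Table 10.1 can be checked directly from the per-macroblock cost (2p + 1) · (2p + 1) · N² · 3 given earlier. The short sketch below does only that; it assumes N = 16 and that the frame is tiled by (720 / N) × (480 / N) macroblocks, and the function name is hypothetical. The other two rows depend on formulas that are not reproduced in this section, so they are not recomputed here.

```python
def sequential_ops_per_second(width, height, fps, n, p):
    """Operations per second for full search, at 3 operations per pixel comparison."""
    ops_per_macroblock = (2 * p + 1) ** 2 * n ** 2 * 3
    macroblocks_per_frame = (width // n) * (height // n)
    return ops_per_macroblock * macroblocks_per_frame * fps


for p in (15, 7):
    ops = sequential_ops_per_second(720, 480, 30, 16, p)
    print(f"p = {p}: {ops / 1e9:.2f} x 10^9 operations per second")
```

For p = 15 this gives 29,890,944,000 operations per second and for p = 7 it gives 6,998,400,000, which round to the 29.89 × 10^9 and 7.00 × 10^9 shown in the table.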
10.4 H.261
H.261: an earlier digital video compression standard; its principle of MC-based compression is retained in all later video compression standards. The standard was designed for videophone, video conferencing, and other audiovisual services over ISDN. The video codec supports bit-rates of p × 64 kbps, where p ranges from 1 to 30 (hence it is also known as p ∗ 64). It requires that the delay of the video encoder be less than 150 msec so that the video can be used for real-time bidirectional video conferencing.
Video format    H.261 support
QCIF            required
CIF             optional
Motion vectors in H.261 are always measured in units of full pixels and they have a limited range of ±15 pixels, i.e., p = 15.
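[Figure: I-frame coding. The Y, Cb, and Cr blocks of each macroblock in the I-frame are coded into the output bitstream (e.g., 1010010...).]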
P-frame coding encodes the difference macroblock (not the Target macroblock itself). Sometimes a good match cannot be found, i.e., the prediction error exceeds a certain acceptable level. The MB itself is then encoded (treated as an Intra MB), and in this case it is termed a non-motion-compensated MB. For the motion vector, the difference MVD is sent for entropy coding:
$$MVD = MV_{\text{Preceding}} - MV_{\text{Current}} \qquad (10.3)$$
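[Figure: H.261 P-frame coding based on motion compensation. For each 16 × 16 current macroblock in the Target frame, the best match is located in the Reference frame; the difference macroblock (Y, Cb, Cr) and the motion vector are then encoded.]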
Quantization in H.261
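The quantization in H.261 uses a constant step size for all DCT coefficients within a macroblock. If we use DCT and QDCT to denote the DCT coefficients before and after the quantization, then for DC coefficients in Intra mode:
$$QDCT = \mathrm{round}\left(\frac{DCT}{step\_size}\right) = \mathrm{round}\left(\frac{DCT}{8}\right) \qquad (10.4)$$
For all other coefficients:
$$QDCT = \left\lfloor \frac{DCT}{step\_size} \right\rfloor = \left\lfloor \frac{DCT}{2 \times scale} \right\rfloor \qquad (10.5)$$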
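The two quantization rules can be tried out with the small Python sketch below. It is an illustration under stated assumptions: Eq. (10.5) is taken to use a floor, "round" is implemented as round-half-up, and scale is restricted to the integers 1 to 31 (the range H.261 allows for its quantizer). The function names are hypothetical.

```python
import math


def quantize_intra_dc(dct):
    """Eq. (10.4): Intra DC coefficients use a fixed step size of 8."""
    # Assumption: ties are rounded up (round-half-up).
    return math.floor(dct / 8 + 0.5)


def quantize_other(dct, scale):
    """Eq. (10.5): all other coefficients use step_size = 2 * scale."""
    if not 1 <= scale <= 31:
        raise ValueError("scale is assumed to be an integer in [1, 31]")
    # Assumption: the bracket in Eq. (10.5) is a floor.
    return math.floor(dct / (2 * scale))
```

For example, an Intra DC coefficient of 1016 quantizes to 127, while an AC coefficient of 100 with scale = 4 quantizes to 12. A larger scale gives a coarser step and therefore fewer bits at lower quality.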
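[Figure (a): H.261 encoder. The diagram shows the Intraframe/Interframe switch, the quantizer output passing through VLE to the output buffer and output code, and a reconstruction loop with Q⁻¹, IDCT, frame memory, motion estimation, and MC-based prediction (which also supplies the motion vector). Observation points along the data path are numbered.]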
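[Figure (b): H.261 decoder. The diagram shows Q⁻¹ and IDCT, the Intraframe/Interframe switch, MC-based prediction driven by the received motion vector, frame memory, and the decoded frame output. Observation points along the data path are numbered.]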
Data flow at the observation points

Current frame    Observation point
                 1      2      3      4      5      6
I                I                    Ĩ      0      Ĩ
P1               P1     P1'    D1     D̃1     P1'    P̃1
P2               P2     P2'    D2     D̃2     P2'    P̃2
In case a network error causes a bit error or the loss of some bits, H.261 video can be recovered and resynchronized at the next identifiable GOB. GQuant indicates the Quantizer to be used in the GOB unless it is overridden by any subsequent MQuant (Quantizer for Macroblock). GQuant and MQuant are referred to as scale in Eq. (10.5).
3. The Macroblock layer: Each Macroblock (MB) has its own Address indicating its position within the GOB, a Quantizer (MQuant), and six 8 × 8 image blocks (4 Y, 1 Cb, 1 Cr).
4. The Block layer: For each 8 × 8 block, the bitstream starts with the DC value, followed by pairs of the length of a zero-run (Run) and the subsequent non-zero value (Level) for the ACs, and finally the End of Block (EOB) code. The range of Run is [0, 63]. Level reflects quantized values; its range is [−127, 127] and Level ≠ 0.
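The Block-layer ordering described in item 4 (DC value, then (Run, Level) pairs, then EOB) can be illustrated with a short Python sketch. It works at the symbol level only and does not produce the actual variable-length codes; it assumes the 64 quantized coefficients are already in scan order with the DC value first, and the function name and the "EOB" marker string are hypothetical.

```python
def block_layer_symbols(coeffs):
    """Turn 64 scan-ordered quantized coefficients into
    [DC, (Run, Level), ..., "EOB"]."""
    if len(coeffs) != 64:
        raise ValueError("an 8 x 8 block has 64 coefficients")
    symbols = [coeffs[0]]          # the DC value comes first
    run = 0
    for level in coeffs[1:]:       # the 63 AC coefficients
        if level == 0:
            run += 1               # extend the current zero-run
            continue
        if not -127 <= level <= 127:
            raise ValueError("Level must lie in [-127, 127]")
        symbols.append((run, level))
        run = 0
    symbols.append("EOB")          # trailing zeros are implied by EOB
    return symbols
```

For a block whose coefficients are 50, 3, 0, 0, −2 followed by zeros, this yields [50, (0, 3), (2, −2), "EOB"]: the long tail of zeros costs nothing beyond the EOB code.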
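[Figure: Syntax of the H.261 video bitstream.
Picture layer: PSC, TR, PType, GOB, GOB, ..., GOB
GOB layer: GBSC, GN, GQuant, MB, ..., MB
Macroblock layer: Address, Type, MQuant, MVD, CBP, b0, b1, ..., b5
Block layer: DC, (Run, Level), ..., (Run, Level), EOB
Abbreviations: PSC = Picture Start Code; TR = Temporal Reference; PType = Picture Type; GOB = Group of Blocks; GBSC = GOB Start Code; GN = Group Number; GQuant = GOB Quantizer; MB = Macro Block; MQuant = MB Quantizer; MVD = Motion Vector Data; CBP = Coded Block Pattern; EOB = End of Block.]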
10.5 H.263
H.263 is an improved video coding standard for video conferencing and other audiovisual services transmitted on Public Switched Telephone Networks (PSTN). It aims at low bit-rate communications at bit-rates of less than 64 kbps. It uses predictive coding for inter-frames to reduce temporal redundancy and transform coding for the remaining signal to reduce spatial redundancy (for both Intra-frames and inter-frame prediction).
[Figure: Arrangement of GOBs in H.263 luminance images. Sub-QCIF: GOB 0 to GOB 5. QCIF: GOB 0 to GOB 8.]
Instead of coding the MV (u, v) itself, the error vector (δu, δv) is coded, where δu = u − u_p and δv = v − v_p for a predicted motion vector (u_p, v_p).
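As a small illustration of the error vector, the Python sketch below takes the predictor (u_p, v_p) to be the component-wise median of three neighboring motion vectors (previous, above, and above-right), which is the rule H.263 uses; that rule is not stated in the text above, so treat it as an assumption of this sketch. The function names are hypothetical.

```python
from statistics import median


def predict_mv(mv_previous, mv_above, mv_above_right):
    """Component-wise median of three candidate MVs, each given as (u, v)."""
    candidates = (mv_previous, mv_above, mv_above_right)
    u_p = median(mv[0] for mv in candidates)
    v_p = median(mv[1] for mv in candidates)
    return u_p, v_p


def mv_error(mv, predicted):
    """Error vector (du, dv) = (u - u_p, v - v_p) that is actually coded."""
    return mv[0] - predicted[0], mv[1] - predicted[1]
```

With neighbors (1, 2), (3, −1), and (2, 5) the predictor is (2, 2), so a current MV of (4, 1) is coded as the error vector (2, −1). Neighboring macroblocks tend to move together, so these errors are usually small and cheap to entropy-code.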
[Figure: Prediction of the current motion vector in H.263 from the previous, above, and above-and-right motion vectors; panel (b) shows the treatment at a border.]
Half-Pixel Precision
In order to reduce the prediction error, half-pixel precision is supported in H.263, versus full-pixel precision only in H.261. The default range for both the horizontal and vertical components u and v of MV (u, v) is now [−16, 15.5]. The pixel values needed at half-pixel positions are generated by a simple bilinear interpolation method, as shown in Fig. 10.12.
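The bilinear interpolation can be sketched as follows for one 2 × 2 neighborhood of full-pixel values A (top-left), B (top-right), C (bottom-left), and D (bottom-right). The rounding offsets (+1 before dividing by 2, +2 before dividing by 4) follow the usual H.263 integer convention and are an assumption here, as are the function name and the position labels a, b, c, d.

```python
def half_pixel_values(A, B, C, D):
    """Return (a, b, c, d) for one 2 x 2 neighborhood of full-pixel values.

    a: the full-pixel position itself
    b: halfway between A and B (horizontal half-pixel)
    c: halfway between A and C (vertical half-pixel)
    d: the center of A, B, C, D (half-pixel in both directions)
    """
    a = A
    b = (A + B + 1) // 2
    c = (A + C + 1) // 2
    d = (A + B + C + D + 2) // 4
    return a, b, c, d
```

For A, B, C, D = 10, 20, 30, 40 this gives (10, 15, 20, 25). Only integer additions and divisions are needed, so the extra precision is cheap for both encoder and decoder.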
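[Fig. 10.12: Half-pixel prediction by bilinear interpolation in H.263, showing full-pixel position A and half-pixel positions a and b.]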
2. Syntax-based arithmetic coding mode: As in H.261, variable length coding (VLC) is used in H.263 as the default coding method for the DCT coefficients. Similar to H.261, the syntax of H.263 is also structured as a hierarchy of four layers. Each layer is coded using a combination of fixed-length and variable-length codes.
3. Advanced prediction mode: In this mode, the macroblock size for MC is reduced from 16 to 8. Four motion vectors (one from each of the 8 × 8 blocks) are generated for each macroblock in the luminance image.
4. PB-frames mode: In H.263, a PB-frame consists of two pictures coded as one unit, as shown in Fig. 10.13. The use of the PB-frames mode is indicated in PTYPE. The PB-frames mode yields satisfactory results for videos with moderate motion. Under large motion, PB-frames do not compress as well as B-frames, and an improved mode has been developed in Version 2 of H.263.
[Fig. 10.13: A PB-frame in H.263, shown together with the preceding I- or P-frame.]
H.263+ implements temporal, SNR, and spatial scalabilities. It supports an Improved PB-frames mode, in which the two motion vectors of the B-frame do not have to be derived from the forward motion vector of the P-frame as in Version 1. H.263+ includes deblocking filters in the coding loop to reduce blocking effects.
H.263++ includes the baseline coding methods of H.263 and additional recommendations for Enhanced Reference Picture Selection (ERPS), Data Partition Slice (DPS), and Additional Supplemental Enhancement Information. The ERPS mode operates by managing a multi-frame buffer for stored frames, which enhances coding efficiency and error resilience. The DPS mode provides additional enhancement to error resilience by separating header and motion vector data from DCT coefficient data in the bitstream, and it protects the motion vector data by using a reversible code.
10.6 Further Exploration
Tutorials and White Papers on H.261 and H.263
H.261 and H.263 software implementations
An H.263/H.263+ library
A Java H.263 decoder