VDC-M v1.1
Standard
www.vesa.org
Version 1.1
11 May, 2018
Purpose
The purpose of this document is to specify the VESA® Display Compression-M (VDC-M) Standard.
Summary
VDC-M is part of the VESA display compression codec family. The algorithm should operate at low bit rate
and in real-time. A rate controller and buffer ensure that pictures do not experience underflow or overflow
issues. The encoder also produces a constant rate bitstream when provided a pixel data stream at a
constant rate.
In most cases, the codec operation is visually lossless. To better ensure interoperability and visually lossless
quality, this Standard normatively specifies both the encoder and decoder.
Contents
Purpose
Summary
Preface
    Intellectual Property
    Trademarks
    Patents
    Support for this Standard
    Acknowledgments
    Revision History
Tables
Table 1: Main Contributors to VDC-M v1.1
Table 2: Revision History
Table 1-1: Coding Objects Used by EncodeSlice and DecodeSlice Routines
Table 1-2: Virtual Routines Provided by Mode Class
Table 1-3: Recurring Symbols
Table 1-4: C Model Operators
Table 1-5: Highlight and Camel-case Rules Used throughout this Standard
Table 1-6: Acronyms, Initialisms, and Abbreviations
Table 1-7: Glossary
Table 1-8: Reference Documents
Table 2-1: Parameters Used to Define Slices
Table 2-2: ECGs
Table 2-3: Bits Required for Example Group of Samples that Use SM and 2C Bit Representations
Table 2-4: Substream Multiplexing Terms
Table 3-1: PPS Fields
Table 3-2: Frame-level Syntax Fields (Constant Bit Rate Mode)
Table 3-3: Slice-level Syntax Fields
Table 3-4: Substream-level Syntax for Substreams 0 through 3
Table 3-5: syntaxElementSubstream[0]
Table 3-6: syntaxElementSubstream[1]
Table 3-7: syntaxElementSubstream[2]
Table 3-8: syntaxElementSubstream[3]
Table 3-9: ECG-level Syntax for a Given Component k
Table 4-1: Component Width and Height Parameters for Different Chroma Sampling Formats
Table 4-2: Equation-based Implementation of the Color Space Transforms between RGB and YCoCg Color Spaces
Table 4-3: Flatness Types
Table 4-4: Hadamard Shift Parameters
Table 4-5: Normalization of Hadamard Transform Coefficients
Table 4-6: Complexity Measure for Flatness Detection Calculation
Table 4-7: Flatness Detection Previous Block Complexity Calculation
Table 4-8: Flatness Detection Next Block Flatness Calculation
Table 4-9: Flatness Detection Classification Rules
Table 4-10: Rate BF Measures
Table 4-11: Mechanisms that Ensure modeRate Is Strictly Greater than avgBlockBits when underflowPrevention Flag Is Set
Table 4-12: Constants Used for Calculating rcFullness
Table 4-13: rcOffset BF Calculation
Table 4-14: BF Calculation
Table 4-15: Update Target Rate Scaling Factor and Threshold
Table 4-16: targetRateBase Target Rate Calculation
Table 4-17: Constants Used for Calculating targetRateBase
Table 4-18: targetRateDelta Target Rate Calculation
Table 4-19: minQpOffset Calculation for RC QP Update
Table 4-20: maxQp Calculation for RC QP Update
Table 4-21: QP Update for Flatness Type 0 (Very Flat)
Table 4-22: QP Update for Flatness Type 1 (Somewhat Flat)
Table 4-23: QP Update for Flatness Type 2 (Complex-to-flat)
Table 4-24: QP Update for Flatness Type 3 (Flat-to-complex)
Table 4-25: modeRdCost Calculation
Table 4-26: lambdaFullness Encoder Calculation
Table 4-27: lambdaBitrate Encoder Calculation
Table 4-28: modeDistortion Calculation
Table 4-29: Transform Mode Stages
Table 4-30: intraListA as a Function of Block Position within a Slice
Table 4-31: Intra Prediction Mode for FBLS Blocks
Table 4-32: Intra Prediction Modes for NFBLS Blocks (4:4:4, Luma Component)
Table 4-33: Intra Prediction Modes for NFBLS Blocks (4:2:2, Chroma Components)
Table 4-34: Intra Prediction Modes for NFBLS Blocks (4:2:0, Chroma Components)
Table 4-35: Forward Discrete Cosine Transform Overview
Table 4-36: Discrete Cosine Transform Pre-shift
Table 4-37: 8-point Forward Discrete Cosine Transform Applied to Selected Row r of Pre-shifted Residual Block R(i, j)
Table 4-38: 4-point Forward Discrete Cosine Transform Applied to Row r of Pre-shifted Residual Block R(i, j)
Table 4-39: 2-point Forward Haar Transform for Column C
Table 4-40: Forward Discrete Cosine Transform Post-shift
Table 4-41: Encoder Intra Predictor Selection
Table 4-42: NFBLS Block Intra Predictor Signaling
Table 4-43: Mapping for Re-ordering Quantized Transform Coefficients for EC – 8x2 Components Only
Table 4-44: BPV Search Ranges
Table 4-45: BPV Search Operation for 2x2 and 2x1 Partitions
Table 4-46: BPV Search SAD Calculation
Table 4-47: YCoCg SAD for 2x2 Partition
Table 4-48: YCoCg SAD for 2x1 Partition
Table 4-49: Residual Sub-blocks
Table 4-50: BP Mode Partition Selection
Table 4-51: mppQp Calculation
Table 4-52: mppMinStepSize for Various Bit Depths and ssm_max_se_size
Table 4-53: MPP Mode Midpoint Calculation for Sub-blocks within a Given Component k
Table 4-54: MPP Mode Prediction, Quantization, Inverse Quantization, Reconstruction, and Error Diffusion for a 2x2 Sub-block
Table 4-55: MPP Mode Quantized Residual Distribution among Substreams 0 through 3
Figures
Figure 1-1: Encoder and Decoder Test Model High-level Operation
Figure 1-2: Example VDC-M 8x2 Block with Index (5, 1) Highlighted
Figure 2-1: High-level Encoding Operation
Figure 2-2: High-level Decoding Operation
Figure 2-3: Forward and Inverse YCoCg Transform Matrices
Figure 2-4: Picture Element Hierarchy
Figure 2-5: Blocks within a Slice are Processed in Block-raster Order
Figure 2-6: 1080p Frame with sliceHeight = 108, (Left) 2 Slices/line, (Right) 4 Slices/line
Figure 2-7: Example Slice Demonstrating Blocklines within FBLS and NFBLS
Figure 2-8: Example Horizontal Slice Padding Using Pixel Replication
Figure 2-9: Example Slice of Width 1920 with (Top) slicesPerLine = 1, (Bottom) slicesPerLine = 2; Horizontal Slice Padding Is Not Required
Figure 2-10: Example Slice of Width 1000 with (Top) slicesPerLine = 2, (Bottom) slicesPerLine = 4; Horizontal Slice Padding Is Required in Both Cases
Figure 2-11: Transform Mode Operation from the Encoder Side
Figure 2-12: Transform Mode Operation from the Decoder Side
Figure 2-13: BP Mode Operation from the Encoder Side
Figure 2-14: Example Result of BPV Search for the First Two Sub-blocks of a Block that Are Not within the Slice’s First Blockline
Figure 2-15: BP Mode Operation from the Decoder Side
Figure 2-16: Encoder Side MPP Mode Operation
Figure 2-17: Decoder Side MPP Mode Operation
Figure 2-18: Complex-to-flat Transition Example
Figure 2-19: RC Flow
Figure 2-20: Example Configuration with slicesPerLine = 2, in Which Chunks from Slices 0 and 1 Will Be Multiplexed into the Bitstream, Followed by Chunks from Slices 2 and 3, etc.
Figure 4-1: Encoder Overview
Figure 4-2: Block Component Sizes for Different Chroma Sampling Formats
Figure 4-3: 8-point Forward Hadamard Transform
Figure 4-4: 4-point Forward Hadamard Transform
Figure 4-5: RC Loop
Figure 4-6: rcOffsetInit as a Function of Block Time within a Slice
Figure 4-7: rcOffset as a Function of Block Time within a Slice
Figure 4-8: RC QP Update Logic
Figure 4-9: QP Update Mapping from diffBits to qpIndex
Figure 4-10: QP Update Mapping from rcFullness to qpUpdateMode
Figure 4-11: Encoder Operations for a Transform Mode Block
Figure 4-12: Transform Mode Prediction and Reconstruction Buffers
Figure 4-13: Intra Prediction Modes
Figure 4-14: Forward Discrete Cosine Transform Overview
Figure 4-15: Butterfly Structure for 8-point Forward Discrete Cosine Transform
Figure 4-16: Butterfly Structure for 4-point Forward Discrete Cosine Transform
Figure 4-17: Transform Mode ECG Structure – 8x2 Component
Figure 4-18: Transform Mode ECG Structure – 4x2 and 4x1 Components
Figure 4-19: Example Transform Component Data Blocks with Corresponding lastSigPos, ECG Structures
Figure 4-20: BP Mode Encoder Overview
Figure 4-21: BP Mode Partitions – 2x2 and 2x1
Figure 4-22: BP Mode Prediction and Reconstruction Buffers
Figure 4-23: BPV Search Range for FBLS and NFBLS Blocks
Figure 4-24: BPV Search Range Candidate 2x2 Partitions
Figure 4-25: BPV Search Range Candidate 2x1 Partitions for Source Partition in First Line of Sub-block
Figure 4-26: BPV Search Range Candidate 2x1 Partitions for Source Partition in Second Line of Sub-block
Figure 4-27: Example 2x2 and 2x1 Source and Candidate Partitions for SAD Calculation
Figure 4-28: BP Mode Partitions for 4:2:2 and 4:2:0 Use Cases
Figure 4-29: BPV Search Range for 4:2:2 and 4:2:0 FBLS and NFBLS Chroma Components
Figure 4-30: BPV Search Range Positions at Slice Boundaries
Figure 4-31: BP Mode ECG Structure, All Components (4:4:4) or Luma Component Only (4:2:2 and 4:2:0)
Figure 4-32: BP Mode ECG Structure, Chroma Components (4:2:2 and 4:2:0)
Figure 4-33: Possible Partition Grids (16) for BP Partition Selection
Figure 4-34: MPP Mode Prediction and Reconstruction Buffers
Figure 4-35: MPP Mode Bits per Pixel to bppIndex Mapping
Figure 4-36: MPP Mode Midpoint Is Calculated from the Current Block’s Reconstructed Spatial Neighbors
Figure 4-37: MPP Mode Error Diffusion within a 2x2 Sub-block
Figure 4-38: MPPF Mode Bits per Component Is Defined in PPS Parameter mppf_bits_per_comp
Figure 4-39: Transform Mode Component ECG Construction Example
Figure 4-40: BP Mode Component ECG Construction Example
Figure 4-41: Entropy Encoding Flowchart to Select EC Type
Figure 4-42: VEC Encoding Process
Figure 4-43: Substream Multiplexer
Figure 4-44: Substream Multiplexing Delay
Figure 5-1: Substream De-multiplexer
Figure 5-2: Transform Mode Syntax Parsing
Figure 5-3: BP Mode Syntax Parsing
Figure 5-4: MPP Mode Syntax Parsing
Figure 5-5: MPPF Mode Syntax Parsing
Figure 5-6: BP-SKIP Mode Syntax Parsing
Figure 5-7: Inverse Discrete Cosine Transform Overview
Figure 5-8: Butterfly Structure for 8-point Inverse Discrete Cosine Transform
Figure 5-9: Butterfly Structure for 4-point Inverse Discrete Cosine Transform
Preface
Intellectual Property
Copyright © 2018 Video Electronics Standards Association. All rights reserved.
While every precaution has been taken in the preparation of this Standard, the Video Electronics
Standards Association and its contributors assume no responsibility for errors or omissions
and make no warranties, expressed or implied, of functionality or suitability for any purpose.
Trademarks
VESA is a registered trademark of the Video Electronics Standards Association.
All other trademarks used within this document are the property of their respective owners.
Patents
VESA® draws attention to the fact that compliance with this Standard might involve the use
of a patent or other intellectual property right (collectively, “IPR”) concerning VESA Display
Compression-M (VDC-M). VESA takes no position concerning the evidence, validity, and/or
scope of this IPR. At the time of publication, there are currently no IPRs specific to VDC-M
to list in this Standard.
Attention is drawn to the possibility that some of the elements of this VESA Standard might
be the subject of IPRs external to this Standard. VESA shall not be held responsible for identifying
any or all such IPRs, and has made no inquiry into the possible existence of any such IPRs.
THIS STANDARD IS BEING OFFERED WITHOUT ANY WARRANTY WHATSOEVER,
AND IN PARTICULAR, ANY WARRANTY OF NON-INFRINGEMENT IS EXPRESSLY
DISCLAIMED. ANY IMPLEMENTATION OF THIS STANDARD SHALL BE MADE
ENTIRELY AT THE IMPLEMENTER’S OWN RISK, AND NEITHER VESA, NOR ANY
OF ITS MEMBERS OR SUBMITTERS, SHALL HAVE ANY LIABILITY WHATSOEVER
TO ANY IMPLEMENTER OR THIRD PARTY FOR ANY DAMAGES OF ANY NATURE
WHATSOEVER DIRECTLY OR INDIRECTLY ARISING FROM THE IMPLEMENTATION
OF THIS STANDARD.
Acknowledgments
This document would not have been possible without the efforts of the VESA Display
Compression-M Task Group. In particular, Table 1 lists the individuals and their companies
that contributed significant time and knowledge to this version of the Standard.
Revision History
1 Introduction (Informative)
1.1 Document Organization
This Standard is organized into the following sections and annexes:
• Section 1 – Introduction (Informative)
This section defines the high-level industry needs for VDC-M and the resulting technical
objectives that the remaining sections of this Standard are intended to satisfy. This section
also includes document conventions and references.
• Section 2 – Theory of Operation (Informative)
This section describes the codec operation from a high-level perspective.
• Section 3 – Syntax (Normative)
This section describes the VDC-M syntax at three levels – picture, substream, and entropy
coding group.
• Section 4 – Encoding Process (Normative)
This section describes the operations that shall be performed by a VDC-M-compliant encoder.
• Section 5 – Decoding Process (Normative)
This section describes the operations that shall be performed by a VDC-M-compliant decoder.
• Annex A – Rate Buffer Guidance (Normative)
This annex provides guidance for calculating rate buffer size-related PPS parameters.
• Annex B – Derivation of Parameters (Informative)
This annex provides the derivation of certain codec parameters.
• Annex C – Guidance for Rate Buffer Size and Delays in a Practical Implementation
(Informative)
This annex provides guidance for a practical implementation of the rate buffer, including delays
associated with the rate buffer and with substream multiplexing.
• Annex D – Main Contributor History (Previous Versions)
Figure 1-1 illustrates the high-level code flow of the encoder and decoder projects.
Figure 1-1: Encoder and Decoder Test Model High-level Operation (Left: Encoder, Right: Decoder)
Within the code, the routines EncodeSlice and DecodeSlice are responsible for most encoding
and decoding operations. Each time either of these routines is executed, local storage is allocated
for a set of coding objects that execute the per-block coding and decoding processes. Each
coding object is implemented as a class with appropriate constructor, destructor, memory
management, etc. For example, Transform mode is implemented as class TransMode, which
is described in the source files TransMode.(cpp|h). An object of type TransMode is created
in EncodeSlice and DecodeSlice. The TransMode object is used to perform all Transform
mode operations.
Table 1-1 lists the set of coding objects that are used for the encoding/decoding process.
Each coding mode class (e.g., Transform, BP, MPP, etc.) is derived from the Mode class,
which provides operations common to all modes, such as performing color space transform and
calculating distortion (error between source and reconstructed samples). The Mode class provides
the virtual routines listed in Table 1-2, which are implemented in each of the derived coding
mode classes.
1.3.2 Notations
Table 1-5: Highlight and Camel-case Rules Used throughout this Standard
Example               Description
--------------------  ------------------------------------------------------------------
rc_target_rate_scale  PPS parameter. Lowercase, underscore delimited, red, bold.
VEC_MAPPING_TABLE     C model note. Uppercase, underscore delimited, red, bold.
IntraPrediction       C function / C class. Starts with uppercase, camel-case, blue, bold.
flatnessType          C variable. Starts with lowercase, camel-case, blue.
dctShift8             C constant. Starts with lowercase, camel-case, red.
bpcTemp               C temporary variable. Starts with lowercase, camel-case, black.
Matrices are indexed in row-major order with zero-based indexing. In Figure 1-2, an 8x2 block
is displayed with index (5, 1) highlighted.
Figure 1-2: Example VDC-M 8x2 Block with Index (5, 1) Highlighted
1.3.4 Glossary
Table 1-7 lists terms that are used throughout this Standard.
VESA Intellectual Property Rights (IPR) Policy, 200D, March 27, 2017
Figure 2-1: High-level Encoding Operation
Figure 2-2: High-level Decoding Operation
(Figure labels: Source Buffer, Is RGB?, Flatness Detection, Rate Control, Entropy Coder, Substream Multiplexer, Rate Buffer, Reconstructed Buffer, QP, Rate, Reconstructed Samples, Bitstream.)
The encoder’s computational complexity is greater than that of the decoder due to process
advantages that are typically associated with the encoder. The encoder will make many
comparisons and decisions during the encoding process, the results of which will be signaled
explicitly to the decoder in the bitstream syntax. This allows the decoder to be implemented
in a relatively small design.
2.2 CSC
The YCoCg-R lossless color space transform is used for RGB source data. From this point forward,
the term “YCoCg” is used to refer to YCoCg-R. Figure 2-3 illustrates the transform matrix for this
color-space conversion (CSC).
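Forward:
\[
\begin{bmatrix} Y \\ Co \\ Cg \end{bmatrix}
=
\begin{bmatrix} 1/4 & 1/2 & 1/4 \\ 1 & 0 & -1 \\ -1/2 & 1 & -1/2 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]
Inverse:
\[
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
=
\begin{bmatrix} 1 & 1/2 & -1/2 \\ 1 & 0 & 1/2 \\ 1 & -1/2 & -1/2 \end{bmatrix}
\begin{bmatrix} Y \\ Co \\ Cg \end{bmatrix}
\]
Figure 2-3: Forward and Inverse YCoCg Transform Matrices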
This representation requires the use of one extra bit of precision for the chrominance (chroma)
components (Co and Cg). The representation of samples within the Co and Cg components
is signed, in contrast to the unsigned source data. The following example illustrates the range
of samples after CSC for 8-bpc unsigned RGB source data:
• Y component – For eight unsigned bits, the sample range is [0, 255]
• Co component – For nine signed bits, the sample range is [-255, 255]
• Cg component – For nine signed bits, the sample range is [-255, 255]
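The sample ranges above can be checked with a small integer sketch. The code below uses the commonly published lifting form of YCoCg-R, which is assumed (not confirmed by this section) to match the equation-based implementation in Table 4-2. The type and function names (Rgb, YCoCg, FloorHalf, RgbToYCoCg, YCoCgToRgb, RoundTripIsLossless8bpc) are hypothetical and are not taken from the C model.

```cpp
#include <cassert>

// Hypothetical helper types; not taken from the VDC-M C model.
struct Rgb   { int r, g, b; };
struct YCoCg { int y, co, cg; };

// floor(v / 2) for either sign, without relying on how >> treats negatives.
static int FloorHalf(int v) {
    return (v >= 0) ? (v / 2) : -((-v + 1) / 2);
}

// Forward YCoCg-R in lifting form (assumed to correspond to Table 4-2).
static YCoCg RgbToYCoCg(const Rgb& p) {
    assert(p.r >= 0 && p.g >= 0 && p.b >= 0);  // source samples are unsigned
    const int co = p.r - p.b;
    const int t  = p.b + FloorHalf(co);
    const int cg = p.g - t;
    const int y  = t + FloorHalf(cg);
    return {y, co, cg};
}

// Inverse YCoCg-R: undoes the lifting steps in reverse order.
static Rgb YCoCgToRgb(const YCoCg& p) {
    const int t = p.y - FloorHalf(p.cg);
    const int g = p.cg + t;
    const int b = t - FloorHalf(p.co);
    const int r = b + p.co;
    return {r, g, b};
}

// Samples the 8-bpc RGB cube (step 5 includes both 0 and 255) and checks that
// the round trip is exact and that Y, Co, Cg stay within the ranges listed above.
static bool RoundTripIsLossless8bpc() {
    for (int r = 0; r <= 255; r += 5) {
        for (int g = 0; g <= 255; g += 5) {
            for (int b = 0; b <= 255; b += 5) {
                const YCoCg c = RgbToYCoCg({r, g, b});
                if (c.y < 0 || c.y > 255) return false;
                if (c.co < -255 || c.co > 255) return false;
                if (c.cg < -255 || c.cg > 255) return false;
                const Rgb back = YCoCgToRgb(c);
                if (back.r != r || back.g != g || back.b != b) return false;
            }
        }
    }
    return true;
}
```

Because each lifting step is undone exactly by its inverse, the transform is lossless even though the matrix form contains fractional coefficients.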
Figure 2-4: Picture Element Hierarchy (a picture is divided into slices; each slice is divided into blocks)
Slices are independent, and may be processed in parallel. In this case, slice multiplexing is used
to combine data from multiple slices into the bitstream. Each slice is composed of an integer
number of non-overlapping 8x2 pixel blocks.
The pixels within each block are defined by an unsigned triplet of values, each with a fixed
number of bits. For example, if the source content is 4:4:4 RGB at 10 bpc, each pixel contains
a 10-bit unsigned sample for each component.
The following sub-sections describe blocks, slices, and pictures in further detail.
Figure 2-5: Blocks within a Slice are Processed in Block-raster Order (with N blocks per blockline, the second blockline holds blocks N through 2N – 1, and the third holds blocks 2N through 3N – 1)
Figure 2-6: 1080p Frame with sliceHeight = 108, (Left) 2 Slices/line, (Right) 4 Slices/line
A distinction is made between the first and non-first blocklines of a slice (FBLS and NFBLS,
respectively) because only NFBLS blocks can use the previous reconstructed line for prediction.
For this reason, more bit rate will be assigned to FBLS blocks to compensate for the reduction
in valid predictors. (See Section 2.8 for further details.) In addition, because this codec uses an
8x2 block size, both raster lines that intersect the current block are denoted as the current blockline.
Therefore, the FBLS is the first blockline, as illustrated in Figure 2-7.
Figure 2-7: Example Slice Demonstrating Blocklines within FBLS and NFBLS
Coding performance is better for larger slices than smaller slices. At minimum, each slice should
contain approximately 30k pixels. For example, the following slice configurations meet the
minimum requirement for a 1920x1080 picture:
• Landscape orientation (1920x1080), 2 slices/line – Slice size 960x32
• Landscape orientation (1920x1080), 4 slices/line – Slice size 480x64
• Portrait orientation (1080x1920), 1 slice/line – Slice size 1080x32
A larger slice height can be specified to further improve performance. When possible, a sliceHeight
of 108 is recommended to maximize performance.
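The slice sizes listed above can be verified with a short calculation. This is a minimal sketch: the names SlicePixels, MeetsMinimum, and kMinSlicePixels are hypothetical, and the threshold of 30000 is only an assumed reading of "approximately 30k pixels".

```cpp
#include <cassert>

// Assumed numeric reading of "approximately 30k pixels"; not a normative value.
constexpr long kMinSlicePixels = 30000;

// Number of pixels in one slice when the picture width is split evenly across
// slicesPerLine. Assumes the width divides evenly, so no padding is involved.
static long SlicePixels(int pictureWidth, int slicesPerLine, int sliceHeight) {
    assert(slicesPerLine > 0 && pictureWidth % slicesPerLine == 0);
    return static_cast<long>(pictureWidth / slicesPerLine) * sliceHeight;
}

// True if the slice configuration reaches the assumed minimum pixel count.
static bool MeetsMinimum(int pictureWidth, int slicesPerLine, int sliceHeight) {
    return SlicePixels(pictureWidth, slicesPerLine, sliceHeight) >= kMinSlicePixels;
}
```

Under these assumptions, the 960x32 and 480x64 configurations each contain 30720 pixels, the 1080x32 configuration contains 34560 pixels, and a 960x108 slice (sliceHeight = 108) contains 103680 pixels.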
The horizontal and vertical padding steps are handled separately, as described in the sub-sections
that follow. Padding itself is accomplished by pixel replication:
• Horizontal padding – Last valid source column will be replicated
• Vertical padding – Last valid source row will be replicated
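Figure 2-8: Example Horizontal Slice Padding Using Pixel Replication
Figure 2-9: Example Slice of Width 1920 with (Top) slicesPerLine = 1, (Bottom) slicesPerLine = 2; Horizontal Slice Padding Is Not Required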
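If the source width is not divisible by (8 × slicesPerLine), horizontal padding will be identically
added to each slice, as illustrated in Figure 2-10 for a source width of 1000:
• If slicesPerLine = 2, each initial slice width is 1000 / 2 = 500, which is padded to
8 × ceil(500 / 8) = 504. Each slice receives 4 padding columns, and the padded width is 1008.
• If slicesPerLine = 4, each initial slice width is 1000 / 4 = 250, which is padded to
8 × ceil(250 / 8) = 256. Each slice receives 6 padding columns, and the padded width is 1024.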
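The padded slice width in the example for a source width of 1000 can be sketched as follows. The function name padded_slice_width is hypothetical, and the sketch assumes that the source width divides evenly by slicesPerLine, as it does in the example; other cases are not covered here.

```c
#include <assert.h>

/* Width of each slice after horizontal padding: the initial slice width
 * (source_width / slices_per_line) is rounded up to a multiple of 8. */
static int padded_slice_width(int source_width, int slices_per_line)
{
    assert(source_width > 0 && slices_per_line > 0);
    /* Assumption of this sketch: the source splits evenly into slices. */
    assert(source_width % slices_per_line == 0);
    int initial = source_width / slices_per_line;
    return 8 * ((initial + 7) / 8);
}
```

With a source width of 1000, this gives 504 for two slices per line and 256 for four slices per line. A width that is already aligned, such as 1920 with two slices per line, is returned unchanged as 960.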
2.5 Quantization
This codec uses quantization to achieve the fixed-rate compression goal. The quantization is
controlled by a quantization parameter (QP) that is maintained by the rate control algorithm at both
the encoder and decoder. The rate control algorithm will typically lower the QP for “easy” or “flat”
content, and increase the QP for “difficult” or “complex” content. (For further details regarding QP,
see Section 4.5.3.) This codec uses two different quantization methods, as described in
Section 4.8.1 and Section 4.8.2.
For each mode, the encoder determines a rate (i.e., the total of all syntax bits needed to transmit
the block using the coding mode) and a distortion. The rate and distortion are used to calculate an
RD cost. Next, the encoder selects the mode that minimizes the RD cost, subject to rate control
constraints that ensure the rate buffer does not underflow or overflow. The selected mode is
transmitted in the bitstream, and the encoder repeats the process for the next block.
Each coding mode is described in the subsections that follow.
The intra predictor will use a fixed intra mode for all FBLS blocks. For NFBLS blocks, eight intra
modes are available. In this case, the intra prediction block will calculate all eight intra predictors,
and then calculate the sum of absolute differences (SAD) between each intra predictor and the
original block. The four intra modes with the lowest distortion will be selected and tested during
the remainder of the mode operation.
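The NFBLS candidate selection described above can be sketched in C as follows. This is an illustration for a single 8x2 component only; the names block_sad and select_intra_candidates, the flat array layout, and the tie-breaking rule (lower mode index wins) are assumptions rather than part of this Standard.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCK_SAMPLES 16  /* one 8x2 component */
#define NUM_INTRA     8   /* intra modes available for NFBLS blocks */
#define NUM_KEEP      4   /* candidates kept for further testing */

/* Sum of absolute differences between two blocks of samples. */
static int block_sad(const int *a, const int *b)
{
    int sad = 0;
    for (int i = 0; i < BLOCK_SAMPLES; i++)
        sad += abs(a[i] - b[i]);
    return sad;
}

/* Writes the indices of the NUM_KEEP predictors with the lowest SAD into
 * keep[], in order of increasing SAD. Ties go to the lower mode index
 * (an assumption of this sketch). */
static void select_intra_candidates(const int *src,
                                    int pred[NUM_INTRA][BLOCK_SAMPLES],
                                    int keep[NUM_KEEP])
{
    int sad[NUM_INTRA];
    int used[NUM_INTRA] = {0};

    for (int m = 0; m < NUM_INTRA; m++)
        sad[m] = block_sad(src, pred[m]);

    for (int k = 0; k < NUM_KEEP; k++) {
        int best = -1;
        for (int m = 0; m < NUM_INTRA; m++)
            if (!used[m] && (best < 0 || sad[m] < sad[best]))
                best = m;
        assert(best >= 0);
        used[best] = 1;
        keep[k] = best;
    }
}
```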
For a given intra predictor, the residual block R(i, j) is calculated. This is the difference between
the source block X(i, j) and the intra predicted block P(i, j). The discrete cosine transform (DCT)
is applied to the residual block, resulting in a block of transform coefficients T(i, j). The transform
coefficients are then quantized to produce a quantized transform coefficient block Tq(i, j). These
quantized transform coefficients are transmitted in the bitstream, embedded in entropy coding
groups (ECGs). The size of these ECGs determines the rate for the transform block. Finally, inverse
quantization T̂(i, j) = Q⁻¹[Tq(i, j)] and inverse transform R̂(i, j) = DCT⁻¹[T̂(i, j)] are applied such
that the distortion (SAD) can be calculated between the residual and reconstructed residual blocks
(R(i, j) and R̂(i, j), respectively). The RD cost information is calculated from the rate and distortion.
Figure 2-12 illustrates the overall flow of Transform mode from the decoder side. The intra
prediction mode is parsed from the bitstream.
2.6.2 BP Mode
In Block Prediction (BP) mode, the current block is spatially predicted from a set of reconstructed
neighboring samples, referred to as the “Block Prediction Vector (BPV) search range.” The BPV
search range consists of 64 valid positions. The encoder is responsible for testing all BPV search
range positions for each sub-block and partition type to find the best match.
Before prediction, the current block is partitioned into four 2x2 sub-blocks. Figure 2-13 illustrates
the BP encoding steps.
Each sub-block is predicted from the BPV search range using either a 2x2 or 2x1 partition. In the
former case, a single block prediction vector (BPV) from the search range is used to generate
the prediction for the 2x2 sub-block. In the case that the 2x1 partition is selected, the sub-block
will be represented by two different BPVs. The first BPV generates a 2x1 predicted block for the
upper two samples within the sub-block, while the second BPV generates a 2x1 predicted block
for the lower two samples. Figure 2-14 illustrates an example of this for the first two sub-blocks
of a given NFBLS block, where P represents a partition and SB represents a sub-block. The top
illustration shows the 2x2 predicted partitions PA0 and PA1. The bottom illustration shows the
2x1 predicted partitions PB0 through PB3.
Figure 2-14: Example Result of BPV Search for the First Two Sub-blocks
of a Block that Are Not within the Slice’s First Blockline
The encoder performs a search to find the BPV which minimizes the prediction residuals for
all 2x2 and 2x1 partitions for each 2x2 sub-block of the current block. The search operation is
independent between the two partition types, and between the four sub-blocks within the current
block. Ultimately, the encoder will independently select a partition type for each of the sub-blocks.
The BPV search result is a set of BPVs and a predicted block P(i, j) for each partition type
for each sub-block. Next, the residual is calculated as R(i, j) = X(i, j) – P(i, j). Because there
are two options for the partition type, two residual sub-blocks are calculated, as follows:
1 One that is associated with the 2x2 partitions
2 One that is associated with the 2x1 partitions
The following steps are performed in parallel for each residual block:
1 Forward quantization is performed on all residual samples, and the quantized residuals Rq(i, j)
are used to calculate the entropy coding cost of each 2x2 sub-block.
2 Inverse quantization is performed to obtain the reconstructed residuals R̂(i, j) from which each
sub-block’s distortion is calculated.
3 For each 2x2 sub-block, the encoder selects between the 2x2 and 2x1 partitions, based on the
rate/distortion trade-off.
For RGB source content, BP is calculated in the YCoCg color space. If the source content
is YCbCr, BP will be calculated natively in the YCbCr color space.
BP operation from the decoder side is less complicated than BP operation from the encoder side,
as illustrated in Figure 2-15. The quantized residuals are obtained from the bitstream through the
entropy decoder, while the BPV values and partition structure are parsed directly. The BPV search
range is identical between the encoder and decoder, and consists of reconstructed samples that are
causally available.
The partition structure and the BPVs are used to generate the predicted block P(i, j), while the
quantized residuals are inverse quantized to obtain the reconstructed residuals R̂(i, j). Finally,
these two are added together to generate the reconstructed block, which is subject to CSC
if necessary.
Figure 2-17 illustrates the decoder operation for MPP mode. The quantized residuals are parsed
directly from the bitstream, and then inverse quantized using a scalar quantizer. These values are
then added to the midpoint predictor, and CSC is performed if required.
These fallback modes are selected only when all three non-fallback modes are too expensive to be
selected by the rate control mechanism. This may occur, for example, if an entire slice is complex
(e.g., white noise is present in all three color components). The combination of rate control and the
design of the encoder mode selection ensures that buffer underflow does not occur when a fallback
mode is selected. Furthermore, the fallback modes have rates that are strictly less than the average
block rate, which ensures that the rate buffer will not overflow.
Both fallback modes are discussed in the sub-sections that follow.
If none of these flatness types are detected, the current block will not have flatness information.
Rate control uses the flatness type to update the QP value. For example, if the current block is
detected as a complex-to-flat transition (see Figure 2-18), the QP should be immediately decreased
to avoid creating artifacts within the flat portion of the block. Likewise, the QP can be increased for
a flat-to-complex transition.
The flatness type is signaled explicitly in the bitstream as part of the mode header so that the
decoder does not need to perform the flatness detection steps.
2.8 RC
The rate control (RC) algorithm is responsible for determining the QP for each block time. The
QP is calculated implicitly and identically by both the encoder and decoder based on the RC state
(which must match). Because of this, the QP is not signaled in the bitstream syntax.
In general, RC aims to set a low QP value for easy/flat content and a high QP value for difficult/
complex content. This procedure will maximize image quality, because artifacts are most apparent
in flat regions, while complex regions provide visual masking, making the artifacts less perceptible.
The QP value for each block is determined by the following factors:
• Previous block QP
• Target bit rate
• Number of bits used to code the previous block
• Flatness information
The RC model incorporates a buffer model (the rate buffer) that is present in both the encoder
and decoder. From the encoder perspective, bits are placed into the rate buffer from the substream
multiplexer balance FIFOs. (See Section 2.10.) For constant bit rate (CBR) codec operation, bits
are removed from the rate buffer at a constant rate and placed into the bitstream for transmission.
The encoder must ensure that this rate buffer never underflows or overflows. From the decoder
perspective, bits enter the rate buffer from the bitstream at a constant rate and feed the substream
de-multiplexer’s funnel shifters. (See Section 2.10. For further details regarding the rate buffer
and associated delays, see Annex A.)
Figure 2-19 illustrates the RC model flow. This model is used by both the encoder and decoder
to implicitly derive the QP used by the block. Flatness information, calculated by the encoder,
is signaled explicitly in the bitstream so that both the encoder and decoder use the same flatness
type information for operating the RC model.
In addition to setting the QP, the RC model is responsible for ensuring that the rate buffer never
underflows or overflows. During each block time, a fixed number of bits are removed from the rate
buffer, while a variable number of bits are added depending on the image content and the mode
selected. The rate controller has both long- and short-term mechanisms that are used to prevent
rate buffer underflow and overflow:
• QP is a long-term mechanism that is maintained by RC. The QP is updated during each
block time. To avoid overflow, a high QP is selected if the rate buffer is almost full.
To avoid underflow, a low QP is selected if the rate buffer is almost empty.
• underflowPrevention mode is a short-term mechanism that is used to prevent rate buffer
underflow. This mode is enabled if the rate buffer is sufficiently close to being empty,
and will pad bits into the syntax to ensure that Transform and BP modes have rates that are
strictly greater than avgBlockBits. This will ensure that the rate buffer cannot underflow.
• Presence of the MPPF and BP-SKIP fallback modes is a short-term mechanism that is used to
prevent rate buffer overflow. By design, these modes have a rate that is less than avgBlockBits,
thereby ensuring that the rate buffer will not overflow if they are selected.
The lambda in Figure 2-19 is a Lagrangian parameter that is associated with the RD cost
calculation. (For further details, see Section 4.5.4.1.)
One final step performed by RC is to ensure a minimum number of bits for each block within a slice
(minBlockBits), which is determined by the compressed bit rate. The encoder mode selection will
disable any mode for the current block that causes the available rate per block for the remainder
of the slice to fall below minBlockBits.
The other three modes use the following mechanisms for signaling quantized residuals:
• MPP and MPPF modes – All quantized residuals are coded with fixed-length codes
• BP-SKIP mode – No quantized residuals are present in the syntax
In the entropy coder, groups of samples are combined and then transmitted using a common prefix
to allow for efficient decoding. An entropy-coded group of samples is referred to as an “entropy
coding group” (ECG), and each component will be represented by one or more ECGs. Table 2-2
lists the ECGs that are used by Transform and BP modes.
Values within the group are either in the sign/magnitude (SM) or 2’s complement (2C) bit
representation. For a given ECG, the maximum number of bits for all samples within the group
is determined (bitsReq). Note that for SM bit representation, the sign bits will also be signaled
in the bitstream for any sample that has a non-zero value.
For example, suppose that a group of four samples is provided, such as [-8, 0, 14, 5]. Table 2-3 lists
the number of bits that are required for each of those samples. The required bitsReq for the group is
equal to the maximum number of bits required for all the samples within the group. In this example,
bitsReq is 4 bits in the SM bit representation and 5 bits in the 2C bit representation.
Table 2-3: Bits Required for Example Group of Samples that Use
SM and 2C Bit Representations
Sample    Number of Bits Required (SM)    Number of Bits Required (2C)
-8        4                               4
0         1                               1
14        4                               5
5         3                               4
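The bitsReq calculation for the example group can be sketched in C as follows. The function names are hypothetical. Counting a zero-valued sample as one bit in the SM representation is done only to match Table 2-3, and is an assumption about the general rule.

```c
#include <assert.h>

/* Number of bits needed to represent the non-negative value v
 * (returns 0 for v == 0). */
static int bit_length(unsigned v)
{
    int n = 0;
    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Sign/magnitude: bits for the magnitude only (the sign bit is signaled
 * separately). A zero sample counts as 1 bit, matching Table 2-3. */
static int bits_sm(int v)
{
    int n = bit_length((unsigned)(v < 0 ? -v : v));
    return n > 0 ? n : 1;
}

/* 2's complement: magnitude bits plus one sign bit. */
static int bits_2c(int v)
{
    return bit_length((unsigned)(v < 0 ? -(v + 1) : v)) + 1;
}

/* bitsReq for a group: the maximum over all samples in the group. */
static int bits_req(const int *samples, int count, int use_2c)
{
    assert(count > 0);
    int max = 0;
    for (int i = 0; i < count; i++) {
        int n = use_2c ? bits_2c(samples[i]) : bits_sm(samples[i]);
        if (n > max)
            max = n;
    }
    return max;
}
```

For the group [-8, 0, 14, 5], bits_req returns 4 for the SM representation and 5 for the 2C representation, in agreement with the example.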
The entropy coder determines the ECGs for each component. The size of each ECG will depend
on the mode, color component, and chroma sampling format. For example, BP mode uses a
uniform ECG for each component, as follows:
• For 4:4:4, the 16 samples in each color component will be distributed among four ECGs
of four samples each.
• For 4:2:2, the luma component (16 samples) will be distributed among four ECGs
of four samples each. The chroma components (eight samples) will be distributed
among two ECGs of four samples each.
• For 4:2:0, the luma component (16 samples) will be distributed among four ECGs
of four samples each. The chroma components (four samples) will be distributed among
one ECG each.
The distribution of samples within ECGs is non-uniform for Transform mode due to the varying
frequency content that is represented by each transform coefficient. (See Section 4.6.1.5.)
where:
• A tuple (sx, cy) provides the slice and chunk indices, respectively
• N provides the number of chunks per slice
The SSM multiplexer at the encoder includes a model of the de-multiplexer such that each mux
word is placed into the bitstream in the correct order.
An additional delay of one block time exists between Substream 0 and the other three substreams.
This is referred to as ssmSkew, and is done such that the decoder will receive mode-specific
information one block time before the data for each block.
3 Syntax (Normative)
This section describes the VDC-M syntax at three levels – picture, substream, and entropy
coding group.
3.1 PPS
C source code shall always be trusted when in conflict with the content of this Standard.
The picture parameter set (PPS) has a total byte length of 128 bytes. This PPS shall be transmitted
from the encoder to the decoder, such that both can produce the same model state. The mechanism
for transmitting the PPS is not normative, and may for example be outside the link.
Table 3-1 lists the syntax and size of each PPS field. RESERVED fields shall be given an integer
number that represents the total number of bits, without a designation of the type, and filled with 0s.
Any field listed as “Nu” is an unsigned N-bit integer. For example, the valid range for 7u is [0, 127].
Finally, the version_release field is a one-byte ASCII character within a specified range,
as follows:
• 0x00 – Null (e.g., v1.0 as opposed to v1.0a)
• 0x61 through 0x7a – Lowercase characters (a through z)
Table 3-1 also lists the address of each PPS field in terms of byte and doubleword addresses
(Bx and DWx, respectively). A doubleword contains four bytes; therefore, the total PPS size of
128 bytes corresponds to 32 doublewords. The bytes within a doubleword shall be in big-endian
order (i.e., the first byte in the bitstream shall be DW0[31:24] – the most significant byte).
For a PPS field that is greater than eight bits in size, the value shall be in big-endian order.
For example:
chunk_size = (B22 × 256) + B23
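The byte ordering above can be illustrated with a short C sketch. The function names pps_read_u16 and pps_read_dw are hypothetical; only the 128-byte PPS size, the big-endian ordering, and the chunk_size example at bytes B22 and B23 come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PPS_SIZE_BYTES 128

/* Reads an unsigned 16-bit big-endian field that starts at byte offset
 * `byte`, e.g. chunk_size = (B22 * 256) + B23 for byte == 22. */
static unsigned pps_read_u16(const uint8_t *pps, size_t byte)
{
    assert(byte + 1 < PPS_SIZE_BYTES);
    return ((unsigned)pps[byte] << 8) | pps[byte + 1];
}

/* Returns doubleword `dw`; the first byte in the bitstream lands in
 * bits [31:24], the most significant byte. */
static uint32_t pps_read_dw(const uint8_t *pps, size_t dw)
{
    assert(dw < PPS_SIZE_BYTES / 4);
    const uint8_t *p = pps + 4 * dw;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

With B22 = 0x02 and B23 = 0xD0, pps_read_u16(pps, 22) yields 720.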
Parameters listed are either independent or dependent. Independent parameters include properties
of the source picture, encoder/decoder settings, LUTs, and tuning. These must be determined first.
Dependent parameters are configured based on the independent parameters. For example, many
of the PPS fields that relate to rate control are configured as a function of the slice size, compressed
bit rate, etc.
where:
• bppFractionalBits = 4
A chunk contains one line worth of data, and must be an integer number of bytes. Rounding of bits
up to the nearest byte is accomplished by ( (bits + 7) >> 3).
Table 3-2 defines the chunk order within the bitstream. Note that each slice will contain two chunks
for each blockline.
Table 3-2: Frame-level Syntax Fields (Constant Bit Rate Mode)
Field                                                                  Size (Bits)   Format
for (slice_y = 0; slice_y < frame_height; slice_y += slice_height) {
  for (blockline = 0; blockline < (slice_height / 2); blockline++) {
    for (slice_x = 0; slice_x < frame_width; slice_x += slice_width) {
      first chunk from blockline of slice (slice_y, slice_x)           chunk_size    See Section 3.3
    }
    for (slice_x = 0; slice_x < frame_width; slice_x += slice_width) {
      second chunk from blockline of slice (slice_y, slice_x)          chunk_size    See Section 3.3
    }
  }
}
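The chunk order defined by Table 3-2 can be enumerated with the following C sketch. The ChunkPos type and the chunk_order function are hypothetical names introduced for illustration; the loop nesting follows the table.

```c
#include <assert.h>

typedef struct {
    int slice_y;    /* top row of the slice within the frame */
    int slice_x;    /* left column of the slice within the frame */
    int blockline;  /* blockline index within the slice */
    int half;       /* 0 = first chunk, 1 = second chunk of the blockline */
} ChunkPos;

/* Fills out[] in bitstream order and returns the number of entries
 * written. Stops early once max_out entries have been written. */
static int chunk_order(int frame_width, int frame_height,
                       int slice_width, int slice_height,
                       ChunkPos *out, int max_out)
{
    assert(slice_width > 0 && slice_height > 0 && slice_height % 2 == 0);
    int n = 0;
    for (int sy = 0; sy < frame_height; sy += slice_height)
        for (int bl = 0; bl < slice_height / 2; bl++)
            for (int half = 0; half < 2; half++)
                for (int sx = 0; sx < frame_width; sx += slice_width) {
                    if (n == max_out)
                        return n;
                    out[n++] = (ChunkPos){ sy, sx, bl, half };
                }
    return n;
}
```

For a 16x8 frame with 8x4 slices, this yields 16 chunks. The first chunks of both slices in a row are emitted before the second chunks of the same blockline.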
4.1 Overview
C source code shall always be trusted when in conflict with the content of this Standard.
The encoder shall perform the encoding process, as illustrated in Figure 4-1, once per block time
within a slice, as follows:
1 Flatness detection shall be updated, based on source data from the current, previous, and next
blocks. The result of flatness detection is an updated flatness type for the current block.
2 Rate controller’s state shall be updated, which shall result in an updated buffer fullness value,
quantization parameter (QP), and target bit rate (targetRate).
3 Using the updated QP value, all available coding modes shall be tested. In the context of this
Standard, testing refers to the encoder’s simulation of the mode in which the reconstructed
block is calculated.
4 Number of required bits (modeRate) and error (modeDistortion) shall be calculated.
Additionally, the RD cost (modeRdCost) shall be computed from the rate and distortion,
as described in Section 4.9.
5 Encoder mode selection block shall select the best mode, based on several criteria.
(See Section 4.9.)
6 Selected mode shall be encoded.
7 Steps 1 through 6 shall be repeated for the next block.
Figure 4-2: Block Component Sizes for Different Chroma Sampling Formats
Table 4-1 defines variables that are used throughout this Standard to describe operations without
having to create special cases for the different chroma sampling formats.
Table 4-1: Component Width and Height Parameters for Different Chroma Sampling Formats
Component Size    compSamples    compSamplesLog2    compWidthLog2    compHeightLog2
8x2               16             4                  3                1
4x2               8              3                  2                1
4x1               4              2                  2                0
For example, the average of all samples in a component could be written as follows:
X̄ = ( (∑(i, j) ∈ X X(i, j) ) + (1 << (compSamplesLog2 – 1) ) ) >> compSamplesLog2
Bracket notation is used to obtain the value from Table 4-1 for a specific component. The variable k
is typically used to index between the three color components. For example, k == 0 will index
the red component in RGB, or the luma component in YCbCr/YCoCg. Thus, the three color
components of a block that uses 4:2:2 chroma sampling are determined, as follows:
• compSamples[0] = 16
• compSamples[1] = 8
• compSamples[2] = 8
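The rounded average defined above can be written in C as follows. This is a minimal sketch: the name component_mean is hypothetical, the samples are assumed to be stored in a flat array, and they are assumed to be non-negative so that the right shift is well defined.

```c
#include <assert.h>

/* Rounded mean of the samples in one component, using the
 * compSamplesLog2 value from Table 4-1 (4 for 8x2, 3 for 4x2, 2 for 4x1).
 * Assumption of this sketch: all samples are non-negative. */
static int component_mean(const int *x, int comp_samples_log2)
{
    assert(comp_samples_log2 >= 1);
    int sum = 0;
    for (int i = 0; i < (1 << comp_samples_log2); i++)
        sum += x[i];
    return (sum + (1 << (comp_samples_log2 - 1))) >> comp_samples_log2;
}
```

For a 4x2 chroma component holding the values 1 through 8, the sum is 36 and the rounded mean is (36 + 4) >> 3 = 5.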
4.3 CSC
A lossless color-space conversion (CSC) is used to convert source content in the RGB color space
to YCoCg. The YCoCg representation requires the use of one extra bit of precision for the
chrominance (chroma) components (Co and Cg). The bit representation of the chroma components
shall be signed. Therefore, if the source content is RGB 8bpc, the luminance (luma) component
of YCoCg will be represented by eight unsigned bits, while the Co and Cg components will be
represented by nine signed bits. Table 4-2 lists implementations of forward and inverse transforms.
In the case of RGB source content, CSC is used by flatness detection as well as by each individual
coding mode. If the source content is YCbCr, this step is skipped, and the codec natively handles
the YCbCr data. The reconstructed frame shall be stored in the same color space as the
source content.
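As an illustration of a lossless RGB to YCoCg conversion with one extra signed bit for chroma, the following C sketch uses the widely known YCoCg-R lifting steps. Whether these steps match Table 4-2 exactly is an assumption, because the table is not reproduced here. The function names are hypothetical.

```c
#include <assert.h>

/* floor(v / 2) without relying on right-shifting a negative value. */
static int floor_half(int v)
{
    return v >= 0 ? (v >> 1) : -((-v + 1) >> 1);
}

/* Forward YCoCg-R lifting (assumed form of the forward transform). */
static void rgb_to_ycocg(int r, int g, int b, int *y, int *co, int *cg)
{
    int temp;
    *co = r - b;
    temp = b + floor_half(*co);
    *cg = g - temp;
    *y = temp + floor_half(*cg);
}

/* Inverse lifting: undoes the forward steps in reverse order. */
static void ycocg_to_rgb(int y, int co, int cg, int *r, int *g, int *b)
{
    int temp = y - floor_half(cg);
    *g = cg + temp;
    *b = temp - floor_half(co);
    *r = co + *b;
}
```

For 8bpc input, Y stays within [0, 255] while Co and Cg fall within [-255, 255], which is why the chroma components need nine signed bits. Because each lifting step is undone exactly, the round trip reproduces the original RGB values.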
The flatness type is signaled explicitly in the bitstream as part of the flatness header. (See
Section 4.10.) For this reason, flatness detection shall not be performed at the decoder side.
bpcInput = bpc + 1, if the component is Co or Cg
bpcInput = bpc, otherwise
bpcTemp = hadPrecision + compWidthLog2 + bpcInput
hadShiftA = max (0, bpcTemp – 16)
bpcTemp = min (bpcTemp, 16)
hadShiftB = compHeightLog2 + bpcTemp – 16
hadTotalShift = hadShiftA + hadShiftB
Figure 4-3 illustrates the 8-point forward Hadamard transform. This transform shall be applied
to the rows of each component that has a width of eight samples. This shall be the case for all
components if 4:4:4 chroma subsampling is used. For 4:2:2 and 4:2:0 content, the 8-point transform
shall be applied to the luma component, while the 4-point transform illustrated in Figure 4-4 shall
be applied to the two chroma components.
The second transform pass is a Haar transform (equivalent to a 2-point forward Hadamard
transform), which is applied to the block’s columns. This step shall be skipped for the chroma
components of 4:2:0 source data. Finally, the Hadamard transform coefficients are normalized,
as described in Table 4-5.
4.4.2 Complexity
The complexity value of each block is calculated from the sum of the absolute value of the
normalized Hadamard transform coefficients. This sum shall exclude the DC coefficient
(HAD(0, 0)). If the input bit depth is greater than 8bpc, an additional scaling is performed.
(See Table 4-6.)
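The basic sum described above can be sketched in C for one component. The name component_complexity and the array layout (DC coefficient first) are assumptions. The additional scaling for bit depths above 8bpc (Table 4-6) and any combination across components are not shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum of absolute values of the normalized Hadamard coefficients of one
 * component, excluding the DC coefficient. Layout assumption: coeff[0]
 * holds HAD(0, 0). */
static int component_complexity(const int *coeff, int count)
{
    assert(count >= 1);
    int sum = 0;
    for (int i = 1; i < count; i++)
        sum += abs(coeff[i]);
    return sum;
}
```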
4.4.3 Classification
The current block’s flatness is classified into different types, based on the previous, current,
and next blocks’ complexity values. These are denoted as complexityPrev, complexityCur,
complexityNext. The classification is performed as follows:
1 isPrevBlockComplex and isNextBlockFlat values are calculated, as described in Table 4-7
and Table 4-8. Parameter maxLineComplexity represents the maximum complexity of
a block within the current blockline, which is updated during each block time, using the
following logic:
maxLineComplexity = complexityCur, for the first block in the blockline
maxLineComplexity = max (maxLineComplexity, complexityCur), otherwise
The maxLineComplexity calculation is done such that the block complexity thresholds can
adapt to changing content throughout a slice.
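The maxLineComplexity update can be expressed as a small C function. The function name and the flag argument are hypothetical; the logic follows the two cases given above.

```c
#include <assert.h>

/* Returns the updated maxLineComplexity for the current block time.
 * The running maximum restarts at the first block of each blockline. */
static int update_max_line_complexity(int max_line_complexity,
                                      int complexity_cur,
                                      int first_block_in_blockline)
{
    assert(complexity_cur >= 0);
    if (first_block_in_blockline)
        return complexity_cur;
    return complexity_cur > max_line_complexity ? complexity_cur
                                                : max_line_complexity;
}
```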
    isPrevBlockComplex =
        (complexityPrev > 90),   if complexityCur ≤ 50
        false,                   otherwise
} else if (last block in blockline) {
    isPrevBlockComplex = (complexityCur ≤ 50)
} else {
    if (chroma format is 444 or 422) {
        threshold = (maxLineComplexity >> 1) + (maxLineComplexity >> 3)
        isPrevBlockComplex =
            true,                           if complexityNext ≤ 6
            (complexityPrev > threshold),   if complexityNext ≤ 25
            false,                          otherwise
    } else if (chroma format is 420) {
        isPrevBlockComplex =
            true,    if complexityNext ≤ 20 && complexityPrev > 40
            false,   otherwise
    }
}
a. complexityPrev is 0 for the first block within a slice.
isNextBlockFlat =
    true,    if conditionA && conditionB
    false,   otherwise
}
4.5 RC
The rate control algorithm shall perform a sequence of operations for each block time
to ensure the following:
• Number of bits that are used to code a slice is bounded, as follows:
• sliceBits <= (8 × chunk_size × slice_height)
• Rate buffer does not underflow or overflow
• At least one coding mode is available for mode selection, for each block
• Number of bits remaining in the rate buffer at the end of a slice is less than a specified threshold
This Standard discusses the ideal rate buffer. A hardware implementation may require an increased
rate buffer size, however, for proper operation.
At the beginning of each block time, the rate control algorithm shall update several quantities that
will be used by the remainder of operations during that block time, as follows:
1 Rate buffer fullness (bufferFullness, rcFullness) is updated, based on the number of bits that
are used to code the previous block, as well as the number of bits transmitted to the bitstream.
(For further details, see Section 4.5.1.)
2 targetRate for the current block time is calculated, as described in Section 4.5.2.
3 QP value (qp) is updated. (See Section 4.5.3.) This QP will be used by each of the coding
modes during the mode testing phase.
After all modes have been tested, and an RD cost has been calculated for each mode, the encoder
shall select a mode from the available coding modes that minimizes the RD cost while enforcing
proper rate buffer operation. Any mode that violates the rate buffer constraints will be disabled
for the current block time. Finally, the mode selected by rate control shall be encoded, and the
corresponding rate (modeRate) shall be used to update the buffer fullness, target rate, and QP for
the next block time.
Figure 4-5 illustrates the loop that is used for updating the buffer fullness, target rate, and QP.
Each of these loops runs once per block period. The flatness detection algorithm (see Section 4.4)
provides a flatnessType to the QP update logic.
4.5.1 Rate BF
The rate controller shall maintain two measures of rate buffer fullness (BF), as described
in Table 4-10.
4.5.1.1 BF
The physical rate buffer’s fullness (bufferFullness) shall be updated at the beginning of each block
time. First, bufferFullness shall be incremented by the number of bits that are used to code the
previous block (modeRate). Next, bits shall be removed from the rate buffer for any block time
after the initial transmission delay, which is determined by PPS parameter rc_init_tx_delay
(measured in block times).
After the initial transmission delay, avgBlockBits bits shall be removed from the rate buffer,
and then placed into the bitstream for each block time. The parameter avgBlockBits gives a block’s
average bit rate, which is calculated from PPS parameter bits_per_pixel, as follows:
avgBlockBits = bits_per_pixel
Because PPS parameter bits_per_pixel is represented with four fractional bits of precision,
and there are 16 pixels in an 8x2 block, avgBlockBits shall always have an integer value.
For the last block time during which a specific chunk is being filled, additional bits may be
removed from the rate buffer and placed into the chunk to byte-align the chunk. The total
number of alignment bits for one blockline (two chunks) shall be calculated, as follows:
chunk_adj_bits = (16 × chunk_size) –
( ( (slice_width << 1) × bits_per_pixel) >> bppFractionalBits)
where blChunkAdjBits has a minimum and maximum value of 0 and 15 bits, respectively.
For an individual chunk, the total adjustment bits will be calculated, as follows:
chunkAdjBits = (blChunkAdjBits + 1) >> 1, for an even chunk (0, 2, 4, …)
chunkAdjBits = blChunkAdjBits >> 1, for an odd chunk (1, 3, 5, …)
where:
• chunkAdjBitsMax = (blChunkAdjBits + 1) >> 1
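The alignment-bit arithmetic above can be sketched in C as follows. The function names are hypothetical. chunk_size is taken as an input in bytes, and bits_per_pixel is assumed to carry four fractional bits, as stated for the PPS parameter.

```c
#include <assert.h>

#define BPP_FRACTIONAL_BITS 4

/* Total alignment bits for one blockline (two chunks). */
static int blockline_adj_bits(int chunk_size, int slice_width,
                              int bits_per_pixel)
{
    int adj = (16 * chunk_size) -
              (((slice_width << 1) * bits_per_pixel) >> BPP_FRACTIONAL_BITS);
    assert(adj >= 0 && adj <= 15);  /* range stated in the text */
    return adj;
}

/* Share of the blockline alignment bits assigned to one chunk:
 * even chunks round up, odd chunks round down. */
static int chunk_adj_bits(int bl_adj_bits, int chunk_index)
{
    return (chunk_index % 2 == 0) ? (bl_adj_bits + 1) >> 1
                                  : bl_adj_bits >> 1;
}
```

As a worked case with illustrative values, chunk_size = 7, slice_width = 8, and bits_per_pixel = 101 give 112 – 101 = 11 alignment bits per blockline, split into 6 bits for the even chunk and 5 bits for the odd chunk.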
If the underflowPrevention flag is set, the two mechanisms described in Table 4-11 shall be used
to ensure that modeRate is strictly greater than avgBlockBits for the current block time for all
non-fallback modes, which is sufficient to ensure that at least one coding mode is available to the
encoder mode-selection algorithm. Note that fallback modes shall not be used in this situation
because the fallback mode rates will fall below avgBlockBits.
4.5.1.2 RC Fullness
The rate controller maintains a second representation of the rate buffer fullness (rcFullness) to
enforce a maximum fullness of the encoder rate buffer at the end of a slice. rcFullness ensures that
an overflow will not occur during the initial transmission delay of the following slice. rcFullness is
represented as an unsigned 16-bit integer, and shall be updated at the beginning of each block time,
using the updated bufferFullness value. Table 4-12 lists the constants that shall be used for
calculating rcFullness.
The rcFullness calculation involves a scale factor rc_fullness_scale, which is pre-calculated and
stored in the PPS, as follows:
rc_fullness_scale = (1 << (rcFullnessScaleApproxBits + rcFullnessRangeBits) ) / rc_buffer_max_size
The rc_fullness_scale value shall be represented by eight unsigned bits.
The rcFullness calculation includes two offset parameters (rcOffsetInit and rcOffset), which depend
on the current block’s position within the slice.
Offset parameter rcOffsetInit is used for blocks that occur during the initial transmission delay at
the beginning of a slice. This is to compensate for the fact that during the initial transmission delay,
bits will enter the rate buffer but will not be transmitted to the bitstream. The initial rcOffsetInit
value is calculated, as follows:
rcOffsetInit = rc_init_tx_delay × avgBlockBits
The rcOffsetInit value shall decrease by avgBlockBits for each block time, until rcOffsetInit is 0.
At this point, rcOffsetInit shall be 0 for all remaining blocks within the slice. This behavior is
illustrated in Figure 4-6.
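The decay of rcOffsetInit over the initial transmission delay can be sketched as a closed-form C function. The function name is hypothetical, and the sketch assumes that block time t = 0 uses the initial value and that the first decrement takes effect at t = 1.

```c
#include <assert.h>

/* rcOffsetInit at block time t within the slice (t = 0 is the first
 * block time). The value starts at rc_init_tx_delay * avgBlockBits,
 * drops by avgBlockBits per block time, and stays at 0 afterward. */
static int rc_offset_init_at(int rc_init_tx_delay, int avg_block_bits, int t)
{
    assert(rc_init_tx_delay >= 0 && avg_block_bits >= 0 && t >= 0);
    int remaining = rc_init_tx_delay - t;
    return remaining > 0 ? remaining * avg_block_bits : 0;
}
```

With illustrative values rc_init_tx_delay = 3 and avgBlockBits = 96, the sequence over block times 0 through 3 is 288, 192, 96, 0.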
Figure 4-6 (plot of Offset versus Block Time, with T0, Tx, and TN marked on the time axis; labeled quantities: Buffer Range, Effective Buffer Range, rcOffsetInit, rc_buffer_init_size)
where:
• T0 is the first block time within the slice
• TN is the last block time within the slice
• Tx is the block time at the end of the initial transmission delay
Offset parameter rcOffset is an increasing positive offset that is applied throughout the slice, as
illustrated in Figure 4-7 and described in Table 4-13. rcOffset is used to ramp down the effective
rate buffer size so that the rate buffer fullness at the end of the slice is less than rc_buffer_init_size,
which is necessary to ensure correct rate buffer behavior from one slice to the next. The rcOffset
value is 0 for all block times within a slice for which the block index is less than a threshold
rcOffsetStart (indicated as Ty in Figure 4-7). rcOffsetStart is derived from PPS parameter
rc_fullness_offset_threshold, which is the number of blocklines over which rcOffset will ramp up.
The relationship is determined, as follows:
rcOffsetStart = blocksPerSlice – (rc_fullness_offset_threshold × blocksPerBlockLine)
Figure 4-7 (plot of Offset versus Block Time, with T0, Ty, and TN marked on the time axis; labeled quantities: Effective Buffer Range, rcOffset, rc_buffer_max_size – rc_buffer_init_size)
where:
• T0 is the first block time within the slice
• TN is the last block time within the slice
• Ty is the block time in which rcOffset starts to ramp up from 0, which is controlled
by PPS parameter rc_fullness_offset_threshold
The rate at which rcOffset increases is also pre-calculated and stored in the PPS (using 16 unsigned
bits), as follows:
rc_fullness_offset_slope = ( (rc_buffer_max_size – rc_buffer_init_size) << rcOffsetBits) /
(rc_fullness_offset_threshold × blocksPerBlockLine)
rcFullness is then calculated from bufferFullness, rcOffset, and rcOffsetInit, as described
in Table 4-14. This is done once per block time, as illustrated earlier in Figure 4-5.
Parameter targetRateBase approximates the average bit rate for all remaining blocks within a slice,
and is calculated from the number of bits and pixels that remain within the slice at the current
block time. The rate controller maintains a scaling factor targetRateScale, which is updated
per block time, as described in Table 4-15.
The targetRateScale value for the first block time within a slice is equal to PPS parameter
rc_target_rate_scale, which is calculated as follows:
Offset targetRateDeltaFbls applies to blocks within the first blockline of a slice and is calculated,
as follows:
targetRateDeltaFbls = { 16 × rc_target_rate_extra_fbls,  FBLS block
                      { 0,                               NFBLS block
4.5.3 QP Update
The QP value shall be updated at the beginning of each block time, after targetRate is updated.
QP is used in the testing phase for all coding modes. Figure 4-8 illustrates the QP update logic.
[Figure 4-8 (graphic not reproduced): QP update logic. Labels: rcFullness, prevBlockRate, targetRate, flatnessType, Calculate diffBits, diffBits, Calculate qpIndex, qpIndex, deltaQp, prevQp, tempQp, minQp, maxQpLut, Clip, Flatness QP Adjustment, finalQp.]
0 1 2 3 4 5 0 1 2 3 4
(4:4:4) abs(diffBits) (4:4:4) abs(diffBits)
0 10 29 50 60 70 0 10 20 35 65
qpIndex qpIndex
0 1 2 3 4 5 0 1 2 3 4
(4:2:2) abs(diffBits) (4:2:2) abs(diffBits)
0 9 26 45 54 63 0 9 18 31 58
qpIndex qpIndex
0 1 2 3 4 5 0 1 2 3 4
(4:2:0) abs(diffBits) (4:2:0) abs(diffBits)
0 8 23 40 48 55 0 8 16 28 51
3 Current rate buffer fullness state will be mapped to a qpUpdateMode value, using the mapping
illustrated in Figure 4-10.
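[Figure 4-10 (graphic not reproduced): rcFullness range to qpUpdateMode]
  rcFullness 0 to 7864        qpUpdateMode 4
  rcFullness 7864 to 15729    qpUpdateMode 3
  rcFullness 15729 to 49807   qpUpdateMode 0
  rcFullness 49807 to 57672   qpUpdateMode 1
  rcFullness 57672 to 65535   qpUpdateMode 2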
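The mapping from rcFullness to qpUpdateMode can be sketched as a threshold look-up. This is only an illustration: the boundary values and mode order are read from Figure 4-10, but which interval owns each boundary value is an assumption (here, a value equal to a boundary falls in the upper interval), and the function name is hypothetical.

```python
import bisect

# Interior boundaries of the rcFullness ranges, as labeled in Figure 4-10
FULLNESS_BOUNDS = [7864, 15729, 49807, 57672]
# qpUpdateMode for each range, from lowest to highest fullness
UPDATE_MODES = [4, 3, 0, 1, 2]


def qp_update_mode(rc_fullness):
    """Map a 16-bit rcFullness value to qpUpdateMode."""
    if not 0 <= rc_fullness <= 65535:
        raise ValueError("rcFullness is expected in [0, 65535]")
    # Assumption: a value equal to a boundary belongs to the upper range
    return UPDATE_MODES[bisect.bisect_right(FULLNESS_BOUNDS, rc_fullness)]
```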
5 QP for the current block is calculated as the sum of the previous block time QP (prevQp)
and deltaQp, clipped between a minimum and maximum QP value, as follows:
qp = clip (minQp + minQpOffset, maxQp, prevQp + deltaQp)
where:
• minQp is the minimum allowable QP, which is calculated as a function of bit depth,
as follows:
• 8-bpc source content – minQp is 16
• 10-bpc source content – minQp is 0
• 12-bpc source content – minQp is -16
• minQpOffset – See Table 4-19
• maxQp – See Table 4-20
• This maxQp value overrides the global maximum QP value of 72;
thus, it can also be considered to be an offset from the global maximum QP
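A minimal sketch of this clipping step follows. The argument order of clip (lower bound, upper bound, value) is taken from the expression above; minQpOffset and maxQp come from Table 4-19 and Table 4-20, which are not reproduced here, so they are passed in as plain arguments. The function names are hypothetical.

```python
# minQp as a function of source bit depth (bpc), as listed above
MIN_QP_BY_BPC = {8: 16, 10: 0, 12: -16}


def clip(lower, upper, value):
    """Clamp value to the range [lower, upper]."""
    return max(lower, min(upper, value))


def update_qp(prev_qp, delta_qp, bpc, min_qp_offset, max_qp):
    """qp = clip(minQp + minQpOffset, maxQp, prevQp + deltaQp)."""
    return clip(MIN_QP_BY_BPC[bpc] + min_qp_offset, max_qp, prev_qp + delta_qp)
```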
6 QP will be further updated, based on the current block’s flatness classification, as described
in Table 4-21 through Table 4-24. (For further details regarding flatness detection, see
Section 4.4.) The set of flatness types and the effect of QP update for each are discussed in
the sub-sections that follow. Note that this is the final QP value as maintained by the rate
controller. The quantizer used by each coding mode shall operate on a modified QP value,
which is derived from this rate control QP, while also factoring in the bit depth. (See
Section 4.8 for further details.)
lambdaFullness is calculated once at the beginning of each block time, using the updated
rcFullness value. Because the RD cost calculation is needed only for encoder mode selection,
the lambdaFullness calculation will not be performed during decoding.
The pre-defined array lambdaFullnessLut is stored by the encoder and has 16 entries. Interpolation
is used to approximate a 6-bit look-up from the stored 4-bit LUT, as described in Table 4-26.
The first step reduces the 16-bit scale of rcFullness down to six bits.
lambdaBitrate is calculated for each coding mode during a block time, based on the modeRate
determined during the testing phase. Because the RD cost calculation is needed only for
encoder mode selection, the lambdaBitrate calculation will not be performed during decoding.
The pre-defined array lambdaBitrateLut is stored by the encoder and includes 16 entries.
Interpolation is used to approximate a 6-bit look-up from the stored 16 entries, as described in
Table 4-27. A scaling factor (lambdaBitrateScale) is applied to modeRate, which is calculated
from the worst-case block rate (maxBlockRate), as follows:
maxBlockRate = maxHeaderRate + Σ(k = 0..2) (compSamples[k] × compBitDepth[k])
where:
• maxHeaderRate is 10 bits for 8bpc content, and 11 bits otherwise
• compBitDepth[k] is in the RGB/YCbCr color space
Note: Quantities maxBlockRate and lambdaBitrateScale shall be constant for a given PPS.
The bit rate used for lambda approximates the bit-rate ratio between the compressed and
uncompressed blocks (modeRate and maxBlockRate, respectively). The normalization factor
of maxBlockRate is incorporated in lambdaBitrateScale.
After testing, the encoder will select a mode for the current block that minimizes modeRdCost,
in addition to satisfying a set of additional constraints. Encoder mode selection is described in
Section 4.9.
A coding mode’s total distortion (modeDistortion) is determined from each component’s SAD,
as listed in Table 4-28. Here, rdoWeight is a rate-distortion optimization (RDO) weight, which is
further described in Annex B.
[Figure 4-11 (graphic not reproduced): Transform mode encoder flow. Labels: Reconstructed Neighbors, Source Block, FBLS?, Calculate intraFbls, 1x (FBLS), 4x (NFBLS), Calculate Residual, Transform, Quantization, Inverse Quantization, Inverse Transform, Calculate Reconstructed Block, Calculate Distortion, Calculate RD Cost, Select Intra Predictor with Minimum RD Cost.]
Figure 4-12 illustrates the Transform mode prediction and reconstruction buffers.
[Figure 4-12 (graphic not reproduced). Panels: Source Buffer (RGB) with CSC, and Source Buffer (YCbCr). Labels in each panel: Current Block, Reconstructed.]
In the first stage of Transform mode, the encoder tests all intra predictors in intraListA to determine
the SAD between the source block and predicted block. The four intra predictors that generate the
smallest SAD are added to intraListB. If there is a tie in SAD, the encoder shall select the intra
predictors that have the smaller index. This step can be skipped for FBLS blocks, where the only
valid intra predictor is intraFbls, as indicated by Table 4-31.
Table 4-31: Intra Prediction Mode for FBLS Blocks
intraFbls:
P(i, j) = { 0,                       Co or Cg chroma component
          { 1 << (bitDepth[k] – 1),  otherwise
For intra prediction, the SAD is performed in the YCoCg color space for RGB input; otherwise,
the SAD is performed in the native color space. The SAD is the direct sum of the SAD for each
component. For example, in the YCoCg color space:
SAD = SADY + SADCo + SADCg
Figure 4-13 illustrates the intra prediction modes for NFBLS blocks. Table 4-32 describes the
values of each intra predictor’s predicted block.
[Figure 4-13 (graphic not reproduced). Surviving panels: DC and Vertical (V), each showing neighbor samples A-4 through A11 above the current block.]
Table 4-32: Intra Prediction Modes for NFBLS Blocks (4:4:4, Luma Component)
intraDc:    DC = ( Σ(i = 0..7) Ai ) >> 3
            P(i, j) = DC
intraVert:  P(i, j) = Ai
intraDiagLeft intraDiagRight
P(0, 0) = (A0 + (A1 << 1) + A2 + 2) >> 2 P(0, 1) = (A-3 + (A-2 << 1) + A-1 + 2) >> 2
P(1, 0) = P(0, 1) = (A1 + (A2 << 1) + A3 + 2) >> 2 P(0, 0) = P(1, 1) = (A-2 + (A-1 << 1) + A0 + 2) >> 2
P(2, 0) = P(1, 1) = (A2 + (A3 << 1) + A4 + 2) >> 2 P(1, 0) = P(2, 1) = (A-1 + (A0 << 1) + A1 + 2) >> 2
… …
P(7, 0) = P(6, 1) = (A7 + (A8 << 1) + A9 + 2) >> 2 P(6, 0) = P(7, 1) = (A4 + (A5 << 1) + A6 + 2) >> 2
P(7, 1) = (A8 + (A9 << 1) + A10 + 2) >> 2 P(7, 0) = (A5 + (A6 << 1) + A7 + 2) >> 2
intraVertLeft intraVertRight
P(0, 0) = (A0 + A1 + 1) >> 1 P(0, 0) = (A-1 + A0 + 1) >> 1
P(0, 1) = (A0 + (A1 << 1) + A2 + 2) >> 2 P(0, 1) = (A-2 + (A-1 << 1) + A0 + 2) >> 2
P(1, 0) = (A1 + A2 + 1) >> 1 P(1, 0) = (A0 + A1 + 1) >> 1
P(1, 1) = (A1 + (A2 << 1) + A3 + 2) >> 2 P(1, 1) = (A-1 + (A0 << 1) + A1 + 2) >> 2
… …
P(7, 0) = (A7 + A8 + 1) >> 1 P(7, 0) = (A6 + A7 + 1) >> 1
P(7, 1) = (A7 + (A8 << 1) + A9 + 2) >> 2 P(7, 1) = (A5 + (A6 << 1) + A7 + 2) >> 2
intraHorizLeft intraHorizRight
P(0, 0) = (A1 + (A2 << 1) + A3 + 2) >> 2 P(0, 1) = (A-4 + (A-3 << 1) + A-2 + 2) >> 2
P(1, 0) = P(0, 1) = (A2 + (A3 << 1) + A4 + 2) >> 2 P(0, 0) = P(1, 1) = (A-3 + (A-2 << 1) + A-1 + 2) >> 2
P(2, 0) = P(1, 1) = (A3 + (A4 << 1) + A5 + 2) >> 2 P(1, 0) = P(2, 1) = (A-2 + (A-1 << 1) + A0 + 2) >> 2
… …
P(7, 0) = P(6, 1) = (A8 + (A9 << 1) + A10 + 2) >> 2 P(6, 0) = P(7, 1) = (A3 + (A4 << 1) + A5 + 2) >> 2
P(7, 1) = (A9 + (A10 << 1) + A11 + 2) >> 2 P(7, 0) = (A4 + (A5 << 1) + A6 + 2) >> 2
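Four of the 4:4:4 luma predictors in Table 4-32 can be sketched as follows. This is an illustration under stated assumptions: the rows elided with "…" are assumed to follow the same pattern as the rows that are listed, the neighbor samples A-4 through A11 are assumed to be held in a 16-entry list, and all function names are hypothetical. Each function returns pred[j][i] = P(i, j) for an 8x2 block.

```python
OFFSET = 4  # list index of A0; the list holds A-4 .. A11


def a(neigh, i):
    """Return neighbor sample A_i."""
    return neigh[i + OFFSET]


def intra_dc(neigh):
    dc = sum(a(neigh, i) for i in range(8)) >> 3
    return [[dc] * 8 for _ in range(2)]


def intra_vert(neigh):
    return [[a(neigh, i) for i in range(8)] for _ in range(2)]


def intra_diag_left(neigh):
    # P(i, j) = (A[i+j] + 2*A[i+j+1] + A[i+j+2] + 2) >> 2
    return [[(a(neigh, i + j) + (a(neigh, i + j + 1) << 1)
              + a(neigh, i + j + 2) + 2) >> 2
             for i in range(8)] for j in range(2)]


def intra_vert_left(neigh):
    # Row 0 uses a 2-tap average, row 1 a 3-tap filter
    row0 = [(a(neigh, i) + a(neigh, i + 1) + 1) >> 1 for i in range(8)]
    row1 = [(a(neigh, i) + (a(neigh, i + 1) << 1) + a(neigh, i + 2) + 2) >> 2
            for i in range(8)]
    return [row0, row1]


# Illustrative neighbor ramp: A-4 = 0, A-3 = 10, ..., A0 = 40, ..., A11 = 150
neigh = [10 * k for k in range(16)]
```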
For 4:2:2 and 4:2:0 chroma components, the chroma samples are considered by the codec as
co-sited with the even luma samples. For this reason, the interpolation angles are adjusted
as detailed in Table 4-33 and Table 4-34 for 4:2:2 and 4:2:0, respectively.
Table 4-33: Intra Prediction Modes for NFBLS Blocks (4:2:2, Chroma Components)
intraDc:    DC = ( Σ(i = 0..3) Ai ) >> 2
            P(i, j) = DC
intraVert:  P(i, j) = Ai
intraDiagLeft intraDiagRight
AB = (A0 + A1 + 1) >> 1 AB = (A-2 + A-1 + 1) >> 1
BC = (A1 + A2 + 1) >> 1 BC = (A-1 + A0 + 1) >> 1
CD = (A2 + A3 + 1) >> 1 CD = (A0 + A1 + 1) >> 1
DE = (A3 + A4 + 1) >> 1 DE = (A1 + A2 + 1) >> 1
EF = (A4 + A5 + 1) >> 1 EF = (A2 + A3 + 1) >> 1
P(0, 0) = AB P(0, 0) = BC
P(1, 0) = BC P(1, 0) = CD
P(2, 0) = CD P(2, 0) = DE
P(3, 0) = DE P(3, 0) = EF
P(0, 1) = (AB + (A1 << 1) + BC + 2) >> 2 P(0, 1) = (AB + (A-1 << 1) + BC + 2) >> 2
P(1, 1) = (BC + (A2 << 1) + CD + 2) >> 2 P(1, 1) = (BC + (A0 << 1) + CD + 2) >> 2
P(2, 1) = (CD + (A3 << 1) + DE + 2) >> 2 P(2, 1) = (CD + (A1 << 1) + DE + 2) >> 2
P(3, 1) = (DE + (A4 << 1) + EF + 2) >> 2 P(3, 1) = (DE + (A2 << 1) + EF + 2) >> 2
intraVertLeft intraVertRight
AB = (A0 + A1 + 1) >> 1 AB = (A-1 + A0 + 1) >> 1
BC = (A1 + A2 + 1) >> 1 BC = (A0 + A1 + 1) >> 1
CD = (A2 + A3 + 1) >> 1 CD = (A1 + A2 + 1) >> 1
DE = (A3 + A4 + 1) >> 1 DE = (A2 + A3 + 1) >> 1
P(0, 0) = (A0 + AB + 1) >> 1 P(0, 0) = (AB + A0 + 1) >> 1
P(1, 0) = (A1 + BC + 1) >> 1 P(1, 0) = (BC + A1 + 1) >> 1
P(2, 0) = (A2 + CD + 1) >> 1 P(2, 0) = (CD + A2 + 1) >> 1
P(3, 0) = (A3 + DE + 1) >> 1 P(3, 0) = (DE + A3 + 1) >> 1
P(0, 1) = (A0 + (AB << 1) + A1 + 2) >> 2 P(0, 1) = (A-1 + (AB << 1) + A0 + 2) >> 2
P(1, 1) = (A1 + (BC << 1) + A2 + 2) >> 2 P(1, 1) = (A0 + (BC << 1) + A1 + 2) >> 2
P(2, 1) = (A2 + (CD << 1) + A3 + 2) >> 2 P(2, 1) = (A1 + (CD << 1) + A2 + 2) >> 2
P(3, 1) = (A3 + (DE << 1) + A4 + 2) >> 2 P(3, 1) = (A2 + (DE << 1) + A3 + 2) >> 2
intraHorizLeft intraHorizRight
AB = (A0 + A1 + 1) >> 1 AB = (A-2 + A-1 + 1) >> 1
BC = (A1 + A2 + 1) >> 1 BC = (A-1 + A0 + 1) >> 1
CD = (A2 + A3 + 1) >> 1 CD = (A0 + A1 + 1) >> 1
DE = (A3 + A4 + 1) >> 1 DE = (A1 + A2 + 1) >> 1
EF = (A4 + A5 + 1) >> 1 EF = (A2 + A3 + 1) >> 1
P(0, 0) = (AB + (A1 << 1) + BC + 2) >> 2 P(0, 0) = (AB + (A-1 << 1) + BC + 2) >> 2
P(1, 0) = (BC + (A2 << 1) + CD + 2) >> 2 P(1, 0) = (BC + (A0 << 1) + CD + 2) >> 2
P(2, 0) = (CD + (A3 << 1) + DE + 2) >> 2 P(2, 0) = (CD + (A1 << 1) + DE + 2) >> 2
P(3, 0) = (DE + (A4 << 1) + EF + 2) >> 2 P(3, 0) = (DE + (A2 << 1) + EF + 2) >> 2
P(0, 1) = (A1 + (BC << 1) + A2 + 2) >> 2 P(0, 1) = (A-2 + (AB << 1) + A-1 + 2) >> 2
P(1, 1) = (A2 + (CD << 1) + A3 + 2) >> 2 P(1, 1) = (A-1 + (BC << 1) + A0 + 2) >> 2
P(2, 1) = (A3 + (DE << 1) + A4 + 2) >> 2 P(2, 1) = (A0 + (CD << 1) + A1 + 2) >> 2
P(3, 1) = (A4 + (EF << 1) + A5 + 2) >> 2 P(3, 1) = (A1 + (DE << 1) + A2 + 2) >> 2
Table 4-34: Intra Prediction Modes for NFBLS Blocks (4:2:0, Chroma Components)
intraDc:    DC = ( Σ(i = 0..3) Ai ) >> 2
            P(i, j) = DC
intraVert:  P(i, j) = Ai
intraDiagLeft intraDiagRight
AB = (A0 + A1 + 1) >> 1 AB = (A-2 + A-1 + 1) >> 1
BC = (A1 + A2 + 1) >> 1 BC = (A-1 + A0 + 1) >> 1
CD = (A2 + A3 + 1) >> 1 CD = (A0 + A1 + 1) >> 1
DE = (A3 + A4 + 1) >> 1 DE = (A1 + A2 + 1) >> 1
EF = (A4 + A5 + 1) >> 1 EF = (A2 + A3 + 1) >> 1
P(0, 0) = (AB + (A1 << 1) + BC + 2) >> 2 P(0, 0) = (AB + (A-1 << 1) + BC + 2) >> 2
P(1, 0) = (BC + (A2 << 1) + CD + 2) >> 2 P(1, 0) = (BC + (A0 << 1) + CD + 2) >> 2
P(2, 0) = (CD + (A3 << 1) + DE + 2) >> 2 P(2, 0) = (CD + (A1 << 1) + DE + 2) >> 2
P(3, 0) = (DE + (A4 << 1) + EF + 2) >> 2 P(3, 0) = (DE + (A2 << 1) + EF + 2) >> 2
intraVertLeft intraVertRight
AB = (A0 + A1 + 1) >> 1 AB = (A-1 + A0 + 1) >> 1
BC = (A1 + A2 + 1) >> 1 BC = (A0 + A1 + 1) >> 1
CD = (A2 + A3 + 1) >> 1 CD = (A1 + A2 + 1) >> 1
DE = (A3 + A4 + 1) >> 1 DE = (A2 + A3 + 1) >> 1
P(0, 0) = AB P(0, 0) = AB
P(1, 0) = BC P(1, 0) = BC
P(2, 0) = CD P(2, 0) = CD
P(3, 0) = DE P(3, 0) = DE
intraHorizLeft intraHorizRight
BC = (A1 + A2 + 1) >> 1 AB = (A-2 + A-1 + 1) >> 1
CD = (A2 + A3 + 1) >> 1 BC = (A-1 + A0 + 1) >> 1
DE = (A3 + A4 + 1) >> 1 CD = (A0 + A1 + 1) >> 1
EF = (A4 + A5 + 1) >> 1 DE = (A1 + A2 + 1) >> 1
P(0, 0) = BC P(0, 0) = AB
P(1, 0) = CD P(1, 0) = BC
P(2, 0) = DE P(2, 0) = CD
P(3, 0) = EF P(3, 0) = DE
For each intra predictor, the residual is calculated from the source and predicted blocks, as follows:
R(i, j) = X(i, j) – P(i, j) ∀i, j ∈ current block
In the second stage of Transform mode, the intra predictors in intraListB will be tested throughout
Transform mode, as illustrated in Figure 4-11.
Any intra predictor sample that is located to the left or above the slice boundary will be set equal
to half the component’s dynamic range. For example, for 8bpc content in YCoCg color space,
any sample that is located outside the slice will have the following value:
Y = 128, Co = 0, Cg = 0
Any intra predictor sample that is located to the right of the slice boundary will be horizontally
replicated from the last valid sample in the line.
4.6.1.2.1 Pre-shift
Before the first transform pass is conducted, the intra-predicted residuals R(i, j) are pre-shifted,
as described in Table 4-36. The pre-shifted residuals are denoted as R(i, j). This step increases
the dynamic range such that the error associated with forward transform is minimized. The
following parameters are associated with the pre-shift:
• dctFwShift = 2
• dctFwRound = 2
The result of this step is a set of temporary transform coefficients Y(i, j). For a 4:2:0 chroma
component, these transform coefficients are post-shifted, as detailed later in Table 4-40, and
the forward transform step is complete. For all other cases, a vertical transform pass is applied.
(See Section 4.6.1.2.3.)
Figure 4-15: Butterfly Structure for 8-point Forward Discrete Cosine Transform
Table 4-37: 8-point Forward Discrete Cosine Transform Applied to Selected Row r
of Pre-shifted Residual Block R(i, j)
// first stage
for (i = 0; i < 4; i ++) {
ea(i, r) = R(i, r) + R(7 – i, r)
da(7 – i, r) = R(i, r) – R(7 – i, r)
}
Figure 4-16: Butterfly Structure for 4-point Forward Discrete Cosine Transform
4.6.1.2.4 Post-shift
The result of the forward Haar transform shall be post-shifted, as described in Table 4-40, using the
same dctFwRound and dctFwShift values defined in Section 4.6.1.2.1. The resulting transform
coefficients T(i, j) are not orthonormal. The normalization step shall be combined with the
quantization step described in Section 4.8.1.3.
Note: For 4:2:0 chroma components, the vertical (Haar) pass is skipped, and therefore the
input to the post-shift shall be the block of temporary transform coefficients Y(i, j).
modeRdCost is calculated for each intra predictor in intraListB, as described in Section 4.6.
The encoder shall select the intra predictor, as per Table 4-41. Any intra predictor that generates a
syntax element larger than ssm_max_se_size shall be considered as invalid and disallowed by the
encoder.
where:
• largeInt = 999999
• numIntraPreds = 1 for FBLS blocks and 4 for NFBLS blocks
• intraRdCost[intraPred] = Associated intra predictor’s RD cost
• intraRate[intraPred] = Associated intra predictor’s rate
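Because Table 4-41 is not reproduced here, the following is only a hedged sketch of a minimum-RD-cost selection using the variables listed above. Breaking ties by the lower rate is an assumption (suggested by intraRate being listed), and the validity flags stand in for the ssm_max_se_size check. The function name is hypothetical.

```python
LARGE_INT = 999999


def select_intra_predictor(intra_rd_cost, intra_rate, valid):
    """Return the index of the selected intra predictor, or None."""
    best_pred = None
    best_cost = LARGE_INT
    best_rate = LARGE_INT
    for pred in range(len(intra_rd_cost)):
        if not valid[pred]:
            continue  # syntax element too large: predictor is disallowed
        cost, rate = intra_rd_cost[pred], intra_rate[pred]
        # Assumption: equal RD cost is resolved in favor of the lower rate
        if cost < best_cost or (cost == best_cost and rate < best_rate):
            best_pred, best_cost, best_rate = pred, cost, rate
    return best_pred
```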
The block is then reconstructed, based on the selected intra predictor. The final rate, distortion,
and RD cost for that intra predictor are assigned as the final modeRate, modeDistortion, and
modeRdCost, respectively, for Transform mode.
Intra predictor signaling, using three bits, shall be conducted for the NFBLS blocks,
as per Table 4-42.
4.6.1.5 EC
Entropy coding of Transform mode quantized transform coefficients is conducted
as described in Section 4.7. For any 8x2 component, prior to entropy coding, the quantized
transform coefficients are re-ordered so that they are in the correct group order, as illustrated in
Figure 4-17.
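[Figure 4-17 (graphic not reproduced). Left grid: quantized transform coefficients (S); right grid: re-ordered coefficients (T).]
  S0  S1  S2   S3   S4   S5   S6   S7        T0  T1  T2  T4  T5   T9   T10  T11
  S8  S9  S10  S11  S12  S13  S14  S15       T3  T6  T7  T8  T12  T13  T14  T15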
Table 4-43 describes the mapping between the quantized transform coefficients and
re-ordered coefficients.
where:
• S = Quantized transform coefficients
• T = Re-ordered coefficients
This re-ordering step is not required for 4x2 or 4x1 components, as illustrated in Figure 4-18.
Figure 4-18: Transform Mode ECG Structure – 4x2 and 4x1 Components
Figure 4-19: Example Transform Component Data Blocks with Corresponding lastSigPos,
ECG Structures
For chroma components, lastSigPos will be 0 if only the first coefficient has a non-zero
value. For the luma component, lastSigPos will be 0 in the following two cases:
• All coefficients are 0 (because there is no component skip flag for Component 0)
• First coefficient is non-zero, and all other coefficients are 0
lastSigPos is signaled explicitly in the bitstream at the beginning of each component, using a fixed
number of bits (bitsLastSigPos), as follows:
bitsLastSigPos = compSamplesLog2
For a given component, all samples must be signaled, up to and including the last significant
sample. All samples after lastSigPos are omitted because their value will be 0. Any ECG for which
all sample indices occur after lastSigPos will also be omitted. For example, if lastSigPos = 8 for
the luma component (see Figure 4-17), ECG2 is omitted because all samples in this group have
an index that is larger than 8.
After lastSigPos is calculated, the encoder shall calculate the sign of the last significant coefficient
(signLastSigPos) to avoid possible ambiguity later, as follows:
signLastSigPos = { 1,  coeff[lastSigPos] < 0
                 { 0,  coeff[lastSigPos] > 0
The coefficient at lastSigPos is then modified because it is known that the original value cannot
have been 0. Modification is performed as follows:
coeff[lastSigPos] = { coeff[lastSigPos] + 1,  coeff[lastSigPos] < 0
                    { coeff[lastSigPos] – 1,  coeff[lastSigPos] > 0
One final step shall occur if coeff[lastSigPos] == 0 after this modification (i.e., coeff[lastSigPos] ==
±1 before the modification). In this case, the encoder will signal signLastSigPos, using one bit to
avoid possible ambiguity. This bit will be transmitted along with the sign bits in ECG3. (For further
details regarding EC sign bits, see Section 4.7.8.)
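The lastSigPos handling described above can be sketched as follows. This is an illustration, not the normative procedure: the function name is hypothetical, and the all-zero case (possible for the luma component) is assumed to leave the coefficients untouched because the two formulas above define nothing for a zero value.

```python
def prepare_last_sig(coeff):
    """Return (lastSigPos, signLastSigPos, explicit_sign_needed, new_coeff)."""
    coeff = list(coeff)
    last_sig_pos = 0
    for idx, value in enumerate(coeff):
        if value != 0:
            last_sig_pos = idx
    last = coeff[last_sig_pos]
    if last == 0:
        # Assumption: all coefficients are 0, so there is nothing to modify
        return last_sig_pos, None, False, coeff
    sign_last_sig_pos = 1 if last < 0 else 0
    # Move the last significant coefficient one step toward 0
    coeff[last_sig_pos] = last + 1 if last < 0 else last - 1
    # The sign bit is signaled only if the modified value became 0
    explicit_sign_needed = coeff[last_sig_pos] == 0
    return last_sig_pos, sign_last_sig_pos, explicit_sign_needed, coeff
```

When the original value is ±1, the modified coefficient is 0 and no longer carries its own sign, which is the ambiguity the extra bit resolves.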
4.6.2 BP Mode
In Block Prediction (BP) mode, the current block is split into partitions, and each partition is
predicted from a set of spatially neighboring samples (the BPV search range). For each source
partition within the block, the encoder searches the BPV search range for a candidate partition that
minimizes distortion. This distortion is calculated per-pixel rather than per-component (i.e., a single
BPV will be used for all three color components of a given partition). The best candidate’s position
within the search range is the block prediction vector (BPV) for that partition. The BPV is signaled
explicitly within the bitstream syntax, such that the decoder can perform the same prediction
without requiring a search operation.
The BP mode syntax includes the BPVs for each partition, in addition to entropy-coded quantized
prediction residuals for each component. Figure 4-20 illustrates BP mode encoder operation.
[Figure 4-20 (graphic not reproduced): BP mode encoder operation. Labels: Source Block, BPV Search Range, 32x (FBLS), 64x (NFBLS), Prediction, Calculate Residual, Quantized Residual, Reconstruction, Reconstructed Residual, Calculate Distortion, Calculate Entropy Coding Rate, BPV, Calculate RD Cost, Select Minimum, 4x, 16x.]
BP mode uses partitioning to split the current block into two types of non-overlapping partitions,
2x2 and 2x1, as illustrated in Figure 4-21. The BPV search operation is performed for each search
range position for each partition within the block. The encoder selects a partition option for each
2x2 sub-block, based on RD cost. For example, in Figure 4-21, the leftmost 2x2 sub-block can
be represented by a single BPV (A) –or– two BPVs (A0, A1).
A B C D A0 B0 C0 D0
A1 B1 C1 D1
Figure 4-21: BP Mode Partitions – 2x2 and 2x1
FBLS C0 C1 ... C24 C25 C26 C27 C28 C29 C30 C31 C32
Block C33 C34 ... C57 C58 C59 C60 C61 C62 C63 C64 C65
NFBLS C0 C1 ... C24 C25 C26 C27 C28 C29 C30 C31 C32
Block C33 C34 ... C57 C58 C59 C60 C61 C62 C63 C64 C65
Figure 4-23: BPV Search Range for FBLS and NFBLS Blocks
The BPV search range is split into three parts, as described in Table 4-44.
For each sub-block, the BPV search operation shall be performed independently for all 2x2 and
2x1 partitions, as described in Table 4-45.
Table 4-45: BPV Search Operation for 2x2 and 2x1 Partitions
Description                              Candidate Partitions    Search Ranges
Compare the 2x2 source partition         BPV 0 through 6         bpvSearchRangeA, bpvSearchRangeC
for the sub-block with all candidate     BPV 7                   bpvSearchRangeA, bpvSearchRangeB,
2x2 partitions within the search range                           bpvSearchRangeC
(see Figure 4-24).                       BPV 8 through 31        bpvSearchRangeB (vertically replicated)
                                         BPV 32 through 63       bpvSearchRangeC

Compare the top 2x1 source partition     BPV 0 through 6         bpvSearchRangeA
for the sub-block with all candidate     BPV 7                   bpvSearchRangeA, bpvSearchRangeB
2x1 partitions (see Figure 4-25).        BPV 8 through 31        bpvSearchRangeB
                                         BPV 32 through 63       bpvSearchRangeC (first row)

Compare the bottom 2x1 source            BPV 0 through 6         bpvSearchRangeC (first row)
partition for the sub-block with         BPV 8 through 31        bpvSearchRangeB
all candidate 2x1 partitions             BPV 32 through 63       bpvSearchRangeC (second row)
(see Figure 4-26).
[Figure 4-24 through Figure 4-26 (graphics not reproduced): BPV search positions for the 2x2, top 2x1, and bottom 2x1 partitions, showing BPV 0 through BPV 7, BPV 8 through BPV 31, and BPV 32 through BPV 63 within search ranges A, B, and C.]
For FBLS blocks, the search range shall consist of up to 32 positions because search ranges A and
B will be unavailable. In this case, the BPV shall be signaled in the bitstream, using five bits each.
For NFBLS blocks, the search range shall consist of up to 64 positions, and six bits shall be used
to signal each BPV.
During the BPV search operation, for each 2x2 and 2x1 partition within the current block, the SAD
shall be calculated between the source partition and each candidate partition within the search
range. The candidate partition that generates the minimum SAD when compared with the source
partition shall be selected. The SAD between the source partition and candidate partition shall
be calculated over all samples within the partitions, as per Table 4-46.
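The search described above amounts to an argmin over candidate partitions. Table 4-46 is not reproduced here, so the following is only a sketch: each partition is assumed to be a flat list holding every sample of every component, ties are assumed to go to the lowest BPV index, and the function names are hypothetical.

```python
def sad(source, candidate):
    """Sum of absolute differences over all samples of a partition."""
    return sum(abs(s - c) for s, c in zip(source, candidate))


def bpv_search(source_partition, candidates):
    """Return (bpv, sad) for the candidate with the minimum SAD."""
    best_bpv, best_sad = 0, None
    for bpv, candidate in enumerate(candidates):
        d = sad(source_partition, candidate)
        # Strict comparison: the lowest index wins a tie (an assumption)
        if best_sad is None or d < best_sad:
            best_bpv, best_sad = bpv, d
    return best_bpv, best_sad


# Illustrative data only: one source partition and four candidates
result = bpv_search([10, 12, 11, 13],
                    [[0, 0, 0, 0], [10, 12, 10, 13],
                     [10, 12, 11, 13], [10, 12, 11, 13]])
```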
Figure 4-27 demonstrates the SAD calculation for a given 2x2 and 2x1 source and candidate
partition in the YCoCg color space:
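[Figure 4-27 graphic not reproduced. 2x2 case: source samples A, B, C, D and candidate samples E, F, G, H. 2x1 case: source samples A, B and candidate samples C, D. Each sample has Y, Co, and Cg components.]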
Figure 4-27: Example 2x2 and 2x1 Source and Candidate Partitions for SAD Calculation
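[Figure 4-28 graphic not reproduced. Panels: 2x2 partitions with luma samples AY through DY (source) and EY through HY (candidate) and their co-sited Cb/Cr samples; "4:2:2, 2x1 Partition; 4:2:0, 2x1 Partition (Top Row)" with luma samples AY, BY, CY, DY and chroma samples ACb, CCb, ACr, CCr; "4:2:0, 2x1 Partition (Bottom Row)" with luma samples AY, BY, CY, DY only.]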
Figure 4-28: BP Mode Partitions for 4:2:2 and 4:2:0 Use Cases
For 4:2:2 content, a 2x2 sub-block contains four luma samples, and two samples each for Cb/Cr.
In this case, 2x1 partitions are calculated in the same way for partitions within the current block’s
top and bottom rows.
SAD2x2 = ( | AY – EY | + | BY – FY | + | CY – GY | + | DY – HY | )
+ ( | ACb – ECb | + | CCb – GCb | ) + ( | ACr – ECr | + | CCr – GCr | )
SAD2x1 = ( | AY – CY | + | BY – DY | ) + | ACb – CCb | + | ACr – CCr |
For 4:2:0 content, a 2x2 sub-block contains four luma samples, and one sample each for Cb/Cr.
Therefore, the SAD for a 2x2 partition shall be calculated as follows:
SAD2x2 = ( | AY – EY | + | BY – FY | + | CY – GY | + | DY – HY | ) + | ACb – ECb | + | ACr – ECr |
For 2x1 partitions in 4:2:0, the chroma sample shall be aligned with the current block’s top row.
Therefore, the SAD for a 2x1 partition in the top row shall contain a chroma sample, while
the SAD for the bottom row shall not. This is necessary to ensure that the error between 2x2
and 2x1 partitions is consistent. The following are the SADs for a 4:2:0 2x1 partition:
SAD2x1 top = ( | AY – CY | + | BY – DY | ) + | ACb – CCb | + | ACr – CCr |
SAD2x1 bottom = ( | AY – CY | + | BY – DY | )
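The 2x1 SAD rules for subsampled chroma can be sketched as follows. The dictionary layout (two luma samples plus one Cb and one Cr sample per 2x1 partition) and the function names are assumptions made for illustration.

```python
def plane_sad(a, b):
    """Sum of absolute differences for one component."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sad_2x1(src, cand, chroma_format, top_row):
    """SAD of a 2x1 partition for 4:2:2 or 4:2:0 content."""
    if chroma_format not in ("4:2:2", "4:2:0"):
        raise ValueError("this sketch covers 4:2:2 and 4:2:0 only")
    total = plane_sad(src["Y"], cand["Y"])
    # 4:2:2: chroma counts in both rows; 4:2:0: chroma counts in the top row only
    if chroma_format == "4:2:2" or top_row:
        total += plane_sad(src["Cb"], cand["Cb"]) + plane_sad(src["Cr"], cand["Cr"])
    return total


# Illustrative samples only
src = {"Y": [100, 102], "Cb": [60], "Cr": [70]}
cand = {"Y": [101, 100], "Cb": [64], "Cr": [69]}
```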
Each portion of the search range shall contain half as many samples for 4:2:2 and 4:2:0 FBLS
and NFBLS chroma components, as illustrated in Figure 4-29.
4:2:2 FBLS:   C0  C1  ... C12 C13 C14 C15 C16
              C17 C18 ... C29 C30 C31 C32 C33
4:2:2 NFBLS:  C0  C1  ... C12 C13 C14 C15 C16
              C17 C18 ... C29 C30 C31 C32 C33
Figure 4-29: BPV Search Range for 4:2:2 and 4:2:0 FBLS and NFBLS Chroma Components
For a given luma search range position (srPos) within a search range, the chroma search range
position (srPosChroma) is calculated as follows, where the offset in search range C is due
to the one-sample shift of search range C relative to A and B:
srPosChroma = { (srPos + 1) >> 1,  srId == 2
              { srPos >> 1,        otherwise
4.6.2.2 Residuals
BP mode shall calculate two residual sub-blocks for each sub-block within the current block,
as described in Table 4-49. The combination of all 2x2 residual sub-blocks will give the
2x2 residual block R2x2(i, j). The combination of all 2x1 residual sub-blocks will give
the 2x1 residual block R2x1(i, j).
where:
• R̃ is a temporary BP mode variable
4.6.2.3 Quantization
BP mode shall use the fractional quantizer, as described in Section 4.8.1.3, for 2x2 and
2x1 residual blocks, as follows:
Q [R2x2(i, j)] = Rq2x2(i, j)
Q [R2x1(i, j)] = Rq2x1(i, j)
4.6.2.4 EC
BP mode quantized residual entropy coding shall be conducted as described in Section 4.7 with the
entropy coding group sample distributions illustrated in Figure 4-31 and Figure 4-32. This shall be
performed for each of the 16 possible partition combinations described in Section 4.6.2.5.
S0 S1 S2 S3 S4 S5 S6 S7
4:2:2 4:2:0
S0 S1 S2 S3 S0 S1 S2 S3
S4 S5 S6 S7
S0 S1 S4 S5 S2 S3 S6 S7 S0 S1 S2 S3
Figure 4-32: BP Mode ECG Structure, Chroma Components (4:2:2 and 4:2:0)
[Figure residue (graphic not reproduced): partition grids built from 2x2 and 2x1 partitions, with grid labels 0, 4, 8, and 12 visible.]
The final partition information is selected by minimizing the RD cost among all 16 possible
combinations, as described in Table 4-50. The minimum rate is used to break ties.
Sub-block 0 = { 2x2,  (bestGrid & 0x8) >> 3 == 1
              { 2x1,  otherwise
Sub-block 1 = { 2x2,  (bestGrid & 0x4) >> 2 == 1
              { 2x1,  otherwise
Sub-block 2 = { 2x2,  (bestGrid & 0x2) >> 1 == 1
              { 2x1,  otherwise
Sub-block 3 = { 2x2,  (bestGrid & 0x1) == 1
              { 2x1,  otherwise
where:
• largeInt = 999999
• partitionGridRdCost[grid] = Associated partition grid’s RD cost
• partitionGridRate[grid] = Associated partition grid’s rate
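A hedged sketch of the grid selection and of decoding bestGrid into per-sub-block partition types follows. Table 4-50 is not reproduced here, so the loop structure is an assumption; the tie-break on minimum rate and the bit tests come from the text above. The function names are hypothetical.

```python
LARGE_INT = 999999


def select_partition_grid(partition_grid_rd_cost, partition_grid_rate):
    """Return bestGrid: minimum RD cost, with minimum rate breaking ties."""
    best_grid, best_cost, best_rate = 0, LARGE_INT, LARGE_INT
    for grid in range(16):
        cost = partition_grid_rd_cost[grid]
        rate = partition_grid_rate[grid]
        if cost < best_cost or (cost == best_cost and rate < best_rate):
            best_grid, best_cost, best_rate = grid, cost, rate
    return best_grid


def subblock_partitions(best_grid):
    """Bit 3 selects sub-block 0, ..., bit 0 selects sub-block 3."""
    return ["2x2" if (best_grid >> (3 - sb)) & 1 else "2x1" for sb in range(4)]


# Illustrative costs only: grids 5 and 10 tie on RD cost, grid 10 has the lower rate
rd_cost = [100] * 16
rate = [50] * 16
rd_cost[5], rate[5] = 90, 45
rd_cost[10], rate[10] = 90, 40
best = select_partition_grid(rd_cost, rate)
```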
The scalar quantizer used by MPP mode has a step size that is determined by the value
mppStepSize, which will be signaled explicitly in the bitstream, as follows:
mppStepSize = clip (mppMinStepSize, bpc – 1, ( (mppQp – 16) >> 3) + (bpc – 8) )
where:
• mppMinStepSize is determined as described in Table 4-52
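The step-size expression can be checked with a short sketch. mppMinStepSize comes from Table 4-52, which is not reproduced here, so it is passed in as an argument; the function names and sample inputs are illustrative.

```python
def clip(lower, upper, value):
    """Clamp value to the range [lower, upper]."""
    return max(lower, min(upper, value))


def mpp_step_size(mpp_qp, bpc, mpp_min_step_size):
    """mppStepSize = clip(mppMinStepSize, bpc - 1, ((mppQp - 16) >> 3) + (bpc - 8))."""
    return clip(mpp_min_step_size, bpc - 1, ((mpp_qp - 16) >> 3) + (bpc - 8))
```

For example, with 8-bpc content and mppQp = 40, the unclipped value is (24 >> 3) + 0 = 3, which lies inside the range [mppMinStepSize, 7] when mppMinStepSize is 0.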
In the RGB color space, mppStepSize is used as the quantization step size for all three color
components. In the YCoCg color space, a remapping is performed, using the fixed tables
mppStepSizeMapCo and mppStepSizeMapCg:
mppStepSizeComp = { mppStepSizeMapCo[mppStepSize],  mppColorSpace == 1, Co component
                  { mppStepSizeMapCg[mppStepSize],  mppColorSpace == 1, Cg component
                  { mppStepSize,                    otherwise
After the encoder determines mppStepSize, the encoder shall calculate a midpoint value for each
2x2 sub-block, as described in Section 4.6.3.2.
The number of bits per sample in each component for MPP mode (mppBitsPerComp) is calculated
for each component, as follows:
bitDepth = { bpc + 1,  Co or Cg component
           { bpc,      otherwise
mppBitsPerComp[k] = (bitDepth – mppStepSizeComp[k])
Quantized MPP residuals within the bitstream shall be signaled as unsigned values. After
the quantized residuals are calculated as described in Section 4.6.3.3, they are mapped to unsigned
values by subtracting the value minCode, where minCode = - (1 << (mppBitsPerComp – 1) ).
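A minimal sketch of the bit-width calculation and the signed-to-unsigned mapping follows. The function names are hypothetical, and the per-component step size is passed in directly rather than looked up.

```python
def mpp_bits_per_comp(bpc, step_size_comp, is_co_or_cg):
    """mppBitsPerComp[k] = bitDepth - mppStepSizeComp[k]."""
    bit_depth = bpc + 1 if is_co_or_cg else bpc
    return bit_depth - step_size_comp


def to_unsigned(quantized_residual, bits_per_comp):
    """Map a signed quantized residual to an unsigned code by subtracting minCode."""
    min_code = -(1 << (bits_per_comp - 1))
    return quantized_residual - min_code
```

With five bits per sample, minCode is -16, so a residual of -16 maps to code 0 and a residual of 15 maps to code 31.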
Table 4-53: MPP Mode Midpoint Calculation for Sub-blocks within a Given Component k
middle = { 0,                    Co or Cg component
         { 1 << (bitDepth – 1),  otherwise
bias = { 0,                          mppStepSize[k] == 0
       { 1 << (mppStepSize[k] – 1),  otherwise
for (subblock = 0; subblock < 4; subblock ++) {
if (chroma format is 420 or 422 && k > 0 && subblock ≥ 2) {
mean = 0
} else {
sbx = subblock << 1
if (first block in slice) {
mean = middle
} else if (FBLS block) {
if (chroma format is 420 && k > 0) {
mean = ( X prev (sbx, 0) + X prev (sbx + 1, 0) ) >> 1
} else {
mean = ( X prev (sbx, 0) + X prev (sbx + 1, 0) + X prev (sbx, 1) + X prev (sbx + 1, 1) ) >> 2
}
} else if (NFBLS block) {
mean = ( X prevRecLine(sbx, 0) + X prevRecLine(sbx + 1, 0) ) >> 1
if (original color space is RGB and testing in YCoCg) {
perform color space conversion on mean (RGB → YCoCg)
}
}
}
maxClip = min ( (1 << bitDepth) – 1, middle + 2 × bias)
mp = clip (middle, maxClip, mean + 2 × bias)
}
For blocks that are located within the first blockline of a slice, each sub-block’s mp is calculated
from the corresponding sub-block within the previous reconstructed block. (See Figure 4-36, top.)
For NFBLS blocks, the immediate vertical neighbors of each sub-block are used instead.
(See Figure 4-36, bottom.) For the first block within a slice, no spatial neighbors will be used.
Previous Reconstructed Line:  x  y  ...
NFBLS Current Block:          A  B  ...
                              C  D  ...
Figure 4-36: MPP Mode Midpoint Is Calculated from the Current Block’s
Reconstructed Spatial Neighbors
where:
• mppErrorDiffusionThreshold = 3
[Figure residue (graphic not reproduced): error diffusion steps labeled Predict, Update Reconstructed, and Diffuse Error, applied to samples A, B, C, and D of a sub-block.]
Table 4-54 describes the processes of prediction, quantization, inverse quantization, reconstruction,
and error diffusion.
Arec = Q-1[Aq] + mp
sampleError = A – Arec
C' = { C + ( (sampleError + 1) >> 1),  allowErrorDiffusion
     { C,                              otherwise
if (chroma == 444 || comp == 0) {
Bres = B' – mp
Bq = Q[Bres]
Brec = Q-1[Bq] + mp
sampleError = B – Brec
Crec = Q-1[Cq] + mp
sampleError = C – Crec
Drec = Q-1[Dq] + mp
}
Table 4-55: MPP Mode Quantized Residual Distribution among Substreams 0 through 3
Chroma Format     Substream 0          Substream 1           Substream 2             Substream 3
4:4:4,            Component 0[0 – 3]   Component 0[4 – 15]   Component 1[4 – 15]     Component 2[4 – 15]
12 samples/ssm    Component 1[0 – 3]
                  Component 2[0 – 3]
4:2:2,            Component 0[0 – 7]   Component 0[8 – 15]   Component 1[0 – 7]      Component 2[0 – 7]
8 samples/ssm
4:2:0,            Component 0[0 – 5]   Component 0[6 – 11]   Component 0[12 – 15]    Component 1[2 – 3]
6 samples/ssm                                                Component 1[0 – 1]      Component 2[0 – 3]
In the PPS, mppfBitsPerComp must be specified such that the maximum rate is strictly less than
the average block rate (avgBlockBits). An MPPF block’s maximum rate is stored in the parameter
minBlockBits, which the rate controller uses to ensure that at least one mode is available to
encoder mode selection during each block time. (See Section 4.9.) The parameter maxHeaderLen
is eight bits for RGB source content, and seven bits for YCbCr source content. This is a sum of the
maximum mode header (four bits), the maximum flatness header (three bits), and one bit to signal
the MPPF color space (mppfColorSpace) in the case of RGB source content.
$$\text{minBlockBits} = \left( \sum_{k \in [0, 2]} \text{compSamples}[k] \times \text{mppfBitsPerComp}[k] \right) + \text{maxHeaderLen}$$
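As a minimal sketch of the minBlockBits sum, the following Python computes the worst-case MPPF block cost. The function name `min_block_bits` and the sample values are illustrative assumptions and are not defined by this Standard.

```python
def min_block_bits(comp_samples, mppf_bits_per_comp, rgb_source):
    """Worst-case MPPF block cost in bits (sketch of the minBlockBits sum)."""
    # maxHeaderLen: 4-bit mode header + 3-bit flatness header,
    # plus 1 bit for mppfColorSpace when the source content is RGB.
    max_header_len = 8 if rgb_source else 7
    payload = sum(n * b for n, b in zip(comp_samples, mppf_bits_per_comp))
    return payload + max_header_len


# Hypothetical 4:4:4 RGB block with 16 samples per component.
print(min_block_bits([16, 16, 16], [2, 1, 1], rgb_source=True))  # 72
```

A PPS generator could compare the returned value against avgBlockBits to confirm that the strict inequality required above holds.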
For RGB source content, mppf_bits_per_comp shall be specified such that the following holds:
$$\sum_{k \in [0, 2]} \text{mppfBitsPerCompRgb}[k] \geq \sum_{k \in [0, 2]} \text{mppfBitsPerCompYCoCg}[k]$$
The remainder of MPPF mode is identical to MPP mode, with the following mapping between
mppfBitsPerComp and mppfStepSizeComp:
$$\text{bitDepth} = \begin{cases} \text{bpc} + 1, & \text{Co or Cg component} \\ \text{bpc}, & \text{otherwise} \end{cases}$$
$$\text{mppfStepSizeComp}[k] = \text{bitDepth} - \text{mppfBitsPerComp}[k]$$
As with MPP mode, the encoder shall select the color space that generates the minimum
distortion. For MPPF mode, the RGB color space shall be selected if modeDistortionRgb <
modeDistortionYCoCg. The selected color space (mppfColorSpace) is then signaled explicitly
in the bitstream.
Quantized MPPF residuals within the bitstream shall be signaled as unsigned values. After
the quantized residuals are calculated as described in Section 4.6.3.3, they are mapped to unsigned
values by subtracting the value minCode, where minCode = - (1 << (mppfBitsPerComp – 1) ).
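The signed-to-unsigned mapping can be illustrated with a short Python sketch. The helper names below are hypothetical; only the minCode formula is taken from the text.

```python
def mppf_min_code(bits_per_comp):
    """minCode = -(1 << (mppfBitsPerComp - 1)); bits_per_comp must be >= 1."""
    return -(1 << (bits_per_comp - 1))


def to_unsigned(quantized, bits_per_comp):
    # Encoder side: subtract minCode so the smallest code maps to 0.
    return quantized - mppf_min_code(bits_per_comp)


def to_signed(unsigned, bits_per_comp):
    # Decoder side: add minCode back to recover the signed residual.
    return unsigned + mppf_min_code(bits_per_comp)


# With 3 bits per component, signed codes -4..3 map to 0..7.
print([to_unsigned(q, 3) for q in range(-4, 4)])  # [0, 1, 2, 3, 4, 5, 6, 7]
```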
BPV search is not repeated for BP-SKIP mode. Instead, the residuals and quantized residuals
determined in BP mode may be directly re-used. The procedure described in Section 4.6.2.5 for
BP partition selection is repeated for BP-SKIP; however, in this case, the rate is determined solely
by header bits and BPV signaling.
There are four ECGs for each component, indexed by ecgIdx. The contents of each ECG shall
depend on the coding mode, chroma format, component skip, component data, and whether
rate buffer underflow prevention is enabled. Each ECG contains between 0 and ecgMaxBits bits,
where ecgMaxBits is equal to 50, and is constructed as described in Section 4.7.2.
Each component’s ECGs shall be transmitted within a separate substream, as follows:
• Substream 0 – No ECGs
• Substream 1 – ECGs for Component 0
• Substream 2 – ECGs for Component 1
• Substream 3 – ECGs for Component 2
The bits that comprise each ECG are divided into three categories, as described in Table 4-58.
Figure 4-39 illustrates example ECG constructions for a Transform mode color component.
Note that any data in ecgIdx == 3 shall be coded using 2’s complement (2C) representation,
while all other ECGs shall be coded using sign/magnitude (SM). Table 4-59 describes the possible
ecgDataActive[ecgIdx] values for Transform mode.
Condition: 4:4:4, k = any, ecCompSkip = 0, underflowPrevention = 1, lastSigPos = 15
• Data: 5 Samples (SM) | 3 Samples (SM) | 7 Samples (SM) | 1 Sample (2C)
• Stuffing Bits: Yes | Yes | Yes
• Sign Bits: Yes
Condition: 4:4:4, k > 0, ecCompSkip = 1, underflowPrevention = 1
• Data: – | – | – | –
• Stuffing Bits: Yes | Yes | Yes
• Sign Bits: –
Condition: 4:4:4, k > 0, ecCompSkip = 1, underflowPrevention = 0
• Data: – | – | – | –
• Stuffing Bits: – | – | –
• Sign Bits: –
Figure 4-40 illustrates example ECG constructions for a BP mode color component. As with
Transform mode, data shall be coded using 2C bit representation if ecgIdx == 3, and coded using
SM bit representation otherwise. Table 4-60 describes the possible ecgDataActive[ecgIdx] values
for BP mode.
Condition: 4:4:4, k > 0, ecCompSkip = 1, underflowPrevention = 1
• Data: – | – | – | –
• Stuffing Bits: Yes | Yes | Yes
• Sign Bits: –
Condition: 4:4:4, k > 0, ecCompSkip = 1, underflowPrevention = 0
• Data: – | – | – | –
• Stuffing Bits: – | – | –
• Sign Bits: –
[Figure: ECG coding decision flow. Recovered steps: Calculate bitsReq; bitsReq > 0 ?; BP mode && bitsReq ≤ 2 ?; Code ECG Using VEC. Branches are labeled Y and N.]
The number of bits required for each sample, bitsReq, depends on whether the current group uses
SM or 2C bit representation. This is determined using the following rule:
• SM bit representation – Used if ecgIdx < 3
• 2C bit representation – Used if ecgIdx == 3
The encoder determines bitsReq as the minimum value (as described in Table 4-61) that can fully
represent all samples within the current ECG. Note that for SM representation, the sample’s
magnitude shall be used to determine bitsReq. Therefore, the effective range on the sample itself
would be symmetric around 0 (e.g., [-3, 3]). For example, an ECG that uses SM bit representation
where the magnitude of all samples is within the range [0, 7] shall correspond with a bitsReq of 3.
Likewise, an ECG that uses 2C bit representation shall have a bitsReq of 3 if all samples are within
the range [-4, 3].
Table 4-61: Mapping from bitsReq to Sample Ranges in SM and 2C Bit Representations
bitsReq | Range (SM) | Range (2C)
1 | [0, 1] | [-1, 0]
2 | [0, 3] | [-2, 1]
3 | [0, 7] | [-4, 3]
4 | [0, 15] | [-8, 7]
5 | [0, 31] | [-16, 15]
… | … | …
N | [0, 2^N – 1] | [-2^(N – 1), 2^(N – 1) – 1]
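The bitsReq rule can be sketched in Python as follows. The function name `bits_req` is hypothetical, and returning 0 for an all-zero group is an assumption that reflects the group skip case rather than a row of Table 4-61.

```python
def bits_req(samples, ecg_idx):
    """Smallest bitsReq that represents every sample in the ECG."""
    if all(s == 0 for s in samples):
        return 0  # assumed: an all-zero group is handled by group skip
    if ecg_idx < 3:
        # SM representation: only the magnitude determines bitsReq.
        return max(abs(s) for s in samples).bit_length()

    # 2C representation: smallest N with -2^(N-1) <= s <= 2^(N-1) - 1.
    def needed(s):
        return (s if s >= 0 else -s - 1).bit_length() + 1

    return max(needed(s) for s in samples)


print(bits_req([3, -7, 0, 2], ecg_idx=0))  # 3, magnitudes within [0, 7]
print(bits_req([-4], ecg_idx=3))           # 3, sample within [-4, 3]
print(bits_req([4], ecg_idx=3))            # 4, sample within [-8, 7]
```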
The unary prefix shall be generated, as described in Table 4-62. The case of bitsReq == 0 does not
correspond with a unary prefix because such a group would be coded using a group skip flag
instead. For bitsReq < 6, the unary prefix depends on the coding mode, as detailed in Table 4-62.
For bitsReq ≥ 6, the unary prefix shall be the same for all modes, signaled as (bitsReq – 1) 1s
followed by a 0.
For a non-skipped group, first the groupSkip flag shall be signaled as 0 to specify that group skip
is disabled. Next, the unary prefix shall be transmitted as described above. The suffix is the final
step of entropy coding and shall be handled differently, depending on whether the current group
uses CPEC or VEC.
4.7.5 CPEC
Each sample within a CPEC ECG will be signaled as a fixed-length field, where the length is
bitsReq. For SM bit representation, the magnitude shall be signaled in the fixed-length field, and
the sign information will be signaled later. (See Section 4.7.8.) For 2C bit representation, each
sample shall be coded directly in its 2C form, which includes both sign and magnitude information.
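A minimal Python sketch of the CPEC fixed-length fields is shown below. The function name, the bit-string output, and the sign polarity (1 for negative) are assumptions made for illustration; the sign bits are returned separately because they are signaled later.

```python
def cpec_encode(samples, bits_req, use_2c):
    """Return (data_bits, sign_bits) as bit strings; bits_req must be >= 1."""
    mask = (1 << bits_req) - 1
    fields = []
    signs = []
    for s in samples:
        if use_2c:
            value = s & mask  # 2C field carries sign and magnitude together
        else:
            value = abs(s)    # SM field carries the magnitude only
            if s != 0:
                signs.append("1" if s < 0 else "0")  # assumed polarity
        fields.append(format(value, "0{}b".format(bits_req)))
    return "".join(fields), "".join(signs)


print(cpec_encode([3, -2, 0, 1], 2, use_2c=False))  # ('11100001', '010')
print(cpec_encode([-4, 3], 3, use_2c=True))         # ('100011', '')
```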
4.7.6 VEC
VEC is handled in three stages, as illustrated in Figure 4-42.
• ECG samples to scalar vecCodeSymbol mapping
• vecCodeSymbol to vecCodeNumber mapping, using an LUT
• vecCodeNumber signaling, using Golomb-Rice coding
First, the four samples within the ECG shall be mapped to a scalar value vecCodeSymbol, using the
logic described in Table 4-63. This mapping is one-to-one; thus, each possible sample vector will
generate a unique vecCodeSymbol.
Each Golomb-Rice code contains a unary prefix, followed by a fixed-length suffix of length
vecGrK. The coding size is calculated as 2^(4 × bitsReq) because each vector contains four samples,
each represented by bitsReq bits. If the prefix reaches the maximum prefix length, the ending
0 bit in the unary prefix can be omitted because the decoder can infer the ending 0 bit.
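The Golomb-Rice structure described above can be sketched in Python. The function names are hypothetical, and `max_prefix` is passed in as a parameter because its derivation is not given in this passage.

```python
def vec_coding_size(bits_req):
    """Number of possible 4-sample vectors: 2^(4 * bitsReq)."""
    return 1 << (4 * bits_req)


def golomb_rice(code_number, k, max_prefix):
    """Unary prefix followed by a k-bit suffix, returned as a bit string."""
    prefix = code_number >> k
    if prefix > max_prefix:
        raise ValueError("code number exceeds the assumed maximum prefix")
    bits = "1" * prefix
    if prefix < max_prefix:
        bits += "0"  # the terminating 0 is omitted at the maximum prefix
    if k > 0:
        bits += format(code_number & ((1 << k) - 1), "0{}b".format(k))
    return bits


print(golomb_rice(5, 1, max_prefix=8))  # 1101
print(golomb_rice(9, 1, max_prefix=4))  # 11111, terminating 0 omitted
```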
To prevent the rate buffer from underflowing, the syntax includes stuffing bits for this block
time to ensure that the block rate is strictly greater than avgBlockBits. The stuffing bits shall
be composed of nine stuffing words (rateBufferStuffingWord), allocated as three stuffing words
each to Substreams 1 through 3. The stuffing word’s size is determined by PPS parameter
rc_stuffing_bits. These stuffing words shall be included in the last three ECGs for each substream.
Note that these stuffing words shall be transmitted regardless of whether ecgDataActive is false
for a given ecgIdx.
The rc_stuffing_bits size shall be determined by the average block rate, as follows:
$$\text{rc\_stuffing\_bits} = \left\lceil \frac{\text{avgBlockBits} - 8}{9} \right\rceil$$
The quantity (avgBlockBits – 8) accounts for the largest required stuffing, given a block with the
minimum syntax element size of two bits for all four substreams (eight bits). This stuffing is divided
into nine equal words because the rate buffer underflow prevention stuffing will be present in
three ECGs per component.
For example, if avgBlockBits == 96 (6 bits/pixel), each stuffing word shall contain the following:
$$\left\lceil \frac{96 - 8}{9} \right\rceil = 10 \text{ bits}$$
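The stuffing word size can be checked with a short Python sketch. It assumes that the quotient is rounded up, which is what the worked example (96 bits giving 10 bits per word) implies; the function name is hypothetical.

```python
def rc_stuffing_bits(avg_block_bits):
    """Stuffing word size in bits, assuming the quotient is rounded up."""
    # Ceiling division written with integer arithmetic only.
    return -(-(avg_block_bits - 8) // 9)


print(rc_stuffing_bits(96))  # 10
```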
4.7.8 Sign Bits
The final entropy coding step shall signal the sign bits for all non-zero samples within groups that
are coded using SM representation. These sign bits shall be grouped together and signaled as part
of the ecgIdx == 3 syntax.
For Transform mode, signLastSigPos shall be signaled as the final sign bit, if required, as described
in Section 4.6.1.5.1.
4.8 Quantizer
Quantization is used by each mode, as described in Table 4-69. The amount of quantization used
is tied to the QP. The QP value used by each type of quantizer will be derived from the QP value
that is maintained by the rate control model. This is described in Section 4.8.1.1 and Section 4.8.2
for the fractional and scalar quantizers, respectively.
$$\text{minQp} = \begin{cases} 16, & \text{bpc} == 8 \\ 0, & \text{bpc} == 10 \\ -16, & \text{bpc} == 12 \end{cases}$$
The fractional quantizer’s behavior is different for Transform and BP modes because of the
difference in dynamic range between transformed residuals T(i, j) (Transform mode) and residuals
R(i, j) (BP mode). In addition, a transform normalization factor is present in the quantization
of Transform mode that is not present for BP mode.
The QP value that is used by the fractional quantizer is derived from the rate control QP,
as described in Section 4.8.1.1.
4.8.1.1 QP Mapping
At the beginning of each block time, the rate control algorithm shall update the QP as described
in Section 4.5.3. During this block time, the fractional quantizer shall use a modified QP value
(qpMod), which is determined from the rate control QP (qpRc), coding mode, component index,
and bit depth, as described in Table 4-70. Outside of this section, general discussion with regard
to the QP shall refer only to qpRc.
offset = 4, if 4:4:4 and FBLS
offset = 2, if 4:4:4 and NFBLS
offset = 0, otherwise
qpTempA = qpRc, if Transform mode
qpTempA = clip (minQp, maxQp, qpRc + offset), if BP mode
if (color space is RGB) {
    qpTempB = qpTempA
} else if (color space is YCbCr) {
    qpTempB = qpTempA, if qpTempA < 16 || k == 0
    qpTempB = quantStepChroma[qpTempA – 16], otherwise
} else if (color space is YCoCg) {
    if (qpTempA < 16) {
        qpTempB = qpTempA, if k == 0
        qpTempB = qpTempA + 8, otherwise
    } else {
        qpTempB = qpTempA, if k == 0
        qpTempB = quantStepCo[qpTempA – 16], if k == 1
        qpTempB = quantStepCg[qpTempA – 16], if k == 2
        if (k > 0 && FBLS && qpTempA ≤ maxQp) {
            qpTempB = clip (minQp, 72, qpTempB)
        }
    }
}
qpMod = qpTempB + ( (bpc – 8) << 3)
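The QP mapping can be sketched in Python as follows. This is one reading of the pseudocode above, not a reference implementation: the function signature, the string mode names, and the `tables` dictionary are hypothetical, the look-up tables (quantStepChroma, quantStepCo, quantStepCg) must be supplied by the caller because their contents are not listed here, and the YCbCr comparison is assumed to be `qpTempA < 16`.

```python
def clip(lo, hi, x):
    return max(lo, min(hi, x))


def qp_mod(qp_rc, mode, color_space, k, bpc, is_444, fbls, max_qp, tables):
    """Map the rate control QP (qpRc) to the fractional quantizer QP (qpMod)."""
    min_qp = {8: 16, 10: 0, 12: -16}[bpc]
    if mode == "BP":
        offset = (4 if fbls else 2) if is_444 else 0
        qp_a = clip(min_qp, max_qp, qp_rc + offset)
    else:  # Transform mode uses qpRc unchanged
        qp_a = qp_rc

    if color_space == "RGB":
        qp_b = qp_a
    elif color_space == "YCbCr":
        if qp_a < 16 or k == 0:
            qp_b = qp_a
        else:
            qp_b = tables["chroma"][qp_a - 16]
    else:  # YCoCg
        if qp_a < 16:
            qp_b = qp_a if k == 0 else qp_a + 8
        else:
            if k == 0:
                qp_b = qp_a
            elif k == 1:
                qp_b = tables["co"][qp_a - 16]
            else:
                qp_b = tables["cg"][qp_a - 16]
            if k > 0 and fbls and qp_a <= max_qp:
                qp_b = clip(min_qp, 72, qp_b)
    return qp_b + ((bpc - 8) << 3)
```

For example, with 10 bpc content, a YCoCg chroma component (k = 1) in Transform mode and qpRc = 10 yields qpTempB = 18 and qpMod = 34 without touching any table.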
Each row of the quantization table specifies the coefficients for a given fractional quantization step
(qpRem = QP & 0x07). A mapping is then defined between each sample position within a row and
the associated quantization coefficient in the quantization table. Both rows of the current block use
the same mapping.
• quantTableMapping8x2 = [0, 1, 2, 3, 0, 3, 2, 1]
• quantTableMapping4x2 = [0, 1, 0, 1]
• quantTableMapping4x1 = [0, 1]
From the above matrices and mapping array, the forward and inverse quantization procedures
are described in Table 4-71 and Table 4-72, respectively. The specific quantization table and
mapping tables used will be the ones associated with the current block component dimension.
For an 8x2 block component:
• quantTableForwardDct = quantTableForwardDct8x2
• quantTableMapping = quantTableMapping8x2
In addition, the following parameters define the shift associated with the Transform mode
quantization coefficients and dead zone.
• dctQuantBits = 8
• dctQuantDeadZone = 102
where:
• Dead zone size is 102 / 256 ≈ 0.4
From these two matrices, the forward and inverse quantization procedures are described in
Table 4-73 and Table 4-74, respectively.
The following parameters define the shifts associated with the BP mode quantization and inverse
quantization coefficients and quantization dead zone:
• bpQuantBits = 6
• bpInvQuantBits = 9
• bpQuantDeadZone = 22
where:
• Dead zone size is bpQuantDeadZone / (1 << bpQuantBits) = 22 / 64 ≈ 0.34
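$$\text{stepSizeComp}[k] = \begin{cases} ((\text{mppQp} - 16) \gg 3) + (\text{bitDepth} - \text{minBpc}), & k == 0 \text{ or RGB/YCbCr} \\ \text{mppStepSizeMapCo}[\text{stepSizeComp}[0]], & k \text{ is the Co component} \\ \text{mppStepSizeMapCg}[\text{stepSizeComp}[0]], & k \text{ is the Cg component} \end{cases}$$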
Given stepSize and the current component’s bit depth, forward quantization shall be calculated
as described in Table 4-75.
bias = 0, if stepSizeComp == 0
bias = 1 << (stepSizeComp – 1), otherwise
codeMin = -(1 << (bitDepth – stepSizeComp – 1) )
codeMax = (1 << (bitDepth – stepSizeComp – 1) ) – 1
avgBlockBits denotes the number of bits that shall be transferred from the rate buffer to the
bitstream, per block time. (See Section 4.5.1.1.) minBlockBits is the worst-case cost for MPPF
mode. (See Section 4.6.4.)
Table 4-76 describes the mode selection conditions for enforcing correct rate buffer behavior
for a given modeRate. If any of these conditions is not met for a coding mode, the mode selection
algorithm shall disallow that coding mode for the current block time.
Table 4-76: Mode Selection Conditions for Enforcing Correct Rate Buffer Behavior for Given modeRate
xmitBits = 0, if blkIdx < rc_init_tx_delay
xmitBits = avgBlockBits, otherwise
conditionA: (bufferFullness + modeRate – xmitBits) ≤ rc_buffer_max_size – (rcOffset + rcOffsetInit)
conditionB: (bufferFullness + modeRate – xmitBits) ≥ avgBlockBits + chunkAdjBitsMax
conditionC: (sliceBitsRemaining – modeRate) ≥ (numBlocksRemaining × minBlockBits)
where:
• sliceBitsRemaining is the number of bits that remain within the slice
• numBlocksRemaining is the number of blocks that remain within the slice
• chunkAdjBitsMax = 8 (see Section 4.5.1.1)
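The three conditions of Table 4-76 can be combined into a single validity check, sketched below in Python. The function name and the flat argument list are assumptions made for illustration; the arithmetic follows conditionA, conditionB, and conditionC as written.

```python
CHUNK_ADJ_BITS_MAX = 8


def mode_allowed(mode_rate, buffer_fullness, blk_idx, rc_init_tx_delay,
                 avg_block_bits, rc_buffer_max_size, rc_offset,
                 rc_offset_init, slice_bits_remaining,
                 num_blocks_remaining, min_block_bits):
    """Return True if a mode with the given modeRate passes all conditions."""
    # Nothing leaves the rate buffer during the initial transmission delay.
    xmit_bits = 0 if blk_idx < rc_init_tx_delay else avg_block_bits
    fullness_after = buffer_fullness + mode_rate - xmit_bits

    cond_a = fullness_after <= rc_buffer_max_size - (rc_offset + rc_offset_init)
    cond_b = fullness_after >= avg_block_bits + CHUNK_ADJ_BITS_MAX
    cond_c = (slice_bits_remaining - mode_rate) >= (
        num_blocks_remaining * min_block_bits)
    return cond_a and cond_b and cond_c
```

An encoder would call this once per candidate mode and drop any mode for which it returns False before comparing modeRdCost values.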
Encoder mode selection shall then proceed according to the following four steps, for the set
of modes that has not been invalidated for the current block time:
1 From the following set of coding modes, select the mode that has the lowest modeRdCost:
{Transform mode, BP mode, MPP mode}.
2 If all the modes listed in step 1 are invalidated, select the fallback mode (MPPF –or– BP-SKIP)
that has the lowest modeRdCost.
3 If BP mode is selected as the best mode, select BP-SKIP mode instead if the following
conditions exist:
a underflowPrevention is disabled.
b BP mode has zero quantized residual for all three components.
4 If MPP mode is selected as the best mode, select BP-SKIP mode instead if the following
conditions exist:
a underflowPrevention is disabled.
b Chroma format is 4:4:4 or 4:2:2.
c modeRdCost for BP-SKIP mode is lower than the modeRdCost for MPP mode.
d QP ≥ (bpc << 3) – 16.
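The four selection steps can be sketched in Python as follows. The function name, the string mode names, and the dictionary interface are hypothetical. The sketch assumes that all listed sub-conditions of steps 3 and 4 must hold together, and that BP-SKIP can replace another mode only if BP-SKIP itself has not been invalidated.

```python
def select_mode(rd_cost, underflow_prevention, bp_zero_residual,
                chroma_format, qp, bpc):
    """rd_cost maps each mode that is still valid to its modeRdCost."""
    regular = [m for m in ("Transform", "BP", "MPP") if m in rd_cost]
    fallback = [m for m in ("MPPF", "BP-SKIP") if m in rd_cost]
    candidates = regular or fallback  # step 2 applies only if step 1 is empty
    if not candidates:
        raise ValueError("no valid coding mode for this block time")
    best = min(candidates, key=rd_cost.get)

    skip_ok = "BP-SKIP" in rd_cost and not underflow_prevention
    # Step 3: BP with all-zero quantized residuals becomes BP-SKIP.
    if best == "BP" and skip_ok and bp_zero_residual:
        return "BP-SKIP"
    # Step 4: MPP is replaced by a cheaper BP-SKIP at high QP.
    if (best == "MPP" and skip_ok
            and chroma_format in ("4:4:4", "4:2:2")
            and rd_cost["BP-SKIP"] < rd_cost["MPP"]
            and qp >= (bpc << 3) - 16):
        return "BP-SKIP"
    return best


costs = {"Transform": 10, "BP": 8, "MPP": 12, "MPPF": 20, "BP-SKIP": 5}
print(select_mode(costs, False, True, "4:4:4", qp=40, bpc=8))  # BP-SKIP
```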
[Figure: encoder substream multiplexer. Each substream ssm[0] through ssm[3] has an entropy encoder that feeds ssmBalanceFifo[0] through ssmBalanceFifo[3]. The remaining labeled elements are the substream multiplexer, the rate buffer that outputs the bitstream, and a de-multiplexer model with one funnel shifter per substream.]
SSM requires a small portion of the total slice rate to be set aside to guarantee proper behavior
at the end of the slice. This quantity is provided by PPS parameter num_extra_mux_bits and is
calculated as described in Table 4-81. The portion of the slice rate that is used by the rate controller
shall be as follows:
slice_num_bits – num_extra_mux_bits
SSM is subject to a delay at the beginning of the slice, such that ssmBalanceFifo has a chance to
fill up before any mux words are transmitted. This delay is specified by ssmDelay. An additional
fixed delay of one block time is applied to Substreams 1 through 3, relative to Substream 0,
to ease timing at the de-multiplexer. This means that the de-multiplexer shall effectively receive
Substream 0 one block time earlier than the other substreams. This additional delay (ssmSkew)
is specified, as follows:
$$\text{ssmSkew}[\text{ssmIdx}] = \begin{cases} 0, & \text{ssmIdx} == 0 \\ 1, & \text{ssmIdx} > 0 \end{cases}$$
Therefore, the full value of ssmDelay for each substream is calculated, as follows:
ssmDelay[ssmIdx] = ssmDelayBase + ssmSkew[ssmIdx]
where:
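• ssmDelayBase = ssm_max_se_size:
$$\text{ssmDelayBase} = \left\lceil \frac{2 \times \text{ssm\_max\_se\_size} - 1}{\text{ssmMinSeSize}} \right\rceil = \left\lceil \frac{2 \times \text{ssm\_max\_se\_size} - 1}{2} \right\rceil = \text{ssm\_max\_se\_size}$$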
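The per-substream delay can be sketched in Python. The sketch assumes that the quotient in the ssmDelayBase derivation is rounded up, which is the reading that makes the result equal to ssm_max_se_size when ssmMinSeSize is 2. The function name and the example value of 128 are hypothetical.

```python
SSM_MIN_SE_SIZE = 2


def ssm_delay(ssm_idx, ssm_max_se_size):
    """ssmDelay[ssmIdx] in block times (ceiling division assumed)."""
    base = -(-(2 * ssm_max_se_size - 1) // SSM_MIN_SE_SIZE)
    skew = 0 if ssm_idx == 0 else 1
    return base + skew


print([ssm_delay(i, 128) for i in range(4)])  # [128, 129, 129, 129]
```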
Figure 4-44 illustrates ssmDelay’s impact on the substream multiplexer. For any block time less
than ssmDelay, bits shall be placed into ssmBalanceFifo; however, bits shall not be removed and
placed into the rate buffer. When the block time is greater than or equal to ssmDelay, bits shall
be placed into ssmBalanceFifo, and the de-multiplexer model shall be checked to determine
whether any mux words need to be removed from ssmBalanceFifo and placed into the rate buffer.
[Figure 4-44: ssmDelay’s impact on the substream multiplexer. Along the encoder block index axis (0, 1, 2, ..., N, N+1, ...), each block time adds syntaxElement to ssmBalanceFifo[ssmIdx] and adds ssmSeSize[ssmIdx] to ssmSyntaxFifo[ssmIdx]. After the SSM delay, each block time also includes a "Mux Word Request?" check.]
The substream multiplexer is updated during each block time after encoder mode
selection. The selected mode’s syntax shall consist of at least ssmMinSeSize = 2 bits and
at most ssm_max_se_size bits for each substream. The substream multiplexer shall perform
the following steps once per block time, per substream:
1 If the current block time is greater than or equal to ssmSkew, all block bits for the current
substream are added to ssmBalanceFifo. For example, during the first block time, all bits
from Substream 0 of Block 0 shall be added to ssmBalanceFifo[0]. During Block Time 1,
Substream 0 of Block 1 shall be added to ssmBalanceFifo[0], followed by Substreams 1
through 3 of Block 0 to ssmBalanceFifo[1] through ssmBalanceFifo[3]. This shall continue
for the remainder of the slice. The number of bits added to ssmBalanceFifo per block time
shall also be placed at the end of ssmSyntaxFifo.
2 After ssmBalanceFifo has been updated, the de-multiplexer model shall be checked to
determine whether any mux words shall be requested as per Table 4-82. The de-multiplexer
shall request a mux word if the funnel shifter fullness for that substream is strictly less than
ssm_max_se_size.
Table 4-82: Encoder Substream Multiplexer Check for Mux-word Requests, Substream ssmIdx
decReqMuxWord[ssmIdx] = true, if ssmFunnelShifterFullness[ssmIdx] < ssm_max_se_size
decReqMuxWord[ssmIdx] = false, otherwise
3 De-multiplexer model state is updated as per Table 4-83. If a mux word is requested during
the current block time, ssmFunnelShifterFullness is incremented by ssm_max_se_size. Current
block decoding is simulated by reducing ssmFunnelShifterFullness by the value at the front of
ssmSyntaxFifo, which is then popped.
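Steps 2 and 3 can be sketched for one substream in Python. The function name is hypothetical, and the sketch leaves out the ssmDelay gating and the transfer of the mux word itself into the rate buffer; it tracks only the de-multiplexer model state.

```python
from collections import deque


def demux_model_step(fullness, syntax_fifo, ssm_max_se_size):
    """Advance the de-multiplexer model by one block time for one substream.

    Returns (mux_word_requested, new_funnel_shifter_fullness).
    """
    request = fullness < ssm_max_se_size
    if request:
        fullness += ssm_max_se_size  # a mux word enters the funnel shifter
    # Simulate decoding the current block: consume its syntax element size.
    fullness -= syntax_fifo.popleft()
    return request, fullness


fifo = deque([10, 20, 5])
state = 0
for _ in range(3):
    requested, state = demux_model_step(state, fifo, 128)
    print(requested, state)
# True 118
# True 226
# False 221
```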
5.1 Overview
C source code shall always be trusted when in conflict with the content of this Standard.
The decoder shall perform the decoding process, once per block time within a slice, as follows:
1 Substream de-multiplexer shall be responsible for requesting mux words and ensuring that the
four decoder funnel shifters always have a minimum of ssm_max_se_size bits available for
each block time.
2 Parser shall remove bits from the funnel shifters, which shall be used later during
block decoding.
Note: The decoder does not perform any testing between the different modes; instead, the mode
for each block and all associated information is contained explicitly within the syntax.
3 Rate controller’s state shall be updated.
4 Current block shall be decoded.
5 Steps 1 through 4 shall be repeated for the next block.
Figure 5-1 illustrates the substream de-multiplexer’s overall structure. Bracket notation is used
to specify a substream element. For example, ssmFunnelShifter[0] refers to the Substream 0
funnel shifter.
[Figure 5-1: substream de-multiplexer structure. The bitstream passes through the rate buffer into the substream de-multiplexer, which feeds ssmFunnelShifter[0] through ssmFunnelShifter[3]. Each funnel shifter feeds the parser for its substream, ssm[0] through ssm[3].]
During each block time, the substream de-multiplexer may request up to one mux word from each
of the four substreams. A mux word shall be requested for a substream ssmIdx if the following
condition is met:
fullness (ssmFunnelShifter[ssmIdx] ) < ssm_max_se_size
It is possible for all four substreams to request a mux word during the same block time. Should this
occur, the requests shall be fulfilled in numerical order (e.g., Substream 1 would receive a mux
word from the bitstream before Substream 2). It is also possible that no requests will be made.
There shall be a delay of one block time between the transmission of Substream 0 and the
remaining substreams. This shall be captured by parameter ssmSkew, as follows:
$$\text{ssmSkew}[\text{ssmIdx}] = \begin{cases} 0, & \text{ssmIdx} == 0 \\ 1, & \text{ssmIdx} > 0 \end{cases}$$
The mux word request progression for the four substreams shall proceed as described in Table 5-1.
Requests shall not be made for any substream if that substream’s decoder block index is less than
ssmSkew. Therefore, during Block Time 0, Substream 0 shall be the only substream to request a
mux word. During Block Time 1, Substreams 1 through 3 shall issue requests. For all other block
times, a substream shall request a mux word only if that substream’s funnel shifter fullness is
strictly less than ssm_max_se_size.
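The request rule can be sketched in Python. The function name is hypothetical, and the sketch treats the block time as the decoder block index that is compared against ssmSkew. The returned list is in numerical order, which is also the order in which the requests are fulfilled.

```python
def mux_word_requests(block_time, fullness, ssm_max_se_size):
    """Return the substream indices that request a mux word this block time."""
    requests = []
    for ssm_idx in range(4):
        skew = 0 if ssm_idx == 0 else 1
        if block_time >= skew and fullness[ssm_idx] < ssm_max_se_size:
            requests.append(ssm_idx)
    return requests


print(mux_word_requests(0, [0, 0, 0, 0], 128))    # [0]
print(mux_word_requests(1, [200, 0, 0, 0], 128))  # [1, 2, 3]
```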
The bits present in the decoder funnel shifters after SSM requests have been fulfilled are necessary
for decoding the current block. At this point, the decoder parser and entropy decoder process
information directly from the SSM funnel shifters.
[Figures: decoder parsing paths per coding mode. In each diagram, the bitstream feeds the substream de-multiplexer, and each substream passes through a funnel shifter and a parser. Recovered outputs, in order of appearance: Quantized Transform Coefficients (through an entropy decoder); BPV for Sub-block 1 through 3 with Quantized Residuals (through an entropy decoder); Quantized Residuals; BPV for Sub-block 1 through 3.]
In addition to the syntax parser, an entropy decoder shall be included in Substreams 1 through 3.
During entropy decoding, bits shall be removed from the funnel shifter and quantized coefficients
or residuals shall be generated. The number of entropy coding groups (ECGs) per substream shall
be determined by the chroma sampling format. Table 5-2 describes the slice syntax distribution
among the four substreams.
In addition to the block header, additional syntax may be included in Substream 0, dependent on the
block mode. (See Section 5.4 for further details.)
All syntax specific to these modes can be parsed by reading fixed-length fields in Substreams 0
through 3, as described in their respective sections.
To parse the ECG data, first the component skip flag ecCompSkip shall be parsed for
Components 1 and 2. (See Section 5.3.3.1.) ecCompSkip shall be disabled for Component 0
and shall not be included in the bitstream. For Transform mode, a field lastSigPos shall then
be parsed if component skip is inactive. This field shall be of length compSamplesLog2.
Following this, the entropy decoder shall process four ECGs per component. Depending on
the chroma format, component index, and the value of lastSigPos, certain ECGs can be skipped
(i.e., ecgDataActive[ecgIdx] is false). This is determined by table look-up, as described in
Section 4.7.2. For any ECG with a non-skipped data portion, the decoding process shall be
conducted as described in Section 5.3.3.2.
If ecCompSkip is active, all quantized samples for the component shall be equal to 0, and
ecgDataActive[ecgIdx] is false for all four ECGs.
If ecCompSkip is inactive, there is at least one ECG within the component for which
ecgDataActive[ecgIdx] is true.
For any ECG in which ecgDataActive[ecgIdx] is true, a one-bit flag groupSkip shall be parsed to
indicate whether the group is skipped. If groupSkip == 1, all quantized samples in the group shall
be equal to 0, and parsing of the data portion of the ECG shall be complete. If groupSkip == 0,
the ECG shall be further parsed as described in Section 5.3.3.2.
After bitsReq is extracted from the bitstream, a fixed number of bits shall be parsed from the
bitstream, as follows:
bits = N × bitsReq
where:
• N is the number of samples in the ECG
The entropy decoder shall convert these bits into samples, as follows:
• SM bit representation – Magnitude shall be contained within the fixed-length field and
the sign bits shall be parsed separately
• 2C bit representation – Sample shall be determined directly from the fixed-length field
sample[i] = temp, if temp < thresh
sample[i] = temp – offset, otherwise
shift –= bitsReq
}
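The 2C conversion of a fixed-length field can be sketched in Python. The definitions thresh = 1 << (bitsReq – 1) and offset = 1 << bitsReq are assumptions based on standard two's complement arithmetic, because their definitions do not appear in this passage. The function name is hypothetical.

```python
def decode_2c_field(field, num_samples, bits_req):
    """Split an (N x bitsReq)-bit field into signed 2C samples."""
    thresh = 1 << (bits_req - 1)  # assumed definition
    offset = 1 << bits_req        # assumed definition
    mask = offset - 1
    shift = (num_samples - 1) * bits_req  # first sample is most significant
    samples = []
    for _ in range(num_samples):
        temp = (field >> shift) & mask
        samples.append(temp if temp < thresh else temp - offset)
        shift -= bits_req
    return samples


print(decode_2c_field(0b100011, num_samples=2, bits_req=3))  # [-4, 3]
```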
[Figure 5-8: Butterfly Structure for 8-point Inverse Discrete Cosine Transform]
Table 5-8: 8-point Inverse Discrete Cosine Transform Applied to Row r of Y(i, j)
// first stage (re-order)
mapping = [0, 4, 2, 6, 7, 3, 5, 1]
for (c = 0; c < numCols; c++) {
Y(c, r) = Y(mapping[c], r)
}
// final stage
Z(0, r) = eb(0, r) + dc(3, r)
Z(1, r) = eb(1, r) + dc(2, r)
Z(2, r) = eb(2, r) + dc(1, r)
Z(3, r) = eb(3, r) + dc(0, r)
Z(4, r) = eb(3, r) – dc(0, r)
Z(5, r) = eb(2, r) – dc(1, r)
Z(6, r) = eb(1, r) – dc(2, r)
Z(7, r) = eb(0, r) – dc(3, r)
[Figure 5-9: Butterfly Structure for 4-point Inverse Discrete Cosine Transform]
Table 5-9: 4-point Inverse Discrete Cosine Transform Applied to Row r of Y(i, j)
5.4.1.2.3 Post-shift
The result of the inverse transforms shall be the set of reconstructed residuals Z(i, j). The scale shall
be adjusted before the final reconstructed residuals ( R (i, j) ) can be obtained. Table 5-10 describes
this adjustment, using the following coefficients:
• dctInvShift = 8
• dctInvRound = 128
sign = -1, if Z(c, r) < 0
sign = +1, otherwise
R (c, r) = sign × ( ( | Z(c, r) | + dctInvRound) >> dctInvShift)
}
}
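The post-shift can be sketched for a single sample in Python. The function name is hypothetical; the constants are the dctInvShift and dctInvRound values given above.

```python
DCT_INV_SHIFT = 8
DCT_INV_ROUND = 128


def post_shift(z):
    """Scale one inverse-transform output Z(c, r) to a residual R(c, r)."""
    sign = -1 if z < 0 else 1
    # Rounding is applied to the magnitude, so it is symmetric around zero.
    return sign * ((abs(z) + DCT_INV_ROUND) >> DCT_INV_SHIFT)


print([post_shift(z) for z in (384, -384, 127, 128)])  # [2, -2, 0, 1]
```

Applying the shift to the magnitude rather than to the signed value avoids the bias toward negative infinity that an arithmetic right shift of a negative number would introduce.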
5.4.1.3 Reconstruction
The intra predicted block is calculated, using the tables in Section 4.6.1.1 for the current block’s
parsed intra predictor type. The reconstructed residuals are then added to the predicted block and
clipped to the available range. If the source color space is RGB, Transform mode is calculated in
the YCoCg space. In this case, color-space conversion (CSC) is used to convert the reconstructed
samples back to RGB color space. This step is not required for YCbCr source content.
5.4.2 BP Mode
In BP mode, the block prediction vectors (BPVs) are parsed from Substream 0. This shall occur
in two steps:
1 Decoder shall parse a fixed-length four-bit field (bpvTable) from Substream 0. These four bits
shall define the partition structure of each 2x2 sub-block. For each bit in bpvTable:
• 0 signals a 2x1 partition (2x BPV)
• 1 signals a 2x2 partition (1x BPV)
The total number of BPVs for each sub-block shall be either one or two.
2 The BPV for each sub-block shall be parsed from the corresponding substream (i.e., the one
or two BPVs associated with Sub-block i shall be parsed from Substream i).
For FBLS blocks, each BPV shall be represented by five bits because only 32 search range
positions are valid in this region. For NFBLS blocks, each BPV shall be represented by six bits
(64 possible positions). For example, if the current block is located within NFBLS and bpvTable
is 1011, the BPV shall be distributed as described in Table 5-11.
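The BPV count and bit budget implied by bpvTable can be sketched in Python. The function name is hypothetical, and the sketch assumes that the leftmost character of the bpvTable string corresponds to Sub-block 0; the total bit count does not depend on that ordering.

```python
def bpv_layout(bpv_table, fbls):
    """Return (BPV count per sub-block, total BPV bits) for one block."""
    bits_per_bpv = 5 if fbls else 6  # 32 or 64 search range positions
    # 1 signals a 2x2 partition (one BPV); 0 signals 2x1 partitions (two BPVs).
    counts = [1 if bit == "1" else 2 for bit in bpv_table]
    return counts, sum(counts) * bits_per_bpv


print(bpv_layout("1011", fbls=False))  # ([1, 2, 1, 1], 30)
```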
After the decoder parses the BPVs from the bitstream, the remaining syntax for BP mode consists
of a set of ECGs distributed among Substreams 1 through 3. These ECGs shall contain quantized
prediction residuals for all three color components. Following the procedure for entropy decoding
described in Section 5.3.3, the decoder shall receive the complete set of quantized residuals.
These residuals shall be inverse quantized, using the fractional quantizer described in Section 4.8.1
to generate the reconstructed residuals.
The predicted block shall be generated from the set of received BPV for each partition, using the
mapping between BPV index and BPV search range position described in Section 4.6.2.1. Next,
the reconstructed samples shall be generated by adding the predicted block to the reconstructed
residuals and clipping to the available range.
If the source color space is RGB, BP mode is calculated in the YCoCg color space. In this case,
CSC is used to convert the reconstructed samples back to RGB color space. This step is not
required for YCbCr source content.
If the input format is YCbCr, mppColorSpace is not included in the bitstream, and
curBlockColorSpace = YCbCr. Next, the step size (mppStepSize) is parsed. mppStepSize
shall be a fixed 3-bit unsigned field if bpc == 8, –or– a 4-bit unsigned field if bpc > 8.
The decoder shall then parse the quantized residuals from the bitstream in Substreams 0 through 3.
Each of the four substreams contains an equal number of samples, as described in Table 4-55.
Quantized MPP residuals within the bitstream are represented as unsigned values. After
parsing this unsigned value, the decoder shall add the value minCode to each sample,
where minCode = - (1 << (mppBitsPerComp – 1) ).
The number of bits for each quantized residual is determined by the component index and
mppStepSize. mppStepSizeComp denotes the step size for a given component, calculated
as a function of the source color space and mppColorSpace flag:
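$$\text{mppStepSizeComp} = \begin{cases} \text{mppStepSizeMapCo}[\text{mppStepSize}], & \text{mppColorSpace} == 1, \text{ Co component} \\ \text{mppStepSizeMapCg}[\text{mppStepSize}], & \text{mppColorSpace} == 1, \text{ Cg component} \\ \text{mppStepSize}, & \text{otherwise} \end{cases}$$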
The number of bits per quantized residual for a given component (mppBitsPerComp) shall then
be determined, as follows:
mppBitsPerComp = compBitDepth – mppStepSizeComp
where:
$$\text{compBitDepth} = \begin{cases} \text{bpc} + 1, & \text{YCoCg chroma components} \\ \text{bpc}, & \text{otherwise} \end{cases}$$
Reconstructed residuals shall be calculated by applying the scalar inverse quantizer (see
Section 4.8.2) to all quantized residuals. The final reconstructed pixels shall be obtained by first
calculating the midpoint in the same way as calculated on the encoder side (see Section 4.6.3),
and then adding the midpoint to the reconstructed residuals before clipping to the available range.
If mppColorSpace == 1, CSC shall be used to convert the block from YCoCg to RGB color space.
At this point, MPP block decoding shall be complete.
After mppfStepSizeComp is determined for each component, the quantized residuals shall
be parsed from the bitstream, using the same procedure as in MPP mode. (See Section 5.4.3.)
Quantized MPPF residuals within the bitstream are represented as unsigned values. After
parsing this unsigned value, the decoder shall add the value minCode to each sample,
where minCode = - (1 << (mppfBitsPerComp – 1) ).
Reconstructed residuals shall be calculated by applying the scalar inverse quantizer (see
Section 4.8.2) to all quantized residuals. The final reconstructed pixels shall be obtained by first
calculating the midpoint in the same way as calculated on the encoder side (see Section 4.6.3),
and then adding the midpoint to the reconstructed residuals before clipping to the available range.
If mppfColorSpace is YCoCg, CSC shall be used to convert the reconstructed block from YCoCg
to RGB color space.
At this point, MPPF block decoding shall be complete.
The flatness type used by QP update shall be directly parsed from the bitstream.
This annex provides guidance for calculating rate buffer size-related PPS parameters.
The physical rate buffer size (rc_buffer_max_size) shall be calculated from the compressed bit
rate and slice dimensions, as follows:
rc_buffer_max_size = (2 × rc_buffer_init_size) + (2 × slice_width × rc_target_rate_extra_fbls)
PPS parameter rc_buffer_init_size defines a constraint on the number of bits that can remain
within the rate buffer at the end of a slice during the encoding procedure. This means that when
encoding for Slice N is complete, there are still rc_buffer_init_size or fewer bits in the rate
buffer that need to be transmitted. The same shall be true for the start of the encoding process
for Slice N + 1. That is, at the beginning of a slice, the rate buffer shall be filled to at least
rc_buffer_init_size before Slice N + 1 transmission can begin.
First, rcBufferInitSizeTemp is calculated as a function of the slice width:
rcBufferInitSizeTemp =
    4096, if slice_width ≤ 720
    8192, if 720 < slice_width ≤ 2048
    10752, otherwise
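The slice-width thresholds above can be expressed as a small lookup. This is a minimal sketch, and the function name is hypothetical.

```python
def rc_buffer_init_size_temp(slice_width: int) -> int:
    """Return the provisional initial rate buffer size, in bits, for a slice width."""
    if slice_width <= 720:
        return 4096
    if slice_width <= 2048:
        return 8192
    return 10752
```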
The delay associated with filling the rate buffer shall be denoted as rc_init_tx_delay (measured
in block times). rc_init_tx_delay is calculated, as follows:
rc_init_tx_delay = rcBufferInitSizeTemp / avgBlockRate
where:
• avgBlockRate = bpp. The stored bpp value is scaled by 16 (four fractional bits of
precision), and there are 16 pixels per block, so the stored value equals the average
number of bits per block
Note that rc_init_tx_delay is in addition to the delay associated with filling the SSM balance
FIFOs. (See Section 4.11.) During the initial transmission delay period, bits shall be removed from
the SSM balance FIFOs and placed into the rate buffer. After the initial transmission delay period
has concluded, transmission shall begin from the rate buffer into the bitstream.
The initial buffer size shall then be modified to be a multiple of avgBlockRate, as follows:
rc_buffer_init_size = avgBlockRate × rc_init_tx_delay
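The two-step derivation above can be sketched as follows. The rounding applied to the division is not stated in this annex, so the sketch assumes truncation toward zero (integer division). That choice is an assumption, as are the function and argument names. The argument bits_per_pixel is the stored value, which equals avgBlockRate.

```python
def derive_init_delay_and_size(rc_buffer_init_size_temp: int, bits_per_pixel: int):
    """Return (rc_init_tx_delay in block times, rc_buffer_init_size in bits)."""
    if bits_per_pixel <= 0:
        raise ValueError("bits_per_pixel must be positive")
    # avgBlockRate equals the stored bpp value (bits per 16-pixel block).
    avg_block_rate = bits_per_pixel
    # ASSUMPTION: the quotient is truncated; the annex does not state the rounding mode.
    rc_init_tx_delay = rc_buffer_init_size_temp // avg_block_rate
    # Snap the initial buffer size to a multiple of avgBlockRate.
    rc_buffer_init_size = avg_block_rate * rc_init_tx_delay
    return rc_init_tx_delay, rc_buffer_init_size
```

For example, at 6 bpp the stored value is 96. With rcBufferInitSizeTemp = 8192, this sketch yields a delay of 85 block times and an initial buffer size of 8160 bits.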
This annex provides guidance for a practical implementation of the rate buffer, including delays
associated with the rate buffer and with substream multiplexing.
The rate buffer discussed within the normative section of this Standard is an idealized rate buffer.
Its size is therefore the minimum required size of a practical rate buffer in a hardware implementation.
A practical rate buffer will likely grow from the idealized size to accommodate various factors,
such as horizontal blanking, pipeline delays, and so forth.
The initial transmission delay (rc_init_tx_delay) discussed within the normative section of this
Standard determines the number of block times between the compressed data first entering the
rate buffer from the SSM balance FIFOs and transmission from the rate buffer to the bitstream.
The total encoder delay of a practical implementation, at minimum, is the sum of the
following terms:
• rc_init_tx_delay
• Delay associated with filling the SSM balance FIFOs (see Section 4.11)
• Delay of one block time to account for the SSM skew between Substream 0 and the
other substreams (Substreams 1 through 3)
• Additional implementation-specific delays
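The terms listed above can be summed as in the following sketch. The function and argument names are hypothetical. The SSM balance FIFO fill delay and any implementation-specific delay are taken as inputs, because their values are not defined in this annex.

```python
SSM_SKEW_BLOCK_TIMES = 1  # skew between Substream 0 and Substreams 1 through 3

def min_encoder_delay_block_times(rc_init_tx_delay: int, ssm_fifo_fill_delay: int,
                                  implementation_delay: int = 0) -> int:
    """Lower bound on total encoder delay, in block times."""
    return (rc_init_tx_delay + ssm_fifo_fill_delay
            + SSM_SKEW_BLOCK_TIMES + implementation_delay)
```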
Additional encoder buffering requirements that are needed to compensate for the above delays are
largely implementation-specific. However, this additional buffering must, at a minimum, include
ssmTransmitStartBufferAdj bits to reconcile the start of transmission of the SSM mux words
into the rate buffer with the constant bit rate transmission from the rate buffer.
ssmTransmitStartBufferAdj = (8 × ssm_max_se_size) – (2 × avgBlockRate)
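The adjustment above can be evaluated as in this sketch. The function name is hypothetical, and the values in the usage note are illustrative only.

```python
def ssm_transmit_start_buffer_adj(ssm_max_se_size: int, avg_block_rate: int) -> int:
    """Minimum additional encoder buffering, in bits, per the formula above."""
    return (8 * ssm_max_se_size) - (2 * avg_block_rate)
```

For example, with an illustrative ssm_max_se_size of 128 bits and avgBlockRate = 96, the result is 1024 – 192 = 832 bits.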
In a practical encoder, the compressed bits can reside in any one of the following locations,
as long as the encoding process proceeds as defined in this Standard:
• SSM balance FIFOs
• Rate buffer
• Combination of SSM balance FIFOs and rate buffer
A practical implementation of the decoder should include an initial decode delay period during
which bits accumulate within the rate buffer from the bitstream before any blocks are decoded
(i.e., removed from the rate buffer and placed into the SSM funnel shifters). To prevent an input
buffer underflow, this delay must, at a minimum, include initDecodeDelay. initDecodeDelay
is specified by the total buffer size and initial transmission delay, such that the total system
delay (totalDelay) through the encoder and decoder is a constant.
totalDelay = rc_buffer_max_size / bits_per_pixel
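The total delay formula above can be evaluated as in this sketch. It assumes that bits_per_pixel is the stored value (equal to avgBlockRate, in bits per block), so that the quotient is in block times. No rounding is applied, because this annex does not state one. The function name is hypothetical, and the derivation of initDecodeDelay from totalDelay is not shown.

```python
def total_delay_block_times(rc_buffer_max_size: int, bits_per_pixel: int) -> float:
    """Total encoder-plus-decoder delay, in block times, without rounding."""
    if bits_per_pixel <= 0:
        raise ValueError("bits_per_pixel must be positive")
    # ASSUMPTION: bits_per_pixel is the stored value, i.e. bits per 16-pixel block.
    return rc_buffer_max_size / bits_per_pixel
```

For example, a 24000-bit rate buffer at a stored bits_per_pixel of 96 gives 250 block times.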
The PPS parameter rc_init_tx_delay value and consequential delays discussed within this
annex are measured in block times. The equivalent delay in pixel times is a delay value times
the number of pixels per block, which is 16, divided by the encoder or decoder throughput
in pixels per clock. For example, if a delay value is 100 block times and the decoder throughput
is 4 pixels per clock, the equivalent delay value is (100 × 16) / 4 = 400 pixel clocks.
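The unit conversion above can be sketched as follows. The function name is hypothetical. The result is returned without rounding, because the text does not state how a non-integer result is handled.

```python
PIXELS_PER_BLOCK = 16

def block_times_to_pixel_clocks(delay_block_times: int, pixels_per_clock: int) -> float:
    """Convert a delay in block times to pixel clocks at a given throughput."""
    if pixels_per_clock <= 0:
        raise ValueError("pixels_per_clock must be positive")
    return delay_block_times * PIXELS_PER_BLOCK / pixels_per_clock
```

Calling block_times_to_pixel_clocks(100, 4) reproduces the example in the text and returns 400.0.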