vp9 Bitstream Specification v0.6 20160331 Draft
ABSTRACT
This document defines the bitstream format and decoding process for the Google
VP9 video codec.
Contents
1 Scope
2 Terms and definitions
3 Symbols (and abbreviated terms)
4 Conventions
4.1 Arithmetic operators
4.2 Logical operators
4.3 Relational operators
4.4 Bitwise operators
4.5 Assignment
4.6 Mathematical functions
4.7 Method of describing bitstream syntax
4.8 Functions
4.9 Descriptors
4.9.1 f(n)
4.9.2 s(n)
4.9.3 B(p)
4.9.4 L(n)
4.9.5 T
5 Overview of the decoding process (Informative)
5.1 Purpose of VP9
5.2 Compressing image data
5.3 Quantization and lossy compression
5.4 Predicting image data
5.5 Inter prediction
5.6 Superblocks
5.7 Multiple transforms
5.8 Inverse DCT structure
5.9 Inverse ADST structure
5.10 Reference frames
5.11 Hidden frames
5.12 Compound prediction
5.13 Motion vector prediction
5.14 Tiles
5.15 Segmentation map
5.16 Reference frame scaling
5.17 Arithmetic coding
5.18 Probability updates
5.19 Chroma format
5.20 High bit depth
5.21 Probability Contexts
5.22 Zigzag ordering
5.23 Loop filter
5.24 Loop filter ordering and filters
5.25 Frame structure
5.26 Superframes
6 Bitstream syntax
6.1 Frame syntax
6.1.1 Trailing bits syntax
1 Scope
This document specifies the Google VP9 bitstream format and decoding process.
2.10 Coded frame: The representation of one frame before the decoding process.
2.11 Component: One of the three sample value matrices (one luma matrix and two chroma matrices) or
its single sample value.
2.12 Compound prediction: A type of inter prediction where sample values are computed by blending
together predictions from two different reference frames.
2.13 Compressed header: An arithmetically encoded description of frame level settings of transform
mode and probability adjustments.
2.14 DC coefficient: A transform coefficient whose frequency indices are zero in both dimensions.
2.15 Decoded frame: The frame reconstructed out of the bitstream by the decoder.
2.16 Decoder: One embodiment of the decoding process.
2.17 Decoding process: The process that derives decoded frames from syntax elements.
2.18 Dequantization: The process in which transform coefficients are obtained by scaling the quantized
coefficients.
2.19 Encoder: One embodiment of the encoding process.
2.20 Encoding process: A process not specified in this Specification that generates the bitstream that
conforms to the description provided in this document.
2.21 Flag: A binary variable.
2.22 Frame: The representation of video signals in the space domain, composed of one luma sample
matrix (Y) and two chroma sample matrices (U and V).
2.23 Frame context: A set of probabilities used in the decoding process.
2.24 Golden frame: A frame that can be used in inter coding. Typically the golden frame is encoded with
higher quality and is used as a reference for multiple inter frames.
2.25 Inter coding: Coding one block or frame using inter prediction.
2.26 Inter frame: A frame compressed by referencing previously decoded frames and which may use
intra prediction or inter prediction.
2.27 Inter prediction: The process of deriving the prediction value for the current frame using previously
decoded frames.
2.28 Intra coding: Coding one block or frame using intra prediction.
2.29 Intra frame: A frame compressed using only intra prediction which can be independently decoded.
2.30 Intra-only frame: A type of intra frame that does not reset the decoding process.
NOTE – A key frame is different to an intra-only frame even though both only use intra prediction. The
difference is that a key frame fully resets the decoding process.
2.31 Intra prediction: The process of deriving the prediction value for the current sample using previously
decoded sample values in the same decoded frame.
2.32 Inverse transform: The process in which a transform coefficient matrix is transformed into a spatial
sample value matrix.
2.33 Key frame: A frame where the decoding process is reset. Key frames, and following frames, are
always decodable without access to preceding frames. A key frame only uses intra prediction.
2.34 Level: A defined set of constraints on the values for the syntax elements and variables.
2.35 Loop filter: A filtering process applied to the reconstruction intended to reduce the visibility of block
edges.
2.36 Luma: A sample value matrix or a single sample value representing the monochrome signal related
to the primary colors.
NOTE – The symbol representing luma is Y.
2.37 Mode info: A header describing one or more blocks. Blocks of size 8x8 and larger are described
with a single mode info header. Blocks of size less than 8x8 share a mode info header that covers
the whole 8x8 block of luma samples.
2.38 Mode info block: A luma sample value block of size 8x8 or larger and its two corresponding chroma
sample value blocks. A mode info block has a single mode info header.
2.39 Motion vector: A two-dimensional vector used for inter prediction which refers the current frame to
the reference frame, the value of which provides the coordinate offsets from a location in the current
frame to a location in the reference frame.
2.40 Parse: The procedure of getting the syntax element from the bitstream.
2.41 Prediction: The implementation of the prediction process consisting of either inter or intra prediction.
2.42 Prediction process: The process of estimating the decoded sample value or data element using a
predictor.
2.43 Prediction value: The value, which is the combination of the previously decoded sample values or
data elements, used in the decoding process of the next sample value or data element.
2.44 Profile: A subset of syntax, semantics and algorithms defined in a part.
2.45 Quantization parameter: A variable used for scaling the quantized coefficients in the decoding
process.
2.46 Quantized coefficient: A transform coefficient before dequantization.
2.47 Raster scan: Maps a two dimensional rectangular raster into a one dimensional raster, in which the
entry of the one dimensional raster starts from the first row of the two dimensional raster, and the
scanning then goes through the second row and the third row, and so on. Each raster row is scanned
in left to right order.
2.48 Reconstruction: The sum of the decoded residual and the corresponding prediction values.
2.49 Reference frame: A previously decoded frame used during inter prediction.
2.50 Reserved: A special syntax element value which may be used to extend this part in the future.
2.51 Residual: The differences between the reconstructed samples and the corresponding prediction
values.
2.52 Sample: The basic elements that compose the frame.
2.53 Sample value: The value of a sample. This is an integer from 0 to 255 (inclusive) for 8-bit frames,
from 0 to 1023 (inclusive) for 10-bit frames, and from 0 to 4095 (inclusive) for 12-bit frames.
2.54 Segmentation map: A map holding a 3-bit number for each 8x8 block in the image, giving the
segment affiliation of that block. The segmentation map persists across frames.
2.55 Sequence: The highest level syntax structure of coding bitstream, including one or several
consecutive coded frames.
2.56 Subblock: A 4x4, 4x8, or 8x4 block. All the subblocks within an 8x8 block share a single mode info
header.
2.57 Superblock: A square block of 64x64 pixels that consists of either 1 or 2 mode info blocks or is
recursively partitioned into 4 32x32 blocks, which themselves can be further partitioned.
2.58 Superframe: A chunk of data containing one or more coded frames plus an index at the end
describing the number and sizes of the coded frames.
2.59 Syntax element: An element of data represented in the bitstream.
2.60 Tile: A rectangular region of the frame that is intended to be able to be decoded and encoded
independently, although loop-filtering across tile edges may still be applied.
NOTE – For VP9 this intention is only partially met. Partitioning into columns works as expected, but
partitioning into rows does not. The decode of a tile depends on the decode of the tile above.
2.61 Transform block: A square transform coefficient matrix, used as input to the inverse transform
process.
2.62 Transform coefficient: A scalar value, considered to be in a frequency domain, contained in a
transform block.
2.63 Uncompressed header: High level description of the frame to be decoded that is encoded without
the use of arithmetic encoding.
4 Conventions
The mathematical operators and their precedence rules used to describe this Specification are similar to those
used in the C programming language. However, the operation of integer division with truncation is specifically
defined.
In addition, an array with 2 elements used to hold a motion vector (indicated by the variable name ending with
the letters Mv or Mvs) can be accessed using either normal array notation (e.g. Mv[ 0 ] and Mv[ 1 ]), or by just
the name (e.g. Mv). The only operations defined when using the name are assignment and equality/inequality
testing. Assignment of an array is represented using the normal notation A = B and is specified to mean the
same as doing both the individual assignments A[ 0 ] = B[ 0 ] and A[ 1 ] = B[ 1 ]. Equality testing of 2 motion
vectors is represented using the notation A == B and is specified to mean the same as (A[ 0 ] == B[ 0 ] &&
A[ 1 ] == B[ 1 ]). Inequality testing is defined as A != B and is specified to mean the same as (A[ 0 ] != B[ 0 ] ||
A[ 1 ] != B[ 1 ]).
When a variable is said to be representable by a signed integer with x bits, it means that the variable is
greater than or equal to -(1<<(x-1)), and that the variable is less than or equal to (1<<(x-1))-1.
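The range check above can be written out directly. The following is a minimal sketch in Python; the function name fits_signed is hypothetical and is not part of this Specification.

```python
def fits_signed(value: int, x: int) -> bool:
    """Return True if value is representable by a signed integer with x bits."""
    lowest = -(1 << (x - 1))        # -(1<<(x-1))
    highest = (1 << (x - 1)) - 1    # (1<<(x-1))-1
    return lowest <= value <= highest


print(fits_signed(127, 8))   # True
print(fits_signed(128, 8))   # False
print(fits_signed(-128, 8))  # True
```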
4.1 Arithmetic operators
+ Addition
– Subtraction (as a binary operator) or negation (as a unary prefix operator)
* Multiplication
/ Integer division with truncation of the result toward zero. For example, 7/4 and -7/-4 are
truncated to 1 and -7/4 and 7/-4 are truncated to -1.
a%b Remainder from division of a by b. Both a and b are positive integers.
a << b Shift “a” in 2’s complement binary integer representation format to the left by b bit positions.
This operator is only used with b being a non-negative integer. Bits shifted into the LSBs as
a result of the left shift have a value equal to 0.
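Implementers should note that not every language divides integers the way the / operator is defined above. As an illustration, Python's // operator rounds toward negative infinity, so a helper is needed to reproduce truncation toward zero. The helper name div_trunc is hypothetical.

```python
def div_trunc(a: int, b: int) -> int:
    """Integer division with truncation of the result toward zero."""
    quotient = abs(a) // abs(b)
    # The result is negative only when the operands have different signs.
    return quotient if (a < 0) == (b < 0) else -quotient


print(div_trunc(7, 4), div_trunc(-7, -4))  # 1 1
print(div_trunc(-7, 4), div_trunc(7, -4))  # -1 -1
print(-7 // 4)                             # -2 (floor division, not truncation)
```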
4.5 Assignment
= Assignment operator
++ Increment, x++ is equivalent to x = x + 1. When this operator is used for an array index, the
variable value is obtained before the auto increment operation
-- Decrement, i.e. x-- is equivalent to x = x - 1. When this operator is used for an array index,
the variable value is obtained before the auto decrement operation
+= Addition assignment operator, for example x += 3 corresponds to x = x + 3
-= Subtraction assignment operator, for example x -= 3 corresponds to x = x - 3
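\[
\mathrm{Abs}(x) = \begin{cases} x, & x \ge 0 \\ -x, & x < 0 \end{cases}
\]
\[
\mathrm{Clip3}(x, y, z) = \begin{cases} x, & z < x \\ y, & z > y \\ z, & \text{otherwise} \end{cases}
\]
\[
\mathrm{Min}(x, y) = \begin{cases} x, & x \le y \\ y, & x > y \end{cases}
\]
\[
\mathrm{Max}(x, y) = \begin{cases} x, & x \ge y \\ y, & x < y \end{cases}
\]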
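The four functions defined above translate directly into code. The following Python sketch keeps the names used in this section; note that the argument order of Clip3 is (lower bound, upper bound, value).

```python
def Abs(x):
    return x if x >= 0 else -x


def Clip3(x, y, z):
    """Clamp z to the range x..y (inclusive)."""
    if z < x:
        return x
    if z > y:
        return y
    return z


def Min(x, y):
    return x if x <= y else y


def Max(x, y):
    return x if x >= y else y


print(Clip3(0, 255, 300))  # 255
print(Clip3(0, 255, -5))   # 0
print(Clip3(0, 255, 17))   # 17
```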
the decoding process for later syntax structures. Variables starting with a lower case letter are only used
within the process from which they are derived.
Constant values appear in all upper case letters with underscore characters.
Constant lookup tables appear in all lower case letters with underscore characters.
Hexadecimal notation, indicated by prefixing the hexadecimal number by “0x”, may be used when the number
of bits is an integer multiple of 4. For example, “0x1a” represents a bit string “0001 1010”.
Binary notation is indicated by prefixing the binary number by “0b”. For example, 0b00011010 represents a bit
string “0001 1010”. Binary numbers may include underscore characters to enhance readability. If present,
the underscore characters appear every 4 binary digits starting from the LSB. For example, 0b11010 may
also be written as 0b1_1010.
A value equal to 0 represents a FALSE condition in a test statement. The value TRUE is represented by any
value not equal to 0.
The following table lists examples of the syntax specification format. When syntax_element appears (with
bold face font), it specifies that this syntax element is parsed from the bitstream.
Type
/* A statement can be a syntax element with associated descriptor or can be an
expression used to specify its existence, type, and value, as in the following examples
*/
syntax_element                                                              f(1)
/* A “do … while” structure executes the statement once, and then tests the condition.
It repeatedly evaluates the statement while the condition remains true. */
do
    statement
while ( condition )
/* An “if … else” structure tests the condition first. If it is true, the primary statement is
evaluated. Otherwise, the alternative statement is evaluated. If the alternative
statement does not need to be evaluated, the “else” and corresponding alternative
statement can be omitted. */
if ( condition )
    primary statement
else
    alternative statement
/* A “for” structure evaluates the initial statement at the beginning then tests the
condition. If it is true, the primary and subsequent statements are evaluated until the
condition becomes false. */
for ( initial statement; condition; subsequent statement )
    primary statement
4.8 Functions
Functions used for syntax description are specified in this section.
The specification of these functions makes use of a bitstream position indicator. This bitstream position
indicator locates the position of the bit that is going to be read next.
get_position( ): Return the value of the bitstream position indicator.
init_bool( sz ): Initialize the arithmetic decode process for the boolean decoder with a size of sz bytes
as specified in section 9.2.1.
exit_bool( ): Exit the arithmetic decode process as described in section 9.2.3.
4.9 Descriptors
The following descriptors specify the parsing of syntax elements. Lower case descriptors specify syntax
elements that are represented by a fixed integer number of bits in the bitstream; upper case descriptors
specify syntax elements that are represented by arithmetic coding.
4.9.1 f(n)
Unsigned n-bit number appearing directly in the bitstream. The bits are read from high to low order. The
parsing process specified in section 9.1 is invoked and the syntax element is set equal to the return value.
4.9.2 s(n)
Signed integer using n bits for the value and 1 bit for a sign flag. The parsing process for this descriptor is
specified below:
s(n) { Type
value f(n)
sign f(1)
return sign ? -value : value
}
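The f(n) and s(n) descriptors can be sketched as a small bit reader. This Python example is illustrative only: the class name BitReader is hypothetical, and reading the most significant bit of each byte first is an assumption made here, since the normative process is the one in section 9.1.

```python
class BitReader:
    """Reads fixed-width fields from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # bitstream position indicator, counted in bits

    def f(self, n: int) -> int:
        """Unsigned n-bit number, bits read from high to low order."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1  # assumed MSB-first within a byte
            value = (value << 1) | bit
            self.pos += 1
        return value

    def s(self, n: int) -> int:
        """Signed integer: n bits for the value, then 1 bit for the sign flag."""
        value = self.f(n)
        sign = self.f(1)
        return -value if sign else value


reader = BitReader(bytes([0b01011000]))
print(reader.s(4))  # -5  (value bits 0101, sign bit 1)
```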
4.9.3 B(p)
A single arithmetic encoded bit with estimated probability p/256 of being 0. The syntax element is set equal to
the return value of read_bool( p ) (see section 9.2.2 for a specification of this process).
4.9.4 L(n)
Unsigned arithmetic encoded n-bit number encoded as n flags (a "literal"). The bits are read from high to low
order. The syntax element is set equal to the return value of read_literal( n ) (see section 9.2.4 for a
specification of this process).
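As an illustration of how n flags combine into a literal, the Python sketch below builds the value from high to low order. It is not the normative process of section 9.2.4. The read_bool argument is a stand-in callable that returns scripted bits, and passing a probability of 128 (both outcomes equally likely) for every flag is an assumption of this sketch.

```python
def read_literal(n, read_bool):
    """Combine n decoded flags into an unsigned n-bit number, high bit first."""
    x = 0
    for _ in range(n):
        x = 2 * x + read_bool(128)  # 128/256: assumed even probability per flag
    return x


def make_scripted_read_bool(bits):
    """Hypothetical stand-in for the boolean decoder: replays a list of bits."""
    iterator = iter(bits)

    def read_bool(p):
        return next(iterator)

    return read_bool


print(read_literal(3, make_scripted_read_bool([1, 0, 1])))  # 5
```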
4.9.5 T
An arithmetic tree encoded value from a small alphabet. Such values represent the leaves of a small binary
tree. The (non-leaf) nodes of the tree have associated probabilities p and are represented by B(p). A zero
represents choosing the left branch below the current node and a one represents choosing the right branch.
Each element of this type defined in this document has an associated table of values defined in this document.
Reference is made to those tables when required. (See section 9.3 for the specification of this process).
Every value (leaf) whose tree depth is x is decoded using x B(p) values.
There are many ways that a given alphabet can be represented. The choice of tree has little impact on data
rate but does affect decoder performance. The trees used by VP9 are chosen to (on average) minimize the
number of calls to read_bool (the function used to extract B(p) from the bitstream). This is equivalent to
shaping the tree so that values that are more probable have smaller tree depth than do values that are less
probable.
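A tree walk of this kind can be sketched as follows. The array layout is an assumption of this Python example rather than something stated in this section: entries tree[ n ] and tree[ n + 1 ] hold the left and right branches of a node, a positive entry is the index of the next node, a non-positive entry is a leaf holding the negated value, and the node at index n uses probability probs[ n >> 1 ]. The normative process is given in section 9.3. The tree, probabilities, and scripted decoder are hypothetical.

```python
def read_tree(tree, probs, read_bool):
    """Walk the tree from the root, one B(p) value per level, until a leaf."""
    n = 0
    while True:
        n = tree[n + read_bool(probs[n >> 1])]  # 0 picks left, 1 picks right
        if n <= 0:
            return -n


# Hypothetical 3-value alphabet: value 0 at depth 1, values 1 and 2 at depth 2.
EXAMPLE_TREE = [0, 2, -1, -2]
EXAMPLE_PROBS = [200, 100]


def make_scripted_read_bool(bits, calls):
    """Stand-in for the boolean decoder: replays bits and records each p."""
    iterator = iter(bits)

    def read_bool(p):
        calls.append(p)
        return next(iterator)

    return read_bool


calls = []
value = read_tree(EXAMPLE_TREE, EXAMPLE_PROBS, make_scripted_read_bool([1, 0], calls))
print(value, len(calls))  # 1 2
```

In this example the most probable value sits at depth 1 and costs a single read_bool call, which is the shaping described above.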
We have sixteen 8 bit numbers here which need 16*8=128 bits to store in a raw format. However, this part of
the image is so flat that we could probably represent it as a flat area with a single value of 160 without an
observer noticing any difference. This would only need 8 bits.
Similarly, suppose we had an area that looked like this:
10 20 30 40
10 20 30 40
10 20 30 40
10 20 30 40
In this case the image gradually increases from left to right so if we had some way of specifying the slope we
could represent all 16 values with fewer bits.
VP9 approaches this by means of a reversible transform that adjusts the numbers to try and make most
numbers small and a few numbers large. The essence of the approach is to take two numbers (e.g. 162 and
160) and transform these into the sum of the numbers, and the difference between the numbers
(162+160=322 and 162-160=2). If the decoder is given the 322 and 2 it can reconstruct the original numbers
by computing the sum and difference divided by two: (322+2)/2 = 162 and (322-2)/2 = 160. The full transform
takes the sum and differences of pairs of pixels, and performs further similar operations on both the rows and
the columns.
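The two-number step described above can be checked with a few lines of Python. This is only a sketch of the sum and difference idea, not one of the transforms specified by VP9; the function names are hypothetical.

```python
def forward_pair(a, b):
    """Replace two numbers by their sum and their difference."""
    return a + b, a - b


def inverse_pair(total, difference):
    """Recover the original numbers; both sums are even, so the halving is exact."""
    return (total + difference) // 2, (total - difference) // 2


total, difference = forward_pair(162, 160)
print(total, difference)                # 322 2
print(inverse_pair(total, difference))  # (162, 160)
```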
Overall this results in a transform that takes the 16 original pixels into 16 transformed values. The
transformed values are still in a square grid, but now the axes represent horizontal and vertical frequency.
This means that if the image is flat the transform will have just a single non-zero coefficient in the top left
(called the DC coefficient).
This transform is useful because we can compact a block of similar pixel values into a smaller number of non-
zero transform coefficients in the frequency domain, so that the transformed coefficients can typically be
represented by fewer bits than the original.
5.3 Quantization and lossy compression
In the example above our almost flat image would transform into a large DC coefficient and small values for
the other coefficients (called AC coefficients). Although this is already an improvement, we can compress
better by quantizing the coefficients. This means that we divide the coefficients by a quantization factor
before encoding them, and then in the decode process multiply by the quantization factor.
For example, suppose that we used a quantization factor of 10. Instead of sending the numbers 322 and 2,
we would instead send 322/10=32 and 2/10=0 (where we have rounded down to keep the numbers as
integers). In the decoding process we would compute 32*10=320 and 0*10=0, followed by the transform
(320+0)/2=160 and (320-0)/2 = 160.
This results in the decode of two values (160 and 160) that are close to, but not exactly the same as, our
source image. The benefit is that we only need to transmit the numbers 32 and 0. As we no longer decode to an
exact match of the source data, this is known as lossy compression.
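The worked example above can be reproduced as a short Python sketch. It uses the same quantization factor of 10 and rounds down as in the text; the function names are hypothetical and the code does not represent the actual VP9 dequantization process.

```python
def quantize(coefficient, factor):
    """Encoder side: divide by the quantization factor, rounding down."""
    return coefficient // factor


def dequantize(level, factor):
    """Decoder side: multiply by the quantization factor."""
    return level * factor


def inverse_pair(total, difference):
    """Inverse of the sum and difference step from the previous example."""
    return (total + difference) // 2, (total - difference) // 2


FACTOR = 10
sent = [quantize(322, FACTOR), quantize(2, FACTOR)]
print(sent)                                 # [32, 0]
recovered = [dequantize(v, FACTOR) for v in sent]
print(recovered)                            # [320, 0]
print(inverse_pair(*recovered))             # (160, 160), source was (162, 160)
```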
Lossy compression is used for most broadcast videos as it results in large bandwidth savings, but for some
applications (such as video editing) it is useful to be able to use lossless compression. Lossless compression
avoids the growth of the small errors introduced by each repeated application of lossy compression if the
same video sequence were to be repeatedly decompressed and recompressed.
VP9 supports both lossy and lossless coding. Lossless coding is indicated by using the smallest quantization
factor and this automatically switches to use a perfectly invertible transform known as the Walsh-Hadamard
transform.
5.4 Predicting image data
Suppose we are part way through decoding an image and have decoded the pixels shown below.
The green cells represent pixels that we have already decoded, while the question marks represent a 4 by 4
block of pixels that we are about to decode.
It seems natural to predict that some of the missing pixels on the left are at least close to the value 160 even
before we have seen them, while some of the ones on the right are probably close to 200. However, it is quite
possible that the image looks like:
or like
VP9 contains what is called an intra mode that specifies a direction such as vertical for the first case, or 45
degrees for the second case. When decoding a block the decoder first reads the intra mode, and uses that to
filter already decoded pixels from the current frame to form a prediction for the contents of the block. This is
known as intra prediction and this method is called decoding an intra block.
Of course, the actual content of our source image data is unlikely to be exactly the same as the prediction.
Suppose the actual contents are:
Consider the residual block. This is defined to be the difference between the prediction (assume we have
been instructed to use vertical prediction) and the source image:
0 0 0 0
0 0 0 0
10 10 10 10
10 10 10 10
The residual only contains small numbers, so is cheaper to represent than the original contents.
In VP9 all blocks are represented by some information specifying how to predict the contents of the block (in
this case the intra mode), plus transform coefficients of the residual block. The decoder works by first
computing the prediction, and then inverse transforming the transform coefficients and adding the result (the
residual) to this prediction.
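The prediction plus residual step can be sketched in Python as below. The row of already decoded pixels above the block ([160, 160, 200, 200]) is a hypothetical stand-in chosen for illustration, and the vertical predictor shown simply copies that row downward; only the residual values are taken from the example above.

```python
def predict_vertical(above, size=4):
    """Vertical intra prediction sketch: every row repeats the row above the block."""
    return [list(above) for _ in range(size)]


def reconstruct(prediction, residual):
    """Add the residual to the prediction, sample by sample."""
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


above = [160, 160, 200, 200]  # hypothetical decoded pixels above the block
residual = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
block = reconstruct(predict_vertical(above), residual)
print(block[0])  # [160, 160, 200, 200]
print(block[3])  # [170, 170, 210, 210]
```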
This process is followed even for the first blocks in the video where we do not have any decoded pixels. In
these cases the decoder pretends that it has decoded pixels with a fixed value for the off-screen locations.
5.5 Inter prediction
Suppose we are now trying to compress a whole video sequence.
Consider what we can predict about the next image from the previous one: there may be some still parts of
the image in the background, so some blocks may be identical to their contents in the previous frame.
Similarly, if the camera is panning or some object is moving, there may be blocks that are very similar to a
slightly shifted part of the previous frame.
VP9 takes advantage of these cases by using inter blocks. An inter block contains a motion vector that
specifies the offset in the previous frame of the part of the image to use as a prediction for this block. So, for
example, still blocks will be represented by a zero motion vector. The motion vector contains information
about both a vertical and horizontal offset to allow for both types of movement.
As for intra blocks, the decoding process works by first computing the prediction, and then inverse
transforming the transform coefficients and adding the result (the residual) to this prediction.
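For whole-pixel shifts, forming the prediction amounts to copying a shifted region of the previous frame. The Python sketch below assumes the motion vector is given as (row offset, column offset) in whole pixels and that the referenced region lies entirely inside the frame; the function name and the toy frame are hypothetical.

```python
def predict_inter(prev_frame, row, col, size, mv):
    """Copy a size x size block from prev_frame, displaced by the motion vector."""
    mv_row, mv_col = mv  # assumed order: vertical offset, then horizontal offset
    top = row + mv_row
    left = col + mv_col
    return [prev_frame[top + r][left:left + size] for r in range(size)]


# Toy 6x6 previous frame where each sample encodes its own position.
prev_frame = [[10 * r + c for c in range(6)] for r in range(6)]

print(predict_inter(prev_frame, 2, 2, 2, (0, 0)))   # [[22, 23], [32, 33]]
print(predict_inter(prev_frame, 2, 2, 2, (-1, 1)))  # [[13, 14], [23, 24]]
```

A zero motion vector returns the co-located block, which corresponds to the still-block case mentioned above.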
The motion vectors can specify shifts in units of whole pixels, or shifts containing a fractional pixel offset.
When a fractional pixel shift is used, the previous frame is filtered in order to give a more accurate prediction.
It is also possible to choose the type of interpolation filter used in this filtering. The main difference is in the
filter bandwidth. If the source frames are noisy, it can be appropriate to use a narrow bandwidth filter to
discard the noise, while if the source frames are clean we can use a higher bandwidth filter to try and preserve
more of the high frequency texture. This choice can either be made for all blocks in a frame, or specified per
block.
5.6 Superblocks
In some parts of the image there may be a large region that can all be predicted at the same time (e.g. a still
background image), while in other parts there may be some fine details that are changing (e.g. in a talking
head) and a smaller block size would be appropriate. VP9 provides the ability to vary the block size to handle
a range of prediction sizes.
The decoder decodes each image in units of 64 by 64 pixel superblocks. Each superblock has a partition
which specifies how it is to be encoded. It can consist of:
− A single 64 by 64 block
− Two 64 by 32 blocks
− Two 32 by 64 blocks
− Four 32 by 32 blocks
The individual parts are decoded in raster order.
Each 32 by 32 block can also be partitioned in a similar way all the way down until we reach an 8x8 block
which has the choices:
− A single 8 by 8 block
− Two 8 by 4 subblocks
− Two 4 by 8 subblocks
− Four 4 by 4 subblocks
An example partitioning of a 64 by 64 superblock is shown below:
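[Figure: a 64 by 64 pixel superblock divided into blocks, each labelled with its decode order and, in
brackets, its size. The labels in the figure are:]
Decode order    Block size
0               16x16
1 to 4          (subblocks of one 8x8 block)
5 to 8          8x8
9 to 10         (subblocks of one 8x8 block)
11 to 12        8x8
13 to 14        16x8
15 to 18        16x16
19 to 20        32x16
21              32x32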
The numbers give the decode order of the blocks, while the numbers in brackets give the block sizes. The
blocks without sizes are subblocks of 8x8 blocks.
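The four partition choices at each level can be sketched as a function that returns the resulting parts in raster order. In this Python example the partition names ("none", "horizontal", "vertical", "split") are hypothetical labels, and sizes such as "64 by 32" are read as width by height, which is an assumption of the sketch.

```python
def partition_block(x, y, size, partition):
    """Return the parts of a square block as (x, y, width, height), in raster order."""
    half = size // 2
    if partition == "none":        # a single size by size block
        return [(x, y, size, size)]
    if partition == "horizontal":  # two size by size/2 blocks, one above the other
        return [(x, y, size, half), (x, y + half, size, half)]
    if partition == "vertical":    # two size/2 by size blocks, side by side
        return [(x, y, half, size), (x + half, y, half, size)]
    if partition == "split":       # four size/2 by size/2 blocks
        return [
            (x, y, half, half), (x + half, y, half, half),
            (x, y + half, half, half), (x + half, y + half, half, half),
        ]
    raise ValueError(f"unknown partition: {partition}")


print(partition_block(0, 0, 64, "split"))
# [(0, 0, 32, 32), (32, 0, 32, 32), (0, 32, 32, 32), (32, 32, 32, 32)]
```

Each square part returned by "split" could be passed back into the same function to model the recursive partitioning.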
The difference between a block and a subblock is that a single block header (called mode info) is sent for all
the subblocks within an 8x8 region, while each block has its own block header. The subblocks can have
different intra modes, or motion vectors, but they share some other information such as which reference frame
to predict from.
5.7 Multiple transforms
VP9 specifies a number of different transforms that can be applied to the residual. These differ in size (4x4,
8x8, 16x16, 32x32 are supported) and in the type of transform. The type of transform can be varied
independently for rows and columns as being either an integer precision version of a Discrete Cosine
Transform or an Asymmetric Discrete Sine Transform. The choice of transform type is deduced from the
intra mode, while the choice of transform size can either be specified at a frame level, or given on a per-block
basis.
The idea is that when we are doing intra prediction, normally the samples near the known edges are predicted
better than the ones further away so the errors are usually small at one side. The Asymmetric Discrete Sine
Transform does a better job of transforming this shape because it has basis vectors that tend to vanish at the
known boundary, and maximize energy at the unknown boundary.
When the transform size is smaller than the block size, the transform blocks are predicted and reconstructed
in raster order. For example, suppose we had a 64x64 intra block using a 16x16 transform size. The blocks
would be processed in this order:
0 1 2 3
(16x16) (16x16) (16x16) (16x16)
4 5 6 7
(16x16) (16x16) (16x16) (16x16)
8 9 10 11
(16x16) (16x16) (16x16) (16x16)
12 13 14 15
(16x16) (16x16) (16x16) (16x16)
There are two types of butterfly (represented by B and H). The H type of butterfly represents a matrix
combined with scaling such that it can be implemented with one addition and one subtraction operation
whereas the B type of butterfly requires the use of multiplication.
The structure of the 32 point inverse DCT is shown in the butterfly diagram below (not including the input
shuffle). The structure of the 2^n point inverse DCT is such that it contains the 2^(n-1) point inverse DCT
within it. This recursion is highlighted in the diagram to also show the 4, 8, and 16 point inverse DCTs.
The structure of the 16 point inverse ADST is shown in the diagram below.
The 4 point ADST is treated as a special case and is implemented as eight multiplications followed by a
number of addition/subtraction/shift operations.
5.10 Reference frames
If an object is moving across a scene it can happen that the best source of an inter prediction for a block is not
the previous frame (where the block was obscured by the moving object), but the frame before that.
VP9 provides options for inter blocks to specify which frame is used as the reference frame. A decoder
maintains 8 slots, each slot with a decoded reference frame. When a new frame is decoded, the frame
header specifies which of the slots should be overwritten with the new frame.
Although 8 slots are maintained, any particular frame can make use of at most 3 reference frames. Which
reference frames to use are specified in the frame header, and then the detailed choice between these 3 is
specified in the mode info at the coding block level. Each block (of size above or equal to 8x8) is allowed to
use up to 2 reference frames. All the sub8x8 blocks inside an 8x8 block share the same reference frame
combination, which allows up to 2 reference frames.
Tiles have dimensions that are multiples of 64 by 64 superblocks and are evenly spaced, as far as possible.
Tiles are not intended to help reduce bandwidth (in fact they can hurt compression a small amount), but the
objective is to allow implementations to take advantage of parallel processing by encoding/decoding different
tiles at the same time. The tile sizes are sent at the start of each tile (except the last) so a decoder can know
the start points if it wishes to do parallel decoding.
5.15 Segmentation map
VP9 provides a means of segmenting the image and then applying various signals or adjustments at the
segment level. Segmentation can be particularly efficient and useful when the segmentation of a video
sequence persists or changes little for many frames.
Up to 8 segments may be specified for any given frame. For each of these segments it is possible to specify:
− a quantizer,
− a loop filter strength,
− a prediction reference frame,
− a block skip mode that implies the use of a zero motion vector and that no residual will be coded.
Each of these data values for each segment may be individually updated at the frame level. Where a value is
not updated in a given frame, the value from the previous frame persists. The exceptions to this are key
frames, intra only frames or other frames where independence from past frame values is required (for
example to enable error resilience). In such cases all values are reset to a default.
It is possible to indicate segment affiliation for any prediction block of size 8x8 pixels or greater. Updates to
this segmentation map are explicitly coded using either a temporal coding or direct coding strategy (chosen
at the frame level).
If no explicit update is coded for a block’s segment affiliation, then it persists from frame to frame with the
same provisos detailed above for the segment data values. In regard to key frames, intra only frames and
frames where independence from past frames is required, the segment affiliation for each block defaults to 0
unless explicitly updated.
Internally, segment affiliation is stored at the resolution of 8x8 blocks (a segment map). This can lead to
conflicts when, for example, a transform size of 32x32 is selected for a 64x64 region. If the different
component 8x8 blocks that comprise a larger region have different segment affiliations, then the segment
affiliation for the larger region is defined as being the lowest segment id of any of the contributing 8x8 regions.
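The rule for resolving conflicting segment affiliations can be sketched as below. The function name, the list-of-lists map layout, and the decision to ignore cells beyond the map edge are assumptions made for illustration.

```python
def region_segment_id(segment_map, row8, col8, rows8, cols8):
    """Lowest segment id among the 8x8 cells covered by a region.

    segment_map is indexed [row][col] in units of 8x8 blocks.
    (row8, col8) is the top-left cell and rows8 x cols8 the region size in cells.
    Cells beyond the edge of the map are ignored (an assumption of this sketch).
    """
    height = len(segment_map)
    width = len(segment_map[0])
    return min(
        segment_map[r][c]
        for r in range(row8, min(row8 + rows8, height))
        for c in range(col8, min(col8 + cols8, width))
    )


# A 32x32 region covers 4x4 cells of the segment map.
seg_map = [
    [3, 3, 5, 5],
    [3, 2, 5, 5],
    [7, 7, 4, 4],
    [7, 7, 4, 4],
]
lowest = region_segment_id(seg_map, 0, 0, 4, 4)
```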
5.16 Reference frame scaling
It is legal for different decoded frames to have different frame sizes (and aspect ratios). VP9 automatically
handles resizing predictions from reference frames of different sizes.
However, reference frames must share the same color depth and subsampling format for reference frame
scaling to be allowed, and the amount of scaling is limited: in each dimension the new frame must be no more
than 16 times larger and no less than half the size of the reference frame (e.g. the new frame must not be more
than 16 times wider or higher than any of its used reference frames).
5.17 Arithmetic coding
Suppose we have 4 symbols (for example, the inter mode can be NEW, NEAREST, NEAR, ZERO), that we
wish to encode. If these are all equally likely then encoding each with 2 bits would be fine:
MODE BITS
NEW 00
NEAR 01
NEAREST 10
ZERO 11
However, suppose ZERO happened 50% of the time, NEAREST 25%, and NEAR and NEW 12.5% each. In this
case we could use the variable length codes:
MODE BITS
NEW 000
NEAR 001
NEAREST 01
ZERO 1
This scheme would now give fewer bits on average than the uniform encoding scheme.
Now suppose that ZERO happened 90% of the time. Arithmetic coding provides a way to allow us to
effectively use a fraction of a bit in this case.
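The saving can be checked numerically. The sketch below computes the average code length of the two tables above and the entropy (the theoretical minimum average cost) of a two-outcome decision that goes one way 90% of the time. The function names are illustrative only.

```python
import math


def average_bits(probabilities, code_lengths):
    """Expected number of bits per symbol for a given code."""
    return sum(p * n for p, n in zip(probabilities, code_lengths))


def entropy_bits(probabilities):
    """Shannon entropy in bits: the lower bound on the average cost."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


probs = [0.125, 0.125, 0.25, 0.5]                  # NEW, NEAR, NEAREST, ZERO
fixed_cost = average_bits(probs, [2, 2, 2, 2])     # uniform 2-bit code
variable_cost = average_bits(probs, [3, 3, 2, 1])  # variable length code
skewed_cost = entropy_bits([0.9, 0.1])             # 90% / 10% decision
```

For these probabilities the variable length code averages 1.75 bits against 2 bits for the uniform code, and it already meets the entropy bound. A 90% / 10% decision has an entropy of roughly 0.47 bits, which no code that spends a whole number of bits per symbol can reach.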
At the lowest level VP9 contains a boolean decoder which decodes one boolean value (0 or 1) at a time
given an input containing the estimated probability of the value. If the boolean value is much more likely to be
a 1 than a 0 (or the other way around), then it can be faithfully coded using less than 1 bit per boolean value
on average using an arithmetic coder.
The boolean decoder works using a small set of unsigned 16-bit integers and an unsigned 16-bit multiplication
operation.
5.18 Probability updates
The boolean coder uses the fewest bits when the estimated probabilities for the different syntax elements
match the actual frequency with which the different cases occur.
VP9 provides two mechanisms to match these up:
1. The probabilities can be explicitly changed in the frame headers. (In fact the probability changes are
themselves coded using an arithmetic coder to reduce the cost of this process.)
2. The boolean decoder keeps track of how many times each type of syntax element is decoded and can be
told to automatically adjust the probabilities at the end of the frame to match the observed frequencies.
The idea with the first method is that we can reduce the number of bits to code the frame by setting the
probabilities accurately – but the cost is that we need to spend bits to perform the updates.
The idea with the second method is that the probabilities for the next frame are probably quite similar to the
ones for this frame, so adapting the probabilities at the end of the frame can help to improve compression.
5.19 Chroma format
The human visual system is said to be less sensitive to color than to luminance so images are often coded
with fewer chroma samples than luminance samples.
VP9 provides the option for the 2 chroma planes (called U and V) to be subsampled in either the horizontal or
vertical direction (or both, or neither).
In profiles 0 and 2, only 4:2:0 format is allowed, which means that chroma is subsampled in both the
horizontal and vertical direction. In profiles 1 and 3, all other subsampling formats are allowed.
5.20 High bit depth
VP9 supports the option to output pixels using either 8, 10, or 12 bits per color sample.
In profiles 0 and 1, only 8 bits per color sample is allowed.
In profiles 2 and 3, only greater than 8 bits per color sample is allowed.
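The profile constraints on chroma format and bit depth can be summarised in a short sketch. The function names are hypothetical, and the rounding used for the chroma plane size (rounding up for odd luma dimensions) is an assumption rather than something stated in this section.

```python
def chroma_plane_size(width, height, subsampling_x, subsampling_y):
    """Size of each chroma plane for a given luma size.

    Assumes odd luma dimensions are rounded up when halved.
    """
    return ((width + subsampling_x) >> subsampling_x,
            (height + subsampling_y) >> subsampling_y)


def profile_allows(profile, subsampling_x, subsampling_y, bit_depth):
    """Check a chroma format and bit depth against the profile rules above."""
    is_420 = subsampling_x == 1 and subsampling_y == 1
    # Profiles 0 and 2: only 4:2:0. Profiles 1 and 3: every other format.
    format_ok = is_420 if profile in (0, 2) else not is_420
    # Profiles 0 and 1: only 8 bits. Profiles 2 and 3: only 10 or 12 bits.
    depth_ok = bit_depth == 8 if profile in (0, 1) else bit_depth in (10, 12)
    return format_ok and depth_ok
```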
5.21 Probability Contexts
When coding a syntax element, such as whether the block is skipped, VP9 defines a process to determine
which probability to use. The choice of probability is based on the context of the syntax element, e.g. on how
that syntax element has been decoded in the past for blocks that are similar in some way – such as being
close or being of the same size.
This process makes it more likely that the decoder can accurately predict the probability distribution of a
syntax element and therefore can represent the syntax element using fewer bits on average.
5.22 Zigzag ordering
The transform coefficients of natural blocks tend to be clustered around the low frequency end. This means
that there are often only a few non-zero coefficients in a block and these are clustered in one corner of the
transformed block.
VP9 decodes the coefficients in a special zig-zag order such that the first coefficient read is the DC coefficient,
and then the order gradually moves outward to higher frequency coefficients.
A bool is decoded after each non-zero coefficient that signals whether there are any more non-zero
coefficients in the whole transform block. When it signals that none remain, the decoder can immediately fill in
the rest of the transform block with zeros without consuming any more bits from the bitstream.
Depending on the direction of intra prediction, the transform coefficients are often clustered towards the left or
the top side of the transform block. Therefore VP9 selects the scan order based on the intra prediction
direction.
An example scan ordering for a 4x4 block is illustrated in the diagram below, where the numbers and arrows
indicate the order of the decode process.
0 2 5 8
1 3 9 12
4 7 11 14
6 10 13 15
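The grid above gives, for each position in the block, the step at which it is decoded. A decoder more naturally wants the inverse mapping: for each step, the raster position to fill. The sketch below derives that list from the grid and uses it to place a short run of coefficients. The helper names are illustrative only.

```python
# Decode step for each position of the 4x4 block, copied from the grid above.
ORDER_4X4 = [
    [0, 2, 5, 8],
    [1, 3, 9, 12],
    [4, 7, 11, 14],
    [6, 10, 13, 15],
]


def scan_from_order(order):
    """Invert the grid: scan[i] is the raster position decoded at step i."""
    size = len(order)
    scan = [0] * (size * size)
    for row, steps in enumerate(order):
        for col, step in enumerate(steps):
            scan[step] = row * size + col
    return scan


def place_coefficients(values, scan, size):
    """Write decoded values into a size x size block; the rest stay zero."""
    block = [[0] * size for _ in range(size)]
    for step, value in enumerate(values):
        pos = scan[step]
        block[pos // size][pos % size] = value
    return block


scan_4x4 = scan_from_order(ORDER_4X4)
block = place_coefficients([9, -3, 2], scan_4x4, 4)
```

With only three coefficients decoded, the non-zero values land in the top-left corner of the block and every remaining position stays at zero.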
Each of the individual 4x4 blocks looks reasonably close to the original, but the discontinuity in the middle
stands out. This is quite a common problem and block edges appear in the decoded images.
To reduce the impact of these errors, a process called the loop filter is applied to the block edges in the image.
This process filters the image pixels across the block boundaries in an attempt to smooth off such sudden
discontinuities. The block boundaries that are filtered include both the edges between transform blocks and
the edges between different mode info blocks.
This process is known as an in-loop filter because the filtered versions of frames are used for reference in
inter prediction.
5.24 Loop filter ordering and filters
The loop filter operates on a raster scan order of superblocks.
For each superblock, the loop filter is first applied to the left vertical boundary and all internal vertical
boundaries (shown in red in the diagram below).
The loop filter is then applied to the top horizontal boundary and all internal horizontal boundaries (shown in
blue).
[Figure: a superblock with its vertical boundaries numbered 0 to 15 from left to right and its horizontal
boundaries numbered 16 to 31 from top to bottom.]
The numbers indicate the order in which the boundaries are processed.
For each boundary, the filtering operation depends on up to 8 samples on either side of the edge, and may
modify up to 7 samples on either side of the edge. (This is true for both luma and chroma and in both
subsampled and non-subsampled modes of operation.) The regions outside the superblock which may be
used by the filter process are shaded grey in the diagram.
5.25 Frame structure
The coded bytes are stored in sequence as shown below:
Frame 0:
| Uncompressed Header | Compressed Header | Tile 0 | Tile 1 |
  Byte 0, Byte 1, Byte 2, ...
The first bytes contain the uncompressed header. This contains almost all the frame level information using
raw binary encodings (i.e. no arithmetic coding).
The compressed header follows the uncompressed header and specifies the transform size to use during the
frame plus information about which probabilities to adjust. The information in this second header is
compressed using arithmetic coding.
The headers are followed by the bytes for each tile in turn. Each tile contains the tile size (omitted for the final
tile) followed by the arithmetic coded data for the tile.
This structure is used for normal frames. There are also short frames that simply contain 1 byte of
uncompressed header (or 2 for profile 3) that indicate that the decoder should show a previously decoded
frame. These short frames have no compressed header and no tile data.
5.26 Superframes
VP9 supports consolidating multiple compressed video frames into one single chunk called a superframe.
The superframe index is stored in the last bytes of the chunk (and is up to 34 bytes long). The enclosed
frames can be located by parsing this superframe index:
[Figure: the bytes of a superframe chunk, from Byte 0, Byte 1, Byte 2, ... through to the last byte.]
From the point of view of the container format, this whole superframe is stored together. This format can be
useful to ensure that each superframe produces a single decoded frame even though the video is coded using
unshown frames. However, it is also legal for a superframe to result in multiple output frames, or even no
output frames.
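A sketch of how a demuxer might locate the enclosed frames is given below. The index layout is not defined in this section (it belongs to Annex B), so the layout used here is an assumption that should be checked against that annex: a marker byte whose top three bits are 110, followed by two bits giving the number of bytes per frame size minus 1 and three bits giving the number of frames minus 1, with the marker repeated at both ends of the index and the frame sizes stored least significant byte first. Under that layout the largest index is 2 + 4 * 8 = 34 bytes, which agrees with the limit stated above.

```python
def parse_superframe_index(chunk):
    """Return the list of enclosed frame sizes, or [] if no index is found."""
    if not chunk:
        return []
    marker = chunk[-1]
    if (marker & 0xE0) != 0xC0:          # top three bits must be 110
        return []
    frames = (marker & 0x07) + 1
    size_bytes = ((marker >> 3) & 0x03) + 1
    index_size = 2 + size_bytes * frames
    # The marker byte must also appear at the start of the index.
    if len(chunk) < index_size or chunk[-index_size] != marker:
        return []
    sizes = []
    pos = len(chunk) - index_size + 1
    for _ in range(frames):
        sizes.append(int.from_bytes(chunk[pos:pos + size_bytes], "little"))
        pos += size_bytes
    return sizes


# Two dummy frames of 5 and 3 bytes, followed by a 4 byte index.
payload = b"\x01" * 5 + b"\x02" * 3
superframe = payload + bytes([0xC1, 5, 3, 0xC1])
frame_sizes = parse_superframe_index(superframe)
```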
6 Bitstream syntax
This section presents the bitstream syntax in a tabular form. The meaning of each of the syntax elements is
presented in section 7.
6.1 Frame syntax
frame( sz ) {                                       Type
  startBitPos = get_position( )
  uncompressed_header( )
  trailing_bits( )
  if ( header_size_in_bytes == 0 ) {
    while ( get_position( ) < startBitPos + 8 * sz )
      padding_bit                                   f(1)
    return
  }
  load_probs( frame_context_idx )
  load_probs2( frame_context_idx )
  clear_counts( )
  init_bool( header_size_in_bytes )
  compressed_header( )
  exit_bool( )
  endBitPos = get_position( )
  headerBytes = (endBitPos - startBitPos) / 8
  decode_tiles( sz - headerBytes )
  refresh_probs( )
}
      } else {
        color_space = CS_BT_601
        subsampling_x = 1
        subsampling_y = 1
        BitDepth = 8
      }
      refresh_frame_flags                           f(8)
      frame_size( )
      render_size( )
    } else {
      refresh_frame_flags                           f(8)
      for( i = 0; i < 3; i++ ) {
        ref_frame_idx[ i ]                          f(3)
        ref_frame_sign_bias[ LAST_FRAME + i ]       f(1)
      }
      frame_size_with_refs( )
      allow_high_precision_mv                       f(1)
      read_interpolation_filter( )
    }
  }
  if ( error_resilient_mode == 0 ) {
    refresh_frame_context                           f(1)
    frame_parallel_decoding_mode                    f(1)
  } else {
    refresh_frame_context = 0
    frame_parallel_decoding_mode = 1
  }
  frame_context_idx                                 f(2)
  if ( FrameIsIntra || error_resilient_mode ) {
    setup_past_independence ( )
    if ( frame_type == KEY_FRAME || error_resilient_mode == 1
         || reset_frame_context == 3 ) {
      for ( i = 0; i < 4; i ++ ) {
        save_probs( i )
      }
    } else if ( reset_frame_context == 2 ) {
      save_probs( frame_context_idx )
    }
    frame_context_idx = 0
  }
  loop_filter_params( )
  quantization_params( )
  segmentation_params( )
  tile_info( )
  header_size_in_bytes                              f(16)
compute_image_size( )
}
is_filter_switchable f(1)
if ( is_filter_switchable == 1 ) {
interpolation_filter = SWITCHABLE
} else {
raw_interpolation_filter f(2)
interpolation_filter = literal_to_type[ raw_interpolation_filter ]
}
}
}
    segmentation_update_data                        f(1)
    if ( segmentation_update_data == 1 ) {
      segmentation_abs_or_delta_update              f(1)
      for ( i = 0; i < MAX_SEGMENTS; i++ ) {
        for ( j = 0; j < SEG_LVL_MAX; j++ ) {
          feature_value = 0
          feature_enabled                           f(1)
          FeatureEnabled[ i ][ j ] = feature_enabled
          if ( feature_enabled == 1 ) {
            bits_to_read = segmentation_feature_bits[ j ]
            feature_value                           f(bits_to_read)
            if ( segmentation_feature_signed[ j ] == 1 ) {
              feature_sign                          f(1)
              if ( feature_sign == 1 )
                feature_value *= -1
            }
          }
          FeatureData[ i ][ j ] = feature_value
        }
      }
    }
  }
}
The constant lookup tables used in this syntax are defined as:
segmentation_feature_bits[ SEG_LVL_MAX ] = { 8, 6, 2, 0 }
segmentation_feature_signed[ SEG_LVL_MAX ] = { 1, 1, 0, 0 }
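The feature loop in the syntax above can be sketched as follows. The `BitReader` class is a hypothetical helper, not part of the specification, and it assumes that f(n) reads n bits with the most significant bit first. The sketch starts at the point just after segmentation_abs_or_delta_update has been read.

```python
SEGMENTATION_FEATURE_BITS = [8, 6, 2, 0]
SEGMENTATION_FEATURE_SIGNED = [1, 1, 0, 0]
MAX_SEGMENTS = 8
SEG_LVL_MAX = 4


class BitReader:
    """Hypothetical helper: reads f(n) values, most significant bit first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # position in bits

    def f(self, n):
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def read_segmentation_data(reader):
    """Read FeatureEnabled and FeatureData for every segment and feature."""
    enabled = [[0] * SEG_LVL_MAX for _ in range(MAX_SEGMENTS)]
    data = [[0] * SEG_LVL_MAX for _ in range(MAX_SEGMENTS)]
    for i in range(MAX_SEGMENTS):
        for j in range(SEG_LVL_MAX):
            value = 0
            flag = reader.f(1)                      # feature_enabled
            enabled[i][j] = flag
            if flag:
                value = reader.f(SEGMENTATION_FEATURE_BITS[j])
                # The sign bit is only present for signed features.
                if SEGMENTATION_FEATURE_SIGNED[j] and reader.f(1):
                    value = -value
            data[i][j] = value
    return enabled, data


# Segment 0, feature 0: enabled, magnitude 10, sign bit set. Nothing else enabled.
reader = BitReader(bytes([0x85, 0x40, 0x00, 0x00, 0x00, 0x00]))
feature_enabled, feature_data = read_segmentation_data(reader)
```

Note that the last feature has zero bits of data, so enabling it conveys information through the flag alone.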
return minLog2
}
calc_max_log2_tile_cols( ) {                        Type
  maxLog2 = 1
  while ( (Sb64Cols >> maxLog2) >= MIN_TILE_WIDTH_B64 )
    maxLog2++
  return maxLog2 - 1
}
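As an illustration, the function above can be transcribed directly. The value of MIN_TILE_WIDTH_B64 is not given in this section; the sketch assumes 4, which should be checked against the constants table of the specification.

```python
MIN_TILE_WIDTH_B64 = 4  # assumed value; defined elsewhere in the specification


def calc_max_log2_tile_cols(sb64_cols):
    """Largest log2 tile column count that keeps tiles wide enough."""
    max_log2 = 1
    while (sb64_cols >> max_log2) >= MIN_TILE_WIDTH_B64:
        max_log2 += 1
    return max_log2 - 1


# A frame 1920 pixels wide is 30 superblocks wide.
max_log2_for_1920 = calc_max_log2_tile_cols(30)
```

Under the assumed constant, a frame 30 superblocks wide allows at most 2^2 = 4 tile columns.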
if ( update_mv_prob == 1 ) {
mv_prob L(7)
prob = (mv_prob << 1) | 1
}
return prob
}
init_bool( tile_size )
decode_tile( )
exit_bool( )
}
}
}
decode_partition( r, c, subsize )
decode_partition( r, c + halfBlock8x8, subsize )
decode_partition( r + halfBlock8x8, c, subsize )
decode_partition( r + halfBlock8x8, c + halfBlock8x8, subsize )
}
if ( bsize == BLOCK_8X8 || partition != PARTITION_SPLIT ) {
for ( i = 0; i < num8x8 ; i ++ ) {
AbovePartitionContext[ c + i ] = 15 >> b_width_log2_lookup[ subsize ]
LeftPartitionContext[ r + i ] = 15 >> b_height_log2_lookup[ subsize ]
}
}
}
}
}
}
else
is_inter T
}
}
}
6.4.19 MV syntax
read_mv( ref ) {                                    Type
  UseHp = allow_high_precision_mv && use_mv_hp( BestMv[ ref ] )
  diffMv = ZeroMv
  mv_joint                                          T
  if ( mv_joint == MV_JOINT_HZVNZ || mv_joint == MV_JOINT_HNZVNZ )
    diffMv[ 0 ] = read_mv_component( 0 )
  if ( mv_joint == MV_JOINT_HNZVZ || mv_joint == MV_JOINT_HNZVNZ )
    diffMv[ 1 ] = read_mv_component( 1 )
  Mv[ ref ][ 0 ] = BestMv[ ref ][ 0 ] + diffMv[ 0 ]
  Mv[ ref ][ 1 ] = BestMv[ ref ][ 1 ] + diffMv[ 1 ]
}
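The role of mv_joint in read_mv can be sketched as below: it tells the decoder which of the two components carries a non-zero difference, so only those components are read. The enum and the `read_component` callback are stand-ins introduced for illustration; the numeric values of the enum members are arbitrary.

```python
from enum import Enum, auto


class MvJoint(Enum):
    ZERO = auto()     # neither component changes
    HNZVZ = auto()    # horizontal non-zero, vertical zero
    HZVNZ = auto()    # horizontal zero, vertical non-zero
    HNZVNZ = auto()   # both components non-zero


def apply_mv_delta(best_mv, joint, read_component):
    """Return best_mv plus the coded difference; index 0 is row, 1 is column."""
    diff = [0, 0]
    if joint in (MvJoint.HZVNZ, MvJoint.HNZVNZ):
        diff[0] = read_component(0)
    if joint in (MvJoint.HNZVZ, MvJoint.HNZVNZ):
        diff[1] = read_component(1)
    return [best_mv[0] + diff[0], best_mv[1] + diff[1]]


def fake_reader(comp):
    """Stand-in for read_mv_component: returns fixed differences."""
    return [3, -5][comp]


vertical_only = apply_mv_delta([10, 20], MvJoint.HZVNZ, fake_reader)
```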
    token                                           T
    TokenCache[ pos ] = energy_class[ token ]
    if ( token == ZERO_TOKEN ) {
      Tokens[ pos ] = 0
      checkEob = 0
    } else {
      coef = read_coef( token )
      sign_bit                                      L(1)
      Tokens[ pos ] = sign_bit ? -coef : coef
      checkEob = 1
    }
  }
  nonzero = c > 0
  EobTotal += nonzero
  for ( i = c; i < segEob; i++ )
    Tokens[ scan[ i ] ] = 0
  return nonzero
}
scan = default_scan_8x8
} else if ( txSz == TX_16X16 ) {
if ( TxType == ADST_DCT )
scan = row_scan_16x16
else if ( TxType == DCT_ADST )
scan = col_scan_16x16
else
scan = default_scan_16x16
} else {
scan = default_scan_32x32
}
return scan
}
{ 6, 14, 67}
}
The mv_ref_blocks table contains candidate locations to search for motion vectors and is defined as:
mv_ref_blocks[ BLOCK_SIZES ][ MVREF_NEIGHBOURS ][ 2 ] = {
{{-1, 0}, {0, -1}, {-1, -1}, {-2, 0}, {0, -2}, {-2, -1}, {-1, -2}, {-2, -2}},
{{-1, 0}, {0, -1}, {-1, -1}, {-2, 0}, {0, -2}, {-2, -1}, {-1, -2}, {-2, -2}},
{{-1, 0}, {0, -1}, {-1, -1}, {-2, 0}, {0, -2}, {-2, -1}, {-1, -2}, {-2, -2}},
{{-1, 0}, {0, -1}, {-1, -1}, {-2, 0}, {0, -2}, {-2, -1}, {-1, -2}, {-2, -2}},
{{0, -1}, {-1, 0}, {1, -1}, {-1, -1}, {0, -2}, {-2, 0}, {-2, -1}, {-1, -2}},
{{-1, 0}, {0, -1}, {-1, 1}, {-1, -1}, {-2, 0}, {0, -2}, {-1, -2}, {-2, -1}},
{{-1, 0}, {0, -1}, {-1, 1}, {1, -1}, {-1, -1}, {-3, 0}, {0, -3}, {-3, -3}},
{{0, -1}, {-1, 0}, {2, -1}, {-1, -1}, {-1, 1}, {0, -3}, {-3, 0}, {-3, -3}},
{{-1, 0}, {0, -1}, {-1, 2}, {-1, -1}, {1, -1}, {-3, 0}, {0, -3}, {-3, -3}},
{{-1, 1}, {1, -1}, {-1, 2}, {2, -1}, {-1, -1}, {-3, 0}, {0, -3}, {-3, -3}},
{{0, -1}, {-1, 0}, {4, -1}, {-1, 2}, {-1, -1}, {0, -3}, {-3, 0}, {2, -1}},
{{-1, 0}, {0, -1}, {-1, 4}, {2, -1}, {-1, -1}, {-3, 0}, {0, -3}, {-1, 2}},
{{-1, 3}, {3, -1}, {-1, 4}, {4, -1}, {-1, -1}, {-1, 0}, {0, -1}, {-1, 6}}
}
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 0, 3, 1
}
    deltaCol = RefListMv[ i ][ 1 ]
    if ( !allow_high_precision_mv || !use_mv_hp( RefListMv[ i ] ) ) {
      if ( deltaRow & 1 )
        deltaRow += (deltaRow > 0 ? -1 : 1)
      if ( deltaCol & 1 )
        deltaCol += (deltaCol > 0 ? -1 : 1)
    }
    RefListMv[ i ][ 0 ] = clamp_mv_row( deltaRow,
                                        (BORDERINPIXELS - INTERP_EXTEND) << 3 )
    RefListMv[ i ][ 1 ] = clamp_mv_col( deltaCol,
                                        (BORDERINPIXELS - INTERP_EXTEND) << 3 )
  }
  NearestMv[ refList ] = RefListMv[ 0 ]
  NearMv[ refList ] = RefListMv[ 1 ]
  BestMv[ refList ] = RefListMv[ 0 ]
}
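The precision adjustment in the fragment above moves any odd component one step towards zero, so that the result is always even. A minimal sketch, with an illustrative function name:

```python
def lower_mv_precision(component):
    """Round an odd motion vector component one step towards zero."""
    if component & 1:
        component += -1 if component > 0 else 1
    return component


rounded = [lower_mv_precision(v) for v in (3, -3, 4, -1, 0)]
```

Because the adjustment is symmetric about zero, positive and negative vectors of the same magnitude map to results of the same magnitude.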
7 Bitstream semantics
This section specifies the meaning of the syntax elements read in the syntax structures.
7.1 Frame semantics
The bitstream consists of a sequence of coded frames.
Each coded frame is given to the decoding process in turn as a bitstream along with a variable sz that gives
the total number of bytes in the coded frame.
Methods of framing the coded frames in a container format are outside the scope of this Specification.
However, one common method of packing several frames into a single superframe is described in Annex B.
padding_bit shall be equal to 0.
7.1.1 Trailing bits semantics
zero_bit shall be equal to 0 and is inserted into the bitstream to align the bit position to a multiple of 8 bits.
7.1.2 Refresh probs semantics
load_probs( ctx ) is a function call that indicates that the probability tables should be loaded from frame
context number ctx in the range 0 to 3. When this function is invoked the following takes place:
− A copy of each probability table (except tx_probs and skip_prob) is loaded from an area of memory
indexed by ctx. (The memory contents of these frame contexts have been initialized by previous calls to
save_probs).
load_probs2( ctx ) is a function call that indicates that the probability tables tx_probs and skip_prob should
be loaded from frame context number ctx in the range 0 to 3. When this function is invoked the following
takes place:
− A copy of the probability tables called tx_probs and skip_prob is loaded from an area of memory indexed
by ctx.
adapt_coef_probs is a function call that indicates that the coefficient probabilities should be adjusted based
on the observed counts. This process is described in section 8.4.3.
adapt_noncoef_probs is a function call that indicates that the probabilities (for reading syntax elements other
than the coefficients) should be adjusted based on the observed counts. This process is described in section
8.4.4.
clear_counts is a function call that indicates that all the counters for different syntax elements should be reset
to 0. This process is described in section 8.3.
7.2 Uncompressed header semantics
frame_marker shall be equal to 2.
profile_low_bit and profile_high_bit combine to make the variable Profile. VP9 supports 4 profiles:
It is possible for bitstreams to start with a non key frame and still be decodable. In this case there are a
number of additional constraints on the bitstream that are detailed in section 8.2.
error_resilient_mode equal to 1 indicates that error resilient mode is enabled; error_resilient_mode equal to
0 indicates that error resilient mode is disabled.
NOTE – Error resilient mode allows the syntax of a frame to be decoded independently of previous frames.
intra_only equal to 1 indicates that the frame is an intra-only frame; intra_only equal to 0 indicates that the
frame is an inter frame.
reset_frame_context specifies whether the frame context should be reset to default values:
− 0 or 1 means do not reset any frame context
− 2 resets just the context specified in the frame header
− 3 resets all contexts.
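Combined with the uncompressed header syntax, in which save_probs is called after setup_past_independence has restored the default probabilities, the set of frame contexts that end up reset can be sketched as follows. The function name and argument list are illustrative only.

```python
def contexts_to_reset(frame_is_intra, error_resilient_mode, is_key_frame,
                      reset_frame_context, frame_context_idx):
    """Return the frame context numbers overwritten with default probabilities."""
    if not (frame_is_intra or error_resilient_mode):
        return []
    if is_key_frame or error_resilient_mode or reset_frame_context == 3:
        return [0, 1, 2, 3]
    if reset_frame_context == 2:
        return [frame_context_idx]
    return []
```

For an inter frame without error resilient mode nothing is reset, whatever the value of reset_frame_context.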
refresh_frame_flags contains a bitmask that specifies which reference frame slots will be updated with the
current frame after it is decoded.
See section 8.10 for details of the frame update process.
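The bitmask can be read as one bit per reference slot. The sketch below assumes that bit i (counting from the least significant bit) controls slot i; the authoritative description is the frame update process in section 8.10.

```python
NUM_REF_FRAMES = 8


def refresh_slots(slots, refresh_frame_flags, current_frame):
    """Overwrite every slot whose bit is set in refresh_frame_flags."""
    for i in range(NUM_REF_FRAMES):
        if (refresh_frame_flags >> i) & 1:
            slots[i] = current_frame
    return slots


slots = refresh_slots([None] * NUM_REF_FRAMES, 0b00000101, "frame A")
```

A key frame would typically set all eight bits, while a frame that is never used for reference would set none.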
ref_frame_idx specifies which reference frames are used by inter frames. It is a requirement of bitstream
conformance that the selected reference frames match the current frame in bit depth, profile, chroma
subsampling, and color space.
ref_frame_sign_bias specifies the intended direction of the motion vector in time for each reference frame. A
sign bias equal to 0 indicates that the reference frame is a backwards reference; a sign bias equal to 1
indicates that the reference frame is a forwards reference.
NOTE – The sign bias is just an indication that can improve the accuracy of motion vector prediction and is
not constrained to reflect the actual output order of pictures.
allow_high_precision_mv equal to 0 specifies that motion vectors are specified to quarter pel precision;
allow_high_precision_mv equal to 1 specifies that motion vectors are specified to eighth pel precision.
refresh_frame_context equal to 1 indicates that the probabilities computed for this frame (after adapting to
the observed frequencies if adaption is enabled) should be stored for reference by future frames.
refresh_frame_context equal to 0 indicates that the probabilities should be discarded at the end of the frame.
See section 8.4 for details of the adaption process.
frame_parallel_decoding_mode equal to 1 indicates that parallel decoding mode is enabled;
frame_parallel_decoding_mode equal to 0 indicates that parallel decoding mode is disabled.
NOTE – Parallel decoding mode means that the probabilities are not adapted based on the observed
frequencies. This means that the next frame can start to be decoded as soon as the frame headers of the
current frame have been processed. This has most of the benefits of error resilient mode for multi-core
decoding, without needing to repeat sending updated probabilities for each frame.
frame_context_idx indicates the frame context to use.
header_size_in_bytes indicates the size of the compressed header in bytes.
setup_past_independence is a function call that indicates that this frame can be decoded without
dependence on previous coded frames. When this function is invoked the following takes place:
− FeatureData[ i ][ j ] and FeatureEnabled[ i ][ j ] are set equal to 0 for i = 0..7 and j = 0..3.
− segmentation_abs_or_delta_update is set equal to 0.
− PrevSegmentIds[ row ][ col ] is set equal to 0 for row = 0..MiRows-1 and col = 0..MiCols-1.
− loop_filter_delta_enabled is set equal to 1.
− loop_filter_ref_deltas[ INTRA_FRAME ] is set equal to 1.
− loop_filter_ref_deltas[ LAST_FRAME ] is set equal to 0.
− loop_filter_ref_deltas[ GOLDEN_FRAME ] is set equal to -1.
− loop_filter_ref_deltas[ ALTREF_FRAME ] is set equal to -1.
− loop_filter_mode_deltas[ i ] is set equal to 0 for i = 0..1.
− ref_frame_sign_bias[ i ] is set equal to 0 for i = 0..3.
− The probability tables are reset to default values. The default values are specified in section 10.5.
save_probs( ctx ) is a function call that indicates that all the probability tables should be saved
into frame context number ctx in the range 0 to 3. When this function is invoked the following takes place:
− A copy of each probability table is saved in an area of memory indexed by ctx. The memory contents of
these frame contexts are persistent in order to allow a subsequent inter frame to reload the probability tables.
7.2.1 Frame sync semantics
frame_sync_byte_0 shall be equal to 0x49.
frame_sync_byte_1 shall be equal to 0x83.
frame_sync_byte_2 shall be equal to 0x42.
7.2.2 Color config semantics
ten_or_twelve_bit equal to 1 indicates the bit depth is 12 bits; ten_or_twelve_bit equal to 0 indicates that the
bit depth is 10 bits.
color_space specifies the color space of the stream:
It is a requirement of bitstream conformance that color_space is not equal to CS_RGB when profile_low_bit is
equal to 0.
NOTE – VP9 passes the color space information in the bitstream, including Rec. ITU-R BT.2020-2; however,
VP9 does not specify whether it is in the form of “constant luminance” or “nonconstant luminance”. As
such, applications should rely on signaling outside of the VP9 bitstream. If there is no such signaling, the
application may assume nonconstant luminance for Rec. ITU-R BT.2020-2.
color_range specifies the black level and range of the luma and chroma signals as specified in Rec. ITU-R
BT.709-6 and Rec. ITU-R BT.2020-2:
NOTE – This specification does not enforce the range of YUV values when the YUV range is
signaled as Studio swing. Therefore the application should perform additional clamping and color
conversion operations according to the specified range.
subsampling_x, subsampling_y specify the chroma subsampling format:
found_ref equal to 1 indicates that the frame dimensions can be inferred from reference frame i where i is the
loop counter in the syntax parsing process for frame_size_with_refs. found_ref equal to 0 indicates that the
frame dimensions are not inferred from reference frame i.
Once the FrameWidth and FrameHeight have been computed for an inter frame, it is a requirement of
bitstream conformance that for at least one value of i in the range 0..2, all the following conditions are true:
− 2 * FrameWidth >= RefFrameWidth[ ref_frame_idx[ i ] ]
− 2 * FrameHeight >= RefFrameHeight[ ref_frame_idx[ i ] ]
− FrameWidth <= 16 * RefFrameWidth[ ref_frame_idx[ i ] ]
− FrameHeight <= 16 * RefFrameHeight[ ref_frame_idx[ i ] ]
NOTE – This is a requirement even if all the blocks in an inter frame are coded using intra prediction.
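The conformance requirement can be expressed as a check over the three selected reference frames. The function name and the list-of-tuples argument are illustrative only.

```python
def reference_scaling_ok(frame_width, frame_height, ref_sizes):
    """True if at least one reference satisfies all four size conditions.

    ref_sizes holds (RefFrameWidth, RefFrameHeight) for ref_frame_idx[0..2].
    """
    return any(
        2 * frame_width >= ref_w
        and 2 * frame_height >= ref_h
        and frame_width <= 16 * ref_w
        and frame_height <= 16 * ref_h
        for ref_w, ref_h in ref_sizes
    )


half_size_ok = reference_scaling_ok(640, 360, [(1280, 720)] * 3)
```

A new frame exactly half the size of its references sits on the limit and is accepted; one pixel narrower is rejected.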
7.2.6 Compute image size semantics
When compute_image_size is invoked, the following ordered steps occur:
1. If this is the first time compute_image_size is invoked, or if either FrameWidth or FrameHeight has
changed in value compared to the previous time this function was invoked, then the segmentation map is
cleared to all zeros by setting SegmentId[ row ][ col ] equal to 0 for row = 0..MiRows-1 and col =
0..MiCols-1.
2. The variable UsePrevFrameMvs is set equal to 1 if all of the following conditions are true:
a. This is not the first time compute_image_size is invoked.
b. Both FrameWidth and FrameHeight have the same value compared to the previous time this function
was invoked.
c. show_frame was equal to 1 the previous time this function was invoked.
d. error_resilient_mode is equal to 0.
e. FrameIsIntra is equal to 0.
Otherwise, UsePrevFrameMvs is set equal to 0.
NOTE – compute_image_size is not invoked and therefore the segmentation map is not cleared when
show_existing_frame is equal to 1 even if the shown frame has different dimensions.
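The two steps depend on state remembered from the previous invocation. A minimal sketch of that bookkeeping is given below; the class and method names are hypothetical, and the return value simply reports whether the segmentation map would be cleared together with the value of UsePrevFrameMvs.

```python
class ImageSizeState:
    """Tracks what compute_image_size saw on its previous invocation."""

    def __init__(self):
        self.prev = None  # (FrameWidth, FrameHeight, show_frame) or None

    def compute(self, width, height, show_frame,
                error_resilient_mode, frame_is_intra):
        first_call = self.prev is None
        size_changed = first_call or self.prev[:2] != (width, height)
        clear_segmentation_map = size_changed              # step 1
        use_prev_frame_mvs = int(                          # step 2
            not first_call
            and not size_changed
            and self.prev[2] == 1
            and error_resilient_mode == 0
            and not frame_is_intra
        )
        self.prev = (width, height, show_frame)
        return clear_segmentation_map, use_prev_frame_mvs


state = ImageSizeState()
results = [
    state.compute(640, 360, 1, 0, True),    # first call, key frame
    state.compute(640, 360, 0, 0, False),   # same size, previous frame shown
    state.compute(640, 360, 1, 0, False),   # previous frame was not shown
    state.compute(320, 180, 1, 0, False),   # size changed
]
```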
7.2.7 Interpolation filter semantics
is_filter_switchable equal to 1 indicates that the filter selection is signaled at the block level;
is_filter_switchable equal to 0 indicates that the filter selection is signaled at the frame level.
raw_interpolation_filter is used to compute interpolation_filter.
interpolation_filter specifies the filter selection used for performing inter prediction:
loop_filter_delta_enabled equal to 1 means that the filter level depends on the mode and reference frame
used to predict a block. loop_filter_delta_enabled equal to 0 means that the filter level does not depend on
the mode and reference frame.
loop_filter_delta_update equal to 1 means that the bitstream contains additional syntax elements that
specify which mode and reference frame deltas are to be updated. loop_filter_delta_update equal to 0 means
that these syntax elements are not present.
update_ref_delta equal to 1 means that the bitstream contains the syntax element loop_filter_ref_delta;
update_ref_delta equal to 0 means that the bitstream does not contain this syntax element.
loop_filter_ref_deltas contains the adjustment needed for the filter level based on the chosen reference
frame. If this syntax element is not present in the bitstream, it maintains its previous value.
update_mode_delta equal to 1 means that the bitstream contains the syntax element
loop_filter_mode_deltas; update_mode_delta equal to 0 means that the bitstream does not contain this syntax
element.
loop_filter_mode_deltas contains the adjustment needed for the filter level based on the chosen mode. If
this syntax element is not present in the bitstream, it maintains its previous value.
NOTE – The previous values for loop_filter_mode_deltas and loop_filter_ref_deltas are initially set by the
setup_past_independence function and can be subsequently modified by these syntax elements being
coded in a previous frame.
7.2.9 Quantization params semantics
The residual is specified via decoded coefficients which are adjusted by one of four quantization parameters
before the inverse transform is applied. The choice depends on the plane (Y or UV) and coefficient position
(DC/AC coefficient). The Dequantization process is specified in section 8.6.
base_q_idx indicates the base frame qindex. This is used for Y AC coefficients and as the base value for the
other quantizers.
delta_q_y_dc indicates the Y DC quantizer relative to base_q_idx.
delta_q_uv_dc indicates the UV DC quantizer relative to base_q_idx.
delta_q_uv_ac indicates the UV AC quantizer relative to base_q_idx.
delta_coded specifies that the delta_q syntax element is present in the bitstream.
delta_q specifies an offset (relative to base_q_idx) for a particular quantization parameter.
Lossless is computed from the syntax elements and if equal to 1 indicates that the frame is coded using a
special 4x4 transform designed for encoding frames that are bit-identical with the original frames.
7.2.10 Segmentation params semantics
VP9 provides a means of segmenting the image and then applying various adjustments at the segment level.
Up to 8 segments may be specified for any given frame. For each of these segments it is possible to specify:
1. A quantizer (absolute value or delta).
2. A loop filter strength (absolute value or delta).
3. A prediction reference frame.
4. A block skip mode that implies both the use of a (0,0) motion vector and that no residual will be coded.
Each of these data values for each segment may be individually updated at the frame level. Where a value is
not updated in a given frame, the value from the previous frame persists. The exceptions to this are key
frames, intra only frames or other frames where independence from past frame values is required (for
example to enable error resilience). In such cases all values are reset as described in the semantics for
setup_past_independence.
The segment affiliation (the segmentation map) is stored at the resolution of 8x8 blocks. If no explicit update
is coded for a block’s segment affiliation, then it persists from frame to frame (until reset by a call to
setup_past_independence).
segmentation_enabled equal to 1 indicates that this frame makes use of the segmentation tool;
segmentation_enabled equal to 0 indicates that the frame does not use segmentation.
segmentation_update_map equal to 1 indicates that the segmentation map should be updated during the
decoding of this frame. segmentation_update_map equal to 0 means that the segmentation map from the
previous frame is used.
segmentation_tree_probs specify the probability values to be used when decoding segment_id.
segmentation_pred_prob specify the probability values to be used when decoding seg_id_predicted.
segmentation_temporal_update equal to 1 indicates that the updates to the segmentation map are coded
relative to the existing segmentation map. segmentation_temporal_update equal to 0 indicates that the new
segmentation map is coded without reference to the existing segmentation map.
segmentation_update_data equal to 1 indicates that new parameters are about to be specified for each
segment. segmentation_update_data equal to 0 indicates that the segmentation parameters should keep
their existing values.
segmentation_abs_or_delta_update equal to 0 indicates that the segmentation parameters represent
adjustments relative to the standard values. segmentation_abs_or_delta_update equal to 1 indicates that the
segmentation parameters represent the actual values to be used.
feature_enabled equal to 0 indicates that the corresponding feature is unused and has value equal to 0.
feature_enabled equal to 1 indicates that the feature value is coded in the bitstream.
feature_value specifies the magnitude of the feature data for a segment feature.
feature_sign equal to 1 indicates that the feature data is given by -feature_value. feature_sign equal to 0
indicates that the feature data is given by feature_value. It is a requirement of bitstream conformance that
feature_sign is equal to 0 when segmentation_abs_or_delta_update is equal to 1.
7.2.11 Tile info semantics
increment_tile_cols_log2 indicates whether the tile width should be increased.
tile_cols_log2 specifies the base 2 logarithm of the width of each tile (where the width is measured in units of
8x8 blocks).
It is a requirement of bitstream conformance that tile_cols_log2 is less than or equal to 6.
increment_tile_rows_log2 indicates whether the tile height should be increased.
tile_rows_log2 specifies the base 2 logarithm of the height of each tile (where the height is measured in units
of 8x8 blocks).
7.3 Compressed header semantics
The compressed header specifies the transform mode and updates to the probability tables.
7.3.1 Tx mode semantics
tx_mode specifies how the transform size is determined:
For tx_mode not equal to 4, the inverse transform will use the largest transform size possible up to the limit set
in tx_mode. For tx_mode equal to 4, the choice of size is specified explicitly for each block.
tx_mode_select equal to 1 indicates that the transform mode is specified within each mode info header.
tx_mode_select equal to 0 indicates that the transform mode is computed based on the size of the block and
the maximum transform size allowed by the frame header.
7.3.2 Diff update prob semantics
update_prob equal to 1 indicates that an adjustment to the probability is coded in the bitstream; update_prob
equal to 0 indicates that the previous value for probability should be used.
7.3.3 Decode term subexp semantics
bit, sub_exp_val, sub_exp_val_minus_16, sub_exp_val_minus_32, v are combined to specify an
adjustment to a probability in such a way that fewer bits are required to specify small adjustments.
7.3.4 Inv remap prob semantics
deltaProb is passed into this function to indicate how much the probability should be adjusted. It is a
requirement of bitstream conformance that the value of deltaProb is strictly less than MAX_PROB.
7.3.5 Coef prob semantics
update_probs equal to 1 indicates that the coefficient probabilities should be adjusted. update_probs equal
to 0 indicates that the coefficient probabilities should maintain their previous values.
coef_probs specifies a set of probability models to be used for decoding transform coefficients. When frame
parallel mode is turned off, the decoder first runs a backward update based on the decoding information from
the previous frame. It then applies a forward probability model update via diff_update_prob, which reads the
difference in probabilities from the bitstream and adjusts the effective model accordingly.
7.3.6 Frame reference mode semantics
non_single_reference equal to 1 specifies that compound prediction may be used; non_single_reference
equal to 0 specifies that compound prediction may not be used.
reference_select equal to 0 specifies that all inter blocks use compound prediction; reference_select equal to
1 specifies that the mode info for inter blocks contains the syntax element comp_mode that indicates whether
to use single or compound prediction.
reference_mode is a derived syntax element that specifies the type of inter prediction to be used:
SINGLE_REFERENCE indicates that all the inter blocks use only a single reference frame to generate motion
compensated prediction.
COMPOUND_REFERENCE requires all the inter blocks to use compound mode. Single reference frame
prediction is not allowed.
REFERENCE_MODE_SELECT allows each individual inter block to select between single and compound
prediction modes.
subsize is computed from partition and indicates the size of the component blocks within this block:
The dimensions of these blocks are given in width, height order (e.g. BLOCK_8X16 corresponds to a block
that is 8 pixels wide, and 16 pixels high).
It is a requirement of bitstream conformance that at least one of the following conditions is true every time
subsize is computed:
− subsize < BLOCK_8X8
− get_plane_block_size( subsize, 1 ) is not equal to BLOCK_INVALID
NOTE – This requirement prevents the UV blocks from being too tall or too wide (i.e. having aspect ratios
outside the range 1:2 to 2:1).
7.4.4 Decode block semantics
MiRow is a variable holding the vertical location of the block in units of 8x8 pixels.
MiCol is a variable holding the horizontal location of the block in units of 8x8 pixels.
MiSize is a variable holding the size of the block with values having the same interpretation as in the
semantics for subsize.
AvailU is equal to 0 if the information from the block above cannot be used; AvailU is equal to 1 if the
information from the block above can be used.
AvailL is equal to 0 if the information from the block to the left cannot be used; AvailL is equal to 1 if the
information from the block to the left can be used.
NOTE – Information from a block in a different tile can be used if the block is above, but not if the block is
to the left.
7.4.5 Intra frame mode info semantics
This syntax is used when coding an intra block within an intra frame.
default_intra_mode specifies the direction of intra prediction filtering:
default_uv_mode specifies the chrominance intra prediction mode using values with the same interpretation
as in the semantics for default_intra_mode.
7.4.6 Intra and inter segment id semantics
segment_id specifies which segment is associated with the current intra block being decoded.
seg_id_predicted equal to 1 specifies that the segment_id should be taken from the segmentation map.
seg_id_predicted equal to 0 specifies that the syntax element segment_id should be parsed.
NOTE – It is legal for seg_id_predicted to be equal to 0 even if the value coded for the segment_id is equal
to predictedSegmentId.
NOTE – The intra modes take values 0..9 so these y_mode values start at 10.
When seg_feature_active( SEG_LVL_SKIP ) is set, y_mode is set equal to ZEROMV. It is a requirement of
bitstream conformance that whenever inter_block_mode_info is invoked and
seg_feature_active( SEG_LVL_SKIP ) is equal to 1, MiSize shall be greater than or equal to BLOCK_8X8.
interp_filter specifies the type of filter used in inter prediction. Values 0..3 are allowed with the same
interpretation as for interpolation_filter.
NOTE – The syntax element interpolation_filter from the uncompressed header can specify the type of filter
to be used for the whole frame. If it is set to SWITCHABLE then the interp_filter syntax element is read
from the bitstream for every inter block.
7.4.13 MV semantics
ZeroMv represents a zero motion vector and is defined such that ZeroMv[ i ] is equal to 0 for i = 0..1.
mv_joint specifies which components of the motion vector difference are non-zero:
The motion vector difference is added to the BestMv to compute the final motion vector in Mv. It is a
requirement of bitstream conformance that the resulting motion vector satisfies -(1<<14) < Mv[ ref ][ comp ] <
(1<<14) - 1 for comp=0..1.
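The range requirement above can be illustrated with a short, informative Python sketch. It assumes the motion vector difference has already been decoded into a two-element list; the function name and argument names are hypothetical and are not part of this Specification. The bounds are taken exactly as written above, including the asymmetric upper limit.

```python
def add_mv_and_check(best_mv, diff_mv):
    """Add a decoded motion vector difference to BestMv and report conformance.

    Returns (mv, ok) where ok is True when both components satisfy
    -(1 << 14) < mv[comp] < (1 << 14) - 1, the bounds stated in the text.
    """
    limit = 1 << 14
    mv = [best_mv[comp] + diff_mv[comp] for comp in range(2)]
    ok = all(-limit < component < limit - 1 for component in mv)
    return mv, ok
```

A decoder is not required to perform this check; it is shown only to make the permitted range concrete.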
7.4.14 MV component semantics
mv_sign equal to 0 means that the motion vector difference is positive; mv_sign equal to 1 means that the
motion vector difference is negative.
mv_class specifies the class of the motion vector difference. A higher class means that the motion vector
difference represents a larger update:
2 MV_CLASS_2
3 MV_CLASS_3
4 MV_CLASS_4
5 MV_CLASS_5
6 MV_CLASS_6
7 MV_CLASS_7
8 MV_CLASS_8
9 MV_CLASS_9
10 MV_CLASS_10
mv_class0_bit specifies the integer part of the motion vector difference. This is only present in the bitstream
for class 0 motion vector differences.
mv_class0_fr specifies the first 2 fractional bits of the motion vector difference. This is only present in the
bitstream for class 0 motion vector differences.
mv_class0_hp specifies the third fraction bit of the motion vector difference. This is only present in the
bitstream for class 0 motion vector differences.
mv_bit specifies bit i of the integer part of the motion vector difference.
mv_fr specifies the first 2 fractional bits of the motion vector difference.
mv_hp specifies the third fractional bit of the motion vector difference.
7.4.15 Residual semantics
AboveNonzeroContext and LeftNonzeroContext are arrays that store at a 4 sample granularity which
blocks contained coded coefficients.
predict_intra is a function call that indicates the conceptual point where intra prediction happens. When this
function is called, the intra prediction process specified in section 8.5.1 is invoked.
predict_inter is a function call that indicates the conceptual point where inter prediction happens. When this
function is called, the inter prediction process specified in section 8.5.2 is invoked.
reconstruct is a function call that indicates the conceptual point where inverse transform and reconstruction
happen. When this function is called, the reconstruction process specified in section 8.6.2 is invoked.
NOTE – The predict_inter, predict_intra, and reconstruct functions do not affect the syntax decode process.
7.4.16 Token semantics
more_coefs equal to 0 specifies that all the remaining coefficients in this transform block are equal to 0 and
that no more coefficients need to be parsed for this transform block. more_coefs equal to 1 indicates that
more coefficients should be read.
token specifies the size range of the transform coefficient:
9 DCT_VAL_CAT5
10 DCT_VAL_CAT6
sign_bit equal to 0 indicates that the transform coefficient is given by coef; sign_bit equal to 1 indicates that
the transform coefficient is given by -coef.
nonzero is equal to 0 if the first more_coefs syntax element decoded for the transform block was equal to 0;
nonzero is equal to 1 otherwise.
NOTE – It is legal for the transform block to be filled with zero coefficients without more_coefs being set
equal to 0. However, in this case note that nonzero is still set equal to 1. The variable nonzero
corresponds to the last scan position being non-zero, and not the actual presence of non-zero coefficients.
7.4.17 Coef semantics
high_bit is only present when the BitDepth is greater than 8 and token is equal to DCT_VAL_CATEGORY6.
high_bit contains some additional most significant bits of the transform coefficients.
coef_bit represents the remaining bits of the transform coefficient. Different tokens specify different numbers
of remaining bits.
8 Decoding process
8.1 General
Decoders shall produce output frames that are identical in all respects and have the same output order as
those produced by the decoding process specified herein.
The input to this process is a sequence of coded frames.
The output from this process is a sequence of decoded frames.
For each coded frame in turn the decoding process operates as follows:
1. The syntax elements for the coded frame are extracted as specified in sections 6 and 7. The syntax
tables include function calls indicating when the block decode processes should be triggered.
2. If loop_filter_level is not equal to 0, the loop filter process as specified in section 8.8 is invoked once the
coded frame has been decoded.
3. If all of the following conditions are true, PrevSegmentIds[ row ][ col ] is set equal to
SegmentIds[ row ][ col ] for row = 0..MiRows-1, for col = 0..MiCols-1:
− show_existing_frame is equal to 0,
− segmentation_enabled is equal to 1,
− segmentation_update_map is equal to 1.
4. The output process as specified in section 8.9 is invoked.
5. The reference frame update process as specified in section 8.10 is invoked.
8.2 Frame order constraints
This section describes additional constraints on a bitstream that result from the choice of frame types.
If a sequence starts with a key frame, then the additional constraints are automatically satisfied.
Otherwise, the sequence can start with a number of non key frames (i.e. inter frames and intra only frames)
and extra care has to be taken to ensure that the decoding process is well defined. It is a requirement of
bitstream conformance that the following applies:
− When load_probs( ctx ) is invoked, this must load an initialized set of probabilities, i.e. there must have
been an earlier invocation of save_probs( ctx ).
− When ref_frame_idx[ i ] is decoded, it must identify an initialized reference frame, i.e. there must have
been an earlier decoded frame which had (refresh_frame_flags>>ref_frame_idx[ i ])&1 equal to 1.
NOTE – This constraint means that the first coded frame cannot be an inter frame.
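The reference frame constraint can be sketched in Python as a bit mask of initialized reference slots. This is an informative illustration only: the class and method names are hypothetical, and the assumption that there are 8 reference slots (so that refresh_frame_flags is an 8-bit mask) comes from outside this section.

```python
class RefSlotTracker:
    """Tracks which reference slots have been written by an earlier frame."""

    NUM_REF_FRAMES = 8  # assumed number of reference slots

    def __init__(self):
        self.initialized = 0  # bit i set means slot i holds a decoded frame

    def refs_are_initialized(self, ref_frame_idx):
        """True if every index in ref_frame_idx names an initialized slot."""
        return all((self.initialized >> idx) & 1 for idx in ref_frame_idx)

    def finish_frame(self, refresh_frame_flags):
        """Record the slots refreshed by the frame that was just decoded."""
        mask = (1 << self.NUM_REF_FRAMES) - 1
        self.initialized |= refresh_frame_flags & mask
```

With such a tracker, an inter frame arriving before any slot has been refreshed fails the check, which matches the NOTE above.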
counts_single_ref[REF_CONTEXTS][2][2]
counts_comp_ref[REF_CONTEXTS][2]
counts_skip[SKIP_CONTEXTS][2]
counts_mv_joint[MV_JOINTS]
counts_mv_sign[2][2]
counts_mv_class[2][MV_CLASSES]
counts_mv_class0_bit[2][CLASS0_SIZE]
counts_mv_class0_fr[2][CLASS0_SIZE][MV_FR_SIZE]
counts_mv_class0_hp[2][2]
counts_mv_bits[2][MV_OFFSET_BITS][2]
counts_mv_fr[2][MV_FR_SIZE]
counts_mv_hp[2][2]
counts_token[TX_SIZES][BLOCK_TYPES][REF_TYPES][COEF_BANDS]
[PREV_COEF_CONTEXTS][UNCONSTRAINED_NODES]
counts_more_coefs[TX_SIZES][BLOCK_TYPES][REF_TYPES][COEF_BANDS]
[PREV_COEF_CONTEXTS][2]
The numbers in square brackets define the size of each dimension of the array.
When the clear counts process is invoked, all of these counts are set equal to 0.
8.4 Probability adaptation process
This section defines two processes used to perform backward updates of probabilities based on the observed
frequencies of different syntax elements.
These processes are triggered at points defined by function calls in the refresh probs syntax table in section
6.1.2.
The processes make use of the functions merge_prob and merge_probs that are specified next.
8.4.1 Merge prob process
The inputs to this process are:
− a variable preProb specifying the original probability for a boolean,
− variables ct0 and ct1 specifying the number of times the boolean was decoded as 0 and 1,
− a variable countSat indicating how many times the boolean needs to be decoded for the maximum
adaptation to apply,
− a variable maxUpdateFactor specifying the maximum amount the probability can be adjusted.
The output is a variable outProb containing the updated probability.
The variable den representing the total times this boolean has been decoded is set equal to ct0 + ct1.
The variable prob estimating the probability that the boolean is decoded as a 0 is set equal to (den == 0) ?
128 : Clip3( 1, 255, (ct0 * 256 + (den >> 1)) / den ).
The variable count is set equal to Min( ct0 + ct1, countSat ).
The variable factor is set equal to maxUpdateFactor * count / countSat.
The return variable outProb is set equal to Round2( preProb * (256 - factor) + prob * factor, 8 ).
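The merge prob computation can be sketched in Python as follows. This is an informative illustration, not a normative definition. It assumes that Clip3( lo, hi, x ) clamps x to the range lo..hi, that Round2( x, n ) is (x + (1 << (n - 1))) >> n, and that "/" on non-negative operands is integer division; the Python names are hypothetical.

```python
def clip3(lo, hi, x):
    """Clamp x to the inclusive range [lo, hi] (assumed Clip3 behaviour)."""
    return lo if x < lo else hi if x > hi else x


def round2(x, n):
    """Rounded right shift by n bits (assumed Round2 behaviour)."""
    return (x + (1 << (n - 1))) >> n


def merge_prob(pre_prob, ct0, ct1, count_sat, max_update_factor):
    """Blend the previous probability with the observed frequency of zeros."""
    den = ct0 + ct1
    # Observed probability of decoding a 0, scaled to 1..255.
    prob = 128 if den == 0 else clip3(1, 255, (ct0 * 256 + (den >> 1)) // den)
    # The blend weight grows with the number of observations up to count_sat.
    count = min(den, count_sat)
    factor = max_update_factor * count // count_sat
    return round2(pre_prob * (256 - factor) + prob * factor, 8)
```

For example, with illustrative values countSat = 20 and maxUpdateFactor = 128, a boolean that was never decoded keeps its previous probability, while one decoded as 0 twenty times moves half-way from preProb towards 255.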
8.4.2 Merge probs process
The inputs to this process are:
− an array tree specifying the decode tree for a syntax element,
− Otherwise if mode is equal to DC_PRED and haveLeft is equal to 1 and haveAbove is equal to 0,
pred[ i ][ j ] is set equal to leftAvg with i = 0..size-1 and j = 0..size-1. The variable leftAvg is specified as
follows:
sum = 0
− Otherwise if mode is equal to DC_PRED and haveLeft is equal to 0 and haveAbove is equal to 1,
pred[ i ][ j ] is set equal to aboveAvg with i = 0..size-1 and j = 0..size-1. The variable aboveAvg is specified
as follows:
sum = 0
for ( k = 0; k < size; k++ ) {
  sum += aboveRow[ k ]
}
aboveAvg = (sum + (1 << (log2Size - 1) ) ) >> log2Size
− Otherwise (mode is DC_PRED), pred[ i ][ j ] is set equal to 1<<(BitDepth - 1) with i = 0..size-1 and j =
0..size-1.
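Two of the DC_PRED cases above (only the row above available, and no neighbours available) can be sketched in Python as follows. This is an informative illustration with hypothetical function names; it assumes that size is equal to 1 << log2Size.

```python
def dc_pred_above_only(above_row, log2_size):
    """DC prediction when only the row above is available.

    The block is filled with the rounded average of the first
    size = 1 << log2_size samples of above_row.
    """
    size = 1 << log2_size
    total = 0
    for k in range(size):
        total += above_row[k]
    above_avg = (total + (1 << (log2_size - 1))) >> log2_size
    return [[above_avg] * size for _ in range(size)]


def dc_pred_no_neighbours(log2_size, bit_depth):
    """DC prediction when neither neighbour is available: mid-grey fill."""
    size = 1 << log2_size
    return [[1 << (bit_depth - 1)] * size for _ in range(size)]
```

For a 4x4 block with aboveRow equal to 1, 2, 3, 5, the sum is 11 and the rounded average is (11 + 2) >> 2 = 3.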
The current frame is updated as follows:
− CurrFrame[ plane ][ y + i ][ x + j ] is set equal to pred[ i ][ j ] for i = 0..size-1 and j = 0..size-1.
8.5.2 Inter prediction process
The inter prediction process is invoked for inter coded blocks. When MiSize is smaller than BLOCK_8X8, the
prediction is done with a granularity of 4x4 samples, otherwise the whole plane is predicted at the same time.
The inputs to this process are:
− a variable plane specifying which plane is being predicted,
− variables x and y specifying the location of the top left sample in the CurrFrame[ plane ] array of the region
to be predicted,
− variables w and h specifying the width and height of the region to be predicted,
− a variable blockIdx, specifying how much of the block has already been predicted in units of 4x4 samples.
The outputs of this process are inter predicted samples in the current frame CurrFrame.
The variable isCompound is set equal to ref_frame[ 1 ] > NONE.
The prediction arrays are formed by the following ordered steps:
1. The variable refList is set equal to 0.
2. The motion vector selection process in section 8.5.2.1 is invoked with plane, refList, blockIdx as inputs
and the output being the motion vector mv.
3. The motion vector clamping process in section 8.5.2.2 is invoked with plane, mv as inputs and the output
being the clamped motion vector clampedMv.
4. The motion vector scaling process in section 8.5.2.3 is invoked with plane, refList, x, y, clampedMv as
inputs and the output being the initial location startX, startY, and the step sizes stepX, stepY.
5. The block inter prediction process in section 8.5.2.4 is invoked with plane, refList, startX, startY, stepX,
stepY, w, h as inputs and the output is assigned to the 2D array preds[ refList ].
6. If isCompound is equal to 1, then the variable refList is set equal to 1 and steps 2, 3, 4 and 5 are repeated
to form the prediction for the second reference.
The inter predicted samples are then derived as follows:
− If isCompound is equal to 0, CurrFrame[ plane ][ y + i ][ x + j ] is set equal to preds[ 0 ][ i ][ j ] for i = 0..h-1
and j = 0..w-1.
The purpose of this process is to find the motion vector for this block. Motion vectors are specified for each
luma block, but a chroma block may cover more than one luma block due to subsampling. In this case, an
average motion vector is constructed for the chroma block.
round_mv_comp_q4( value ) {
  return (value < 0 ? value - 2 : value + 2) / 4
}
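The rounding performed by round_mv_comp_q4 can be checked with the following informative Python sketch. It assumes that "/" truncates towards zero, as C integer division does; Python's // operator rounds towards negative infinity, so the truncation is written out explicitly.

```python
def round_mv_comp_q4(value):
    """Divide a sum of four motion vector components by 4, rounding to nearest.

    The +/-2 bias followed by truncation towards zero rounds symmetrically
    about zero (assumed semantics of "/" in the pseudocode).
    """
    biased = value - 2 if value < 0 else value + 2
    if biased < 0:
        return -((-biased) // 4)  # truncate towards zero for negatives
    return biased // 4
```

Under this assumption, sums of 6 and -6 round to 2 and -2 respectively, and sums of 1 and -1 both round to 0.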
NOTE – When subsampling_x is equal to 1, and subsampling_y is equal to 0, the chroma motion vector
for the bottom block is computed as an average of the top-right and bottom-left luma motion vectors
because blockIdx is equal to 1 for the bottom chroma block.
8.5.2.2 Motion vector clamping process
The inputs to this process are:
− a variable plane specifying which plane is being predicted,
− a variable mv specifying the motion vector to be clamped.
The output of this process is a 2 element array called clampedMv containing the clamped motion vector for
this block.
The purpose of this process is to change the motion vector into the appropriate precision for the current plane
and to clamp motion vectors that go too far off the edge of the frame.
The variables sx and sy are set equal to the subsampling for the current plane as follows:
− If plane is equal to 0, sx is set equal to 0 and sy is set equal to 0.
− Otherwise, sx is set equal to subsampling_x and sy is set equal to subsampling_y.
The output array clampedMv is specified by the following steps:
bh = num_8x8_blocks_high_lookup[ MiSize ]
mbToTopEdge = -((MiRow * MI_SIZE) * 16) >> sy
mbToBottomEdge = (((MiRows - bh - MiRow) * MI_SIZE) * 16) >> sy
bw = num_8x8_blocks_wide_lookup[ MiSize ]
mbToLeftEdge = -((MiCol * MI_SIZE) * 16) >> sx
mbToRightEdge = (((MiCols - bw - MiCol) * MI_SIZE) * 16) >> sx
spelLeft = (INTERP_EXTEND + ((bw * MI_SIZE) >> sx) ) << SUBPEL_BITS
spelRight = spelLeft - SUBPEL_SHIFTS
spelTop = (INTERP_EXTEND + ((bh * MI_SIZE) >> sy) ) << SUBPEL_BITS
spelBottom = spelTop - SUBPEL_SHIFTS
clampedMv[ 0 ] = Clip3( mbToTopEdge - spelTop, mbToBottomEdge + spelBottom,
(2 * mv[ 0 ]) >> sy )
clampedMv[ 1 ] = Clip3( mbToLeftEdge - spelLeft, mbToRightEdge + spelRight,
(2 * mv[ 1 ]) >> sx )
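The clamping steps above can be sketched in Python as follows. This is an informative illustration: the constant values (MI_SIZE = 8, INTERP_EXTEND = 4, SUBPEL_BITS = 4, SUBPEL_SHIFTS = 16) are assumed from the constants defined elsewhere in this Specification, and bh and bw are passed in directly rather than read from the num_8x8_blocks lookup tables. Python's >> on negative integers is an arithmetic shift, which is the behaviour assumed here.

```python
MI_SIZE = 8          # assumed constant values, see lead-in
INTERP_EXTEND = 4
SUBPEL_BITS = 4
SUBPEL_SHIFTS = 16


def clip3(lo, hi, x):
    """Clamp x to the inclusive range [lo, hi]."""
    return lo if x < lo else hi if x > hi else x


def clamp_mv(mv, mi_row, mi_col, mi_rows, mi_cols, bh, bw, sx, sy):
    """Return clampedMv as [row, col] in 1/16 sample units for one plane."""
    mb_to_top = -((mi_row * MI_SIZE) * 16) >> sy
    mb_to_bottom = (((mi_rows - bh - mi_row) * MI_SIZE) * 16) >> sy
    mb_to_left = -((mi_col * MI_SIZE) * 16) >> sx
    mb_to_right = (((mi_cols - bw - mi_col) * MI_SIZE) * 16) >> sx
    spel_left = (INTERP_EXTEND + ((bw * MI_SIZE) >> sx)) << SUBPEL_BITS
    spel_right = spel_left - SUBPEL_SHIFTS
    spel_top = (INTERP_EXTEND + ((bh * MI_SIZE) >> sy)) << SUBPEL_BITS
    spel_bottom = spel_top - SUBPEL_SHIFTS
    return [
        clip3(mb_to_top - spel_top, mb_to_bottom + spel_bottom,
              (2 * mv[0]) >> sy),
        clip3(mb_to_left - spel_left, mb_to_right + spel_right,
              (2 * mv[1]) >> sx),
    ]
```

For an 8x8 luma block at MiRow = 2, MiCol = 3 in a frame with MiRows = MiCols = 8, a small vector such as (10, -20) is only doubled to (20, -40), while a large vector such as (-500, 1000) is clamped to (-448, 688).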
This process is responsible for computing the sampling locations in the reference frame based on the motion
vector. The sampling locations are also adjusted to compensate for any difference in the size of the reference
frame compared to the current frame.
A variable refIdx specifying which reference frame is being used is set equal to
ref_frame_idx[ ref_frame[ refList ] - LAST_FRAME ].
It is a requirement of bitstream conformance that all the following conditions are satisfied:
− 2 * FrameWidth >= RefFrameWidth[ refIdx ]
− 2 * FrameHeight >= RefFrameHeight[ refIdx ]
The output variable stepX is set equal to (16 * xScale) >> REF_SCALE_SHIFT.
The output variable stepY is set equal to (16 * yScale) >> REF_SCALE_SHIFT.
The output variable startX is set equal to (baseX << SUBPEL_BITS) + dX.
The output variable startY is set equal to (baseY << SUBPEL_BITS) + dY.
NOTE – Even for chroma blocks the fractional part of the start position is based on the luma block location
in lumaX and lumaY.
8.5.2.4 Block inter prediction process
The inputs to this process are:
− a variable plane,
− a variable refList specifying that we should predict from ref_frame[ refList ],
− variables x and y giving the block location in units of 1/16th of a sample,
− variables xStep and yStep giving the step size in units of 1/16th of a sample. (These will be at most equal
to 80 due to the restrictions on scaling between reference frames.)
− variables w and h giving the width and height of the block in units of samples.
The output from this process is the 2D array named pred containing inter predicted samples.
A variable refIdx specifying which reference frame is being used is set equal to
ref_frame_idx[ ref_frame[ refList ] - LAST_FRAME ].
A variable ref specifying the reference frame contents is set equal to FrameStore[ refIdx ].
The variables subX and subY are set equal to the subsampling for the current plane as follows:
− If plane is equal to 0, subX is set equal to 0 and subY is set equal to 0.
− Otherwise, subX is set equal to subsampling_x and subY is set equal to subsampling_y.
The variable lastX is set equal to ( (RefFrameWidth[ refIdx ] + subX) >> subX) - 1.
The variable lastY is set equal to ( (RefFrameHeight[ refIdx ] + subY) >> subY) - 1.
(lastX and lastY specify the coordinates of the bottom right sample of the reference plane.)
The variable intermediateHeight specifying the height required for the intermediate array is set equal to (((h -
1) * yStep + 15) >> 4) + 8.
The sub-sample interpolation is effected via two one-dimensional convolutions. First a horizontal filter is used
to build up a temporary array, and then this array is vertically filtered to obtain the final prediction. The
fractional parts of the motion vectors determine the filtering process. If the fractional part is zero, then the
filtering is equivalent to a straight sample copy.
The filtering is applied as follows:
− The array intermediate is specified as follows:
for ( r = 0; r < intermediateHeight; r++ ) {
  for ( c = 0; c < w; c++ ) {
    s = 0
    p = x + xStep * c
    for ( t = 0; t < 8; t++ )
      s += subpel_filters[ interp_filter ][ p & 15 ][ t ] *
           ref[ plane ][ Clip3( 0, lastY, (y >> 4) + r - 3 ) ]
                       [ Clip3( 0, lastX, (p >> 4) + t - 3 ) ]
    intermediate[ r ][ c ] = Clip1( Round2(s, 7) )
  }
}
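The horizontal pass can be sketched in Python as follows. This is an informative illustration under several assumptions: the reference plane is a list of rows whose last valid indices play the role of lastX and lastY, the kernel table is supplied by the caller in place of subpel_filters[ interp_filter ], Clip1 clamps to 0..(1 << BitDepth) - 1, and Round2( x, n ) is (x + (1 << (n - 1))) >> n. All Python names are hypothetical.

```python
def clip3(lo, hi, x):
    """Clamp x to the inclusive range [lo, hi]."""
    return lo if x < lo else hi if x > hi else x


def round2(x, n):
    """Rounded right shift by n bits (assumed Round2 behaviour)."""
    return (x + (1 << (n - 1))) >> n


def horizontal_pass(ref_plane, kernels, x, y, x_step, y_step, w, h,
                    bit_depth=8):
    """Build the intermediate array; x, y and the steps are in 1/16 samples.

    kernels[phase] must hold the 8 taps for each of the 16 sub-sample phases.
    """
    last_y = len(ref_plane) - 1
    last_x = len(ref_plane[0]) - 1
    intermediate_height = (((h - 1) * y_step + 15) >> 4) + 8
    max_sample = (1 << bit_depth) - 1
    intermediate = []
    for r in range(intermediate_height):
        row_idx = clip3(0, last_y, (y >> 4) + r - 3)
        out_row = []
        for c in range(w):
            p = x + x_step * c
            s = 0
            for t in range(8):
                col_idx = clip3(0, last_x, (p >> 4) + t - 3)
                s += kernels[p & 15][t] * ref_plane[row_idx][col_idx]
            out_row.append(clip3(0, max_sample, round2(s, 7)))
        intermediate.append(out_row)
    return intermediate
```

When every phase uses the pass-through kernel 0, 0, 0, 128, 0, 0, 0, 0, the pass reduces to a sample copy with edge clamping. The intermediate array has 8 more rows than the scaled block height so that the subsequent vertical filter has its 8 taps of context.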
105, 107, 108, 110, 111, 113, 114, 116, 117, 118, 120, 121,
123, 125, 127, 129, 131, 134, 136, 138, 140, 142, 144, 146,
148, 150, 152, 154, 156, 158, 161, 164, 166, 169, 172, 174,
177, 180, 182, 185, 187, 190, 192, 195, 199, 202, 205, 208,
211, 214, 217, 220, 223, 226, 230, 233, 237, 240, 243, 247,
250, 253, 257, 261, 265, 269, 272, 276, 280, 284, 288, 292,
296, 300, 304, 309, 313, 317, 322, 326, 330, 335, 340, 344,
349, 354, 359, 364, 369, 374, 379, 384, 389, 395, 400, 406,
411, 417, 423, 429, 435, 441, 447, 454, 461, 467, 475, 482,
489, 497, 505, 513, 522, 530, 539, 549, 559, 569, 579, 590,
602, 614, 626, 640, 654, 668, 684, 700, 717, 736, 755, 775,
796, 819, 843, 869, 896, 925, 955, 988, 1022, 1058, 1098, 1139,
1184, 1232, 1282, 1336
},
{
4, 9, 10, 13, 15, 17, 20, 22, 25, 28, 31, 34,
37, 40, 43, 47, 50, 53, 57, 60, 64, 68, 71, 75,
78, 82, 86, 90, 93, 97, 101, 105, 109, 113, 116, 120,
124, 128, 132, 136, 140, 143, 147, 151, 155, 159, 163, 166,
170, 174, 178, 182, 185, 189, 193, 197, 200, 204, 208, 212,
215, 219, 223, 226, 230, 233, 237, 241, 244, 248, 251, 255,
259, 262, 266, 269, 273, 276, 280, 283, 287, 290, 293, 297,
300, 304, 307, 310, 314, 317, 321, 324, 327, 331, 334, 337,
343, 350, 356, 362, 369, 375, 381, 387, 394, 400, 406, 412,
418, 424, 430, 436, 442, 448, 454, 460, 466, 472, 478, 484,
490, 499, 507, 516, 525, 533, 542, 550, 559, 567, 576, 584,
592, 601, 609, 617, 625, 634, 644, 655, 666, 676, 687, 698,
708, 718, 729, 739, 749, 759, 770, 782, 795, 807, 819, 831,
844, 856, 868, 880, 891, 906, 920, 933, 947, 961, 975, 988,
1001, 1015, 1030, 1045, 1061, 1076, 1090, 1105, 1120, 1137, 1153, 1170,
1186, 1202, 1218, 1236, 1253, 1271, 1288, 1306, 1323, 1342, 1361, 1379,
1398, 1416, 1436, 1456, 1476, 1496, 1516, 1537, 1559, 1580, 1601, 1624,
1647, 1670, 1692, 1717, 1741, 1766, 1791, 1817, 1844, 1871, 1900, 1929,
1958, 1990, 2021, 2054, 2088, 2123, 2159, 2197, 2236, 2276, 2319, 2363,
2410, 2458, 2508, 2561, 2616, 2675, 2737, 2802, 2871, 2944, 3020, 3102,
3188, 3280, 3375, 3478, 3586, 3702, 3823, 3953, 4089, 4236, 4394, 4559,
4737, 4929, 5130, 5347
},
{
4, 12, 18, 25, 33, 41, 50, 60,
70, 80, 91, 103, 115, 127, 140, 153,
166, 180, 194, 208, 222, 237, 251, 266,
281, 296, 312, 327, 343, 358, 374, 390,
405, 421, 437, 453, 469, 484, 500, 516,
532, 548, 564, 580, 596, 611, 627, 643,
The function ac_q( b ) is specified as ac_qlookup[ (BitDepth-8) >> 1 ][ Clip3( 0, 255, b ) ] where ac_qlookup is
defined as follows:
ac_qlookup[ 3 ][ 256 ] = {
{
4, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78,
79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102,
104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126,
128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150,
152, 155, 158, 161, 164, 167, 170, 173, 176, 179, 182, 185,
188, 191, 194, 197, 200, 203, 207, 211, 215, 219, 223, 227,
231, 235, 239, 243, 247, 251, 255, 260, 265, 270, 275, 280,
285, 290, 295, 300, 305, 311, 317, 323, 329, 335, 341, 347,
353, 359, 366, 373, 380, 387, 394, 401, 408, 416, 424, 432,
440, 448, 456, 465, 474, 483, 492, 501, 510, 520, 530, 540,
550, 560, 571, 582, 593, 604, 615, 627, 639, 651, 663, 676,
689, 702, 715, 729, 743, 757, 771, 786, 801, 816, 832, 848,
864, 881, 898, 915, 933, 951, 969, 988, 1007, 1026, 1046, 1066,
1087, 1108, 1129, 1151, 1173, 1196, 1219, 1243, 1267, 1292, 1317, 1343,
1369, 1396, 1423, 1451, 1479, 1508, 1537, 1567, 1597, 1628, 1660, 1692,
1725, 1759, 1793, 1828
},
{
4, 9, 11, 13, 16, 18, 21, 24, 27, 30, 33, 37,
40, 44, 48, 51, 55, 59, 63, 67, 71, 75, 79, 83,
88, 92, 96, 100, 105, 109, 114, 118, 122, 127, 131, 136,
140, 145, 149, 154, 158, 163, 168, 172, 177, 181, 186, 190,
195, 199, 204, 208, 213, 217, 222, 226, 231, 235, 240, 244,
249, 253, 258, 262, 267, 271, 275, 280, 284, 289, 293, 297,
302, 306, 311, 315, 319, 324, 328, 332, 337, 341, 345, 349,
354, 358, 362, 367, 371, 375, 379, 384, 388, 392, 396, 401,
409, 417, 425, 433, 441, 449, 458, 466, 474, 482, 490, 498,
506, 514, 523, 531, 539, 547, 555, 563, 571, 579, 588, 596,
604, 616, 628, 640, 652, 664, 676, 688, 700, 713, 725, 737,
749, 761, 773, 785, 797, 809, 825, 841, 857, 873, 889, 905,
922, 938, 954, 970, 986, 1002, 1018, 1038, 1058, 1078, 1098, 1118,
1138, 1158, 1178, 1198, 1218, 1242, 1266, 1290, 1314, 1338, 1362, 1386,
1411, 1435, 1463, 1491, 1519, 1547, 1575, 1603, 1631, 1663, 1695, 1727,
1759, 1791, 1823, 1859, 1895, 1931, 1967, 2003, 2039, 2079, 2119, 2159,
2199, 2239, 2283, 2327, 2371, 2415, 2459, 2507, 2555, 2603, 2651, 2703,
2755, 2807, 2859, 2915, 2971, 3027, 3083, 3143, 3203, 3263, 3327, 3391,
3455, 3523, 3591, 3659, 3731, 3803, 3876, 3952, 4028, 4104, 4184, 4264,
4348, 4432, 4516, 4604, 4692, 4784, 4876, 4972, 5068, 5168, 5268, 5372,
5476, 5584, 5692, 5804, 5916, 6032, 6148, 6268, 6388, 6512, 6640, 6768,
6900, 7036, 7172, 7312
},
{
4, 13, 19, 27, 35, 44, 54, 64,
75, 87, 99, 112, 126, 139, 154, 168,
183, 199, 214, 230, 247, 263, 280, 297,
314, 331, 349, 366, 384, 402, 420, 438,
456, 475, 493, 511, 530, 548, 567, 586,
604, 623, 642, 660, 679, 698, 716, 735,
753, 772, 791, 809, 828, 846, 865, 884,
902, 920, 939, 957, 976, 994, 1012, 1030,
1049, 1067, 1085, 1103, 1121, 1139, 1157, 1175,
The function get_qindex( ) returns the quantizer index for the current block and is specified by the following:
− If seg_feature_active( SEG_LVL_ALT_Q ) is equal to 1 the following ordered steps apply:
1. Set the variable data equal to FeatureData[ segment_id ][ SEG_LVL_ALT_Q ].
2. If segmentation_abs_or_delta_update is equal to 0, set data equal to base_q_idx + data.
3. Return Clip3( 0, 255, data ).
− Otherwise, return base_q_idx.
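The get_qindex derivation can be sketched in Python as follows. This is an informative illustration: the result of seg_feature_active( SEG_LVL_ALT_Q ) and the value of FeatureData[ segment_id ][ SEG_LVL_ALT_Q ] are passed in as plain arguments, and the argument names are hypothetical.

```python
def clip3(lo, hi, x):
    """Clamp x to the inclusive range [lo, hi]."""
    return lo if x < lo else hi if x > hi else x


def get_qindex(base_q_idx, alt_q_active, feature_data, abs_or_delta_update):
    """Return the quantizer index for a block.

    alt_q_active stands in for seg_feature_active( SEG_LVL_ALT_Q ) and
    feature_data for FeatureData[ segment_id ][ SEG_LVL_ALT_Q ].
    """
    if alt_q_active:
        data = feature_data
        if abs_or_delta_update == 0:
            data = base_q_idx + data  # feature data is a delta
        return clip3(0, 255, data)
    return base_q_idx
```

With base_q_idx equal to 100, a delta of -20 gives 80, whereas in absolute mode a feature value of 30 gives 30 regardless of base_q_idx.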
The function get_dc_quant( plane ) returns the quantizer value for the dc coefficient for a particular plane and
is derived as follows:
− If plane is equal to 0, return dc_q( get_qindex( ) + delta_q_y_dc ).
− Otherwise, return dc_q( get_qindex( ) + delta_q_uv_dc ).
The function get_ac_quant( plane ) returns the quantizer value for the ac coefficient for a particular plane and
is derived as follows:
− If plane is equal to 0, return ac_q( get_qindex( ) ).
− Otherwise, return ac_q( get_qindex( ) + delta_q_uv_ac ).
The function brev( numBits, x ) returns the bit-reversal of numBits of x and is specified as follows:
brev( numBits, x ) {
  t = 0
  for ( i = 0; i < numBits; i++ ) {
    bit = (x >> i) & 1
    t += bit << (numBits - 1 - i)
  }
  return t
}
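As a worked illustration, the same bit reversal can be written in Python. This is a direct transliteration of the pseudocode above and not an optimized implementation.

```python
def brev(num_bits, x):
    """Return the lowest num_bits bits of x in reversed order."""
    t = 0
    for i in range(num_bits):
        bit = (x >> i) & 1
        # Bit i of the input becomes bit (num_bits - 1 - i) of the output.
        t += bit << (num_bits - 1 - i)
    return t
```

For instance, brev( 3, 6 ) maps the bit pattern 110 to 011 and so returns 3, while brev( 4, 1 ) maps 0001 to 1000 and returns 8.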
The function B( a, b, angle, 0 ) performs a butterfly rotation specified by the following ordered steps:
1. The variable x is set equal to T[ a ] * cos64( angle ) - T[ b ] * sin64( angle ).
2. The variable y is set equal to T[ a ] * sin64( angle ) + T[ b ] * cos64( angle ).
3. T[ a ] is set equal to Round2( x, 14 ).
4. T[ b ] is set equal to Round2( y, 14 ).
It is a requirement of bitstream conformance that the values saved into the array T by this function are
representable by a signed integer using 8 + BitDepth bits of precision.
The function cos64( angle ) is specified for integer values of the input angle by the following ordered steps:
1. Set a variable angle2 equal to angle & 127.
2. If angle2 is greater than or equal to 0 and less than or equal to 32, return cos64_lookup[ angle2 ].
3. If angle2 is greater than 32 and less than or equal to 64, return cos64_lookup[ 64 - angle2 ] * -1.
4. If angle2 is greater than 64 and less than or equal to 96, return cos64_lookup[ angle2 - 64 ] * -1.
5. Otherwise (if angle2 is greater than 96 and less than 128), return cos64_lookup[ 128 - angle2 ].
Where cos64_lookup is a constant lookup table defined as:
cos64_lookup[ 33 ] = {
16384, 16364, 16305, 16207, 16069, 15893, 15679, 15426,
15137, 14811, 14449, 14053, 13623, 13160, 12665, 12140,
11585, 11003, 10394, 9760, 9102, 8423, 7723, 7005,
6270, 5520, 4756, 3981, 3196, 2404, 1606, 804,
0
}
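The quadrant folding performed by cos64 can be sketched in Python as below. The helper sin64 is included only as an assumption: it is taken here to be cos64( angle - 32 ), which follows from the identity sin( t ) = cos( t - pi/2 ) with 32 angle units per quarter turn. The final loop is a sanity check that the table values lie within one unit of 16384 * cos( angle * pi / 64 ).

```python
import math

COS64_LOOKUP = [
    16384, 16364, 16305, 16207, 16069, 15893, 15679, 15426,
    15137, 14811, 14449, 14053, 13623, 13160, 12665, 12140,
    11585, 11003, 10394, 9760, 9102, 8423, 7723, 7005,
    6270, 5520, 4756, 3981, 3196, 2404, 1606, 804,
    0,
]


def cos64(angle):
    """Fold any integer angle onto the first-quadrant lookup table."""
    angle2 = angle & 127  # one full turn is 128 units
    if angle2 <= 32:
        return COS64_LOOKUP[angle2]
    if angle2 <= 64:
        return -COS64_LOOKUP[64 - angle2]
    if angle2 <= 96:
        return -COS64_LOOKUP[angle2 - 64]
    return COS64_LOOKUP[128 - angle2]


def sin64(angle):
    """Assumed definition: a quarter-turn phase shift of cos64."""
    return cos64(angle - 32)


# Sanity check against floating-point cosine scaled by 1 << 14.
for a in range(128):
    assert abs(cos64(a) - 16384 * math.cos(a * math.pi / 64)) <= 1
```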
When the angle is equal to 16 + 32*k for integer k the butterfly rotation can be equivalently performed with two
fewer multiplications (because the magnitude of cos64(16 + 32*k) is always equal to that of sin64(16 + 32*k))
by the following process:
1. The variable v is set equal to (angle & 32) ? T[ a ] + T[ b ] : T[ a ] - T[ b ].
2. The variable w is set equal to (angle & 32) ? -T[ a ] + T[ b ] : T[ a ] + T[ b ].
3. The variable x is set equal to v * cos64( angle ).
4. The variable y is set equal to w * cos64( angle ).
5. T[ a ] is set equal to Round2( x, 14 ).
6. T[ b ] is set equal to Round2( y, 14 ).
It is a requirement of bitstream conformance that, when the angle is equal to 16 + 32*k for integer k, the variables v
and w are representable by a signed integer using 8 + BitDepth bits of precision.
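The equivalence of the two formulations can be checked with a short Python sketch. Several things here are assumptions made for the illustration: cos64 is restricted to the four angles of the form 16 + 32*k (where its magnitude is always 11585), sin64( angle ) is taken to be cos64( angle - 32 ), and Round2( x, n ) is taken to be ( x + ( 1 << ( n - 1 ) ) ) >> n with an arithmetic right shift.

```python
def cos64(angle):
    """cos64 restricted to angles 16 + 32*k (illustration only)."""
    angle2 = angle & 127
    assert angle2 in (16, 48, 80, 112)
    return 11585 if angle2 in (16, 112) else -11585


def sin64(angle):
    """Assumed definition: sin64( angle ) = cos64( angle - 32 )."""
    return cos64(angle - 32)


def round2(x, n):
    """Assumed definition of Round2 (Python >> is an arithmetic shift)."""
    return (x + (1 << (n - 1))) >> n


def butterfly(T, a, b, angle):
    """General butterfly rotation B( a, b, angle, 0 ): four multiplies."""
    x = T[a] * cos64(angle) - T[b] * sin64(angle)
    y = T[a] * sin64(angle) + T[b] * cos64(angle)
    T[a] = round2(x, 14)
    T[b] = round2(y, 14)


def butterfly_fast(T, a, b, angle):
    """Two-multiply form, valid only when angle is 16 + 32*k."""
    v = T[a] + T[b] if angle & 32 else T[a] - T[b]
    w = -T[a] + T[b] if angle & 32 else T[a] + T[b]
    T[a] = round2(v * cos64(angle), 14)
    T[b] = round2(w * cos64(angle), 14)


# Both forms should agree for every angle of the form 16 + 32*k.
for angle in (16, 48, 80, 112):
    for pair in ([100, -37], [-5, 9], [0, 1], [-2048, 2047]):
        t1, t2 = list(pair), list(pair)
        butterfly(t1, 0, 1, angle)
        butterfly_fast(t2, 0, 1, angle)
        assert t1 == t2
```

The two forms agree exactly because the products are formed before rounding in both cases, so the value passed to Round2 is identical.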
The function B( a, b, angle, 1 ) performs a butterfly rotation and flip specified by the following ordered steps:
1. The function B( a, b, angle, 0 ) is invoked.
2. The contents of T[ a ] and T[ b ] are exchanged.
The function H( a, b, 0 ) performs a Hadamard rotation specified by the following ordered steps:
1. The variable x is set equal to T[ a ].
2. The variable y is set equal to T[ b ].
3. T[ a ] is set equal to x + y.
4. T[ b ] is set equal to x - y.
It is a requirement of bitstream conformance that the values saved into the array T by this function are
representable by a signed integer using 8 + BitDepth bits of precision.
The function H( a, b, 1 ) performs a Hadamard rotation with flipped indices and is specified as follows:
1. The function H( b, a, 0 ) is invoked.
The function SB( a, b, angle, 0 ) performs a butterfly rotation according to the following ordered steps:
1. S[ a ] is set equal to T[ a ] * cos64( angle ) - T[ b ] * sin64( angle ).
2. S[ b ] is set equal to T[ a ] * sin64( angle ) + T[ b ] * cos64( angle ).
The function SB( a, b, angle, 1 ) performs a butterfly rotation and flip according to the following ordered steps:
1. The function SB( a, b, angle, 0 ) is invoked.
2. The contents of S[ a ] and S[ b ] are exchanged.
The function SH( a, b ) performs a Hadamard rotation and rounding specified by the following ordered steps:
1. T[ a ] is set equal to Round2( S[ a ] + S[ b ], 14 ).
2. T[ b ] is set equal to Round2( S[ a ] - S[ b ], 14 ).
s1 = SINPI_2_9 * T[ 0 ]
s2 = SINPI_3_9 * T[ 1 ]
s3 = SINPI_4_9 * T[ 2 ]
s4 = SINPI_1_9 * T[ 2 ]
s5 = SINPI_2_9 * T[ 3 ]
s6 = SINPI_4_9 * T[ 3 ]
v = T[ 0 ] - T[ 2 ] + T[ 3 ]
s7 = SINPI_3_9 * v
x0 = s0 + s3 + s5
x1 = s1 - s4 - s6
x2 = s7
x3 = s2
s0 = x0 + x3
s1 = x1 + x3
s2 = x2
s3 = x0 + x1 - x3
T[ 0 ] = Round2( s0, 14 )
T[ 1 ] = Round2( s1, 14 )
T[ 2 ] = Round2( s2, 14 )
T[ 3 ] = Round2( s3, 14 )
It is a requirement of bitstream conformance that the values saved into the variable v and into the array T by
this function are representable by a signed integer using 8 + BitDepth bits of precision.
The constants used in this function are defined as:
SINPI_1_9 5283
SINPI_2_9 9929
SINPI_3_9 13377
SINPI_4_9 15212
8. Invoke the ADST output array permutation process specified in section 8.7.1.5 with the input variable n
set equal to 3.
9. Set T[ 1+2*i ] equal to -T[ 1+2*i ] for i = 0..3.
8.7.1.8 Inverse ADST16 process
This process does an in-place transform of the array T using a higher precision array S for intermediate
results. The following ordered steps apply:
1. Invoke the ADST input array permutation process specified in section 8.7.1.4 with the input variable n set
equal to 4.
2. Invoke SB( 2*i, 1+2*i, 31-4*i, 1 ) for i = 0..7.
3. Invoke SH( i, 8+i ) for i = 0..7.
4. Invoke SB( 8+2*i, 9+2*i, 28-16*i, 1 ) for i = 0..3.
5. Invoke SH( 8+i, 12+i ) for i = 0..3.
6. Invoke H( i, 4+i, 0 ) for i = 0..3.
7. Invoke SB( 4+8*i+3*j, 5+8*i+j, 24-16*j, 1 ) for i = 0..1, for j = 0..1.
8. Invoke SH( 4+8*j+i, 6+8*j+i ) for i = 0..1, j = 0..1.
9. Invoke H( 8*j+i, 2+8*j+i, 0 ) for i = 0..1, for j = 0..1.
10. Invoke B( 2+4*j+8*i, 3+4*j+8*i, 48+64*(i^j), 0 ) for i = 0..1, for j = 0..1.
11. Invoke the ADST output array permutation process specified in section 8.7.1.5 with the input variable n
set equal to 4.
12. Set T[ 1+12*j+2*i ] equal to -T[ 1+12*j+2*i ] for i = 0..1, for j = 0..1.
8.7.1.9 Inverse ADST process
This process performs an in-place inverse ADST process on the array T of size 1 << n (that is, 2 to the power n) for 2 ≤ n ≤ 4.
The input to this process is a variable n that specifies the base 2 logarithm of the length of the input array.
The process to invoke depends on n as follows:
− If n is equal to 2, invoke the Inverse ADST4 process specified in section 8.7.1.6.
− Otherwise if n is equal to 3, invoke the Inverse ADST8 process specified in section 8.7.1.7.
− Otherwise (n is equal to 4), invoke the Inverse ADST16 process specified in section 8.7.1.8.
8.7.1.10 Inverse Walsh-Hadamard transform process
The input to this process is a variable shift that specifies the amount of pre-scaling.
This process does an in-place transform of the array T (of length 4) by the following ordered steps:
a = T[ 0 ] >> shift
c = T[ 1 ] >> shift
d = T[ 2 ] >> shift
b = T[ 3 ] >> shift
a += c
d -= b
e = (a - d) >> 1
b = e - b
c = e - c
a -= b
d += c
T[ 0 ] = a
T[ 1 ] = b
T[ 2 ] = c
T[ 3 ] = d
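The ordered steps above translate almost line for line into Python. The sketch below assumes that >> on a negative value is an arithmetic (sign-preserving) shift, which is how Python integers behave; the function name is hypothetical.

```python
def inverse_wht4(T, shift):
    """In-place 4-point inverse Walsh-Hadamard transform of the list T."""
    a = T[0] >> shift
    c = T[1] >> shift
    d = T[2] >> shift
    b = T[3] >> shift
    a += c
    d -= b
    e = (a - d) >> 1
    b = e - b
    c = e - c
    a -= b
    d += c
    T[0], T[1], T[2], T[3] = a, b, c, d
```

As a usage example, a block containing only a DC coefficient of 16 with shift equal to 2 becomes the flat block 2, 2, 2, 2.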
The outputs of this process are modified values in the array CurrFrame.
The variables subX and subY describing the subsampling of the current plane are derived as follows:
− If plane is equal to 0, subX and subY are set equal to 0.
− Otherwise (plane is not equal to 0), subX is set equal to subsampling_x and subY is set equal to
subsampling_y.
The variables dx, dy, sub, and edgeLen are derived as follows:
− If pass is equal to 0, then dx is set equal to 1, dy is set equal to 0, sub is set equal to subX, edgeLen is set
equal to 64 >> subY.
− Otherwise (pass is equal to 1), dy is set equal to 1, dx is set equal to 0, sub is set equal to subY, edgeLen
is set equal to 64 >> subX.
dx and dy specify the offset between the samples to be filtered.
sub is the subsampling factor in the direction of the filter (i.e. perpendicular to the boundary being filtered).
edgeLen is the length of the boundary in samples (64 for luma, but fewer for chroma when subsampling is
being used).
For the variable edge taking values between 0 and (16 >> sub) - 1 (in increasing order) and i taking values
from 0 to edgeLen - 1, the following ordered steps apply:
1. The variables x and y (containing the location in luma coordinates) are derived as follows:
− If pass is equal to 0, x is set equal to col * 8 + edge * (4 << subX), and y is set equal to row * 8 + (i <<
subY).
− Otherwise (pass is equal to 1), x is set equal to col * 8 + (i << subX), and y is set equal to row * 8 +
edge * (4 << subY).
2. Set the variable loopCol equal to ( (x >> 3) >> subX ) << subX.
3. Set the variable loopRow equal to ( (y >> 3) >> subY ) << subY. (loopRow and loopCol specify the luma
location in units of 8x8 blocks.)
4. Set the variable MiSize equal to MiSizes[ loopRow ][ loopCol ].
5. Set the variable tx_size equal to TxSizes[ loopRow ][ loopCol ].
6. Set the variable txSz equal to (plane > 0) ? get_uv_tx_size( ) : tx_size.
7. The variable sbSize is derived as follows:
− If sub is equal to 0, set sbSize equal to MiSize.
− Otherwise (sub is equal to 1), set sbSize equal to Max(BLOCK_16X16, MiSize).
8. Set the variable skip equal to Skips[ loopRow ][ loopCol ].
9. Set the variable isIntra equal to RefFrames[ loopRow ][ loopCol ][ 0 ] <= INTRA_FRAME.
10. The variable isBlockEdge (equal to 1 if the samples cross a prediction block edge) is derived as follows:
− If pass is equal to 0 and x is an exact multiple of 8*num_8x8_blocks_wide_lookup[ sbSize ],
isBlockEdge is set equal to 1.
− Otherwise, if pass is equal to 1 and y is an exact multiple of 8*num_8x8_blocks_high_lookup[ sbSize ],
isBlockEdge is set equal to 1.
− Otherwise, isBlockEdge is set equal to 0.
11. The variable isTxEdge (equal to 1 if the samples cross a transform block edge) is derived as follows:
− If pass is equal to 1 and subX is equal to 1 and MiCols is odd and edge is odd and (x + 8) >= MiCols *
8, isTxEdge is set equal to 0. (This corresponds to a case when the horizontal boundary of a chroma
block crosses the right hand edge of the image.)
− Otherwise, if edge is an exact multiple of 1 << txSz, isTxEdge is set equal to 1.
− Otherwise, isTxEdge is set equal to 0.
12. The variable is32Edge (equal to 1 if the samples cross a 32 sample boundary) is derived as follows:
− If edge is an exact multiple of 8, is32Edge is set equal to 1.
− Otherwise, is32Edge is set equal to 0.
13. The variable onScreen (equal to 1 if the samples on both sides of the boundary lie in the visible area) is
derived as follows:
− If x is greater than or equal to 8 * MiCols, onScreen is set equal to 0.
− Otherwise, if y is greater than or equal to 8 * MiRows, onScreen is set equal to 0.
− Otherwise, if pass is equal to 0 and x is equal to 0, onScreen is set equal to 0.
− Otherwise, if pass is equal to 1 and y is equal to 0, onScreen is set equal to 0.
− Otherwise, onScreen is set equal to 1.
14. The variable applyFilter (equal to 1 if the samples should be filtered) is derived as follows:
− If onScreen is equal to 0, applyFilter is set equal to 0.
− Otherwise, if isBlockEdge is equal to 1, applyFilter is set equal to 1.
− Otherwise, if isTxEdge is equal to 1 and isIntra is equal to 1, applyFilter is set equal to 1.
− Otherwise, if isTxEdge is equal to 1 and skip is equal to 0, applyFilter is set equal to 1.
− Otherwise applyFilter is set equal to 0.
15. The filter size process specified in 8.8.3 is invoked with the inputs txSz, is32Edge, pass, x, y, subX, and
subY and the output assigned to the variable filterSize (containing the maximum filter size that can be
used).
16. The adaptive filter strength process specified in 8.8.4 is invoked with the inputs loopRow and loopCol, and
the output assigned to the variables lvl, limit, blimit, and thresh.
17. If applyFilter is equal to 1 and lvl is greater than zero, the sample filtering process specified in 8.8.5 is
invoked with the input variable x set equal to x >> subX, the input variable y set equal to y >> subY, and
the variables plane, limit, blimit, thresh, dx, dy, and filterSize.
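The decisions in steps 13 and 14 can be summarized in a small Python sketch. The function names and the explicit parameter lists are hypothetical; in the specification these values are variables of the enclosing process.

```python
def on_screen(x, y, pass_, mi_cols, mi_rows):
    """Step 13: are the samples on both sides of the boundary visible?"""
    if x >= 8 * mi_cols:
        return False
    if y >= 8 * mi_rows:
        return False
    if pass_ == 0 and x == 0:
        return False  # left frame edge: nothing to the left to filter against
    if pass_ == 1 and y == 0:
        return False  # top frame edge: nothing above to filter against
    return True


def apply_filter(is_on_screen, is_block_edge, is_tx_edge, is_intra, skip):
    """Step 14: should this boundary be filtered at all?"""
    if not is_on_screen:
        return False
    if is_block_edge:
        return True
    if is_tx_edge and is_intra:
        return True
    if is_tx_edge and not skip:
        return True
    return False
```

One consequence worth noting is that a transform edge inside a skipped inter block is left unfiltered, since no residual was added there, whereas a prediction block edge is always a candidate for filtering.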
8.8.3 Filter size process
The inputs to this process are:
− a variable txSz specifying the size of the transform block,
− a variable is32Edge that is equal to 1 if the boundary is at a multiple of 32 samples,
− a variable pass specifying the direction of the edges,
− variables x and y specifying the location in luma coordinates,
− variables subX and subY describing the subsampling of the current plane.
The output of this process is the variable filterSize containing the maximum filter size that can be used.
The purpose of this process is to reduce the width of the chroma filters if the filter would cross the frame
boundary. In addition, the filter size is clipped to a minimum size of TX_8X8 for boundaries on a multiple of 32
samples.
First the filter mask process specified in section 8.8.5.1 is invoked with the inputs x, y, plane, limit, blimit,
thresh, dx, dy, and filterSize, and the output is assigned to the variables hevMask, filterMask, flatMask, and
flatMask2.
Then the appropriate filter process is invoked with the inputs x, y, plane, dx, dy as follows:
− If filterMask is equal to 0, no filter is invoked.
− Otherwise, if filterSize is equal to TX_4X4 or flatMask is equal to 0, the narrow filter process specified in
section 8.8.5.2 is invoked with the additional input variable hevMask.
− Otherwise, if filterSize is equal to TX_8X8 or flatMask2 is equal to 0, the wide filter process specified in
section 8.8.5.3 is invoked with the additional input variable log2Size set to 3.
− Otherwise, the wide filter process specified in section 8.8.5.3 is invoked with the additional input variable
log2Size set to 4.
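The selection between the narrow and wide filters can be sketched as follows. The numeric values given to TX_4X4, TX_8X8 and TX_16X16 are assumptions made for the illustration, and the return value (a filter name paired with log2Size) is a hypothetical convention, not part of the specification.

```python
# Assumed enumeration values for the transform sizes.
TX_4X4, TX_8X8, TX_16X16 = 0, 1, 2


def select_filter(filter_mask, flat_mask, flat_mask2, filter_size):
    """Return None, ("narrow", None), or ("wide", log2Size)."""
    if filter_mask == 0:
        return None  # samples differ too much: treat as a real edge
    if filter_size == TX_4X4 or flat_mask == 0:
        return ("narrow", None)
    if filter_size == TX_8X8 or flat_mask2 == 0:
        return ("wide", 3)
    return ("wide", 4)
```

The widest filter is therefore reached only when filterSize permits it and both flatness masks are set.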
8.8.5.1 Filter mask process
The inputs to this process are:
− variables x and y specifying the location within CurrFrame[ plane ],
− a variable plane specifying whether the block is the Y, U or V plane,
− variables limit, blimit, thresh that specify the strength of the filtering operation,
− variables dx and dy specifying the direction perpendicular to the edge being filtered,
− a variable filterSize specifying the maximum size of filter allowed.
The values output for these masks depend on the differences between samples on either side of the specified
boundary. These samples are specified as follows:
q0 = CurrFrame[ plane ][ y ][ x ]
NOTE – Samples q4, q5, q6, q7, p4, p5, p6 and p7 are only used if filterSize is equal to TX_16X16.
The value of hevMask indicates whether the sample has high edge variance. It is calculated as follows:
hevMask = 0
threshBd = thresh << (BitDepth - 8)
hevMask |= (Abs( p1 - p0 ) > threshBd)
hevMask |= (Abs( q1 - q0 ) > threshBd)
The value of filterMask indicates whether adjacent samples close to the edge (within four samples either side
of the specified boundary) vary by less than the limits given by limit and blimit. It is used to determine if any
filtering should occur and is calculated as follows:
limitBd = limit << (BitDepth - 8)
blimitBd = blimit << (BitDepth - 8)
mask = 0
mask |= (Abs( p3 - p2 ) > limitBd)
mask |= (Abs( p2 - p1 ) > limitBd)
mask |= (Abs( p1 - p0 ) > limitBd)
mask |= (Abs( q1 - q0 ) > limitBd)
mask |= (Abs( q2 - q1 ) > limitBd)
mask |= (Abs( q3 - q2 ) > limitBd)
mask |= (Abs( p0 - q0 ) * 2 + Abs( p1 - q1 ) / 2 > blimitBd)
filterMask = (mask == 0)
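The hevMask and filterMask calculations can be sketched in Python as below. As an assumption for the illustration, the samples are passed in as two lists, with p[ i ] and q[ i ] holding the samples named p0..p3 and q0..q3 on either side of the boundary; the function names are hypothetical.

```python
def hev_mask(p, q, thresh, bit_depth):
    """1 if the samples next to the edge show high edge variance."""
    thresh_bd = thresh << (bit_depth - 8)
    return int(abs(p[1] - p[0]) > thresh_bd or abs(q[1] - q[0]) > thresh_bd)


def filter_mask(p, q, limit, blimit, bit_depth):
    """1 if all neighbouring differences stay within limit and blimit."""
    limit_bd = limit << (bit_depth - 8)
    blimit_bd = blimit << (bit_depth - 8)
    exceeded = (
        abs(p[3] - p[2]) > limit_bd
        or abs(p[2] - p[1]) > limit_bd
        or abs(p[1] - p[0]) > limit_bd
        or abs(q[1] - q[0]) > limit_bd
        or abs(q[2] - q[1]) > limit_bd
        or abs(q[3] - q[2]) > limit_bd
        # Integer division; the operand is non-negative so // matches "/".
        or abs(p[0] - q[0]) * 2 + abs(p[1] - q[1]) // 2 > blimit_bd
    )
    return int(not exceeded)
```

A large step across the boundary makes filterMask 0, so a genuine image edge is left alone, while a small step between two otherwise smooth sides leaves filterMask equal to 1.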
The value of flatMask is only required when filterSize >= TX_8X8. It measures whether at least four samples
from each side of the specified boundary are in a flat region. That is whether those samples are at most (1 <<
(BitDepth - 8)) different from the sample on the boundary. It is calculated as follows:
thresholdBd = 1 << (BitDepth - 8)
if ( filterSize >= TX_8X8 ) {
  mask = 0
  mask |= (Abs( p1 - p0 ) > thresholdBd)
The value of flatMask2 is only required when filterSize >= TX_16X16. It measures whether at least eight
samples from each side of the specified boundary are in a flat region assuming the first four on each side are
(so the full region is flat only if flatMask and flatMask2 are both equal to 1). The value of flatMask2 is calculated as follows:
thresholdBd = 1 << (BitDepth - 8)
if ( filterSize >= TX_16X16 ) {
  mask = 0
  mask |= (Abs( p7 - p0 ) > thresholdBd)
  mask |= (Abs( q7 - q0 ) > thresholdBd)
  mask |= (Abs( p6 - p0 ) > thresholdBd)
  mask |= (Abs( q6 - q0 ) > thresholdBd)
  mask |= (Abs( p5 - p0 ) > thresholdBd)
  mask |= (Abs( q5 - q0 ) > thresholdBd)
  mask |= (Abs( p4 - p0 ) > thresholdBd)
  mask |= (Abs( q4 - q0 ) > thresholdBd)
  flatMask2 = (mask == 0)
}
This process modifies up to two samples on each side of the specified boundary depending on the value of
hevMask as follows:
− If hevMask is equal to 0 (i.e. the samples do not have high edge variance), this process modifies two
samples on each side of the specified boundary, using a filter constructed from just the inner two (one
from each side of the specified boundary).
− Otherwise (the samples do have high edge variance), this process only modifies the one value on each
side of the specified boundary, using a filter constructed from four input samples (two from each side of
the specified boundary).
The process subtracts 0x80 << (BitDepth - 8) from the input sample values so that they are in the range -(1 <<
(BitDepth - 1)) to (1 << (BitDepth - 1)) - 1 inclusive. Intermediate values are made to be in this range by the
following function:
filter4_clamp( value ) {
  return Clip3( -(1 << (BitDepth - 1)), (1 << (BitDepth - 1)) - 1, value )
}
This filter is only applied when samples from each side of the boundary are detected to be in a flat region.
The variable n (specifying the number of filter taps on each side of the central sample) is set equal to
(1 << (log2Size - 1)) - 1.
This process modifies the samples on each side of the specified boundary by applying a low pass filter as
follows:
for ( i = -n; i < n; i++ ) {
  t = CurrFrame[ plane ][ y+i*dy ][ x+i*dx ]
where F is an array with indices from -n to n-1 used to store the filtered results.
8.9 Output process
This process is invoked to optionally display a frame.
If show_existing_frame is equal to 1, then the decoder should output a previously decoded frame as follows:
− The variable w is set equal to RefFrameWidth[ frame_to_show_map_idx ].
− The variable h is set equal to RefFrameHeight[ frame_to_show_map_idx ].
− The variable subX is set equal to RefSubsamplingX[ frame_to_show_map_idx ].
− The variable subY is set equal to RefSubsamplingY[ frame_to_show_map_idx ].
− The Y plane is w samples across by h samples down and the sample at location x samples across and y
samples down is given by FrameStore[ frame_to_show_map_idx ][ 0 ][ y ][ x ] with x = 0..w - 1 and y = 0..h
- 1.
− The U plane is (w + subX) >> subX samples across by (h + subY) >> subY samples down and the sample
at location x samples across and y samples down is given by
FrameStore[ frame_to_show_map_idx ][ 1 ][ y ][ x ] with x = 0..((w + subX) >> subX) - 1 and y = 0..((h +
subY) >> subY) - 1.
− The V plane is (w + subX) >> subX samples across by (h + subY) >> subY samples down and the sample
at location x samples across and y samples down is given by
FrameStore[ frame_to_show_map_idx ][ 2 ][ y ][ x ] with x = 0..((w + subX) >> subX) - 1 and y = 0..((h +
subY) >> subY) - 1.
− The bit depth for each sample is RefBitDepth[ frame_to_show_map_idx ].
Otherwise, if show_frame is equal to 1, then the decoder should output the current frame as follows:
− The variable w is set equal to FrameWidth.
− The variable h is set equal to FrameHeight.
− The variable subX is set equal to subsampling_x.
− The variable subY is set equal to subsampling_y.
− The Y plane is w samples across by h samples down and the sample at location x samples across and y
samples down is given by CurrFrame[ 0 ][ y ][ x ] with x = 0..w - 1 and y = 0..h - 1.
− The U plane is (w + subX) >> subX samples across by (h + subY) >> subY samples down and the sample
at location x samples across and y samples down is given by CurrFrame[ 1 ][ y ][ x ] with x = 0..((w +
subX) >> subX) - 1 and y = 0..((h + subY) >> subY) - 1.
− The V plane is (w + subX) >> subX samples across by (h + subY) >> subY samples down and the sample
at location x samples across and y samples down is given by CurrFrame[ 2 ][ y ][ x ] with x = 0..((w +
subX) >> subX) - 1 and y = 0..((h + subY) >> subY) - 1.
− The bit depth for each sample is BitDepth.
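The plane dimensions used in both cases above follow the same rounding-up rule, which can be illustrated with a short Python sketch. The function name and its return convention (a list of width and height pairs for the Y, U and V planes) are hypothetical.

```python
def plane_dimensions(w, h, sub_x, sub_y):
    """Return [(width, height)] for the Y, U and V planes in that order."""
    # Adding subX (or subY) before shifting rounds odd sizes upwards.
    chroma_w = (w + sub_x) >> sub_x
    chroma_h = (h + sub_y) >> sub_y
    return [(w, h), (chroma_w, chroma_h), (chroma_w, chroma_h)]
```

For a 1921 by 1081 frame with subX and subY both equal to 1, the chroma planes are 961 by 541 samples, so the odd final luma column and row still have chroma samples.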
9 Parsing Process
9.1 Parsing process for f(n)
This process is invoked when the descriptor of a syntax element in the syntax tables is equal to f(n).
The next n bits are read from the bit stream.
This process is specified as follows:
x = 0
for ( i = 0; i < n; i++ ) {
  x = 2 * x + read_bit( )
}
read_bit( ) reads the next bit from the bitstream and advances the bitstream pointer by 1. If the bitstream is
provided as a series of bytes, then the first bit is given by the most significant bit of the first byte.
The value for the syntax element is given by x.
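A minimal Python sketch of this process is given below, assuming the bitstream is supplied as a bytes object. The BitReader class and its attribute names are hypothetical and exist only for the illustration.

```python
class BitReader:
    """Reads bits MSB-first from a bytes-like object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # bitstream pointer, counted in bits

    def read_bit(self):
        byte = self.data[self.pos >> 3]
        # The first bit of each byte is its most significant bit.
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def f(self, n):
        """Parsing process for f(n): an n-bit unsigned value, MSB first."""
        x = 0
        for _ in range(n):
            x = 2 * x + self.read_bit()
        return x
```

For example, reading f(3) from the single byte 0xA0 (binary 10100000) yields 5 and leaves the bitstream pointer at bit position 3.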
9.2 Parsing process for Boolean decoder
Aside from the uncompressed header and the partition sizes, the entire bitstream is entropy coded. The
entropy decoder is referred to as the “Boolean decoder” and the functions init_bool( sz ), exit_bool( ), and
read_bool( p ) are used in this Specification to indicate the entropy decoding operation.
9.2.1 Initialization process for Boolean decoder
The input to this process is a variable sz specifying the number of bytes to be read by the Boolean decoder.
This process is invoked when the function init_bool( sz ) is called from the syntax structure.
The bitstream shall not contain data that results in this process being called with sz < 1.
NOTE – The bit position will always be byte aligned when init_bool is invoked because the uncompressed
header and the data partitions are always a whole number of bytes long.
− BoolValue -= split
− bool = 1
While BoolRange is less than 128 the following applies:
− A variable newBit is derived as:
− If BoolMaxBits > 0, the following ordered steps apply:
1. newBit is read using the f(1) parsing process.
2. BoolMaxBits -= 1
− Otherwise, newBit is set equal to 0. It is a requirement of bitstream conformance that this never
happens.
− BoolRange is doubled.
− BoolValue is modified as follows:
− BoolValue = (BoolValue << 1) + newBit
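The renormalization loop described above can be sketched in Python. Only the loop is shown; the dictionary used to hold BoolValue, BoolRange and BoolMaxBits, and the read_bit callback standing in for the f(1) parsing process, are hypothetical conveniences for the illustration.

```python
def renormalize(state, read_bit):
    """Shift in new bits until BoolRange is at least 128.

    state: dict with keys "BoolValue", "BoolRange", "BoolMaxBits".
    read_bit: callable returning the next bit (0 or 1) of the bitstream.
    """
    while state["BoolRange"] < 128:
        if state["BoolMaxBits"] > 0:
            new_bit = read_bit()
            state["BoolMaxBits"] -= 1
        else:
            # Conformant bitstreams never reach this branch.
            new_bit = 0
        state["BoolRange"] *= 2
        state["BoolValue"] = (state["BoolValue"] << 1) + new_bit
```

Each iteration doubles BoolRange and consumes at most one bit, so a range of 40 needs two iterations before it reaches 160.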
9.2.3 Exit process for Boolean decoder
This process is invoked when the function exit_bool( ) is called from the syntax structure.
The padding syntax element is read using the f(BoolMaxBits) parsing process.
It is a requirement of bitstream conformance that padding is equal to 0.
It is a requirement of bitstream conformance that enough padding bits are inserted to ensure that the final
coded byte of a frame is not equal to a superframe marker. A byte b is equal to a superframe marker if and
only if (b & 0xe0) is equal to 0xc0, i.e. if the most significant 3 bits are equal to 0b110.
NOTE – The padding is required to make the bit position byte aligned, and is also used to make sure that
frames do not end with a superframe marker. It is legal for there to be 0 bits of padding if the bitstream is
already byte aligned. It is also legal for there to be greater than 8 bits of padding.
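The superframe marker test is a simple mask and compare, shown here as a Python sketch with a hypothetical function name.

```python
def is_superframe_marker(b):
    """True if the most significant 3 bits of byte b are 0b110."""
    return (b & 0xE0) == 0xC0
```

Every byte from 0xC0 to 0xDF inclusive matches, so an encoder that would otherwise end a frame on such a byte has to append further zero padding.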
9.2.4 Parsing process for read_literal
This process is invoked when the function read_literal( n ) is invoked.
This process is specified as follows:
x = 0
for ( i = 0; i < n; i++ ) {
  x = 2 * x + read_bool( 128 )
}
The return value for the function is given by x.
9.3 Parsing process for tree encoded syntax elements
This process is invoked when the descriptor of a syntax element in the syntax tables is equal to T.
The input to this process is the name of a syntax element.
The decoding of a syntax element depends on a tree and a list of probabilities.
Section 9.3.1 specifies how a tree (or value) is chosen for each syntax element.
Section 9.3.2 specifies how the probabilities are chosen for each syntax element.
Section 9.3.3 specifies how the value of the syntax element is decoded based on the chosen tree and
probabilities.
Section 9.3.4 specifies how the counts should be updated based on the value of the syntax element.
When the description in these sections uses variables, these variables are taken to have the values defined by
the syntax tables at the point that the syntax element is being decoded.
When this process is invoked the following ordered steps apply:
1. The tree selection process as specified in section 9.3.1 is invoked with the name of the syntax element.
The next action depends on the return value:
a. If the return value is an integer, then the syntax element is set equal to the integer.
b. If the return value is a tree, then the syntax element is set equal to the return value of the tree
decoding process as specified in section 9.3.3.
2. The syntax element counting process as specified in section 9.3.4 is invoked with the name and value of
the syntax element.
9.3.1 Tree selection process
The input to this process is the name of a syntax element.
The output from this process is either a tree (represented by an array of integers) or a single integer giving the
value of the syntax element (for the cases when the syntax element can be determined without reading any
bits).
The tree is chosen based on the syntax element as follows:
partition: the tree depends on the value of hasRows and hasCols:
− If hasRows is equal to 1 and hasCols is equal to 1, the tree is partition_tree.
− Otherwise, if hasCols is equal to 1, the tree is cols_partition_tree.
− Otherwise, if hasRows is equal to 1, the tree is rows_partition_tree.
− Otherwise, the return value is PARTITION_SPLIT.
partition_tree[ 6 ] = {
-PARTITION_NONE, 2,
-PARTITION_HORZ, 4,
-PARTITION_VERT, -PARTITION_SPLIT
}
cols_partition_tree[ 2 ] = {
-PARTITION_HORZ, -PARTITION_SPLIT
}
rows_partition_tree[ 2 ] = {
-PARTITION_VERT, -PARTITION_SPLIT
}
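To illustrate how such an array encodes a tree, the Python sketch below walks partition_tree. It rests on two assumptions that are not stated in the text above: that the traversal rule is the one specified in section 9.3.3 (a positive entry is the index of the next pair of entries, and an entry less than or equal to zero is a negated leaf value), and that the partition constants have the values 0 to 3 in the order shown. The read_bool_at_node callback is hypothetical and stands in for reading a Boolean with the probability chosen for that node.

```python
# Assumed enumeration values.
PARTITION_NONE, PARTITION_HORZ, PARTITION_VERT, PARTITION_SPLIT = 0, 1, 2, 3

partition_tree = [
    -PARTITION_NONE, 2,
    -PARTITION_HORZ, 4,
    -PARTITION_VERT, -PARTITION_SPLIT,
]


def decode_tree(tree, read_bool_at_node):
    """Walk the tree from index 0 until a leaf (entry <= 0) is reached.

    read_bool_at_node(node) returns 0 or 1 for internal node number node.
    """
    n = 0
    while True:
        # Each internal node owns two adjacent entries: [n] for 0, [n+1] for 1.
        n = tree[n + read_bool_at_node(n >> 1)]
        if n <= 0:
            return -n
```

With this rule the bit sequence 1, 1, 0 visits entries 1, 3 and 4 and decodes to PARTITION_VERT, while a single 0 bit decodes to PARTITION_NONE.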
default_intra_mode and default_uv_mode and intra_mode and sub_intra_mode and uv_mode: the tree
is intra_mode_tree:
intra_mode_tree[ 18 ] = {
-DC_PRED, 2,
-TM_PRED, 4,
-V_PRED, 6,
8, 12,
-H_PRED, 10,
-D135_PRED, -D117_PRED,
-D45_PRED, 14,
-D63_PRED, 16,
-D153_PRED, -D207_PRED
}
skip and seg_id_predicted and is_inter and comp_mode and comp_ref and single_ref_p1 and
single_ref_p2 and mv_sign and mv_bit and mv_class0_bit and more_coefs: the tree is binary_tree:
binary_tree[ 2 ] = {
0, -1
}
NOTE – These syntax elements can also be considered as being read directly from the stream using the
read_bool process. They are described using the tree structure because the process for computing the
probability is more complicated than can be easily expressed in the syntax tables.
tx_size: the tree depends on the value of maxTxSize:
− If maxTxSize is equal to TX_32X32 the tree is tx_size_32_tree.
− Otherwise, if maxTxSize is equal to TX_16X16 the tree is tx_size_16_tree.
− Otherwise, the tree is tx_size_8_tree.
tx_size_32_tree[ 6 ] = {
-TX_4X4, 2,
-TX_8X8, 4,
-TX_16X16, -TX_32X32
}
tx_size_16_tree[ 4 ] = {
-TX_4X4, 2,
-TX_8X8, -TX_16X16
}
tx_size_8_tree[ 2 ] = {
-TX_4X4, -TX_8X8
}
-TWO_TOKEN, 8,
-THREE_TOKEN, -FOUR_TOKEN,
12, 14,
-DCT_VAL_CATEGORY1, -DCT_VAL_CATEGORY2,
16, 18,
-DCT_VAL_CATEGORY3, -DCT_VAL_CATEGORY4,
-DCT_VAL_CATEGORY5, -DCT_VAL_CATEGORY6
}
if ( idy )
  abovemode = sub_modes[ idx ]
else
  abovemode = AvailU ? SubModes[ MiRow - 1 ][ MiCol ][ 2 + idx ] : DC_PRED
if ( idx )
  leftmode = sub_modes[ idy * 2 ]
else
  leftmode = AvailL ? SubModes[ MiRow ][ MiCol - 1 ][ 1 + idy * 2 ] : DC_PRED
}
}
NOTE – We are using a 2D array to store the SubModes for clarity. It is possible to reduce memory
consumption by only storing one intra mode for each 8x8 horizontal and vertical position, i.e. to use two 1D
arrays instead.
default_uv_mode: the probability is given by kf_uv_mode_probs[ y_mode ][ node ].
intra_mode: the probability is given by y_mode_probs[ ctx ][ node ] where ctx is computed by:
ctx = size_group_lookup[ MiSize ]
sub_intra_mode: the probability is given by y_mode_probs[ ctx ][ node ] where ctx is set equal to 0.
uv_mode: the probability is given by uv_mode_probs[ ctx ][ node ] where ctx is set equal to y_mode.
segment_id: the probability is given by segmentation_tree_probs[ node ].
skip: the probability is given by skip_prob[ ctx ] where ctx is computed by:
ctx = 0
if ( AvailU )
  ctx += Skips[ MiRow - 1 ][ MiCol ]
if ( AvailL )
  ctx += Skips[ MiRow ][ MiCol - 1 ]
seg_id_predicted: the probability is given by segmentation_pred_prob[ ctx ] where ctx is computed by:
ctx = LeftSegPredContext[ MiRow ] + AboveSegPredContext[ MiCol ]
is_inter: the probability is given by is_inter_prob[ ctx ] where ctx is computed by:
if ( AvailU && AvailL )
  ctx = (LeftIntra && AboveIntra) ? 3 : LeftIntra || AboveIntra
else if ( AvailU || AvailL )
  ctx = 2 * (AvailU ? AboveIntra : LeftIntra)
else
  ctx = 0
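The is_inter context derivation can be sketched in Python as below. The function name and explicit parameters are hypothetical; the arguments correspond to AvailU, AvailL, AboveIntra and LeftIntra.

```python
def is_inter_ctx(avail_u, avail_l, above_intra, left_intra):
    """Context index (0..3) for the is_inter syntax element."""
    if avail_u and avail_l:
        if left_intra and above_intra:
            return 3
        return int(left_intra or above_intra)
    if avail_u or avail_l:
        # Only one neighbour exists: an intra neighbour counts double.
        return 2 * int(above_intra if avail_u else left_intra)
    return 0
```

The context therefore grows with the number of intra neighbours: 0 when none are intra, 1 when exactly one of two available neighbours is intra, 2 when the only available neighbour is intra, and 3 when both are intra.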
comp_mode: the probability is given by comp_mode_prob[ ctx ] where ctx is computed by:
if ( AvailU && AvailL ) {
  if ( AboveSingle && LeftSingle )
    ctx = (AboveRefFrame[ 0 ] == CompFixedRef)
          ^ (LeftRefFrame[ 0 ] == CompFixedRef)
  else if ( AboveSingle )
    ctx = 2 + (AboveRefFrame[ 0 ] == CompFixedRef || AboveIntra)
  else if ( LeftSingle )
    ctx = 2 + (LeftRefFrame[ 0 ] == CompFixedRef || LeftIntra)
  else
    ctx = 4
} else if ( AvailU ) {
  if ( AboveSingle )
    ctx = AboveRefFrame[ 0 ] == CompFixedRef
  else
    ctx = 3
} else if ( AvailL ) {
  if ( LeftSingle )
    ctx = LeftRefFrame[ 0 ] == CompFixedRef
  else
    ctx = 3
} else {
  ctx = 1
}
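The comp_mode context derivation can likewise be written as a Python function. The signature is hypothetical: above_ref0 and left_ref0 stand for AboveRefFrame[ 0 ] and LeftRefFrame[ 0 ], and the remaining parameters mirror the variables of the same names in the pseudocode.

```python
def comp_mode_ctx(avail_u, avail_l, above_single, left_single,
                  above_ref0, left_ref0, above_intra, left_intra,
                  comp_fixed_ref):
    """Context index (0..4) for the comp_mode syntax element."""
    if avail_u and avail_l:
        if above_single and left_single:
            return (int(above_ref0 == comp_fixed_ref)
                    ^ int(left_ref0 == comp_fixed_ref))
        if above_single:
            return 2 + int(above_ref0 == comp_fixed_ref or above_intra)
        if left_single:
            return 2 + int(left_ref0 == comp_fixed_ref or left_intra)
        return 4  # both neighbours use compound prediction
    if avail_u:
        return int(above_ref0 == comp_fixed_ref) if above_single else 3
    if avail_l:
        return int(left_ref0 == comp_fixed_ref) if left_single else 3
    return 1
```

Contexts 0 and 1 arise when no available neighbour uses compound prediction, contexts 2 and 3 when compound prediction is mixed with single prediction (or is used by the only available neighbour), and context 4 when both neighbours use compound prediction.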
comp_ref: the probability is given by comp_ref_prob[ ctx ] where ctx is computed by:
FixRefIdx = ref_frame_sign_bias[ CompFixedRef ]
VarRefIdx = ! FixRefIdx
if ( AvailU && AvailL ) {
  if ( AboveIntra && LeftIntra ) {
    ctx = 2
  } else if ( LeftIntra ) {
    if ( AboveSingle )
      ctx = 1 + 2 * (AboveRefFrame[ 0 ] != CompVarRef[ 1 ])
    else
      ctx = 1 + 2 * (AboveRefFrame[ VarRefIdx ] != CompVarRef[ 1 ])
  } else if ( AboveIntra ) {
    if ( LeftSingle )
      ctx = 1 + 2 * (LeftRefFrame[ 0 ] != CompVarRef[ 1 ])
    else
      ctx = 1 + 2 * (LeftRefFrame[ VarRefIdx ] != CompVarRef[ 1 ])
  } else {
    vrfa = AboveSingle ? AboveRefFrame[ 0 ] : AboveRefFrame[ VarRefIdx ]
    vrfl = LeftSingle ? LeftRefFrame[ 0 ] : LeftRefFrame[ VarRefIdx ]
    if ( vrfa == vrfl && CompVarRef[ 1 ] == vrfa ) {
      ctx = 0
    } else if ( LeftSingle && AboveSingle ) {
      if ( (vrfa == CompFixedRef && vrfl == CompVarRef[ 0 ]) ||
           (vrfl == CompFixedRef && vrfa == CompVarRef[ 0 ]) )
        ctx = 4
      else if ( vrfa == vrfl )
        ctx = 3
      else
        ctx = 1
    } else if ( LeftSingle || AboveSingle ) {
      vrfc = LeftSingle ? vrfa : vrfl
      rfs = AboveSingle ? vrfa : vrfl
      if ( vrfc == CompVarRef[ 1 ] && rfs != CompVarRef[ 1 ] )
        ctx = 1
      else if ( rfs == CompVarRef[ 1 ] && vrfc != CompVarRef[ 1 ] )
        ctx = 2
      else
        ctx = 4
    } else if ( vrfa == vrfl ) {
      ctx = 4
    } else {
      ctx = 2
    }
  }
} else if ( AvailU ) {
  if ( AboveIntra ) {
    ctx = 2
  } else {
    if ( AboveSingle )
      ctx = 3 * (AboveRefFrame[ 0 ] != CompVarRef[ 1 ])
    else
      ctx = 4 * (AboveRefFrame[ VarRefIdx ] != CompVarRef[ 1 ])
  }
} else if ( AvailL ) {
  if ( LeftIntra ) {
    ctx = 2
  } else {
    if ( LeftSingle )
      ctx = 3 * (LeftRefFrame[ 0 ] != CompVarRef[ 1 ])
    else
      ctx = 4 * (LeftRefFrame[ VarRefIdx ] != CompVarRef[ 1 ])
  }
} else {
  ctx = 2
}
single_ref_p1: the probability is given by single_ref_prob[ ctx ][ 0 ] where ctx is computed by:
if ( AvailU && AvailL ) {
if ( AboveIntra && LeftIntra ) {
ctx = 2
} else if ( LeftIntra ) {
if ( AboveSingle )
ctx = 4 * (AboveRefFrame[ 0 ] == LAST_FRAME)
else
ctx = 1 + (AboveRefFrame[ 0 ] == LAST_FRAME || AboveRefFrame[ 1 ] == LAST_FRAME)
} else if ( AboveIntra ) {
if ( LeftSingle )
ctx = 4 * (LeftRefFrame[ 0 ] == LAST_FRAME)
else
ctx = 1 + (LeftRefFrame[ 0 ] == LAST_FRAME || LeftRefFrame[ 1 ] == LAST_FRAME)
} else {
if ( AboveSingle && LeftSingle ) {
ctx = 2 * (AboveRefFrame[ 0 ] == LAST_FRAME) +
2 * (LeftRefFrame[ 0 ] == LAST_FRAME)
} else if ( !AboveSingle && !LeftSingle ) {
ctx = 1 + (AboveRefFrame[ 0 ] == LAST_FRAME ||
AboveRefFrame[ 1 ] == LAST_FRAME ||
LeftRefFrame[ 0 ] == LAST_FRAME ||
LeftRefFrame[ 1 ] == LAST_FRAME)
} else {
rfs = AboveSingle ? AboveRefFrame[ 0 ] : LeftRefFrame[ 0 ]
crf1 = AboveSingle ? LeftRefFrame[ 0 ] : AboveRefFrame[ 0 ]
crf2 = AboveSingle ? LeftRefFrame[ 1 ] : AboveRefFrame[ 1 ]
if ( rfs == LAST_FRAME )
ctx = 3 + (crf1 == LAST_FRAME || crf2 == LAST_FRAME)
else
ctx = crf1 == LAST_FRAME || crf2 == LAST_FRAME
}
}
} else if ( AvailU ) {
if ( AboveIntra ) {
ctx = 2
} else { // inter
if ( AboveSingle )
ctx = 4 * (AboveRefFrame[ 0 ] == LAST_FRAME)
else
ctx = 1 + (AboveRefFrame[ 0 ] == LAST_FRAME ||
AboveRefFrame[ 1 ] == LAST_FRAME)
}
} else if ( AvailL ) {
if ( LeftIntra ) {
ctx = 2
} else {
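if ( LeftSingle )
ctx = 4 * (LeftRefFrame[ 0 ] == LAST_FRAME)
else
ctx = 1 + (LeftRefFrame[ 0 ] == LAST_FRAME ||
LeftRefFrame[ 1 ] == LAST_FRAME)
}
} else {
ctx = 2
}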
single_ref_p2: the probability is given by single_ref_prob[ ctx ][ 1 ] where ctx is computed by:
if ( AvailU && AvailL ) {
if ( AboveIntra && LeftIntra ) {
ctx = 2
} else if ( LeftIntra ) {
if ( AboveSingle ) {
if ( AboveRefFrame[ 0 ] == LAST_FRAME )
ctx = 3
else
ctx = 4 * (AboveRefFrame[ 0 ] == GOLDEN_FRAME)
} else {
ctx = 1 + 2 * (AboveRefFrame[ 0 ] == GOLDEN_FRAME || AboveRefFrame[ 1 ] == GOLDEN_FRAME)
}
} else if ( AboveIntra ) {
if ( LeftSingle ) {
if ( LeftRefFrame[ 0 ] == LAST_FRAME )
ctx = 3
else
ctx = 4 * (LeftRefFrame[ 0 ] == GOLDEN_FRAME)
} else {
ctx = 1 + 2 * (LeftRefFrame[ 0 ] == GOLDEN_FRAME || LeftRefFrame[ 1 ] == GOLDEN_FRAME)
}
} else {
if ( AboveSingle && LeftSingle ) {
if ( AboveRefFrame[ 0 ] == LAST_FRAME && LeftRefFrame[ 0 ] == LAST_FRAME ) {
ctx = 3
} else if ( AboveRefFrame[ 0 ] == LAST_FRAME ) {
ctx = 4 * (LeftRefFrame[ 0 ] == GOLDEN_FRAME)
} else if ( LeftRefFrame[ 0 ] == LAST_FRAME ) {
ctx = 4 * (AboveRefFrame[ 0 ] == GOLDEN_FRAME)
} else {
ctx = 2 * (AboveRefFrame[ 0 ] == GOLDEN_FRAME) + 2 * (LeftRefFrame[ 0 ] == GOLDEN_FRAME)
}
} else if ( !AboveSingle && !LeftSingle ) {
if ( AboveRefFrame[ 0 ] == LeftRefFrame[ 0 ] && AboveRefFrame[ 1 ] == LeftRefFrame[ 1 ] )
ctx = 3 * (AboveRefFrame[ 0 ] == GOLDEN_FRAME || AboveRefFrame[ 1 ] == GOLDEN_FRAME)
else
ctx = 2
} else {
rfs = AboveSingle ? AboveRefFrame[ 0 ] : LeftRefFrame[ 0 ]
crf1 = AboveSingle ? LeftRefFrame[ 0 ] : AboveRefFrame[ 0 ]
crf2 = AboveSingle ? LeftRefFrame[ 1 ] : AboveRefFrame[ 1 ]
if ( rfs == GOLDEN_FRAME )
ctx = 3 + (crf1 == GOLDEN_FRAME || crf2 == GOLDEN_FRAME)
else if ( rfs == ALTREF_FRAME )
ctx = crf1 == GOLDEN_FRAME || crf2 == GOLDEN_FRAME
else
ctx = 1 + 2 * (crf1 == GOLDEN_FRAME || crf2 == GOLDEN_FRAME)
}
}
} else if ( AvailU ) {
if ( AboveIntra || (AboveRefFrame[ 0 ] == LAST_FRAME && AboveSingle) )
ctx = 2
else if ( AboveSingle )
ctx = 4 * (AboveRefFrame[ 0 ] == GOLDEN_FRAME)
else
ctx = 3 * (AboveRefFrame[ 0 ] == GOLDEN_FRAME || AboveRefFrame[ 1 ] == GOLDEN_FRAME)
} else if ( AvailL ) {
if ( LeftIntra || (LeftRefFrame[ 0 ] == LAST_FRAME && LeftSingle) )
ctx = 2
else if ( LeftSingle )
ctx = 4 * (LeftRefFrame[ 0 ] == GOLDEN_FRAME)
else
ctx = 3 * (LeftRefFrame[ 0 ] == GOLDEN_FRAME || LeftRefFrame[ 1 ] == GOLDEN_FRAME)
} else {
ctx = 2
}
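As an informal illustration, the derivation of ctx for single_ref_p2 can be transliterated into Python roughly as follows. The Neighbour class, the helper names, and the numeric values chosen for the reference frame constants are assumptions made for this sketch and are not defined by the Specification. An unavailable neighbour is modelled as None.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Reference frame identifiers; the numeric values are assumptions of this sketch.
NONE, INTRA_FRAME, LAST_FRAME, GOLDEN_FRAME, ALTREF_FRAME = -1, 0, 1, 2, 3


@dataclass
class Neighbour:
    """Hypothetical record for the above or left block."""
    ref: Tuple[int, int]   # (ref_frame[0], ref_frame[1])
    intra: bool = False    # plays the role of AboveIntra / LeftIntra
    single: bool = True    # plays the role of AboveSingle / LeftSingle


def _has_golden(n: Neighbour) -> bool:
    return n.ref[0] == GOLDEN_FRAME or n.ref[1] == GOLDEN_FRAME


def _beside_intra(n: Neighbour) -> int:
    """Both neighbours available, the other one is intra."""
    if n.single:
        return 3 if n.ref[0] == LAST_FRAME else 4 * (n.ref[0] == GOLDEN_FRAME)
    return 1 + 2 * _has_golden(n)


def _only_neighbour(n: Neighbour) -> int:
    """Exactly one neighbour is available."""
    if n.intra or (n.ref[0] == LAST_FRAME and n.single):
        return 2
    if n.single:
        return 4 * (n.ref[0] == GOLDEN_FRAME)
    return 3 * _has_golden(n)


def single_ref_p2_ctx(above: Optional[Neighbour], left: Optional[Neighbour]) -> int:
    if above is not None and left is not None:
        if above.intra and left.intra:
            return 2
        if left.intra:
            return _beside_intra(above)
        if above.intra:
            return _beside_intra(left)
        a0, l0 = above.ref[0], left.ref[0]
        if above.single and left.single:
            if a0 == LAST_FRAME and l0 == LAST_FRAME:
                return 3
            if a0 == LAST_FRAME:
                return 4 * (l0 == GOLDEN_FRAME)
            if l0 == LAST_FRAME:
                return 4 * (a0 == GOLDEN_FRAME)
            return 2 * (a0 == GOLDEN_FRAME) + 2 * (l0 == GOLDEN_FRAME)
        if not above.single and not left.single:
            return 3 * _has_golden(above) if above.ref == left.ref else 2
        # Exactly one neighbour uses a single reference.
        single, comp = (above, left) if above.single else (left, above)
        if single.ref[0] == GOLDEN_FRAME:
            return 3 + _has_golden(comp)
        if single.ref[0] == ALTREF_FRAME:
            return int(_has_golden(comp))
        return 1 + 2 * _has_golden(comp)
    if above is not None:
        return _only_neighbour(above)
    if left is not None:
        return _only_neighbour(left)
    return 2
```

For example, with both neighbours available and each using GOLDEN_FRAME as a single reference, the sketch returns 4. With no neighbour available it returns 2.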
if ( !AvailU )
above = left
ctx = (above+left) > maxTxSize
inter_mode: the probability is given by inter_mode_probs[ ctx ][ node ] where ctx is computed by:
ctx = ModeContext[ ref_frame[ 0 ] ]
interp_filter: the probability is given by interp_filter_probs[ ctx ][ node ] where ctx is computed by:
leftInterp = ( AvailL && LeftRefFrame[ 0 ] > INTRA_FRAME ) ?
InterpFilters[ MiRow ][ MiCol - 1] : 3
aboveInterp = ( AvailU && AboveRefFrame[ 0 ] > INTRA_FRAME ) ?
InterpFilters[ MiRow - 1 ][ MiCol ] : 3
if ( leftInterp == aboveInterp )
ctx = leftInterp
else if ( leftInterp == 3 && aboveInterp != 3 )
ctx = aboveInterp
else if ( leftInterp != 3 && aboveInterp == 3 )
ctx = leftInterp
else
ctx = 3
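The interp_filter context derivation above is small enough to restate as a Python sketch. The function names and the numeric value of INTRA_FRAME are assumptions of this sketch. The value 3 is the one the derivation assigns when a neighbour is unavailable or intra coded.

```python
INTRA_FRAME = 0  # assumed numeric value of the intra reference frame
NO_FILTER = 3    # value used when a neighbour supplies no filter type


def neighbour_interp(available, ref_frame0, interp_filter):
    """Filter type contributed by one neighbour (left or above)."""
    if available and ref_frame0 > INTRA_FRAME:
        return interp_filter
    return NO_FILTER


def interp_filter_ctx(left_interp, above_interp):
    """Combine the two neighbour values into ctx."""
    if left_interp == above_interp:
        return left_interp
    if left_interp == NO_FILTER:
        return above_interp  # only the above neighbour contributes
    if above_interp == NO_FILTER:
        return left_interp   # only the left neighbour contributes
    return NO_FILTER         # the neighbours disagree
```

When the two neighbours use different filters, the context falls back to 3, the same value used when neither neighbour contributes.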
token: token uses the same derivation for the variable ctx as for the syntax element more_coefs. The
probability for token is given by pareto(node, coef_probs[txSz][plane>0][is_inter][band][ctx][Min(2,1+node)]),
where the function pareto is specified as:
pareto( node, prob ) {
if ( node < 2 ) {
return prob
}
x = (prob - 1) / 2
if ( prob & 1 )
return pareto_table[ x ][ node - 2 ]
else
return ( pareto_table[ x ][ node - 2 ] + pareto_table[ x + 1 ][ node - 2 ] ) >> 1
}
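The pareto function can be exercised with the following Python sketch. The table is passed as an argument so that the function is self-contained. The three-row TOY_TABLE is invented purely for illustration and is not the pareto_table of this Specification.

```python
def pareto(node, prob, pareto_table):
    """Return the probability for the given tree node of token."""
    if node < 2:
        return prob
    x = (prob - 1) // 2
    if prob & 1:
        # Odd probabilities map directly onto a table row.
        return pareto_table[x][node - 2]
    # Even probabilities average the two neighbouring rows.
    return (pareto_table[x][node - 2] + pareto_table[x + 1][node - 2]) >> 1


# Hypothetical table with two columns, used only to show the indexing.
TOY_TABLE = [
    [10, 20],
    [30, 40],
    [50, 60],
]
```

With this toy table, pareto(2, 3, TOY_TABLE) reads row 1 directly and gives 30, while pareto(2, 4, TOY_TABLE) averages rows 1 and 2 and gives 40.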
n = T[ n + read_bool( P( n >> 1 ) ) ]
} while ( n > 0 )
The function P( x ) is defined to invoke the probability selection process as specified in section 9.3.2 with
inputs of the name of the syntax element and with the input variable node set equal to x.
The output value is then given by -n.
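The loop above can be sketched in Python as follows. The sketch assumes that decoding starts with n equal to 0 (the root of the tree), that read_bool is a callable taking a probability and returning 0 or 1, and that P maps a node index to a probability. EXAMPLE_TREE and make_reader are illustrative helpers introduced for this sketch only.

```python
def decode_tree(T, read_bool, P):
    """Walk the tree array T until a leaf (an entry <= 0) is reached."""
    n = 0  # assumed starting point: the root of the tree
    while True:
        n = T[n + read_bool(P(n >> 1))]
        if n <= 0:
            break
    return -n  # leaves are stored as negated values


# Illustrative tree with four leaves (values 0..3); positive entries are
# indices of the next pair of branches.
EXAMPLE_TREE = [0, 2, -1, 4, -2, -3]


def make_reader(bits):
    """Hypothetical stand-in for read_bool that replays a fixed bit list."""
    it = iter(bits)
    return lambda prob: next(it)
```

For instance, decode_tree(EXAMPLE_TREE, make_reader([1, 1, 0]), lambda node: 128) follows the 1 branch twice, then the 0 branch, and returns 2.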
9.3.4 Syntax element counting process
The inputs to this process are:
− the name of a syntax element,
− the value of the syntax element.
This process also has access to any variables that the probability selection process specified in section 9.3.2
defined when computing the probabilities used to decode the syntax element.
The action for each syntax element is to either do nothing, or increase the count by 1 for some array elements.
There is one special case (for more_coefs) that is described at the end of this section.
The table below defines which elements are updated for each name. In this table, the variable syntax is
defined to be equal to the value of the syntax element. If the entry to update is “NA”, then no counting needs
to happen for the corresponding syntax element.
Name of syntax element    Variable to increase by 1
partition                 counts_partition[ctx][syntax]
default_intra_mode        NA
default_uv_mode           NA
intra_mode                counts_intra_mode[ctx][syntax]
sub_intra_mode            counts_intra_mode[ctx][syntax]
uv_mode                   counts_uv_mode[ctx][syntax]
segment_id                NA
skip                      counts_skip[ctx][syntax]
seg_id_predicted          NA
is_inter                  counts_is_inter[ctx][syntax]
comp_mode                 counts_comp_mode[ctx][syntax]
comp_ref                  counts_comp_ref[ctx][syntax]
single_ref_p1             counts_single_ref[ctx][0][syntax]
single_ref_p2             counts_single_ref[ctx][1][syntax]
mv_sign                   counts_mv_sign[comp][syntax]
mv_class0_bit             counts_mv_class0_bit[comp][syntax]
mv_bit                    counts_mv_bits[comp][i][syntax]
tx_size                   counts_tx_size[maxTxSize][ctx][syntax]
inter_mode                counts_inter_mode[ctx][syntax]
interp_filter             counts_interp_filter[ctx][syntax]
mv_joint                  counts_mv_joint[syntax]
mv_class                  counts_mv_class[comp][syntax]
mv_class0_hp              counts_mv_class0_hp[comp][syntax]
mv_fr                     counts_mv_fr[comp][syntax]
mv_hp                     counts_mv_hp[comp][syntax]
token                     counts_token[txSz][plane>0][is_inter][band][ctx][Min(2,syntax)]
more_coefs                counts_more_coefs[txSz][plane>0][is_inter][band][ctx][syntax]
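A minimal Python sketch of how a decoder might apply this table is given below. It covers only a handful of rows, stores the counters in a sparse dictionary keyed by (array name, indices...) rather than in real arrays, and does not handle the special case for more_coefs. The names COUNTER_KEYS, NOT_COUNTED and count_syntax_element are hypothetical.

```python
from collections import defaultdict

# Syntax elements whose table entry is "NA": nothing is counted for them.
NOT_COUNTED = {"default_intra_mode", "default_uv_mode",
               "segment_id", "seg_id_predicted"}

# A few rows of the table. Each entry builds the counter key from the
# variables (v) left by probability selection and the decoded value (s).
COUNTER_KEYS = {
    "partition": lambda v, s: ("counts_partition", v["ctx"], s),
    "intra_mode": lambda v, s: ("counts_intra_mode", v["ctx"], s),
    "sub_intra_mode": lambda v, s: ("counts_intra_mode", v["ctx"], s),
    "skip": lambda v, s: ("counts_skip", v["ctx"], s),
    "single_ref_p1": lambda v, s: ("counts_single_ref", v["ctx"], 0, s),
    "single_ref_p2": lambda v, s: ("counts_single_ref", v["ctx"], 1, s),
    "tx_size": lambda v, s: ("counts_tx_size", v["maxTxSize"], v["ctx"], s),
    "mv_joint": lambda v, s: ("counts_mv_joint", s),
    "token": lambda v, s: ("counts_token", v["txSz"], int(v["plane"] > 0),
                           v["is_inter"], v["band"], v["ctx"], min(2, s)),
}


def count_syntax_element(counts, name, syntax, variables):
    """Increase the counter selected by the table row for `name` by 1."""
    if name in NOT_COUNTED:
        return
    counts[COUNTER_KEYS[name](variables, syntax)] += 1


counts = defaultdict(int)
count_syntax_element(counts, "partition", 3, {"ctx": 5})
count_syntax_element(counts, "segment_id", 2, {})  # "NA": no effect
```

Note that intra_mode and sub_intra_mode share the same counter array, and that token values of 2 or more are all counted in the same bin because of Min(2,syntax).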
10 Additional tables
This section contains tables that do not naturally fit in the main sections of the Specification.
10.1 Scan tables
This section defines the scan order for different types of transform.
default_scan_4x4[ 16 ] = {
0, 4, 1, 5,
8, 2, 12, 9,
3, 6, 13, 10,
7, 14, 11, 15,
}
col_scan_4x4[ 16 ] = {
0, 4, 8, 1,
12, 5, 9, 2,
13, 6, 10, 3,
7, 14, 11, 15,
}
row_scan_4x4[ 16 ] = {
0, 1, 4, 2,
5, 3, 6, 8,
9, 7, 12, 10,
13, 11, 14, 15,
}
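The following Python sketch shows one way to sanity check the 4x4 tables above and to turn a scan order into (row, column) coordinates. It assumes that each table entry is a raster position, that is row * n + col for an n x n block. That indexing convention and the helper names are assumptions of this sketch.

```python
default_scan_4x4 = [0, 4, 1, 5, 8, 2, 12, 9, 3, 6, 13, 10, 7, 14, 11, 15]
col_scan_4x4 = [0, 4, 8, 1, 12, 5, 9, 2, 13, 6, 10, 3, 7, 14, 11, 15]
row_scan_4x4 = [0, 1, 4, 2, 5, 3, 6, 8, 9, 7, 12, 10, 13, 11, 14, 15]


def is_valid_scan(scan, n):
    """A scan must visit every position of the n x n block exactly once."""
    return sorted(scan) == list(range(n * n))


def scan_coordinates(scan, n):
    """Convert raster positions into (row, col) pairs, in scan order."""
    return [(pos // n, pos % n) for pos in scan]
```

Under the stated assumption, the first four positions visited by default_scan_4x4 are (0, 0), (1, 0), (0, 1) and (1, 1).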
default_scan_8x8[ 64 ] = {
0, 8, 1, 16, 9, 2, 17, 24,
10, 3, 18, 25, 32, 11, 4, 26,
33, 19, 40, 12, 34, 27, 5, 41,
20, 48, 13, 35, 42, 28, 21, 6,
49, 56, 36, 43, 29, 7, 14, 50,
57, 44, 22, 37, 15, 51, 58, 30,
45, 23, 52, 59, 38, 31, 60, 53,
46, 39, 61, 54, 47, 62, 55, 63,
}
col_scan_8x8[ 64 ] = {
0, 8, 16, 1, 24, 9, 32, 17,
2, 40, 25, 10, 33, 18, 48, 3,
26, 41, 11, 56, 19, 34, 4, 49,
27, 42, 12, 35, 20, 57, 50, 28,
row_scan_8x8[ 64 ] = {
0, 1, 2, 8, 9, 3, 16, 10,
4, 17, 11, 24, 5, 18, 25, 12,
19, 26, 32, 6, 13, 20, 33, 27,
7, 34, 40, 21, 28, 41, 14, 35,
48, 42, 29, 36, 49, 22, 43, 15,
56, 37, 50, 44, 30, 57, 23, 51,
58, 45, 38, 52, 31, 59, 53, 46,
60, 39, 61, 47, 54, 55, 62, 63,
}
default_scan_16x16[ 256 ] = {
0, 16, 1, 32, 17, 2, 48, 33, 18, 3, 64, 34, 49, 19, 65, 80,
50, 4, 35, 66, 20, 81, 96, 51, 5, 36, 82, 97, 67, 112, 21, 52,
98, 37, 83, 113, 6, 68, 128, 53, 22, 99, 114, 84, 7, 129, 38, 69,
100, 115, 144, 130, 85, 54, 23, 8, 145, 39, 70, 116, 101, 131, 160, 146,
55, 86, 24, 71, 132, 117, 161, 40, 9, 102, 147, 176, 162, 87, 56, 25,
133, 118, 177, 148, 72, 103, 41, 163, 10, 192, 178, 88, 57, 134, 149, 119,
26, 164, 73, 104, 193, 42, 179, 208, 11, 135, 89, 165, 120, 150, 58, 194,
180, 27, 74, 209, 105, 151, 136, 43, 90, 224, 166, 195, 181, 121, 210, 59,
12, 152, 106, 167, 196, 75, 137, 225, 211, 240, 182, 122, 91, 28, 197, 13,
226, 168, 183, 153, 44, 212, 138, 107, 241, 60, 29, 123, 198, 184, 227, 169,
242, 76, 213, 154, 45, 92, 14, 199, 139, 61, 228, 214, 170, 185, 243, 108,
77, 155, 30, 15, 200, 229, 124, 215, 244, 93, 46, 186, 171, 201, 109, 140,
230, 62, 216, 245, 31, 125, 78, 156, 231, 47, 187, 202, 217, 94, 246, 141,
63, 232, 172, 110, 247, 157, 79, 218, 203, 126, 233, 188, 248, 95, 173, 142,
219, 111, 249, 234, 158, 127, 189, 204, 250, 235, 143, 174, 220, 205, 159, 251,
190, 221, 175, 236, 237, 191, 206, 252, 222, 253, 207, 238, 223, 254, 239, 255
}
col_scan_16x16[ 256 ] = {
0, 16, 32, 48, 1, 64, 17, 80, 33, 96, 49, 2, 65, 112, 18, 81,
34, 128, 50, 97, 3, 66, 144, 19, 113, 35, 82, 160, 98, 51, 129, 4,
67, 176, 20, 114, 145, 83, 36, 99, 130, 52, 192, 5, 161, 68, 115, 21,
146, 84, 208, 177, 37, 131, 100, 53, 162, 224, 69, 6, 116, 193, 147, 85,
22, 240, 132, 38, 178, 101, 163, 54, 209, 117, 70, 7, 148, 194, 86, 179,
225, 23, 133, 39, 164, 8, 102, 210, 241, 55, 195, 118, 149, 71, 180, 24,
87, 226, 134, 165, 211, 40, 103, 56, 72, 150, 196, 242, 119, 9, 181, 227,
88, 166, 25, 135, 41, 104, 212, 57, 151, 197, 120, 73, 243, 182, 136, 167,
213, 89, 10, 228, 105, 152, 198, 26, 42, 121, 183, 244, 168, 58, 137, 229,
74, 214, 90, 153, 199, 184, 11, 106, 245, 27, 122, 230, 169, 43, 215, 59,
200, 138, 185, 246, 75, 12, 91, 154, 216, 231, 107, 28, 44, 201, 123, 170,
60, 247, 232, 76, 139, 13, 92, 217, 186, 248, 155, 108, 29, 124, 45, 202,
233, 171, 61, 14, 77, 140, 15, 249, 93, 30, 187, 156, 218, 46, 109, 125,
62, 172, 78, 203, 31, 141, 234, 94, 47, 188, 63, 157, 110, 250, 219, 79,
126, 204, 173, 142, 95, 189, 111, 235, 158, 220, 251, 127, 174, 143, 205, 236,
159, 190, 221, 252, 175, 206, 237, 191, 253, 222, 238, 207, 254, 223, 239, 255,
}
row_scan_16x16[ 256 ] = {
0, 1, 2, 16, 3, 17, 4, 18, 32, 5, 33, 19, 6, 34, 48, 20,
49, 7, 35, 21, 50, 64, 8, 36, 65, 22, 51, 37, 80, 9, 66, 52,
23, 38, 81, 67, 10, 53, 24, 82, 68, 96, 39, 11, 54, 83, 97, 69,
25, 98, 84, 40, 112, 55, 12, 70, 99, 113, 85, 26, 41, 56, 114, 100,
13, 71, 128, 86, 27, 115, 101, 129, 42, 57, 72, 116, 14, 87, 130, 102,
144, 73, 131, 117, 28, 58, 15, 88, 43, 145, 103, 132, 146, 118, 74, 160,
89, 133, 104, 29, 59, 147, 119, 44, 161, 148, 90, 105, 134, 162, 120, 176,
75, 135, 149, 30, 60, 163, 177, 45, 121, 91, 106, 164, 178, 150, 192, 136,
165, 179, 31, 151, 193, 76, 122, 61, 137, 194, 107, 152, 180, 208, 46, 166,
167, 195, 92, 181, 138, 209, 123, 153, 224, 196, 77, 168, 210, 182, 240, 108,
197, 62, 154, 225, 183, 169, 211, 47, 139, 93, 184, 226, 212, 241, 198, 170,
124, 155, 199, 78, 213, 185, 109, 227, 200, 63, 228, 242, 140, 214, 171, 186,
156, 229, 243, 125, 94, 201, 244, 215, 216, 230, 141, 187, 202, 79, 172, 110,
157, 245, 217, 231, 95, 246, 232, 126, 203, 247, 233, 173, 218, 142, 111, 158,
188, 248, 127, 234, 219, 249, 189, 204, 143, 174, 159, 250, 235, 205, 220, 175,
190, 251, 221, 191, 206, 236, 207, 237, 252, 222, 253, 223, 238, 239, 254, 255,
}
default_scan_32x32[ 1024 ] = {
0, 32, 1, 64, 33, 2, 96, 65, 34, 128, 3, 97, 66, 160,
129, 35, 98, 4, 67, 130, 161, 192, 36, 99, 224, 5, 162, 193,
68, 131, 37, 100,
225, 194, 256, 163, 69, 132, 6, 226, 257, 288, 195, 101, 164, 38,
258, 7, 227, 289, 133, 320, 70, 196, 165, 290, 259, 228, 39, 321,
102, 352, 8, 197,
71, 134, 322, 291, 260, 353, 384, 229, 166, 103, 40, 354, 323, 292,
135, 385, 198, 261, 72, 9, 416, 167, 386, 355, 230, 324, 104, 293,
41, 417, 199, 136,
262, 387, 448, 325, 356, 10, 73, 418, 231, 168, 449, 294, 388, 105,
419, 263, 42, 200, 357, 450, 137, 480, 74, 326, 232, 11, 389, 169,
295, 420, 106, 451,
481, 358, 264, 327, 201, 43, 138, 512, 482, 390, 296, 233, 170, 421,
75, 452, 359, 12, 513, 265, 483, 328, 107, 202, 514, 544, 422, 391,
453, 139, 44, 234,
484, 297, 360, 171, 76, 515, 545, 266, 329, 454, 13, 423, 203, 108,
546, 485, 576, 298, 235, 140, 361, 330, 172, 547, 45, 455, 267, 577,
486, 77, 204, 362,
608, 14, 299, 578, 109, 236, 487, 609, 331, 141, 579, 46, 15, 173,
610, 363, 78, 205, 16, 110, 237, 611, 142, 47, 174, 79, 206, 17,
111, 238, 48, 143,
80, 175, 112, 207, 49, 18, 239, 81, 113, 19, 50, 82, 114, 51,
83, 115, 640, 516, 392, 268, 144, 20, 672, 641, 548, 517, 424,
393, 300, 269, 176, 145,
52, 21, 704, 673, 642, 580, 549, 518, 456, 425, 394, 332, 301,
270, 208, 177, 146, 84, 53, 22, 736, 705, 674, 643, 612, 581,
550, 519, 488, 457, 426, 395,
364, 333, 302, 271, 240, 209, 178, 147, 116, 85, 54, 23, 737,
706, 675, 613, 582, 551, 489, 458, 427, 365, 334, 303, 241,
210, 179, 117, 86, 55, 738, 707,
614, 583, 490, 459, 366, 335, 242, 211, 118, 87, 739, 615, 491,
367, 243, 119, 768, 644, 520, 396, 272, 148, 24, 800, 769, 676,
645, 552, 521, 428, 397, 304,
273, 180, 149, 56, 25, 832, 801, 770, 708, 677, 646, 584, 553,
522, 460, 429, 398, 336, 305, 274, 212, 181, 150, 88, 57, 26,
864, 833, 802, 771, 740, 709,
678, 647, 616, 585, 554, 523, 492, 461, 430, 399, 368, 337, 306,
275, 244, 213, 182, 151, 120, 89, 58, 27, 865, 834, 803, 741,
710, 679, 617, 586, 555, 493,
462, 431, 369, 338, 307, 245, 214, 183, 121, 90, 59, 866, 835,
742, 711, 618, 587, 494, 463, 370, 339, 246, 215, 122, 91, 867,
743, 619, 495, 371, 247, 123,
896, 772, 648, 524, 400, 276, 152, 28, 928, 897, 804, 773, 680,
649, 556, 525, 432, 401, 308, 277, 184, 153, 60, 29, 960, 929,
898, 836, 805, 774, 712, 681,
650, 588, 557, 526, 464, 433, 402, 340, 309, 278, 216, 185, 154,
92, 61, 30, 992, 961, 930, 899, 868, 837, 806, 775, 744, 713, 682,
651, 620, 589, 558, 527,
496, 465, 434, 403, 372, 341, 310, 279, 248, 217, 186, 155, 124,
93, 62, 31, 993, 962, 931, 869, 838, 807, 745, 714, 683, 621, 590,
559, 497, 466, 435, 373,
342, 311, 249, 218, 187, 125, 94, 63, 994, 963, 870, 839, 746, 715,
622, 591, 498, 467, 374, 343, 250, 219, 126, 95, 995, 871, 747, 623,
499, 375, 251, 127,
900, 776, 652, 528, 404, 280, 156, 932, 901, 808, 777, 684, 653, 560,
529, 436, 405, 312, 281, 188, 157, 964, 933, 902, 840, 809, 778, 716,
685, 654, 592, 561,
530, 468, 437, 406, 344, 313, 282, 220, 189, 158, 996, 965, 934, 903,
872, 841, 810, 779, 748, 717, 686, 655, 624, 593, 562, 531, 500, 469,
438, 407, 376, 345,
314, 283, 252, 221, 190, 159, 997, 966, 935, 873, 842, 811, 749, 718,
687, 625, 594, 563, 501, 470, 439, 377, 346, 315, 253, 222, 191, 998,
967, 874, 843, 750,
719, 626, 595, 502, 471, 378, 347, 254, 223, 999, 875, 751, 627, 503,
379, 255, 904, 780, 656, 532, 408, 284, 936, 905, 812, 781, 688, 657,
564, 533, 440, 409,
316, 285, 968, 937, 906, 844, 813, 782, 720, 689, 658, 596, 565, 534,
472, 441, 410, 348, 317, 286, 1000, 969, 938, 907, 876, 845, 814, 783,
752, 721, 690, 659,
628, 597, 566, 535, 504, 473, 442, 411, 380, 349, 318, 287, 1001, 970,
939, 877, 846, 815, 753, 722, 691, 629, 598, 567, 505, 474, 443, 381,
350, 319, 1002, 971,
878, 847, 754, 723, 630, 599, 506, 475, 382, 351, 1003, 879, 755, 631,
507, 383, 908, 784, 660, 536, 412, 940, 909, 816, 785, 692, 661, 568,
537, 444, 413, 972,
941, 910, 848, 817, 786, 724, 693, 662, 600, 569, 538, 476, 445, 414,
1004, 973, 942, 911, 880, 849, 818, 787, 756, 725, 694, 663, 632, 601,
570, 539, 508, 477,
446, 415, 1005, 974, 943, 881, 850, 819, 757, 726, 695, 633, 602, 571,
509, 478, 447, 1006, 975, 882, 851, 758, 727, 634, 603, 510, 479,
1007, 883, 759, 635, 511,
912, 788, 664, 540, 944, 913, 820, 789, 696, 665, 572, 541, 976, 945,
914, 852, 821, 790, 728, 697, 666, 604, 573, 542, 1008, 977, 946, 915,
884, 853, 822, 791,
760, 729, 698, 667, 636, 605, 574, 543, 1009, 978, 947, 885, 854, 823,
761, 730, 699, 637, 606, 575, 1010, 979, 886, 855, 762, 731, 638, 607,
1011, 887, 763, 639,
916, 792, 668, 948, 917, 824, 793, 700, 669, 980, 949, 918, 856, 825,
794, 732, 701, 670, 1012, 981, 950, 919, 888, 857, 826, 795, 764, 733,
702, 671, 1013, 982,
951, 889, 858, 827, 765, 734, 703, 1014, 983, 890, 859, 766, 735, 1015,
891, 767, 920, 796, 952, 921, 828, 797, 984, 953, 922, 860, 829, 798,
1016, 985, 954, 923,
892, 861, 830, 799, 1017, 986, 955, 893, 862, 831, 1018, 987, 894, 863,
1019, 895, 924, 956, 925, 988, 957, 926, 1020, 989, 958, 927, 1021,
990, 959, 1022, 991, 1023,
}
tx_mode_to_biggest_tx_size[ TX_MODES ] = {
TX_4X4,
TX_8X8,
TX_16X16,
TX_32X32,
TX_32X32
}
coefband_4x4[ 16 ] = {0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5}
coefband_8x8plus[ 1024 ] = {
0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5
}
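Because coefband_8x8plus is constant after its first 21 entries, an implementation could build it instead of storing it. The Python sketch below reproduces the table as listed above. The function name is hypothetical.

```python
coefband_4x4 = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5]


def build_coefband_8x8plus(size=1024):
    """Bands 0..4 cover the first 21 positions; every later position is band 5."""
    head = [0] + [1] * 2 + [2] * 3 + [3] * 4 + [4] * 11
    return head + [5] * (size - len(head))


coefband_8x8plus = build_coefband_8x8plus()
```

The two tables agree for the first 13 positions. They differ from position 13 onwards, where coefband_4x4 already moves to band 5.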
energy_class[ 12 ] = {0, 1, 2, 3, 3, 4, 4, 5, 5, 5, 5, 5}
mode2txfm_map[ MB_MODE_COUNT ] = {
DCT_DCT, // DC
ADST_DCT, // V
DCT_ADST, // H
DCT_DCT, // D45
ADST_ADST, // D135
ADST_DCT, // D117
DCT_ADST, // D153
DCT_ADST, // D207
ADST_DCT, // D63
ADST_ADST, // TM
DCT_DCT, // NEARESTMV
DCT_DCT, // NEARMV
DCT_DCT, // ZEROMV
DCT_DCT // NEWMV
}
{ 72, 35, 36, 149, 68, 206, 68, 63, 105 }, // left = d135
{ 73, 31, 28, 138, 57, 124, 55, 122, 151 }, // left = d117
{ 67, 23, 21, 140, 126, 197, 40, 37, 171 }, // left = d153
{ 86, 27, 28, 128, 154, 212, 45, 43, 53 }, // left = d207
{ 74, 32, 27, 107, 86, 160, 63, 134, 102 }, // left = d63
{ 59, 67, 44, 140, 161, 202, 78, 67, 119 } // left = tm
}, { // above = v
{ 63, 36, 126, 146, 123, 158, 60, 90, 96 }, // left = dc
{ 43, 46, 168, 134, 107, 128, 69, 142, 92 }, // left = v
{ 44, 29, 68, 159, 201, 177, 50, 57, 77 }, // left = h
{ 58, 38, 76, 114, 97, 172, 78, 133, 92 }, // left = d45
{ 46, 41, 76, 140, 63, 184, 69, 112, 57 }, // left = d135
{ 38, 32, 85, 140, 46, 112, 54, 151, 133 }, // left = d117
{ 39, 27, 61, 131, 110, 175, 44, 75, 136 }, // left = d153
{ 52, 30, 74, 113, 130, 175, 51, 64, 58 }, // left = d207
{ 47, 35, 80, 100, 74, 143, 64, 163, 74 }, // left = d63
{ 36, 61, 116, 114, 128, 162, 80, 125, 82 } // left = tm
}, { // above = h
{ 82, 26, 26, 171, 208, 204, 44, 32, 105 }, // left = dc
{ 55, 44, 68, 166, 179, 192, 57, 57, 108 }, // left = v
{ 42, 26, 11, 199, 241, 228, 23, 15, 85 }, // left = h
{ 68, 42, 19, 131, 160, 199, 55, 52, 83 }, // left = d45
{ 58, 50, 25, 139, 115, 232, 39, 52, 118 }, // left = d135
{ 50, 35, 33, 153, 104, 162, 64, 59, 131 }, // left = d117
{ 44, 24, 16, 150, 177, 202, 33, 19, 156 }, // left = d153
{ 55, 27, 12, 153, 203, 218, 26, 27, 49 }, // left = d207
{ 53, 49, 21, 110, 116, 168, 59, 80, 76 }, // left = d63
{ 38, 72, 19, 168, 203, 212, 50, 50, 107 } // left = tm
}, { // above = d45
{ 103, 26, 36, 129, 132, 201, 83, 80, 93 }, // left = dc
{ 59, 38, 83, 112, 103, 162, 98, 136, 90 }, // left = v
{ 62, 30, 23, 158, 200, 207, 59, 57, 50 }, // left = h
{ 67, 30, 29, 84, 86, 191, 102, 91, 59 }, // left = d45
{ 60, 32, 33, 112, 71, 220, 64, 89, 104 }, // left = d135
{ 53, 26, 34, 130, 56, 149, 84, 120, 103 }, // left = d117
{ 53, 21, 23, 133, 109, 210, 56, 77, 172 }, // left = d153
{ 77, 19, 29, 112, 142, 228, 55, 66, 36 }, // left = d207
{ 61, 29, 29, 93, 97, 165, 83, 175, 162 }, // left = d63
{ 47, 47, 43, 114, 137, 181, 100, 99, 95 } // left = tm
}, { // above = d135
{ 69, 23, 29, 128, 83, 199, 46, 44, 101 }, // left = dc
{ 53, 40, 55, 139, 69, 183, 61, 80, 110 }, // left = v
{ 40, 29, 19, 161, 180, 207, 43, 24, 91 }, // left = h
{ 60, 34, 19, 105, 61, 198, 53, 64, 89 }, // left = d45
{ 52, 31, 22, 158, 40, 209, 58, 62, 89 }, // left = d135
{ 44, 31, 29, 147, 46, 158, 56, 102, 198 }, // left = d117
{ 35, 19, 12, 135, 87, 209, 41, 45, 167 }, // left = d153
{ 55, 25, 21, 118, 95, 215, 38, 39, 66 }, // left = d207
{ 51, 38, 25, 113, 58, 164, 70, 93, 97 }, // left = d63
{ 47, 54, 34, 146, 108, 203, 72, 103, 151 } // left = tm
}, { // above = d117
{ 64, 19, 37, 156, 66, 138, 49, 95, 133 }, // left = dc
{ 46, 27, 80, 150, 55, 124, 55, 121, 135 }, // left = v
{ 36, 23, 27, 165, 149, 166, 54, 64, 118 }, // left = h
{ 53, 21, 36, 131, 63, 163, 60, 109, 81 }, // left = d45
{ 40, 26, 35, 154, 40, 185, 51, 97, 123 }, // left = d135
{ 35, 19, 34, 179, 19, 97, 48, 129, 124 }, // left = d117
{ 36, 20, 26, 136, 62, 164, 33, 77, 154 }, // left = d153
{ 45, 18, 32, 130, 90, 157, 40, 79, 91 }, // left = d207
{ 45, 26, 28, 129, 45, 129, 49, 147, 123 }, // left = d63
{ 38, 44, 51, 136, 74, 162, 57, 97, 121 } // left = tm
}, { // above = d153
{ 75, 17, 22, 136, 138, 185, 32, 34, 166 }, // left = dc
{ 56, 39, 58, 133, 117, 173, 48, 53, 187 }, // left = v
{ 35, 21, 12, 161, 212, 207, 20, 23, 145 }, // left = h
{ 56, 29, 19, 117, 109, 181, 55, 68, 112 }, // left = d45
{ 47, 29, 17, 153, 64, 220, 59, 51, 114 }, // left = d135
{ 46, 16, 24, 136, 76, 147, 41, 64, 172 }, // left = d117
{ 34, 17, 11, 108, 152, 187, 13, 15, 209 }, // left = d153
{ 51, 24, 14, 115, 133, 209, 32, 26, 104 }, // left = d207
{ 55, 30, 18, 122, 79, 179, 44, 88, 116 }, // left = d63
{ 37, 49, 25, 129, 168, 164, 41, 54, 148 } // left = tm
}, { // above = d207
{ 82, 22, 32, 127, 143, 213, 39, 41, 70 }, // left = dc
{ 62, 44, 61, 123, 105, 189, 48, 57, 64 }, // left = v
{ 47, 25, 17, 175, 222, 220, 24, 30, 86 }, // left = h
{ 68, 36, 17, 106, 102, 206, 59, 74, 74 }, // left = d45
{ 57, 39, 23, 151, 68, 216, 55, 63, 58 }, // left = d135
{ 49, 30, 35, 141, 70, 168, 82, 40, 115 }, // left = d117
{ 51, 25, 15, 136, 129, 202, 38, 35, 139 }, // left = d153
{ 68, 26, 16, 111, 141, 215, 29, 28, 28 }, // left = d207
{ 59, 39, 19, 114, 75, 180, 77, 104, 42 }, // left = d63
{ 40, 61, 26, 126, 152, 206, 61, 59, 93 } // left = tm
}, { // above = d63
{ 78, 23, 39, 111, 117, 170, 74, 124, 94 }, // left = dc
{ 48, 34, 86, 101, 92, 146, 78, 179, 134 }, // left = v
{ 47, 22, 24, 138, 187, 178, 68, 69, 59 }, // left = h
{ 56, 25, 33, 105, 112, 187, 95, 177, 129 }, // left = d45
{ 48, 31, 27, 114, 63, 183, 82, 116, 56 }, // left = d135
{ 43, 28, 37, 121, 63, 123, 61, 192, 169 }, // left = d117
{ 42, 17, 24, 109, 97, 177, 56, 76, 122 }, // left = d153
{ 58, 18, 28, 105, 139, 182, 70, 92, 63 }, // left = d207
{ 46, 23, 32, 74, 86, 150, 67, 183, 88 }, // left = d63
{ 36, 38, 48, 92, 122, 165, 88, 137, 91 } // left = tm
}, { // above = tm
{ 65, 70, 60, 155, 159, 199, 61, 60, 81 }, // left = dc
{ 44, 78, 115, 132, 119, 173, 71, 112, 93 }, // left = v
{ 39, 38, 21, 184, 227, 206, 42, 32, 64 }, // left = h
{ 58, 47, 36, 124, 137, 193, 80, 82, 78 }, // left = d45
{ 49, 50, 35, 144, 95, 205, 63, 78, 59 }, // left = d135
{ 41, 53, 52, 148, 71, 142, 65, 128, 51 }, // left = d117
{ 40, 36, 28, 143, 143, 202, 40, 55, 137 }, // left = d153
{ 52, 34, 29, 129, 183, 227, 42, 35, 43 }, // left = d207
{ 42, 44, 44, 104, 105, 164, 64, 130, 80 }, // left = d63
{ 43, 81, 53, 140, 169, 204, 68, 84, 72 } // left = tm
}
}
default_skip_prob[ SKIP_CONTEXTS ] = {
192, 128, 64
}
default_is_inter_prob[ IS_INTER_CONTEXTS ] = {
9, 102, 187, 225
}
default_comp_mode_prob[ COMP_MODE_CONTEXTS ] = {
239, 183, 119, 96, 41
}
default_comp_ref_prob[ REF_CONTEXTS ] = {
50, 126, 123, 221, 226
}
default_single_ref_prob[ REF_CONTEXTS ][ 2 ] = {
{ 33, 16 },
{ 77, 74 },
{ 142, 142 },
{ 172, 170 },
{ 238, 247 }
}
default_mv_sign_prob[ 2 ] = {
128, 128
}
default_mv_bits_prob[ 2 ][ MV_OFFSET_BITS ] = {
{136, 140, 148, 160, 176, 192, 224, 234, 234, 240},
{136, 140, 148, 160, 176, 192, 224, 234, 234, 240}
}
default_mv_class0_bit_prob[ 2 ] = {
216, 208
}
{ 15, 101, 0 }
},
{
{ 3, 136, 37 },
{ 5, 52, 13 }
}
}
default_interp_filter_probs[ INTERP_FILTER_CONTEXTS ]
[ SWITCHABLE_FILTERS - 1 ] = {
{ 235, 162 },
{ 36, 255 },
{ 34, 3 },
{ 149, 144 }
}
default_mv_joint_probs[ 3 ] = {
32, 64, 96
}
default_mv_class_probs[ 2 ][ MV_CLASSES - 1 ] = {
{224, 144, 192, 168, 192, 176, 192, 198, 198, 245},
{216, 128, 176, 160, 176, 176, 192, 198, 198, 208}
}
default_mv_class0_fr_probs[ 2 ][ CLASS0_SIZE ][ 3 ] = {
{{128, 128, 64}, {96, 112, 64}},
{{128, 128, 64}, {96, 112, 64}}
}
default_mv_class0_hp_prob[ 2 ] = {
160, 160
}
default_mv_fr_probs[ 2 ][ 3 ] = {
{64, 96, 64},
{64, 96, 64}
}
default_mv_hp_prob[ 2 ] = {
128, 128
}
}, { /* Coeff Band 4 */
{ 102, 148, 228 },
{ 67, 117, 204 },
{ 17, 82, 154 },
{ 6, 59, 114 },
{ 2, 39, 75 },
{ 1, 15, 29 }
}, { /* Coeff Band 5 */
{ 156, 57, 233 },
{ 119, 57, 212 },
{ 58, 48, 163 },
{ 29, 40, 124 },
{ 12, 30, 81 },
{ 3, 12, 31 }
}
}, { /* Inter */
{ /* Coeff Band 0 */
{ 191, 107, 226 },
{ 124, 117, 204 },
{ 25, 99, 155 },
{ 0, 0, 0 }, // unused
{ 0, 0, 0 }, // unused
{ 0, 0, 0 } // unused
}, { /* Coeff Band 1 */
{ 29, 148, 210 },
{ 37, 126, 194 },
{ 8, 93, 157 },
{ 2, 68, 118 },
{ 1, 39, 69 },
{ 1, 17, 33 }
}, { /* Coeff Band 2 */
{ 41, 151, 213 },
{ 27, 123, 193 },
{ 3, 82, 144 },
{ 1, 58, 105 },
{ 1, 32, 60 },
{ 1, 13, 26 }
}, { /* Coeff Band 3 */
{ 59, 159, 220 },
{ 23, 126, 198 },
{ 4, 88, 151 },
{ 1, 66, 114 },
{ 1, 38, 71 },
{ 1, 18, 34 }
}, { /* Coeff Band 4 */
}, { /* Coeff Band 4 */
{ 124, 146, 240 },
{ 66, 124, 224 },
{ 17, 88, 175 },
{ 4, 58, 122 },
{ 1, 36, 75 },
{ 1, 18, 37 }
}, { /* Coeff Band 5 */
{ 141, 79, 241 },
{ 126, 70, 227 },
{ 66, 58, 182 },
{ 30, 44, 136 },
{ 12, 34, 96 },
{ 2, 20, 47 }
}
}, { /* Inter */
{ /* Coeff Band 0 */
{ 229, 99, 249 },
{ 143, 111, 235 },
{ 46, 109, 192 },
{ 0, 0, 0 }, // unused
{ 0, 0, 0 }, // unused
{ 0, 0, 0 } // unused
}, { /* Coeff Band 1 */
{ 82, 158, 236 },
{ 94, 146, 224 },
{ 25, 117, 191 },
{ 9, 87, 149 },
{ 3, 56, 99 },
{ 1, 33, 57 }
}, { /* Coeff Band 2 */
{ 83, 167, 237 },
{ 68, 145, 222 },
{ 10, 103, 177 },
{ 2, 72, 131 },
{ 1, 41, 79 },
{ 1, 20, 39 }
}, { /* Coeff Band 3 */
{ 99, 167, 239 },
{ 47, 141, 224 },
{ 10, 104, 178 },
{ 2, 73, 133 },
{ 1, 44, 85 },
{ 1, 22, 47 }
}, { /* Coeff Band 4 */
{ 1, 67, 111 },
{ 1, 41, 69 },
{ 1, 14, 24 }
}, { /* Coeff Band 4 */
{ 29, 176, 217 },
{ 12, 145, 201 },
{ 3, 101, 156 },
{ 1, 69, 111 },
{ 1, 39, 63 },
{ 1, 14, 23 }
}, { /* Coeff Band 5 */
{ 57, 192, 233 },
{ 25, 154, 215 },
{ 6, 109, 167 },
{ 3, 78, 118 },
{ 1, 48, 69 },
{ 1, 21, 29 }
}
}, { /* Inter */
{ /* Coeff Band 0 */
{ 202, 105, 245 },
{ 108, 106, 216 },
{ 18, 90, 144 },
{ 0, 0, 0 }, // unused
{ 0, 0, 0 }, // unused
{ 0, 0, 0 } // unused
}, { /* Coeff Band 1 */
{ 33, 172, 219 },
{ 64, 149, 206 },
{ 14, 117, 177 },
{ 5, 90, 141 },
{ 2, 61, 95 },
{ 1, 37, 57 }
}, { /* Coeff Band 2 */
{ 33, 179, 220 },
{ 11, 140, 198 },
{ 1, 89, 148 },
{ 1, 60, 104 },
{ 1, 33, 57 },
{ 1, 12, 21 }
}, { /* Coeff Band 3 */
{ 30, 181, 221 },
{ 8, 141, 198 },
{ 1, 87, 145 },
{ 1, 58, 100 },
{ 1, 31, 55 },
{ 1, 12, 20 }
}, { /* Coeff Band 4 */
{ 32, 186, 224 },
{ 7, 142, 198 },
{ 1, 86, 143 },
{ 1, 58, 100 },
{ 1, 31, 55 },
{ 1, 12, 22 }
}, { /* Coeff Band 5 */
{ 57, 192, 227 },
{ 20, 143, 204 },
{ 3, 96, 154 },
{ 1, 68, 112 },
{ 1, 42, 69 },
{ 1, 19, 32 }
}
}
}, { /* block Type 1 */
{ /* Intra */
{ /* Coeff Band 0 */
{ 212, 35, 215 },
{ 113, 47, 169 },
{ 29, 48, 105 },
{ 0, 0, 0 }, // unused
{ 0, 0, 0 }, // unused
{ 0, 0, 0 } // unused
}, { /* Coeff Band 1 */
{ 74, 129, 203 },
{ 106, 120, 203 },
{ 49, 107, 178 },
{ 19, 84, 144 },
{ 4, 50, 84 },
{ 1, 15, 25 }
}, { /* Coeff Band 2 */
{ 71, 172, 217 },
{ 44, 141, 209 },
{ 15, 102, 173 },
{ 6, 76, 133 },
{ 2, 51, 89 },
{ 1, 24, 42 }
}, { /* Coeff Band 3 */
{ 64, 185, 231 },
{ 31, 148, 216 },
{ 8, 103, 175 },
{ 3, 74, 131 },
{ 1, 46, 81 },
{ 1, 18, 30 }
}, { /* Coeff Band 4 */
{ 65, 196, 235 },
{ 25, 157, 221 },
{ 5, 105, 174 },
{ 1, 67, 120 },
{ 1, 38, 69 },
{ 1, 15, 30 }
}, { /* Coeff Band 5 */
{ 65, 204, 238 },
{ 30, 156, 224 },
{ 7, 107, 177 },
{ 2, 70, 124 },
{ 1, 42, 73 },
{ 1, 18, 34 }
}
}, { /* Inter */
{ /* Coeff Band 0 */
{ 225, 86, 251 },
{ 144, 104, 235 },
{ 42, 99, 181 },
{ 0, 0, 0 }, // unused
{ 0, 0, 0 }, // unused
{ 0, 0, 0 } // unused
}, { /* Coeff Band 1 */
{ 85, 175, 239 },
{ 112, 165, 229 },
{ 29, 136, 200 },
{ 12, 103, 162 },
{ 6, 77, 123 },
{ 2, 53, 84 }
}, { /* Coeff Band 2 */
{ 75, 183, 239 },
{ 30, 155, 221 },
{ 3, 106, 171 },
{ 1, 74, 128 },
{ 1, 44, 76 },
{ 1, 17, 28 }
}, { /* Coeff Band 3 */
{ 73, 185, 240 },
{ 27, 159, 222 },
{ 2, 107, 172 },
{ 1, 75, 127 },
{ 1, 42, 73 },
{ 1, 17, 29 }
}, { /* Coeff Band 4 */
{ 62, 190, 238 },
{ 21, 159, 222 },
{ 2, 107, 172 },
{ 1, 72, 122 },
{ 1, 40, 71 },
{ 1, 18, 32 }
}, { /* Coeff Band 5 */
{ 61, 199, 240 },
{ 27, 161, 226 },
{ 4, 113, 180 },
{ 1, 76, 129 },
{ 1, 46, 80 },
{ 1, 23, 41 }
}
}
}
},
{
{ /* block Type 0 */
{ /* Intra */
{ /* Coeff Band 0 */
{ 7, 27, 153 },
{ 5, 30, 95 },
{ 1, 16, 30 },
{ 0, 0, 0 }, // unused
{ 0, 0, 0 }, // unused
{ 0, 0, 0 } // unused
}, { /* Coeff Band 1 */
{ 50, 75, 127 },
{ 57, 75, 124 },
{ 27, 67, 108 },
{ 10, 54, 86 },
{ 1, 33, 52 },
{ 1, 12, 18 }
}, { /* Coeff Band 2 */
{ 43, 125, 151 },
{ 26, 108, 148 },
{ 7, 83, 122 },
{ 2, 59, 89 },
{ 1, 38, 60 },
{ 1, 17, 27 }
}, { /* Coeff Band 3 */
{ 8, 134, 191 },
{ 1, 86, 142 },
{ 1, 56, 96 },
{ 1, 30, 53 },
{ 1, 12, 20 }
}, { /* Coeff Band 4 */
{ 20, 190, 215 },
{ 4, 135, 192 },
{ 1, 84, 139 },
{ 1, 53, 91 },
{ 1, 28, 49 },
{ 1, 11, 20 }
}, { /* Coeff Band 5 */
{ 13, 196, 216 },
{ 2, 137, 192 },
{ 1, 86, 143 },
{ 1, 57, 99 },
{ 1, 32, 56 },
{ 1, 13, 24 }
}
}
}, { /* block Type 1 */
{ /* Intra */
{ /* Coeff Band 0 */
{ 211, 29, 217 },
{ 96, 47, 156 },
{ 22, 43, 87 },
{ 0, 0, 0 }, // unused
{ 0, 0, 0 }, // unused
{ 0, 0, 0 } // unused
}, { /* Coeff Band 1 */
{ 78, 120, 193 },
{ 111, 116, 186 },
{ 46, 102, 164 },
{ 15, 80, 128 },
{ 2, 49, 76 },
{ 1, 18, 28 }
}, { /* Coeff Band 2 */
{ 71, 161, 203 },
{ 42, 132, 192 },
{ 10, 98, 150 },
{ 3, 69, 109 },
{ 1, 44, 70 },
{ 1, 18, 29 }
}, { /* Coeff Band 3 */
{ 1, 35, 54 },
{ 1, 13, 20 }
}, { /* Coeff Band 3 */
{ 40, 142, 167 },
{ 17, 110, 157 },
{ 2, 71, 112 },
{ 1, 44, 72 },
{ 1, 27, 45 },
{ 1, 11, 17 }
}, { /* Coeff Band 4 */
{ 30, 175, 188 },
{ 9, 124, 169 },
{ 1, 74, 116 },
{ 1, 48, 78 },
{ 1, 30, 49 },
{ 1, 11, 18 }
}, { /* Coeff Band 5 */
{ 10, 222, 223 },
{ 2, 150, 194 },
{ 1, 83, 128 },
{ 1, 48, 79 },
{ 1, 27, 45 },
{ 1, 11, 17 }
}
}, { /* Inter */
{ /* Coeff Band 0 */
{ 36, 41, 235 },
{ 29, 36, 193 },
{ 10, 27, 111 },
{ 0, 0, 0 }, // unused
{ 0, 0, 0 }, // unused
{ 0, 0, 0 } // unused
}, { /* Coeff Band 1 */
{ 85, 165, 222 },
{ 177, 162, 215 },
{ 110, 135, 195 },
{ 57, 113, 168 },
{ 23, 83, 120 },
{ 10, 49, 61 }
}, { /* Coeff Band 2 */
{ 85, 190, 223 },
{ 36, 139, 200 },
{ 5, 90, 146 },
{ 1, 60, 103 },
{ 1, 38, 65 },
{ 1, 18, 30 }
}, { /* Coeff Band 3 */
{ 72, 202, 223 },
{ 23, 141, 199 },
{ 2, 86, 140 },
{ 1, 56, 97 },
{ 1, 36, 61 },
{ 1, 16, 27 }
}, { /* Coeff Band 4 */
{ 55, 218, 225 },
{ 13, 145, 200 },
{ 1, 86, 141 },
{ 1, 57, 99 },
{ 1, 35, 61 },
{ 1, 13, 22 }
}, { /* Coeff Band 5 */
{ 15, 235, 212 },
{ 1, 132, 184 },
{ 1, 84, 139 },
{ 1, 57, 97 },
{ 1, 34, 56 },
{ 1, 14, 23 }
}
}
}, { /* block Type 1 */
{ /* Intra */
{ /* Coeff Band 0 */
{ 181, 21, 201 },
{ 61, 37, 123 },
{ 10, 38, 71 },
{ 0, 0, 0 }, // unused
{ 0, 0, 0 }, // unused
{ 0, 0, 0 } // unused
}, { /* Coeff Band 1 */
{ 47, 106, 172 },
{ 95, 104, 173 },
{ 42, 93, 159 },
{ 18, 77, 131 },
{ 4, 50, 81 },
{ 1, 17, 23 }
}, { /* Coeff Band 2 */
{ 62, 147, 199 },
{ 44, 130, 189 },
{ 28, 102, 154 },
{ 18, 75, 115 },
{ 2, 44, 65 },
{ 1, 12, 19 }
}, { /* Coeff Band 3 */
{ 55, 153, 210 },
{ 24, 130, 194 },
{ 3, 93, 146 },
{ 1, 61, 97 },
{ 1, 31, 50 },
{ 1, 10, 16 }
}, { /* Coeff Band 4 */
{ 49, 186, 223 },
{ 17, 148, 204 },
{ 1, 96, 142 },
{ 1, 53, 83 },
{ 1, 26, 44 },
{ 1, 11, 17 }
}, { /* Coeff Band 5 */
{ 13, 217, 212 },
{ 2, 136, 180 },
{ 1, 78, 124 },
{ 1, 50, 83 },
{ 1, 29, 49 },
{ 1, 14, 23 }
}
}, { /* Inter */
{ /* Coeff Band 0 */
{ 197, 13, 247 },
{ 82, 17, 222 },
{ 25, 17, 162 },
{ 0, 0, 0 }, // unused
{ 0, 0, 0 }, // unused
{ 0, 0, 0 } // unused
}, { /* Coeff Band 1 */
{ 126, 186, 247 },
{ 234, 191, 243 },
{ 176, 177, 234 },
{ 104, 158, 220 },
{ 66, 128, 186 },
{ 55, 90, 137 }
}, { /* Coeff Band 2 */
{ 111, 197, 242 },
{ 46, 158, 219 },
{ 9, 104, 171 },
{ 2, 65, 125 },
{ 1, 44, 80 },
{ 1, 17, 91 }
}, { /* Coeff Band 3 */
{ 104, 208, 245 },
{ 39, 168, 224 },
{ 3, 109, 162 },
{ 1, 79, 124 },
{ 1, 50, 102 },
{ 1, 43, 102 }
}, { /* Coeff Band 4 */
{ 84, 220, 246 },
{ 31, 177, 231 },
{ 2, 115, 180 },
{ 1, 79, 134 },
{ 1, 55, 77 },
{ 1, 60, 79 }
}, { /* Coeff Band 5 */
{ 43, 243, 240 },
{ 8, 180, 217 },
{ 1, 115, 166 },
{ 1, 84, 121 },
{ 1, 51, 67 },
{ 1, 16, 6 }
}
}
}
}
}
Annex A
Levels
A.1 Overview
Levels specify restrictions on the capabilities needed to decode a bitstream.
Only one level is currently defined in this Specification; it is called level 1.
Level 1 limits the bitstream to a worst case of roughly a 1080p, 60 fps, 20 Mbps stream through the following
restrictions:
− Width must be less than or equal to 1920.
− Height must be less than or equal to 1080.
− The number of frames per second must be less than or equal to 60 (this includes both output frames and
frames that are not output).
− The size in bits for an intra frame must be less than or equal to 100,000,000 / 60.
− The size in bits for an inter frame must be less than or equal to 50,000,000 / 60.
− The number of times the function read_bool is called for an intra frame must be less than or equal to 2 *
100,000,000 / 60.
− The number of times the function read_bool is called for an inter frame must be less than or equal to 2 *
50,000,000 / 60.
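The restrictions above can be sketched as a conformance check. This is only an illustrative sketch, not part of the decoding process: the FrameStats structure, its field names, and both function names are hypothetical, and the caller is assumed to have measured the frame size and counted the read_bool calls itself. The divisions by 60 are rewritten as multiplications so that the integer comparison does not depend on rounding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-frame measurements collected by a decoder or analyzer. */
typedef struct {
    uint32_t width;           /* frame width in pixels */
    uint32_t height;          /* frame height in pixels */
    bool     is_intra;        /* true for an intra frame, false for an inter frame */
    uint64_t size_in_bits;    /* size of the coded frame in bits */
    uint64_t read_bool_calls; /* number of read_bool calls made for this frame */
} FrameStats;

/* Returns true if a single frame satisfies the level 1 per-frame limits. */
static bool frame_within_level1(const FrameStats *f)
{
    /* Numerator of the size limit: 100,000,000 (intra) or 50,000,000 (inter). */
    const uint64_t budget = f->is_intra ? 100000000u : 50000000u;

    if (f->width > 1920 || f->height > 1080)
        return false;
    /* size <= budget / 60, compared as size * 60 <= budget. */
    if (f->size_in_bits * 60 > budget)
        return false;
    /* calls <= 2 * budget / 60, compared as calls * 60 <= 2 * budget. */
    if (f->read_bool_calls * 60 > 2 * budget)
        return false;
    return true;
}

/* Returns true if the number of frames decoded in one second (output frames
 * and frames that are not output, counted together) is within level 1. */
static bool rate_within_level1(uint32_t frames_in_one_second)
{
    return frames_in_one_second <= 60;
}
```

With these limits, the largest conforming intra frame is 1,666,666 bits and the largest conforming inter frame is 833,333 bits.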
Annex B
Superframes
B.1 Overview
VP9 supports consolidating multiple coded frames into a single chunk, called a “superframe”.
The superframe index is stored at the end of the chunk and occupies up to 34 bytes. The enclosed frames can be
located by parsing this superframe index.
The syntax of the superframe is shown in section B.2, and the meaning of the syntax elements is described in
section B.3. The method for parsing a superframe is described in section B.4.
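The 34-byte upper bound can be illustrated with a short sketch that detects the index at the end of a chunk. It assumes the index layout given in section B.2, which is not reproduced here: a one-byte header whose top three bits are the marker 0b110, followed by a two-bit field holding the number of bytes per frame size minus 1 and a three-bit field holding the number of frames minus 1, then one size entry per frame, then a repeat of the header byte. The function name is hypothetical, and section B.4 remains the authoritative description of the parsing method.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the size in bytes of the superframe index at the end of the chunk,
 * or 0 if the chunk does not end with a well-formed index. */
static size_t superframe_index_size(const uint8_t *data, size_t len)
{
    if (len == 0)
        return 0;

    const uint8_t header = data[len - 1];

    /* The top three bits of the final byte must equal the marker 0b110. */
    if ((header & 0xE0) != 0xC0)
        return 0;

    const size_t frames     = (size_t)(header & 0x07) + 1;        /* 1..8 */
    const size_t size_bytes = (size_t)((header >> 3) & 0x03) + 1; /* 1..4 */

    /* Two copies of the header byte plus one size entry per frame. */
    const size_t index_size = 2 + size_bytes * frames;

    if (len < index_size)
        return 0;
    /* The byte that opens the index must repeat the closing header byte. */
    if (data[len - index_size] != header)
        return 0;

    return index_size;
}
```

The largest index arises with 8 frames and 4 bytes per size entry: 2 + 4 * 8 = 34 bytes.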
Bibliography
[1] Recommendation ITU-R BT.601-7 (2011), Studio encoding parameters of digital television for
standard 4:3 and wide screen 16:9 aspect ratios.
[2] Recommendation ITU-R BT.709-6 (2015), Parameter values for the HDTV standards for production
and international programme exchange.
[3] SMPTE ST 170 (2004), Television – Composite Analog Video Signal – NTSC for Studio Applications.
[4] SMPTE ST 240 (1999), For Television – 1125-Line High-Definition Production Systems – Signal
Parameters.
[5] Recommendation ITU-R BT.2020-2 (2015), Parameter values for ultra-high definition television
systems for production and international programme exchange.
[6] IEC 61966-2-1 (1999), Multimedia systems and equipment – Colour measurement and management
– Part 2-1: Colour management – Default RGB colour space – sRGB.