VP9

Uploaded by sayssandeep5
Copyright © All Rights Reserved

Here is an extended summary of the VP9 Bitstream & Decoding Process Specification v0.6 (March 31,
2016), broken down into more detail for each key section:

1. Introduction and Scope (Pages 1-2)

The VP9 codec, developed by Google, is designed for efficient video compression and decoding. It is
intended to reduce the bandwidth and storage needed for high-resolution video (e.g., 1920x1080
resolution at 30 frames per second). The specification defines the VP9 bitstream format and the
decoding process. It fully describes the decoding process, which must be standardized for playback
compatibility. It does not define the encoding process, so different VP9 encoders are free to choose
how they compress video data.

VP9 is widely used for online streaming, including platforms like YouTube, where bandwidth
efficiency and video quality are paramount.

2. Terms and Definitions (Pages 2-5)

The document provides a comprehensive list of technical definitions that are used throughout the
specification, which are important for understanding how VP9 operates. Key terms include:

• Bitstream: The compressed sequence of bits that represent video frames.

• Transform coefficients: Numerical values that represent the frequency content of a block of pixels
(or of its prediction residual). The DC coefficient is the zero-frequency term, which gives the block's
average level (e.g., average brightness for luma). All other coefficients are AC coefficients.

• Motion vectors: 2D vectors giving the displacement between a block and the area of a reference
frame used to predict it.

• Intra-frame: A self-contained frame that does not rely on data from other frames.

• Inter-frame: A frame compressed by referencing previously decoded frames, usually smaller in size.

• Quantization: A lossy compression process where data is reduced by rounding, trading off
precision for smaller data sizes.

Understanding these terms helps grasp the functioning of the VP9 codec, where frames are divided
into blocks and compressed using various transform, prediction, and encoding techniques.

3. Bitstream Structure (Pages 6-9)

The bitstream consists of a structured sequence of bits that encodes video data. The bitstream is
divided into several key elements:

1. Frames: Each frame in the bitstream can either be a keyframe (independent) or an inter-frame
(dependent on previously decoded frames).

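2. Headers: Each frame begins with an uncompressed header and a compressed header. The
uncompressed header stores basic metadata (e.g., frame size, color space, quantization and loop filter
settings). The compressed header mainly carries probability updates for the entropy coder, plus a few
coding-mode settings such as the transform mode.

3. Prediction Blocks: Each frame is partitioned into 64x64-pixel superblocks. These can be split into
smaller blocks or sub-blocks for areas with more detail or motion.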

VP9 allows for extensive customization in how frames and blocks are represented, with varying block
sizes, prediction modes, and quantization factors to balance compression efficiency and quality.

4. Decoding Process Overview (Pages 12-26)

4.1 Purpose of VP9

The goal of VP9 is to provide an efficient way to store and transmit video by significantly reducing the
bandwidth required. A raw uncompressed video (e.g., 1920x1080 resolution, 30 frames per second, 8-bit
4:2:0 samples) could require over 700 million bits per second. VP9 compresses this by exploiting
redundancies in both spatial (within a frame) and temporal (across frames) domains.
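The sketch below is a quick check of the 700-million figure. It counts the bits in uncompressed video, assuming 8-bit samples and 4:2:0 subsampling: one full-resolution luma plane plus two chroma planes at half width and half height, rounded up. The helper `raw_bitrate_bps` and its parameters are illustrative and are not taken from the specification.

```python
def raw_bitrate_bps(width: int, height: int, fps: int, bit_depth: int = 8,
                    subsampling_x: int = 1, subsampling_y: int = 1) -> int:
    """Bits per second of uncompressed video (illustrative helper, not from the spec)."""
    luma_samples = width * height
    # Chroma plane size; a subsampling flag of 1 halves that dimension, rounding up.
    chroma_w = (width + subsampling_x) >> subsampling_x
    chroma_h = (height + subsampling_y) >> subsampling_y
    samples_per_frame = luma_samples + 2 * chroma_w * chroma_h
    return samples_per_frame * bit_depth * fps


if __name__ == "__main__":
    bps = raw_bitrate_bps(1920, 1080, 30)
    print(f"{bps:,} bits/s")  # 746,496,000 bits/s
```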

4.2 Compressing Image Data

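VP9 applies transforms to pixel data (in practice, to the residual left after prediction), converting
spatial data (pixel values) into frequency data. For example, consider a flat area of an image where most
pixel values are the same. Nearly all of its energy ends up in the single DC coefficient, and the
remaining coefficients are zero or close to it, so the region can be represented with very few bits.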
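The flat-area case can be shown with a textbook floating-point 2-D DCT-II on a 4x4 block. This is only an illustration. VP9 uses its own integer approximations of the DCT, and `dct_2d` is a hypothetical helper.

```python
import math


def dct_2d(block):
    """Orthonormal floating-point 2-D DCT-II of an N x N block (illustrative;
    VP9 itself uses integer transform approximations)."""
    n = len(block)

    def scale(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * s
    return out


flat = [[100] * 4 for _ in range(4)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 3))  # 400.0: all the energy is in the DC term
nonzero_ac = [(u, v) for u in range(4) for v in range(4)
              if (u, v) != (0, 0) and abs(coeffs[u][v]) > 1e-9]
print(nonzero_ac)  # []: every AC coefficient is zero
```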

4.3 Quantization and Lossy Compression

VP9 uses quantization to achieve lossy compression. Transform coefficients are scaled and rounded
to reduce their precision, saving bits. The larger the quantization factor, the more aggressive the
compression and the greater the quality loss.
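The sketch below shows the quantization trade-off. It assumes a single uniform step size and round-to-nearest on the encoder side. The specification leaves the encoder's rounding choice open; the decoder only scales each decoded level back up by a dequantization factor. `quantize` and `dequantize` are hypothetical names.

```python
def quantize(coeffs, step):
    """Encoder side (assumed rule): divide by the step size, round to nearest."""
    return [int(round(c / step)) for c in coeffs]


def dequantize(levels, step):
    """Decoder side: scale the transmitted levels back up by the step size."""
    return [level * step for level in levels]


coeffs = [400, -37, 12, 5, -3, 1]
for step in (4, 16, 64):
    levels = quantize(coeffs, step)
    recon = dequantize(levels, step)
    error = sum(abs(a - b) for a, b in zip(coeffs, recon))
    print(f"step={step:2d} levels={levels} total abs error={error}")
```

Larger steps leave fewer non-zero levels to transmit, but they reconstruct the coefficients less accurately.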

4.4 Predicting Image Data

VP9 employs intra prediction to estimate the pixel values of a block based on neighboring pixels
within the same frame. Different prediction modes are used to match patterns in the image. VP9 defines
ten intra modes: DC (average), vertical, horizontal, six diagonal directions, and TrueMotion (TM).
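The sketch below covers three of the intra modes, predicting a 4x4 block from the row above it and the column to its left. The mode names follow VP9's (`V_PRED`, `H_PRED`, `DC_PRED`). The function itself is a simplified illustration: it ignores the edge-availability rules the real decoder applies at frame borders.

```python
def predict_intra(mode, above, left):
    """Build an N x N intra prediction (simplified illustration).

    above: the N reconstructed pixels directly above the block.
    left:  the N reconstructed pixels directly to its left.
    Only three of VP9's ten intra modes are sketched here.
    """
    n = len(above)
    if mode == "V_PRED":   # copy the row above downwards
        return [list(above) for _ in range(n)]
    if mode == "H_PRED":   # copy the left column across
        return [[left[r]] * n for r in range(n)]
    if mode == "DC_PRED":  # rounded average of all 2N neighbours
        avg = (sum(above) + sum(left) + n) // (2 * n)
        return [[avg] * n for _ in range(n)]
    raise ValueError(f"mode not sketched: {mode}")


above = [10, 20, 30, 40]
left = [50, 60, 70, 80]
for mode in ("V_PRED", "H_PRED", "DC_PRED"):
    print(mode, predict_intra(mode, above, left))
```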

4.5 Inter Prediction and Motion Compensation

Inter prediction allows blocks to be predicted from previous frames. Motion vectors are used to
track objects or camera motion between frames, further reducing the amount of data that needs to
be stored. Inter-frame prediction is a major contributor to bandwidth efficiency in VP9.
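The simplest form of motion compensation uses a whole-pixel motion vector: the prediction is just a block copied from the reference frame at the displaced position. Clamping out-of-frame coordinates to the nearest edge pixel is an assumption in this sketch. Fractional vectors and the interpolation they require are left out, and `motion_compensate` is a hypothetical helper.

```python
def motion_compensate(ref, x, y, size, mv_row, mv_col):
    """Copy a size x size block at (x, y) from a reference frame, displaced by a
    whole-pixel motion vector (mv_row, mv_col). Coordinates outside the frame
    are clamped to the nearest edge pixel (assumed border handling)."""
    h, w = len(ref), len(ref[0])
    block = []
    for r in range(size):
        ry = min(max(y + r + mv_row, 0), h - 1)
        row = []
        for c in range(size):
            rx = min(max(x + c + mv_col, 0), w - 1)
            row.append(ref[ry][rx])
        block.append(row)
    return block


ref = [[r * 10 + c for c in range(8)] for r in range(8)]
print(motion_compensate(ref, x=2, y=2, size=2, mv_row=1, mv_col=-1))  # [[31, 32], [41, 42]]
```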

5. Superblocks and Partitioning (Pages 15-16)

VP9 divides frames into superblocks (64x64 pixels) that are further partitioned based on image
complexity. Each superblock can be split into smaller blocks, down to 4x4 pixels. This hierarchical
partitioning allows for flexible adaptation to different levels of detail within the frame.

• Large, uniform areas (e.g., skies) can be encoded as a single large block, reducing overhead.

• Detailed or fast-moving areas (e.g., faces, objects in motion) are split into smaller blocks to
capture detail and motion.

Partitioning strategies can vary based on the content and desired balance between compression
efficiency and image quality.
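The recursive splitting can be sketched as a quadtree. The split rule used here (keep splitting while the pixel variance exceeds a threshold) is a hypothetical encoder heuristic, because the specification only defines how the chosen partitioning is signaled. VP9 also allows horizontal and vertical two-way splits, which this sketch omits.

```python
def partition(block, x, y, size, min_size=4, threshold=100.0):
    """Recursively split a square region; returns (x, y, size) leaves.
    The variance-based split decision is a hypothetical encoder heuristic."""
    pixels = [block[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    if size <= min_size or variance <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += partition(block, x + dx, y + dy, half, min_size, threshold)
    return leaves


sb = [[0] * 64 for _ in range(64)]          # a flat 64x64 superblock...
for r in range(8):
    for c in range(8):
        sb[r][c] = 255 if (r + c) % 2 else 0  # ...with one detailed 8x8 corner
leaves = partition(sb, 0, 0, 64)
print(len(leaves), "blocks, sizes:", sorted({s for _, _, s in leaves}))  # 13 blocks, sizes: [4, 8, 16, 32]
```

Only the detailed corner ends up in small blocks. The flat remainder of the superblock is covered by a few large ones.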

6. Transform Techniques and Inverse Transforms (Pages 16-19)

VP9 applies various transform techniques to compress image data, converting pixel values into
frequency data using:

1. Discrete Cosine Transform (DCT): Widely used for block-based compression, the DCT
captures spatial frequency information. Low-frequency components, which represent broad
image details, carry most of the energy. High-frequency components (e.g., fine details or noise)
are usually small and are often quantized to zero.

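2. Asymmetric Discrete Sine Transform (ADST): Used for intra-predicted blocks at sizes up to 16x16,
optionally combined with the DCT in the other direction. The ADST better matches residuals that grow
larger with distance from the predicted edge.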

During decoding, the decoder applies inverse transforms to convert the frequency data back into
pixel values. The encoder chooses the transform size (4x4, 8x8, 16x16, or 32x32) based on the block
size and content, and signals it in the bitstream.
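The sketch below illustrates the inverse step, and the effect of discarding high frequencies, using a 1-D floating-point DCT on an 8-sample row. Keeping every coefficient reconstructs the row exactly, up to floating-point rounding. Zeroing the upper half of the coefficients gives a smoothed approximation. These helpers are illustrative and are not VP9's integer transforms.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II (floating point; illustrative only)."""
    n = len(x)
    return [
        (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
        * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum((math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for k in range(n))
        for i in range(n)
    ]


row = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct_1d(row)
exact = idct_1d(coeffs)                     # all coefficients kept
low_only = idct_1d(coeffs[:4] + [0.0] * 4)  # high frequencies discarded
print([round(v) for v in exact])
print([round(v, 1) for v in low_only])
```

Because the transform is orthonormal, the squared reconstruction error equals the energy of the discarded coefficients.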

7. Reference Frames and Motion Vectors (Pages 19-20)

VP9 allows up to three reference frames for inter prediction in each frame (known as LAST, GOLDEN,
and ALTREF). These are selected from eight reference frame slots stored in memory. This flexibility
allows the codec to choose the best reference frame(s) for each block, improving compression efficiency.
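Slot management can be sketched as follows. To our understanding, the frame header carries three indices (`ref_frame_idx`) that choose the active references from the eight slots. It also carries an 8-bit mask (`refresh_frame_flags`) that tells which slots the newly decoded frame replaces. The `RefPool` class itself is hypothetical.

```python
from dataclasses import dataclass, field

NUM_REF_FRAMES = 8                        # slots in the reference pool
REF_NAMES = ("LAST", "GOLDEN", "ALTREF")  # the three active references


@dataclass
class RefPool:
    """Hypothetical model of the decoder's reference frame slots."""
    slots: list = field(default_factory=lambda: [None] * NUM_REF_FRAMES)

    def active_refs(self, ref_frame_idx):
        """Map the three per-frame slot indices to the frames they select."""
        if len(ref_frame_idx) != len(REF_NAMES):
            raise ValueError("expected three reference indices")
        return {name: self.slots[i] for name, i in zip(REF_NAMES, ref_frame_idx)}

    def refresh(self, frame, refresh_frame_flags):
        """Store the newly decoded frame in every slot whose bit is set."""
        for i in range(NUM_REF_FRAMES):
            if (refresh_frame_flags >> i) & 1:
                self.slots[i] = frame


pool = RefPool()
pool.refresh("frame0", 0xFF)        # a key frame refreshes all eight slots
pool.refresh("frame1", 0b00000001)  # later frames may update just one slot
pool.refresh("frame2", 0b00000010)
print(pool.active_refs([1, 0, 7]))
# {'LAST': 'frame2', 'GOLDEN': 'frame1', 'ALTREF': 'frame0'}
```

A slot is only overwritten when its bit is set. This is how a golden or altref frame can stay available for many subsequent frames.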

• Golden frames and altref (alternate reference) frames are special reference frames that can
be preserved for multiple inter frames.

• Compound prediction allows two reference frames to be blended for better prediction
accuracy in some cases.
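Here is a minimal sketch of compound prediction. It assumes the two predictions are blended with a rounded per-sample average. The exact rounding the decoder uses is defined in the specification and may differ from this simplification. `compound_average` is a hypothetical helper.

```python
def compound_average(pred_a, pred_b):
    """Blend two equal-sized predictions with a rounded average,
    (a + b + 1) >> 1, sample by sample (assumed rounding rule)."""
    return [[(a + b + 1) >> 1 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(pred_a, pred_b)]


print(compound_average([[100, 101]], [[110, 104]]))  # [[105, 103]]
```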

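Motion vectors indicate the displacement of each block relative to its reference frame. VP9 supports
sub-pixel motion vectors (down to 1/8-pixel precision). To form the prediction, the decoder interpolates
the reference frame with 8-tap (or bilinear) filters, which improves prediction accuracy and video quality.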
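Sub-pixel prediction can be sketched with bilinear interpolation, assuming motion vector components in 1/8-pixel units. VP9 normally uses 8-tap interpolation filters, and its own bilinear filter has different precision and rounding. Treat this as a simplified stand-in; `predict_subpel` is hypothetical.

```python
def predict_subpel(ref, x, y, size, mv_row_8th, mv_col_8th):
    """Bilinear prediction of a size x size block at a 1/8-pel offset
    (simplified stand-in for VP9's interpolation filters)."""
    h, w = len(ref), len(ref[0])

    def px(r, c):  # clamp coordinates to the frame edges (assumed border rule)
        return ref[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    int_r, frac_r = divmod(mv_row_8th, 8)  # floor division also handles negatives
    int_c, frac_c = divmod(mv_col_8th, 8)
    out = []
    for r in range(size):
        row = []
        for c in range(size):
            ry, rx = y + r + int_r, x + c + int_c
            top = px(ry, rx) * (8 - frac_c) + px(ry, rx + 1) * frac_c
            bottom = px(ry + 1, rx) * (8 - frac_c) + px(ry + 1, rx + 1) * frac_c
            value = top * (8 - frac_r) + bottom * frac_r
            row.append((value + 32) >> 6)  # divide by 64 with rounding
        out.append(row)
    return out


ref = [[c * 8 for c in range(8)] for _ in range(8)]  # horizontal ramp
print(predict_subpel(ref, 0, 0, 2, 0, 4))  # half-pel to the right: [[4, 12], [4, 12]]
```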

8. Loop Filtering and Deblocking (Pages 23-24)

After decoding, loop filters are applied to reduce the visibility of block boundaries, improving the
overall image quality. Deblocking filters smooth the transition across block edges where the step is
small enough to be a compression artifact, which is most noticeable in smooth areas. Strong, genuine
image edges are left intact. This prevents the "blockiness" artifacts that are common in compressed video.

Loop filtering is applied after each frame is decoded but before the frame is displayed or used as a
reference for future frames.
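The decision the deblocking filter makes at each edge can be sketched as follows. It filters only when both sides are fairly flat and the step across the boundary is small; otherwise it treats the boundary as a real edge. The thresholds and the update rule are hypothetical. VP9's actual filters derive their thresholds from the frame's filter level and sharpness, and they can modify several pixels on each side.

```python
def deblock_edge(p1, p0, q0, q1, limit=10, edge_threshold=40):
    """Simplified filter across one block edge: p1 p0 | q0 q1.
    Thresholds and update rule are hypothetical, not VP9's filter."""
    flat_sides = abs(p1 - p0) <= limit and abs(q1 - q0) <= limit
    if not (flat_sides and abs(p0 - q0) <= edge_threshold):
        return p1, p0, q0, q1      # likely a real edge: leave it untouched
    delta = (q0 - p0) // 4         # pull both boundary samples toward each other
    return p1, p0 + delta, q0 - delta, q1


print(deblock_edge(100, 100, 120, 120))  # (100, 105, 115, 120): small step, smoothed
print(deblock_edge(20, 20, 200, 200))    # (20, 20, 200, 200): real edge, untouched
```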

9. Probability Adaptation and Arithmetic Coding (Pages 21-22)

VP9 uses arithmetic coding, which allows for highly efficient bit allocation, especially when some
symbols are more frequent than others. For example, motion vectors that are small or near-zero
occur more often, so fewer bits are allocated to represent them.
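The decoder below follows the boolean decoding procedure as we understand the VP9 specification to describe it. The range is kept between 128 and 255, and a split point is computed from an 8-bit probability `p` that the next bit is 0. The matching encoder is a simplified arbitrary-precision version written only to produce test input; it is not libvpx's encoder. The specification's initialization and padding checks are omitted, and all names are illustrative.

```python
def bool_encode(symbols):
    """Encode (bit, p) pairs, where p (1..255) is the probability, out of 256,
    that the bit is 0. Simplified test encoder: Python's unbounded integers
    replace the carry propagation a real encoder needs."""
    low, rng, shifts = 0, 255, 0
    for bit, p in symbols:
        split = 1 + (((rng - 1) * p) >> 8)
        if bit:
            low += split
            rng -= split
        else:
            rng = split
        while rng < 128:  # renormalize so the range stays in [128, 255]
            rng <<= 1
            low <<= 1
            shifts += 1
    nbits = 8 + shifts
    pad = -nbits % 8      # zero-pad to whole bytes
    return (low << pad).to_bytes((nbits + pad) // 8, "big")


class BoolDecoder:
    """Boolean decoder keeping `value` below `range`, with range in [128, 255]."""

    def __init__(self, data: bytes):
        self.data = data
        self.bit_pos = 0
        self.value = 0
        for _ in range(8):
            self.value = (self.value << 1) | self._next_bit()
        self.range = 255

    def _next_bit(self) -> int:
        if self.bit_pos >= 8 * len(self.data):
            return 0      # past the end of the data: read zeros
        byte = self.data[self.bit_pos >> 3]
        bit = (byte >> (7 - (self.bit_pos & 7))) & 1
        self.bit_pos += 1
        return bit

    def read_bool(self, p: int) -> int:
        split = 1 + (((self.range - 1) * p) >> 8)
        if self.value < split:  # value lies in the "0" sub-interval
            self.range = split
            bit = 0
        else:                   # value lies in the "1" sub-interval
            self.range -= split
            self.value -= split
            bit = 1
        while self.range < 128:  # renormalize, shifting in new bits
            self.range <<= 1
            self.value = (self.value << 1) | self._next_bit()
        return bit


symbols = [(0, 240)] * 100 + [(1, 240)] * 3  # mostly zeros, well predicted
data = bool_encode(symbols)
decoder = BoolDecoder(data)
assert [decoder.read_bool(p) for _, p in symbols] == [b for b, _ in symbols]
print(len(symbols), "bits coded in", len(data), "bytes")
```

The better the probabilities match the data, the fewer bytes the coder produces. With accurate probabilities, each likely zero costs only a small fraction of a bit.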
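
Additionally, VP9 adapts the probability models used for coding to the actual data being processed.
There are two mechanisms. Probabilities can be updated explicitly in the compressed frame header
(forward updates). Unless error-resilient or frame-parallel decoding mode is enabled, they are also
adapted at the end of each frame from the symbol counts observed while decoding it (backward
adaptation). Together these allow the codec to adjust to different types of video content dynamically.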
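Backward adaptation can be sketched as blending each probability with the frequency actually observed during the frame. The formula below follows our understanding of the specification's `merge_prob` procedure. The defaults `count_sat=20` and `max_update_factor=128` are assumptions, and other syntax elements (such as coefficient probabilities) use different constants.

```python
def round2(x: int, n: int) -> int:
    """Divide by 2**n, rounding to nearest (halves round up)."""
    return (x + (1 << (n - 1))) >> n


def merge_prob(pre_prob, ct0, ct1, count_sat=20, max_update_factor=128):
    """Blend the probability a frame started with (of a 0, out of 256) with the
    frequency observed while decoding it: ct0 zeros and ct1 ones.
    The default constants are assumptions."""
    den = ct0 + ct1
    if den == 0:
        prob = 128
    else:
        prob = min(max((ct0 * 256 + (den >> 1)) // den, 1), 255)
    count = min(den, count_sat)  # trust at most count_sat observations
    factor = max_update_factor * count // count_sat
    return round2(pre_prob * (256 - factor) + prob * factor, 8)


print(merge_prob(128, 0, 0))   # 128: nothing observed, nothing changes
print(merge_prob(128, 20, 0))  # 192: a full update moves halfway toward 255
print(merge_prob(128, 2, 0))   # 134: few observations, small step
```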

10. Chroma Subsampling and High Bit Depth Support (Pages 22-23)

To reduce data size without a significant reduction in perceived quality, VP9 supports chroma
subsampling, where the color information (chroma) is sampled at a lower resolution than brightness
(luma). VP9 primarily uses the 4:2:0 format, where color information is halved in both horizontal and
vertical directions. Profile 0 supports only 4:2:0, while profiles 1 and 3 add 4:2:2, 4:4:0, and 4:4:4.

VP9 also supports high bit-depth video (profiles 2 and 3), allowing 10 or 12 bits per sample instead of
the standard 8. This enhances color fidelity, especially in HDR (High Dynamic Range) content.

11. Superframes and Tiling (Pages 20-21)

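Superframes combine multiple frames into a single unit for easier transmission and decoding. A typical
use is packing a frame that is decoded but not displayed (such as an alternate reference frame) together
with a displayed frame. That way, each chunk delivered by a container or stream still corresponds to one
shown frame.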
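To our understanding, a superframe carries an index at the end of the chunk, which begins and ends with the same marker byte. The marker's top three bits are `110`. The next two bits give the number of bytes per frame size, minus one, and the low three bits give the number of frames, minus one. The sizes themselves are stored little-endian. Below is a sketch of a parser under that assumption; `split_superframe` is a hypothetical name.

```python
def split_superframe(chunk: bytes):
    """Split a chunk into frames using a trailing superframe index
    (layout as assumed above). Returns [chunk] if no valid index is found."""
    if not chunk:
        return []
    marker = chunk[-1]
    if (marker & 0xE0) != 0xC0:               # top three bits must be 110
        return [chunk]
    size_bytes = ((marker >> 3) & 0x3) + 1
    frames = (marker & 0x7) + 1
    index_len = 2 + size_bytes * frames
    if len(chunk) < index_len or chunk[-index_len] != marker:
        return [chunk]                        # no matching opening marker
    pos = len(chunk) - index_len + 1
    out, offset = [], 0
    for _ in range(frames):
        size = int.from_bytes(chunk[pos:pos + size_bytes], "little")
        pos += size_bytes
        out.append(chunk[offset:offset + size])
        offset += size
    return out


f1, f2 = b"\x01\x02\x03", b"\x04\x05\x06\x07\x08"
chunk = f1 + f2 + bytes([0xC1, 3, 5, 0xC1])  # 2 frames, 1-byte sizes
print(split_superframe(chunk))  # [b'\x01\x02\x03', b'\x04\x05\x06\x07\x08']
```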

Tiles divide a frame into regions. Tile columns can be parsed and decoded independently, which
facilitates parallel processing, while tile rows still depend on the rows above them. This is particularly
useful for multi-core processors, enabling faster decoding and playback.
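Tile column boundaries are derived from the frame width, measured in 8x8 units. The sketch below follows the offset computation we understand the specification to use (`get_tile_offset`), which splits the superblock columns as evenly as possible. VP9 also limits how many tile columns a given width allows (each must be at least 256 pixels wide), and this sketch does not check that limit. `tile_column_bounds` is a hypothetical helper.

```python
def get_tile_offset(tile_num, mi_cols, tile_cols_log2):
    """Start column, in 8x8 mode-info units, of tile column tile_num."""
    sb_cols = (mi_cols + 7) >> 3  # 64x64 superblocks across the frame
    offset = ((tile_num * sb_cols) >> tile_cols_log2) << 3
    return min(offset, mi_cols)


def tile_column_bounds(frame_width, tile_cols_log2):
    """(start, end) pixel columns of each tile column (hypothetical helper)."""
    mi_cols = (frame_width + 7) >> 3
    n = 1 << tile_cols_log2
    return [(get_tile_offset(i, mi_cols, tile_cols_log2) * 8,
             get_tile_offset(i + 1, mi_cols, tile_cols_log2) * 8)
            for i in range(n)]


print(tile_column_bounds(1920, 2))
# [(0, 448), (448, 960), (960, 1408), (1408, 1920)]
```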

12. Parsing and Syntax Elements (Pages 27-54)

This section covers the syntax elements of the VP9 bitstream, including how different types of data
(e.g., motion vectors, coefficients, and prediction modes) are parsed from the bitstream during
decoding. The syntax elements are often encoded using boolean arithmetic coding for efficient
storage.

Parsing follows a hierarchical structure, where higher-level elements like frame headers are parsed
first, followed by smaller elements like transform coefficients and motion vectors.
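The first step of that hierarchy can be sketched by reading the opening fields of the uncompressed frame header with a fixed-width bit reader (the specification's `f(n)` descriptor). The field order follows our reading of the specification, and only the first few fields are parsed. `BitReader` and `parse_header_start` are hypothetical names.

```python
class BitReader:
    """Reads MSB-first fixed-width fields, like the spec's f(n) descriptor."""

    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def f(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def parse_header_start(data: bytes) -> dict:
    """Parse the opening fields of a VP9 uncompressed frame header (sketch)."""
    r = BitReader(data)
    if r.f(2) != 2:
        raise ValueError("bad frame_marker")
    low, high = r.f(1), r.f(1)
    hdr = {"profile": (high << 1) | low}
    if hdr["profile"] == 3:
        r.f(1)                               # reserved_zero
    hdr["show_existing_frame"] = r.f(1)
    if hdr["show_existing_frame"]:
        hdr["frame_to_show_map_idx"] = r.f(3)
        return hdr
    hdr["frame_type"] = r.f(1)               # 0 = KEY_FRAME
    hdr["show_frame"] = r.f(1)
    hdr["error_resilient_mode"] = r.f(1)
    if hdr["frame_type"] == 0:
        if bytes(r.f(8) for _ in range(3)) != b"\x49\x83\x42":
            raise ValueError("bad frame_sync_code")
    return hdr


print(parse_header_start(bytes([0x82, 0x49, 0x83, 0x42])))
# {'profile': 0, 'show_existing_frame': 0, 'frame_type': 0, 'show_frame': 1, 'error_resilient_mode': 0}
```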

Conclusion:

The VP9 Bitstream & Decoding Process Specification provides an in-depth look at how VP9 achieves
high-efficiency video compression through various techniques such as transform coding, prediction,
motion compensation, and probability modeling. It is a highly flexible codec designed

Video Quality (PSNR/SSIM Metrics)

Objective quality metrics like PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity
Index) are used to compare visual quality after compression.

• VP9 often scores better than H.264 in both PSNR and SSIM, delivering better quality at the same
bitrate or similar quality at a lower bitrate.

• Compared to H.265, VP9 offers similar PSNR and SSIM scores, though H.265 may outperform
VP9 slightly at higher resolutions (4K and beyond) and for low-bitrate scenarios.

• Overall, H.265 tends to have a slight edge over VP9 in high-quality and high-resolution video
compression, while VP9 is more comparable or even superior in web video streaming
contexts (where both quality and speed matter).
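PSNR has a simple closed form, 10 * log10(MAX^2 / MSE). Here MAX is the largest sample value (255 for 8-bit video) and MSE is the mean squared error between original and decoded samples. Below is a minimal sketch; `psnr` is a hypothetical helper. SSIM, which compares local means, variances, and covariances, is more involved and is not shown.

```python
import math


def psnr(original, decoded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


orig = [50, 60, 70, 80]
print(round(psnr(orig, [51, 59, 71, 79]), 2))  # MSE = 1, so 48.13 dB
```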

Latency

Latency is critical for real-time applications such as video conferencing and live streaming.

• H.264 has lower latency compared to VP9 and H.265 because it requires fewer
computational resources.

• VP9 tends to have higher encoding and decoding latency compared to H.264, making it less
ideal for real-time applications unless optimized.

• H.265 can also introduce more latency than H.264 due to its increased complexity, though
hardware acceleration can help mitigate this.

Licensing and Costs

• H.264 is covered by patents licensed through the MPEG LA patent pool. Most users are
covered under general-use terms, but licensing fees apply to certain commercial uses.

• H.265 also has licensing fees, which are considered more complex and costly than those for
H.264. The need to pay royalties has slowed H.265's adoption in some cases.

• VP9 is royalty-free, developed under an open-source model by Google, making it attractive
for platforms like YouTube, as it avoids licensing costs.

Summary of Comparison Between VP9, H.264, and H.265

| Performance Metric | VP9 | H.264 (AVC) | H.265 (HEVC) |
|---|---|---|---|
| Compression Efficiency | 30-50% more efficient than H.264 | Baseline | 5-15% better than VP9 |
| Encoding Speed | Slower than H.264 and H.265 (~2-3x) | Fastest | Faster than VP9 but slower than H.264 |
| Decoding Speed | Slower than H.264 but improving | Fastest (widespread HW support) | Slowest, requires HW acceleration |
| Video Quality | Better than H.264, similar to H.265 | Lower than VP9 and H.265 | Slightly better than VP9 in high-bitrate cases |
| Hardware Support | Growing (Android, Chrome, etc.) | Widely supported | Good but limited due to licensing |
| Licensing | Royalty-free | Requires licensing | Requires licensing, more costly |
| Latency | Higher than H.264 | Lowest | Can be high without HW acceleration |
| Use Cases | Streaming (YouTube), web videos | General use, web, lower resolutions | 4K, HDR, UHD broadcasting |
