Fuzzing JavaScript Engines with a Graph-based IR

Haoran Xu (NUDT, Changsha, China)
Zhiyuan Jiang∗ (NUDT, Changsha, China)
Yongjun Wang∗ (NUDT, Changsha, China)
Shuhui Fan (NUDT, Changsha, China)
Shenglin Xu (NUDT, Changsha, China)
Peidai Xie (NUDT, Changsha, China)
Shaojing Fu (NUDT, Changsha, China)
Mathias Payer (EPFL, Lausanne, Switzerland)

Abstract

Mutation-based fuzzing effectively discovers defects in JS engines. High-quality mutations are key for the performance of mutation-based fuzzers. The choice of the underlying representation (e.g., a sequence of tokens, an abstract syntax tree, or an intermediate representation) defines the possible mutation space and subsequently influences the design of mutation operators. Current program representations in JS engine fuzzers center around abstract syntax trees and customized bytecode-level intermediate languages. However, existing efforts struggle to generate semantically valid and meaningful mutations, limiting the discovery of defects in JS engines. Our proposed graph-based intermediate representation, FlowIR, directly represents the JS control flow and data flow as the mutation target. FlowIR is essential for the implementation of powerful semantic mutation. It supports mutation operators at the data flow and control flow level, thereby expanding the granularity of mutation operators. Experimental results show that our method is more effective in discovering new bugs. Our prototype, FuzzFlow, outperforms state-of-the-art fuzzers in generating valid test cases and exploring code coverage. In our evaluation, we detected 37 new defects in thoroughly tested mainstream JS engines.

CCS Concepts

• Security and privacy → Software security engineering.

Keywords

fuzzing, JavaScript engine, mutation, control flow, data flow

ACM Reference Format:

Haoran Xu, Zhiyuan Jiang, Yongjun Wang, Shuhui Fan, Shenglin Xu, Peidai Xie, Shaojing Fu, and Mathias Payer. 2024. Fuzzing JavaScript Engines with a Graph-based IR. In Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security (CCS ’24), October 14–18, 2024, Salt Lake City, UT, USA. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3658644.3690336

∗ Corresponding author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CCS ’24, October 14–18, 2024, Salt Lake City, UT, USA
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0636-3/24/10
https://doi.org/10.1145/3658644.3690336

1 Introduction

JavaScript (JS) is the most popular programming language and drives the modern Web [54]. An astonishing 98.8% of websites execute JS on the client side [45]. As of 2022, JS has more code repositories than any other language on GitHub [15]. The JS engine serves as a language processor to compile and execute JS code. It is integrated into Web browsers to facilitate dynamic features of websites. In recent years, a notable series of high-risk vulnerabilities [40] have emerged in widely used JS engines, posing substantial security risks for billions of users. Adversaries can chain successful attacks with an escape from the browser sandbox, gaining unauthorized privileges by crafting malicious Web pages and enticing victims to access them [41]. It is imperative to proactively identify potential defects in JS engines to protect users against attacks.

Fuzzing is an effective automated bug discovery approach for JS engines [21, 24, 35, 44, 46]. Mutation-based fuzzers [53, 56] create new test cases by mutating existing seeds. They are effective in exploring the input space surrounding existing inputs, thereby potentially uncovering unexpected edge cases or vulnerabilities [1, 20, 37, 47, 48].

Efficient mutation operators are key to the performance of mutational fuzzers [19]. Designing effective mutation operators for JS engines raises several challenges: C1-Validity. Mutations must produce test cases with correct syntax and semantics to prevent the engine from rejecting the test case during the early stages of the engine’s processing (e.g., parsing, semantic analysis). C2-Semantically meaningful mutation. The attack surface of JS engines predominantly resides in the engine’s backend [42], which processes the semantics of JS programs. Syntactic information is largely discarded during the parsing stage. Effective testing of
backend components necessitates semantically meaningful mutations, going beyond mere syntactic changes. Achieving this requires thoughtfully altering the control flow or data flow of the program. C3-Mutation granularity. High-quality input corpora such as known proof of concept (PoC) inputs or regression test cases are deliberately designed to deal with vulnerable components like JIT compilers, which have specific control-flow conditions crucial for JIT compilation [37]. Effectively identifying vulnerabilities in JIT compilers necessitates refining the granularity of mutation. This entails preserving the control-flow structure within the seed to retain the trigger conditions, coupled with effective semantic mutations at the data-flow level.

The fuzzer’s input representation format dictates possible mutation operators. The representation defines the rules and mechanisms by which mutations are applied to existing seeds. Fuzzers for JS engines fall into three categories based on the representation of the JS program:

• Initially, fuzzers have mutated source code based on byte or token sequences [31, 55]. As this mutation is unaware of the syntax, the resulting test cases are prone to syntax errors, thereby diminishing their validity. The majority of generated inputs fail at the early parsing stage, and cannot proceed to more complex aspects of the implementations.
• Current methods mutate the Abstract Syntax Tree (AST). In contrast to token sequences, mutating the AST facilitates the generation of syntactically correct test cases [37, 47].
• Alternatively, mutation may happen at an intermediate language [20]. Fuzzilli devises a bytecode-level Intermediate Representation (IR) of the JS as the mutation target, aiming to produce high-quality inputs.

However, both AST and bytecode IR, commonly used for mutation, have limitations in exploring vulnerabilities in JS engine backends. AST mutations lack semantic constraints, often yielding test cases with semantic errors that cannot reach the backend (C1). Additionally, mutations on the AST generate many test cases with altered syntax but unchanged semantics, failing to adequately test complex interactions of the backend implementations (C2). Bytecode IR lacks explicit control and data flow, making semantically meaningful mutation implementation difficult (C2). For instance, identifying unused data flow in bytecode IR is challenging, leading to wasteful computing resource consumption due to a substantial number of mutations on invalid data flows. Designing fine-grained mutation operators for comprehensive data-flow-level mutation proves highly challenging, regardless of whether based on AST or bytecode IR (C3). Overall, the inherent characteristics of these representations constrain mutation operator design in fuzzing, rendering neither AST nor bytecode ideal choices for mutation.

We address the aforementioned gap by designing a new representation to support effective mutation operators. The mutation of a program requires code transformation, and one common scenario within this transformation is optimization [27]. Similarly, the mutation operator is closely related to the IR where the mutation is carried out. We introduce a graph IR named FlowIR. Our proposed IR directly represents the JS program’s control flow and data flow for mutation. Specifically, we explicitly model the control flow and data flow of JS programs with FlowIR, capturing the semantics of the program through the interconnected relationships between nodes. FlowIR supports bidirectional conversion to and from JS. During fuzzing, we maintain the seed queue in the FlowIR format. Based on FlowIR, we design a series of mutation operators. Mutations are performed on FlowIR, and subsequently, the mutated representation is converted back to source code as input for the tested engines.

To address C1, FlowIR emphasizes representing semantics rather than syntactic structures. This strategy facilitates the enforcement of semantic constraints during mutation. To address C2, FlowIR represents the control flow and data flow of the program directly. Mutations operate directly on the semantics of the seed, simplifying the implementation of meaningful semantic mutations. For instance, explicit modeling of the data flow allows for the easy identification of unused data flows in the seed, enabling the avoidance of mutation on invalid semantics. To address C3, this paper establishes a low coupling between control flow and data flow in FlowIR. This feature enables the fuzzer to concentrate on mutations within either the control-flow or data-flow subgraph independently, mitigating the need to simultaneously address their mutual influence and thereby minimizing the likelihood of introducing semantic errors. This enhances the operational granularity of mutations.

To validate the efficacy of our approach, we implement FuzzFlow as a prototype fuzzer that utilizes FlowIR. The experimental results demonstrate that FuzzFlow enhances the syntax correctness and semantic validity of generated test cases significantly. Low coupling between data flow and control flow enhances testing effectiveness for backend engine components. The validity of test cases generated by FuzzFlow reaches 72% (18.6% higher than the baselines), showcasing a remarkable improvement in code coverage by 4.78%. Moreover, mutations based on the FlowIR achieve high throughput, leading to efficient fuzzing. After applying our technique to six mainstream JS engines (V8 in Chrome, SpiderMonkey in Firefox, JavaScriptCore in Safari, ChakraCore, JerryScript, QuickJS), FuzzFlow has identified a total of 37 new defects. Our prototype will be available at https://github.com/walkcreate/FuzzFlow. We make the following contributions:

• This paper proposes FlowIR, a graph-based program representation used for mutation that directly represents the control and data flow of JS.
• Based on FlowIR, this paper proposes mutation operators with the following advantages: (1) facilitate the generation of semantically valid test cases, (2) enable semantically meaningful mutation, (3) enhance mutation at finer granularity.
• The experimental results demonstrate that FlowIR serves as a highly effective mutation target for fuzzing JS engines. Based on FlowIR, we implemented a new fuzzer named FuzzFlow, which has successfully identified 37 new defects in the mainstream JS engines.

2 Background

2.1 Challenges when Fuzzing JS Engines

Mainstream JS engines are composed of a parser, bytecode compiler, interpreter, JIT compiler, and supporting components. To enhance the execution efficiency, mainstream JS engines adopt a mixed compilation architecture. Specifically, they use a bytecode compiler
[18] as a baseline, complemented by one or even multiple levels of JIT compilers [16] to deeply optimize hot code, and compile it into machine instructions for execution. The introduction of a JIT compiler has significantly improved the execution efficiency of JS engines [25]. However, its complexity has also unveiled new attack surfaces, making JIT compilers a focal point for researchers [3, 40].

The attack surface of the JS engine predominantly resides in its backend components, notably represented by the JIT compiler [42] and garbage collector [39]. While the frontend handles lexical analysis and syntax analysis, the backend handles garbage collection, code optimization, and deoptimization. Backend functions involve complex logic with frequent updates. Uncovering defects in backend components imposes elevated demands on the mutation operators employed by fuzzers.

[Figure 1 (diagram): representation levels ordered from syntax to semantics. Source code: mutations on tokens/strings (token insertions, replacements, ...). Abstract syntax tree: mutations on subtrees (subtree insertions, replacements, ...). Bytecode IR: mutations on instructions (instruction insertions, replacements, ...). Graph IR: mutations on control and data flow (node/edge insertions, replacements, ...).]
Figure 1: Mutations for JS can be applied at different levels

     1  var v0 = 0;
     2  var v1 = "hello";
     3  v0 = 3;
     4  var v2 = v1 + "world";
     5  print(foo(v0, v2));
     6  // ------ mutation ------
     7  var v0 = 100;           // mutation on unused data flow
     8  var v1 = "hello";
     9  v0 = 3;
    10  var v2 = v1 + "world";
    11  var v3 = foo(v0, v2);   // mutation on syntax only
    12  print(v3);
Listing 1: An example of mutation on AST

    v0 <- LoadInt 0
    v1 <- LoadString "test"
    ...
    v23 <- LoadInt 0
    v24 <- LoadBuiltin "foo"
    CallMethod v24, v0
    // ------ mutation ------
    v0 <- LoadInt 0
    v1 <- LoadString "test"
    ...
    v23 <- LoadInt 0
    v24 <- LoadBuiltin "foo"
    CallMethod v24, v23     // mutation with the same data flow
Listing 2: An example of mutation on Bytecode-level IR
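To make the notion of "the same data flow" in Listing 2 concrete, the following is a minimal sketch of value numbering over a straight-line instruction list. The instruction encoding (`out`, `op`, `imm`, `inputs`) and the helper names are assumptions made for illustration; they are not Fuzzilli's data model, and the sketch only covers pure load instructions without control flow.

```javascript
// Hypothetical, simplified encoding of the seed in Listing 2.
// `imm` is an immediate operand, `inputs` are variable operands.
const seed = [
  { out: "v0",  op: "LoadInt",     imm: 0,      inputs: [] },
  { out: "v1",  op: "LoadString",  imm: "test", inputs: [] },
  { out: "v23", op: "LoadInt",     imm: 0,      inputs: [] },
  { out: "v24", op: "LoadBuiltin", imm: "foo",  inputs: [] },
  { out: null,  op: "CallMethod",  imm: null,   inputs: ["v24", "v0"] },
];

// Give every variable a value number. Two variables share a number when
// they are produced by the same pure operation on the same immediate and
// the same (already numbered) inputs.
function valueNumbers(program) {
  const table = new Map();    // canonical key -> value number
  const numberOf = new Map(); // variable name -> value number
  for (const ins of program) {
    if (ins.out === null) continue; // side-effecting call, not numbered
    const key = JSON.stringify([
      ins.op,
      ins.imm,
      ins.inputs.map((v) => numberOf.get(v)),
    ]);
    if (!table.has(key)) table.set(key, table.size);
    numberOf.set(ins.out, table.get(key));
  }
  return numberOf;
}

// Replacing operand `oldVar` by `newVar` leaves the data flow unchanged
// when both carry the same value number.
function isSemanticNoOp(program, oldVar, newVar) {
  const vn = valueNumbers(program);
  return vn.get(oldVar) === vn.get(newVar);
}

console.log(isSemanticNoOp(seed, "v0", "v23")); // true
console.log(isSemanticNoOp(seed, "v0", "v1"));  // false
```

In this sketch, swapping `v0` for `v23` is reported as a no-op because both names denote the constant 0, which is the situation Listing 2 shows. A real instruction-level IR would need this analysis to handle side effects and control flow as well, which is where the difficulty lies.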
2.2 Existing Mutation Targets and Operators

The JS engine exclusively receives input in the form of source code. Regardless of the chosen IR for mutation, such as AST or bytecode, a conversion process is required between the JS program and the selected IR. Although this conversion introduces computational overhead, opting for an IR instead of the token sequence significantly enhances the quality of generated test cases [47]. This improvement contributes to the overall effectiveness of fuzzing, explaining its widespread adoption among researchers.

The parsing process produces the AST. Given the existence of open-source parsers for most programming languages [2], AST stands out as the most readily accessible IR. In comparison to token sequences, AST-based mutations prove advantageous in preserving the syntactic validity of the code. Therefore, it serves as the predominant mutation target for the fuzzing of language processors [1, 23, 47]. However, mutations based on the AST often cause semantic errors (with the result that test cases are rejected early) [37]. Moreover, as it corresponds directly to the syntax structure, mutating at the AST-abstraction yields alterations in syntax but no change in semantics. Such mutations fail to reach the vulnerable backend of JS engines.

To enhance the capability of mutation operators in generating semantic mutations, Fuzzilli [20] conducts mutations on the bytecode IR developed in their work. They introduce FuzzIL, an IR comprised of instruction sequences. The supported mutation operators encompass the insertion and modification of instructions, as well as the mutation of instruction input. Unlike AST, bytecode-level IR is generated after specific semantic analysis, providing a more direct expression of semantics. However, FuzzIL does not explicitly correspond to the control flow and data flow, potentially resulting in meaningless semantic mutation.

Listing 1 highlights the mutation with changes in syntax but no change in semantics. At the first mutation position, the initial assignment to variable v0 in the seed is an unused data flow, because the value of the variable is reassigned to 3 before v0 is subsequently read. Therefore, there is no benefit in mutating the value 0 in line 1. Detecting such unused data flows based on the existing mutation targets requires challenging data flow analysis. In the second mutation position, the syntax tree of the test case has changed, yet the semantics of the program remain unchanged. Unfortunately, these mutations are inevitable with existing fuzzers. This type of mutation is incapable of testing the backend components of the JS engine.

To enhance the mutation of the seed’s data flow, Fuzzilli introduces several constraints on the bytecode IR. These constraints involve adopting a Static Single Assignment (SSA) paradigm [8, 9] and restricting instructions to accept only a single variable as input. While these constraints enhance the efficacy of semantic mutation, they do not fully address the underlying challenge. For instance, Listing 2 illustrates the analytical dilemma encountered when implementing semantic mutation based on bytecode IR. Variable v0 and v23 differ, yet they contain identical data. As a result, changing v0 to v23 does not modify the seed’s semantics, presenting a challenge in identifying these indistinguishable data flows based on bytecode IR.

2.3 Program Transformation and Graph IRs

The mutation of JS seeds involves transforming these programs. Compilation optimization transforms code to be more efficient.
However, a notable distinction is that optimization requires preserving the semantic invariance of the program, a constraint not applicable to mutation.

Optimizations are accomplished on IRs. IR serves as an intermediate layer between the source code and the machine code [28, 43]. Modern compilers often employ a variety of IRs. Common examples include the AST, bytecode IR, or graph-based IR. The AST represents syntax structure but lacks a direct representation of program semantics, resulting in challenges for the execution of semantic-related optimizations. Bytecode IR involves complex expression of control flow and data flow [7]. Optimizing at this level may require dealing with intricate branching and data manipulation, making it more challenging compared to higher-level representations.

Graph-based IRs [4] play a crucial role in modern compiler design by providing a structured unified representation that facilitates a wide range of analyses and optimizations across different programming languages [6]. Graph IRs utilize graph-based data structures to depict the control flow and data flow of a program. Generally, nodes symbolize program entities (e.g., constants, expressions, statements), and edges between nodes signify relationships or dependencies among program entities. Presently, graph-based IRs are widely used in modern compilers and contribute to a range of optimization tasks, including constant propagation, common sub-expression elimination, loop optimization, or function inlining [13]. Program Dependency Graph (PDG) [14] stands out as a widely adopted paradigm within the realm of graph IR. PDG explicitly models and represents the control-flow dependencies and data-flow dependencies of the program, offering a structured representation that assists the efficient deployment of many traditional optimization techniques. Illustrating on the JS engine, the TurboFan compiler [16] employed by V8 is developed on the graph-based TurboFan IR. Edges in TurboFan IR represent data flow, control flow, and dependencies.

Inspired by the code transformation scenario of optimization, this paper advocates for the use of graph IR as a mutation target to effectively achieve semantic mutations of JS programs. This is depicted in Figure 1.

3 Design

Figure 2 highlights the overall framework of FuzzFlow, the pioneer fuzzer that mutates programs using a graph IR. FuzzFlow comprises three components: JS2Graph, Mutation, and Graph2JS. At the start of fuzzing, JS2Graph compiles the initial seed set into FlowIR format. While fuzzing, Mutation randomly picks a seed from the queue and mutates it. Post-mutation, Graph2JS lifts test cases in FlowIR back to source code for execution. We detail the modules as follows.

[Figure 2 (diagram): three stages. JS2Graph: seeds (regression tests, PoCs) are converted by JS2Graph into FlowIR seeds, which form the seed queue; independent data flow subgraphs are extracted into a data flow subgraph pool. Mutation on FlowIR: control flow mutations, data flow mutations, and data flow splicing. Graph2JS & Testing: Graph2JS produces JS that is run on an instrumented JS engine; runtime feedback reports crashes, and interesting test cases are appended to the seed queue.]
Figure 2: Overall architecture of FuzzFlow

3.1 FlowIR

Our main contribution, FlowIR, expressively represents the control flow, data flow, and dependencies between JS entities. Our prototype implementation, FuzzFlow, utilizes FlowIR for JS fuzzing.

The first key difference between FlowIR and existing graph IRs is that FlowIR supports bidirectional conversion with source code, achieved through a careful redesign of nodes and edges. Bidirectional conversion enables the fuzzer to reflect on and improve seeds continuously, mapping execution feedback to concrete parts of the test input.

The second key difference between FlowIR and existing IRs is the precise definition of FlowIR’s functional scope, which prevents exposure to unnecessary information during mutation. This feature allows the fuzzer to avoid redundancy in the test inputs. Nodes are categorized into two types: Control Flow Nodes (CFN) capturing the program’s structural representation, and Data Flow Nodes (DFN) handling data-related operations. The label on a node indicates the operation it represents, with inputs to a node serving as inputs to the operation.

To construct FlowIR from a JS program, we conduct an intra-procedural analysis, associating each method with a corresponding graph. We merge control flow and data flow into a cohesive graph with minimal coupling. This unified graph provides two key benefits: it aids in JS code conversion and facilitates mutation operators involving both control flow and data flow, thereby enhancing flexibility in mutation operator design. The coupling is low, allowing independent mutations of control flow and data flow.

Definition 1: The FlowIR 𝐺 = (𝑉, 𝐸, 𝜆) is a directed, node-labeled, and edge-unlabeled graph where 𝑉 is a set of nodes, 𝐸 ⊆ (𝑉 × 𝑉) is a set of directed edges, and 𝜆 : 𝑉 → Σ is a node labeling function assigning a label from the alphabet Σ to each node.

This graph encompasses a Control-Flow subGraph (CFG) and a Data flow Dependency subGraph (DDG):

• Nodes represent control structures in the program, as well as operators and operands. The labels on the nodes represent the node’s semantic operation.
• The edges incident to a node represent both the data values on which the node’s operations depend and the control conditions on which the execution of the operations depends.

The set of all dependencies for a program is viewed as inducing a partial order on the operations in the program that must be followed to preserve the semantics of the original program. For improved analysis, FlowIR is in SSA form. Each node produces at most one value. 𝜙 functions are used at join points. 𝜙 nodes input several data values and output a single selected data value.

Definition 2: The CFG represents the partial order of statement execution, as defined by the semantics of JS.

The CFG comprises two node types: operator or instruction nodes with side effects (behavioral CFNs), and region nodes (structural CFNs) indicating control flow structures. Operator nodes with side effects include New and Delete for object creation and deletion, Load and Store for reading and writing objects, and Invoke for method calls. Instruction nodes denote control flow changes like return and throw statements. Operator and instruction nodes in CFG are fixed within the control flow. Motion of these nodes
will change the semantics of the program. Region nodes encapsulate control conditions, grouping nodes with identical conditions. FlowIR employs region nodes to represent control structures such as branches, loops, and exceptions. The distinguished entry node Start represents the beginning of the program.

Nodes in the CFG have successor edges indicating their possible next nodes. Edges between control flow nodes denote direct transitions. Each node in the graph has at most two successors. Nodes with two successors are assumed to have true and false attributes associated with their outgoing edges.

Definition 3: The DDG demonstrates the flow of values from definitions of a variable to its uses. The nodes in DDG consist of operators and operands.

For data flow representation, a node has input edges pointing to nodes providing its operands. The inputs to a node are inputs to the node’s operation. Operands encompass literals, variables, or expressions. Each node defines a value based on its inputs and operation, available on all output edges. All input edges signify scheduling dependencies. A node must be scheduled after its dependencies when lifting FlowIR to JS.

Control-flow structures form a backbone for data-flow nodes. Data flow nodes are solely restricted by their data dependencies. Formally, let X and Y be nodes in a DDG. There is a data dependence from X to Y with respect to a variable v iff:
(1) there is a non-null path p from X to Y with no intervening definition of v, and
(2) either X contains a definition of v and Y a use of v, or X contains a use of v and Y a definition of v.

Edges between CFG and DDG: FlowIR minimizes the coupling between control flow and data flow. Interaction between the CFG and DDG is restricted to three node types: behavioral CFNs, PhiNode and IfNode. Behavioral CFNs often take data flow input. Meanwhile, they may have control predecessors and successors. PhiNode is classified as a data flow node, linked to control condition expressions and resulting data-flow nodes from two branches. The ith data input to PhiNode corresponds to the ith branch. Similarly, IfNode is classified as a control flow node, linked to both the data flow node representing the conditional expression and the two control flow branches. We show the example edges between CFG and DDG in Figure 3.

[Figure 3 (diagram): edge labels of three node types. Invoke: inputs Predecessor and CallTarget, output Successor. IfNode: inputs Predecessor and Condition, outputs Begin and Begin. PhiNode: inputs Merge, Data[0], and Data[1], output Usages.]
Figure 3: Nodes connecting control flow and data flow

In our prototype, FlowIR incorporates support for diverse JS language features, covering basics like variable operations, binary/ternary operations, if-else branches, loops, and functions. Additionally, FlowIR extends support to advanced features, including JS object-oriented programming, and the JS exception-handling mechanism. Nodes for fundamental language features are language-independent, while those expressing JS-specific semantics are not. This implies that FlowIR cannot be directly used as a mutation target for other language processors. However, the conceptual foundation of FlowIR can be extended to other languages. Moreover, nodes for fundamental language features are reusable.

Figure 4 illustrates the FlowIR corresponding to a seed triggering CVE-2021-21220 in V8. Notably, FlowIR directly represents the control flow and dependencies in data flow, achieving low coupling. Data flow analysis on FlowIR is straightforward. For example, in Figure 4c, node-1 is unused by any other, indicating an unused data flow (parameter in function foo). Consequently, mutating this node is ineffective for uncovering defects in the JS engine.

Mutations on FlowIR are highly efficient. For instance, altering the edge from node-20 to node-19 in Figure 4b and redirecting it to node-3 (red dashed line) allows mutation of the loop condition from i < 0x100 to i < 2**31. Notably, the latter expression corresponds to an AST subtree with a height of 2. Generating such an AST subtree requires 2 steps. In contrast, leveraging the graph, only modifying the destination node of an edge is needed.

Algorithm 1: Translate JS to FlowIR
Input: The seed JS code
Output: FlowIR of the seed
    𝑎𝑠𝑡 ← Parse(JSCode)
    𝑎𝑠𝑡 ← ScopeAnalysis(𝑎𝑠𝑡)
    𝑎𝑠𝑡 ← ReferenceResolve(𝑎𝑠𝑡)
    𝑎𝑠𝑡 ← LeftValueAnalysis(𝑎𝑠𝑡)
    𝑔𝑟𝑎𝑝ℎ ← CreateEmptyGraph()
    function ProcessTree(node):
        if node == VariableDeclaration then
            if node.initialization is not null then
                flowNode ← ProcessTree(node.initialization)
                Create a new variable proxy
                Set the variable proxy to flowNode
                AddNodeToGraph()
                return flowNode
        if node == BinaryOpExpression then
            leftFlow ← ProcessTree(node.left)
            rightFlow ← ProcessTree(node.right)
            Create BinaryOpNode
            Create edges between BinaryOpNode and its dependencies
            AddNodeToGraph()
            return BinaryOpNode
        if node == IfStatement then
            conditionFlow ← ProcessTree(node.condition)
            Create BeginNodes and EndNodes for the two branches
            Create IfNode
            Create edges between IfNode and its dependencies
            Set the current control flow as the BeginNode for true branch
            ProcessTree(node.trueBranch)
            Set the current control flow as the BeginNode for false branch
            ProcessTree(node.falseBranch)
            Create MergeNode
            Merge the two branches control flow into MergeNode
            AddNodeToGraph()
            Set the current control flow as the MergeNode
            return IfNode
        ...
        return node
    ProcessTree(𝑎𝑠𝑡)

3.2 Translating JS to FlowIR

To leverage existing high-quality seeds, FuzzFlow supports the translation of JS to FlowIR. Notably, Lee et al. [29] underscore the effectiveness of regression test cases as fuzzing seeds. Development teams for mainstream JS engines continually augment their regression test sets with tests that expose historical engine defects, aiding
0 Start
8
6
LoopBegin
NewInstance
0 Start
5 7 21 1 Param(a)
8
CallTarget LoopIf a0
1 arr = new Uint32Array ([2**31]) ; EndNode LoadNode
Uint32Array(arr0)
7
2 10 12 20 9 Add(+)
4 LoopExit Begin Mutation
3 function foo ( a ) { < Return
ArrayLiteral example 3
6
4 var x = 1; 1
Xor(^)
15 19 17 x0
5 x = ( arr [0] ^ 0) + 1; 3 22 Phi
Power(**) EndProgram Invoke 0x100
6 return x ; i1 5
LoadIndex 4
7 } 1 2 9
14 11 18
CallTarget 0 0
8 Add(+) 2
2 31 LoopEnd foo i0 Global
9 for ( let i =0; i <0 x100 ; ++ i ) 13 16 arr0
Independent data flow
10 { foo ( true ) ; } subgraph example true 1
(c) The FlowIR of function
(a) Test case (b) The FlowIR of the main script foo

Figure 4: Test case to trigger CVE-2021-21220 in V8 and the corresponding FlowIR. Blue nodes denote control flow nodes, while
orange nodes signify data flow nodes. Blue edges represent control flow edges, while orange edges represent data flow edges.
Black edges denote connections between control flow and data flow, as well as auxiliary connections.

subsequent development. By mutating these regression test cases, the fuzzer gains an avenue to delve deeper into historical vulnerability patterns. To allow continuous testing and protect against regression bugs, we implement the JS2Graph module, designed to seamlessly convert JS programs into FlowIR. Consequently, FuzzFlow can navigate the input space surrounding existing test cases, thereby revealing potentially unforeseen neighboring bugs.

JS2Graph first performs syntax analysis of the JS program to acquire the AST. Subsequently, a top-down semantic analysis is conducted on the AST to establish scopes, identify declared symbols, resolve symbol references, and analyze left values. Finally, syntax-directed translation is performed based on the results of the semantic analysis to convert the AST into FlowIR. Throughout this conversion, the control flow and data flow of the program are analyzed.

Algorithm 1 outlines the conversion process of JS2Graph. Given the diverse syntax features of JS and space constraints on the page, the algorithm only presents a subset of JS2Graph functionality. Considering that the basic building blocks of JS programs are declarations (e.g., VariableDeclaration), expressions (e.g., BinaryOp, represented by the DFN), and statements (e.g., IfStatement, represented by the CFN), Algorithm 1 opts to showcase these three elements. The prototype implementation encompasses support for additional language features.

3.3 Mutation Operators on FlowIR

FuzzFlow encompasses two types of mutation operators: data-flow subgraph mutation and control-flow subgraph mutation.

Mutation on Data-flow Subgraph. The data-flow subgraph mutation preserves the control structures of the seed while solely mutating the data flow. This mutation operator introduces a finer granularity. For instance, it enables the retention of conditions that trigger JIT compilation. If the seed induces JIT compilation, the mutated test cases under this operator can similarly trigger JIT compilation.

Data flow mutation via FlowIR provides distinct advantages over Fuzzilli's FuzzIL, a sequential IR based on Three Address Code (TAC). Firstly, TAC's data flow is fixed within the control flow structure of the JS program. However, this mutation is restricted by instruction sequence, limiting the available data flow to that generated by preceding instructions. Consequently, the mutation space is constrained, hindering full utilization of the seed's overall data flow. Secondly, input mutation of TAC instructions alters variable names as instruction parameters. As noted in the background, variable names do not directly map to the data flow. Different variable names may represent the same data flow, posing challenges for effective data flow mutation. FlowIR effectively addresses the aforementioned challenges. Firstly, FlowIR's DFNs are independent of control flow, constrained only by data dependencies. After changing these dependencies through mutation, the Graph2JS module can appropriately relocate DFNs within control flow regions. Thus, leveraging FlowIR's data-flow subgraph enables mutations to span instructions and basic blocks, facilitating comprehensive analysis and mutation of the entire seed's data flow. Secondly, the input for mutation is the DFN, not the variable name. DFNs in FlowIR accurately model data dependencies within the seed, enabling more targeted mutations.

Data-flow subgraph mutation operators can be further categorized into two types. The first category involves node attribute mutation. This fine-grained mutation alters the attributes (labels) of DFNs. For example, if an original binary operation node employs an addition operator, mutating it into a subtraction modifies the data flow of the seed. Similarly, for a terminal node with an original integer value of 1, changing its value to -1 alters the data flow of the seed. The second category is input mutation of the node, which is further divided into intra-procedural mutation and inter-procedural mutation. In intra-procedural mutation, the data flow is mutated by modifying the connection between data nodes and changing the input of a specific data node within the method. Leveraging FlowIR, this mutation demonstrates high execution efficiency. It involves changing the starting point of an input edge of the data node to another one without having to copy any nodes. Inter-procedural mutation on data flow aims to splice the flow across methods. Independent data-flow subgraphs are extracted from the seeds to serve as the material for splicing.
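The two data-flow operators described above (node attribute mutation and intra-procedural input mutation) can be sketched as follows. This is a minimal illustration under stated assumptions: the `DataNode` class, the `OPERATOR_SWAPS` table, and the cycle check in `mutate_input` are hypothetical and are not taken from FuzzFlow. The cycle check in particular is an added safeguard that the text does not describe.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataNode:
    """Hypothetical data flow node: a label plus ids of its input nodes."""
    nid: int
    label: str
    inputs: List[int] = field(default_factory=list)


# Illustrative replacement table for operator labels (an assumption).
OPERATOR_SWAPS = {"+": ["-", "*"], "-": ["+"], "^": ["|", "&"]}


def mutate_attribute(node: DataNode, rng: random.Random) -> bool:
    """Node attribute mutation: change the label, leave all edges untouched."""
    choices = OPERATOR_SWAPS.get(node.label)
    if not choices:
        return False
    node.label = rng.choice(choices)
    return True


def depends_on(graph: Dict[int, DataNode], start: int, target: int) -> bool:
    """True if `start` is `target` or transitively reads from it."""
    stack, seen = [start], set()
    while stack:
        cur = stack.pop()
        if cur == target:
            return True
        if cur not in seen:
            seen.add(cur)
            stack.extend(graph[cur].inputs)
    return False


def mutate_input(graph: Dict[int, DataNode], node: DataNode,
                 slot: int, rng: random.Random) -> bool:
    """Input mutation: re-point one input edge to another node, copying nothing."""
    candidates = [
        n.nid for n in graph.values()
        if n.nid != node.inputs[slot]
        and not depends_on(graph, n.nid, node.nid)  # keep the data flow acyclic
    ]
    if not candidates:
        return False
    node.inputs[slot] = rng.choice(candidates)
    return True
```

Because `mutate_input` only overwrites one entry of an input list, the cost of a mutation is independent of the size of the graph, which matches the efficiency argument made in the text.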

Figure 4b shows an independent data-flow subgraph in the gray rounded rectangle. An independent data-flow subgraph is defined as a subgraph extracted from the FlowIR. In this context, all nodes within the subgraph depend solely on the data flow inputs existing in that subgraph. The extraction of independent data-flow subgraphs aims to establish a comprehensive data flow Pool for mutations. During initialization of a fuzzing run, after converting all seed JS programs to FlowIR, we analyze each FlowIR and extract independent data-flow subgraphs into the Pool. Each node within any subgraph in Pool has the potential to be inserted into a seed as an input node for splicing, thus achieving data flow splicing between procedures.

To extract independent data-flow subgraphs, the traversal initiates from the leaf nodes, typically literals, within the data-flow subgraph, ascending along their dependencies. Leaf nodes, without data input, are unequivocally eligible for inclusion in the data-flow subgraph. Following this, the analysis advances along the data flow edge of each node, examining the higher-level nodes that utilize the current node as their data input. For each traversed DFN, whether it is included in the subgraph depends on all the DFNs it depends on already being part of the subgraph. If an input (designated as I) on which the DFN relies is absent from the extracted subgraph, there is an attempt to recursively incorporate node I into the subgraph. When executing input mutations of data flow across graphs, an independent subgraph is randomly selected from Pool as the input for the mutation node.

Mutation on Control-flow Subgraph. In comparison to existing mutation platforms, control flow mutation based on FlowIR proves advantageous in generating test cases that are both syntactically correct and semantically valid. When mutating the control flow in AST or bytecode IR, changes in the positioning of control flow elements can easily break the semantic integrity of the seed. Conversely, FlowIR, by modeling dependencies between nodes, ensures that moving a CFN does not impact other edges of the node, except for the mutated control flow edge. The reason is that dependencies related to unchanged edges are automatically fulfilled in the Graph2JS module. This allows control flow mutation based on FlowIR to proceed without compromising semantic dependencies. Such a mechanism contributes to the generation of test cases that are not only syntactically correct but also semantically valid.

The mutation target of the control-flow subgraph may be a single behavioral CFN, or a node group containing a control flow structure, such as a branch or a loop. Control flow mutation operators can be further divided into four types: CFN scheduling, CFN deletion, CFN insertion, and CFN data input mutation. CFN scheduling involves alterations in the placement of CFNs, which can markedly modify the execution of the seed and induce semantic changes. Specific implementation methods for control flow scheduling include:

• Individual CFN Movement: This mutation relocates a behavioral CFN (e.g., function call or object creation) to a random position in the control flow.
• CFN Group Relocation: This method entails moving a control flow unit, such as conditional branches or loops, to a random location in the control flow.
• If-Else Branch Swapping: This mutation involves swapping the if-else branches, thereby changing the execution branch under specific branch conditions.

Control flow scheduling proves highly effective in altering the execution of a seed. For instance, it can shift the control flow from within a loop to outside the loop, or vice versa. Leveraging the graph data structure, CFN deletion requires only the modification of predecessor and successor edges between CFNs to achieve the desired changes. CFN insertion involves incorporating new nodes or node groups into the existing flow. For instance, in a seed containing function declarations, one can introduce nodes for new function calls with different parameters. Additionally, loops can be inserted into a seed, thereby augmenting the complexity of its control structure. CFN data input mutation, which couples control flow and data flow, specifically modifies the data input of CFNs. For instance, altering the data input of a ReturnNode impacts the data flow that is returned. In most cases, when mutating the control-flow subgraph, changes in the data-flow subgraph are often implicated, indicating a larger mutation granularity.

Speeding Up Mutators. We conduct further optimizations on FlowIR-based mutations to enhance the fuzzer's efficiency. Instead of copying the entire graph for a mutation, we conduct the mutation directly on the FlowIR of the seed, keeping a record of the performed mutation operators. After mutation, the Graph2JS translation is applied to the mutated FlowIR to generate the JS program for testing. When the engine finishes execution, we copy the mutated FlowIR to generate a new seed only when the test case reveals new coverage. Regardless of whether the mutated seed is interesting, after the execution finishes, a reverse operation is performed according to the recorded mutation operators, restoring the seed to its pre-mutated state. This design mitigates unnecessary copying of FlowIR, thereby improving the overall efficiency of the fuzzer.

3.4 Translating FlowIR to JS

After mutation, FuzzFlow converts FlowIR back into JS as fuzzing input. This conversion poses new challenges. Firstly, the IR must preserve high-level semantic information to facilitate the conversion back to JS programs, distinguishing it from existing IRs. For example, in LLVM IR, the control flow of exception handling is simplified to ordinary branch and jump instructions, resulting in the loss of crucial high-level semantic information that makes it impractical to restore the IR to source code. Secondly, the conversion must ensure the accurate translation of both control and data flow, maintaining consistency between FlowIR and the source code.

To address the aforementioned challenges, FuzzFlow incorporates targeted designs. Firstly, in FlowIR, high-level semantics are preserved as nodes. By utilizing these nodes (e.g., TryNode, StoreFieldNode), precise restoration of semantics is achieved. Secondly, FuzzFlow conducts graph traversal along the control flow based on the dependencies and completes code generation during this traversal process. Dependencies are satisfied during this process. When lifting the FlowIR, FuzzFlow structures the source code in regions. A region represents a nested structure, corresponding to a block scope of the program. Organizing the generated code in regions helps maintain the relationships between scopes.
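The idea of accumulating generated code in nested regions, one per block scope, can be sketched as follows. The `Region` class and its `emit` and `render` methods are hypothetical names chosen for this illustration; the source does not specify how FuzzFlow represents regions internally.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Region:
    """Hypothetical code region: an optional header plus nested content."""
    header: str = ""   # e.g. "if (x > 0)"; empty for the top-level region
    body: List[Union[str, "Region"]] = field(default_factory=list)

    def emit(self, item: Union[str, "Region"]) -> None:
        """Append a statement or a nested sub-region in program order."""
        self.body.append(item)

    def render(self, depth: int = 0) -> List[str]:
        """Flatten this region and its sub-regions into indented JS lines."""
        pad = "  " * depth
        inner = depth + 1 if self.header else depth
        lines: List[str] = []
        if self.header:
            lines.append(f"{pad}{self.header} {{")
        for item in self.body:
            if isinstance(item, Region):
                lines.extend(item.render(inner))
            else:
                lines.append("  " * inner + item)
        if self.header:
            lines.append(f"{pad}}}")
        return lines


# Usage: an if/else pair becomes two sub-regions of the enclosing region.
top = Region()
top.emit("let x = 1;")
then_region = Region("if (x > 0)")
then_region.emit("x = x + 1;")
else_region = Region("else")
else_region.emit("x = 0;")
top.emit(then_region)
top.emit(else_region)
js_code = "\n".join(top.render())
```

Keeping statements inside the region that owns their scope means the final merge is a simple recursive flattening, and a statement can never leak out of the block it was generated for.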
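The in-place mutation scheme described under "Speeding Up Mutators" can be illustrated with an undo log: each mutation records its inverse, the graph is copied only when a test case is interesting, and the seed is always rolled back afterwards. The `MutationLog` class, the dictionary-based graph, and the `fuzz_once` driver are assumptions made for this sketch and are not FuzzFlow's actual interfaces.

```python
import copy


class MutationLog:
    """Records inverse operations so a mutated graph can be restored in place."""

    def __init__(self):
        self._undo = []

    def set_label(self, node, new_label):
        old = node["label"]
        node["label"] = new_label
        self._undo.append(lambda: node.__setitem__("label", old))

    def set_input(self, node, slot, new_source):
        old = node["inputs"][slot]
        node["inputs"][slot] = new_source
        self._undo.append(lambda: node["inputs"].__setitem__(slot, old))

    def rollback(self):
        # Undo in reverse order of application.
        while self._undo:
            self._undo.pop()()


def fuzz_once(seed, mutate, execute, corpus):
    """Mutate `seed` in place, run it, keep a copy only if it is interesting."""
    log = MutationLog()
    mutate(seed, log)
    found_new_coverage = execute(seed)
    if found_new_coverage:
        corpus.append(copy.deepcopy(seed))  # the only full copy that is made
    log.rollback()                          # seed returns to its original state
    return found_new_coverage
```

In this sketch, `execute` stands in for lifting the graph to JS, running the engine, and checking coverage; it is expected to return a truthy value when new coverage is observed.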

Algorithm 2: Translate FlowIR to JS
Input: FlowIR of the seed
Output: JS code of the seed

StartNode ← FlowIR
topRegion ← CreateRegion()
function LiftGraphNode(node):
    if node == BinaryOpNode then
        region ← LoadCurrentRegion()
        proxies ← GetVariableProxyOnDataNode()
        if IsFirstVisit then
            leftCode ← LiftGraphNode(node.left)
            rightCode ← LiftGraphNode(node.right)
            binaryCode ← leftCode + node.operator + rightCode
        if len(proxies) > 0 then
            GenAssignmentStatement(region)
            return proxies.name
        else
            return binaryCode
    if node == IfNode then
        parentRegion ← SaveCurrentRegion()
        ifRegion ← CreateRegion()
        conditionCode ← LiftGraphNode(node.condition)
        EmitExpression(ifRegion, conditionCode)
        branchTrueRegion ← CreateRegion()
        Change current region to branchTrueRegion
        LiftGraphNode(node.branchTrue)
        branchFalseRegion ← CreateRegion()
        Change current region to branchFalseRegion
        LiftGraphNode(node.branchFalse)
        Add two sub-regions to ifRegion
        Restore the parentRegion as current region
        LiftGraphNode(node.next)
        return
    ...
    return
LiftGraphNode(graph)
JSCode ← MergeRegionsCode(topRegion)

Algorithm 2 illustrates the lifting process of certain FlowIR nodes. The prototype FuzzFlow encompasses support for the full list of FlowIR nodes. Graph2JS initiates from the control flow starting node and proceeds backward along the flow. When encountering CFNs that generate scopes, such as IfBlock, ForLoop, or WhileLoop, a new region is created to accumulate the code that needs to be generated within the scope. When a CFN depends on any DFN, a depth-first traversal of the data flow is performed along the dependencies. If variables are associated with the DFN, variable assignments are generated. After traversing the current CFN and the dependent DFNs, the traversal continues backward along the control-flow dependencies to subsequent nodes. Once the traversal of the entire graph is completed, the statement sequence within the nested region is consolidated into a complete program and passed to the test module.

3.5 Implementation

To demonstrate the effectiveness of our proposed method, we implement a fuzzer (FuzzFlow) that leverages FlowIR. Given the pivotal role of mutation target and operators in fuzzer implementation, crafting FuzzFlow based on existing open-source fuzzers would require us to change over 90% of the code. Therefore, we implement FuzzFlow from scratch in C++. FuzzFlow is a coverage-guided grey-box fuzzer that we implement in 19K lines of code (LoC). We use clang sanitizer coverage as feedback to guide the fuzzer to explore the code coverage of the JS engine. The JS2Graph module first performs syntax analysis on the input JS. We use ANTLR [38] to implement the parsing component, and implement the rest of JS2Graph ourselves. Fuzzer components including Graph2JS and Mutation are implemented entirely in C++.

We rigorously tested the JS2Graph and Graph2JS modules, ensuring the accuracy of the JS-to-Graph and Graph-to-JS conversion processes with numerous unit tests. JS2Graph only operates during the fuzzing startup phase to handle the initial seed set, encountering high-quality JS test cases, including historical PoCs. Therefore, the primary challenge lies in accurately translating JS language features. Unit tests help confirm the accurate translation of supported JS language features.

Conversely, Graph2JS encounters issues with malformed FlowIR due to mutation, resulting in some seeds being unable to convert back to JS. After running FuzzFlow for 24 hours, we found that, in the current version, 8% of mutated Graph instances cannot be lifted to JS. We can address this challenge by applying semantic constraints to FuzzFlow's mutation operators, which will help mitigate the issue. This optimization will be a focus in future development iterations. It is worth noting that the possible failure of converting mutated IR back to JS is not exclusive to FuzzFlow. When mutations are applied based on IRs like AST, fuzzers encounter similar issues, as mutations can disrupt the original well-formed structure, rendering conversion back to the source code impossible.

4 Evaluation

Evaluation Targets. We evaluated a total of six mainstream JS engines. The targeted engines encompass Chrome V8, Firefox SpiderMonkey, Safari JavaScriptCore, and ChakraCore designed for desktop browsers. Additionally, we tested JerryScript and QuickJS, which are often deployed on IoT devices. There are currently no benchmarks, such as LAVA-M [11] or Magma [22], specifically designed for evaluating JS engine fuzzers. Meanwhile, the number of mature industrial-grade JS engines is limited. Therefore, despite ChakraCore not being utilized in the Edge browser, given its status as an industrial-grade JS engine developed by Microsoft over several years, we consider it an interesting target.

These JS engines have undergone thorough code audits and testing conducted by both the development team and security researchers. Any newly detected defects by FuzzFlow have been missed by earlier evaluations, affirming the efficacy of the method proposed in this paper.

Experimental Setup. Both V8 and SpiderMonkey already include built-in support for Fuzzilli. To ensure compatibility, we adhere to the Fuzzilli interface. For the other four engines, we modified the engine code slightly based on Fuzzilli's patch method. This modification enables communication between the fuzzer and the JS engine through pipelines for test case delivery and coverage feedback. To prevent the fuzzer from becoming stuck on non-terminating processes, the timeout mechanism is commonly employed by fuzzers. The timeout interval represents the duration permitted by the fuzzer for executing a single test case. In our experiments, we adopted the default timeout of Fuzzilli, which is set to 250 ms.

To enhance bug detection, JS engine developers have incorporated numerous assertions into the engine source code for internal checks. Assertion errors within JS engines may indicate significant

Table 1: New bugs found by FuzzFlow. The two bugs in V8 were also noticed and patched by developers before our reporting.

# | JS Engine | Issue ID | Component | Security | Status | Description
1 | V8 | - | Torque | | Fixed | Fatal error in string-tq-inl.inc
2 | V8 | - | JIT compiler | ✓ | Fixed | Fatal error in deoptimizer.cc
3 | JavaScriptCore | 261949 | Runtime | | | Assertion error in runtime/SparseArrayValueMap.cpp
4 | JavaScriptCore | 265272 | JIT compiler | ✓ | Fixed | Integer calculation error after JIT optimization
5 | SpiderMonkey | 1849099 | JIT compiler | | Fixed | Incomplete patch for bug-1745907
6 | SpiderMonkey | 1849100 | Debugger | | Duplicate | makeDebuggeeNativeFunction allows to create a copy of native function
7 | SpiderMonkey | 1851135 | Debugger | | Fixed | Incomplete patch for bug-1845270
8 | SpiderMonkey | 1852729 (CVE-2023-5728) | Garbage collector | ✓ | Fixed | weakRefMap is updated when a WeakRef target is cleared
9 | SpiderMonkey | 1853488 | JIT Compiler | | Duplicate | FoldConstants optimization reduces the conditional expression
10 | SpiderMonkey | 1863183 | Runtime | | | weakRefMap contains a dead wrapper
11 | SpiderMonkey | 1864246 | Builtins | | Fixed | Incorrect conditional unwrapping. Regression from bug 1841118
12 | SpiderMonkey | 1864257 | Builtins | | Fixed | Regression from bug 1848467
13 | ChakraCore | 6944 | JIT compiler | ✓ | | Assertion error in Backend/FlowGraph.cpp
14 | ChakraCore | 6945 | JIT compiler | ✓ | | Assertion error in Backend/GlobOptArrays.cpp
15 | ChakraCore | 6946 | JIT compiler | ✓ | | Assertion error in Backend/LinearScan.cpp
16 | ChakraCore | 6947 | JIT compiler | ✓ | | Assertion error in Backend/FlowGraph.cpp
17 | ChakraCore | 6948 | Runtime | | | Assertion error in Runtime/Language/ValueType.cpp
18 | ChakraCore | 6949 | JIT compiler | ✓ | | Assertion error in Backend/FlowGraph.cpp
19 | ChakraCore | 6951 | JIT compiler | ✓ | | Assertion error in Backend/BackwardPass.cpp
20 | ChakraCore | 6959 | JIT compiler | ✓ | | Assertion error in Backend/TempTracker.cpp
21 | ChakraCore | 6960 | JIT compiler | ✓ | Confirmed | Assertion error in Backend/BailOut.cpp
22 | ChakraCore | 6961 | Runtime | | | Assertion error in Library/ScriptFunction.h
23 | ChakraCore | 6962 | Runtime | | | Assertion error in Types/RecyclableObject.h
24 | ChakraCore | 6963 | Runtime | ✓ | Confirmed | Segmentation fault
25 | ChakraCore | 6964 | Runtime | ✓ | Confirmed | Assertion error in Runtime/Language/JavascriptConversion.cpp
26 | ChakraCore | 6965 | Code generator | | Confirmed | Segmentation fault
27 | QuickJS | 192 | Bytecode emitter | | Fixed | Null Pointer Dereference
28 | QuickJS | 198 | Garbage collector | ✓ | Fixed | Heap use after free
29 | JerryScript | 5097 | ByteCode generator | ✓ | | Null pointer dereference
30 | JerryScript | 5098 | ByteCode generator | | | Assertion error
31 | JerryScript | 5099 | JIT compiler | | | Assertion error
32 | JerryScript | 5100 | JIT compiler | | | Assertion error
33 | JerryScript | 5104 | Frontend | | | Segmentation fault
34 | JerryScript | 5105 | Frontend | | | Segmentation fault
35 | JerryScript | 5117 | Frontend | | Fixed | Segmentation fault
36 | JerryScript | 5118 | Frontend | | | Assertion error
37 | JerryScript | 5119 | Frontend | | Fixed | Assertion error

security vulnerabilities. The identification of many high-risk vulnerabilities, such as CVE-2019-8622, often originates from assertion errors. Therefore, similar to Fuzzilli, we utilized the debug configuration when compiling the engine to capture assertion errors.

Additionally, for the triggering conditions of JIT compilation, we have employed the same processing method as Fuzzilli, specifically by lowering the threshold of repeated executions needed to trigger JIT compilation. Typically, we configured the thresholds so that approximately 100 executions trigger the compilation of a function. This threshold strikes a balance, allowing ample iterations for the engine to gather type information while speeding up the fuzzing.

Initial seeds and Experiment platform. We selected test cases, independent of external harnesses, from the regression test suites of the six engines to create the initial seed set. The combined seed set is used for all engines. These regression test suites are readily accessible in each engine's repository. In total, we gathered 1,280 test cases from regression test suites. When performing comparative experiments, this seed set served as the initial seeds for baselines. We conducted experiments for RQ1, RQ3, RQ4, and RQ5 on an Ubuntu 20.04 LTS system with an Intel Xeon Gold 6238R (56 cores) and 128 GB RAM, using an RTX 3090 for neural network baselines. RQ2 was evaluated on an Ubuntu 20.04 system with two AMD EPYC 9654 processors (192 cores) and 512 GB RAM.

Baselines. We conducted a comparative analysis of FuzzFlow against four state-of-the-art mutation-based JS engine fuzzers: Superion [47], DIE [37], Montage [29], and Fuzzilli [20]. FuzzFlow and the four baselines are all general-purpose JS engine fuzzers, not targeting specific components [10, 48]. Superion, DIE, and Montage mutate the AST of the seed, while Fuzzilli mutates the bytecode IR. These four competitors represent the two currently prevalent types of mutation targets.

In addition to the mentioned baselines, open-source JS engine fuzzers also include CodeAlchemist [21] and jsfunfuzz [35]. However, these two fuzzers are both generation-based and differ from the mutation target explored in this paper. Furthermore, in previous evaluations [29, 37], Montage demonstrated superior performance compared to CodeAlchemist and jsfunfuzz, while DIE outperformed CodeAlchemist. Thus, we selected these four advanced mutation-based fuzzers as comparison targets.

As the paper concentrates on mutation targets and operators, the evaluation aims to scrutinize their impact on fuzzing. Superion, DIE, Montage, and FuzzFlow are mutation-based fuzzers. In contrast, Fuzzilli can create new test cases through two methods: generation and mutation. Initial versions of Fuzzilli lacked a JS-to-bytecode compiler, using a generative engine to prepare the seed set. This allowed it to run without initial seeds. Currently, Fuzzilli includes a compiler module for JS-to-bytecode conversion. We provided Fuzzilli with the same seed set as other fuzzers while deactivating its generative engine for a fair evaluation.

Evaluation design. We aim to answer the following research questions through our experiments:

RQ1: Can FuzzFlow find new bugs in real-world JS engines?
RQ2: How does FuzzFlow perform in terms of code coverage and bug finding against state-of-the-art fuzzers?
RQ3: Does FuzzFlow's improvement in mutation granularity help trigger the vulnerable components of the engine?

RQ4: Does FuzzFlow generate correct JS code, both syntactically and semantically?
RQ5: Does using FlowIR as the mutation target introduce significant runtime overhead?

To measure and compare code coverage, bug finding, throughput, and correctness, we conduct 10 rounds of experiments. We then test the statistical significance of FuzzFlow achieving better performance than baselines using the Vargha-Delaney Â12 and the Mann-Whitney U test (U). U tests whether a list of observations is stochastically greater than the other list, while Â12 measures the magnitude of the difference (effect size). Finally, we analyze the bug-triggering test cases generated by FuzzFlow through case studies to further discuss the effectiveness of our method.

4.1 RQ1: Identified Bugs

We first evaluate FuzzFlow's bug-finding capability on real-world engines and demonstrate the defects detected by FuzzFlow. To evaluate FuzzFlow's ability to find unknown defects, we use FuzzFlow to conduct testing on six JS engines for 120 days. We allocate 15 cores to each engine. FuzzFlow has detected a total of 37 new defects, including 2 in V8, 2 in JSC, 8 in SpiderMonkey, 14 in ChakraCore, 9 in JerryScript, and 2 in QuickJS. Developers have confirmed 16 bugs, of which 12 have been fixed; two further bugs were duplicates. All confirmed bugs in SpiderMonkey and QuickJS have been fixed. Two bugs in V8 were concurrently patched.

Table 1 shows details of all detected bugs. Many of the identified defects originate from assertion errors. Some of these assertion errors can be classified as security-related based on the bug locations and crash messages (e.g., Issue-198 in QuickJS). However, a comprehensive analysis is required for the remaining assertion errors to ascertain their exploitability, which is a task beyond the scope of this paper. We have submitted all bug information to the developer teams for thorough examination.

Note that FuzzFlow discovered bugs in various components of the JS engine. For instance, we have detected defects in the bytecode emitter, debugger, and runtime of SpiderMonkey. Besides, FuzzFlow also detects deep bugs in the JIT compilers. We have detected 11 bugs in the JIT compilers of 4 browser engines. Among them, 8 out of 14 defects detected in ChakraCore are located in the JIT compiler. This result demonstrates the advantages of FlowIR as a mutation target in finding deep vulnerabilities.

4.2 RQ2: Bug-finding Ability and Exploring Code Coverage

General-purpose fuzzers (e.g., AFL and its variants) are evaluated through ground-truth benchmarks like Magma or FuzzBench [34]. However, no such benchmark exists for JS engines, necessitating validation on real software without ground truth. To evaluate the bug-finding capabilities of various fuzzers, we tested FuzzFlow and baselines for 360 hours each on V8, SpiderMonkey, JavaScriptCore, ChakraCore, JerryScript, and QuickJS. Ten instances were run per fuzzer-target pair to reduce randomness. After deduplication based on crashing stacks, FuzzFlow, Superion, DIE, Montage, and Fuzzilli detected an average of 5.7, 3.5, 2.6, 2.8, and 4.9 defects in the ChakraCore engine, and 2.7, 1.1, 1.9, 1.3, and 2.2 defects in the JerryScript engine, respectively. However, none of them detected defects in the V8, JavaScriptCore, SpiderMonkey, and QuickJS engines during the test period. This indicates that these JS engines are adequately audited and tested. On the more vulnerable ChakraCore and JerryScript engines, FuzzFlow demonstrated the highest bug-finding capability. For industrial engines like V8, SpiderMonkey, and JavaScriptCore, dynamic testing is already integrated into their development processes. Due to extensive testing, comparing bug finding capabilities over weeks yields few bugs. Thus, our focus shifts to evaluating fuzzing based on code coverage, exemplified by Fuzzilli [20].

Table 2: Code Coverage.

Subject | Metric | FuzzFlow | Superion | DIE | Fuzzilli | FuzzFlowNH
SM | Average | 20.53% | 13.68% | 14.27% | 16.03% | 19.30%
SM | Improvement | - | 6.85% | 6.26% | 4.50% | 1.23%
SM | Â12 | - | 0.99 | 0.99 | 0.99 | 0.99
SM | p_U | - | <0.01 | <0.01 | <0.01 | <0.01
V8 | Average | 19.32% | 15.62% | 13.88% | 15.13% | 18.41%
V8 | Improvement | - | 3.70% | 5.44% | 4.19% | 0.91%
V8 | Â12 | - | 0.99 | 0.99 | 0.99 | 0.99
V8 | p_U | - | <0.01 | <0.01 | <0.01 | 0.02
JSC | Average | 22.56% | 18.38% | 18.81% | 19.03% | 21.51%
JSC | Improvement | - | 4.18% | 3.75% | 3.53% | 1.05%
JSC | Â12 | - | 0.99 | 0.99 | 0.99 | 0.99
JSC | p_U | - | <0.01 | <0.01 | <0.01 | <0.01
CH | Average | 19.52% | 16.77% | 17.19% | 18.44% | 18.68%
CH | Improvement | - | 2.75% | 2.33% | 1.08% | 0.84%
CH | Â12 | - | 0.95 | 0.92 | 0.92 | 0.90
CH | p_U | - | <0.01 | <0.01 | <0.01 | 0.01
JERRY | Average | 67.84% | 62.30% | 63.55% | 63.86% | 65.68%
JERRY | Improvement | - | 5.54% | 4.29% | 3.98% | 2.16%
JERRY | Â12 | - | 0.99 | 0.99 | 0.99 | 0.99
JERRY | p_U | - | <0.01 | <0.01 | <0.01 | <0.01
QJS | Average | 52.05% | 40.52% | 46.34% | 45.13% | 50.33%
QJS | Improvement | - | 11.53% | 5.71% | 6.92% | 1.72%
QJS | Â12 | - | 0.99 | 0.99 | 0.99 | 0.99
QJS | p_U | - | <0.01 | <0.01 | <0.01 | <0.01
Average (among subjects) | | - | 5.80% | 3.92% | 4.62% | 1.32%

Branch coverage reflects the percentage of the branches exercised by the test cases over the total number of branches. A higher branch coverage implies a more thorough examination of the target's state space. For each tool and each test target, we evaluate the reached branch coverage after 24 hours of fuzzing, and count the number of test cases executed to reflect the throughput.

For FuzzFlow and Fuzzilli, we can directly obtain the number of branches triggered by the engine through the API. For AFL-based Superion and DIE, directly obtaining branch coverage is not feasible. Superion records code coverage using 1 << 20 bytes of shared memory, recording coverage in one byte per instrumented location. DIE uses one bit in shared memory to record the coverage of each instrumentation location. The shared memory used by DIE is 1 << 16 bytes. Following the method of prior work [48], we obtain the coverage recorded in the shared memory as the branch coverage explored by Superion and DIE. Montage, being a black-box fuzzer without coverage feedback, was omitted from this evaluation.

To evaluate the impact of high-level semantic nodes in FlowIR on exploring the state space of JS engines, we conducted an ablation experiment on FuzzFlow. We selected high-level semantic nodes including TryNode, CatchNode, and ForInNode, disabled support for these nodes in FuzzFlow, and labeled this version as FuzzFlowNH. Without these nodes, FuzzFlowNH cannot process the corresponding AST nodes in the JS2Graph stage or generate test cases involving

these features. We then measured the code coverage achieved by FuzzFlowNH.

Table 2 presents the results of branch coverage. The row “Improvements” indicates the percentage of coverage improvement of FuzzFlow compared with baselines. Note that the comparison between Fuzzilli and Superion differs from that presented by Groß et al. [20]. This discrepancy arises because Fuzzilli does not employ the initial seed set in their experimental setup. Our evaluation indicates that the mutation target and operators utilized by Fuzzilli outperform those of Superion. On all subjects, FuzzFlow achieves higher coverage than Superion, DIE, and Fuzzilli. FuzzFlow boosted average coverage by 4.78%. JS engines have extensive codebases, often exceeding tens of millions of lines. A 4.78% coverage enhancement corresponds to an increase of several hundred thousand lines. These results suggest that the bug-finding abilities of different fuzzers align closely with their coverage exploration capabilities. On average, the branch coverage of the target engines triggered by FuzzFlowNH is 1.32% lower than that triggered by FuzzFlow. This indicates that FlowIR’s support for high-level JS features enhances path exploration capabilities. As FuzzFlow expands its support for these features, its ability to explore the target software’s state space will also improve.

4.3 RQ3: Effectiveness of Data-Flow Mutation

The low coupling between control flow and data flow in FlowIR introduces novel mutation opportunities. Altering the data flow of the seed does not have a direct impact on the control flow, and vice versa. One immediate benefit of mutating the data-flow subgraph is the preservation of the seed’s JIT compilation trigger condition, allowing mutants to continue exerting stress on the engine’s JIT compiler. To demonstrate that effect, we evaluate the data-flow subgraph mutation’s efficacy in upholding the seed’s JIT trigger condition. We evaluate the fuzzers with a seed corpus that only contains the JS programs that trigger JIT compilation. We gather the mutants and assess their ability to consistently trigger JIT compilation. Meanwhile, we evaluate the code coverage of the JIT component with grcov¹.

Figure 5 illustrates the proportion of JIT activation by newly generated inputs. In comparison to Superion, DIE, Montage, and Fuzzilli, FuzzFlow generates 2.28×, 1.23×, 2.06×, and 1.20× the number of test cases capable of triggering JIT compilation. Moreover, FuzzFlow achieved 3.32%, 5.84%, and 9.15% higher line coverage in the SpiderMonkey engine’s js/src/jit directory (over 330k LoC) compared to the greybox baselines Fuzzilli, DIE, and Superion, respectively. Superion and Montage may select a subtree containing a control structure when they mutate the AST, which potentially disrupts the control flow characteristics of the seed. In contrast, DIE, when selecting AST nodes for mutation, strategically filters certain control structures to focus on structural aspects. In the case of FuzzFlow’s data flow mutation, the only scenario that might compromise existing JIT triggering conditions is altering the number of loops, potentially falling short of the JIT compilation threshold. In summary, FuzzFlow’s data-flow subgraph mutation offers finer mutation granularity, effectively preserving triggering conditions for engine-specific functions embedded in high-quality seeds.

Figure 5: Proportion of tests that trigger JIT compilation (bars for FuzzFlow, Superion, DIE, Montage, and Fuzzilli)

¹ https://github.com/mozilla/grcov

4.4 RQ4: Validity of Generated Input

Generating syntactically correct and semantically valid test cases is a prerequisite for deep code testing of the targets. A valid test case is free of syntax and semantic errors and does not trigger uncaught exceptions during execution. The higher the validity ratio of test cases generated by the fuzzer, the smaller the proportion discarded in the early stages of the engine. Upon each test case execution, the JS engine’s exit code, as well as the output in stdout and stderr, indicates whether the test case has syntax or semantic errors. We run both FuzzFlow and baselines for 24 hours and record the ratio of valid generated tests over all test cases. The experiment is repeated for 10 rounds, and we used the results of statistical analysis to reduce the impact of randomness. The validity of test cases generated by Montage decreases with higher Top-k values. In our study, we set the Top-k parameter to 64, as Montage performs optimally in defect detection under this setting. Additionally, this configuration is the default in their code.

Table 3 reports the result. Overall, FuzzFlow achieves the highest test case validity on all six engines. To enhance the validity, baseline fuzzers have dedicated considerable efforts beyond the mutation operators. For instance, DIE incorporates type attributes into the AST and mutates the typed AST, while Fuzzilli introduces a type system to mitigate semantic errors. Montage resolves possible reference errors by renaming them with the declared identifiers. In contrast, FuzzFlow does not impose overhead to enhance the generated test cases. It directly mutates FlowIR, achieving the desired effect. Two primary reasons contribute to the result. Firstly, FuzzFlow does not alter the syntax structure of the seed. The Graph2JS module generates syntactically correct test cases, thereby resolving the challenge of syntactic correctness. Secondly, FuzzFlow performs mutation within the constraints of the node type, leveraging the node’s subtype functions as a JS type system. The occurrence of semantic errors is notably reduced.

4.5 RQ5: Throughput of FuzzFlow

Throughput refers to the quantity of test cases that the fuzzer mutates and executes within a given time unit. The program representation to mutate stands out as a crucial determinant influencing throughput. Given that FlowIR represents a newly developed mutation target, a key area of our focus is to assess whether it introduces substantial transformation overhead in comparison to established methodologies. We conducted a throughput comparison between FuzzFlow and baselines. The results are presented in Table 4, where throughput is measured by the number of test cases executed within 24 hours. In summary, FuzzFlow exhibits a 4.19×,

11.55× and 2.62× improvement of throughput compared to AST-based DIE, Montage, and bytecode-based Fuzzilli but falls short of the throughput achieved by Superion. The elevated throughput of FuzzFlow indicates that the mutation on FlowIR does not introduce significant additional performance overhead. This is a crucial factor for efficiently identifying defects within a specified time interval. FuzzFlow’s superior throughput is attributed to two key factors. Firstly, stored seeds in the queue remain in FlowIR format. Consequently, during the fuzzing process, each mutation only necessitates a one-way conversion from FlowIR to JS. Secondly, FuzzFlow is implemented in C++. This choice of a system-level language provides a notable efficiency advantage compared to DIE, which employs TypeScript for mutation. Beyond the raw throughput of a fuzzer, the quality of generated test cases holds more significance. Test cases of inferior quality may fail to adequately cover the critical path of the engines. Despite FuzzFlow exhibiting a lower throughput than Superion, it is important to highlight that the validity ratio of test cases produced by FuzzFlow and the exploration of code coverage surpass those achieved by Superion to a significant extent.

4.6 Case Studies

Mutation on data-flow subgraph. Issue-261949 represents a defect identified by FuzzFlow in Safari’s JavaScriptCore. Listing 3 illustrates a simplified test case that triggers the bug. The initial seed for this test case originates from Safari’s regression test case regress-176485.js. The seed includes both try-catch and invocation of the Object.defineProperty method, encompassing two critical control-flow semantics.

FuzzFlow preserves the existing control flow within the seed, with data-flow subgraph mutation playing a pivotal role in defect detection. Records indicate that the applied mutation operators encompass five data-flow subgraph mutations. Notably, the first two parameters of the Object.defineProperty method call at line 4, the method name assign, and the parameters at line 6, are all derived from data flow mutation. By altering data-flow semantics through the amalgamation of independent data-flow subgraphs across the seed set, FuzzFlow successfully triggers this edge case.

    1 var x = Object();
    2 try {
    3   var a = {};
    4   var b = Object.defineProperty(Object, 1, a);
    5 } catch (e) {}
    6 var y = Object["assign"](x, Object);

Listing 3: Test case produced by FuzzFlow which triggers issue-261949 in JavaScriptCore

Semantically meaningful mutation. CVE-2023-5728 denotes a vulnerability identified by FuzzFlow within Firefox’s SpiderMonkey. This bug specifically resides in the engine’s garbage collector, stemming from an issue in the engine update process related to weakRefMap when a WeakRef target is cleared. Consequently, the test case triggers a crash upon a validity check detecting the presence of a dead wrapper within the weakRefMap.

FuzzFlow generates the test case in Listing 4 through effective semantic mutations. The attribute names (representing data flows in FlowIR), namely nukeAllCCWs, WeakRef, and transplantableObject, are sourced from three distinct seeds. It is noteworthy that the initial seed set lacks a seed containing all the aforementioned data flows. The control flows new newGlobal and bar.deref are both extracted from SpiderMonkey’s regression test case tests_gc_weakRefs.js. The bug has existed for over three years. FuzzFlow’s semantic mutation successfully merges the specified control flows and data flows in the final test case.

    const g = newGlobal({ newCompartment: true });
    const domObj = this.transplantableObject().object;
    const bar = new g.WeakRef(domObj);
    bar.deref();
    this.nukeAllCCWs();

Listing 4: Test case produced by FuzzFlow which triggers CVE-2023-5728

Table 3: The semantic correctness of tests generated by FuzzFlow and baselines

Subject  Metric               FuzzFlow  Superion  DIE     Montage  Fuzzilli
SM       Average              69.18%    39.43%    60.07%  34.38%   58.20%
         Improvement          -         29.75%    9.11%   34.80%   10.98%
         Â12                  -         0.99      0.99    0.99     0.99
         p_U                  -         <0.01     <0.01   <0.01    <0.01
V8       Average              72.04%    38.96%    58.26%  35.54%   51.53%
         Improvement          -         33.08%    13.78%  36.50%   20.51%
         Â12                  -         0.99      0.99    0.99     0.99
         p_U                  -         <0.01     <0.01   <0.01    <0.01
JSC      Average              70.23%    32.97%    61.83%  35.15%   50.93%
         Improvement          -         37.26%    8.40%   35.08%   19.30%
         Â12                  -         0.99      0.99    0.99     0.99
         p_U                  -         <0.01     <0.01   <0.01    <0.01
CH       Average              72.18%    35.52%    56.40%  35.09%   -
         Improvement          -         36.66%    15.78%  37.09%   -
         Â12                  -         0.99      0.99    0.99     -
         p_U                  -         <0.01     <0.01   <0.01    -
JERRY    Average              64.55%    36.52%    57.79%  38.90%   57.03%
         Improvement          -         28.02%    6.76%   25.65%   7.52%
         Â12                  -         0.99      0.99    0.99     0.99
         p_U                  -         <0.01     <0.01   <0.01    <0.01
QJS      Average              69.28%    37.41%    61.95%  38.56%   63.09%
         Improvement          -         31.87%    7.33%   30.72%   6.19%
         Â12                  -         0.99      0.99    0.99     0.99
         p_U                  -         <0.01     <0.01   <0.01    <0.01
Average Improvement           -         32.77%    10.19%  33.31%   12.90%

Table 4: The total number of tests executed during 24-hour fuzzing by FuzzFlow and baselines

Subject  Metric   FuzzFlow   Superion    DIE        Montage  Fuzzilli
SM       Average  656.40k    1,121k      143.58k    213.08k  337.17k
         p_U      -          <0.01       <0.01      <0.01    <0.01
V8       Average  530.25k    1,970k      134.88k    229.00k  250.32k
         p_U      -          <0.01       <0.01      <0.01    <0.01
JSC      Average  422.91k    3,218.40k   294.02k    294.3k   85.16k
         p_U      -          <0.01       <0.01      <0.01    <0.01
CH       Average  265.98k    1,688.76k   177.90k    204.25k  -
         p_U      -          <0.01       <0.01      <0.01    -
JERRY    Average  3,641.91k  18,973.78k  1,257.79k  137.85k  1,575.27k
         p_U      -          <0.01       <0.01      <0.01    <0.01
QJS      Average  7,629.24k  19,056.02k  703.82k    219.40k  4,345.47k
         p_U      -          <0.01       <0.01      <0.01    <0.01
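The headline throughput ratios quoted in Section 4.5 (4.19×, 11.55×, and 2.62×) can be cross-checked against the per-engine averages in Table 4. The sketch below assumes the ratios are the arithmetic mean of the per-engine FuzzFlow/baseline ratios; the text does not state the averaging method, so this is an inference that happens to land within about 0.01 of the reported values. The names `table4` and `meanThroughputRatio` are hypothetical and only serve this illustration.

```javascript
// Per-engine averages copied from Table 4 (thousands of tests executed in 24 hours).
// null marks the entry that Table 4 leaves empty (Fuzzilli on CH).
const table4 = {
  SM:    { FuzzFlow: 656.40,  Superion: 1121,     DIE: 143.58,  Montage: 213.08, Fuzzilli: 337.17 },
  V8:    { FuzzFlow: 530.25,  Superion: 1970,     DIE: 134.88,  Montage: 229.00, Fuzzilli: 250.32 },
  JSC:   { FuzzFlow: 422.91,  Superion: 3218.40,  DIE: 294.02,  Montage: 294.3,  Fuzzilli: 85.16 },
  CH:    { FuzzFlow: 265.98,  Superion: 1688.76,  DIE: 177.90,  Montage: 204.25, Fuzzilli: null },
  JERRY: { FuzzFlow: 3641.91, Superion: 18973.78, DIE: 1257.79, Montage: 137.85, Fuzzilli: 1575.27 },
  QJS:   { FuzzFlow: 7629.24, Superion: 19056.02, DIE: 703.82,  Montage: 219.40, Fuzzilli: 4345.47 },
};

// Hypothetical helper (assumed averaging scheme): mean of the per-engine
// FuzzFlow/baseline ratios, skipping engines where the baseline has no result.
function meanThroughputRatio(table, baseline) {
  const ratios = Object.values(table)
    .filter((row) => row[baseline] !== null)
    .map((row) => row.FuzzFlow / row[baseline]);
  return ratios.reduce((sum, r) => sum + r, 0) / ratios.length;
}

for (const baseline of ["DIE", "Montage", "Fuzzilli", "Superion"]) {
  console.log(baseline, meanThroughputRatio(table4, baseline).toFixed(2) + "x");
}
```

Under the same assumption the Superion ratio comes out below 1, which matches the statement that FuzzFlow falls short of Superion's throughput. Note that a mean of ratios is dominated by engines with extreme ratios (here JERRY and QJS for Montage), so it should be read together with the per-engine rows.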

5 Discussion

In this section, we discuss the limitations of our approach and possible future work.

Intra-procedural Analysis. FlowIR emerges from intra-procedural analysis, where each procedure is represented by a distinct graph. The inherent limitation of intra-procedural analysis lies in its disregard for the global context of the program, concentrating solely on dependencies within individual methods. This narrow focus may lead to the oversight of global dependencies, including those between different procedures. Inter-procedural analysis [36], on the other hand, proves adept at addressing this issue by considering dependencies between procedures and offering a holistic view of the entire program. Nonetheless, the adoption of inter-procedural analysis often comes with an associated increase in computational resources. FuzzFlow is a first but substantial step in this direction.

Bug Oracle. In fuzzing, a bug oracle determines whether a given execution of the target program violates a specific security policy [33]. The most frequently employed bug oracle involves monitoring whether the input causes an execution crash. Additionally, the JS engines detect defects through internal self-checks (i.e., assertions). After the engine detects an assertion error, it will deliberately initiate a crash. However, there are still defects that may not lead to a crash. For instance, incorrect optimization by the JIT compiler might not result in memory corruption or assertion error. Improving the bug oracle can consequently enhance the effectiveness of vulnerability detection.

A commonly utilized solution is differential testing. Differential testing relies on another implementation of the exact same functionality as a reference. It involves comparing the execution outcomes of different JS engines or optimization levels, enabling the detection of non-crash defects. Differential testing has demonstrated significant efficacy in identifying JIT-related optimization defects [3] and conformance bugs [52].

The mutation target and bug oracle are two orthogonal challenges in fuzzing. The FlowIR-based mutation target outlined in this paper is adaptable and can be extended to support other testing oracles. Our upcoming research direction entails the integration of differential testing into FuzzFlow. However, these aspects are not central to the present paper. Our key contribution is introducing a graph-based IR that allows the representation of JS programs with efficient fuzzing mutators.

6 Related Work

Fuzzing JS Engines. Existing JS engine fuzzers fall into two categories based on the construction of new test cases: generation-based and mutation-based. Notable examples of generation-based fuzzers include Jsfunfuzz [35] and CodeAlchemist [21]. Jsfunfuzz is a black-box fuzzer developed by Mozilla; it creates new test cases using pre-defined grammars. CodeAlchemist learns the language semantics from a corpus of JS seed files, extracts code bricks, and subsequently reassembles them.

The Superion study [47] shows that the mutation operator based on AST, owing to its syntax awareness, is more effective in exploring the JS engine compared to vanilla AFL. DIE [37] employs AST as its mutation target and advocates for an enhancement in the utilization of high-quality seeds through aspect preservation. Specifically, aiming to improve the testing of the JIT compiler, DIE endeavors to preserve the structure and type semantic aspects of the seed AST throughout the mutation. These two aspect features, structure and type, serve as approximations of specific control-flow structures and data flow features. Preserving these semantic features proves to be beneficial for testing deep locations within the engine. In contrast to DIE, this paper proposes FlowIR, which directly represents control flow and data flow as a mutation target, facilitating preserving the semantic features more seamlessly.

Instead of mutating the AST, specific IRs have been proposed. Groß et al. [20] propose to operate at bytecode level, which is closer to the engine’s internal representation of the code. They introduce Fuzzilli and employ a new IR called FuzzIL to stress the JIT compiler. PolyGlot [5] is a fuzzing framework that generates high-quality test cases for processors of different programming languages. To achieve generic applicability, PolyGlot neutralizes the difference in syntax and semantics of programming languages with a uniform IR. The IR used in PolyGlot consists of a list of statements. Each statement includes an order, a type, an operator, no more than two operands, a value and a list of semantic properties. In contrast to the bytecode IR used by Fuzzilli, our FlowIR supports the direct mutation on the semantics of the seed, as the mutation target itself embodies the semantics. Compared with PolyGlot, this paper recognizes the difference (e.g., type system, memory management, concurrency) among programming languages, and therefore focuses on the IR of JS. The idea of FlowIR-based mutation is extensible to other programming languages.

Existing fuzzers have targeted specific components like binding layers or JIT compilers. Favocado [10] focuses on fuzzing binding layers of JS runtime systems. It aims to generate syntactically and semantically correct test cases and reduce the size of the input space for fuzzing. FuzzJIT [48] leverages differential testing as a bug oracle to detect non-crashing JIT compiler bugs. To facilitate each test case in triggering the JIT compilation, FuzzJIT introduces an input-wrapping template based on human knowledge. In contrast to FuzzJIT, FuzzFlow is oriented towards leveraging the inherent characteristics of the seeds themselves. Section 4.1 verifies the vulnerability detection efficacy of FuzzFlow on non-JIT components.

Mutation Operators. Mutation-based grey-box fuzzing offers distinctive advantages in detecting vulnerabilities. Combined with evolutionary algorithms, mutation-based fuzzers explore the state spaces of the target gradually. Established fuzzers like AFL and Honggfuzz [17] have pre-defined a set of mutation operators. For instance, AFL regards test cases as byte sequences and applies mutators such as byte insertion, modification, or deletion. The mutation operator stands as the core component of the fuzzer [50].

Researchers have conducted studies on the scheduling of mutation operators. MOPT [32] introduces a scheduling scheme based on the particle swarm optimization algorithm. Building upon AFL, MOPT introduces a comprehensive mutation operator scheduling algorithm designed to orchestrate operators predefined by AFL. The findings indicate that, in comparison with AFL, which selects mutation operators based on fixed probability, MOPT exhibits advantages in exploring code coverage and uncovering software defects.

However, there are limited studies on mutation operator design. The key insight of this paper is that the representation of mutation

targets influences the design of the operators significantly. Therefore, we focus on the mutation target itself and subsequently design several effective mutation operators based on it.

Graph-based Program Representation. Representing the semantics of source programs with graphs is a long-standing research problem. FlowIR differs from existing graph IR paradigms such as PDG [14, 49] and CPG [51].

PDG and FlowIR represent different control relationships. PDG consists of a Control Dependency subGraph (CDG) and a Data Dependency subGraph (DDG), with two types of edges: control dependency edges and data dependency edges. The CDG in PDG, which is designed to detect potential parallel optimization, is obtained through post-dominator analysis of the Control Flow Graph (CFG). The control dependency edges indicate that the execution of a particular statement is conditionally dependent on another. In contrast, the control flow edges in FlowIR indicate the direct flow of control from one statement to another. These two types of edges have distinct purposes in program analysis: CFG edges represent the possible paths of execution within a program, whereas CDG edges represent the dependencies based on control conditions, specifically how the execution of certain parts of the code depends on the outcomes of conditional statements.

Converting between PDG and source code is more challenging than with FlowIR. Firstly, constructing a JS PDG is more complex than FlowIR, as it involves additional post-dominator analysis beyond the CFG. Secondly, converting a PDG back to JS is more challenging than converting from FlowIR. While no current work addresses converting PDG back to JS, we believe it is feasible. However, the reconstruction of control flow from the exact CDG is more difficult as noted in the PDG paper [14].

The CPG stands out as a popular choice for detecting vulnerabilities. A CPG integrates graph structures including AST, CFG and PDG for static analysis. The integration holds all information relevant to security analysis but is less suitable for program transformation. Firstly, subgraphs like AST and PDG contain overlapping semantic content. While this redundancy is manageable in static graphs, it complicates the design of mutation operators due to synchronization issues in dynamic contexts. For instance, mutations applied to CPG’s AST necessitate the re-establishment of the CFG and PDG to preserve semantic consistency. Similarly, mutations in the PDG necessitate updates to the AST and CFG graphs, raising similar synchronization concerns. Secondly, converting the mutated composite graph back to JS source code poses another challenge. Currently, there is no existing research to guide this process. Additionally, the conversion between the mutation target and JS must be fast for effective fuzzing, but current approaches lack performance optimization. For the reasons discussed, we chose not to develop FuzzFlow based on Joern [26], despite Joern’s ability to construct PDG and CPG for JS using the GraalVM JS project. Joern stores graphs in a database for static analysis applications. To the best of our knowledge, there is no research example of converting these graphs back into source code.

There are other open-source implementations of graph IR. The GraalVM IR [12] is a graph IR initially conceived for Java but has been expanded to include support for multiple languages. Both GraalVM IR and TurboFan IR [16] are derived from bytecode and subsequently optimized into machine instructions. They are more closely aligned with machine instructions, foregoing some source code-level semantics, rendering conversion back to JS infeasible.

As described in Section 3.1, FlowIR differs from existing graph IRs in two key aspects. First, FlowIR is unique as the first graph-based IR supporting bidirectional conversion with source code, achieved through a careful redesign of nodes and edges. High-level semantics in the source code can be expressed in FlowIR, which are essential for triggering specific processing logic of target language processors. Second, we precisely define FlowIR’s functional scope to avoid unnecessary information for mutations. Selecting mutation positions is a critical research topic in itself [30]. Each node and edge can be a mutation point, so redundant elements complicate effective mutation. For example, IRs in the PDG paradigm, designed for parallelism detection, include control dependency edges. These edges are crucial for that purpose but unnecessary for semantic mutation. Thus, FlowIR excludes these edges and uses control flow graphs. Unlike IRs in the CPG paradigm, which include an AST and target static vulnerability detection, FlowIR focuses on control flow and data dependency graphs to convey program semantics directly. The AST in CPG is redundant for semantic mutation, adding unnecessary complexity without benefit. We highlight the potential of graph IRs in fuzzing mutation and consider this work a first step, anticipating future advancements.

7 Conclusion

In this paper, we introduce a new graph IR to implement effective mutation operators for fuzzing JS engines. One key contribution lies in the proposal of new mutations for JS programs that are carried out on the control and data flow directly. Instead of mutating the AST or bytecode-level IR, FlowIR is developed and mutations are defined on it. Our evaluation shows that FuzzFlow achieves 18.6% higher validity of generated test cases and 4.78% higher code coverage. More importantly, FuzzFlow has found 37 new bugs in mainstream JS engines.

8 Acknowledgments

The authors would like to thank the anonymous referees for their valuable comments and helpful suggestions. This project has received funding from the National Natural Science Foundation of China (grant 62402509), the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 850868), and SNSF PCEGP2_186974. Any findings are those of the authors and do not necessarily reflect the views of our sponsors.

References

[1] Cornelius Aschermann, Tommaso Frassetto, Thorsten Holz, Patrick Jauernig, Ahmad-Reza Sadeghi, and Daniel Teuchert. 2019. NAUTILUS: Fishing for Deep Bugs with Grammars. In NDSS.
[2] astexplorer. 2017. A web tool to explore the ASTs generated by various parsers. https://astexplorer.net
[3] Lukas Bernhard, Tobias Scharnowski, Moritz Schloegel, Tim Blazytko, and Thorsten Holz. 2022. JIT-picking: Differential fuzzing of JavaScript engines. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. 351–364.
[4] Oliver Bračevac, Guannan Wei, Songlin Jia, Supun Abeysinghe, Yuxuan Jiang, Yuyan Bao, and Tiark Rompf. 2023. Graph IRs for Impure Higher-Order Languages: Making Aggressive Optimizations Affordable with Precise Effect Dependencies. Proceedings of the ACM on Programming Languages 7, OOPSLA2 (2023), 400–430.

[5] Yongheng Chen, Rui Zhong, Hong Hu, Hangfan Zhang, Yupeng Yang, Dinghao Wu, and Wenke Lee. 2021. One engine to fuzz’em all: Generic language processor testing with semantic validation. In 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 642–658.
[6] Cliff Click and Michael Paleczny. 1995. A simple graph-based intermediate representation. ACM Sigplan Notices 30, 3 (1995), 35–49.
[7] Keith D Cooper and Linda Torczon. 2011. Engineering a compiler. Elsevier.
[8] Ron Cytron, Jeanne Ferrante, Barry K Rosen, Mark N Wegman, and F Kenneth Zadeck. 1989. An efficient method of computing static single assignment form. In Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 25–35.
[9] Ron Cytron, Jeanne Ferrante, Barry K Rosen, Mark N Wegman, and F Kenneth Zadeck. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems (TOPLAS) 13, 4 (1991), 451–490.
[10] Sung Ta Dinh, Haehyun Cho, Kyle Martin, Adam Oest, Kyle Zeng, Alexandros Kapravelos, Gail-Joon Ahn, Tiffany Bao, Ruoyu Wang, Adam Doupé, et al. 2021. Favocado: Fuzzing the Binding Code of JavaScript Engines Using Semantically Correct Test Cases. In NDSS.
[11] Brendan Dolan-Gavitt, Patrick Hulin, Engin Kirda, Tim Leek, Andrea Mambretti, Wil Robertson, Frederick Ulrich, and Ryan Whelan. 2016. Lava: Large-scale automated vulnerability addition. In 2016 IEEE symposium on security and privacy (SP). IEEE, 110–121.
[12] Gilles Duboscq, Lukas Stadler, Thomas Würthinger, Doug Simon, Christian Wimmer, and Hanspeter Mössenböck. 2013. Graal IR: An extensible declarative intermediate representation. In Proceedings of the Asia-Pacific Programming Languages and Compilers Workshop. 1–9.
[13] Gilles Duboscq, Thomas Würthinger, Lukas Stadler, Christian Wimmer, Doug Simon, and Hanspeter Mössenböck. 2013. An intermediate representation for speculative optimizations in a dynamic compiler. In Proceedings of the 7th ACM workshop on Virtual machines and intermediate languages. 1–10.
[14] Jeanne Ferrante, Karl J Ottenstein, and Joe D Warren. 1987. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems (TOPLAS) 9, 3 (1987), 319–349.
[15] Github. 2022. JavaScript stays as the 1st most used language. https://octoverse.github.com/2022/top-programming-languages
[16] Google. 2015. TurboFan is one of V8’s optimizing compilers. https://v8.dev/docs/turbofan
[17] Google. 2016. Honggfuzz. https://github.com/google/honggfuzz
[18] Google. 2017. V8 features an interpreter called Ignition. https://v8.dev/docs/ignition
[19] Rahul Gopinath, Philipp Görz, and Alex Groce. 2022. Mutation analysis: Answering the fuzzing challenge. arXiv preprint arXiv:2201.11303 (2022).
[20] Samuel Groß, Simon Koch, Lukas Bernhard, Thorsten Holz, and Martin Johns. 2023. FUZZILLI: Fuzzing for JavaScript JIT Compiler Vulnerabilities. In Network and Distributed Systems Security (NDSS) Symposium.
[21] HyungSeok Han, DongHyeon Oh, and Sang Kil Cha. 2019. CodeAlchemist: Semantics-Aware Code Generation to Find Vulnerabilities in JavaScript Engines. In NDSS.
[22] Ahmad Hazimeh, Adrian Herrera, and Mathias Payer. 2020. Magma: A ground-truth fuzzing benchmark. Proceedings of the ACM on Measurement and Analysis of Computing Systems 4, 3 (2020), 1–29.
[23] Xiaoyu He, Xiaofei Xie, Yuekang Li, Jianwen Sun, Feng Li, Wei Zou, Yang Liu, Lei Yu, Jianhua Zhou, Wenchang Shi, et al. 2021. SoFi: Reflection-Augmented Fuzzing for JavaScript Engines. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. 2229–2242.
[24] Christian Holler, Kim Herzig, and Andreas Zeller. 2012. Fuzzing with code fragments. In Presented as part of the 21st USENIX Security Symposium (USENIX Security 12). 445–458.
[25] Sanghoon Jeon and Jaeyoung Choi. 2012. Reuse of JIT compiled code in JavaScript engine. In Proceedings of the 27th Annual ACM Symposium on Applied Computing. 1840–1842.
[26] Joern. 2021. Honggfuzz. https://github.com/joernio/joern
[27] Chris Lattner and Vikram Adve. 2004. LLVM: A compilation framework for lifelong program analysis & transformation. In International symposium on code generation and optimization, 2004. CGO 2004. IEEE, 75–86.
[31] Xiao Liu, Xiaoting Li, Rupesh Prajapati, and Dinghao Wu. 2019. DeepFuzz: Automatic Generation of Syntax Valid C Programs for Fuzz Testing. In Proceedings of the... AAAI Conference on Artificial Intelligence.
[32] Chenyang Lyu, Shouling Ji, Chao Zhang, Yuwei Li, Wei-Han Lee, Yu Song, and Raheem Beyah. 2019. MOPT: Optimized mutation scheduling for fuzzers. In 28th USENIX Security Symposium (USENIX Security 19). 1949–1966.
[33] Valentin JM Manès, HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J Schwartz, and Maverick Woo. 2019. The art, science, and engineering of fuzzing: A survey. IEEE Transactions on Software Engineering 47, 11 (2019), 2312–2331.
[34] Jonathan Metzman, László Szekeres, Laurent Simon, Read Sprabery, and Abhishek Arya. 2021. Fuzzbench: an open fuzzer benchmarking platform and service. In Proceedings of the 29th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering. 1393–1403.
[35] Mozilla. 2007. A collection of fuzzers in a harness for testing the SpiderMonkey JavaScript engine. https://github.com/MozillaSecurity/funfuzz
[36] Flemming Nielson, Hanne R Nielson, and Chris Hankin. 2015. Principles of program analysis. Springer.
[37] Soyeon Park, Wen Xu, Insu Yun, Daehee Jang, and Taesoo Kim. 2020. Fuzzing JavaScript engines with aspect-preserving mutation. In 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 1629–1642.
[38] Terence Parr. 1992. ANTLR. https://www.antlr.org
[39] projectzero. 2021. CVE-2021-37975: Chrome v8 garbage collector logic bug causing live objects to be collected. https://googleprojectzero.github.io/0days-in-the-wild//0day-RCAs/2021/CVE-2021-37975.html
[40] ProjectZero. 2022. V8 0-day In-the-Wild 2021-2022. https://docs.google.com/spreadsheets/d/1lkNJ0uQwbeC1ZTRrxdtuPLCIl7mlUreoKfSIgajnSyY/view
[41] saelo. 2018. Safari RCE, sandbox escape, and LPE to kernel for macOS. https://github.com/saelo/pwn2own2018
[42] saelo. 2022. Attacking JavaScript Engines in 2022. https://saelo.github.io/presentations/offensivecon_22_attacking_javascript_engines.pdf
[43] James Stanier and Des Watson. 2013. Intermediate representations in imperative compilers: A survey. ACM Computing Surveys (CSUR) 45, 3 (2013), 1–27.
[44] Spandan Veggalam, Sanjay Rawat, Istvan Haller, and Herbert Bos. 2016. Ifuzzer: An evolutionary interpreter fuzzer using genetic programming. In European Symposium on Research in Computer Security. Springer, 581–601.
[45] W3Techs. 2023. Usage statistics of JavaScript as client-side programming language on websites. https://w3techs.com/technologies/details/cp-javascript
[46] Junjie Wang, Bihuan Chen, Lei Wei, and Yang Liu. 2017. Skyfire: Data-driven seed generation for fuzzing. In 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 579–594.
[47] Junjie Wang, Bihuan Chen, Lei Wei, and Yang Liu. 2019. Superion: Grammar-aware greybox fuzzing. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 724–735.
[48] Junjie Wang, Zhiyi Zhang, Shuang Liu, Xiaoning Du, and Junjie Chen. 2023. FuzzJIT: Oracle-Enhanced Fuzzing for JavaScript Engine JIT Compiler. (2023).
[49] Daniel Weise, Roger F Crew, Michael Ernst, and Bjarne Steensgaard. 1994. Value dependence graphs: Representation without taxation. In Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 297–310.
[50] Mingyuan Wu, Ling Jiang, Jiahong Xiang, Yanwei Huang, Heming Cui, Lingming Zhang, and Yuqun Zhang. 2022. One fuzzing strategy to rule them all. In Proceedings of the 44th International Conference on Software Engineering. 1634–1645.
[51] Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. 2014. Modeling and discovering vulnerabilities with code property graphs. In 2014 IEEE Symposium on Security and Privacy. IEEE, 590–604.
[52] Guixin Ye, Zhanyong Tang, Shin Hwei Tan, Songfang Huang, Dingyi Fang, Xiaoyang Sun, Lizhong Bian, Haibo Wang, and Zheng Wang. 2021. Automated conformance testing for javascript engines via deep compiler fuzzing. In Proceedings of the 42nd ACM SIGPLAN international conference on programming language design and implementation. 435–450.
[53] Tai Yue, Pengfei Wang, Yong Tang, Enze Wang, Bo Yu, Kai Lu, and Xu Zhou. 2020. EcoFuzz: Adaptive Energy-Saving Greybox Fuzzing as a Variant of the Adversarial Multi-Armed Bandit. In 29th USENIX Security Symposium (USENIX Security 20). 2307–2324.
[54] Nicholas C Zakas. 2005. Professional JavaScript for Web Developers. John Wiley & Sons.
[55] Michal Zalewski. 2017. american fuzzy lop. http://lcamtuf.coredump.cx/afl
[28] Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko. 2021. MLIR: Scaling compiler infrastructure for domain specific
[56] G Zhang, P Wang, T Yue, X Kong, S Huang, X Zhou, and K Lu. 2022. Mob-
computation. In 2021 IEEE/ACM International Symposium on Code Generation and fuzz: Adaptive multi-objective optimization in gray-box fuzzing. In Network and
Optimization (CGO). IEEE, 2–14. Distributed Systems Security (NDSS) Symposium, Vol. 2022.
[29] Suyoung Lee, HyungSeok Han, Sang Kil Cha, and Sooel Son. 2020. Montage:
A Neural Network Language Model-Guided JavaScript Engine Fuzzer. In 29th
USENIX Security Symposium (USENIX Security 20). 2613–2630.
[30] Caroline Lemieux and Koushik Sen. 2018. Fairfuzz: A targeted mutation strategy
for increasing greybox fuzz testing coverage. In Proceedings of the 33rd ACM/IEEE
International Conference on Automated Software Engineering. 475–485.
