Fuzzing JavaScript Engines With A Graph-Based IR
backend components necessitates semantically meaningful mutations, going beyond mere syntactic changes. Achieving this requires thoughtfully altering the control flow or data flow of the program.
C3-Mutation granularity. High-quality input corpora such as known proof of concept (PoC) inputs or regression test cases are deliberately designed to deal with vulnerable components like JIT compilers, which have specific control-flow conditions crucial for JIT compilation [37]. Effectively identifying vulnerabilities in JIT compilers necessitates refining the granularity of mutation. This entails preserving the control-flow structure within the seed to retain the trigger conditions, coupled with effective semantic mutations at the data-flow level.
The fuzzer’s input representation format dictates possible mutation operators. The representation defines the rules and mechanisms by which mutations are applied to existing seeds. Fuzzers for JS engines fall into three categories based on the representation of the JS program:
• Initially, fuzzers have mutated source code based on byte or token sequences [31, 55]. As this mutation is unaware of the syntax, the resulting test cases are prone to syntax errors, thereby diminishing their validity. The majority of generated inputs fail at the early parsing stage, and cannot proceed to more complex aspects of the implementations.
• Current methods mutate the Abstract Syntax Tree (AST). In contrast to token sequences, mutating the AST facilitates the generation of syntactically correct test cases [37, 47].
• Alternatively, mutation may happen at an intermediate language [20]. Fuzzilli devises a bytecode-level Intermediate Representation (IR) of the JS as the mutation target, aiming to produce high-quality inputs.
However, both AST and bytecode IR, commonly used for mutation, have limitations in exploring vulnerabilities in JS engine backends. AST mutations lack semantic constraints, often yielding test cases with semantic errors that cannot reach the backend (C1). Additionally, mutations on the AST generate many test cases with altered syntax but unchanged semantics, failing to adequately test complex interactions of the backend implementations (C2). Bytecode IR lacks explicit control and data flow, making semantically meaningful mutation implementation difficult (C2). For instance, identifying unused data flow in bytecode IR is challenging, leading to wasteful computing resource consumption due to a substantial number of mutations on invalid data flows. Designing fine-grained mutation operators for comprehensive data-flow-level mutation proves highly challenging, regardless of whether based on AST or bytecode IR (C3). Overall, the inherent characteristics of these representations constrain mutation operator design in fuzzing, rendering neither AST nor bytecode ideal choices for mutation.
We address the aforementioned gap by designing a new representation to support effective mutation operators. The mutation of a program requires code transformation, and one common scenario within this transformation is optimization [27]. Similarly, the mutation operator is closely related to the IR where the mutation is carried out. We introduce a graph IR named FlowIR. Our proposed IR directly represents the JS program’s control flow and data flow for mutation. Specifically, we explicitly model the control flow and data flow of JS programs with FlowIR, capturing the semantics of the program through the interconnected relationships between nodes. FlowIR supports bidirectional conversion to and from JS. During fuzzing, we maintain the seed queue in the FlowIR format. Based on FlowIR, we design a series of mutation operators. Mutations are performed on FlowIR, and subsequently, the mutated representation is converted back to source code as input for the tested engines.
To address C1, FlowIR emphasizes representing semantics rather than syntactic structures. This strategy facilitates the enforcement of semantic constraints during mutation. To address C2, FlowIR represents the control flow and data flow of the program directly. Mutations operate directly on the semantics of the seed, simplifying the implementation of meaningful semantic mutations. For instance, explicit modeling of the data flow allows for the easy identification of unused data flows in the seed, enabling the avoidance of mutation on invalid semantics. To address C3, this paper establishes a low coupling between control flow and data flow in FlowIR. This feature enables the fuzzer to concentrate on mutations within either the control-flow or data-flow subgraph independently, mitigating the need to simultaneously address their mutual influence and thereby minimizing the likelihood of introducing semantic errors. This enhances the operational granularity of mutations.
To validate the efficacy of our approach, we implement FuzzFlow as a prototype fuzzer that utilizes FlowIR. The experimental results demonstrate that FuzzFlow enhances the syntax correctness and semantic validity of generated test cases significantly. Low coupling between data flow and control flow enhances testing effectiveness for backend engine components. The validity of test cases generated by FuzzFlow reaches 72% (18.6% higher than the baselines), with a 4.78% improvement in code coverage. Moreover, mutations based on the FlowIR achieve high throughput, leading to efficient fuzzing. After applying our technique to six mainstream JS engines (V8 in Chrome, SpiderMonkey in Firefox, JavaScriptCore in Safari, ChakraCore, JerryScript, QuickJS), FuzzFlow has identified a total of 37 new defects. Our prototype will be available at https://github.com/walkcreate/FuzzFlow. We make the following contributions:
• This paper proposes FlowIR, a graph-based program representation used for mutation that directly represents the control and data flow of JS.
• Based on FlowIR, this paper proposes mutation operators with the following advantages: (1) facilitate the generation of semantically valid test cases, (2) enable semantically meaningful mutation, (3) enhance mutation at finer granularity.
• The experimental results demonstrate that FlowIR serves as a highly effective mutation target for fuzzing JS engines. Based on FlowIR, we implemented a new fuzzer named FuzzFlow, which has successfully identified 37 new defects in the mainstream JS engines.
2 Background
2.1 Challenges when Fuzzing JS Engines
Mainstream JS engines are composed of a parser, bytecode compiler, interpreter, JIT compiler, and supporting components. To enhance the execution efficiency, mainstream JS engines adopt a mixed compilation architecture. Specifically, they use a bytecode compiler
Fuzzing JavaScript Engines with a Graph-based IR CCS ’24, October 14–18, 2024, Salt Lake City, UT, USA
will change the semantics of the program. Region nodes encapsulate control conditions, grouping nodes with identical conditions. FlowIR employs region nodes to represent control structures such as branches, loops, and exceptions. The distinguished entry node Start represents the beginning of the program.
Nodes in the CFG have successor edges indicating their possible next nodes. Edges between control flow nodes denote direct transitions. Each node in the graph has at most two successors. Nodes with two successors are assumed to have true and false attributes associated with their outgoing edges.
Definition 3: The DDG demonstrates the flow of values from definitions of a variable to its uses. The nodes in DDG consist of operators and operands.
For data flow representation, a node has input edges pointing to nodes providing its operands. The inputs to a node are inputs to the node’s operation. Operands encompass literals, variables, or expressions. Each node defines a value based on its inputs and operation, available on all output edges. All input edges signify scheduling dependencies. A node must be scheduled after its dependencies when lifting FlowIR to JS.
Control-flow structures form a backbone for data-flow nodes. Data flow nodes are solely restricted by their data dependencies. Formally, let X and Y be nodes in a DDG. There is a data dependence from X to Y with respect to a variable v iff:
(1) there is a non-null path p from X to Y with no intervening definition of v and either:
(2) X contains a definition of v and Y a use of v; or X contains the use of v and Y a definition of v.
Edges between CFG and DDG: FlowIR minimizes the coupling between control flow and data flow. Interaction between the CFG and DDG is restricted to three node types: behavioral CFNs, PhiNode and IfNode. Behavioral CFNs often take data flow input. Meanwhile, they may have control predecessors and successors. PhiNode is classified as a data flow node, linked to control condition expressions and resulting data-flow nodes from two branches. The ith data input to PhiNode corresponds to the ith branch. Similarly, IfNode is classified as a control flow node, linked to both the data flow node representing the conditional expression and the two control flow branches. We show the example edges between CFG and DDG in Figure 3.
[Figure 3 diagram, edge labels only: Invoke (top: Predecessor, CallTarget; bottom: Successor), IfNode (top: Predecessor, Condition; bottom: Begin, Begin), PhiNode (top: Merge, Data[0], Data[1]; bottom: Usages)]
Figure 3: Nodes connecting control flow and data flow
In our prototype, FlowIR incorporates support for diverse JS language features, covering basics like variable operations, binary/ternary operations, if-else branches, loops, and functions. Additionally, FlowIR extends support to advanced features, including JS object-oriented programming, and the JS exception-handling mechanism. Nodes for fundamental language features are language-independent, while those expressing JS-specific semantics are not. This implies that FlowIR cannot be directly used as a mutation target for other language processors. However, the conceptual foundation of FlowIR can be extended to other languages. Moreover, nodes for fundamental language features are reusable.
Figure 4 illustrates the FlowIR corresponding to a seed triggering CVE-2021-21220 in V8. Notably, FlowIR directly represents the control flow and dependencies in data flow, achieving low coupling. Data flow analysis on FlowIR is straightforward. For example, in Figure 4c, node-1 is unused by any other, indicating an unused data flow (parameter in function foo). Consequently, mutating this node is ineffective for uncovering defects in the JS engine.
Mutations on FlowIR are highly efficient. For instance, altering the edge from node-20 to node-19 in Figure 4b and redirecting it to node-3 (red dashed line) allows mutation of the loop condition from i < 0x100 to i < 2**31. Notably, the latter expression corresponds to an AST subtree with a height of 2. Generating such an AST subtree requires 2 steps. In contrast, leveraging the graph, only modifying the destination node of an edge is needed.
Algorithm 1: Translate JS to FlowIR
  Input: The seed JS code
  Output: FlowIR of the seed
  ast ← Parse(JSCode)
  ast ← ScopeAnalysis(ast)
  ast ← ReferenceResolve(ast)
  ast ← LeftValueAnalysis(ast)
  graph ← CreateEmptyGraph()
  function ProcessTree(node):
    if node == VariableDeclaration then
      if node.initialization is not null then
        flowNode ← ProcessTree(node.initialization)
        Create a new variable proxy
        Set the variable proxy to flowNode
        AddNodeToGraph()
        return flowNode
    if node == BinaryOpExpression then
      leftFlow ← ProcessTree(node.left)
      rightFlow ← ProcessTree(node.right)
      Create BinaryOpNode
      Create edges between BinaryOpNode and its dependencies
      AddNodeToGraph()
      return BinaryOpNode
    if node == IfStatement then
      conditionFlow ← ProcessTree(node.condition)
      Create BeginNodes and EndNodes for the two branches
      Create IfNode
      Create edges between IfNode and its dependencies
      Set the current control flow as the BeginNode for true branch
      ProcessTree(node.trueBranch)
      Set the current control flow as the BeginNode for false branch
      ProcessTree(node.falseBranch)
      Create MergeNode
      Merge the two branches control flow into MergeNode
      AddNodeToGraph()
      Set the current control flow as the MergeNode
      return IfNode
    ...
    return node
  ProcessTree(ast)
3.2 Translating JS to FlowIR
To leverage existing high-quality seeds, FuzzFlow supports the translation of JS to FlowIR. Notably, Lee et al. [29] underscore the effectiveness of regression test cases as fuzzing seeds. Development teams for mainstream JS engines continually augment their regression test sets with tests that expose historical engine defects, aiding
CCS ’24, October 14–18, 2024, Salt Lake City, UT, USA Haoran Xu et al.
(a) Test case:
    arr = new Uint32Array([2**31]);

    function foo(a) {
      var x = 1;
      x = (arr[0] ^ 0) + 1;
      return x;
    }

    for (let i = 0; i < 0x100; ++i)
    { foo(true); }
(b) The FlowIR of the main script [graph not reproduced; it marks a mutation example and an independent data-flow subgraph example]
(c) The FlowIR of function foo [graph not reproduced]
Figure 4: Test case to trigger CVE-2021-21220 in V8 and the corresponding FlowIR. Blue nodes denote control flow nodes, while
orange nodes signify data flow nodes. Blue edges represent control flow edges, while orange edges represent data flow edges.
Black edges denote connections between control flow and data flow, as well as auxiliary connections.
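The observation that node-1 in Figure 4c (the parameter of foo) feeds no other node can be reproduced with a very small graph model. The following Python sketch is only an illustration: the `Node` class, the `unused_data_nodes` helper, the node numbering, and the labels are hypothetical and merely resemble panel (c); none of them is taken from FuzzFlow's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical FlowIR-like node; `inputs` are its data-flow input edges."""
    ident: int
    label: str
    inputs: list = field(default_factory=list)
    is_control: bool = False  # control-flow nodes (e.g. Return) act as sinks


def unused_data_nodes(nodes):
    """Return data-flow nodes that no other node takes as an input."""
    used = {inp.ident for node in nodes for inp in node.inputs}
    return [n for n in nodes if not n.is_control and n.ident not in used]


# Rough model of: function foo(a) { var x = 1; x = (arr[0] ^ 0) + 1; return x; }
param_a = Node(1, "Param(a)")
arr = Node(2, "Global(arr)")
zero = Node(3, "0")
load = Node(4, "LoadIndex", [arr, zero])
xor = Node(5, "Xor(^)", [load, zero])
one = Node(6, "1")
add = Node(7, "Add(+)", [xor, one])
ret = Node(8, "Return", [add], is_control=True)

graph = [param_a, arr, zero, load, xor, one, add, ret]
unused = unused_data_nodes(graph)
print([n.label for n in unused])  # ['Param(a)']
```

Because usages are explicit edges, the check is a single pass over the input lists; a mutator could use such a result to skip nodes whose mutation cannot affect execution.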
subsequent development. By mutating these regression test cases, the fuzzer gains an avenue to delve deeper into historical vulnerability patterns. To allow continuous testing and protect against regression bugs, we implement the JS2Graph module, designed to seamlessly convert JS programs into FlowIR. Consequently, FuzzFlow can navigate the input space surrounding existing test cases, thereby revealing potentially unforeseen neighboring bugs.
The JS2Graph involves syntax analysis of the JS to acquire the AST. Subsequently, a top-down semantic analysis is conducted on the AST to establish scope, identify declared symbols, perform reference resolution, and analyze left values. Finally, syntax-directed translation is performed based on the results of the semantic analysis to convert the AST into FlowIR. Throughout this conversion, the control flow and data flow of the program are analyzed.
Algorithm 1 outlines the conversion process of JS2Graph. Given the diverse syntax features of JS and space constraints on the page, the algorithm only presents a subset of JS2Graph functionality. Considering that the basic building blocks of JS programs are declarations (e.g., VariableDeclaration), expressions (e.g., BinaryOp, represented by the DFN), and statements (e.g., IfStatement, represented by the CFN), Algorithm 1 opts to showcase these three elements. The prototype implementation encompasses support for additional language features.
3.3 Mutation Operators on FlowIR
FuzzFlow encompasses two types of mutation operators: data-flow subgraph mutation and control-flow subgraph mutation.
Mutation on Data-flow Subgraph. The data-flow subgraph mutation entails preserving the control structures of the seed while solely mutating the data flow. This mutation operator introduces a finer granularity. For instance, it enables the retention of conditions that trigger JIT compilation. If the seed induces JIT compilation, the mutated test cases under this operator can similarly trigger JIT compilation.
Data flow mutation via FlowIR provides distinct advantages over Fuzzilli’s FuzzIL, a sequential IR based on Three Address Code (TAC). Firstly, TAC’s data flow is fixed within the control flow structure of the JS program. However, this mutation is restricted by instruction sequence, limiting the available data flow to that generated by preceding instructions. Consequently, the mutation space is constrained, hindering full utilization of the seed’s overall data flow. Secondly, input mutation of TAC instructions alters variable names as instruction parameters. As noted in the background, variable names do not directly map to the data flow. Different variable names may represent the same data flow, posing challenges for effective data flow mutation. FlowIR effectively addresses the aforementioned challenges. Firstly, FlowIR’s DFNs are independent of control flow, constrained only by data dependencies. After changing these dependencies through mutation, the Graph2JS module can appropriately relocate DFNs within control flow regions. Thus, leveraging FlowIR’s data-flow subgraph enables mutations to span instructions and basic blocks, facilitating comprehensive analysis and mutation of the entire seed’s data flow. Secondly, the input for mutation is the DFN, not the variable name. DFNs in FlowIR accurately model data dependencies within the seed, enabling more targeted mutations.
Data-flow subgraph mutation operators can be further categorized into two types. The first category involves node attribute mutation. This fine-grained mutation alters the attributes (labels) of DFNs. For example, if an original binary operation node employs an addition operator, mutating it into a subtraction modifies the data flow of the seed. Similarly, for a terminal node with an original integer value of 1, changing its value to -1 alters the data flow of the seed. The second category is input mutation of the node, which is further divided into intra-procedural mutation and inter-procedural mutation. In intra-procedural mutation, the data flow is mutated by modifying the connection between data nodes and changing the input of a specific data node within the method. Leveraging FlowIR, this mutation demonstrates high execution efficiency. It involves changing the starting point of an input edge of the data node to another one without having to copy any nodes. Inter-procedural mutation on data flow aims to splice the flow across methods. Independent data-flow subgraphs are extracted from the seeds to serve as the material for splicing.
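The two data-flow mutation categories (node attribute mutation and intra-procedural input mutation) can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: `DataNode`, `render`, `mutate_attribute`, and `redirect_input` are hypothetical names, only leaves and binary operators are handled, and the real FuzzFlow operators are implemented in C++ on a richer graph.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical data-flow node: an operator or literal plus input edges."""
    label: str
    inputs: list = field(default_factory=list)


def render(node):
    """Lift a data-flow node to a JS expression string (leaves and binary ops only)."""
    if not node.inputs:
        return node.label
    left, right = node.inputs
    return f"({render(left)} {node.label} {render(right)})"


def mutate_attribute(node, new_label):
    """Node attribute mutation: change the operator or literal in place."""
    node.label = new_label


def redirect_input(node, index, new_source):
    """Intra-procedural input mutation: re-point one input edge, copying nothing."""
    node.inputs[index] = new_source


i = DataNode("i")
limit = DataNode("0x100")
power = DataNode("**", [DataNode("2"), DataNode("31")])  # 2**31 already exists in the seed
cond = DataNode("<", [i, limit])                          # loop condition i < 0x100
add = DataNode("+", [DataNode("x"), DataNode("1")])

before = render(cond)
redirect_input(cond, 1, power)   # reuse the existing 2**31 node as the new bound
after = render(cond)
mutate_attribute(add, "-")       # addition becomes subtraction

print(before, "->", after)       # (i < 0x100) -> (i < (2 ** 31))
print(render(add))               # (x - 1)
```

The redirect touches one list slot and builds no new subtree, which is the efficiency argument made for the graph representation over AST subtree generation.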
Figure 4b shows an independent data-flow subgraph in the gray rounded rectangle. An independent data-flow subgraph is defined as a subgraph extracted from the FlowIR. In this context, all nodes within the subgraph depend solely on the data flow inputs existing in that subgraph. The extraction of independent data-flow subgraphs aims to establish a comprehensive data flow Pool for mutations. During initialization of a fuzzing run, after converting all seed JS programs to FlowIR, we analyze each FlowIR and extract independent data-flow subgraphs into the Pool. Each node within any subgraph in Pool has the potential to be inserted into a seed as an input node for splicing, thus achieving data flow splicing between procedures.
To extract independent data-flow subgraphs, the traversal initiates from the leaf nodes, typically literals, within the data-flow subgraph, ascending along their dependencies. Leaf nodes, without data input, are unequivocally eligible for inclusion in the data-flow subgraph. Following this, the analysis advances along the data flow edge of each node, examining the higher-level nodes that utilize the current node as their data input. For each traversed DFN, whether it is included in the subgraph depends on all the DFNs it depends on already being part of the subgraph. If an input (designated as I) on which the DFN relies is absent from the extracted subgraph, there is an attempt to recursively incorporate node I into the subgraph. When executing input mutations of data flow across graphs, an independent subgraph is randomly selected from Pool as the input for the mutation node.
Mutation on Control-flow Subgraph. In comparison to existing mutation platforms, control flow mutation based on FlowIR proves advantageous in generating test cases that are both syntactically correct and semantically valid. When mutating the control flow in AST or bytecode IR, changes in the positioning of control flow elements can easily break the semantic integrity of the seed. Conversely, FlowIR, by modeling dependencies between nodes, ensures that moving a CFN does not impact other edges of the node, except for the mutated control flow edge. The reason is that dependencies related to unchanged edges are automatically fulfilled in the Graph2JS module. This helps control flow mutation based on FlowIR without compromising semantic dependencies. Such a mechanism contributes to the generation of test cases that are not only syntactically correct but also semantically valid.
The mutation target of the control-flow subgraph may be a single behavioral CFN, or a node group containing a control flow structure, such as a branch or a loop. Control flow mutation operators can be further divided into four types: CFN scheduling, CFN deletion, CFN insertion and CFN data input mutation. CFN scheduling involves alterations in the placement of CFNs, which can markedly modify the execution of the seed and induce semantic changes. Specific implementation methods for control flow scheduling include:
• Individual CFN Movement: This mutation relocates a behavioral CFN (e.g., function call or object creation) to a random position in the control flow.
• CFN Group Relocation: This method entails moving a control flow unit, such as conditional branches or loops, to a random location in the control flow.
• If-Else Branch Swapping: This mutation involves swapping the if-else branches, thereby changing the execution branch under specific branch conditions.
Control flow scheduling proves highly effective in altering the execution of a seed. For instance, it can shift the control flow from within a loop to outside the loop, or vice versa. Leveraging the graph data structure, CFN deletion requires only the modification of predecessor and successor edges between CFNs to achieve the desired changes. CFN insertion involves incorporating new nodes or node groups into the existing flow. For instance, in a seed containing function declarations, one can introduce nodes for new function calls with different parameters. Additionally, loops can be inserted into a seed, thereby augmenting the complexity of its control structure. CFN data input mutation, which couples control flow and data flow, modifies the data input of CFNs. For instance, altering the data input of a ReturnNode impacts the data flow that is returned. In most cases, when mutating the control-flow subgraph, changes in the data-flow subgraph are often implicated, indicating a larger mutation granularity.
Speeding Up Mutators. We conduct further optimizations on FlowIR-based mutations to enhance the fuzzer’s efficiency. Instead of copying the entire graph for a mutation, we conduct the mutation directly on the FlowIR of the seed, keeping a record of the performed mutation operators. After mutation, the Graph2JS translation is applied to the mutated FlowIR to generate the JS program for testing. When the engine finishes execution, we copy the mutated FlowIR to generate a new seed only when the test case reveals new coverage. Irrespective of whether the mutated seed is interesting, after the execution finishes, a reverse operation is performed according to the recorded mutation operators, restoring the seed to its pre-mutated state. This design mitigates unnecessary copying of FlowIR, thereby improving the overall efficiency of the fuzzer.
3.4 Translating FlowIR to JS
After mutation, FuzzFlow converts FlowIR back into JS as fuzzing input. This conversion poses new challenges. Firstly, the IR must preserve high-level semantic information to facilitate the conversion back to JS programs, distinguishing it from existing IRs. For example, in LLVM IR, the control flow of exception handling is simplified to ordinary branch and jump instructions, resulting in the loss of crucial high-level semantic information that makes it impractical to restore the IR to source code. Secondly, the conversion must ensure the accurate translation of both control and data flow, maintaining consistency between FlowIR and the source code.
To address the aforementioned challenges, FuzzFlow incorporates targeted designs. Firstly, in FlowIR, high-level semantics are preserved as nodes. By utilizing these nodes (e.g., TryNode, StoreFieldNode), precise restoration of semantics is achieved. Secondly, FuzzFlow conducts graph traversal along the control flow based on the dependencies and completes code generation during this traversal process. Dependencies are satisfied during this process. When lifting the FlowIR, FuzzFlow structures the source code in regions. A region represents a nested structure, corresponding to a block scope of the program. Organizing the generated code in regions helps maintain the relationships between scopes.
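The idea of organizing lifted code in nested regions, one per block scope, can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the `Region` class and its `emit`, `open`, and `to_js` methods are hypothetical names, and statements are supplied as ready-made strings rather than produced by a graph traversal as in Graph2JS.

```python
class Region:
    """Hypothetical region: statements plus nested sub-regions for one block scope."""

    def __init__(self, header=None):
        self.header = header  # e.g. "if (c)" or "else"; None for the top level
        self.items = []       # statement strings or nested Region objects, in order

    def emit(self, statement):
        """Append one statement to this scope."""
        self.items.append(statement)

    def open(self, header):
        """Create a nested region (a new block scope) and return it."""
        child = Region(header)
        self.items.append(child)
        return child

    def to_js(self, depth=0):
        """Flatten this region and its children into indented source lines."""
        pad = "  " * depth
        inner = depth if self.header is None else depth + 1
        lines = []
        if self.header is not None:
            lines.append(f"{pad}{self.header} {{")
        for item in self.items:
            if isinstance(item, Region):
                lines.extend(item.to_js(inner))
            else:
                lines.append("  " * inner + item)
        if self.header is not None:
            lines.append(f"{pad}}}")
        return lines


top = Region()
top.emit("var x = 1;")
then_region = top.open("if (x < 2)")
then_region.emit("x = x + 1;")
else_region = top.open("else")
else_region.emit("x = x - 1;")
top.emit("var y = x;")

js_code = "\n".join(top.to_js())
print(js_code)
```

Keeping each scope's statements in its own container means that a statement generated late in the traversal can still be appended to the correct block, and merging the top region yields the complete program text.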
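The record-and-reverse scheme described under Speeding Up Mutators in Section 3.3 can also be sketched briefly. The Python below is an assumption-laden illustration: `UndoableGraph`, `redirect`, `snapshot`, and `rollback` are hypothetical names, the graph is reduced to a mapping from node names to input lists, and only input-edge mutations are recorded.

```python
class UndoableGraph:
    """Hypothetical edge store that mutates in place and records how to revert."""

    def __init__(self, inputs):
        # `inputs` maps a node name to the list of node names feeding it.
        self.inputs = {node: list(srcs) for node, srcs in inputs.items()}
        self.log = []  # stack of (node, index, previous_source)

    def redirect(self, node, index, new_source):
        """Apply one input-edge mutation and remember the old source."""
        self.log.append((node, index, self.inputs[node][index]))
        self.inputs[node][index] = new_source

    def snapshot(self):
        """Copy the graph; meant only for mutants that reveal new coverage."""
        return {node: list(srcs) for node, srcs in self.inputs.items()}

    def rollback(self):
        """Reverse the recorded mutations in LIFO order to restore the seed."""
        while self.log:
            node, index, previous = self.log.pop()
            self.inputs[node][index] = previous


seed = UndoableGraph({"cond": ["i", "0x100"], "add": ["xor", "one"]})
original = seed.snapshot()

seed.redirect("cond", 1, "pow")
seed.redirect("add", 0, "load")
seed.redirect("cond", 1, "one")   # the same edge is mutated twice
mutant = seed.snapshot()          # would be kept only on new coverage
seed.rollback()

print(mutant)
print(seed.inputs == original)    # True
```

Undoing in reverse order matters when one edge is mutated more than once, as in the usage above; replaying the log forwards would restore the wrong intermediate value.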
Algorithm 2: Translate FlowIR to JS
  Input: FlowIR of the seed
  Output: JS code of the seed
  StartNode ← FlowIR
  topRegion ← CreateRegion()
  function LiftGraphNode(node):
    if node == BinaryOpNode then
      region ← LoadCurrentRegion()
      proxies ← GetVariableProxyOnDataNode()
      if IsFirstVisit then
        leftCode ← LiftGraphNode(node.left)
        rightCode ← LiftGraphNode(node.right)
        binaryCode ← leftCode + node.operator + rightCode
        if len(proxies) > 0 then
          GenAssignmentStatement(region)
          return proxies.name
        else
          return binaryCode
    if node == IfNode then
      parentRegion ← SaveCurrentRegion()
      ifRegion ← CreateRegion()
      conditionCode ← LiftGraphNode(node.condition)
      EmitExpression(ifRegion, conditionCode)
      branchTrueRegion ← CreateRegion()
      Change current region to branchTrueRegion
      LiftGraphNode(node.branchTrue)
      branchFalseRegion ← CreateRegion()
      Change current region to branchFalseRegion
      LiftGraphNode(node.branchFalse)
      Add two sub-regions to ifRegion
      Restore the parentRegion as current region
      LiftGraphNode(node.next)
      return
    ...
    return
  LiftGraphNode(graph)
  JSCode ← MergeRegionsCode(topRegion)
Algorithm 2 illustrates the lifting process of certain FlowIR nodes. The prototype FuzzFlow encompasses support for the full list of FlowIR nodes. The Graph2JS initiates from the control flow starting node and proceeds backward along the flow. When encountering CFNs that generate scopes, such as IfBlock, ForLoop, or WhileLoop, a new region is created to accumulate the code that needs to be generated within the scope. When a CFN depends on any DFN, a depth-first traversal of the data flow is performed along the dependencies. If variables are associated with the DFN, variable assignments are generated. After traversing the current CFN and the dependent DFNs, the traversal continues backward along the control-flow dependencies to subsequent nodes. Once the traversal of the entire graph is completed, the statement sequence within the nested region is consolidated into a complete program and passed to the test module.
3.5 Implementation
To demonstrate the effectiveness of our proposed method, we implement a fuzzer (FuzzFlow) that leverages FlowIR. Given the pivotal role of mutation target and operators in fuzzer implementation, crafting FuzzFlow based on existing open-source fuzzers would require us to change over 90% of the code. Therefore, we implement FuzzFlow from scratch with C++. FuzzFlow is a coverage-guided grey-box fuzzer that we implement in 19K lines of code (LoC). We use clang sanitizer coverage as feedback to guide the fuzzer to explore the code coverage of the JS engine. The JS2Graph module first performs syntax analysis on the input JS. We use ANTLR [38] to implement the parsing component, and implement the rest of JS2Graph ourselves. Fuzzer components including Graph2JS and Mutation are implemented entirely in C++.
We rigorously tested the JS2Graph and Graph2JS modules, ensuring the accuracy of the JS-to-Graph and Graph-to-JS conversion processes with numerous unit tests. JS2Graph only operates during the fuzzing startup phase to handle the initial seed set, encountering high-quality JS test cases, including historical PoCs. Therefore, the primary challenge lies in accurately translating JS language features. Unit tests help confirm the accurate translation of supported JS language features.
Conversely, Graph2JS encounters issues with malformed FlowIR due to mutation, resulting in some seeds being unable to convert back to JS. After running FuzzFlow for 24 hours, we found that, in the current version, 8% of mutated Graph instances cannot be lifted to JS. We can address this challenge by applying semantic constraints to FuzzFlow’s mutation operators, which will help mitigate the issue. This optimization will be a focus in future development iterations. It is worth noting that the possible failure of converting mutated IR back to JS is not exclusive to FuzzFlow. When mutations are applied based on IRs like AST, fuzzers encounter similar issues, as mutations can disrupt the original well-formed structure, rendering conversion back to the source code impossible.
4 Evaluation
Evaluation Targets. We evaluated a total of six mainstream JS engines. The targeted engines encompass Chrome V8, Firefox SpiderMonkey, Safari JavaScriptCore, and ChakraCore designed for desktop browsers. Additionally, we tested JerryScript and QuickJS, which are often deployed on IoT devices. There are currently no benchmarks, such as LAVA-M [11] or Magma [22], specifically designed for evaluating JS engine fuzzers. Meanwhile, the number of mature industrial-grade JS engines is limited. Therefore, despite ChakraCore not being utilized in the Edge browser, given its status as an industrial-grade JS engine developed by Microsoft over several years, we consider it an interesting target.
These JS engines have undergone thorough code audits and testing conducted by both the development team and security researchers. Any newly detected defects by FuzzFlow have been missed by earlier evaluations, affirming the efficacy of the method proposed in this paper.
Experimental Setup. Both V8 and SpiderMonkey already include built-in support for Fuzzilli. To ensure compatibility, we adhere to the Fuzzilli interface. For the other four engines, we modified the engine code slightly based on Fuzzilli’s patch method. This modification enables communication between the fuzzer and the JS engine through pipelines for test case delivery and coverage feedback. To prevent the fuzzer from becoming stuck on non-terminating processes, the timeout mechanism is commonly employed by fuzzers. The timeout interval represents the duration permitted by the fuzzer for executing a single test case. In our experiments, we adopted the default timeout of Fuzzilli, which is set to 250 ms.
To enhance bug detection, JS engine developers have incorporated numerous assertions into the engine source code for internal checks. Assertion errors within JS engines may indicate significant
Table 1: New bugs found by FuzzFlow. The two bugs in V8 were also noticed and patched by developers before our reporting.
security vulnerabilities. The identification of many high-risk vulnerabilities, such as CVE-2019-8622, often originates from assertion errors. Therefore, similar to Fuzzilli, we utilized the debug configuration when compiling the engine to capture assertion errors.
Additionally, for the triggering conditions of JIT compilation, we have employed the same processing method as Fuzzilli, specifically by lowering the threshold of repeated executions needed to trigger JIT compilation. Typically, we configured the thresholds so that approximately 100 executions trigger the compilation of a function. This threshold strikes a balance, allowing ample iterations for the engine to gather type information while speeding up the fuzzing.
Initial seeds and Experiment platform. We selected test cases, independent of external harnesses, from the regression test suites of the six engines to create the initial seed set. The combined seed set is used for all engines. These regression test suites are readily accessible in the engines’ repositories. In total, we gathered 1,280 test cases from regression test suites. When performing comparative experiments, this seed set served as the initial seeds for baselines. We conducted experiments for RQ1, RQ3, RQ4, and RQ5 on an Ubuntu 20.04 LTS system with an Intel Xeon Gold 6238R (56 cores) and 128 GB RAM, using an RTX 3090 for neural network baselines. RQ2 was evaluated on an Ubuntu 20.04 system with two AMD EPYC 9654 processors (192 cores) and 512 GB RAM.
Baselines. We conducted a comparative analysis of FuzzFlow against four state-of-the-art mutation-based JS engine fuzzers: Superion [47], DIE [37], Montage [29], and Fuzzilli [20]. FuzzFlow and the four baselines are all general-purpose JS engine fuzzers, not targeting specific components [10, 48]. Superion, DIE, and Montage mutate the AST of the seed, while Fuzzilli mutates the bytecode IR. These four competitors represent the two currently prevalent types of mutation targets.
In addition to the mentioned baselines, open-source JS engine fuzzers also include CodeAlchemist [21] and jsfunfuzz [35]. However, these two fuzzers are both generation-based and differ from the mutation target explored in this paper. Furthermore, in previous evaluations [29, 37], Montage demonstrated superior performance compared to CodeAlchemist and jsfunfuzz, while DIE outperformed CodeAlchemist. Thus, we selected these four advanced mutation-based fuzzers as comparison targets.
As the paper concentrates on mutation targets and operators, the evaluation aims to scrutinize their impact on fuzzing. Superion, DIE, Montage and FuzzFlow are mutation-based fuzzers. In contrast, Fuzzilli can create new test cases through two methods: generation and mutation. Initial versions of Fuzzilli lacked a JS-to-bytecode compiler, using a generative engine to prepare the seed set. This allowed it to run without initial seeds. Currently, Fuzzilli includes a compiler module for JS-to-bytecode conversion. We provided Fuzzilli with the same seed set as other fuzzers while deactivating its generative engine for a fair evaluation.
Evaluation design. We aim to answer the following research questions through our experiments:
RQ1: Can FuzzFlow find new bugs in real-world JS engines?
RQ2: How does FuzzFlow perform in terms of code coverage and bug finding against state-of-the-art fuzzers?
RQ3: Does FuzzFlow’s improvement in mutation granularity help trigger the vulnerable components of the engine?
CCS ’24, October 14–18, 2024, Salt Lake City, UT, USA Haoran Xu et al.
RQ4: Does FuzzFlow generate correct JS code, both syntactically and semantically?
RQ5: Does using FlowIR as the mutation target introduce signifi-

test the statistical significance of FuzzFlow achieving better performance than baselines using Vargha-Delaney $\hat{A}_{12}$ and the Mann-Whitney U test ($U$). $U$ tests whether a list of observations is stochastically greater than the other list, while $\hat{A}_{12}$ measures the magnitude of the difference (effect size). Finally, we analyze the bug-triggering test cases generated by FuzzFlow through case studies to further discuss the effectiveness of our method.

4.1 RQ1: Identified Bugs
We first evaluate FuzzFlow’s bug-finding capability on real-world engines and demonstrate the defects detected by FuzzFlow. To evaluate FuzzFlow’s ability to find unknown defects, we use FuzzFlow to conduct testing on six JS engines for 120 days. We allocate 15 cores to each engine. FuzzFlow has detected a total of 37 new defects, including 2 in V8, 2 in JSC, 8 in SpiderMonkey, 14 in ChakraCore, 9 in JerryScript, and 2 in QuickJS. Developers have confirmed 16 bugs; two bugs were duplicates, and 12 have been fixed. All confirmed bugs in SpiderMonkey and QuickJS have been fixed. Two bugs in V8 were concurrently patched.
Table 1 shows details of all detected bugs. Many of the identified defects originate from assertion errors. Some of these assertion errors can be classified as security-related based on the bug locations and crash messages (e.g., Issue-198 in QuickJS). However, a comprehensive analysis is required for the remaining assertion errors to ascertain their exploitability, which is a task beyond the scope of this paper. We have submitted all bug information to the developer teams for thorough examination.
Note that FuzzFlow discovered bugs in various components of the JS engine. For instance, we have detected defects in the bytecode emitter, debugger, and runtime of SpiderMonkey. Besides, FuzzFlow also detects deep bugs in the JIT compilers. We have detected 11 bugs in the JIT compilers of 4 browser engines. Among them, 8 out of 14 defects detected in ChakraCore are located in the JIT compiler. This result demonstrates the advantages of FlowIR as a mutation target in finding deep vulnerabilities.

4.2 RQ2: Bug-finding Ability and Exploring Code Coverage
General-purpose fuzzers (e.g., AFL and its variants) are evaluated through ground-truth benchmarks like Magma or FuzzBench [34]. However, no such benchmark exists for JS engines, necessitating validation on real software without ground truth. To evaluate the bug-finding capabilities of various fuzzers, we tested FuzzFlow and baselines for 360 hours each on V8, SpiderMonkey, JavaScriptCore, ChakraCore, JerryScript, and QuickJS. Ten instances were run per fuzzer-target pair to reduce randomness. After deduplication based on crashing stacks, FuzzFlow, Superion, DIE, Montage and Fuzzilli detected an average of 5.7, 3.5, 2.6, 2.8, and 4.9 defects in the ChakraCore engine, and 2.7, 1.1, 1.9, 1.3, and 2.2 defects in the JerryScript engine, respectively. However, none of them detected defects in the V8, JavaScriptCore, SpiderMonkey, and QuickJS engines during the test period. This indicates that JS engines are adequately audited and tested. On the more vulnerable ChakraCore and JerryScript engines, FuzzFlow demonstrated the highest bug-finding capability.
For industrial engines like V8, SpiderMonkey, and JavaScriptCore, dynamic testing is already integrated into their development processes. Due to extensive testing, comparing bug-finding capabilities over weeks yields few bugs. Thus, our focus shifts to evaluating fuzzing based on code coverage, exemplified by Fuzzilli [20].
Branch coverage reflects the percentage of the branches exercised by the test cases over the total number of branches. A higher branch coverage implies a more thorough examination of the target’s state space. For each tool and each test target, we evaluate the reached branch coverage after 24 hours of fuzzing, and count the number of test cases executed to reflect the throughput.
For FuzzFlow and Fuzzilli, we can directly obtain the number of branches triggered by the engine through the API. For AFL-based Superion and DIE, directly obtaining branch coverage is not feasible. Superion records code coverage using 1 << 20 bytes of shared memory, recording coverage in one byte per instrumented location. DIE uses one bit in shared memory to record the coverage of each instrumentation location. The shared memory used by DIE is 1 << 16 bytes. Following the method of prior work [48], we obtain the coverage recorded in the shared memory as the branch coverage explored by Superion and DIE. Montage, being a black-box fuzzer without coverage feedback, was omitted from this evaluation.

Table 2: Code Coverage.

Subject  Metric            FuzzFlow  Superion  DIE     Fuzzilli  FuzzFlowNH
V8       Average           19.32%    15.62%    13.88%  15.13%    18.41%
         Improvement       -         3.70%     5.44%   4.19%     0.91%
         $\hat{A}_{12}$    -         0.99      0.99    0.99      0.99
         $p_U$             -         <0.01     <0.01   <0.01     0.02
JSC      Average           22.56%    18.38%    18.81%  19.03%    21.51%
         Improvement       -         4.18%     3.75%   3.53%     1.05%
         $\hat{A}_{12}$    -         0.99      0.99    0.99      0.99
         $p_U$             -         <0.01     <0.01   <0.01     <0.01
CH       Average           19.52%    16.77%    17.19%  18.44%    18.68%
         Improvement       -         2.75%     2.33%   1.08%     0.84%
         $\hat{A}_{12}$    -         0.95      0.92    0.92      0.90
         $p_U$             -         0.01      <0.01   <0.01     <0.01
JERRY    Average           67.84%    62.30%    63.55%  63.86%    65.68%
         Improvement       -         5.54%     4.29%   3.98%     2.16%
         $\hat{A}_{12}$    -         0.99      0.99    0.99      0.99
         $p_U$             -         <0.01     <0.01   <0.01     <0.01
QJS      Average           52.05%    40.52%    46.34%  45.13%    50.33%
         Improvement       -         11.53%    5.71%   6.92%     1.72%
         $\hat{A}_{12}$    -         0.99      0.99    0.99      0.99
         $p_U$             -         <0.01     <0.01   <0.01     <0.01
Average (among subjects)   -         5.80%     3.92%   4.62%     1.32%

To evaluate the impact of high-level semantic nodes in FlowIR on exploring the state space of JS engines, we conducted an ablation experiment on FuzzFlow. We selected high-level semantic nodes including TryNode, CatchNode and ForInNode, disabled support for these nodes in FuzzFlow, and labeled this version as FuzzFlowNH. Without these nodes, FuzzFlowNH cannot process the corresponding AST nodes in the JS2Graph stage or generate test cases involving
Table 3: The semantic correctness of tests generated by FuzzFlow and baselines

Subject  Metric            FuzzFlow  Superion  DIE     Montage  Fuzzilli
SM       Average           69.18%    39.43%    60.07%  34.38%   58.20%
         Improvement       -         29.75%    9.11%   34.80%   10.98%
         $\hat{A}_{12}$    -         0.99      0.99    0.99     0.99
         $p_U$             -         <0.01     <0.01   <0.01    <0.01
V8       Average           72.04%    38.96%    58.26%  35.54%   51.53%
         Improvement       -         33.08%    13.78%  36.50%   20.51%
         $\hat{A}_{12}$    -         0.99      0.99    0.99     0.99
         $p_U$             -         <0.01     <0.01   <0.01    <0.01
JSC      Average           70.23%    32.97%    61.83%  35.15%   50.93%
         Improvement       -         37.26%    8.40%   35.08%   19.30%
         $\hat{A}_{12}$    -         0.99      0.99    0.99     0.99
         $p_U$             -         <0.01     <0.01   <0.01    <0.01
CH       Average           72.18%    35.52%    56.40%  35.09%   -
         Improvement       -         36.66%    15.78%  37.09%   -
         $\hat{A}_{12}$    -         0.99      0.99    0.99     -
         $p_U$             -         <0.01     <0.01   <0.01    -
JERRY    Average           64.55%    36.52%    57.79%  38.90%   57.03%
         Improvement       -         28.02%    6.76%   25.65%   7.52%
         $\hat{A}_{12}$    -         0.99      0.99    0.99     0.99
         $p_U$             -         <0.01     <0.01   <0.01    <0.01
QJS      Average           69.28%    37.41%    61.95%  38.56%   63.09%
         Improvement       -         31.87%    7.33%   30.72%   6.19%
         $\hat{A}_{12}$    -         0.99      0.99    0.99     0.99
         $p_U$             -         <0.01     <0.01   <0.01    <0.01
Average Improvement        -         32.77%    10.19%  33.31%   12.90%

Table 4: The total number of tests executed during 24-hour fuzzing by FuzzFlow and baselines

Subject  Metric   FuzzFlow   Superion    DIE        Montage  Fuzzilli
SM       Average  656.40k    1,121k      143.58k    213.08k  337.17k
         $p_U$    -          <0.01       <0.01      <0.01    <0.01
V8       Average  530.25k    1,970k      134.88k    229.00k  250.32k
         $p_U$    -          <0.01       <0.01      <0.01    <0.01
JSC      Average  422.91k    3,218.40k   294.02k    294.3k   85.16k
         $p_U$    -          <0.01       <0.01      <0.01    <0.01
CH       Average  265.98k    1,688.76k   177.90k    204.25k  -
         $p_U$    -          <0.01       <0.01      <0.01    -
JERRY    Average  3,641.91k  18,973.78k  1,257.79k  137.85k  1,575.27k
         $p_U$    -          <0.01       <0.01      <0.01    <0.01
QJS      Average  7,629.24k  19,056.02k  703.82k    219.40k  4,345.47k
         $p_U$    -          <0.01       <0.01      <0.01    <0.01

11.55× and 2.62× improvement of throughput compared to AST-based DIE, Montage, and bytecode-based Fuzzilli but falls short of the throughput achieved by Superion. The elevated throughput of FuzzFlow indicates that the mutation on FlowIR does not introduce significant additional performance overhead. This is a crucial factor for efficiently identifying defects within a specified time interval.
FuzzFlow’s superior throughput is attributed to two key factors. Firstly, stored seeds in the queue remain in FlowIR format. Consequently, during the fuzzing process, each mutation only necessitates a one-way conversion from FlowIR to JS. Secondly, FuzzFlow is implemented in C++. This choice of a system-level language provides a notable efficiency advantage compared to DIE, which employs TypeScript for mutation. It is noteworthy that, in addition to observing the throughput of a fuzzer, the quality of generated test cases holds more significance. Test cases of inferior quality may fail to adequately cover the critical path of the engines. Despite FuzzFlow exhibiting a lower throughput than Superion, it is important to highlight that the validity ratio of test cases produced by FuzzFlow and the exploration of code coverage surpass those achieved by Superion to a significant extent.

4.6 Case Studies
Mutation on data-flow subgraph. Issue-261949 represents a defect identified by FuzzFlow in Safari’s JavaScriptCore. Listing 3 illustrates a simplified test case that triggers the bug. The initial seed for this test case originates from Safari’s regression test case regress-176485.js. The seed includes both try-catch and invocation of the Object.defineProperty method, encompassing two critical control-flow semantics.
FuzzFlow preserves the existing control flow within the seed, with data-flow subgraph mutation playing a pivotal role in defect detection. Records indicate that the applied mutation operators encompass five data-flow subgraph mutations. Notably, the first two parameters of the Object.defineProperty method call at line 4, the method name assign, and the parameters at line 6, are all derived from data flow mutation. By altering data-flow semantics through the amalgamation of independent data-flow subgraphs across the seed set, FuzzFlow successfully triggers this edge case.

    1 var x = Object();
    2 try {
    3   var a = {};
    4   var b = Object.defineProperty(Object, 1, a);
    5 } catch (e) {}
    6 var y = Object["assign"](x, Object);

Listing 3: Test case produced by FuzzFlow which triggers issue-261949 in JavaScriptCore

Semantically meaningful mutation. CVE-2023-5728 denotes a vulnerability identified by FuzzFlow within Firefox’s SpiderMonkey. This bug specifically resides in the engine’s garbage collector, stemming from an issue in the engine update process related to weakRefMap when a WeakRef target is cleared. Consequently, the test case triggers a crash upon a validity check detecting the presence of a dead wrapper within the weakRefMap.
FuzzFlow generates the test case in Listing 4 through effective semantic mutations. The attribute names (representing data flows in FlowIR), namely nukeAllCCWs, WeakRef, and transplantableObject, are sourced from three distinct seeds. It is noteworthy that the initial seed set lacks a seed containing all the aforementioned data flows. The control flows new newGlobal and bar.deref are both extracted from SpiderMonkey’s regression test case tests_gc_weakRefs.js. The bug has existed for over three years. FuzzFlow’s semantic mutation successfully merges the specified control flows and data flows in the final test case.

    const g = newGlobal({ newCompartment: true });
    const domObj = this.transplantableObject().object;
    const bar = new g.WeakRef(domObj);
    bar.deref();
    this.nukeAllCCWs();

Listing 4: Test case produced by FuzzFlow which triggers CVE-2023-5728
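The throughput factors quoted with Table 4 (11.55× over Montage and 2.62× over Fuzzilli) can be related back to the per-engine averages in that table. The following is a minimal sketch, not the authors' evaluation script: it assumes that each factor is the arithmetic mean of the per-engine ratio between FuzzFlow's average test count and the baseline's, skipping engines for which the baseline has no entry. The names TABLE4 and mean_speedup are hypothetical and introduced only for illustration; the numbers are transcribed from Table 4.

```python
# Average number of tests executed in 24 hours, in thousands (from Table 4).
# None marks the missing Fuzzilli entry for ChakraCore (CH).
TABLE4 = {
    "SM":    {"FuzzFlow": 656.40,  "Superion": 1121.00,  "DIE": 143.58,
              "Montage": 213.08, "Fuzzilli": 337.17},
    "V8":    {"FuzzFlow": 530.25,  "Superion": 1970.00,  "DIE": 134.88,
              "Montage": 229.00, "Fuzzilli": 250.32},
    "JSC":   {"FuzzFlow": 422.91,  "Superion": 3218.40,  "DIE": 294.02,
              "Montage": 294.30, "Fuzzilli": 85.16},
    "CH":    {"FuzzFlow": 265.98,  "Superion": 1688.76,  "DIE": 177.90,
              "Montage": 204.25, "Fuzzilli": None},
    "JERRY": {"FuzzFlow": 3641.91, "Superion": 18973.78, "DIE": 1257.79,
              "Montage": 137.85, "Fuzzilli": 1575.27},
    "QJS":   {"FuzzFlow": 7629.24, "Superion": 19056.02, "DIE": 703.82,
              "Montage": 219.40, "Fuzzilli": 4345.47},
}


def mean_speedup(baseline, table=TABLE4):
    """Mean over engines of FuzzFlow's throughput divided by the baseline's.

    Assumption: the paper's factors are means of per-engine ratios.
    Engines where the baseline has no measurement are skipped.
    """
    ratios = [
        row["FuzzFlow"] / row[baseline]
        for row in table.values()
        if row[baseline] is not None
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    for name in ("DIE", "Montage", "Fuzzilli", "Superion"):
        print(f"{name}: {mean_speedup(name):.2f}x")
```

Under this assumption the Montage and Fuzzilli factors round to 11.55 and 2.62, matching the text, and the Superion factor is below 1, consistent with FuzzFlow falling short of Superion's throughput. A mean of ratios weights every engine equally, so the large gaps on the small engines (JerryScript and QuickJS) dominate the Montage figure.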
5 Discussion
In this section, we discuss the limitations of our approach and possible future work.
Intra-procedural Analysis. FlowIR emerges from intra-procedural analysis, where each procedure is represented by a distinct graph. The inherent limitation of intra-procedural analysis lies in its disregard for the global context of the program, concentrating solely on dependencies within individual methods. This narrow focus may lead to the oversight of global dependencies, including those between different procedures. Inter-procedural analysis [36], on the other hand, proves adept at addressing this issue by considering dependencies between procedures and offering a holistic view of the entire program. Nonetheless, the adoption of inter-procedural analysis often comes with an associated increase in computational resources. FuzzFlow is a first but substantial step in this direction.
Bug Oracle. In fuzzing, a bug oracle determines whether a given execution of the target program violates a specific security policy [33]. The most frequently employed bug oracle involves monitoring whether the input causes an execution crash. Additionally, the JS engines detect defects through internal self-checks (i.e., assertions). After the engine detects an assertion error, it will deliberately initiate a crash. However, there are still defects that may not lead to a crash. For instance, incorrect optimization by the JIT compiler might not result in memory corruption or assertion error. Improving the bug oracle can consequently enhance the effectiveness of vulnerability detection.
A commonly utilized solution is differential testing. Differential testing relies on another implementation of the exact same functionality as a reference. It involves comparing the execution outcomes of different JS engines or optimization levels, enabling the detection of non-crash defects. Differential testing has demonstrated significant efficacy in identifying JIT-related optimization defects [3] and conformance bugs [52].
The mutation target and bug oracle are two orthogonal challenges in fuzzing. The FlowIR-based mutation target outlined in this paper is adaptable and can be extended to support other testing oracles. Our upcoming research direction entails the integration of differential testing into FuzzFlow. However, these aspects are not central to the present paper. Our key contribution is introducing a graph-based IR that allows the representation of JS programs with efficient fuzzing mutators.

6 Related Work
Fuzzing JS Engines. Existing JS engine fuzzers fall into two categories based on the construction of new test cases: generation-based and mutation-based. Notable examples of generation-based fuzzers include Jsfunfuzz [35] and CodeAlchemist [21]. Jsfunfuzz is a black-box fuzzer developed by Mozilla; it creates new test cases using pre-defined grammars. CodeAlchemist learns the language semantics from a corpus of JS seed files, extracts code bricks, and subsequently reassembles them.
The Superion study [47] shows that the mutation operator based on AST, owing to its syntax awareness, is more effective in exploring the JS engine compared to vanilla AFL. DIE [37] employs AST as its mutation target and advocates for an enhancement in the utilization of high-quality seeds through aspect preservation. Specifically, aiming to improve the testing of the JIT compiler, DIE endeavors to preserve the structure and type semantic aspects of the seed AST throughout the mutation. These two aspect features, structure and type, serve as approximations of specific control-flow structures and data flow features. Preserving these semantic features proves to be beneficial for testing deep locations within the engine. In contrast to DIE, this paper proposes the FlowIR, which directly represents control flow and data flow as a mutation target, facilitating preserving the semantic features more seamlessly.
Instead of mutating the AST, specific IRs have been proposed. Samuel et al. [20] propose to operate at bytecode level, which is closer to the engine’s internal representation of the code. They introduce Fuzzilli and employ a new IR called FuzzIL to stress the JIT compiler. PolyGlot [5] is a fuzzing framework that generates high-quality test cases for processors of different programming languages. To achieve the generic applicability, PolyGlot neutralizes the difference in syntax and semantics of programming languages with a uniform IR. The IR used in PolyGlot consists of a list of statements. Each statement includes an order, a type, an operator, no more than two operands, a value and a list of semantic properties. In contrast to the bytecode IR used by Fuzzilli, our FlowIR supports the direct mutation on the semantics of the seed, as the mutation target itself embodies the semantics. Compared with PolyGlot, this paper recognizes the difference (e.g., type system, memory management, concurrency) among programming languages, and therefore focuses on the IR of JS. The idea of FlowIR-based mutation is extensible to other programming languages.
Existing fuzzers have targeted specific components like binding layers or JIT compilers. Favocado [10] focuses on fuzzing binding layers of JS runtime systems. It aims to generate syntactically and semantically correct test cases and reduce the size of the input space for fuzzing. FuzzJIT [48] leverages differential testing as a bug oracle to detect non-crashing JIT compiler bugs. To facilitate each test case in triggering the JIT compilation, FuzzJIT introduces an input-wrapping template based on human knowledge. In contrast to FuzzJIT, FuzzFlow is oriented towards leveraging the inherent characteristics of the seeds themselves. Section 4.1 verifies the vulnerability detection efficacy of FuzzFlow on non-JIT components.
Mutation Operators. Mutation-based grey-box fuzzing offers distinctive advantages in detecting vulnerabilities. Combined with evolutionary algorithms, mutation-based fuzzers explore the state spaces of the target gradually. Established fuzzers like AFL and Honggfuzz [17] have pre-defined a set of mutation operators. For instance, AFL regards test cases as byte sequences and applies mutators such as byte insertion, modification, or deletion. The mutation operator stands as the core component of the fuzzer [50].
Researchers have conducted studies on the scheduling of mutation operators. MOPT [32] introduces a scheduling scheme based on the particle swarm optimization algorithm. Building upon AFL, MOPT introduces a comprehensive mutation operator scheduling algorithm designed to orchestrate operators predefined by AFL. The findings indicate that, in comparison with AFL, which selects mutation operators based on fixed probability, MOPT exhibits advantages in exploring code coverage and uncovering software defects.
However, there are limited studies on mutation operator design. The key insight of this paper is that the representation of mutation
targets influences the design of the operators significantly. Therefore, we focus on the mutation target itself and subsequently design several effective mutation operators based on it.
Graph-based Program Representation. Representing the semantics of source programs with graphs is a long-standing research problem. FlowIR differs from existing graph IR paradigms such as PDG [14, 49] and CPG [51].
PDG and FlowIR represent different control relationships. PDG consists of a Control Dependency subGraph (CDG) and a Data Dependency subGraph (DDG), with two types of edges: control dependency edge and data dependency edge. The CDG in PDG, which is designed to detect potential parallel optimization, is obtained through post-dominator analysis of the Control Flow Graph (CFG). The control dependency edges indicate that the execution of a particular statement is conditionally dependent on another. In contrast, the control flow edges in FlowIR indicate the direct flow of control from one statement to another. These two types of edges have distinct purposes in program analysis: CFG edges represent the possible paths of execution within a program, while CDG edges represent the dependencies based on control conditions, specifically how the execution of certain parts of the code depends on the outcomes of conditional statements.
Converting between PDG and source code is more challenging than with FlowIR. Firstly, constructing a JS PDG is more complex than FlowIR, as it involves additional post-dominator analysis beyond the CFG. Secondly, converting a PDG back to JS is more challenging than converting from FlowIR. While no current work addresses converting PDG back to JS, we believe it is feasible. However, the reconstruction of control flow from the exact CDG is more difficult as noted in the PDG paper [14].
The CPG stands out as a popular choice for detecting vulnerabilities. A CPG integrates graph structures including AST, CFG and PDG for static analysis. The integration holds all information relevant to security analysis but is less suitable for program transformation. Firstly, subgraphs like AST and PDG contain overlapping semantic content. While this redundancy is manageable in static graphs, it complicates the design of mutation operators due to synchronization issues in dynamic contexts. For instance, mutations applied to CPG’s AST necessitate the re-establishment of the CFG and PDG to preserve semantic consistency. Similarly, mutations in the PDG necessitate updates to the AST and CFG graphs, raising similar synchronization concerns. Secondly, converting the mutated composite graph back to JS source code poses another challenge. Currently, there is no existing research to guide this process. Additionally, the conversion between the mutation target and JS must be fast for effective fuzzing, but current approaches lack performance optimization. For the reasons discussed, we chose not to develop FuzzFlow based on Joern [26], despite Joern’s ability to construct PDG and CPG for JS using the GraalVM JS project. Joern stores graphs in a database for static analysis applications. To the best of our knowledge, there is no research example of converting these graphs back into source code.
There are other open-source implementations of graph IR. The GraalVM IR [12] is a graph IR initially conceived for Java but has been expanded to include support for multiple languages. Both GraalVM IR and TurboFan IR [16] are derived from bytecode and subsequently optimized into machine instructions. They are more closely aligned with machine instructions, forgoing some source code-level semantics, rendering conversion back to JS unfeasible.
As described in Section 3.1, FlowIR differs from existing graph IRs in two key aspects. First, FlowIR is unique as the first graph-based IR supporting bidirectional conversion with source code, achieved through a careful redesign of nodes and edges. High-level semantics in the source code can be expressed in FlowIR, which are essential for triggering specific processing logic of target language processors. Second, we precisely define FlowIR’s functional scope to avoid unnecessary information for mutations. Selecting mutation positions is a critical research topic in itself [30]. Each node and edge can be a mutation point, so redundant elements complicate effective mutation. For example, IRs in the PDG paradigm, designed for parallelism detection, include control dependency edges. These edges are crucial for that purpose but unnecessary for semantic mutation. Thus, FlowIR excludes these edges and uses control flow graphs. Unlike IRs in the CPG paradigm, which include an AST and are designed for static vulnerability detection, FlowIR focuses on control flow and data dependency graphs to convey program semantics directly. The AST in CPG is redundant for semantic mutation, adding unnecessary complexity without benefit. We highlight the potential of graph IRs in fuzzing mutation and consider this work a first step, anticipating future advancements.

7 Conclusion
In this paper, we introduce a new graph IR to implement effective mutation operators for fuzzing JS engines. One key contribution lies in the proposal of new mutations for JS programs that are carried out on the control and data flow directly. Instead of mutating the AST or bytecode-level IR, FlowIR is developed and mutations are defined on it. Our evaluation shows that FuzzFlow achieves 18.6% higher validity of generated test cases and 4.78% higher code coverage. More importantly, FuzzFlow has found 37 new bugs in mainstream JS engines.

8 Acknowledgments
The authors would like to thank the anonymous referees for their valuable comments and helpful suggestions. This project has received funding from the National Natural Science Foundation of China (grant 62402509), the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 850868), and SNSF PCEGP2_186974. Any findings are those of the authors and do not necessarily reflect the views of our sponsors.

References
[1] Cornelius Aschermann, Tommaso Frassetto, Thorsten Holz, Patrick Jauernig, Ahmad-Reza Sadeghi, and Daniel Teuchert. 2019. NAUTILUS: Fishing for Deep Bugs with Grammars. In NDSS.
[2] astexplorer. 2017. A web tool to explore the ASTs generated by various parsers. https://astexplorer.net
[3] Lukas Bernhard, Tobias Scharnowski, Moritz Schloegel, Tim Blazytko, and Thorsten Holz. 2022. JIT-picking: Differential fuzzing of JavaScript engines. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. 351–364.
[4] Oliver Bračevac, Guannan Wei, Songlin Jia, Supun Abeysinghe, Yuxuan Jiang, Yuyan Bao, and Tiark Rompf. 2023. Graph IRs for Impure Higher-Order Languages: Making Aggressive Optimizations Affordable with Precise Effect Dependencies. Proceedings of the ACM on Programming Languages 7, OOPSLA2 (2023), 400–430.
[5] Yongheng Chen, Rui Zhong, Hong Hu, Hangfan Zhang, Yupeng Yang, Dinghao Wu, and Wenke Lee. 2021. One engine to fuzz’em all: Generic language processor testing with semantic validation. In 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 642–658.
[6] Cliff Click and Michael Paleczny. 1995. A simple graph-based intermediate representation. ACM Sigplan Notices 30, 3 (1995), 35–49.
[7] Keith D Cooper and Linda Torczon. 2011. Engineering a compiler. Elsevier.
[8] Ron Cytron, Jeanne Ferrante, Barry K Rosen, Mark N Wegman, and F Kenneth Zadeck. 1989. An efficient method of computing static single assignment form. In Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 25–35.
[9] Ron Cytron, Jeanne Ferrante, Barry K Rosen, Mark N Wegman, and F Kenneth Zadeck. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems (TOPLAS) 13, 4 (1991), 451–490.
[10] Sung Ta Dinh, Haehyun Cho, Kyle Martin, Adam Oest, Kyle Zeng, Alexandros Kapravelos, Gail-Joon Ahn, Tiffany Bao, Ruoyu Wang, Adam Doupé, et al. 2021. Favocado: Fuzzing the Binding Code of JavaScript Engines Using Semantically Correct Test Cases. In NDSS.
[11] Brendan Dolan-Gavitt, Patrick Hulin, Engin Kirda, Tim Leek, Andrea Mambretti, Wil Robertson, Frederick Ulrich, and Ryan Whelan. 2016. Lava: Large-scale automated vulnerability addition. In 2016 IEEE symposium on security and privacy (SP). IEEE, 110–121.
[12] Gilles Duboscq, Lukas Stadler, Thomas Würthinger, Doug Simon, Christian Wimmer, and Hanspeter Mössenböck. 2013. Graal IR: An extensible declarative intermediate representation. In Proceedings of the Asia-Pacific Programming Languages and Compilers Workshop. 1–9.
[13] Gilles Duboscq, Thomas Würthinger, Lukas Stadler, Christian Wimmer, Doug Simon, and Hanspeter Mössenböck. 2013. An intermediate representation for speculative optimizations in a dynamic compiler. In Proceedings of the 7th ACM workshop on Virtual machines and intermediate languages. 1–10.
[31] Xiao Liu, Xiaoting Li, Rupesh Prajapati, and Dinghao Wu. 2019. DeepFuzz: Automatic Generation of Syntax Valid C Programs for Fuzz Testing. In Proceedings of the... AAAI Conference on Artificial Intelligence.
[32] Chenyang Lyu, Shouling Ji, Chao Zhang, Yuwei Li, Wei-Han Lee, Yu Song, and Raheem Beyah. 2019. MOPT: Optimized mutation scheduling for fuzzers. In 28th USENIX Security Symposium (USENIX Security 19). 1949–1966.
[33] Valentin JM Manès, HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J Schwartz, and Maverick Woo. 2019. The art, science, and engineering of fuzzing: A survey. IEEE Transactions on Software Engineering 47, 11 (2019), 2312–2331.
[34] Jonathan Metzman, László Szekeres, Laurent Simon, Read Sprabery, and Abhishek Arya. 2021. Fuzzbench: an open fuzzer benchmarking platform and service. In Proceedings of the 29th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering. 1393–1403.
[35] Mozilla. 2007. A collection of fuzzers in a harness for testing the SpiderMonkey JavaScript engine. https://github.com/MozillaSecurity/funfuzz
[36] Flemming Nielson, Hanne R Nielson, and Chris Hankin. 2015. Principles of program analysis. Springer.
[37] Soyeon Park, Wen Xu, Insu Yun, Daehee Jang, and Taesoo Kim. 2020. Fuzzing JavaScript engines with aspect-preserving mutation. In 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 1629–1642.
[38] Terence Parr. 1992. ANTLR. https://www.antlr.org
[39] projectzero. 2021. CVE-2021-37975: Chrome v8 garbage collector logic bug causing live objects to be collected. https://googleprojectzero.github.io/0days-in-the-wild//0day-RCAs/2021/CVE-2021-37975.html
[40] ProjectZero. 2022. V8 0-day In-the-Wild 2021-2022. https://docs.google.com/spreadsheets/d/1lkNJ0uQwbeC1ZTRrxdtuPLCIl7mlUreoKfSIgajnSyY/view
[41] saelo. 2018. Safari RCE, sandbox escape, and LPE to kernel for macOS. https://github.com/saelo/pwn2own2018
[42] saelo. 2022. Attacking JavaScript Engines in 2022. https://saelo.github.io/presentations/offensivecon_22_attacking_javascript_engines.pdf
[14] Jeanne Ferrante, Karl J Ottenstein, and Joe D Warren. 1987. The program de- [43] James Stanier and Des Watson. 2013. Intermediate representations in imperative
pendence graph and its use in optimization. ACM Transactions on Programming compilers: A survey. ACM Computing Surveys (CSUR) 45, 3 (2013), 1–27.
Languages and Systems (TOPLAS) 9, 3 (1987), 319–349. [44] Spandan Veggalam, Sanjay Rawat, Istvan Haller, and Herbert Bos. 2016. Ifuzzer:
[15] Github. 2022. JavaScript stays as the 1st most used language. https://octoverse. An evolutionary interpreter fuzzer using genetic programming. In European
github.com/2022/top-programming-languages Symposium on Research in Computer Security. Springer, 581–601.
[16] Google. 2015. TurboFan is one of V8’s optimizing compilers. https://v8.dev/ [45] W3Techs. 2023. Usage statistics of JavaScript as client-side programming lan-
docs/turbofan guage on websites. https://w3techs.com/technologies/details/cp-javascript
[17] Google. 2016. Honggfuzz. https://github.com/google/honggfuzz [46] Junjie Wang, Bihuan Chen, Lei Wei, and Yang Liu. 2017. Skyfire: Data-driven
[18] Google. 2017. V8 features an interpreter called Ignition. https://v8.dev/docs/ seed generation for fuzzing. In 2017 IEEE Symposium on Security and Privacy (SP).
ignition IEEE, 579–594.
[19] Rahul Gopinath, Philipp Görz, and Alex Groce. 2022. Mutation analysis: Answer- [47] Junjie Wang, Bihuan Chen, Lei Wei, and Yang Liu. 2019. Superion: Grammar-
ing the fuzzing challenge. arXiv preprint arXiv:2201.11303 (2022). aware greybox fuzzing. In 2019 IEEE/ACM 41st International Conference on Soft-
[20] Samuel Groß, Simon Koch, Lukas Bernhard, Thorsten Holz, and Martin Johns. ware Engineering (ICSE). IEEE, 724–735.
2023. FUZZILLI: Fuzzing for JavaScript JIT Compiler Vulnerabilities. In Network [48] Junjie Wang, Zhiyi Zhang, Shuang Liu, Xiaoning Du, and Junjie Chen. 2023.
and Distributed Systems Security (NDSS) Symposium. FuzzJIT: Oracle-Enhanced Fuzzing for JavaScript Engine JIT Compiler. (2023).
[21] HyungSeok Han, DongHyeon Oh, and Sang Kil Cha. 2019. CodeAlchemist: [49] Daniel Weise, Roger F Crew, Michael Ernst, and Bjarne Steensgaard. 1994. Value
Semantics-Aware Code Generation to Find Vulnerabilities in JavaScript Engines.. dependence graphs: Representation without taxation. In Proceedings of the 21st
In NDSS. ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 297–
[22] Ahmad Hazimeh, Adrian Herrera, and Mathias Payer. 2020. Magma: A ground- 310.
truth fuzzing benchmark. Proceedings of the ACM on Measurement and Analysis [50] Mingyuan Wu, Ling Jiang, Jiahong Xiang, Yanwei Huang, Heming Cui, Lingming
of Computing Systems 4, 3 (2020), 1–29. Zhang, and Yuqun Zhang. 2022. One fuzzing strategy to rule them all. In Pro-
[23] Xiaoyu He, Xiaofei Xie, Yuekang Li, Jianwen Sun, Feng Li, Wei Zou, Yang Liu, Lei ceedings of the 44th International Conference on Software Engineering. 1634–1645.
Yu, Jianhua Zhou, Wenchang Shi, et al. 2021. SoFi: Reflection-Augmented Fuzzing [51] Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. 2014. Modeling and
for JavaScript Engines. In Proceedings of the 2021 ACM SIGSAC Conference on discovering vulnerabilities with code property graphs. In 2014 IEEE Symposium
Computer and Communications Security. 2229–2242. on Security and Privacy. IEEE, 590–604.
[24] Christian Holler, Kim Herzig, and Andreas Zeller. 2012. Fuzzing with code frag- [52] Guixin Ye, Zhanyong Tang, Shin Hwei Tan, Songfang Huang, Dingyi Fang, Xi-
ments. In Presented as part of the 21st { USENIX } Security Symposium ( { USENIX } aoyang Sun, Lizhong Bian, Haibo Wang, and Zheng Wang. 2021. Automated
Security 12). 445–458. conformance testing for javascript engines via deep compiler fuzzing. In Proceed-
[25] Sanghoon Jeon and Jaeyoung Choi. 2012. Reuse of JIT compiled code in JavaScript ings of the 42nd ACM SIGPLAN international conference on programming language
engine. In Proceedings of the 27th Annual ACM Symposium on Applied Computing. design and implementation. 435–450.
1840–1842. [53] Tai Yue, Pengfei Wang, Yong Tang, Enze Wang, Bo Yu, Kai Lu, and Xu Zhou.
[26] Joern. 2021. Honggfuzz. https://github.com/joernio/joern 2020. { EcoFuzz } : Adaptive { Energy-Saving } Greybox Fuzzing as a Variant of the
[27] Chris Lattner and Vikram Adve. 2004. LLVM: A compilation framework for Adversarial { Multi-Armed } Bandit. In 29th USENIX Security Symposium (USENIX
lifelong program analysis & transformation. In International symposium on code Security 20). 2307–2324.
generation and optimization, 2004. CGO 2004. IEEE, 75–86. [54] Nicholas C Zakas. 2005. Professional JavaScript for Web Developers. John Wiley
[28] Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, & Sons.
Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Olek- [55] Michal Zalewski. 2017. american fuzzy lop. http://lcamtuf.coredump.cx/afl
sandr Zinenko. 2021. MLIR: Scaling compiler infrastructure for domain specific [56] G Zhang, P Wang, T Yue, X Kong, S Huang, X Zhou, and K Lu. 2022. Mob-
computation. In 2021 IEEE/ACM International Symposium on Code Generation and fuzz: Adaptive multi-objective optimization in gray-box fuzzing. In Network and
Optimization (CGO). IEEE, 2–14. Distributed Systems Security (NDSS) Symposium, Vol. 2022.
[29] Suyoung Lee, HyungSeok Han, Sang Kil Cha, and Sooel Son. 2020. Montage:
A Neural Network Language Model-Guided JavaScript Engine Fuzzer. In 29th
USENIX Security Symposium (USENIX Security 20). 2613–2630.
[30] Caroline Lemieux and Koushik Sen. 2018. Fairfuzz: A targeted mutation strategy
for increasing greybox fuzz testing coverage. In Proceedings of the 33rd ACM/IEEE
International Conference on Automated Software Engineering. 475–485.