2020 MemLock - Memory Usage Guided Fuzzing
ABSTRACT
Uncontrolled memory consumption is a critical kind of software security weakness. It can also become a security-critical vulnerability when attackers can control the input to consume a large amount of memory and launch a Denial-of-Service attack. However, detecting such vulnerabilities is challenging, because state-of-the-art fuzzing techniques focus on code coverage rather than memory consumption. To this end, we propose a memory usage guided fuzzing technique, named MemLock, to generate inputs that cause excessive memory consumption and trigger uncontrolled memory consumption bugs. The fuzzing process is guided by memory consumption information, so our approach is general and does not require any domain knowledge. We perform a thorough evaluation of MemLock on 14 widely-used real-world programs. Our experimental results show that MemLock substantially outperforms the state-of-the-art fuzzing techniques, including AFL, AFLfast, PerfFuzz, FairFuzz, Angora and QSYM, in discovering memory consumption bugs. During the experiments, we discovered many previously unknown memory consumption bugs and received 15 new CVEs.

CCS CONCEPTS
• Security and privacy → Software security engineering.

KEYWORDS
Fuzz Testing, Software Vulnerability, Memory Consumption

ACM Reference Format:
Cheng Wen, Haijun Wang, Yuekang Li, Shengchao Qin, Yang Liu, Zhiwu Xu, Hongxu Chen, Xiaofei Xie, Geguang Pu, and Ting Liu. 2020. MemLock: Memory Usage Guided Fuzzing. In 42nd International Conference on Software Engineering (ICSE ’20), May 23–29, 2020, Seoul, Republic of Korea. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3377811.3380396

∗ Corresponding authors: Shengchao Qin and Haijun Wang

ICSE ’20, May 23–29, 2020, Seoul, Republic of Korea
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7121-6/20/05...$15.00
https://doi.org/10.1145/3377811.3380396

1 INTRODUCTION
Time and space complexity are two main concerns in software design and development. If they are not handled well, unexpected behaviors and even serious security issues can arise. Many such security vulnerabilities have been found in real-world programs (e.g., [17–23, 74]). For example, if the termination conditions of recursive functions are not implemented correctly, an unbounded number of recursive calls can occur and exhaust the stack memory. Adversaries can exploit such a vulnerability to launch a Denial-of-Service (DoS) attack with well-crafted inputs [18, 21]. Recently, researchers have started to pay attention to these issues. For example, SlowFuzz [58], PerfFuzz [37] and ReScue [63] were developed to generate pathological inputs that stress time complexity issues (i.e., algorithmic complexity vulnerabilities). However, automatically generating pathological inputs that stress space complexity issues (namely, memory consumption bugs) has remained unexplored thus far.

Although a number of works (e.g., the popular fuzzing techniques [11, 28, 45, 61, 84]) have been devoted to detecting memory issues, they mostly focus on memory corruption vulnerabilities such as buffer overflow and use-after-free. Memory corruption occurs in a program when the contents of memory are modified by some unexpected program behavior that exceeds the original intention of the program [65, 67, 72]. When the corrupted memory contents are later used by the program, they may lead to unexpected behaviors (e.g., a program crash). Memory consumption bugs, however, are essentially different from memory corruption vulnerabilities. As defined by CWE-400 [49], they arise when the software does not properly control the allocation and maintenance of a limited resource, thereby enabling an actor to influence the amount of resources consumed, eventually leading to the exhaustion of available resources. To be explicit, this paper focuses on three types of memory consumption bugs: uncontrolled-recursion [52], uncontrolled-memory-allocation [51], and memory leak [50]. Uncontrolled-recursion may exhaust stack memory when the program does not properly control the amount of recursion that takes place. Uncontrolled-memory-allocation refers to the situation where the program allocates memory based on an untrusted size value but does not validate, or incorrectly validates, that size, allowing arbitrary amounts of memory to be consumed. Finally, if the software does not track and release allocated memory after it has been used, it causes a memory leak.

Figure 1: Code Snippet from cp-demangle.c in Binutils v2.31
Figure 2: Code Snippet from jp2image.cpp in Exiv2 v0.26

Existing detection techniques for memory consumption bugs usually rely on domain- or implementation-specific heuristics or rules [15, 24, 46, 70, 79]. For example, Radmin [24] learns and executes multiple probabilistic finite automata, confines the resource usage of target programs to the learned automata, and detects resource usage anomalies at their early stages. The effectiveness of such techniques therefore depends heavily on the completeness of their heuristics and rules, and creating and maintaining these rules requires substantial manual effort and expertise. In this paper, we employ the grey-box fuzzing [84] technique to develop an automated and general technique for detecting memory consumption bugs.

Grey-box fuzzing is one of the most effective techniques for finding vulnerabilities [39, 41]. It typically uses coverage information as guidance to explore different program paths. However, existing grey-box fuzzing techniques are not designed to detect memory consumption bugs, because such bugs often depend not only on the program path but also on certain program states along that path (i.e., the amount of memory consumed). For example, the real-world program in Figure 2 allocates memory at Line 4, and this allocation may fail if no additional memory is available. To detect this bug, the grey-box fuzzer needs to execute a program path that reaches Line 4 with a value of the variable size large enough to exceed the available heap memory. Existing coverage-based fuzzing techniques can easily cover Line 4, but they may have difficulty producing test cases with a large value for size.

To address these challenges, we present MemLock, which enhances grey-box fuzzing to find memory consumption bugs. MemLock works in two steps. First, MemLock performs static analysis to identify the statements and operations relevant to memory consumption. It qualitatively analyzes the call graph, which determines stack memory usage, and quantitatively analyzes memory usage operations, which determine heap memory usage. It also analyzes the control flow graph of the program, which provides the branch coverage used to explore different program paths. Second, with the memory consumption analyzed, MemLock uses both branch coverage and memory consumption information to guide the fuzzing process. The branch coverage information guides the exploration of different program paths, and the memory consumption information drives a program path to consume more and more memory. If an input covers a new branch compared to previous inputs, it is considered interesting and added to the seed queue. Moreover, even if an input brings no new branch coverage, MemLock still retains it as an interesting input through a novel seed updating scheme if it leads to more memory consumption. This input can be further mutated so that newly generated inputs consume even more memory. After a number of mutations, MemLock is expected to generate an input whose memory consumption exceeds the available memory.

We have evaluated MemLock's effectiveness on a set of real-world open source programs. The experimental results show that MemLock substantially outperforms six state-of-the-art tools (i.e., AFL [84], AFLfast [8], PerfFuzz [37], FairFuzz [38], Angora [12] and QSYM [83]) in discovering memory consumption vulnerabilities. MemLock finds 40.5% more unique crashes and 17.9% more vulnerabilities than the second-best counterpart. On average, MemLock discovers each memory consumption vulnerability at least 2.07 times faster than any of the other baseline fuzzers. In addition, the test cases generated by MemLock usually lead to 150 times the memory consumption of those generated by the other state-of-the-art tools. We have also responsibly disclosed several previously unknown memory consumption bugs and received 15 new CVEs¹ for them, demonstrating MemLock's effectiveness in practice.

In summary, this paper makes the following contributions:
• We present MemLock, to the best of our knowledge the first dedicated fuzzing technique that automatically discovers memory consumption bugs without requiring any domain knowledge.
• We design a new dimension of guidance engine that deeply exploits the memory consumption along a program path, complementary to coverage guidance.
• We have implemented and evaluated MemLock on various widely-used real-world programs. The experimental results show that MemLock substantially outperforms six state-of-the-art fuzzing techniques in discovering memory consumption bugs.
• We have discovered 15 security-critical memory consumption vulnerabilities in widely-used real-world programs, and most of these vulnerabilities have been patched by the developers.

¹ The Common Vulnerabilities and Exposures (CVE) system provides a reference for tracking publicly known information-security vulnerabilities and exposures.

[Overview figure: Source Code; Static Analysis (Control Flow Graph, Call Graph, Memory Usage Operations); Instrumentation; Instrumented Program]
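The Figure 2 challenge can be made concrete with a toy model. The C sketch below is our own simplified illustration, not MemLock's code. It makes several assumptions: the target is reduced to a single path that allocates `len` bytes, the heap limit is assumed to be 10000 bytes, mutation adds a pseudo-random delta in [-50, 100], and only one seed is fuzzed (no energy assignment, no seed queue). All names (`run_target`, `mutate`, `fuzz`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HEAP_LIMIT 10000u   /* assumed available heap, in bytes */

/* Toy target: a single path (the loop body) that allocates `len` bytes. */
struct result { int crashed; uint32_t om; };

static struct result run_target(uint32_t len) {
    struct result r;
    r.om = len;                    /* peak bytes requested via malloc */
    r.crashed = len > HEAP_LIMIT;  /* the allocation would exhaust the heap */
    return r;
}

/* Deterministic LCG so that runs are reproducible. */
static uint32_t rng_state = 12345u;
static uint32_t next_rand(void) {
    rng_state = rng_state * 1664525u + 1013904223u;
    return rng_state >> 16;
}

/* Mutation: add a pseudo-random delta in [-50, 100], clamped at 0. */
static uint32_t mutate(uint32_t len) {
    int64_t v = (int64_t)len + (int64_t)(next_rand() % 151u) - 50;
    return v < 0 ? 0u : (uint32_t)v;
}

/* Fuzz one seed for `budget` executions; return the execution count at the
   first crash, or -1. Every child follows the same path, so a coverage-only
   fuzzer (memory_guided == 0) never retains one. With memory_guided == 1,
   a child that raises the memory consumption replaces the seed. */
static long fuzz(uint32_t seed, long budget, int memory_guided) {
    struct result best = run_target(seed);
    for (long n = 1; n <= budget; n++) {
        uint32_t child = mutate(seed);
        struct result r = run_target(child);
        if (r.crashed) return n;
        if (memory_guided && r.om > best.om) {
            seed = child;
            best = r;
        }
    }
    return -1;
}
```

Both runs start from `len = 100` with the same pseudo-random sequence. The coverage-only run keeps mutating the original seed, so its children never leave the range 50-200 bytes. The memory-guided run keeps every child that raises the allocation size, and it eventually crosses the limit.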
value ‘P’ for peek as it covers a different branch. When $i_1$ is further mutated, it generates $i_2$, which may produce four consecutive ‘P’s for peek (i.e., “PPPP”) in its recursion. Since $i_2$ hits branches in a different “loop bucket” from $i_1$, it is added to the seed pool. When $i_2$ is selected for mutation, it generates $i_3$, which may produce five consecutive ‘P’s (i.e., “PPPPP”) in its recursion. Because coverage guidance uses the concept of “loop bucket”, it considers that $i_3$ offers no new branch coverage compared to $i_1$ and $i_2$. In this case, existing coverage-based grey-box fuzzers would discard $i_3$ and thus miss the chance to generate an input that produces more consecutive ‘P’s. MemLock, on the other hand, uses memory consumption as guidance, under which $i_3$ is considered to cause more memory consumption than $i_1$ or $i_2$. It therefore retains $i_3$ as an interesting test case and adds it to the seed pool. It can then mutate $i_3$ further and generate inputs that may produce more consecutive ‘P’s. After some mutations, MemLock may generate an input that produces a sufficiently large number of consecutive ‘P’s (i.e., “PPP...”) to exhaust the stack memory.

Example in Figure 2. For illustration, assume that the available heap memory is 10000 bytes. Suppose the initial value of subBox.length is 100, which is produced from user input at Lines 11-12. At Line 13 in Figure 2, the memory is allocated successfully, and the program executes the true branch of the while statement at Line 11. Based on the coverage guidance, MemLock performs mutation and can generate a new input $i_1$ that produces a larger value for subBox.length. Assume this value is 150. The input $i_1$ still executes the true branch of the while statement, so there is no new branch coverage. At this point, coverage-based grey-box fuzzers would discard $i_1$ and miss the chance to generate an input that consumes more memory. MemLock's memory consumption guidance, in contrast, considers that $i_1$ consumes more memory (150 > 100) and keeps it as an interesting input. When $i_1$ is further mutated, MemLock can generate an input (e.g., len = 250) that consumes more memory. After some mutations, MemLock can generate an input (e.g., len = 11000) that runs out of memory.

Note that we do not discuss memory leaks separately, because MemLock handles them in the same way as uncontrolled-memory-allocation, using the same memory usage guidance during fuzzing.

3 METHODOLOGY
3.1 Static Analysis
The static analysis in MemLock decides how to instrument the target program. Through the instrumentation, MemLock collects guidance information and uses it to drive the fuzzing process. After analyzing the control flow graph, MemLock instruments the target program to capture branch (edge) coverage, which guides program path exploration. Additionally, based on the qualitative and quantitative analysis of the call graph and memory usage operations, it instruments the target program to collect memory consumption information. This information guides the fuzzing process towards consuming more memory on each program path. To facilitate the description of our methodology, we define the following concepts.

3.1.1 Control Flow Graph. As in AFL [84], MemLock collects branch coverage information in the control flow graph (CFG) of the program to guide program path exploration. It instruments every branch of the program CFG and assigns a pseudo-unique ID to each branch. During program execution, the instrumentation uses an 8-bit counter to track the number of times each branch has been executed. MemLock groups the hit counts of each branch into several buckets that denote different magnitudes². Consequently, the branch coverage information of an executed program path can be defined as follows.

Definition 3.1 (Trace Bits [84]). For an executed program path, its trace bits are represented by an 8-bit array of size $2^K$, where the value of the ID-th element is stored in an 8-bit counter (in AFL, $K = 16$).

The trace bits record the branches accumulated along a program path, and they roughly represent that path.

Definition 3.2 (Path-ID). For an executed program path, its path-ID is the hash value of its trace bits (see Definition 3.1).

3.1.2 Call Graph. In addition to branch coverage, MemLock also collects memory consumption information. One important construct that may consume a large amount of stack memory is the recursive function call. When a function is called, the program automatically allocates stack memory for it (e.g., for local variables). When the call finishes (returns), the program automatically reclaims that stack memory for reuse. To monitor the stack memory consumption of function calls, MemLock instruments both the entry and the exit of each function.

We use $f_t$ to denote the length (i.e., consumption) of the call stack during program execution. This value changes as the program executes: when execution enters a function, $f_t$ increases by one, and when a function returns, $f_t$ decreases by one. In the following, we use $f_m$ to denote the peak value of $f_t$ during the program execution. The value $f_m$ thus qualitatively reflects the maximum (stack) memory consumed by recursive function calls during the execution. We do not differentiate the memory consumption caused by different functions, because stack memory can usually be exhausted only by infinite (unbounded) recursive calls. Thus, the peak length of the call stack is all MemLock needs to approach infinite recursion.

3.1.3 Memory Usage Operations. Memory usage operation statements (e.g., malloc and free) may also account for a large amount of memory consumption. Along a program path, these statements may be affected by program inputs. When they are, it is possible to guide the path to consume more memory by controlling the inputs. To this end, MemLock uses instrumentation to quantitatively obtain the size of each memory operation. Because deallocation statements do not carry the size of the freed memory, MemLock maps them to their corresponding allocation statements to obtain that size.

In particular, we insert instrumentation into the memory allocation/deallocation functions in the standard libraries, and obtain

² In AFL, the hit counts of each branch are divided into 8 buckets: 1 time, 2 times, 3 times, 4-7 times, 8-15 times, 16-31 times, 32-127 times, and 128-255 times [78].
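The coverage side of Section 3.1.1 follows AFL. The C sketch below illustrates Definitions 3.1 and 3.2 and the bucketing in footnote 2 under simplifying assumptions. Branch IDs are taken as given, and the instrumentation that assigns them is omitted. The helper names (`hit_branch`, `hit_bucket`, `path_id`) are hypothetical. FNV-1a stands in for the hash, which the text does not specify.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAP_SIZE_POW2 16                 /* K = 16, as in AFL */
#define MAP_SIZE (1u << MAP_SIZE_POW2)   /* 2^K counters */

/* Definition 3.1: one 8-bit hit counter per branch ID. */
static uint8_t trace_bits[MAP_SIZE];

static void reset_trace(void) { memset(trace_bits, 0, sizeof trace_bits); }

/* Called (hypothetically) by the instrumentation each time branch `id`
   executes; the 8-bit counter wraps around on overflow. */
static void hit_branch(uint32_t id) { trace_bits[id & (MAP_SIZE - 1)]++; }

/* Footnote 2: map a raw hit count to one of the eight buckets
   (1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128-255), each encoded as a single
   bit in the style of AFL's lookup table; 0 means "not hit". */
static uint8_t hit_bucket(uint8_t count) {
    if (count == 0)   return 0;
    if (count == 1)   return 1;
    if (count == 2)   return 2;
    if (count == 3)   return 4;
    if (count <= 7)   return 8;
    if (count <= 15)  return 16;
    if (count <= 31)  return 32;
    if (count <= 127) return 64;
    return 128;
}

/* Definition 3.2: the path-ID is a hash of the whole trace-bits array.
   FNV-1a is used only for brevity; AFL uses a different hash function. */
static uint32_t path_id(void) {
    uint32_t h = 2166136261u;
    for (uint32_t i = 0; i < MAP_SIZE; i++) {
        h ^= trace_bits[i];
        h *= 16777619u;
    }
    return h;
}
```

With this mapping, a branch hit four times and one hit five times fall into the same 4-7 bucket. This is why $i_3$ ("PPPPP") offers no new coverage over $i_2$ ("PPPP") in the example above. In this sketch, executions that hit the same branches the same number of times in a different order also get the same path-ID, which is one reason trace bits represent a path only roughly.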
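The $f_t$/$f_m$ bookkeeping of Section 3.1.2 can be sketched in C as two hooks placed at function entry and exit. The hook names and the toy recursive `parse` function are hypothetical. `parse` makes one nested call per leading 'P', loosely in the spirit of the Figure 1 example, and in MemLock the hooks would be inserted by instrumentation rather than written by hand.

```c
#include <assert.h>

/* f_t: current call-stack length; f_m: its peak during one execution. */
static unsigned long f_t = 0, f_m = 0;

/* Hypothetical hooks placed at every function entry and exit. */
static void on_function_entry(void) {
    f_t++;
    if (f_t > f_m) f_m = f_t;
}

static void on_function_exit(void) {
    assert(f_t > 0);   /* every exit must match an earlier entry */
    f_t--;
}

/* Toy recursive parser: one nested call per leading 'P' in the input. */
static void parse(const char *s) {
    on_function_entry();
    if (*s == 'P') parse(s + 1);
    on_function_exit();
}

/* Run one input and report f_m, the peak call-stack length. */
static unsigned long run_and_get_fm(const char *input) {
    f_t = 0;
    f_m = 0;
    parse(input);
    return f_m;
}
```

An input with five leading 'P's yields a larger $f_m$ (6) than one with four (5). Under the bucketing of footnote 2, the recursive branch in both cases falls into the same 4-7 bucket, so this larger $f_m$ is exactly the extra signal that memory consumption guidance adds.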
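Section 3.1.3 notes that deallocation statements do not carry the freed size, so each free must be matched with its allocation. Below is a minimal C sketch of that bookkeeping. It assumes hypothetical wrappers `tracked_malloc`/`tracked_free` in place of MemLock's instrumentation of the standard library functions, plus a small fixed-size table of live blocks. `om` is the peak operation memory of the run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_LIVE 1024   /* assumption: small fixed table of live blocks */

struct live_block { void *ptr; size_t size; };
static struct live_block live[MAX_LIVE];
static size_t n_live = 0;
static size_t cur_bytes = 0;   /* bytes currently allocated */
static size_t om = 0;          /* peak of cur_bytes during the run */

/* Record each allocation's size so that the matching free can subtract it. */
static void *tracked_malloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL || n_live == MAX_LIVE) { free(p); return NULL; }
    live[n_live].ptr = p;
    live[n_live].size = size;
    n_live++;
    cur_bytes += size;
    if (cur_bytes > om) om = cur_bytes;
    return p;
}

/* free() has no size argument: look the size up from the allocation. */
static void tracked_free(void *p) {
    if (p == NULL) return;
    for (size_t i = 0; i < n_live; i++) {
        if (live[i].ptr == p) {
            cur_bytes -= live[i].size;
            live[i] = live[--n_live];   /* swap-remove the record */
            free(p);
            return;
        }
    }
    assert(!"free of an untracked pointer");
}
```

Any bytes still counted in `cur_bytes` when the program exits correspond to leaked memory. This fits the remark above that memory leaks are handled with the same memory usage guidance as uncontrolled-memory-allocation.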
that MemLock additionally adopts memory consumption guidance to retain interesting inputs.

The algorithm takes the instrumented program P (see Section 3.1) and a set of initial seeds T as inputs, and outputs a set of test cases S that trigger memory consumption bugs. The variable Queue represents the seed pool and is initialized with the initial seeds T at Line 2. MemLock first selects an input t from the seed pool Queue (Line 4) and decides, with probability $\mathit{FuzzProb}_t$, whether to mutate it (Line 5; see Section 3.2.1). Upon deciding to mutate t, MemLock assigns it energy (i.e., numChildren) at Line 6, which determines the number of children to produce from t. MemLock uses the same heuristics as AFL [84] to determine numChildren: it produces more children for inputs that have wider code coverage or that are discovered later in the fuzzing process. At Lines 4-17, MemLock mutates the input t to generate numChildren children, monitors their executions, and determines their affiliations. MemLock first performs mutation to generate a new input $child_i$ (Line 8). At Line 9, MemLock then runs $child_i$ on the instrumented program P and collects its branch coverage (i.e., $traceBits_i$), function memory consumption (i.e., $fm_i$), and operation

Algorithm 1: Memory Usage Guided Fuzzing
input: an instrumented program P, and a set of initial seeds T
output: test cases S triggering memory consumption bugs
 1  S ← ∅;
 2  Queue ← T;
 3  while time and resource budget do not expire do
 4      for each input t in Queue do
 5          if with probability FuzzProb_t to select t then
 6              numChildren ← AssignEnergy(t);
 7              for 0 ≤ i < numChildren do
 8                  child_i ← Mutate(t);
 9                  (traceBits_i, fm_i, om_i) ← Run(child_i, P);
10                  k ← Hash(traceBits_i);
11                  if child_i triggers memory consumption bugs then
12                      S ← S ∪ {child_i};
13                  else
14                      if NewCov(traceBits_i) then
15                          Queue ← Queue ∪ {child_i};
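Algorithm 1 relies on retention and selection rules: the coverage and memory checks with the queue update at Lines 14-17, and $\mathit{FuzzProb}_t$ at Line 5. Definitions 3.4-3.8 and Section 3.2.2 formalize these rules, and they can be sketched in C as follows. This is a simplified illustration, not MemLock's implementation. `struct seed`, `update_queue` and `select_for_mutation` are hypothetical names. NewCov is approximated by "this path-ID is not yet in the queue" rather than the branch-and-bucket comparison of Definition 3.5. Each queue node stores the per-path maxima $\mathit{fmmap}[k]$ and $\mathit{ommap}[k]$ directly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One node per explored path. fm and om hold the best values seen on
   that path so far, i.e. fmmap[k] and ommap[k]. */
struct seed {
    uint32_t path_id;   /* k = Hash(traceBits) */
    uint64_t fm;        /* max peak call-stack length on this path */
    uint64_t om;        /* max peak operation memory on this path */
    int id;             /* which input currently represents the path */
    struct seed *next;
};

enum update { DISCARDED, NEW_PATH, REPLACED };

/* Case (1) New Path: append a node at the tail. Case (2) Larger Memory
   Consumption: on a known path, an input that beats either maximum
   (NewMax) replaces that path's seed in place. */
static enum update update_queue(struct seed **head, uint32_t k,
                                uint64_t fm, uint64_t om, int id) {
    struct seed **link = head;
    for (struct seed *s = *head; s != NULL; s = s->next) {
        if (s->path_id == k) {
            if (fm <= s->fm && om <= s->om) return DISCARDED;
            if (fm > s->fm) s->fm = fm;
            if (om > s->om) s->om = om;
            s->id = id;
            return REPLACED;
        }
        link = &s->next;
    }
    struct seed *n = malloc(sizeof *n);
    assert(n != NULL);
    n->path_id = k;
    n->fm = fm;
    n->om = om;
    n->id = id;
    n->next = NULL;
    *link = n;
    return NEW_PATH;
}

#define NON_FAVORED_PROB 0.01   /* a = 0.01, the value used in the experiments */

/* Favored inputs (new coverage or new maximum) are always selected, others
   with probability a. `u` is a uniform draw from [0, 1), passed in so that
   the rule is deterministic to test. */
static int select_for_mutation(int favored, double u) {
    return favored || u < NON_FAVORED_PROB;
}
```

The test below replays the Figure 4 scenario with this sketch. After seeds 1-3 occupy paths 1-3, an input on the new path 4 is appended. An input on path 2 with larger memory consumption then replaces seed 2 in place, so the queue keeps its path order 1, 2, 3, 4.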
Figure 4: Dynamic Seed Updating [figure: rows Original Seed Queue, New Path, Larger Memory Consumption; columns Path 1-4; nodes Seed 1 to Seed 5]

Definition 3.4 (Maximum Operation Memory). Given a path $k$ and a set $I$ of inputs that all execute $k$, the maximum operation memory consumption $\mathit{ommap}[k]$ in $k$ is the maximum peak value of memory consumed by memory usage operations among all inputs in $I$:
$$\mathit{ommap}[k] \leftarrow \max_{i \in I} om_i$$
where $om_i$ denotes the peak value of memory consumed by memory usage operations during the execution of input $i$ (see Section 3.1.3).

Definition 3.5 (NewCov). Given a set $I$ of inputs and an input $t$, we say $t$ hits new coverage if it either (1) executes a branch that has not been touched by $I$, or (2) hits a branch touched by $I$ but with a different bucket number.

The function NewCov (Line 14) checks whether a newly generated input $child_i$ hits new coverage with respect to the current Queue. That is, NewCov considers branch coverage and guides MemLock to explore different program paths.

Definition 3.6 (NewMax). Given a path $k$, a set $I$ of inputs, and an input $t$, all of which execute $k$, we say $t$ hits a new maximum memory consumption if either $fm_t > \mathit{fmmap}[k]$ or $om_t > \mathit{ommap}[k]$.

The function NewMax (Line 16) determines whether the input $child_i$ leads to the maximum memory consumption among the current seed set. It checks two kinds of memory consumption. First, it checks whether $child_i$ leads to the maximum function memory consumption (see Definition 3.3). Second, it checks whether $child_i$ leads to the maximum operation memory consumption (see Definition 3.4). If $child_i$ satisfies either condition, MemLock updates the seed queue with $child_i$ at Line 17 (see Section 3.2.2).

3.2.2 Dynamic Seed Updating. To efficiently retain the most interesting input for each path, we propose a novel seed updating scheme. In MemLock, the seed queue is kept in a linked list, where each node represents a seed that explores a program path, as shown in Fig. 4. MemLock updates the seed queue in the following two cases. (1) New Path. If a test input results in new branch coverage, it is added to the seed queue as a new node, as shown in the second row of Fig. 4. (2) Larger Memory Consumption. Suppose a seed, e.g., seed2 in the third row of Fig. 4, generates an input seed5 that does not result in new branch coverage but leads to larger memory consumption than seed2. Since seed2 and seed5 execute the same path, seed2 is replaced with seed5. By replacing the original seed with the generated input $child_i$, we fully exploit the advantage of $child_i$, as it is better at finding memory consumption bugs. This seed updating policy ensures that MemLock gradually increases the overall memory consumption. It can also avoid getting stuck in local maxima like SlowFuzz [37], and it brings long-term, stable improvements.

To tailor the fuzzer to our guidance mechanism, MemLock also optimizes the seed selection probability for mutation (Line 5 in Algorithm 1) as follows.

Definition 3.7 (Favored Input). An input $t$ is favored for mutation if $t$ has new branch coverage (i.e., NewCov) or $t$ leads to maximum memory consumption (i.e., NewMax).

Definition 3.8 (Selection Probability). An input $t$ is selected for mutation with the following probability:
$$\mathit{FuzzProb}_t = \begin{cases} 1 & \text{if } t \text{ is favored} \\ a & \text{otherwise} \end{cases}$$

That is, favored inputs are always selected, and $a$ is the probability of selecting a non-favored input. In our experiments, we use $a = 0.01$, as in PerfFuzz [37].

4 EVALUATION
We have built a prototype of MemLock. Our implementation adds around 1.6k lines of C/C++ code to the file containing AFL's core implementation. In particular, the static analysis and instrumentation components are built on the LLVM framework [36], and the fuzzing engine is built on the AFL-2.52b framework [84]. We have conducted thorough experiments to evaluate MemLock on a set of real-world programs. More detailed experimental results can be found on our website [48]. With these experiments, we aim to answer the following research questions:
RQ1. How capable is MemLock of detecting memory consumption crashes?
RQ2. How capable is MemLock of detecting real-world memory consumption vulnerabilities?
RQ3. Do the strategies of MemLock help to trigger memory leaks with more leakage?
RQ4. Do the strategies of MemLock help to generate inputs with more memory consumption?

4.1 Experiment Setup
Following the suggestions in [35], we conducted the experiments carefully to draw conclusions as objectively as possible.

Baseline Fuzzers to Compare against. We compare MemLock against six state-of-the-art fuzzers, namely AFL [84], AFLfast [8], PerfFuzz [37], FairFuzz [38], Angora [12] and QSYM [83]. The baseline fuzzers are selected based on the following considerations. AFL is the widely-used coverage-based grey-box fuzzer and serves as the baseline in most work. AFLfast is an advanced variant of AFL equipped with a better power schedule [8]. PerfFuzz [37] stresses time complexity issues in the program, whereas MemLock seeks to detect space complexity issues. FairFuzz [38] leverages a targeted mutation strategy to steer execution towards rare branches. Further, Angora [12] uses taint analysis to track information flow and then uses gradient descent to
break through hard branches. Lastly, QSYM [83] is a popular symbolic-execution-assisted fuzzer. Note that we did not select MemFuzz [16] as a baseline fuzzer, because MemFuzz is not open source and it targets memory accesses (instead of memory consumption). In short, we selected various kinds of representative state-of-the-art fuzzers as baselines, and they are widely used to discover vulnerabilities in practice.

Evaluation Benchmarks. We select evaluation benchmarks based on several factors, e.g., popularity, frequency of being tested, development activeness, and functional diversity. In the end, we use 14 widely-used real-world programs, all of which contain memory consumption bugs, to evaluate MemLock. They comprise well-known development tools (nm, cxxfilt, readelf), code processing tools (nasm, flex, yaml-cpp, mjs), graphics processing libraries (openjpeg, jasper, exiv2), video processing tools (bento4 and libming), and data processing libraries (libsass and yara). These programs have also been widely tested by existing state-of-the-art grey-box fuzzers [28, 35, 38, 82].

Performance Metrics. To compare against state-of-the-art fuzzers, the most direct measurement is the capability to find vulnerabilities. In this regard, we consider both the unique bugs and the unique crashes each fuzzer finds during fuzzing. Since MemLock aims to stress the space complexity issues of programs, we also measure the memory consumption of each seed in the pool.

Configuration Parameters. Since fuzzers rely heavily on random mutation, their performance can vary from run to run. We took two actions to mitigate the randomness inherent in fuzzing. First, we test each program for a long time, until the fuzzer reaches a relatively stable state: we run each fuzzer for 24 hours. Second, we repeat each experiment 5 times and evaluate the statistical performance. Besides, we run all the fuzzers with the -d option to skip the deterministic mutation stage, following the configuration of PerfFuzz [37].

Memory Consumption Bugs. An uncontrolled-recursion bug usually causes a stack overflow, so we can directly use AddressSanitizer [62] to detect it. An uncontrolled-memory-allocation bug consumes so much memory that the program runs out of memory, and we detect it by setting the “allocator_may_return_null” [29] flag of AddressSanitizer. In addition, we use LeakSanitizer [60] to detect memory leaks.

Experiment Infrastructure. All our experiments were performed on machines with an Intel(R) Xeon(R) E5-1650 v3 processor (3.40GHz) and 16GB of RAM under 64-bit Ubuntu 16.04 LTS.

4.2 Unique Crashes Evaluation (RQ1)
To evaluate the effectiveness of fuzzers, a direct measurement is the number of unique crashes found by different fuzzers. It is believed that more unique crashes usually indicate a higher chance of covering more unique vulnerabilities.

Table 1 shows the number of unique crashes caused by memory consumption vulnerabilities that 7 different fuzzers found in the benchmark programs within 24 hours. Note that we identify unique crashes related to memory consumption bugs by reproducing the crashes and analyzing their crash stacks. We discuss other types of crashes in Section 4.6. Out of the 17 groups of experiments, MemLock performs best among the 7 fuzzers in 10 (58.8%) groups, as shown in column MemLock. In total, MemLock finds 2009 unique memory consumption crashes in the benchmark programs, an improvement of 59.2%, 70.5%, 76.9%, 98.1%, 40.5% and 66.7% over the state-of-the-art fuzzers AFL, AFLfast, PerfFuzz, FairFuzz, Angora and QSYM, respectively. Notably, MemLock finds unique crashes in all benchmark programs, while each of the other 6 state-of-the-art fuzzers fails to find any crash in at least one benchmark program. For example, none of the other 6 fuzzers found any unique crash in flex, whereas MemLock found 61 unique crashes within 24 hours.

To better compare the fuzzers, Figure 5 plots their performance over time on some benchmark programs. It shows that MemLock has a steady and strong growth trend in finding unique crashes, and that MemLock is also the first fuzzer to report crashes.

Following the recommendation of Klees et al. [35], we also conduct statistical tests on the results. The Â12 statistic [68] measures the probability that one fuzzer (in this case MemLock) outperforms another fuzzer. Its value, shown in the columns headed Â12, is the chance that the result of MemLock is better than that of the competitor. Further, we apply the Mann-Whitney U-test [2] with a significance level of 0.05 to check whether the differences in the experimental results are statistically significant. A smaller p-value indicates a more significant difference between MemLock and the competitor. In Table 1, we mark in bold the Â12 values whose p-value is smaller than the significance level of 0.05 (for simplicity, p-values are not included here, but they are available at the companion website [48]). Out of 102 Â12 values in the table, 72 (70.6%) exceed the conventionally large effect size (0.71) and are marked in bold. Thus, we can conclude that MemLock significantly outperforms the other 6 state-of-the-art fuzzers on most benchmark programs.

From the analysis of Table 1 and Figure 5, we can positively answer RQ1: MemLock significantly outperforms the state-of-the-art fuzzers in detecting memory consumption crashes.

4.3 Real-world Vulnerability Evaluation (RQ2)
In this section, we compare the capability of MemLock to find known real-world vulnerabilities against the baseline fuzzers, as suggested by Klees et al. [35].

Table 2 shows the statistical results of MemLock and the other 6 state-of-the-art fuzzers. The benchmark programs contain 34 unique vulnerabilities in total, and MemLock performs best among the 7 fuzzers on 25 of them, as shown in column MemLock. On average, MemLock takes about 5.4 hours to find each unique vulnerability. This is 2.15, 2.15, 2.20, 2.69, 3.76 and 2.07 times faster than AFL, AFLfast, PerfFuzz, FairFuzz, Angora and QSYM, respectively. In particular, MemLock finds 33 out of 34 unique vulnerabilities within 24 hours, while AFL, AFLfast, PerfFuzz, FairFuzz, Angora and QSYM find only 26, 28, 20, 17, 6 and 25, respectively. Three unique vulnerabilities (i.e., issue#106, CVE-2018-18701 and CVE-2019-6293) in mjs, nm and flex can be found only by
10
15
20
0
10
15
20
10
15
20
10
15
20
time (hour) time (hour) time (hour) time (hour)
Figure 5: The growth trend of unique crashes found in different fuzzers; higher is better
MemLock within 24 hours. This shows that our memory-consumption-guided strategy is very effective in finding memory consumption bugs.
In addition, we conduct a statistical test for the unique vulnerability evaluation. Of the 204 Â12 values in the table, 139 (68.1%) are in bold, i.e., they exceed the conventional large effect size (0.71). Thus, MemLock significantly outperforms the other 6 state-of-the-art fuzzers in finding unique vulnerabilities.
Case Study. To illustrate the reason behind MemLock's superiority, we present the case of CVE-2019-6293, an uncontrolled-recursion vulnerability in flex, a lexical analyzer generator. The lexical analyzer generated by flex has to provide "beginning" states and "ending" states. The mark_beginning_as_normal function marks each "beginning" state in a machine as a "normal" state, where the "beginning" states are the epsilon closure of the first state. The mark_beginning_as_normal function calls itself if there is a state reachable from the first state through an epsilon transition. We investigate MemLock's mutation history and identify a key mutation step: through the havoc mutation operation, the test case makes mark_beginning_as_normal call itself multiple times. The splice operation then multiplies the recursion depth of this function, finally leading to a stack overflow.
More interestingly, MemLock takes only 5.4 hours on average to discover this vulnerability, while all the other fuzzers fail. Figure 6 also shows the peak call-stack length for flex. AFL does not retain any seed with a call-stack length above 5000, because those inputs do not increase coverage. In contrast, MemLock intentionally keeps seeds that increase the peak call-stack length, eventually triggering the stack overflow. This explains why MemLock finds the vulnerability while AFL fails to detect it in all 5 runs.
New Vulnerabilities MemLock Found. With MemLock, we have discovered many previously unknown security-critical vulnerabilities. We informed the maintainers, and MITRE assigned 15 CVEs. Of these 15 CVEs, 8 are uncontrolled-recursion vulnerabilities, 5 are due to uncontrolled-memory-allocation issues, and 2 are memory leak vulnerabilities. An attacker might exploit these vulnerabilities by providing well-crafted inputs that trigger excessive memory consumption. The developers actively patched the vulnerabilities based on our reports; at the time of writing, 12 of them have been patched. Detailed information on the newly discovered vulnerabilities is available on our website [48]. We are confident that MemLock is effective and viable in practice.
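The Â12 statistic of Vargha and Delaney [68] used above estimates the probability that a measurement from one fuzzer is larger than one from the other, counting ties as one half: 0.5 means no difference, and 0.71 is the conventional threshold for a large effect. The sketch below is a minimal illustration, not the authors' evaluation script. It assumes per-run time-to-exposure values in hours, with a run that never finds the bug recorded as the 24-hour budget (an assumption about censoring). Because lower time is better, the baseline goes first so that Â12 > 0.5 favours MemLock. The sample numbers are hypothetical.

```python
from typing import Sequence


def a12(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Vargha-Delaney A12: estimated P(X > Y) + 0.5 * P(X == Y) over all pairs."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))


def is_large_effect(a: float, threshold: float = 0.71) -> bool:
    """One-sided check against the conventional 'large' threshold (0.71)."""
    return a >= threshold


if __name__ == "__main__":
    # Hypothetical time-to-exposure in hours over 5 runs (not data from the paper);
    # a run that never finds the bug is recorded as the 24-hour budget.
    baseline = [10.0, 12.0, 24.0, 24.0, 24.0]
    memlock = [4.0, 5.0, 5.0, 6.0, 7.0]
    # Lower time is better, so the baseline goes first: A12 > 0.5 favours MemLock.
    a = a12(baseline, memlock)
    print(a, is_large_effect(a))  # prints 1.0 True
```

The pairwise count is O(mn), which is negligible for 5 runs per fuzzer. For larger samples, a rank-based formula gives the same value more cheaply.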
MemLock: Memory Usage Guided Fuzzing ICSE ’20, May 23–29, 2020, Seoul, Republic of Korea
From the analysis of Table 2, the case study and the new vulnerabilities MemLock found, we can positively answer RQ2: MemLock significantly outperforms the state-of-the-art fuzzers in detecting real-world memory consumption vulnerabilities.
from 234% to 3753163%, compared to other baseline fuzzers. This is because MemLock tries to maximize each allocation and generates inputs with high memory consumption. When a memory leak happens, such memory-consuming inputs often leak more bytes.
Figure 6: Seed distribution based on memory consumption; larger values on the right side are better.
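The case study attributes the flex stack overflow to MemLock keeping seeds that raise the peak call-stack length (Figure 6). The leak results attribute MemLock's advantage to maximizing allocations. The related-work discussion adds that memory consumption both grows (function entry, allocation) and shrinks (function exit, free). The sketch below illustrates that seed-retention idea under simplifying assumptions and is not MemLock's implementation. A hypothetical instrumented run reports covered edge IDs plus a trace of `call`/`ret`/`malloc`/`free` events. The tracker follows the current stack depth and live heap bytes and records their peaks. A seed is kept if it covers a new edge or raises a global peak. The event format, the `Usage`/`Corpus` names, and the use of whole-program peaks are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple

# Hypothetical trace format: (kind, value). "call"/"ret" ignore value;
# "malloc"/"free" carry the number of bytes allocated or released.
Event = Tuple[str, int]


@dataclass
class Usage:
    peak_depth: int = 0  # deepest call stack seen during one execution
    peak_heap: int = 0   # largest number of live heap bytes seen


def measure(events: Iterable[Event]) -> Usage:
    """Replay one execution trace; depth and heap both rise and fall."""
    depth = heap = 0
    usage = Usage()
    for kind, value in events:
        if kind == "call":
            depth += 1
        elif kind == "ret":
            depth -= 1
        elif kind == "malloc":
            heap += value
        elif kind == "free":
            heap -= value
        else:
            raise ValueError(f"unknown event kind: {kind}")
        usage.peak_depth = max(usage.peak_depth, depth)
        usage.peak_heap = max(usage.peak_heap, heap)
    return usage


@dataclass
class Corpus:
    seen_edges: Set[int] = field(default_factory=set)
    best: Usage = field(default_factory=Usage)
    seeds: List[bytes] = field(default_factory=list)

    def consider(self, seed: bytes, edges: Set[int], events: Iterable[Event]) -> bool:
        """Keep the seed if it covers a new edge OR raises a memory peak."""
        usage = measure(events)
        new_coverage = not edges <= self.seen_edges
        more_memory = (usage.peak_depth > self.best.peak_depth
                       or usage.peak_heap > self.best.peak_heap)
        if not (new_coverage or more_memory):
            return False
        self.seeds.append(seed)
        self.seen_edges |= edges
        self.best = Usage(max(self.best.peak_depth, usage.peak_depth),
                          max(self.best.peak_heap, usage.peak_heap))
        return True


if __name__ == "__main__":
    corpus = Corpus()
    shallow = [("call", 0)] * 3 + [("ret", 0)] * 3
    deep = [("call", 0)] * 10 + [("ret", 0)] * 10
    print(corpus.consider(b"a", {1, 2}, shallow))  # True: new coverage
    print(corpus.consider(b"b", {1, 2}, deep))     # True: same edges, deeper stack
    print(corpus.consider(b"c", {1, 2}, shallow))  # False: nothing new
```

The second seed is kept only because it recurses deeper. A purely coverage-guided rule would discard it, which mirrors the AFL versus MemLock contrast in the flex case study. Whole-program peaks are the simplest choice here; a finer-grained variant could keep per-edge or per-function peaks instead.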
a certain sample bias. Further studies on more real-world programs can help evaluate MemLock better. In addition, like most existing work [7, 11, 28], MemLock has difficulty breaking through hard comparisons (e.g., magic bytes). Adopting program analysis techniques (e.g., symbolic execution) might help mitigate this threat.
5 RELATED WORK
Coverage-based Grey-box Fuzzing. Coverage-based grey-box fuzzing [3, 39, 41, 44, 47, 57, 66] is one of the most effective techniques for finding vulnerabilities and bugs, and has attracted a great deal of attention from both academia and industry. Coverage-based grey-box fuzzers typically use coverage information to guide the exploration of different program paths. For example, Google has built the OSS-Fuzz platform [61] by incorporating several state-of-the-art coverage-based grey-box fuzzers: libFuzzer [45], honggfuzz [9], AFL [84] and ClusterFuzz [30].
Since the coverage guidance engine is a key component of grey-box fuzzers, much effort has been devoted to improving their coverage. Steelix [40], VUzzer [59] and REDQUEEN [3] use program-state analysis or taint analysis to penetrate paths protected by magic-byte comparisons. QSYM [83], Driller [64] and SAFL [76] equip grey-box fuzzing with a symbolic execution engine to reach deeper program code. Angora [12] adopts a gradient descent technique to solve path constraints and thereby break through hard comparisons. MemFuzz [16] augments evolutionary fuzzing by additionally leveraging information about the memory accesses (rather than the memory consumption) of the target program. ProFuzzer [82], GRIMOIRE [6], Superion [75] and Zest [56] leverage knowledge of highly-structured input formats to generate syntactically and semantically valid test inputs, and can thus reach deeper program code. CollAFL [28] proposes a coverage-sensitive fuzzing solution to mitigate path collisions. FairFuzz [38] uses a targeted mutation strategy to steer execution towards rare branches. UAFL [73] incorporates typestate properties and information flow into its fuzzing engine to guide the detection of use-after-free vulnerabilities. Besides, AFLGo [7] and Hawkeye [11] use distance metrics to steer execution towards user-specified target sites in the program. The main difference between MemLock and these fuzzers is that MemLock targets memory consumption bugs, while the others aim to find memory corruption vulnerabilities. Thus, MemLock is orthogonal to these fuzzers.
Recently, researchers have paid attention to algorithmic complexity vulnerabilities (i.e., time complexity issues), e.g., SlowFuzz [58], Singularity [77] and PerfFuzz [37]. These tools use the number of executed instructions as guidance to explore program paths with longer path lengths. Unlike MemLock, which considers space complexity issues, they focus on time complexity issues. Space complexity issues have their own unique characteristics: memory consumption can both increase (e.g., on function entry or memory allocation) and decrease (e.g., on function exit or memory free), and MemLock takes both directions into consideration.
Static Analysis. Static analysis has also been used to analyze memory consumption [1, 10, 13, 14, 31, 34, 70]. Wang et al. [70] present type-guided worst-case input generation, using automatic amortized resource analysis to derive symbolic bounds on the resource usage of functions. Duc-Hiep et al. [15] present a worst-case memory consumption analysis that uses symbolic execution to exhaustively unroll loops and compute the memory consumption of each iteration. He et al. [31] and Chin et al. [14] employ static verification to check that a program's memory usage stays within the memory bounds, while Chin et al. [13] use static analysis to compute memory usage bounds for assembly-level programs. These approaches rely on type theory or symbolic execution, and thus often suffer from scalability issues. SMOKE [26] is a path-sensitive memory leak detector that scales to millions of lines of code. It first uses a scalable but imprecise analysis to compute a set of candidate memory leak paths, and then verifies the feasibility of the candidates using a more precise analysis. While SMOKE can demonstrate the existence of a memory leak, MemLock can generate an input that produces the memory leak.
Dynamic Analysis. Yuku et al. [46] propose an improved real-time scheduling algorithm that reduces maximal heap memory consumption by controlling multitask scheduling. Unlike MemLock, this technique aims at reducing memory consumption through dynamic online scheduling, whereas MemLock aims to find memory consumption bugs. BLeak [69] is a system for debugging memory leaks in web applications. It leverages the observation that users often repeatedly return to the same visual state; sustained memory growth between round trips is a strong indicator of a memory leak. BLeak is only applicable to memory leaks in web applications, while MemLock can find several kinds of memory consumption bugs. Radmin [24] is a system for early detection of application-level resource exhaustion and starvation attacks. It first learns and executes multiple probabilistic finite automata from benign executions of the application. It then restricts resource usage to the learned automata and detects resource usage anomalies. Radmin uses heuristics to detect resource usage anomalies, while MemLock employs fuzzing to automatically generate inputs that trigger memory consumption bugs.
6 CONCLUSION
In this paper, we propose MemLock, an enhanced grey-box fuzzing technique for finding memory consumption bugs. MemLock employs both coverage and memory consumption information to guide the fuzzing process. The coverage information guides the exploration of different program paths, while the memory consumption information guides the search towards program paths that exhibit increasing memory consumption. Our experimental results show that MemLock outperforms state-of-the-art fuzzing techniques (i.e., AFL, AFLfast, PerfFuzz, FairFuzz, Angora and QSYM) in detecting memory consumption bugs. We also found 15 security-critical vulnerabilities in real-world programs; at the time of writing, 12 of them have been patched.
ACKNOWLEDGEMENTS
This work was supported in part by the National Natural Science Foundation of China under Grants No. 61772347, 61836005, 61972260, 61772408 and 61721002, by Ant Financial Services Group through the Ant Financial Research Program, by the Guangdong Basic and Applied Basic Research Foundation under Grant No. 2019A1515011577, and by the National Key R&D Program of China under Grant No. 2018YFB0803501.
Proceedings of the Network and Distributed System Security Symposium.
[60] Alexey Samsonov and Kostya Serebryany. 2013. New features in AddressSanitizer. (2013).
[61] Kostya Serebryany. 2017. OSS-Fuzz: Google's continuous fuzzing service for open source software. (2017).
[62] Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitriy Vyukov. 2012. AddressSanitizer: A fast address sanity checker. In Presented as part of the 2012 USENIX Annual Technical Conference. 309–318.
[63] Yuju Shen, Yanyan Jiang, Chang Xu, Ping Yu, Xiaoxing Ma, and Jian Lu. 2018. ReScue: crafting regular expression DoS attacks. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM, 225–235.
[64] Nick Stephens, John Grosen, Christopher Salls, Andrew Dutcher, Ruoyu Wang, Jacopo Corbetta, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna. 2016. Driller: Augmenting Fuzzing Through Selective Symbolic Execution. In NDSS, Vol. 16. 1–16.
[65] Laszlo Szekeres, Mathias Payer, Tao Wei, and Dawn Song. 2013. SoK: Eternal war in memory. In Security and Privacy, 2013 IEEE Symposium on. IEEE, 48–62.
[66] Ari Takanen, Jared D Demott, Charles Miller, and Atte Kettunen. 2018. Fuzzing for software security testing and quality assurance. Artech House.
[67] Victor Van der Veen, Lorenzo Cavallaro, Herbert Bos, et al. 2012. Memory errors: The past, the present, and the future. In International Workshop on Recent Advances in Intrusion Detection. Springer, 86–106.
[68] András Vargha and Harold D Delaney. 2000. A critique and improvement of the CL common language effect size statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics 25, 2 (2000), 101–132.
[69] John Vilk and Emery D Berger. 2018. BLeak: automatically debugging memory leaks in web applications. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 15–29.
[70] Di Wang and Jan Hoffmann. 2019. Type-Guided Worst-Case Input Generation. Proceedings of the ACM on Programming Languages (2019).
[71] Haijun Wang, Yun Lin, Zijiang Yang, Jun Sun, Yang Liu, Jin Song Dong, Qinghua Zheng, and Ting Liu. 2019. Explaining Regressions via Alignment Slicing and Mending. IEEE Transactions on Software Engineering (2019), 1–1.
[72] Haijun Wang, Ting Liu, Xiaohong Guan, Chao Shen, Qinghua Zheng, and Zijiang Yang. 2016. Dependence guided symbolic execution. IEEE Transactions on Software Engineering 43, 3 (2016), 252–271.
[73] Haijun Wang, Xiaofei Xie, Yi Li, Cheng Wen, Yang Liu, Shengchao Qin, Hongxu Chen, and Yulei Sui. 2020. Typestate-Guided Fuzzer for Discovering Use-after-Free Vulnerabilities. In 2020 IEEE/ACM 42nd International Conference on Software Engineering. Seoul, South Korea.
[74] Haijun Wang, Xiaofei Xie, Shang-Wei Lin, Yun Lin, Yuekang Li, Shengchao Qin, Yang Liu, and Ting Liu. 2019. Locating vulnerabilities in binaries via memory layout recovering. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 718–728.
[75] Junjie Wang, Bihuan Chen, Lei Wei, and Yang Liu. 2019. Superion: Grammar-Aware Greybox Fuzzing. In Proceedings of the 41st International Conference on Software Engineering, ICSE, Gothenburg, Sweden.
[76] Mingzhe Wang, Jie Liang, Yuanliang Chen, Yu Jiang, Xun Jiao, Han Liu, Xibin Zhao, and Jiaguang Sun. 2018. SAFL: increasing and accelerating testing coverage with symbolic execution and guided fuzzing. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings. ACM, 61–64.
[77] Jiayi Wei, Jia Chen, Yu Feng, Kostas Ferles, and Isil Dillig. 2018. Singularity: Pattern fuzzing for worst case complexity. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 213–223.
[78] Technical whitepaper for afl fuzz. 2019. american fuzzy lop. http://lcamtuf.coredump.cx/afl/technical_details.txt. accessed: 2019-08-01.
[79] Zhiwu Xu, Cheng Wen, and Shengchao Qin. 2018. State-taint analysis for detecting resource bugs. Science of Computer Programming 162 (2018), 93–109.
[80] yaml cpp. 2019. A YAML parser and emitter in C++. https://github.com/jbeder/yaml-cpp. accessed: 2019-08-01.
[81] Yara. 2019. The pattern matching swiss knife for malware researchers. http://virustotal.github.io/yara/. accessed: 2019-08-01.
[82] Wei You, Xueqiang Wang, Shiqing Ma, Jianjun Huang, Xiangyu Zhang, XiaoFeng Wang, and Bin Liang. 2019. ProFuzzer: On-the-fly input type probing for better zero-day vulnerability discovery. In Security and Privacy, 2019 IEEE Symposium on. IEEE.
[83] Insu Yun, Sangho Lee, Meng Xu, Yeongjin Jang, and Taesoo Kim. 2018. QSYM: A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing. In 27th USENIX Security Symposium. 745–761.
[84] Michal Zalewski. 2017. American Fuzzy Lop 2.52b. http://lcamtuf.coredump.cx/afl/.