Program Vulnerability Repair via Inductive Inference

Yuntong Zhang, Xiang Gao, Gregory J. Duck, and Abhik Roychoudhury

ABSTRACT
Program vulnerabilities, even when detected and reported, are not fixed immediately. The time lag between the reporting and fixing of a vulnerability causes open-source software systems to suffer from significant exposure to possible attacks. In this paper, we propose a counter-example guided inductive inference procedure over program states to define likely invariants at possible fix locations. The likely invariants are constructed via mutation over states at the fix location, which turns out to be more effective for inductive property inference than the usual greybox fuzzing over program inputs. Once such likely invariants, which we call patch invariants, are identified, we can use them to construct patches via simple patch templates. Our work assumes that only one failing input (representing the exploit) is available to start the repair process. Experiments on the VulnLoc data-set of 39 vulnerabilities, which has been curated in previous works on vulnerability repair, show the effectiveness of our repair procedure. Compared to proposed approaches for vulnerability repair such as CPR or SenX, which are based on concolic and symbolic execution respectively, we can repair significantly more vulnerabilities. Our results show the potential of program repair via inductive constraint inference, as opposed to generating repair constraints via deductive/symbolic analysis of a given test-suite.

CCS CONCEPTS
• Software and its engineering → Automatic programming; Software testing and debugging; • Security and privacy → Software security engineering.

KEYWORDS
Automated program repair, Snapshot fuzzing, Inductive inference

ACM Reference Format:
Yuntong Zhang, Xiang Gao, Gregory J. Duck, and Abhik Roychoudhury. 2022. Program Vulnerability Repair via Inductive Inference. In Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2022). ACM, New York, NY, USA, 12 pages.

1 INTRODUCTION
In recent years, we have seen the rise of automated program repair (APR) techniques [16] and tools that automatically fix software bugs and vulnerabilities. These techniques fix program bugs by making patched programs satisfy a given correctness criterion. In the most commonly studied problem formulation, the correctness criterion is given as a test-suite; such APR techniques are called test-driven automated program repair. Specifically, when fixing vulnerabilities, given (1) a vulnerable program Prog, (2) a set of failing tests T_f that can trigger the vulnerability, and (3) a set of passing tests T_p representing the functionality that should be preserved, test-driven APR fixes Prog at a fix location L so that it passes both T_f and T_p.

A prominent group of APR techniques fix vulnerabilities by (1) inferring a repair constraint at the fix location L under which the vulnerability cannot be triggered, and (2) generating a patch to ensure the repair constraint is always satisfied at L [12, 21, 26, 31]. In this work, we examine the possibility of finding probable or likely repair constraints via inductive (as opposed to deductive) inference. These likely repair constraints are called patch invariants.

Formally, suppose S is the set of program states seen at location L in program executions, S_benign represents the benign program states of the passing tests, and S_vul represents the vulnerable states of the failing tests. The inferred patch invariant I holds on every observed benign state s ∈ S_benign, but does not hold on any observed vulnerable state s′ ∈ S_vul. A patch disables vulnerable executions by ensuring I always holds at L. Such a patch invariant can be inferred by either static or dynamic program analyses. Static approaches reason soundly about all feasible program paths by inspecting the program code directly. However, doing so usually relies on symbolic program analysis, leading to expensive computations [10, 23, 26]. In contrast, dynamic approaches infer the patch invariant from a set of program execution traces over a sample of test cases. These approaches limit their attention to the given test cases, and thus can scale to large programs. However, the inferred invariant and the generated patches may work on the given test suite but fail to generalize to other tests. In other words, the inferred patch invariant I only holds on the given S_benign, but not on other benign states. In the program repair literature, this is called the overfitting problem.

To alleviate the overfitting problem, one idea is to generate more test cases, so that we can infer more precise patch invariants and generate higher-quality patches. Grey-box fuzzing, e.g., AFL [32] and LibFuzzer [1], is an efficient test generation approach for detecting software bugs and vulnerabilities. These techniques rely on light-weight instrumentation to collect coverage information that guides test generation. The goal of test generation is to maximize code coverage and hence the chance of detecting bugs.
Coverage-based greybox fuzzing can be applied to repair vulnerabilities via (1) generating a test suite by fuzzing the program, (2) classifying the tests into vulnerable and benign inputs depending on whether a test triggers the vulnerability, (3) inferring a patch invariant using the augmented vulnerable and benign test suites, and (4) using the invariant to generate a patch. However, we argue that this approach is ineffective for two reasons. First, to infer a precise patch invariant that discriminates vulnerable from benign executions, we need the fuzzer to explore the program states at the fix location, generating representative benign and vulnerable states. However, traditional grey-box fuzzing is mainly designed to maximize code coverage, not to explore the program states at certain points. Second, to infer a patch invariant at a certain point (the fix location), the fuzzer is required to generate a large number of tests that reach this location (the reachability problem). However, solving the reachability problem is considered challenging even for directed grey-box fuzzing tools [5]. According to [5], generating a test to reach a certain point in a large program takes around two hours, which is not efficient enough for our purposes, since we have to generate many such tests.

To address the above challenges, we propose snapshot fuzzing to efficiently explore program states with the goal of inferring precise patch invariants. Specifically, instead of mutating the test inputs at the entry point of a program, snapshot fuzzing heuristically mutates the program state (i.e., the snapshot) at certain program points. We remark that these mutated program states (denoted as S) may not be reachable from the beginning of the program, meaning that S is a superset of all the reachable program states S_feasible (S_feasible ⊆ S). If an inferred invariant is valid on S, it must be valid on S_feasible. Our main intuition is that by inferring invariants using both feasible and infeasible program states, the less restrictive artificial program states lead to stronger invariants: the inferred patch invariants are satisfied not only on all reachable states but also on some non-reachable artificial states. Although stronger invariants are not precise, they can be useful in many scenarios, such as debugging, program repair, and program hardening. The impact of infeasible states is examined in detail in Section 3.

The workflow of inferring a patch invariant is as follows: starting with some initial candidate invariants generated from a limited test suite (the given tests plus the tests obtained from traditional fuzzing), snapshot fuzzing attempts to invalidate the current invariant by mutating program states to find counterexamples. Given a candidate invariant, the mutation step invokes an SMT solver to obtain new values for the variables that appear in the invariant. Such mutation finds a counterexample if the program execution result differs from what the candidate invariant suggests - for instance, if a program state satisfying the candidate invariant leads to a failure in execution, this state is a counterexample to the candidate invariant. These new counterexample program states are then used to refine the candidate invariant, which in turn guides the next round of mutation. We realized our idea in a tool called VulnFix for fixing vulnerabilities, using Daikon [8] and cvc5 [3] as backend invariant inference engines. Note that we did not change the inference engines themselves; instead, we focus on producing more valuable tests/states for inferring high-quality invariants. We evaluated VulnFix on a dataset including 39 real-world vulnerabilities. We assume there is only one failing input representing the exploit available to our tool. With Daikon and cvc5 as backend, VulnFix correctly fixes 19 and 20 vulnerabilities out of 39 subjects, outperforming state-of-the-art vulnerability repair tools. When compared with the program input fuzzers AFL [32] and ConcFuzz [27], our approach is more efficient in generating counterexamples for refining the inferred invariants.

Contributions. The contributions of this paper include:
• We propose an approach for fixing vulnerabilities based on counterexample-guided inductive inference. This helps reduce the over-fitting problem in automated program repair, without any significant deductive machinery.
• We implemented our technique in a tool called VulnFix to generate patches in the form of conditions, and evaluated it on 39 real-world vulnerabilities. Evaluation results show that our snapshot fuzzing outperforms traditional grey-box fuzzing in generating useful test cases, and that VulnFix outperforms state-of-the-art vulnerability repair tools.

2 MOTIVATING EXAMPLE
In this section, we illustrate the workflow of VulnFix for inferring patch invariants to repair a security vulnerability in a real-world application. The vulnerability used in this section is CVE-2019-9077¹, which is a heap-based buffer overflow vulnerability in GNU Binutils. Figure 1 shows the code snippet of this bug.

¹ https://sourceware.org/bugzilla/show_bug.cgi?id=24243

    1    sect = find_section_by_type (filedata, SHT_OPTIONS);
    2  + if (sect->sh_size < sizeof (*eopt))
    3  +   return FALSE;  // developer patch
    4    eopt = get_data (NULL, filedata, options_offset, 1,
                          sect->sh_size, ("options"));
    5    if (eopt) {
    6      ...
    7      while (offset <= sect->sh_size - sizeof (*eopt)) {
    8        Elf_External_Options *eoption;
    9        eoption = (Elf_External_Options *) ((char *) eopt + offset);
    10       option->kind = BYTE_GET (eoption->kind);
    11       option->size = BYTE_GET (eoption->size);
    12       ...
    13       offset += option->size;
    14       ++option;
    15     }
    16   }

Figure 1: Simplified code snippet for CVE-2019-9077.

At line 4, the function call get_data allocates a buffer of size sect->sh_size, which is pointed to by eopt. As the two variables used on the right-hand side of the while condition at line 7 are of type unsigned long, if sect->sh_size is less than sizeof(*eopt), the subtraction can underflow to a very large number. This causes the while condition to unexpectedly pass, resulting in a buffer overflow read at line 11 with the call to BYTE_GET. The developer fixed this bug by adding a check at line 3 to prevent the integer underflow from happening in the while condition, thereby preventing the buffer overflow. In the rest of this section, we describe how VulnFix generates a patch invariant for this example and how it can help fix this bug.
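To make the failure concrete, the following standalone C sketch replays the arithmetic of line 7 with illustrative values; the variable names mirror Figure 1, and sizeof(*eopt) is taken to be 8 here.

    #include <stdio.h>

    int main(void) {
        /* Illustrative values: a section header whose sh_size is smaller
         * than sizeof(*eopt), which is taken to be 8 here. */
        unsigned long sh_size = 1;
        unsigned long sizeof_eopt = 8;
        unsigned long offset = 0;

        /* Line 7 of Figure 1: both operands are unsigned, so the
         * subtraction wraps around instead of going negative. */
        unsigned long bound = sh_size - sizeof_eopt;
        printf("bound = %lu\n", bound);    /* 18446744073709551609 on LP64 */

        if (offset <= bound)
            printf("loop entered: out-of-bounds reads follow\n");

        /* The developer patch (lines 2-3 of Figure 1) returns before
         * the subtraction can ever be evaluated. */
        if (sh_size < sizeof_eopt)
            printf("patched code returns FALSE here\n");
        return 0;
    }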
Input-level Fuzzing. Given one exploit input that triggers the bug and the target location Lpatch for inferring invariants, input-level fuzzing first expands the single exploit into an initial set of benign and vulnerable tests that reach Lpatch, from which the initial candidate invariants are inferred (the states observed for this example are summarized in Table 1).
Snapshot fuzzing. To further refine the generated invariants, we use snapshot fuzzing to generate counterexamples by directly mutating the program states. A program state, denoted as a snapshot, is a mapping from all visible program variables at Lpatch to their corresponding values. Compared with input-level fuzzing, the main advantage of snapshot fuzzing is that it bypasses the reachability problem and mutates the program states directly in a controlled way, so that a large number of representative program states can be generated at Lpatch efficiently, which can drive the inference engine to infer a high-quality invariant. Specifically, given an existing snapshot, we mutate it with the goal of generating counterexample states that can refine the current patch invariants. For instance, given the current patch invariant (e_shnum < sect->sh_size), in the first round (SF Round_1), snapshot fuzzing generates a new state {e_shnum=32, sect->sh_size=32, ...}. This new state is a counterexample, since it violates the above patch invariant but does not trigger the buffer overflow. The refined candidate invariants then guide the next round of snapshot fuzzing. This process continues until a stable solution is reached or the time budget is exhausted. In this example, after SF Round_3, no candidate invariant is removed (the number of candidate invariants (#Inv) is not reduced), and the remaining invariant (sect->sh_size >= 8) no longer changes even with further rounds of mutation, so it is reported as the final patch invariant. Since sizeof(*eopt) is 8, this invariant is equivalent to the developer patch in Figure 1.
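The counterexample check itself is simple; the sketch below (a hypothetical harness, with only the two relevant variables of this example) shows the classification rule. In VulnFix, the benign/vulnerable verdict comes from actually resuming execution from the mutated state rather than from a hard-coded flag.

    #include <stdbool.h>
    #include <stdio.h>

    /* A snapshot: a name-value mapping over the variables at Lpatch.
     * Only the two variables relevant to this example are kept. */
    typedef struct {
        unsigned long e_shnum;
        unsigned long sect_sh_size;
    } Snapshot;

    /* Candidate patch invariant from the first round:
     * e_shnum < sect->sh_size */
    static bool candidate_inv(const Snapshot *s) {
        return s->e_shnum < s->sect_sh_size;
    }

    /* A state is a counterexample when the execution verdict disagrees
     * with the candidate: a benign state violating the invariant, or a
     * vulnerable state satisfying it. */
    static bool is_counterexample(const Snapshot *s, bool benign) {
        return candidate_inv(s) != benign;
    }

    int main(void) {
        Snapshot mutated = { 32, 32 };  /* the SF Round_1 state */
        bool benign = true;             /* running it does not overflow */
        printf("counterexample: %d\n", is_counterexample(&mutated, benign));
        return 0;
    }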
Infeasible States. As we mentioned above, snapshot fuzzing can generate infeasible states. In this example, the variable do_segments is of type int, but the program uses it as an implicit boolean and only assigns 0 or 1 to it, so all feasible states can only have do_segments equal to 0 or 1. Since VulnFix does not perform any static analysis on the code, it has no information about this restriction on state feasibility, and it can potentially change the value of do_segments to other values, resulting in an infeasible state. Such an infeasible state is shown in SF Round_2 in Table 1. However, infeasible states do not affect the correctness of the inferred invariant. Instead, the patch invariants generated based on both feasible and infeasible states are stronger, meaning that the inferred invariant is satisfied not only when do_segments is 0 or 1, but also on other values of do_segments.

3 METHODOLOGY
The workflow of VulnFix is shown in Figure 2. VulnFix takes as input a vulnerable program Prog, an "exploit" input Iexploit that triggers a known target vulnerability, and a patch location Lpatch that indicates where a patch should be applied. We assume that the target vulnerability can be observed via abnormal program termination or crash, including hardware exceptions (SIGSEGV, SIGFPE, etc.), assertion failures, or a failed sanitizer check (e.g., AddressSanitizer [25]).
We also assume a patch location that indicates where the vulnerability should be fixed. In practice, the patch location can be decided using fix localization [29] or provided manually.

VulnFix tries to infer a patch invariant at the given patch location Lpatch according to the set of observed program states S (denoted as snapshots) at Lpatch. The snapshots S can be partitioned into the set of benign program states S_benign (that do not trigger the target vulnerability) and the set of vulnerable program states S_vul (that do trigger the target vulnerability). The output of our workflow is a patch invariant that holds for all observed benign program states s ∈ S_benign, but does not hold for any observed vulnerable program state s′ ∈ S_vul. The patch invariants capture the underlying conditions that are observed to be necessary to avoid triggering the vulnerability. Enforcement of the patch invariants can be used to guide program repair. Note that, in this paper, we focus on patch invariant inference; the way of using patch invariants to generate patches follows existing techniques [10, 12].

VulnFix consists of two main phases: input-level fuzzing and snapshot fuzzing. Input-level fuzzing is used to collect an initial set of state observations at the patch location Lpatch. To avoid overfitting, the second phase, snapshot fuzzing, increases the diversity of states with the aim of generalizing the initial set of patch invariants.

3.1 Input-Level Fuzzing
Our workflow begins with an initial exploit (Iexploit) which triggers the vulnerability. In the initial phase, the goal of input-level fuzzing is to expand the initial exploit into a test suite that exhibits a diversity of both vulnerable and benign program states. The purpose of the initial test-suite is twofold: (1) to help infer an initial set of patch invariants based on observed states at the patch location Lpatch, which acts as a starting point for the alternating loop of invariant refinement and inference, and (2) to generate an initial set of snapshots that will be mutated in the second phase for invariant refinement.

Specifically, input-level fuzzing plays the role of exploring different paths from the entry point to Lpatch, as shown in Figure 3 (the green solid lines). As we mentioned in Section 2, snapshot fuzzing directly mutates the program states at Lpatch, while not changing the execution paths between the entry and Lpatch. Because snapshot fuzzing mutates only a small part of the program state (e.g., integer and boolean values) while keeping most of the state unchanged (e.g., the overall memory layout), it may miss some valid program states. Fortunately, traditional coverage-guided input-level fuzzing can fill this gap by exploring different paths from the entry to Lpatch. As illustrated in Figure 3, input-level fuzzing explores different paths to the patch location, while snapshot fuzzing further explores the program states along each path by directly mutating the states.

Figure 3: Input-level fuzzing explores the paths from the entry point to the patch location (the green solid lines), while snapshot fuzzing directly explores the states over the identified paths (the red dotted lines).

Initial Test Suite. Our current design builds the first phase on top of a standard coverage-based greybox fuzzing tool, namely AFL [32], with a few modifications. Standard fuzzing generates new inputs by mutating the existing inputs, with higher priority assigned to inputs that increase code coverage. The prioritised inputs are further mutated in the next rounds. This process continues with the goal of increasing code coverage. However, our goal is to find a diversity of inputs that reach the patch location Lpatch. In addition to code coverage, we thus modify AFL to prioritize inputs that reach Lpatch (conceptually sketched below). The tests that drive execution to Lpatch are saved as the initial test suite.
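The paper does not detail the AFL modification itself; conceptually, the frontend only needs a per-execution signal that Lpatch was reached, as in the following hypothetical sketch (lpatch_probe, run_target, and reached_lpatch are illustrative names, not VulnFix APIs).

    #include <stdbool.h>
    #include <stdio.h>

    /* Flag set by instrumentation whenever execution passes Lpatch;
     * the fuzzer reads it after each run to decide whether the input
     * belongs in the initial test suite (and deserves priority). */
    static volatile bool reached_lpatch = false;

    static void lpatch_probe(void) {   /* inserted at the patch location */
        reached_lpatch = true;
    }

    /* Stand-in for the instrumented program under test. */
    static void run_target(int input) {
        if (input > 10)
            lpatch_probe();            /* Lpatch lies on this path */
    }

    int main(void) {
        int inputs[] = { 3, 42 };
        for (int i = 0; i < 2; i++) {
            reached_lpatch = false;
            run_target(inputs[i]);
            printf("input %d reaches Lpatch: %d\n",
                   inputs[i], (int)reached_lpatch);
        }
        return 0;
    }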
Snapshot Logging. Once the initial test suite T is generated, the next step is to generate a set of program states (a.k.a. snapshots) s at the patch location Lpatch for each test t ∈ T. From each snapshot s, we log information useful for invariant inference, including:
(1) a name-value mapping of live variables at the patch location (Lpatch), including global variables, function parameters, and local variables within the current scope;
(2) a name-value mapping of pre-defined ghost variables that contain useful values not explicitly represented by the set of live variables; and
(3) a classification of whether the snapshot s triggers the vulnerability (s ∈ S_vul) or not (s ∈ S_benign).
The name-value pairs include basic type variables (e.g., int, bool, char), pointer variables (e.g., ptr), pointer dereferences (e.g., *ptr), and struct/class/union member variables. Since structs, unions, and pointers can be nested (e.g., x->y.z), the snapshot logger recursively retrieves the nested member variables up to a configurable depth. Pointer values also have a special representation, as discussed below.

In addition to the live variables, we also log implicit (a.k.a. ghost) variables that may contain useful information at the patch location Lpatch. Such ghost variables may be necessary for inferring a useful invariant that separates the benign and vulnerable cases. For instance, the size of an array or buffer is usually important when classifying an out-of-bounds access; however, this size may not be saved in any live variable. Currently, the snapshot logger supports the following ghost variables:
• The size of a global, stack, or heap-allocated buffer. If a buffer is pointed to by a visible pointer variable ptr, this ghost variable is denoted by size(ptr).
• The base address of the buffer pointed to by a visible pointer variable ptr, denoted by base(ptr). In this case, ptr can point to any address within a buffer, and base(ptr) is the base address of that buffer.
To obtain the values of size(ptr) and base(ptr) from the value of a pointer ptr, we retrieve the meta information associated with the corresponding memory, as defined by sanitizers at runtime.
In the current design, we utilise the allocation meta-data from AddressSanitizer [25] to derive the values of the ghost variables.

Our snapshot logger also specially represents pointer values (e.g., ptr) in terms of ghost variables. Specifically, ptr is represented as the offset between the ptr and base(ptr) values, which means ptr is transformed into offset(ptr) = ptr − base(ptr) in the snapshot. For example, if ptr = base(ptr) + 8, then ptr is represented by the offset +8, regardless of the actual value of ptr interpreted as an integer. This is because, for most programs, the vulnerability depends on the pointer offset rather than the absolute pointer value at runtime.

The final logged information is the classification. For a single test execution, the resulting snapshot is classified as vulnerable if the execution goes on to trigger the vulnerability, and as benign otherwise.
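As a self-contained stand-in for the sanitizer meta-data lookup, the sketch below keeps its own record of one heap object and derives the ghost values from it; tracked_alloc, ghost_size, ghost_base, and ghost_offset are hypothetical helpers, whereas the real tool queries AddressSanitizer at runtime.

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Toy record of a single heap object, standing in for the
     * sanitizer allocation meta-data that VulnFix actually queries. */
    static char  *obj_base;
    static size_t obj_size;

    static char *tracked_alloc(size_t n) {
        obj_base = malloc(n);
        obj_size = n;
        return obj_base;
    }

    /* Ghost variables for any pointer into the tracked object. */
    static size_t    ghost_size(const char *p)   { (void)p; return obj_size; }
    static char     *ghost_base(const char *p)   { (void)p; return obj_base; }
    static ptrdiff_t ghost_offset(const char *p) { return p - ghost_base(p); }

    int main(void) {
        char *buf = tracked_alloc(32);
        char *ptr = buf + 8;                    /* interior pointer */

        /* The snapshot logs offset(ptr), size(ptr), and base(ptr)
         * instead of the raw address, which differs across runs. */
        printf("offset(ptr) = %td, size(ptr) = %zu\n",
               ghost_offset(ptr), ghost_size(ptr));
        free(buf);
        return 0;
    }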
3.2 Snapshot Fuzzing

Algorithm 1: Basic snapshot fuzzing loop
    Input: initial snapshot corpus S, candidate invariants Φ
    Output: refined invariants Φ
    1  while !Timeout() do
    2      s ← Select(S)
    3      s′ ← Mutate(s)
    4      r ← Execute(P, s′)
    5      if isCounterExample(r, Φ) then
    6          S ← S ∪ {s′}
    7          Φ ← GenerateInv(S)
    8  end

Figure 4: In-place extension and contraction of an object and its redzone.

Mutation Strategy. Algorithm 1 generates new states s′ by directly mutating an existing state selected from the current snapshot corpus S. Since the goal is to refine the current patch invariants Φ, we only mutate the visible variables observed by the snapshot logging of Section 3.1 (e.g., local variables, members, etc.) at location Lpatch. Other program states (e.g., arbitrary memory addresses) are not considered by Φ, and will not be mutated.
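To make Algorithm 1 concrete, here is a minimal, self-contained C rendering with toy stand-ins: a one-variable state, a ground-truth benign condition x >= 5, and a single lower-bound invariant template. All names and the mutation operator are illustrative, not the actual VulnFix components.

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { long x; } Snapshot;        /* one-variable state    */
    typedef struct { long lower; } Invariants;  /* candidate: x >= lower */

    static Snapshot corpus[256] = { { 8 } };    /* initial benign seed   */
    static int corpus_len = 1;

    static Snapshot Select(void)       { return corpus[rand() % corpus_len]; }
    static Snapshot Mutate(Snapshot s) { s.x += (rand() % 17) - 8; return s; }
    static bool     Execute(Snapshot s){ return s.x >= 5; } /* benign iff x >= 5 */

    /* Re-infer the candidate from the (benign) corpus: weakest bound seen. */
    static Invariants GenerateInv(void) {
        long lo = corpus[0].x;
        for (int i = 1; i < corpus_len; i++)
            if (corpus[i].x < lo) lo = corpus[i].x;
        return (Invariants){ lo };
    }

    int main(void) {
        Invariants phi = GenerateInv();          /* initial candidate: x >= 8 */
        for (int round = 0; round < 1000; round++) {
            Snapshot s2 = Mutate(Select());      /* s' <- Mutate(Select(S)) */
            bool benign = Execute(s2);           /* r  <- Execute(P, s')    */
            bool holds  = s2.x >= phi.lower;
            /* benign state violating phi is a counterexample (the dual
             * case cannot arise with this toy template) */
            if (benign && !holds && corpus_len < 256) {
                corpus[corpus_len++] = s2;       /* S   <- S U {s'}         */
                phi = GenerateInv();             /* Phi <- GenerateInv(S)   */
            }
        }
        printf("refined invariant: x >= %ld (ground truth: x >= 5)\n",
               phi.lower);
        return 0;
    }

Starting from the sole benign seed x = 8, the loop repeatedly finds benign states below the current bound and relaxes the candidate until it converges on x >= 5, mirroring how counterexample states drive re-inference in VulnFix.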
In principle, the snapshot variables can be mutated arbitrarily within specific constraints (see below). However, we can optimize the mutation strategy with reference to the current patch invariants Φ, which are assumed to be "mostly" correct. Therefore, in addition to random perturbation, we bias mutation towards points that are closer to the boundary defined by Φ (the boundary between benign and vulnerable executions). Here, given a patch invariant φ ∈ Φ over variables x⃗ = ⟨x_1, .., x_n⟩, a point p = ⟨v_1, .., v_n⟩ is a boundary point if there exists another point p′ ≠ p such that φ(p) and ¬φ(p′) hold (or vice versa), and there exists no intermediate point between p and p′ w.r.t. Euclidean distance. The intuition behind this strategy is that any inaccuracies within φ are more likely to be exposed by points close to the boundary, as opposed to arbitrary points that comfortably satisfy either φ or its negation. We use an SMT solver to generate boundary points in order to guide mutation.
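VulnFix obtains such points from z3; for a linear invariant of the form x − y >= a, they can also be written down directly, as in this illustrative sketch (the concrete numbers are arbitrary).

    #include <stdbool.h>
    #include <stdio.h>

    /* Candidate invariant phi: x - y >= a. */
    static bool phi(long x, long y, long a) { return x - y >= a; }

    int main(void) {
        long a = 8;                 /* e.g., sect->sh_size - offset >= 8 */

        /* p satisfies phi and its immediate neighbour p' violates it:
         * no intermediate point lies between them, so both sit on the
         * decision boundary in the sense of Section 3. */
        long px = 20, py = 12;      /* x - y == a     -> phi(p)  holds */
        long qx = 20, qy = 13;      /* x - y == a - 1 -> phi(p') fails */
        printf("phi(p) = %d, phi(p') = %d\n",
               (int)phi(px, py, a), (int)phi(qx, qy, a));

        /* Mutation is biased towards such pairs: if execution from the
         * phi-violating neighbour turns out benign, the constant a is
         * too strict and the candidate must be refined. */
        return 0;
    }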
Mutation constraints. Mutations are constrained by variable types and other assumptions. For example, a variable c of type char can only be mutated to values within the range CHAR_MIN..CHAR_MAX. Mutations to ghost variables are similarly constrained to make sure their semantics are preserved. For example, given a pointer ptr, a mutation to size(ptr) is realized by in-place extension or contraction of the underlying object and its redzone (Figure 4), so that the mutated ghost value stays consistent with the sanitizer meta-data.

4 IMPLEMENTATION
The current implementation of VulnFix consists of three components: (1) an instrumentation module for snapshot logging and mutation, (2) a driver module for counterexample generation, and (3) a backend for invariant inference. The instrumentation and driver modules form a frontend that generates snapshots for the backend.

Instrumentation. The instrumentation module (written in C) is built on the static binary rewriter e9patch [7]. At the patch location Lpatch of the vulnerable program, the instrumentation module inserts a function to record the current values of the variables in scope, and optionally mutate some of the variable values based on a given argument. To read and write program variable values at runtime, the instrumented code parses the DWARF debugging information to establish a mapping between variable names and their corresponding runtime locations.

Driver. The driver module (written in Python) invokes the various components and communicates data (snapshots, patch invariants) between them. It processes the snapshots produced by the instrumentation code and classifies them based on the program execution status. It also implements the core of snapshot fuzzing for generating counterexample snapshots based on the given patch invariants
and test inputs. We use the z3 [6] SMT solver for finding boundary values to guide mutation.

Backend. The backend component takes in the sets of benign and vulnerable snapshots and performs invariant inference based on them. The current implementation of VulnFix supports two backends: Daikon-based and cvc5-based. For the Daikon backend, we first use Daikon to infer a set of invariants Φ0 from S_benign, and then perform a filtering step which only returns φ ∈ Φ0 if φ is violated by all s ∈ S_vul. The filtering step is implemented on top of the Daikon InvariantChecker utility. Since Daikon instantiates invariants based on templates, we add a few extra templates applicable for patching security vulnerabilities:
• x − y >= a, where x and y are variables and a is a constant;
• x < 2^n, where x is a variable and 2^n is a power-of-two constant representing boundary values for integers.
Cvc5 is a program synthesizer, which takes as input a set of input-output pairs {i_1 ↦ o_1, . . . , i_n ↦ o_n} and synthesizes a function f such that f(i_k) = o_k for k ∈ {1, . . . , n}. In our context, we use the cvc5 backend to synthesize a function f such that f(s) = True for s ∈ S_benign and f(s′) = False for s′ ∈ S_vul. The grammar used for synthesis includes all variables in the snapshot, arithmetic operators (+, −, ×), relational operators (⩾, ⩽, =), logical operators (and, or, not), and constants (1 to 100, and power-of-two values).

Use of sanitizers. Since the snapshots need to be classified into benign and vulnerable by observing the program execution status, we use AddressSanitizer (ASan) [25] and UndefinedBehaviorSanitizer (UBSan) [2] to transform the vulnerabilities into crashes. We also read and write the ASan redzone metadata for logging and mutating ghost variable values.

5 EVALUATION
In this section, we aim to answer the following research questions:
• RQ1: How effective is VulnFix (with different backends) in synthesizing conditions for fixing real-world CVEs?
• RQ2: What are the strengths and weaknesses of VulnFix compared with other APR tools?
• RQ3: How effective is snapshot fuzzing in refining patch invariants compared to input-level fuzzing?

Benchmark subjects. We evaluate VulnFix on a subset of the VulnLoc [27] benchmark. The VulnLoc benchmark extends the ExtractFix [10] and SenX [12] benchmarks, and contains 43 real-world CVEs. Out of the 43 vulnerabilities in the VulnLoc benchmark, 4 cannot be reproduced in our environment (ubuntu-18.04 and gcc-7.5/clang-10) because they are incompatible with the experimental system or libraries. The remaining 39 vulnerabilities are used in our evaluation.

Experiment setup. All of our experiments are performed on a 40-core 2.60GHz Intel Xeon machine with 64GB RAM, running Ubuntu 18.04. We note that the current implementation of VulnFix does not support parallelism, and the experiments are performed with sequential algorithms. For most vulnerabilities in the evaluation, we use the following configuration: (1) the developer patch location is used as the target location to infer invariants; (2) the initial input set supplied to VulnFix only includes one exploit input obtained from online bug reports. VulnFix infers a patch invariant classifying the benign and vulnerable executions. We use the patch invariant to disable the vulnerable executions by either (1) integrating the invariant into the original condition if the target location is an if, for, or while statement; or (2) generating an if-guard of the form

    if (!constraint) exit(ERROR_NUM);
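For instance, with the patch invariant sect->sh_size >= 8 from Section 2, form (2) instantiates as in the sketch below; ERROR_NUM is a placeholder exit code and the surrounding program is simulated by a single global.

    #include <stdio.h>
    #include <stdlib.h>

    #define ERROR_NUM 1                 /* hypothetical exit code */

    static unsigned long sh_size = 4;   /* stand-in for sect->sh_size */

    int main(void) {
        /* Form (2): synthesized if-guard enforcing sect->sh_size >= 8. */
        if (!(sh_size >= 8))
            exit(ERROR_NUM);            /* vulnerable execution disabled */

        /* Form (1) would instead strengthen the existing condition:
         *   while (sh_size >= 8 && offset <= sh_size - sizeof(*eopt)) ...
         */
        printf("benign execution continues\n");
        return 0;
    }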
5.1 RQ1: Efficacy with Different Backends
We evaluated the efficacy of VulnFix with two different backends, Daikon and cvc5. Daikon uses pre-defined templates to instantiate invariant candidates and enumerates the candidates to find the ones that are satisfied on the given traces, while cvc5 synthesizes an expression from a given grammar via Satisfiability Modulo Theories (SMT) solving. Since cvc5 is built on top of SMT, it is less scalable than Daikon and takes more time to run, especially as the number and size of snapshots grow. In order to obtain meaningful results, for each vulnerability in the benchmark, we set the total timeout to 30 minutes for the Daikon backend and 3 hours for the cvc5 backend. The first 10 minutes are allocated to the input-level fuzzing phase, and the remainder is allocated to snapshot fuzzing and invariant inference.

Since VulnFix infers a patch invariant over existing program variables (as well as ghost variables), which is then used to disable vulnerable executions, VulnFix is not applicable to some vulnerabilities in the benchmark. These include vulnerabilities that (1) cannot be fixed by modifying or inserting conditions, or (2) require the addition of new program variables that are not included in our ghost variable scheme. We identified 9 such vulnerabilities according to their developer patches and marked them as "NA" (not applicable). These 9 vulnerabilities are included in the results for completeness.

For the remaining vulnerabilities, we evaluate the correctness of the generated patches by manually comparing them with the developer patches. "Correct (equiv)" means that the result of VulnFix is semantically equivalent to the developer patch; "Correct (not equiv)" means that the produced patch is not semantically equivalent to the developer patch, but still correctly fixes the vulnerability (see the examples below). "Wrong" means that VulnFix fails to produce a correct patch before the timeout. We only regard a result as correct if it is the only patch produced by VulnFix and the produced patch correctly fixes the vulnerability.

Results. Table 2 shows the evaluation results, where the columns "Daikon backend" and "cvc5 backend" list the results of VulnFix when the corresponding backend is used. Overall, both backends show similar results in producing correct patches (both produce 19 correct patches). On CVE-2017-14745, the Daikon backend fails because it produces two patches in the end, while the cvc5 backend produces exactly one correct patch. On Gnubug-25003, the cvc5 backend fails while the Daikon backend produces the correct patch.

VulnFix produces correct but not equivalent patches on 6 vulnerabilities. The main reason is that the patch produced by VulnFix is strictly based on whether a vulnerable program behavior is observed, while the developer patch may also take insights from program-specific semantic information. For example, Libtiff contains an integer overflow vulnerability (CVE-2017-7601), and
its relevant code snippet is shown in Figure 5. The bug is triggered when the value of td->td_bitspersample is greater than 62, causing the left shift on line 10 to overflow. The developer patch on lines 4-6 adds a check on its value and returns if the value is too big, with the bound 16 chosen based on the file format specification. On the other hand, VulnFix produces the patch invariant td->td_bitspersample <= 62, where 62 is the maximum value for which the left shift on line 10 does not overflow. In this case, VulnFix produces a patch that correctly separates the benign and vulnerable behaviors, while the developer patch additionally considers other program semantics.

    1    switch (sp->photometric) {
    2    case PHOTOMETRIC_YCBCR:
    3      ...
    4  +   if (td->td_bitspersample > 16) {
    5  +     return (0);
    6  +   }
    7      {
    8        float *ref;
    9        if (!TIFFGetField (tif, TIFFTAG, &ref)) {
    10         top = 1L << td->td_bitspersample;  // !integer overflow!
    11       }
    12   }}

Figure 5: Simplified code snippet of CVE-2017-7601.
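The bound of 62 can be checked directly: assuming a 64-bit long, 62 is the largest shift amount for which 1L << n still yields a positive signed value, while a shift of 63 sets the sign bit and is undefined behavior (which is what UBSan reports for this CVE).

    #include <stdio.h>

    int main(void) {
        /* Line 10 of Figure 5: top = 1L << td->td_bitspersample.
         * Assuming a 64-bit long, 62 is the largest safe shift of 1L. */
        int ok = 62;
        printf("1L << %d = %ld\n", ok, 1L << ok);  /* 4611686018427387904 */

        /* 1L << 63 would set the sign bit: signed overflow, undefined
         * behavior in C, flagged by UBSan. The VulnFix invariant
         * td->td_bitspersample <= 62 makes this unreachable. */
        return 0;
    }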
As discussed in Section 3.2, it is possible that the patch invariant generated from inductive inference is unsatisfied by certain feasible benign states, if they are not observed. Such patch invariants can lead to patches that disable more feasible program behavior than desired
(if the patches are generated in the form of an if-guard), thereby restricting the benign functionality of the program in exchange for making it more secure. To understand the effect of such patch invariants experimentally, we examined the 6 correct but not equivalent patches, and found that 1 of them (CVE-2017-6965) restricts more behavior than the developer patch. Furthermore, these two patches - one produced by VulnFix and the other from the developers - were applied to the vulnerable program, which then underwent a 24-hour differential fuzzing campaign to check whether the two patches exhibit different behaviors. After 24 hours of fuzzing, there were no input executions that evaluated the VulnFix patch and the developer patch differently, which means no significant restriction of benign functionality was observed in our experimentation.

Besides, there are 11 vulnerabilities marked as "Wrong" or "Wrong (not spt)". "Wrong (not spt)" means that the current VulnFix implementation does not support generating the correct invariant. For example, the correct invariant for CVE-2016-9532 involves an inequality with scalar multiplication (e.g., x * y * z <= constant), which is not supported by Daikon. Daikon infers invariants based on a set of templates, and invariants that cannot be represented as one of the templates cannot be inferred. As cvc5 synthesizes invariants based on a grammar instead of fixed templates, it was expected that cvc5 would outperform Daikon. However, the experimental results show otherwise: the cvc5 backend also fails to produce correct patches on the vulnerabilities that Daikon does not support (marked as "Wrong"). This is because these patches usually consist of complex expressions, and the cvc5 backend times out before synthesizing such expressions.

Overall, the Daikon and cvc5 backends each produce 19 correct patches, with a time budget of 30 minutes and 3 hours, respectively. From this comparison, Daikon appears to be the more practical backend.

5.2 RQ2: Comparison with Other APR Tools
CPR. To understand the strengths and weaknesses of VulnFix in repairing security vulnerabilities, we perform a comparison with CPR [26], a state-of-the-art program repair tool. CPR works by first synthesizing a pool of patch candidates from a given set of patch ingredients, then discarding overfitting patches from the pool by exploring the input space with concolic execution, and finally ranking the remaining patches. This workflow is conceptually similar to counterexample-guided inductive synthesis (CEGIS), i.e., infer initial candidates and then generate new test inputs (counterexamples) to rule out incorrect candidates.

In our experiments, we set the timeout for each vulnerability to 30 minutes for CPR. Since CPR requires patch ingredients to be provided for patch synthesis, we supply five variables at the patch location (including all the variables used in the developer patch) and the necessary arithmetic/comparison operators as patch ingredients to the synthesizer used in CPR. Besides, since CPR currently repairs only boolean and integer expressions [26], and does not automatically introduce new program variables, it is not applicable to some vulnerabilities in the benchmark. We identified 5 such vulnerabilities and marked them as "NA" (not applicable).

Results. The evaluation results of CPR are shown in the "CPR" columns of Table 2. Column "Rank" shows the rank of the correct patch in the final patch pool. Column "Ratio" shows the patch pool reduction ratio, which is the percentage of initial patches that are discarded by the co-exploration of the patch space and input space. "Timeout" indicates that CPR did not generate patches before the 30-minute timeout, and "Error" indicates that an error occurred during concolic execution and CPR aborted. Overall, with a 30-minute timeout, CPR ranks the correct patch at the top-1 position for 4 out of 39 vulnerabilities. For 16 vulnerabilities, CPR discards more than 40% of the initial patch candidates by performing concolic execution. However, for 13 other vulnerabilities, CPR achieves 0% patch space reduction, potentially due to the longer paths resulting from loop unrolling [26]. In other words, concolic execution cannot find any test input that reaches the patch location or discards plausible patches. Instead of relying on concolic execution, VulnFix performs snapshot mutation to discard overfitting patch invariants, which is shown to be more efficient by the experimental results.

SenX. We also performed a comparison with the security vulnerability repair tool SenX [12], which generates patches based on a pre-specified set of safety properties. The same benchmark consisting of 39 vulnerabilities is used, and the timeout for each vulnerability is set to 30 minutes. Since SenX currently only supports repairing buffer overflows, bad casts, and integer overflows [12], vulnerabilities that are not of these types are not applicable. There are 8 such vulnerabilities in the benchmark, which are marked as "NA" (not applicable).

To generate a patch that enforces a safety property, SenX uses techniques such as expression translation and loop cloning. These techniques can potentially generate a different patch than the one from the developers, making it non-trivial to manually compare the generated patch with the developer patch for correctness. Therefore, to check for correctness, we examine the generated patch by applying it to the vulnerable program, re-compiling the patched program, and executing the patched program with the exploit input. If the original vulnerable behavior is no longer observed on the patched program, the generated patch is considered correct.

Results. The evaluation results of SenX are shown in the "SenX" columns of Table 2. Column "Patch detail" shows the details of examining the generated patch. In this column, "-" indicates that no patch was generated by SenX, "Wrong (comp)" indicates that the patched program could not be compiled, "Wrong (exec)" indicates that the vulnerable behavior is still observed when executing the patched program with the exploit input, and "Correct" indicates that the patch is correct based on the criteria discussed above. Overall, SenX produces correct patches for 4 out of 39 vulnerabilities.

VulnFix generated 19 correct patches within 30 minutes, while CPR and SenX produce 4 correct patches each (by just checking the top-ranked patch for CPR).

5.3 RQ3: Comparison with Input-Level Fuzzing
To understand whether snapshot fuzzing can generate states that refine the invariants effectively, we also compare it with traditional input-level fuzzing. Specifically, we replace the snapshot fuzzing
step in VulnFix with traditional input-level fuzzing techniques and compare their effectiveness in generating correct patches. For the tests generated by input-level fuzzing, we collect the benign/vulnerable snapshots from the non-redundant tests that reach the fix location.

We consider two input-level fuzzing tools: AFL [32] and ConcFuzz [27]. AFL is a widely used grey-box fuzzing tool, which has proven to be efficient in detecting software vulnerabilities and bugs. For AFL, we re-use the modified version described in Section 3.1 to generate input tests. ConcFuzz "concentrates" on the neighborhood of the given exploit. Specifically, it builds a "concentrated" test suite that drives the program execution to reach each branch location on the execution trace of the given exploit. Based on the "concentrated" test suite, ConcFuzz can then estimate the probability of each branch being executed by vulnerable inputs and hence determine the fault locations. For the application of invariant inference, exploring the neighborhood of the patch location instead of the entire trace is sufficient. Therefore, we implement a modified version of ConcFuzz that only "concentrates" on the patch location. In the experiment, we set a 30-minute timeout for both AFL and ConcFuzz, which is the same as the total time budget for VulnFix. The 9 vulnerabilities in the benchmark which are not applicable to VulnFix are excluded from this experiment, as they are also not applicable when snapshot fuzzing is replaced by input-level fuzzing techniques.

    Bug ID            VulnFix        VulnFix_C      VulnFix_A
                      #Inv  result   #Inv  result   #Inv  result
    CVE-2017-6965       1     ✓        1     ✗        1     ✓
    CVE-2017-14745      2     ✗        0     ✗        5     ✗
    CVE-2017-15025      1     ✓        0     ✗        5     ✗
    Gnubug-19784        1     ✓        1     ✓        1     ✓
    Gnubug-25003        1     ✓       34     ✗       23     ✗
    Gnubug-25023        1     ✓        8     ✗        7     ✗
    Gnubug-26545        0     ✗        1     ✗        0     ✗
    CVE-2016-8691       1     ✓       25     ✗       17     ✗
    CVE-2016-9557       0     ✗        0     ✗        0     ✗
    CVE-2016-5844       1     ✓        0     ✗       60     ✗
    CVE-2012-2806       1     ✓        6     ✗        6     ✗
    CVE-2017-15232      1     ✓        0     ✗       15     ✗
    CVE-2018-19664      2     ✗        0     ✗       18     ✗
    CVE-2016-9264       1     ✓        4     ✗        6     ✗
    Bugzilla-2633       2     ✗        0     ✗       50     ✗
    CVE-2016-5321       1     ✓        3     ✗        5     ✗
    CVE-2016-9532      36     ✗       38     ✗       36     ✗
    CVE-2016-10094      9     ✗       24     ✗       23     ✗
    CVE-2017-7595       1     ✓       14     ✗        3     ✗
    CVE-2017-7599       0     ✗        0     ✗        0     ✗
    CVE-2017-7600       0     ✗        0     ✗        0     ✗
    CVE-2017-7601       1     ✓        2     ✗        1     ✗
    CVE-2012-5134       1     ✓        6     ✗        4     ✗
    CVE-2016-1838       3     ✗        0     ✗        3     ✗
    CVE-2016-1839       1     ✓        0     ✗        1     ✓
    CVE-2017-5969       1     ✓        1     ✓        1     ✓
    CVE-2013-7437       1     ✓        0     ✗        1     ✓
    CVE-2017-5974       1     ✓        8     ✗        5     ✗
    CVE-2017-5975       1     ✓        0     ✗        1     ✓
    CVE-2017-5976       0     ✗        0     ✗        0     ✗
    Total               -   19/30      -    2/30      -    6/30

Table 3: Comparison with input-level fuzzing, where VulnFix_C represents replacing the snapshot fuzzing module with ConcFuzz, while VulnFix_A means that snapshot fuzzing is replaced by AFL.

Results. The evaluation results of input-level fuzzing are shown in Table 3. Column VulnFix_C gives the results when the snapshot fuzzing module in VulnFix is replaced with ConcFuzz, while VulnFix_A gives the results when snapshot fuzzing is replaced by AFL. The column "#Inv" shows the number of invariants produced when the time budget is exhausted, and the column "result" indicates whether a single correct patch is produced in the end. Overall, VulnFix produces 19 correct patches out of 39 vulnerabilities, while VulnFix_C and VulnFix_A only produce 2 and 6 correct patches, respectively. VulnFix_C and VulnFix_A produce very few correct patches because (1) they generate multiple candidate invariants (#Inv is greater than 1), some of which are incorrect; and (2) even when they produce only one candidate invariant for some vulnerabilities, the produced invariant is incorrect and overfits the generated test cases. In contrast, directly mutating the snapshot enables VulnFix to generate fewer but more precise invariants and hence more correct patches.

Compared to the input-level fuzzers AFL and ConcFuzz, snapshot fuzzing enables VulnFix to generate fewer but more precise invariants and hence more correct patches.

5.4 Threats to Validity
A few threats may affect the validity of our evaluation. The main threat is that the correctness of the generated invariants/patches cannot be guaranteed. Although snapshot fuzzing can explore the program states in a more controlled way, it still cannot ensure that all reachable program states at a fix location are exhaustively explored. Fortunately, this incompleteness does not seem to have a big impact on the effectiveness of VulnFix. The second threat is that we manually inspect whether the generated patches are semantically equivalent to the developer patches, which might be error-prone. To mitigate this, two authors of the paper double-checked the generated patches.

Another threat to validity is that our selection of subject programs may not generalize to all programs. To mitigate this threat, we used a data-set of subjects developed in a previous work [27]. We evaluated our technique on this existing dataset (filtering out the vulnerabilities that cannot be reproduced). In the future, it may be worthwhile to evaluate VulnFix on more CVEs and vulnerabilities.

6 RELATED WORK
The contributions of this paper are related to several areas of research: automated program repair, vulnerability repair, and counterexample-guided invariant inference. We present the related work as follows.

Automated program repair. Automated program repair techniques take in a buggy program and a set of specifications, and aim to generate a patched program satisfying the given specifications [16]. Test-driven automated program repair treats the provided test suite as the specification of intended behavior and generates patches to
make the patched program pass all the given tests [15, 18, 19, 21]. Since test cases are incomplete program specifications, the generated patches may overfit the given tests, i.e., the patched program works on the given tests but cannot be generalized to other tests. VulnFix is designed to alleviate this overfitting problem.

Existing work alleviates the overfitting issue by ranking the patches according to their probability of being correct [14, 18], referring to a reference implementation [20], or designing customized repair strategies [28]. Typically, those approaches try to generate correct patches by referring to additional program artifacts. Compared with those approaches, VulnFix does not rely on additional inputs (such as reference programs), which gives VulnFix more flexibility. Besides, some approaches alleviate the overfitting problem by generating more test cases [9, 30]. Compared to those approaches that generate test inputs, snapshot fuzzing directly mutates program states, which enables VulnFix to bypass the reachability problem in test case generation.

Vulnerability repair. In recent years, we have seen a rising trend of research on automatically fixing vulnerabilities. SenX [12] repairs vulnerabilities relying on vulnerability-specific, human-specified safety properties. Some other repair approaches are designed to repair a specific type of vulnerability, such as fixing memory errors [17] or concurrency bugs [13]. Compared to SenX and these approaches, which are limited to specific classes of bugs, VulnFix does not rely on pre-defined safety properties and is not limited to certain vulnerabilities. ExtractFix [10] fixes vulnerabilities by first inferring crash-free constraints, propagating the constraints to the fix location, and synthesizing patches to satisfy the constraints. CPR [26] fixes vulnerabilities by (1) generating a candidate patch space, and (2) detecting and discarding overfitting patches via a systematic co-exploration of the patch space and input space. It leverages concolic execution to systematically traverse the input space (and generate inputs), and uses the produced test inputs to rule out overfitting patches from the patch space. Compared to ExtractFix and CPR, VulnFix does not rely on heavyweight symbolic and concolic execution, enabling it to scale to large programs.

Counterexample-guided invariant inference. Recent works (e.g., PIE [24], ICE [11], CEGIR [22]) present CounterExample-Guided Invariant geneRation (CEGIR): infer an initial set of candidate invariants and then improve them using counterexamples. Specifically, if an initial invariant is invalid for some input, these approaches search for counterexamples which can help to refine the invariant. Such approaches are more efficient than traditional dynamic or static invariant inference. However, they still cannot get rid of the dependence on heavy program analysis; for instance, they still rely on symbolic execution or concolic execution [22, 33] to discover counterexamples. Instead of relying on heavy symbolic analysis, VulnFix investigates using light-weight test generation to verify the candidate invariants. Therefore, VulnFix is largely independent of the complexity or size of the programs and thus can scale to large programs.

7 DISCUSSION
In this work, we have presented an approach for automatically repairing program vulnerabilities from a single exploiting test input. Our approach is based on obtaining more states at the fix location via state mutations, and inductively inferring a likely invariant, which is then used to construct patches. Evaluation on a previously proposed data-set of vulnerabilities shows higher effectiveness compared to state-of-the-art vulnerability repair engines like SenX and CPR. While our approach is currently focused on fixing vulnerabilities, it shows that inductive inference approaches can be promising for general-purpose program repair. This contrasts with deductive or symbolic approaches for program repair [21], which deduce a repair constraint by symbolically analyzing a given test-suite.

REFERENCES
[1] 2022. LibFuzzer. https://llvm.org/docs/LibFuzzer.html
[2] 2022. UndefinedBehaviorSanitizer. https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
[3] Clark Barrett, Cesare Tinelli, et al. 2022. cvc5. https://cvc5.github.io
[4] Marcel Böhme, Valentin J. M. Manès, and Sang Kil Cha. 2020. Boosting Fuzzer Efficiency: An Information Theoretic Perspective. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 678–689.
[5] Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, and Abhik Roychoudhury. 2017. Directed Greybox Fuzzing. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2329–2344.
[6] Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 337–340.
[7] Gregory J. Duck, Xiang Gao, and Abhik Roychoudhury. 2020. Binary Rewriting without Control Flow Recovery. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. 151–163.
[8] Michael D. Ernst, Jeff H. Perkins, Philip J. Guo, Stephen McCamant, Carlos Pacheco, Matthew S. Tschantz, and Chen Xiao. 2007. The Daikon System for Dynamic Detection of Likely Invariants. Science of Computer Programming 69, 1-3 (2007), 35–45.
[9] Xiang Gao, Sergey Mechtaev, and Abhik Roychoudhury. 2019. Crash-Avoiding Program Repair. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis. 8–18.
[10] Xiang Gao, Bo Wang, Gregory J. Duck, Ruyi Ji, Yingfei Xiong, and Abhik Roychoudhury. 2021. Beyond Tests: Program Vulnerability Repair via Crash Constraint Extraction. ACM Transactions on Software Engineering and Methodology (TOSEM) 30, 2 (2021), 1–27.
[11] Pranav Garg, Daniel Neider, Parthasarathy Madhusudan, and Dan Roth. 2016. Learning Invariants Using Decision Trees and Implication Counterexamples. ACM SIGPLAN Notices 51, 1 (2016), 499–512.
[12] Zhen Huang, David Lie, Gang Tan, and Trent Jaeger. 2019. Using Safety Properties to Generate Vulnerability Patches. In 2019 IEEE Symposium on Security and Privacy (SP). IEEE, 539–554.
[13] Guoliang Jin, Wei Zhang, and Dongdong Deng. 2012. Automated Concurrency-Bug Fixing. In 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12). 221–236.
[14] Xuan Bach D. Le, David Lo, and Claire Le Goues. 2016. History Driven Program Repair. In 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), Vol. 1. IEEE, 213–224.
[15] Claire Le Goues, ThanhVu Nguyen, Stephanie Forrest, and Westley Weimer. 2011. GenProg: A Generic Method for Automatic Software Repair. IEEE Transactions on Software Engineering 38, 1 (2011), 54–72.
[16] Claire Le Goues, Michael Pradel, and Abhik Roychoudhury. 2019. Automated Program Repair. Commun. ACM 62, 12 (2019).
[17] Junhee Lee, Seongjoon Hong, and Hakjoo Oh. 2018. MemFix: Static Analysis-Based Repair of Memory Deallocation Errors for C. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 95–106.
[18] Fan Long and Martin Rinard. 2016. Automatic Patch Generation by Learning Correct Code. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 298–312.
[19] Sergey Mechtaev, Xiang Gao, Shin Hwei Tan, and Abhik Roychoudhury. 2018. Test-Equivalence Analysis for Automatic Patch Generation. ACM Transactions on Software Engineering and Methodology 27, 4 (2018). https://doi.org/10.1145/3241980
[20] Sergey Mechtaev, Manh-Dung Nguyen, Yannic Noller, Lars Grunske, and Abhik Roychoudhury. 2018. Semantic Program Repair Using a Reference Implementation. In Proceedings of the 40th International Conference on Software Engineering. 129–139.
[21] Hoang D.T. Nguyen, Dawei Qi, Abhik Roychoudhury, and Satish Chandra. 2013. SemFix: Program Repair via Semantic Analysis. In ACM/IEEE International Conference on Software Engineering (ICSE).
[22] ThanhVu Nguyen, Timos Antonopoulos, Andrew Ruef, and Michael Hicks. 2017. Counterexample-Guided Approach to Finding Numerical Invariants. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. 605–615.
[23] ThanhVu Huy Nguyen. 2014. Automating Program Verification and Repair Using Invariant Analysis and Test Input Generation. The University of New Mexico.
[24] Saswat Padhi, Rahul Sharma, and Todd Millstein. 2016. Data-Driven Precondition Inference with Learned Features. ACM SIGPLAN Notices 51, 6 (2016), 42–56.
[25] Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitriy Vyukov. 2012. AddressSanitizer: A Fast Address Sanity Checker. In 2012 USENIX Annual Technical Conference (USENIX ATC 12). 309–318.
[26] Ridwan Shariffdeen, Yannic Noller, Lars Grunske, and Abhik Roychoudhury. 2021. Concolic Program Repair. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. 390–405.
[27] Shiqi Shen, Aashish Kolluri, Zhen Dong, Prateek Saxena, and Abhik Roychoudhury. 2021. Localizing Vulnerabilities Statistically From One Exploit. In Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security. 537–549.
[28] Shin Hwei Tan, Hiroaki Yoshida, Mukul R. Prasad, and Abhik Roychoudhury. 2016. Anti-Patterns in Search-Based Program Repair. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 727–738.
[29] W. Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, and Franz Wotawa. 2016. A Survey on Software Fault Localization. IEEE Transactions on Software Engineering 42, 8 (2016), 707–740.
[30] Qi Xin and Steven P. Reiss. 2017. Identifying Test-Suite-Overfitted Patches through Test Case Generation. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis. 226–236.
[31] Jifeng Xuan, Matias Martinez, Favio Demarco, Maxime Clement, Sebastian Lamelas Marcote, Thomas Durieux, Daniel Le Berre, and Martin Monperrus. 2016. Nopol: Automatic Repair of Conditional Statement Bugs in Java Programs. IEEE Transactions on Software Engineering 43, 1 (2016).
[32] Michał Zalewski. 2022. American Fuzzy Lop. https://lcamtuf.coredump.cx/afl/
[33] Lingming Zhang, Guowei Yang, Neha Rungta, Suzette Person, and Sarfraz Khurshid. 2014. Feedback-Driven Dynamic Invariant Discovery. In Proceedings of the 2014 International Symposium on Software Testing and Analysis. 362–372.