Pythia: Compiler-Guided Defense Against Non-Control Data Attacks

Sharjeel Khan, Bodhisatwa Chatterjee, Santosh Pande
Georgia Institute of Technology, Atlanta, GA, USA

Abstract
Modern C/C++ applications are susceptible to Non-Control Data Attacks, where an adversary attempts to exploit memory corruption vulnerabilities for security breaches such as privilege escalation, control-flow manipulation, etc. One such popular class of non-control data attacks is Control-flow Bending, where the attacker manipulates the program data to flip branch outcomes, and divert the program control flow into alternative paths to gain privileges. Unfortunately, despite tremendous advancements in software security, state-of-the-art defense mechanisms such as Control-flow Integrity (CFI) are ineffective against control-flow bending attacks, especially those involving flipping of branch predicates. In this work, we propose a performance-aware defensive scheme, Pythia, which utilizes ARM’s pointer authentication (PA) hardware support for dynamic integrity checking of forward slices of vulnerable program variables that can be affected by input channels, and backward slices of branch variables (including pointers) that may be misused by the attacker. We first show that a naive scheme of protecting all vulnerabilities can suffer from an average runtime overhead of 47.88%. We then demonstrate how overheads can be minimized by selectively adding ARM PA-based canaries for statically-allocated program variables and creating secure sections for dynamically-allocated variables to avoid overflows by input channels. Our experiments show that employing this hybrid approach reduces the average overhead to 13.07%. We empirically show that Pythia offers better security than the state-of-the-art data flow integrity (DFI) technique, especially in pointer-intensive code and C++ applications, with 92% of branches secured on average and 100% secured in the case of 3 applications.

CCS Concepts: • Software and its engineering → Compilers; • Security and privacy → Software and application security.

Keywords: Control-Flow Bending, ARM-PA, Program Slices, Stack Canaries

ACM Reference Format:
Sharjeel Khan, Bodhisatwa Chatterjee, and Santosh Pande. 2024. Pythia: Compiler-Guided Defense Against Non-Control Data Attacks. In 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS ’24), April 27-May 1, 2024, La Jolla, CA, USA. ACM, New York, NY, USA, 17 pages. https://doi.org/10.1145/3620666.3651343

This work is licensed under a Creative Commons Attribution-ShareAlike International 4.0 License.
ASPLOS ’24, April 27-May 1, 2024, La Jolla, CA, USA
© 2024 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-0386-7/24/04
https://doi.org/10.1145/3620666.3651343

1 Introduction
Real-world applications are vulnerable to various data attacks, where an adversary with malicious intent attempts to exploit software memory corruption vulnerabilities. This includes targeting instances of stack/string buffer overflow [19, 41, 47, 78], integer overflow [9, 75, 76], heap corruption [29, 59, 64], use-after-free [45, 81], etc., with the objective of either gaining privileged access (privilege escalation), executing unwanted code segments (control flow manipulation), reading/writing values from memory (information leakage) or otherwise hindering intended process execution. These vulnerabilities are predominantly featured in the Common Weakness Enumeration (CWE) 2023 [1] list.

The attacks resulting from memory vulnerabilities can be broadly classified into two groups: Control-Data Attacks and Non-Control Data Attacks. Instances of control-data attacks typically involve an adversary corrupting a program code-pointer (function pointers, return statements, etc.), to ‘hijack’ the control-flow of a program and divert it to a designated target. To defend against such control-flow hijacking attacks, a common methodology is to monitor the targets of indirect control-transfer instructions and restrict them to a set of feasible targets. This forms the basis of the Control-Flow Integrity (CFI) mechanism, and over the past two decades, CFI and its variants have evolved significantly and have been a focus of extensive research in security literature [2, 11, 12, 20–23, 25, 33, 35, 37–39, 52, 58, 61, 73, 77, 80, 82, 84]. However, recent works have highlighted that the current CFI mechanisms are often inadequate and can even create additional security vulnerabilities [13, 18, 49].

On the other hand, Non-Control Data Attacks occur when an adversary corrupts program data that does not directly manipulate program control such as function calls and return addresses. A popular instance of non-control data attacks is Control-flow Bending [13], where an attacker can
‘change’ the control-flow of the program in such a way that it follows a valid path in the program control-flow graph (CFG). Such attacks typically involve flipping the branch outcomes to divert the branch target to a code region of interest (such as privileged or sensitive code). Such attacks are credible threats [13, 16, 32, 34], and are harder to defend against, since CFI techniques are unable to distinguish the “correct” program execution path if there are multiple feasible execution paths in the CFG. In other words, analyzing static artifacts such as return addresses of functions or branch targets is not adequate to detect such attacks, which calls for dynamic monitoring of the program data-flow. Such monitoring incurs significant runtime overheads, especially since program branches are very frequent (typically every 10th instruction is a branch). Prior works have proposed defense mechanisms for non-control data attacks either by isolating sensitive parts of the program [27, 43, 67, 70, 85], introducing language extensions to enforce constraints [36, 54–56, 66], or by monitoring ‘critical’ program variables [83].

Tainting branch variables through memory corruptions and overflow leading to privilege escalation are the traditional ways of control-flow bending attacks. In this work, we show that the problem and the range of attacks in this category could be broader.

Pointer Misdirection & Exploiting Pointer Dualism (§3): The first contribution of this work is to present a new class of non-control data attacks that are based on data pointer manipulation and exploiting pointer arithmetic. We show that an adversary can successfully carry out control-flow bending attacks by either tainting variable(s) that contribute to the computation of conditional branch predicates, or by manipulating a data pointer to point towards branch predicate variables. Such attacks cannot be satisfactorily defended by employing static program analysis techniques such as Data-flow Integrity (DFI) [14] because of the inability to deal with pointer arithmetic and the lack of field-sensitive analysis.

To establish an effective defensive mechanism against data attacks stemming from input channels, the entire program execution path from the input channel to the branch must be analyzed and protected. This also allows early detection of these attacks and paves the way for adopting necessary mitigating mechanisms. However, detecting the dynamic execution path of an application is an extremely challenging problem, because of frequent branch instructions and the presence of pointers in the program. This also makes the program path input-dependent, and it can change dynamically across various execution instances. Performing pointer-based path analysis at individual branch-level granularity can be quite challenging, limiting the extent of protection offered by the technique. In this work, we show that relying on the protection of individual variables encountered along these paths, leveraging crypto-based integrity checks in hardware, is a more viable solution than developing data-flow integrity checking mechanisms in software, solving both problems of precision and overheads.

Over the last few years, CPU vendors have started to add new hardware extensions to enable cryptographic defense mechanisms for pointers. ARM pointer authentication (PA) [5] is one such mechanism that allows verification of a pointer’s integrity based on its address bits. One naive solution to utilize the ARM-PA mechanism for preventing control-flow bending attacks would be to simply convert all variables associated with conditional branches into pointers, and then sign and authenticate them. However, our experiments show that this leads to substantial overheads (47.88%), as every variable needs to be encrypted when stored to memory, and then authenticated before subsequent loads.

On the other hand, a diametrically opposite way to tackle most data attacks is simply to prevent an attacker from manipulating any program variables that can lead to any subversion of control-flow. This means that we need to eliminate the notion of dispatcher functions or gadgets [13], i.e., functions that can overwrite their return address in the presence of attacker-supplied arguments. Usually, such functions involve program input channels, through which attackers can manipulate program variables. The presence of dispatcher gadgets enables an attacker to overwrite memory locations that comprise the branch predicate variables, thereby flipping the branch outcome. Thus, to establish a secure defense mechanism against control flow bending attacks with low overheads, we propose Pythia, a compiler-guided defense mechanism that is based on a hybrid model of ARM-PA and the elimination of dispatcher functions.

Complete Defense against Non-Control Data Attacks: The second contribution of this paper is a conservative scheme that protects against all known control-flow bending attacks. This is achieved by first determining two categories of “vulnerable” program variables: a) forward-slices of input channel variables (input channel construction), that can be leveraged by an attacker to cause buffer overflows into branch variables, and b) backward-slices of branch predicate variables (branch decomposition) that can be tainted with the malicious intent to cause control-flow bending. Program variables from both these “vulnerable” categories are encrypted with ARM-PA across the board to ensure complete defense against control-flow bending attacks, the new class of pointer misdirection, and pointer exploitation attacks. Additionally, this scheme also performs alias analysis to handle pointer variables in both categories.

Performance-Aware Complete Defense against Non-Control Data Attacks: The third contribution of this paper is an end-to-end compiler framework for defending against non-control data attacks with low overheads. Pythia improves upon the conservative defense mechanism by selectively using ARM-PA to minimize the runtime overheads. To achieve
this, Pythia further categorizes the vulnerable program variables into two sub-categories: a) Statically-Allocated Variables that reside in the program’s stack memory space, and b) Dynamically-Allocated Variables that originate in the program’s heap memory space. Pythia tackles the former by performing a re-orientation of stack variables, and then adding stack canaries with ARM-PA to detect any overflows, while the latter is handled with heap sectioning, which transfers dynamically-allocated vulnerable program variables into a “secure” section of the heap. This reduces the number of ARM-PA instructions by 4.25x, and brings down the runtime overhead significantly.

Pythia has been tested on C/C++ benchmarks from the SPEC 2017 Benchmark Suite [10], popular real-world examples [15], and additional benchmarks such as Nginx [57]. We show that Pythia can detect the possible control-flow bending attacks in these workloads and achieves an average runtime overhead of only 13.07%, compared to 47.88% from the conservative complete pointer authentication (CPA) scheme. Compared to the conservative scheme, we also show that Pythia reduces the number of ARM-PA instructions by a factor of the total number of branches present in the program.

2 Background and Motivation
2.1 Control-Flow Bending
Control-Flow Bending [13] is a generalization of non-control data attacks, where the attacker manipulates the program data (non-code pointer), which results in diverting the control flow into alternative paths in the program CFG. CFI is completely ineffective against this as the "bended" target is always in the set of feasible targets in the program CFG. Although these attacks have been known to exist for a while, this problem has mostly remained untackled, and state-of-the-art approaches such as Data-flow Integrity (DFI) [14] leave important practical cases of pointer-intensive code and C++ application bases untackled, prompting this work.

2.2 Motivating Examples: Non-Control Data Attacks
String-Buffer Overflow leading to Privilege Escalation: A simple example of a non-control data attack [86] leading to privilege escalation is shown in Listing 1. In this example, the user is verified by an input password, and the result is assigned to the variable “user”. The user-variable is frequently checked to provide access to certain operations. However, in between the checks, the function interacts with the user to get some other inputs. The input pointer “someinput” can be manipulated by the user, causing a buffer overflow vulnerability in line 13, and leading to superuser access. This attack is not handled by any CFI mechanism, because technically both the targets in lines 15 and 18 are feasible. This constitutes a classic instance of control-flow bending where CFI is unable to distinguish the ‘possible’ static target from the ‘actual’ dynamic target, where line 15 should only be a target when the user has privileged access.

 1 void Access(char pwd[20]) {
 2   char str[SIZE], user[SIZE];
 3   char *someinput;
 4
 5   verify_user(user, pwd);
 6   if (strncmp(user, "admin", 5) == 0) {
 7     // super user code
 8     ...
 9   } else {
10     // normal user code
11     ...
12   }
13   strcpy(str, someinput);
14   if (strncmp(user, "admin", 5) == 0) {
15     // super user code
16     ...
17   } else {
18     // normal user code
19     ...
20   }
21 }
Listing 1. Simple example [86] of a Non-Control Data Attack exploiting string buffer-overflow leading to Privilege Escalation

ProFTPd Attack leading to Information Leakage: The ProFTPd attack [34] is a Data-Oriented Programming (DOP) based attack to eventually leak the private key by breaking ASLR through the sreplace function in ProFTPd shown in Listing 2. The vulnerability in the function is because of a faulty check at Line 24. The attacker first triggers this overflow check by constructing inputs through CWD (change directory) semantics. When ‘cp’ points to the last character of the buffer buf, that is (cp − buf + 1) equals blen, the check returns false and Line 27 overwrites the string terminator inside the buffer. During subsequent iterations of the while loop, at Line 14, strlen(pbuf) > blen and invoking sstrncpy overflows the buffer into the stack, overwriting local variables such as ‘rarr’ and ‘cp’. Since both the source and destination in the string copy function sstrncpy at Line 14 are corrupted, the attacker controls the number of bytes copied in the successive iterations of the while loop.

It can be seen that the root cause of the above attacks is the ability to flow (taint) values into branch predicates or to position the respective pointers involved by manipulating them. Data flow integrity techniques such as DFI fail in the presence of pointers because their analysis is field-insensitive and does not deal with pointer arithmetic. This leads us to propose defensive mechanisms that first identify these vulnerabilities and defend against these attacks by selectively leveraging ARM-PA.

 1 char *sreplace(char *s, ...) {
 2   ...
 3   char *m, *r, *src = s, *cp;
 4   char **mptr, **rptr;
 5   char *marr[33], *rarr[33];
 6   char buf[BUF_MAX] = {'\0'}, *pbuf = NULL;
 7   size_t mlen = 0, rlen = 0, blen; cp = buf;
 8   ...
 9   while (*src) {
10     for (mptr = marr, rptr = rarr; *mptr; mptr++, rptr++) {
11       mlen = strlen(*mptr);
12       rlen = strlen(*rptr);
13       if (strncmp(src, *mptr, mlen) == 0) {
14         sstrncpy(cp, *rptr, blen - strlen(pbuf));
15         if (((cp + rlen) - pbuf + 1) > blen) {
16           cp = pbuf + blen - 1;
17         } /* Overflow Check */
18         ...
19         src += mlen;
20         break;
21       }
22     }
23     if (!*mptr) {
24       if ((cp - pbuf + 1) > blen) { // off-by-one error
25         cp = pbuf + blen - 1;
26       } /* Overflow Check */
27       *cp++ = *src++;
28     }
29   }
30 }
Listing 2. Example of ProFTPd Vulnerability leading to Information Leakage

2.3 Background: ARM Pointer Authentication
ARM Pointer Authentication (ARM-PA) [5] is a specialized hardware mechanism that ensures the integrity of data and code pointers associated with the program. It was first introduced in the ARMv8.3-A extension of the ARMv8-A architecture. The intuition here is that modern architectures allocate a larger number of bits for defining the address space of pointers. However, not all bits are required to define the address space. For instance, in 64-bit architectures, the address space of pointers does not require more than 40 bits. Thus, these unused bits are utilized to assign a Pointer Authentication Code (PAC). Based on the PAC bits, it is possible to determine whether a pointer’s integrity has been compromised. Recently, this mechanism has gained popularity as a memory safety mechanism against both control and non-control data attacks [31, 48, 51, 65].

2.4 Overview of Pythia
We now describe Pythia (Fig. 1), a compiler-guided defensive framework that utilizes a performance-aware approach to combat the problem of non-control data attacks, namely control-flow bending. Pythia improves on a conservative defensive mechanism that first analyzes all conditional branch statements and input channels present in the application. The conditional branch variables are decomposed, and input channel variables are mapped, into their constituent variables by taking their program back-slices and forward-slices respectively. This allows us to capture the entire set of “vulnerable” program variables that can be tainted and result in control-flow bending. For pointer-type variables, an alias analysis is performed to determine all possible variables that can be pointed-to by a specific pointer that has a potential alias with these variables. The conservative scheme simply encrypts all such variables with ARM-PA instructions, resulting in complete defense against control-flow bending attacks, but with substantial runtime overheads.

On the other hand, Pythia follows a performance-aware approach to reduce the runtime overheads incurred by the conservative scheme. After the branch and input channel decomposition, Pythia further classifies the vulnerable variables into statically-allocated (stack) variables and dynamically-allocated (heap) variables. The rationale behind such an approach is that statically allocated variables in stack memory have a fixed address associated with them, and their integrity can be checked by adding a canary to them, which acts as an indicator for potential overflows. However, this approach does not work satisfactorily for dynamically-allocated program variables that reside in the heap memory, because of limited control over the memory allocation at the user level. To tackle such variables, Pythia sections the heap memory into an isolated section where the vulnerable program variables reside, and a shared section where other variables are allocated. To achieve this, Pythia uses a custom implementation of malloc that is combined with ARM-PA checks. This allows Pythia to use ARM-PA selectively, instead of applying it across the board, and thereby to reduce the runtime overheads significantly.

2.5 Threat Model & Attacker Goals
In this paper, we assume that the attacker can corrupt any program variable, at any point in time, with unlimited attempts. This essentially means that a control-flow bending attack can occur at any point during program execution. In our threat model, we assume that an attacker can either directly corrupt the value that participates in the branch predicate or corrupt a value that is part of a backward slice of a branch, i.e., a value that participates in the computation of the branch predicate value through an input channel. A third mechanism is also available to the attacker: positioning a pointer to point to the branch variable so that the attacker can load a malicious value into the branch variable using the alias of the pointer (i.e., by leveraging the l-value of a pointer dereference encountered before the branch). Thus, the objective of Pythia is to detect such attacks as early as possible, since corrupt program variables can lead to erroneous program states as a cascading effect.

The goal of the attacker is to divert the application’s control-flow into alternate execution paths, either to obtain privileged access, leak information or hinder program execution in any manner. For the purposes of this paper,
we consider any diversion in control-flow as a ‘successful attack’.

Figure 1. Pythia: Compiler-Guided Defense Framework against Control-Flow Bending Attacks. The vulnerable program variables are segregated based on whether they are statically allocated (stack) or dynamically allocated (heap) in the program memory layout, and Pythia leverages ARM PA for constructing canaries for stack-allocated variables, and for isolating the heap.

2.6 Basic Terminologies
Definition 2.1 (Input Channel (IC)). The input channel is any function that is vulnerable to memory corruption. Attackers manipulate these functions to modify the variables of the program’s memory.

In this paper, we consider six different categories of input channel functions: print, scan (reads strings specifically in a format, e.g. scanf), move/copy, get (e.g. fgets), put (e.g. strcpy, memmove, memcpy) and map (maps files or devices to the virtual address space, e.g. mmap).

Definition 2.2 (Def-Use (DU) Chains). A Def-Use chain is a widely used data-flow structure that links a variable definition with its corresponding uses.

Definition 2.3 (Upwards-Exposed Use). A program variable v has an upwards-exposed use at a program point p, if there exists only one path from its definition to p.

To quantify the early detection capabilities of a security scheme, we define attack distance:

Definition 2.4 (Attack Distance). Attack distance represents the number of static program instructions between the beginning of a backward slice where the protection starts and the branch predicate.

The attack distance shows how high the protection must start to secure the input channel in terms of the number of instructions. Intuitively, if a technique’s attack distance is not greater than or equal to the attacker’s attack distance (which is the input channel), it will not be able to protect the branch. In such cases, an attacker can taint the branch predicate’s backward slice through the unprotected values in the program without being detected by the protection scheme, leading to successful attacks. For these reasons, an input channel based attack can be detected only if the defensive mechanism has an attack distance that is at least as large as that of the input channel.

3 Pointer-Based Control-Flow Bending Attacks
In this section, we describe a class of attacks that involve manipulation of program pointers and use of pointer arithmetic to achieve control flow bending.

3.1 Exploiting Pointers and Array Dualism
Often, programmers write optimized code that exploits the dualism between program pointers and array pointers. Consider the code snippet presented in Listing 3. In this case, an input channel variable k (line 4) is used to increment the base address of array Arr (line 5) through l. In the normal, non-malicious execution of the code (when p is not aliased to m), the privileged code is bypassed. In this code snippet, however, an attacker can input a malicious value of k overflowing into l, which can make the pointer p point to m. In such an event, the write through p sets m to n + 1, bending the predicate m > n and granting privileged access.

In summary, these attacks arise from two possible vulnerabilities: 1) input channel variables gaining access to program pointers, which can potentially point across the entire range of variables, and 2) variables participating in branch predicates, which can be tainted to flip the branch outcomes.

 1 int *p, Arr[100], l, k;
 2 int m, n;
 3 p = Arr;          // p stores the base address of Arr
 4 scanf("%d", &k);
 5 p = p + l;        // l represents the stride for an element
 6 ...
 7 m = n - 1;
 8 *p = n + 1;       // p aliased to m by setting right value of k and this alias sets m = n + 1
 9 if (m > n) {
10   // privileged execution
11 }
Listing 3. Simple example of an adversary exploiting the dualism between array pointers and program pointers
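
The aliasing exploited in Listing 3 can be reproduced without undefined behaviour by modelling the stack frame as a single array. The following is a minimal sketch under that assumption: the layout (Arr followed by m and n), the names bend_branch, M_SLOT, N_SLOT and FRAME_LEN, and the concrete value of n are all hypothetical and chosen only for illustration; a real attack depends on the actual frame layout chosen by the compiler.

```c
#include <assert.h>

/* Hypothetical frame layout: slots 0..99 model Arr, slot 100 models m and
 * slot 101 models n. Keeping everything in one array means the "overflowing"
 * write stays inside a single object, so the behaviour is well defined. */
enum { ARR_LEN = 100, M_SLOT = ARR_LEN, N_SLOT = ARR_LEN + 1, FRAME_LEN = ARR_LEN + 2 };

/* Returns 1 if the privileged branch (m > n) is taken for the given stride,
 * which stands in for the attacker-influenced value l of Listing 3. */
int bend_branch(int stride) {
    int frame[FRAME_LEN] = {0};
    int *p = frame;                      /* p stores the base address of Arr */

    if (stride < 0 || stride >= FRAME_LEN) {
        return 0;                        /* keep the simulation inside the frame */
    }

    frame[N_SLOT] = 10;                  /* n (arbitrary illustrative value) */
    frame[M_SLOT] = frame[N_SLOT] - 1;   /* m = n - 1 */
    assert(frame[M_SLOT] < frame[N_SLOT]); /* benign state: branch not taken */

    p = p + stride;                      /* pointer arithmetic driven by input */
    *p = frame[N_SLOT] + 1;              /* aliases m when stride == M_SLOT */

    return frame[M_SLOT] > frame[N_SLOT];
}
```

With a benign stride such as 5 the write lands inside the modelled Arr and the predicate stays false, while a stride equal to M_SLOT redirects the same write onto m and flips the branch, even though both outcomes are valid paths in the control-flow graph.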


4 Pythia: Design and Implementation
This section describes the Pythia Compiler Framework, which incorporates a lightweight defense mechanism against control-flow bending attacks. We first describe a conservative defensive mechanism that can thwart all known instances of control-flow bending attacks (§4.1 - 4.2), which serves as a baseline. We then illustrate the performance-aware defensive approach taken by Pythia (§4.3), which minimizes ARM-PA checks over the conservative baseline. An upper-bound probability estimate for brute force attacks targeting stack canaries, and a qualitative analysis of their security strengths, are discussed in §4.4.

The core insight that is leveraged by Pythia is that control-flow bending attacks can be triggered by only a finite subset of program variables. These variables are either branch predicate variables, variables with input channels, or program pointers. In this paper, we refer to these variables collectively as vulnerable variables. The goal here is to isolate such variables so that their integrity can be authenticated with ARM-PA.

4.1 Branch Decomposition & Input Channel Construction
The program path taken by an application during its execution can vary in the presence of branch statements. Specifically, the dynamic program control flow is contingent upon the individual program variables that constitute the branch predicate. However, the program control flow can also be subverted by manipulating other program variables that entail a direct/indirect definition of branch variables. Therefore, we need to consider all such possible program variables that can be exploited to flip the outcome of a conditional branch and subvert the program’s control-flow.

A naive approach to solve this problem would be to simply authenticate and secure all possible program variables. However, adopting such an approach will entail frequent authentication across every variable and their uses, and will incur significant overheads. Thus, to minimize runtime overheads, we first need to determine the set of vulnerable program variables, which can act as a source of control-flow bending attacks. We start by defining the notion of branch sub-variables:

Definition 4.1 (Branch Sub-Variable). A program variable v is a branch sub-variable for a conditional branch b, if it is either a branch predicate variable of b, or if it contains an upwards-exposed use of at least one of the branch predicate variables of b.

In a nutshell, the set of branch sub-variables of a branch predicate statement represents every possible program variable that can affect the outcome of the given branch. For the computation of the branch sub-variable set, we leverage the backward program slices of branch predicate variables. The backward program slice of a variable is obtained by traversing its Use-Def (UD) Chain (2.2) against the direction of control flow. The intuition here is that each branch variable essentially is an upwards-exposed use of other program variable(s) above the branch predicate in the program [53]. This backwards traversal is performed transitively from the branch predicate to the start of its function. This process is illustrated with a simple example in Fig 2.

Algorithm 1: Branch Decomposition Algorithm
 1 Input: Conditional Branch Instruction BrInst
   Result: Branch sub-variable set B_sub(BrInst)
 2 worklist, B_sub ← ∅
 3 parent_funct ← getParentFunction(BrInst)
 4 B_vars ← BrInst.getVars()
 5 for each variable bvar ∈ B_vars do
 6   def_set ← getAllDefinitions(bvar)
 7   for each definition def ∈ def_set do
 8     worklist.push_back(def)
 9   end
10 end
11 while worklist is not empty do
12   d ← remove a definition from the worklist
13   for each operand op ∈ d do
14     if op ∈ B_t then
15       B_sub.push_back(op)
16     else
17       def_set ← getAllDefinitions(op)
18       for each definition def ∈ def_set do
19         worklist.push_back(def)
20       end
21     end
22   end
23 end

The process of branch sub-variable computation is summarized as the branch decomposition algorithm (Algorithm 1). It follows a worklist-based approach that iteratively captures sub-variables emanating from branch variables by traversing their def-use chain in a backwards (against the control-flow) direction. The worklist keeps track of all definitions remaining to be decomposed at a particular time-step. The branch decomposition algorithm deals with pointer variables by loading the value stored at the address pointed to, after performing null pointer checks. The aliases of all the pointers present in the function and their backward slices are also analyzed by this algorithm.

On the other hand, in order to refine the set of vulnerable program variables that can lead to control-flow bending attacks, we can analyze program variables that are a part of user input channels. Similar to branch decomposition, the set of variables that involve an input channel can be computed using their forward-slices. A forward program slice is obtained by traversing the use-def chains of variables along the direction of dataflow. This is the exact reverse of the branch decomposition algorithm, as here we find the sub-variables by walking the use-def chain in a forward manner, analyzing any definition that uses the input channel variables. This process is illustrated in Fig 2. The input channel
Algorithm 2: Complete Pointer Authentication (CPA) Algorithm
 1  Input: Set of all conditional branches Branches present in program
    Result: Encrypt and Authenticate the set of vulnerable variables associated with Conditional Branches using ARM-PA
 2  for each branch instruction br ∈ Branches do
 3      vulnerable_vars ← branchDecomposition(br)
 4      for each definition def ∈ vulnerable_vars do
 5          ARM_PA.encrypt(def)
 6          for each load L_v ∈ def do
 7              ARM_PA.authenticate(L_v)
 8          end
 9          for each store S_v ∈ def do
10              ARM_PA.encrypt(S_v)
11          end
12      end
13  end

Figure 2. Simple illustration of computing vulnerable program variables by branch decomposition & input channel construction.
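The refinement that Figure 2 illustrates (backward slice from the branch, forward slice from the input channel, then their intersection) can be sketched in a few lines of Python. This is a minimal sketch and not Pythia's LLVM implementation: the toy def-use table `DEFS`, its variable names, and the helpers `backward_slice`, `forward_slice`, and `refined_vulnerable_vars` are all hypothetical, and pointer aliasing is ignored.

```python
from collections import deque

# Hypothetical toy program in SSA-like form: each variable maps to the
# variables used by its single definition.
DEFS = {
    "buf": [],                    # written by an input channel
    "len": ["buf"],
    "limit": [],                  # constant, never touched by input
    "is_admin": [],               # never touched by input
    "ok": ["len", "limit"],
    "grant": ["ok", "is_admin"],  # branch predicate variable
}


def backward_slice(roots, defs):
    """Branch decomposition: worklist walk against the direction of dataflow."""
    seen, work = set(roots), deque(roots)
    while work:
        var = work.popleft()
        for operand in defs.get(var, []):
            if operand not in seen:
                seen.add(operand)
                work.append(operand)
    return seen


def forward_slice(roots, defs):
    """Input channel construction: worklist walk along the direction of dataflow."""
    users = {}
    for var, operands in defs.items():
        for operand in operands:
            users.setdefault(operand, []).append(var)
    seen, work = set(roots), deque(roots)
    while work:
        var = work.popleft()
        for user in users.get(var, []):
            if user not in seen:
                seen.add(user)
                work.append(user)
    return seen


def refined_vulnerable_vars(branch_predicates, input_vars, defs):
    """Intersection of the two slices gives the refined vulnerable set."""
    return backward_slice(branch_predicates, defs) & forward_slice(input_vars, defs)


branch_vars = backward_slice(["grant"], DEFS)                # un-refined set
refined = refined_vulnerable_vars(["grant"], ["buf"], DEFS)  # refined set
print(sorted(branch_vars))
print(sorted(refined))
```

In this toy table the un-refined set contains every variable, while the refined set drops `limit` and `is_admin`, which no input channel ever reaches.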

construction algorithm also follows the same structure as the branch decomposition worklist algorithm. The intersection of the sub-variables sets obtained from branch decomposition and input-channel construction, constitutes the refined set of vulnerable program variables.

Once the set of vulnerable program variables is computed and refined, the next step is to secure them to prevent control-flow bending attacks. We will now discuss various defensive mechanisms that can be used to secure these variables.

4.2 Conservative Approach: Complete Pointer Authentication using ARM-PA

In this defensive approach, all vulnerable program variables are simply encrypted using ARM-PA mechanism (without any refinement), and then decrypted at every subsequent load, to authenticate their integrity. ARM-PA leverages the unused address bits in program pointers to maintain variable integrity. Therefore, in order to successfully apply this scheme to vulnerable variables, data pointers are created for each non-pointer vulnerable variable. Each created data pointer is encrypted at its definition, and when it's stored to the memory, its integrity is checked before every use. In case an attacker has attempted to taint any variable, it will be detected before it is loaded from the memory. This scheme of encrypting every vulnerable program variable results in the inclusion of at least two ARM-PA instructions. In addition, this scheme also finds out the may-aliases of pointers and ensures that they adhere to the ARM-PA encryption and decryption scheme for accessing.

For a single vulnerable variable 𝑖 with 𝑢_𝑖 number of uses, this conservative scheme would introduce one additional instruction for encrypting during store (each variable can be defined only once in SSA IR form) plus the 𝑢_𝑖 number of additional decrypting instructions for each use. This leads to 1 + 𝑢_𝑖 number of additional program instructions, for vulnerable program variable 𝑖. Therefore, in a program with 𝐵 conditional branches and 𝑣 vulnerable variables, each with 𝑢 uses (on average), the maximum number of extra instructions that are instrumented in the program is given by:

\[ \text{Avg ARM-PA instructions} = B \, v \, (2u + 1) \tag{1} \]

An algorithm depicting the complete pointer authentication scheme is presented in Algo. 2. The scheme takes in all vulnerable variables associated with conditional branches and input-channels present in the program. It first computes the set of vulnerable program variables by branch decomposition (line 3). It then creates a pointer reference for each one of them. In addition, it also performs alias analysis for these pointers. It then adds ARM-PA encryption on the store instruction, and then ARM-PA authentication (decryption) for each subsequent use.

4.3 Performance-Aware Approach: Stack Canaries & Heap Sectioning

Pythia further refines the set of vulnerable variables by first segregating statically and dynamically allocated program variables. For statically-allocated program variables that reside in the program's stack memory, Pythia relocates them on the bottom of the stack memory to isolate and capture the effect of buffer overflows effectively. For dynamically allocated variables, Pythia divides the program's heap memory into two sections (isolated and shared), and vulnerable variables are relocated to the isolated section to prevent buffer overflow attacks.

★ Securing Statically Allocated Variables: Pythia first detects all the branch sub-variables that are allocated in the program's stack memory, within a function. It then performs input channel construction on such variables to determine the interaction between the variable uses and input channels. A stack-allocated variable is marked as vulnerable if any of its (direct/indirect) uses is passed on as arguments to input channels. Pythia re-arranges the stack memory layout to allocate the vulnerable variable to the stack bottom

ASPLOS ’24, April 27-May 1, 2024, La Jolla, CA, USA Khan, Chatterjee, Pande

(lower address). In the event of any overflow triggered by an adversary, the stack memory space of non-vulnerable variables will not be affected since the stack memory usually grows only in one direction, i.e., towards the higher address. Any attempt to write at the beginning of a stack array would cause a bus error, so an overflow can only write towards the end of the array (the higher address).

Algorithm 3: Stack Re-layout & Canary Algorithm
 1  Input: Set of all conditional branches Branches present in program
    Result: Perform stack re-layout & encrypt vulnerable variables with canaries
 2  vulnerable_vars_s ← ∅
 3  for each branch instruction br ∈ Branches do
 4      backsliced_vars_s ← branchDecomposition(br)
 5      for each branch variable b_v ∈ backsliced_vars_s do
 6          if b_v ← isStaticMemoryAllocation() then
 7              vulnerable_vars_s ← InputChannelConstruction(b_v)
 8          end
 9      end
10  end
11  for each vulnerable stack variable s_v ∈ vulnerable_vars_s do
12      stackMemoryLayout(s_v)
13      can_sv ← addCanary(s_v)
14      stackMemoryLayout(ca)
15      ARM_PA.encrypt(ca)
16      relevant_uses ← getInputChannelUses(s_v)
17      for each dispatcher use du ∈ relevant_uses do
18          ARM_PA.decrypt(ca)
19          de_ref ← ∗ca
20          ARM_PA.encrypt(ca)
21      end
22  end

However, despite the stack layout re-orientation, overflows from vulnerable stack variables can still spill into one another and lead to control-flow bending. To solve this problem, Pythia adds canaries with random values between each vulnerable stack variable. Pythia inserts integrity checks in the stack canaries, which serve as an indicator of buffer overflow in a vulnerable stack variable. The initial value of each stack canary is chosen at random, to prevent an adversary from reverse-engineering the mechanism by analyzing the program binary. To minimize runtime overheads, Pythia uses hardware ARM-PA since it directly utilizes the memory location's address bits for encryption & decryption. In case of a memory violation, the ARM-PA decryption mechanism triggers a program crash.

An algorithm depicting the stack re-layout and the canary encryption is presented in Algorithm 3. This scheme ensures that no vulnerable program variable can overwrite any other program variable, which prevents control-flow bending attacks resulting from tainting statically-allocated program variables through malicious inputs.

Similar to statically allocated program variables, control-flow bending attacks can also originate from dynamically allocated program variables. Although it's straightforward to detect vulnerable heap-allocated variables that interact with input channels, performing heap re-layout is extremely challenging since it involves performing non-trivial changes in the system's default memory allocation algorithm. The goal here is to develop a simple lightweight scheme that doesn't involve adding canaries across the entire structure of heap memory, which would cause significant overheads and defeat the purpose of dynamic memory allocation.

★ Securing Dynamically Allocated Variables: In order to protect dynamically allocated program variables, Pythia splits the program heap into an isolated section and a shared section. The vulnerable program variables are allocated to the secure portion of the heap, and the other variables are on the shared portion of the heap. Pythia accomplishes heap sectioning by creating two variations of the memory allocation algorithm: one for isolated allocation and one for shared allocation. These algorithms handle memory allocations for specific address ranges. Pythia first detects the dynamically allocated vulnerable program variables that interact with input channels. It then replaces their heap memory allocation so that they are mapped in the isolated heap. Pythia's custom memory allocation is based on glibc's malloc implementation, and both libraries are linked at compile time.

Algorithm 4: Heap Sectioning
 1  Input: Set of all conditional branches Branches present in program
    Result: Perform Heap-Sectioning & encrypt dynamic vulnerable variables
 2  vulnerable_vars_h ← ∅
 3  for each branch instruction br ∈ Branches do
 4      backsliced_vars_h ← branchDecomposition(br)
 5      for each branch variable b_v ∈ backsliced_vars_h do
 6          if b_v ← isDynamicallyMemoryAllocation() then
 7              vulnerable_vars_h ← InputChannelConstruction(b_v)
 8          end
 9      end
10  end
11  safeAddr ← performHeapSectioning()
12  for each vulnerable heap variable h_v ∈ vulnerable_vars_h do
13      relocate(h_v, safeAddr)
14      relevant_uses ← getInputChannelUses(h_v)
15      for each dispatcher use du ∈ relevant_uses do
16          ARM_PA.decrypt(du)
17          de_ref ← ∗du
18          ...
19          ∗du ← de_ref
20          ARM_PA.encrypt(du)
21      end
22  end

After sectioning the program heap memory into isolated and shared regions, Pythia uses ARM-PA to encrypt the vulnerable variable and its uses. The scheme of securing dynamically allocated variables has been summarized in Algorithm 4. It follows a similar flow as the stack layout algorithm (3)


Figure 3. Illustration of Pythia's defensive mechanism on a simple program. The vulnerable statically allocated variables 𝑎, 𝑏, 𝑑 are relocated to the bottom of stack memory, and equipped with ARM-PA encrypted canaries. The dynamically allocated vulnerable variable 𝑝 is moved to the isolated section of heap memory, where it is encrypted.
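The stack half of Figure 3 can be mimicked with a small byte-array model. This is a simplified sketch under stated assumptions: the frame is a Python `bytearray`, the canary is a plain random word compared in software rather than a value protected by ARM-PA, and the names `Frame`, `input_channel_copy`, `check`, and `read` are hypothetical.

```python
import secrets

CANARY_SIZE = 8  # bytes per canary in this toy model


class Frame:
    """Toy stack frame: vulnerable buffers come first, each followed by a canary."""

    def __init__(self, vulnerable, others):
        # vulnerable / others: dicts mapping variable name -> size in bytes
        self.offsets, self.canaries = {}, {}
        cursor = 0
        for name, size in vulnerable.items():
            self.offsets[name] = (cursor, size)
            cursor += size
            self.canaries[name] = (cursor, b"")  # value is filled in below
            cursor += CANARY_SIZE
        for name, size in others.items():
            self.offsets[name] = (cursor, size)
            cursor += size
        self.memory = bytearray(cursor)
        for name in vulnerable:
            self.rerandomize(name)

    def rerandomize(self, name):
        """Give the canary that follows `name` a fresh random value."""
        offset, _ = self.canaries[name]
        value = secrets.token_bytes(CANARY_SIZE)
        self.canaries[name] = (offset, value)
        self.memory[offset:offset + CANARY_SIZE] = value

    def input_channel_copy(self, name, data):
        """Unchecked copy (strcpy-like): may run past the end of the buffer."""
        self.rerandomize(name)  # fresh canary right before the input channel
        start, _ = self.offsets[name]
        end = min(start + len(data), len(self.memory))
        self.memory[start:end] = data[:end - start]

    def check(self, name):
        """Integrity check: True if the canary after `name` is intact."""
        offset, expected = self.canaries[name]
        return bytes(self.memory[offset:offset + CANARY_SIZE]) == expected

    def read(self, name):
        start, size = self.offsets[name]
        return bytes(self.memory[start:start + size])


frame = Frame({"buf": 8}, {"is_admin": 1})
frame.input_channel_copy("buf", b"hello")   # fits inside the buffer
print(frame.check("buf"))
frame.input_channel_copy("buf", b"A" * 17)  # runs over the canary and is_admin
print(frame.check("buf"))
```

The overflow has to cross the canary before it can reach `is_admin`, so a check placed right after the input channel notices the corruption before the tainted flag is consumed by a branch. In Pythia a failed check crashes the program; the sketch only returns `False`.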

of performing branch decomposition, and then input channel construction, only for dynamically allocated program variables (lines 2-10). The heap sectioning procedure marks specific memory address regions in the heap memory as isolated (line 11), and vulnerable variables are allocated in those 'safe' addresses (line 13). The portion of heap memory distributed between isolated and shared regions can be adjusted based on the number of secure heap variables. Finally, ARM-PA is leveraged to encrypt these variables (line 14), and all of their uses (lines 15-18).

4.4 Analyzing Pythia's Security Strengths & Overheads

★ Instruction Overhead: In contrast to the conservative scheme (§4.2), the performance-aware scheme of Pythia reduces the total number of ARM-PA instructions that are instrumented in the program. For both statically allocated program variable 𝑠𝑣_𝑖 and dynamically allocated program variable 𝑑𝑣_𝑖, this scheme adds one encryption and one decryption for every use 𝑑𝑢_𝑖 with input channel. If a program has 𝑠𝑣 statically allocated variables and ℎ𝑣 dynamically allocated variables, with 𝑑𝑢 number of uses (on average), an upper-bound of additional ARM-PA instructions (𝐼) can be obtained as:

\[ I \le B \, [\, \mathit{sv} \, (1 + 3\,\mathit{du}) + \mathit{hv} \, (1 + 2\,\mathit{du}) \,] \tag{2} \]
\[ I \le B \, [\, \mathit{sv} \, (1 + 2\,\mathit{du}) + \mathit{sv} \, \mathit{du} + \mathit{hv} \, (1 + 2\,\mathit{du}) \,] \tag{3} \]
\[ \le B \, [\, (1 + 2\,\mathit{du}) \, v' + \mathit{sv} \, \mathit{du} \,] \tag{4} \]
\[ \approx B \, (1 + 2\,\mathit{du}) \, v' \ll B \, (1 + 2u) \, v \tag{5} \]

In Eq. 5, 𝑣′ represents the sum of vulnerable statically and dynamically allocated variables. Comparing this with the conservative scheme (Eq. 1), we find that the number of refined variables is much smaller than the number of actual vulnerable variables, i.e., 𝑣′ << 𝑣. Thus, in practice the upper bound in Eq. 5 leads to a much smaller value than in Eq. 1, despite the extra 𝑠𝑣 × 𝑑𝑢 term.

★ Tackling Interprocedural Overflows: A special case of control-flow bending can occur during function calls. When a function (caller) calls another function (callee) within its body, it might be possible for the callee function to trigger a buffer overflow which might spill into the caller's stack canaries. This typically happens when the callee function's arguments are passed by reference (or pointers). For statically allocated variables passed by references (or pointers), Pythia performs alias analysis to check if they may point to any of the vulnerable variables, and stores their value in a global pointer canary (authenticated with ARM-PA). For dynamically allocated variables passed by pointers, Pythia checks if such variables are aliased with interprocedural heap allocation functions (e.g. malloc) and just uses the pointers passed. If the variable passed is a statically allocated variable along one call chain and a dynamically allocated variable along another, we pass the dynamically allocated variable as the argument and in the global pointer for canary because we would have authenticated it with ARM-PA along the statically allocated variable call chain. Therefore, with the use of global pointers to canaries, Pythia can detect buffer overflows that span across different function calls.

★ Handling Brute-Force/Canary Attacks: Encryption and authentication-based security mechanisms such as ARM-PA can often be susceptible to brute-force attacks where an attacker repeatedly runs the application to guess the canary values (pointer authentication code) correctly. Pythia ensures that the canary values are re-randomized on every entry to the function. A wrong guess will crash the program, which will force the attacker to guess the canary value across different executions of the application. This makes each program invocation independent of the previous attempt. Therefore, the probability of an adversary guessing the canary value of a Linux system with 24-bit PAC correctly within 𝑁 repeated attempts for a program with 𝑘 canaries is:


\[ P(\text{Brute Forcing}) = k \cdot \frac{1}{2^{24}} \left(1 - \frac{1}{2^{24}}\right)^{N-1} \approx \frac{k}{2^{24}} \tag{6} \]

Eq. 6 shows that there is 1 in 16 million chance that a brute-force attack can successfully guess the authentication value for one canary. In addition, the brute force can be modeled as a geometric random variable. As a result, the expected number of tries (𝐸[𝑋]) is 1/𝑝 where 𝑝 = 1/2^24, meaning it will take 16 million tries before the attacker can figure out a canary. For k canaries, it just needs to divide these chances over those k canaries.

In addition, we re-randomize whenever the canary's neighbor stack variable will be used by an input channel. As a result, any value extracted through a buffered read would be useless since the canary's value had changed already. One important point to note here is that because of this re-randomization, the "window" in which an attacker can break a canary exists only during the specific function invocation, between the start of the function's execution and the load instruction protected by the canary.

5 Implementation

Pythia was implemented as a unified set of compiler passes in LLVM 14. The Pythia code is split between the LLVM module passes, LLVM machine code passes, the random library, and the secure memory allocation library. The codebase is around ∼3420 LOCs including some edits to LLVM original files to include our passes and intrinsics. The secure memory allocation is based on glibc malloc implementation.

Module Pass: The algorithms presented in the paper are implemented as LLVM Module Pass. LLVM's mem2reg pass transforms the program IR by promoting memory references into register references, thereby reducing the loads/stores. We created intrinsic functions for ARM-PA encryption for the remaining loads, stores, and alloca instructions, along with metadata for the backend machine pass.

Alias Analysis: Pythia uses LLVM's in-built alias analyses (basic-aa, globals-aa, aa, and tbaa) for handling pointers in backward and forward program slices.

Machine Pass: To handle register spills at the machine code generation level, we leverage the instrumented metadata and intrinsics to detect additional encryption & authentication points. In addition, we use the same data to add canaries that were missed or need to be moved. The canaries are populated with C++ random number generator with a library call at each invocation of the function, and right before the input channel for stack variables.

6 Evaluation

The evaluation of Pythia answers the following set of questions about program security and performance overheads:

• How effective is the conservative scheme in defending against non-control data attacks, and what are its runtime overheads?
• How do ARM-PA instructions and heap sectioning instructions affect the performance of benchmarks?
• How secure is Pythia's performance-aware approach involving stack canaries and heap sectioning against non-control data attacks? Does it manage to reduce the runtime overheads and ARM-PA instructions compared to the conservative scheme?
• How does Pythia compare to DFI in terms of securing vulnerable branches in applications that can be manipulated through the input channels?
• Can Pythia be effectively used to detect the control-flow bending attacks with low overheads in real-world examples?

Experimental Setup: The experiments were conducted on an Apple MacBook M1 Pro, running Linux Ubuntu 22.04. The system has a 10-core CPU (8 performance cores at 3.2 GHz and 2 efficiency cores at 2.06 GHz) with a 16-core GPU, a 16-core neural engine, and 24 MB L3 cache. Our experiments were run without frequency scaling or any manual core-scheduling.

Benchmark Programs: Our experiments were performed on programs from SPEC 2017 Benchmarks [10] on ref inputs, Nginx [57] and also on representations of real-world examples that have control flow bending vulnerabilities shown in §2.2.

Performance Baselines: We compare Pythia's performance with two different baselines:

• Vanilla Execution: Application is compiled with the O3 flag without adding any new instruction.
• Complete Pointer Authentication (CPA): Conservative defensive scheme described in §4.2, where all the (un-refined) vulnerable variables are simply encrypted with ARM-PA.

6.1 Performance Evaluation

The performance results are normalized against the vanilla execution baseline, where no security mechanism is utilized.

★ Complete Pointer Authentication (CPA): CPA scheme encrypts the un-refined set of vulnerable program variables. As illustrated in Fig. 6(a), the vulnerable variables set in CPA consists of about 29% of all program variables on average. These variables are encrypted at least once (initial store), and decrypted at least once (all live program variables have at least one use). Overall, since the set of un-refined variables is constituted by a significant number of program variables, the total number of ARM-PA instructions that are added to the program is significant. Furthermore, any spills due to register allocation will lead to even more ARM-PA instructions instrumented in the program (Fig. 6(b)). After the addition of PA instructions, CPA incurs an average overhead of 47.88%


Figure 4. (a) Runtime overhead comparison between CPA and Pythia. (b) Binary size comparison between CPA and Pythia.
The baseline absolute numbers for spec in secs and nginx in GB/s are shown on top.

with a worst-case of 69.8% as seen in Fig. 4(a). The worst case is 502.gcc_r, which has the largest number of vulnerable variables, resulting in the maximum number of ARM-PA instructions added.

The extra ARM-PA instructions also adversely affect the IPC count, and application binary sizes (Fig. 4-5). However, the IPC does not suffer radically since ARM-PA directly leverages hardware support in variable authentication. As shown in Fig. 5(a), the average IPC degradation for CPA scheme is around 4.9%, with the worst IPC degradation of 13% in 523.xalancbmk_r. This is caused by repeated execution of ARM-PA instructions inside loop nests. Similarly, the addition of ARM-PA instructions causes application binaries to bloat. As shown in Fig. 4(b), the average binary size increased by 21.56%, with the maximum of 33.2% in nginx. Furthermore, increased instructions in the program lead to additional LLC misses.

★ Pythia: In contrast to CPA, Pythia refines the set of vulnerable variables by performing input channel construction on them. As a result, Pythia reduces the number of vulnerable program variables by about 4.5x (Fig. 6(a)). Another major justification for refining the set of vulnerable program variables is the observation that ∼74% of conditional branches in all benchmarks are not affected by input channels at all. Instead, only 1.26% directly affected conditional branches and 25.1% indirectly affected conditional branches can result in control-flow bending. Overall, only 5.1% of program benchmarks are marked as vulnerable.

Aside from vulnerable variable refinement, the stack re-layout and heap sectioning in Pythia also decrease the overall amount of ARM PA instructions. However, stack canaries add new stack define instructions (mov), a call to the random number generator library function, and load/store instructions for encryption/decryption, which add to the overhead. The library call for heap sectioning adds an extra overhead of about 23 ns on average.

Thus, these factors combine to minimize the program's runtime overhead when secured with Pythia. As shown in Fig. 4, Pythia's performance overhead dropped to an average of 13.07% with the most noticeable change in 500.perlbench_r from 60.7% (CPA) to 18%. Maximum overhead in the Pythia scheme was 25.4% (502.gcc_r). Pythia's program IPC degradation (Fig. 5) decreased by 2.8% on average, which bears testimony to the fact that adding ARM-PA selectively increases the opportunities for out-of-order processing, where more instructions can be completed in the same cycle. Note that heap sectioning makes the heap memory more fragmented. In case heap variables from isolated and shared sections are accessed consecutively, cache misses might increase due to non-local accesses. This is why certain benchmarks (510.parest_r) have slightly more cache misses in Pythia over the baseline. Another implication of instrumenting fewer ARM-PA instructions is that the average binary size increase dropped to 10.37%, with 510.parest_r having the highest binary bloat with 17.99%.

6.2 Security Mechanism Evaluation

★ Input Channel (IC): Our experiments found 25326 input channel functions across the 16 benchmarks, whose distribution is presented in Fig. 5(b). As seen in the figure, the most common input channel functions across the benchmarks are print (31.5%) and move/copy (65.9%). The rest (map, scan, get, put) account for only 2.6% of the input channels. These input channels are either predefined library functions (such as printf), or custom user-implemented versions. Our experiments have revealed that such input channel functions often get translated as intrinsics in the LLVM IR, especially in C++ benchmarks, making detecting their presence easier. Benchmarks such as 510.parest_r and 502.gcc_r contain the maximum input channels.

★ Pointer Authentication: The CPA baseline instrumented a total of 5 × 10^5 PA instructions across all the benchmarks. Specifically, 502.gcc_r and 510.parest_r have the maximum number of PA instructions (1.3 × 10^5 each). As seen in Fig. 6(b), Pythia dramatically decreased total PA instructions to 1.1 × 10^4 with 510.parest_r having the most with 59,680


Figure 5. (a) IPC degradation comparison between CPA and Pythia. (b) Distribution of printf ICs and copy/move ICs based on
the total ICs in the benchmark.

Figure 6. (a) Distribution of vulnerable variables and ARM-PA instructions between CPA and Pythia scheme. (b) ARM PA
instructions decrease in Pythia over CPA.
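To get a feel for the gap that Fig. 6(b) reports, the two upper bounds on instrumented ARM-PA instructions (Eq. 1 for CPA and Eq. 2 for Pythia) can be evaluated side by side. The sketch below only transcribes those two formulas; every parameter value in it is made up for illustration and is not a measurement from the evaluation.

```python
def cpa_bound(B, v, u):
    """Eq. 1: B * v * (2u + 1) extra ARM-PA instructions under CPA."""
    return B * v * (2 * u + 1)


def pythia_bound(B, sv, hv, du):
    """Eq. 2: B * [sv * (1 + 3du) + hv * (1 + 2du)] under Pythia."""
    return B * (sv * (1 + 3 * du) + hv * (1 + 2 * du))


# Hypothetical parameters: B branches, v un-refined variables with u uses each,
# versus sv stack and hv heap variables with du input-channel uses each.
B, v, u = 100, 45, 6
sv, hv, du = 9, 1, 2

cpa = cpa_bound(B, v, u)
pythia = pythia_bound(B, sv, hv, du)
print(cpa, pythia, round(cpa / pythia, 1))
```

With these invented numbers the refined set is 4.5 times smaller than the un-refined one (10 variables instead of 45) and has fewer relevant uses per variable, so the Pythia bound comes out well below the CPA bound even though each stack variable pays the extra `sv * du` term.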

instructions. Practically, in both the schemes only 50% of instrumented PA instructions are executed dynamically.

★ Stack Canaries + Heap Relocation: As mentioned earlier, more than ∼99% of the variables being used in input channels are stack variables (around ∼29300). Pythia adds one canary per stack variable, thus adding ∼29300 canaries across all benchmarks. The CPA scheme requires an encryption during the stack allocation, and then a decryption for loading into the input channel function, followed by an encryption at store (Eq. 1). In contrast, the Pythia scheme adds one extra layer of encryption before loading the input channel use (Eq. 5). This extra ARM-PA encryption helps Pythia save many added instructions in case of register spills. For example, a variable spilled twice in the CPA scheme would have 7 PA instructions (4 encrypts and 3 decrypts), while Pythia requires only 4 PA instructions (3 encrypts and 1 decrypt right after the input channel). This reduction builds up significantly across all statically allocated program variables.

Compared to statically allocated variables, dynamically allocated vulnerable program variables that get relocated by heap sectioning are significantly less prevalent in the benchmarks. However, our analysis has shown that such variables are usually utilized inside program loops. As a result, the size of the isolated heap section is scalable. Furthermore, benchmarks like 519.lbm_r and 505.mcf_r, which don't have any vulnerable heap variables, incur overheads because of heap sectioning (∼126 ns on average).

Attack Distance + Branch Security: Lastly, we compare the attack detection capabilities of Pythia with a state-of-the-art Data-flow Integrity (DFI) mechanism [14]. Across all the benchmarks, the average distance of input channels from respective branches is 83.29 LLVM instructions with the longest distance being 500.perl_r with 149.76 LLVM instructions. DFI is unable to reason about pointer arithmetic and field sensitivity cases. Therefore, DFI's average attack distance is about 113.95 LLVM instructions, since its backward slice mechanism terminates whenever it encounters pointer arithmetic and field-sensitive cases, being unable to reason about their data-flow. On the other hand, Pythia's average attack distance is 127.35 LLVM instructions. Pythia focuses on protecting all the variables encountered by using PA authentication rather than relying on verifying underlying data flow, leading to longer backward slices. However, in some cases, Pythia cannot extend the backward slice


to the input channel due to complex inter-procedural alias ★ Motivating Examples: We rewrote the motivating
analysis encountered, which we do not tackle currently. examples (§2.2) so that they could be tested on Pythia. We
In addition, we compare the branch protection capability added some extra instructions to prevent code restructing
of DFI and Pythia (please refer to Fig. 7(b)) as it shows the due to optimizations by the compiler.
security strength of both techniques. Recall that a given tech- The first example is the String-Buffer Overflow attack as
nique protects a branch if the technique can generate and seen in Listing 1. The critical vulnerability is the input chan-
protect branch’s backward slice to the input channel. Due to nel ‘strcpy’ on line 14. Pythia identifies it as an input channel,
Pythia’s backward slice generation and ARM-PA protection and classifies ’someinput’ as a stack variable. It will place
capability, Pythia protects an average of 92% branches across ’someinput’ at the higher address of the current function’s
all benchmarks against DFI’s 86.6%. Looking at the bench- stack frame and add a canary after it with a random value.
mark breakdowns, we can see Pythia offers more protection A simple authentication check after the canary determines
than DFI, ranging from 0% to 17% with an average of 5.6%. whether the value has changed.
One important point to note here is that a small percent- The second example of ProFTPd Vulnerability in Listing 2
age difference can result in a huge number of actual condi- is similar. In this case, the input channel ’sstrncpy’ will af-
tional branches (2% difference in 502.𝑔𝑐𝑐_𝑟 results in 7000 fect rptr causing the overflow. Like the previous example, it
more branches, and 7% in 510.𝑝𝑎𝑟𝑒𝑠𝑡_𝑟 leads to 140000 more will take the variable to the higher address in the call stack.
branches being protected by Pythia). In general, Pythia’s Pythia creates the canary with a random value and encrypts
protection is stronger for C++ codes mainly due to com- it initially then re-encrypts before the input channel. There
plex pointer operations in the benchmark that terminate the is an authentication after its use in the input channel. Any
backward slices of DFI. Pythia offers over 90% protection overflow will crash the program during the canary’s check.
for 13 benchmarks, whereas DFI offers over 90% protection The third example in Listing 3 also has an overflow is-
for only 9 benchmarks. If we look at 100% protection cases sue. In this case, the overflow happens from ’k’ into ’l’. The
(all branches are secured), DFI only provides perfect pro- encryption and authentication will detect the overflow im-
tection for 519.𝑙𝑏𝑚_𝑟 , which has only 75 branches. In con- mediately after the input channel.
trast, Pythia can fully secure three benchmarks (519.𝑙𝑏𝑚_𝑟 ,
505.𝑚𝑐 𝑓 _𝑟 and 525.𝑥264_𝑟 ), where 525.𝑥264_𝑟 has over 7000 6.4 Limitations
branches. Overall, DFI and Pythia perform similarly in non- Pythia cannot detect stack buffer overflows resulting within
pointer arithmetic cases, whereas Pythia has a significant
objects such as sub-fields of a struct. If this overflow affects
advantage in pointer-heavy and C++ code.
another object, Pythia’s stack canaries can detect it imme-
diately. To solve this problem of overflow detection within
sub-fields, stack canaries must be inserted within individual fields. Furthermore, with precise alias analysis, the specific fields used by the input channels can be detected, and canaries can be created only for those fields. This is a focus of our future work.

6.3 Real-World Examples

★ Nginx: We evaluated Pythia on nginx [57], a well-known web server that needs strict security guarantees. Recent DOP attacks [17] have exposed vulnerabilities in Nginx. In addition, the application is multi-threaded, which is useful for stress testing the heap sectioning framework. In our experiments, we used the same experimentation scheme described in Blankit [62]. Nginx has a workload generator with 12 threads to create 400 concurrent connections. We use nginx's workload generator to satisfy requests for Wikipedia's home page for 3s, 30s, and 300s. The overheads for nginx are measured by whether, and by how much, the transfer rate degrades. Averaged across the three runs, the CPA runtime overhead is around 49.13%, while Pythia reduces it to 20.15%. Nginx also uses a mixture of input channels from glibc and its own implementation variants prefixed with "ngx_". Despite having significantly fewer variables, it has many input channels (720), the majority of which are copy/move input channels (712). In nginx, there is a loop very high in the call chain, so the added PA instructions are executed repeatedly. As mentioned, Pythia has a significantly higher attack distance than DFI for nginx and also protects 300 more branches.

7 Related Work

Memory Safety: One possible way to tackle non-control data attacks (including control-flow bending) is to prevent an adversary from exploiting memory errors by ensuring general memory safety [24, 36, 54–56, 60, 66]. These techniques prevent illegal memory accesses by introducing non-trivial language extensions. However, memory safety techniques have high overheads compared to Pythia. For example, on legacy applications (C applications only), SoftBound [54] has an average overhead of 67% and SoftBound + CETS [55] has an average overhead of 116%. Other bounds-checking techniques that only handle spatial memory errors are also costly: ASAN [68] has an average overhead of 76% (with slowdowns up to 2.67x), the heap-only approach LowFat-Pointer [26] has an overhead of 113%, and LBC [30] has an overhead of 23% (legacy applications). Our technique currently has 13.3% overhead with all the added instructions.
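
To illustrate where the cost of the bounds-checking techniques mentioned above comes from, the following is a minimal, hypothetical sketch of a pointer that carries base and bound metadata and checks them on every access. It is a toy model and not the implementation of SoftBound, ASAN, LowFat-Pointer, or LBC; the names `CheckedPointer`, `SpatialViolation`, and `copy_input` are assumptions introduced only for this example.

```python
class SpatialViolation(Exception):
    """Raised when an access falls outside the object's recorded bounds."""


class CheckedPointer:
    """Hypothetical pointer that carries base/bound metadata for its object."""

    def __init__(self, memory, base, size, offset=0):
        self.memory = memory         # flat list standing in for process memory
        self.base = base             # first valid index of the object
        self.bound = base + size     # one past the last valid index
        self.addr = base + offset    # current position of the pointer

    def add(self, delta):
        # Pointer arithmetic keeps the original bounds; only addr moves.
        return CheckedPointer(self.memory, self.base,
                              self.bound - self.base,
                              self.addr - self.base + delta)

    def _check(self):
        # The extra work performed on every dereference.
        if not (self.base <= self.addr < self.bound):
            raise SpatialViolation(
                f"address {self.addr} outside [{self.base}, {self.bound})")

    def load(self):
        self._check()
        return self.memory[self.addr]

    def store(self, value):
        self._check()
        self.memory[self.addr] = value


def copy_input(dst, data):
    """Stand-in for a copy-style input channel writing into dst."""
    for i, byte in enumerate(data):
        dst.add(i).store(byte)


# memory[0:4] is a buffer; memory[4] is an adjacent branch-deciding flag.
memory = [0, 0, 0, 0, 0]
buf = CheckedPointer(memory, base=0, size=4)
copy_input(buf, [1, 2, 3, 4])           # fits: all four stores pass the check
try:
    copy_input(buf, [9, 9, 9, 9, 1])    # fifth store would overwrite the flag
    overflow_blocked = False
except SpatialViolation:
    overflow_blocked = True
```

In this toy model the write that would flip the flag at `memory[4]` is rejected, but every load and store pays for a bounds comparison, which is the kind of per-access work behind the overheads quoted above.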

ASPLOS ’24, April 27-May 1, 2024, La Jolla, CA, USA Khan, Chatterjee, Pande

Figure 7. (a) Percentage of variables in the backslice that are pointers. In addition, the percentage of conditional branches in a
given benchmark. (b) Comparison between the percentage of conditional branches secured by DFI vs Pythia.

The compiler-based approaches [24, 36, 56] that provide type-safe pointers are designed specifically for C. Our work has been evaluated on both C and C++ benchmarks. Other works focus on bounds checking pointer memory accesses to ensure they do not access unauthorized memory locations [40, 42, 69, 79]. Most of these works require specific hardware processors or extensions. Pythia utilizes hardware extensions already available in commercial ARM chips (seen in Apple products and Graviton servers [4]).

Stack Canaries-based Mechanisms: The majority of defense mechanisms [63, 72, 78] that use stack canaries to protect against non-control data attacks focus exclusively on stacks, and usually do not defend against heap-based vulnerabilities. Moreover, such techniques are also vulnerable to inter-procedural overflows and to stack bypassing, where an adversary can try to leak the canary value by performing buffer reads. Pythia mitigates this by randomizing the canary value before every input channel, which minimizes the probability of such attacks. In addition, because these techniques are stack-specific, they do not address heap buffer overflows.

Address Randomization: Another way to combat data attacks is to prevent an adversary from locating privileged information by randomizing the data layout [6–8, 28, 44]. Note that such techniques are geared towards making it harder for the adversary to guess memory addresses; unlike Pythia, they are not designed to prevent control-flow bending attacks. Furthermore, to minimize overheads, such techniques often randomize only a part of the program data [71]. Recent works have also focused on randomizing the stack layout to reduce the probability of leaking statically allocated safety-critical data [3, 8, 46, 50, 74]. Some randomization techniques determine the actual locations at runtime, or insert some amount of padding. We simply move the variables around at compile time, so there is no runtime overhead for randomization, and our canaries for vulnerable variables provide the least amount of padding needed for stack protection.

Data-Flow Integrity (DFI) [14]: The goal of DFI is to first compute a static data-flow graph and then verify at runtime whether each transfer of dataflow facts is permitted by the graph. The problem with this approach is that it requires maintaining and checking dynamic dataflow information at runtime using SETDEF and CHKDEF for every program variable, which causes overheads of up to 2.5x. In particular, DFI is unable to reason about pointer arithmetic and field-based alias analysis, which results in its inability to construct backward slices that can cover an input channel.

8 Conclusion

In this work, we proposed Pythia, a compiler-guided defense framework that combines traditional compiler analysis with pointer authentication. Pythia prevents control-flow bending by isolating vulnerable variables: it handles statically allocated variables by re-orienting them and adding canaries, and dynamically allocated variables by heap sectioning. In our evaluation, we found that Pythia's performance-aware approach of combining isolation and pointer authentication has an average overhead of 13.07%, compared to 47.88% for the complete pointer authentication baseline, without compromising security guarantees. In addition, Pythia secures 5.6% more branches than DFI and fully secures 3 applications. This shows its effectiveness on pointer-intensive applications and C++ code in terms of coverage of input channels.

Acknowledgments

We thank the anonymous reviewers and our shepherd Ashish Venkat for their invaluable comments, which improved the paper significantly.

References

[1] Cwe top 25 most dangerous software weaknesses, 2022.
[2] Martín Abadi, Mihai Budiu, Ulfar Erlingsson, and Jay Ligatti. Control-flow integrity principles, implementations, and applications. ACM


Transactions on Information and System Security (TISSEC), 13(1):1–40, 2009.
[3] Misiker Tadesse Aga and Todd Austin. Smokestack: thwarting dop attacks with runtime stack layout randomization. In 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pages 26–36. IEEE, 2019.
[4] Amazon graviton2. https://aws.amazon.com/ec2/graviton/, 2019. Accessed: 2021 April 16.
[5] ARM ARM. Architecture reference manual-armv8, for armv8-a architecture profile. ARM Limited, Dec, 2017.
[6] Brian Belleville, Hyungon Moon, Jangseop Shin, Dongil Hwang, Joseph M Nash, Seonhwa Jung, Yeoul Na, Stijn Volckaert, Per Larsen, Yunheung Paek, et al. Hardware assisted randomization of data. In International Symposium on Research in Attacks, Intrusions, and Defenses, pages 337–358. Springer, 2018.
[7] Sandeep Bhatkar and R Sekar. Data space randomization. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, pages 1–22. Springer, 2008.
[8] Kjell Braden, Lucas Davi, Christopher Liebchen, Ahmad-Reza Sadeghi, Stephen Crane, Michael Franz, and Per Larsen. Leakage-resilient layout randomization for mobile devices. In NDSS, volume 16, pages 21–24, 2016.
[9] David Brumley, Tzi-cker Chiueh, Robert Johnson, Huijia Lin, and Dawn Song. Rich: Automatically protecting against integer-based vulnerabilities. 2007.
[10] James Bucek, Klaus-Dieter Lange, and Jóakim v. Kistowski. Spec cpu2017: Next-generation compute benchmark. In Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, pages 41–42, New York, NY, 2018. ACM.
[11] Nathan Burow, Scott A Carr, Joseph Nash, Per Larsen, Michael Franz, Stefan Brunthaler, and Mathias Payer. Control-flow integrity: Precision, security, and performance. ACM Computing Surveys (CSUR), 50(1):1–33, 2017.
[12] Sadullah Canakci, Leila Delshadtehrani, Boyou Zhou, Ajay Joshi, and Manuel Egele. Efficient context-sensitive cfi enforcement through a hardware monitor. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, pages 259–279. Springer, 2020.
[13] Nicholas Carlini, Antonio Barresi, Mathias Payer, David Wagner, and Thomas R Gross. Control-flow bending: On the effectiveness of control-flow integrity. In 24th USENIX Security Symposium (USENIX Security 15), pages 161–176, 2015.
[14] Miguel Castro, Manuel Costa, and Tim Harris. Securing software by enforcing data-flow integrity. In Proceedings of the 7th symposium on Operating systems design and implementation, pages 147–160, 2006.
[15] Shuo Chen, Jun Xu, Nithin Nakka, Zbigniew Kalbarczyk, and Ravishankar K Iyer. Defeating memory corruption attacks via pointer taintedness detection. In 2005 International Conference on Dependable Systems and Networks (DSN'05), pages 378–387. IEEE, 2005.
[16] Shuo Chen, Jun Xu, Emre Can Sezer, Prachi Gauriar, and Ravishankar K Iyer. Non-control-data attacks are realistic threats. In USENIX Security Symposium, volume 5, 2005.
[17] Long Cheng, Salman Ahmed, Hans Liljestrand, Thomas Nyman, Haipeng Cai, Trent Jaeger, N. Asokan, and Danfeng (Daphne) Yao. Exploitation techniques for data-oriented attacks with existing and potential defense approaches. ACM Trans. Priv. Secur., 24(4), sep 2021.
[18] Mauro Conti, Stephen Crane, Lucas Davi, Michael Franz, Per Larsen, Marco Negro, Christopher Liebchen, Mohaned Qunaibit, and Ahmad-Reza Sadeghi. Losing control: On the effectiveness of control-flow integrity under stack attacks. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 952–963, 2015.
[19] Crispin Cowan, F Wagle, Calton Pu, Steve Beattie, and Jonathan Walpole. Buffer overflows: Attacks and defenses for the vulnerability of the decade. In Proceedings DARPA Information Survivability Conference and Exposition. DISCEX'00, volume 2, pages 119–129. IEEE, 2000.
[20] John Criswell, Nathan Dautenhahn, and Vikram Adve. Kcofi: Complete control-flow integrity for commodity operating system kernels. In 2014 IEEE Symposium on Security and Privacy, pages 292–307. IEEE, 2014.
[21] Lucas Davi, Patrick Koeberl, and Ahmad-Reza Sadeghi. Hardware-assisted fine-grained control-flow integrity: Towards efficient protection of embedded systems against software exploitation. In 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC), pages 1–6. IEEE, 2014.
[22] Lucas Davi and Ahmad-Reza Sadeghi. Building control-flow integrity defenses. In Building Secure Defenses Against Code-Reuse Attacks, pages 27–54. Springer, 2015.
[23] Lucas Davi, Ahmad-Reza Sadeghi, Daniel Lehmann, and Fabian Monrose. Stitching the gadgets: On the ineffectiveness of coarse-grained control-flow integrity protection. In 23rd USENIX Security Symposium (USENIX Security 14), pages 401–416, 2014.
[24] Dinakar Dhurjati, Sumant Kowshik, and Vikram Adve. Safecode: Enforcing alias analysis for weakly typed languages. ACM SIGPLAN Notices, 41(6):144–157, 2006.
[25] Ren Ding, Chenxiong Qian, Chengyu Song, Bill Harris, Taesoo Kim, and Wenke Lee. Efficient protection of path-sensitive control security. In 26th USENIX Security Symposium (USENIX Security 17), pages 131–148, 2017.
[26] Gregory J Duck and Roland HC Yap. Heap bounds protection with low fat pointers. In Proceedings of the 25th International Conference on Compiler Construction, pages 132–142, 2016.
[27] Ulfar Erlingsson, Martín Abadi, Michael Vrable, Mihai Budiu, and George C Necula. Xfi: Software guards for system address spaces. In Proceedings of the 7th symposium on Operating systems design and implementation, pages 75–88, 2006.
[28] Cristiano Giuffrida, Anton Kuijsten, and Andrew S Tanenbaum. Enhanced operating system security through efficient and fine-grained address space randomization. In 21st USENIX Security Symposium (USENIX Security 12), pages 475–490, 2012.
[29] Guang Gong. Exploiting heap corruption due to integer overflow in android libcutils. Black Hat USA, 2015.
[30] Niranjan Hasabnis, Ashish Misra, and R Sekar. Light-weight bounds checking. In Proceedings of the Tenth International Symposium on Code Generation and Optimization, pages 135–144, 2012.
[31] Konrad Hohentanner, Philipp Zieris, and Julian Horsch. Pacsafe: Leveraging arm pointer authentication for memory safety in c/c++. arXiv preprint arXiv:2202.08669, 2022.
[32] Hong Hu, Zheng Leong Chua, Sendroiu Adrian, Prateek Saxena, and Zhenkai Liang. Automatic generation of data-oriented exploits. In 24th USENIX Security Symposium (USENIX Security 15), pages 177–192, 2015.
[33] Hong Hu, Chenxiong Qian, Carter Yagemann, Simon Pak Ho Chung, William R Harris, Taesoo Kim, and Wenke Lee. Enforcing unique code target property for control-flow integrity. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 1470–1486, 2018.
[34] Hong Hu, Shweta Shinde, Sendroiu Adrian, Zheng Leong Chua, Prateek Saxena, and Zhenkai Liang. Data-oriented programming: On the expressiveness of non-control data attacks. In 2016 IEEE Symposium on Security and Privacy (SP), pages 969–986. IEEE, 2016.
[35] Hyerean Jang, Moon Chan Park, and Dong Hoon Lee. Ibv-cfi: Efficient fine-grained control-flow integrity preserving cfg precision. Computers & Security, 94:101828, 2020.


[36] Trevor Jim, J Gregory Morrisett, Dan Grossman, Michael W Hicks, James Cheney, and Yanling Wang. Cyclone: a safe dialect of c. In USENIX Annual Technical Conference, General Track, pages 275–288, 2002.
[37] Dongjae Jung, Minsu Kim, Jinsoo Jang, and Brent Byunghoon Kang. Value-based constraint control flow integrity. IEEE Access, 8:50531–50542, 2020.
[38] Mustakimur Khandaker, Abu Naser, Wenqing Liu, Zhi Wang, Yajin Zhou, and Yueqiang Cheng. Adaptive call-site sensitive control flow integrity. In 2019 IEEE European Symposium on Security and Privacy (EuroS&P), pages 95–110. IEEE, 2019.
[39] Mustakimur Rahman Khandaker, Wenqing Liu, Abu Naser, Zhi Wang, and Jie Yang. Origin-sensitive control flow integrity. In 28th USENIX Security Symposium (USENIX Security 19), pages 195–211, 2019.
[40] Yonghae Kim, Jaekyu Lee, and Hyesoon Kim. Hardware-based always-on heap memory safety. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 1153–1166. IEEE, 2020.
[41] Benjamin A Kuperman, Carla E Brodley, Hilmi Ozdoganoglu, TN Vijaykumar, and Ankit Jalote. Detection and prevention of stack buffer overflow attacks. Communications of the ACM, 48(11):50–56, 2005.
[42] Dmitrii Kuvaiskii, Oleksii Oleksenko, Sergei Arnautov, Bohdan Trach, Pramod Bhatotia, Pascal Felber, and Christof Fetzer. Sgxbounds: Memory safety for shielded execution. In Proceedings of the Twelfth European Conference on Computer Systems, pages 205–221, 2017.
[43] Volodymyr Kuznetzov, László Szekeres, Mathias Payer, George Candea, R Sekar, and Dawn Song. Code-pointer integrity. In The Continuing Arms Race: Code-Reuse Attacks and Defenses, pages 81–116. 2018.
[44] Per Larsen, Andrei Homescu, Stefan Brunthaler, and Michael Franz. Sok: Automated software diversity. In 2014 IEEE Symposium on Security and Privacy, pages 276–291, 2014.
[45] Byoungyoung Lee, Chengyu Song, Yeongjin Jang, Tielei Wang, Taesoo Kim, Long Lu, and Wenke Lee. Preventing use-after-free with dangling pointers nullification. In NDSS. Citeseer, 2015.
[46] Seongman Lee, Hyeonwoo Kang, Jinsoo Jang, and Brent Byunghoon Kang. Savior: Thwarting stack-based memory safety violations by randomizing stack layout. IEEE Transactions on Dependable and Secure Computing, 19(4):2559–2575, 2021.
[47] Kyung-Suk Lhee and Steve J Chapin. Buffer overflow and format string overflow vulnerabilities. Software: practice and experience, 33(5):423–460, 2003.
[48] Yuan Li, Wende Tan, Zhizheng Lv, Songtao Yang, Mathias Payer, Ying Liu, and Chao Zhang. Pacmem: Enforcing spatial and temporal memory safety via arm pointer authentication. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 1901–1915, 2022.
[49] Yuan Li, Mingzhe Wang, Chao Zhang, Xingman Chen, Songtao Yang, and Ying Liu. Finding cracks in shields: On the security of control flow integrity mechanisms. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pages 1821–1835, 2020.
[50] Yu Liang, Xinjie Ma, Daoyuan Wu, Xiaoxiao Tang, Debin Gao, Guojun Peng, Chunfu Jia, and Huanguo Zhang. Stack layout randomization with minimal rewriting of android binaries. In Information Security and Cryptology-ICISC 2015: 18th International Conference, Seoul, South Korea, November 25-27, 2015, Revised Selected Papers 18, pages 229–245. Springer, 2016.
[51] Hans Liljestrand, Thomas Nyman, Kui Wang, Carlos Chinea Perez, Jan-Erik Ekberg, and N Asokan. Pac it up: Towards pointer integrity using arm pointer authentication. In USENIX Security Symposium, pages 177–194, 2019.
[52] Vishwath Mohan, Per Larsen, Stefan Brunthaler, Kevin W Hamlen, and Michael Franz. Opaque control-flow integrity. In NDSS, volume 26, pages 27–30, 2015.
[53] Girish Mururu, Sharjeel Khan, Bodhisatwa Chatterjee, Chao Chen, Chris Porter, Ada Gavrilovska, and Santosh Pande. Beacons: An end-to-end compiler framework for predicting and utilizing dynamic loop characteristics. Proceedings of the ACM on Programming Languages, 7(OOPSLA2):173–203, 2023.
[54] Santosh Nagarakatte, Jianzhou Zhao, Milo MK Martin, and Steve Zdancewic. Softbound: Highly compatible and complete spatial memory safety for c. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 245–258, 2009.
[55] Santosh Nagarakatte, Jianzhou Zhao, Milo MK Martin, and Steve Zdancewic. Cets: compiler enforced temporal safety for c. In Proceedings of the 2010 International Symposium on Memory Management, pages 31–40, 2010.
[56] George C Necula, Jeremy Condit, Matthew Harren, Scott McPeak, and Westley Weimer. Ccured: Type-safe retrofitting of legacy software. ACM Transactions on Programming Languages and Systems (TOPLAS), 27(3):477–526, 2005.
[57] nginx. https://nginx.org/, 2019. Accessed: 2021 Oct 10.
[58] Ben Niu and Gang Tan. Modular control-flow integrity. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 577–587, 2014.
[59] Gene Novark and Emery D Berger. Dieharder: securing the heap. In Proceedings of the 17th ACM conference on Computer and communications security, pages 573–584, 2010.
[60] Thomas Nyman, Ghada Dessouky, Shaza Zeitouni, Aaro Lehikoinen, Andrew Paverd, N Asokan, and Ahmad-Reza Sadeghi. Hardscope: Hardening embedded systems against data-oriented attacks. In Proceedings of the 56th Annual Design Automation Conference 2019, pages 1–6, 2019.
[61] Mathias Payer, Antonio Barresi, and Thomas R Gross. Fine-grained control-flow integrity through binary hardening. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, pages 144–164. Springer, 2015.
[62] Chris Porter, Girish Mururu, Prithayan Barua, and Santosh Pande. Blankit library debloating: getting what you want instead of cutting what you don't. In Alastair F. Donaldson and Emina Torlak, editors, Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020, pages 164–180. ACM, 2020.
[63] Weizhong Qiang, Jiawei Yang, Hai Jin, and Xuanhua Shi. Privguard: Protecting sensitive kernel data from privilege escalation attacks. IEEE Access, 6:46584–46594, 2018.
[64] Paruj Ratanaworabhan, V Benjamin Livshits, and Benjamin G Zorn. Nozzle: A defense against heap-spraying code injection attacks. In USENIX security symposium, pages 169–186, 2009.
[65] Robert Schilling, Pascal Nasahl, and Stefan Mangard. Fipac: Thwarting fault- and software-induced control-flow attacks with arm pointer authentication. In Constructive Side-Channel Analysis and Secure Design: 13th International Workshop, COSADE 2022, Leuven, Belgium, April 11-12, 2022, Proceedings, pages 100–124. Springer, 2022.
[66] Cole Schlesinger, Karthik Pattabiraman, Nikhil Swamy, David Walker, and Benjamin Zorn. Modular protections against non-control data attacks. Journal of Computer Security, 22(5):699–742, 2014.
[67] David Sehr, Robert Muth, Cliff L Biffle, Victor Khimenko, Egor Pasko, Bennet Yee, Karl Schimpf, and Brad Chen. Adapting software fault isolation to contemporary cpu architectures. 2010.
[68] Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitriy Vyukov. AddressSanitizer: A fast address sanity checker. In 2012 USENIX annual technical conference (USENIX ATC 12), pages 309–318, 2012.
[69] Rasool Sharifi and Ashish Venkat. Chex86: Context-sensitive enforcement of memory safety via microcode-enabled capabilities. In 2020


ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), pages 762–775. IEEE, 2020.
[70] Chengyu Song, Hyungon Moon, Monjur Alam, Insu Yun, Byoungyoung Lee, Taesoo Kim, Wenke Lee, and Yunheung Paek. Hdfi: Hardware-assisted data-flow isolation. In 2016 IEEE Symposium on Security and Privacy (SP), pages 1–17. IEEE, 2016.
[71] Laszlo Szekeres, Mathias Payer, Tao Wei, and Dawn Song. Sok: Eternal war in memory. In 2013 IEEE Symposium on Security and Privacy, pages 48–62. IEEE, 2013.
[72] Steven Van Acker, Nick Nikiforakis, Pieter Philippaerts, Yves Younan, and Frank Piessens. Valueguard: Protection of native applications against data-only buffer overflows. In Information Systems Security: 6th International Conference, ICISS 2010, Gandhinagar, India, December 17-19, 2010. Proceedings 6, pages 156–170. Springer, 2010.
[73] Victor Van der Veen, Dennis Andriesse, Enes Göktaş, Ben Gras, Lionel Sambuc, Asia Slowinska, Herbert Bos, and Cristiano Giuffrida. Practical context-sensitive cfi. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 927–940, 2015.
[74] Ashish Venkat, Sriskanda Shamasunder, Hovav Shacham, and Dean M Tullsen. Hipstr: Heterogeneous-isa program state relocation. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, pages 727–741, 2016.
[75] Tielei Wang, Chengyu Song, and Wenke Lee. Diagnosis and emergency patch generation for integer overflow exploits. In Detection of Intrusions and Malware, and Vulnerability Assessment: 11th International Conference, DIMVA 2014, Egham, UK, July 10-11, 2014. Proceedings 11, pages 255–275. Springer, 2014.
[76] Tielei Wang, Tao Wei, Zhiqiang Lin, and Wei Zou. Intscope: Automatically detecting integer overflow vulnerability in x86 binary using symbolic execution. In NDSS, pages 1–14, 2009.
[77] Ye Wang, Qingbao Li, Zhifeng Chen, Ping Zhang, Guimin Zhang, and Zhihui Shi. Bci-cfi: A context-sensitive control-flow integrity method based on branch correlation integrity. Information and Software Technology, 136:106572, 2021.
[78] Zhilong Wang, Xuhua Ding, Chengbin Pang, Jian Guo, Jun Zhu, and Bing Mao. To detect stack buffer overflow with polymorphic canaries. In 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pages 243–254. IEEE, 2018.
[79] Robert NM Watson, Jonathan Woodruff, Peter G Neumann, Simon W Moore, Jonathan Anderson, David Chisnall, Nirav Dave, Brooks Davis, Khilan Gudka, Ben Laurie, et al. Cheri: A hybrid capability-system architecture for scalable software compartmentalization. In 2015 IEEE Symposium on Security and Privacy, pages 20–37. IEEE, 2015.
[80] Yubin Xia, Yutao Liu, Haibo Chen, and Binyu Zang. Cfimon: Detecting violation of control flow integrity using performance counters. In IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012), pages 1–12. IEEE, 2012.
[81] Wen Xu, Juanru Li, Junliang Shu, Wenbo Yang, Tianyi Xie, Yuanyuan Zhang, and Dawu Gu. From collision to exploitation: Unleashing use-after-free vulnerabilities in linux kernel. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 414–425, 2015.
[82] Chao Zhang, Tao Wei, Zhaofeng Chen, Lei Duan, Laszlo Szekeres, Stephen McCamant, Dawn Song, and Wei Zou. Practical control flow integrity and randomization for binary executables. In 2013 IEEE Symposium on Security and Privacy, pages 559–573. IEEE, 2013.
[83] Guimin Zhang, Qingbao Li, Zhifeng Chen, and Ping Zhang. Defending non-control-data attacks using influence domain monitoring. KSII Transactions on Internet and Information Systems (TIIS), 12(8):3888–3910, 2018.
[84] Mingwei Zhang and R Sekar. Control flow integrity for COTS binaries. In 22nd USENIX Security Symposium (USENIX Security 13), pages 337–352, 2013.
[85] Yajin Zhou, Xiaoguang Wang, Yue Chen, and Zhi Wang. Armlock: Hardware-based fault isolation for arm. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pages 558–569, 2014.
[86] Xiaotong Zhuang, Tao Zhang, and Santosh Pande. Using branch correlation to identify infeasible paths for anomaly detection. In 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06), pages 113–122. IEEE, 2006.
