Architectural and Operating System Support
Abstract: System call checking is extensively used to protect the operating system kernel from user attacks. However, existing solutions such as Seccomp execute lengthy rule-based checking programs against system calls and their arguments, leading to substantial execution overhead.

To minimize checking overhead, this paper proposes Draco, a new architecture that caches system call IDs and argument values after they have been checked and validated. System calls are first looked up in a special cache and, on a hit, skip all checks. We present both a software and a hardware implementation of Draco. The latter introduces a System Call Lookaside Buffer (SLB) to keep recently validated system calls, and a System Call Target Buffer to preload the SLB in advance. In our evaluation, we find that the average execution time of macro and micro benchmarks with conventional Seccomp checking is 1.14× and 1.25×, respectively, that of an insecure baseline that performs no security checks. With our software Draco, the average execution time drops to 1.10× and 1.18×, respectively, that of the insecure baseline. With our hardware Draco, the execution time is within 1% of the insecure baseline.

Index Terms: System call checking, Security, Operating system, Containers, Virtualization, Microarchitecture

I. INTRODUCTION

Protecting the Operating System (OS) kernel is a significant concern, given its capabilities and shared nature. In recent years, there have been reports of a number of security attacks on the OS kernel through system call vectors [1]–[10].

A popular technique to protect OS kernels against untrusted user processes is system call checking. The idea is to limit the system calls that a given process can invoke at runtime, as well as the actual set of argument values used by the system calls. This technique is implemented by adding code at the OS entry point of a system call. The code compares the incoming system call against a list of allowed system calls and argument set values, and either lets the system call continue or flags an error. All modern OSes provide kernel support for system call checking, such as Seccomp for Linux [11], Pledge [12] and Tame [13] for OpenBSD, and System Call Disable Policy for Windows [14].

Linux's Seccomp (Secure Computing) module is the most widely used implementation of system call checking. It is used in a wide variety of today's systems, ranging from mobile systems to web browsers, and to containers massively deployed in cloud and data centers. Today, every Android app is isolated using Seccomp-based system call checking [15]. Systemd, which is Linux's init system, uses Seccomp to support user process sandboxing [16]. Low-overhead virtualization technologies such as Docker [17], LXC/LXD [18], Google's gVisor [19], Amazon's Firecracker [20], [21], CoreOS rkt [22], Singularity [23], Kubernetes [24], and Mesos Containerizer [25] all use Seccomp. Further, Google's recent Sandboxed API project [26] uses Seccomp to enforce sandboxing for C/C++ libraries. Overall, Seccomp provides “the most important security isolation boundary” for containerized environments [21].

Unfortunately, checking system calls incurs overhead. Oracle identified large Seccomp programs as a root cause that slowed down their customers' applications [27], [28]. Seccomp programs can be application specific, and complex applications tend to need large Seccomp programs. Recently, a number of Seccomp overhead measurements have been reported [27]–[29]. For example, a micro benchmark that repeatedly calls getppid runs 25% slower when Seccomp is enabled [29]. Seccomp's overhead on ARM processors is reported to be around 20% for simple checks [30].

The overhead becomes higher if the checks include system call argument values. Since each individual system call can have multiple arguments, and each argument can take multiple distinct values, comprehensively checking arguments is slow. For example, Kim and Zeldovich [31] show that Seccomp causes a 45% overhead in a sandboxed application. Our own measurements on an Intel Xeon server show that the average execution time of macro and micro benchmarks is 1.14× and 1.25×, respectively, that of a system without any checks. For this reason, current systems tend to perform only a small number of argument checks, despite it being well known that systematically checking arguments is critical for security [32]–[34].

The overhead of system call and argument value checking is especially concerning for applications with high-performance requirements, such as containerized environments. The overhead diminishes one of the key attractions of containerization, namely lightweight virtualization.

To minimize this overhead, this paper proposes Draco, a new architecture that caches system call IDs and argument set values after they have been checked and validated. System calls are first looked up in a special cache and, on a hit, skip the checks. The insight behind Draco is that the patterns of system calls in real-world applications have locality: the same system calls are issued repeatedly, with the same sets of argument values.

In this paper, we present both a software and a hardware implementation of Draco. We build the software one as a component of the Linux kernel. While this implementation is faster than Seccomp, it still incurs substantial overhead.
Authorized licensed use limited to: Univ of Calif Santa Barbara. Downloaded on May 15,2021 at 21:10:36 UTC from IEEE Xplore. Restrictions apply.
The hardware implementation of Draco introduces novel microarchitecture to eliminate practically all of the checking overhead. It introduces the System Call Lookaside Buffer (SLB) to keep recently validated system calls, and the System Call Target Buffer (STB) to preload the SLB in advance.

In our evaluation, we execute representative workloads in Docker containers. We run Seccomp on an Intel Xeon server, checking both system call IDs and argument values. We find that, with Seccomp, the average execution time of macro and micro benchmarks is 1.14× and 1.25×, respectively, that of a system performing no security checks. With our software Draco, the average execution time of macro and micro benchmarks drops to 1.10× and 1.18×, respectively, that of a system without any security checks. Finally, we use full-system simulation to evaluate our hardware Draco. The average execution time of the macro and micro benchmarks is within 1% of a system that performs no security checks.

With more complex checks, as expected in the future, the overhead of conventional Seccomp checking increases substantially, while the overhead of Draco's software implementation goes up only modestly. Moreover, the overhead of Draco's hardware implementation remains within 1% of a system without checks.

Overall, the contributions of this work are:
1) A characterization of the system call checking overhead for various Seccomp configurations.
2) The new Draco architecture that caches validated system call IDs and argument values for reuse. We introduce a software and a hardware implementation.
3) An evaluation of the software and hardware implementations of Draco.

II. BACKGROUND

A. System Calls

System calls are the interface an OS kernel exposes to the user space. In x86-64 [35], a user space process requests a system call by issuing the syscall instruction, which transfers control to the OS kernel. The instruction invokes a system call handler at privilege level 0. In Linux, syscall supports from zero to six distinct arguments that are passed to the kernel through general-purpose registers. Specifically, the x86-64 architecture stores the system call ID in the rax register, and the arguments in registers rdi, rsi, rdx, r10, r8, and r9. The return value of the system call is stored in the rax register. The syscall instruction is a serializing instruction [35], which means that it cannot execute until all the older instructions are completed, and that it prevents the execution of all the younger instructions until it completes. Finally, note that the work in this paper is not tied to Linux, but applies to different OS kernels.

B. System Call Checking

A core security technique to protect the OS kernel against untrusted user processes is system call checking. The most widely used implementation of such a technique is the Secure Computing (Seccomp) module [11], which is implemented in Linux. Seccomp allows the software to specify which system calls a given process can make, and which argument values such system calls can use. This information is specified in a profile for the process, which is expressed as a Berkeley Packet Filter (BPF) program called a filter [36]. When the process is loaded, a Just-In-Time compiler in the kernel converts the BPF program into native code. The Seccomp filter executes in the OS kernel every time the process performs a system call. The filter may allow the system call to proceed or, if it is illegal, capture it. In this case, the OS kernel may take one of multiple actions: kill the process or thread, send a SIGSYS signal to the thread, return an error, or log the system call [11]. System call checking is also used by other modern OSes, such as OpenBSD with Pledge [12] and Tame [13], and Windows with System Call Disable Policy [14]. The idea behind our proposal, Draco, can be applied to all of them.

Figure 1 shows an example of a system call check. In user space, the program loads the system call argument and ID in registers rdi and rax, respectively, and performs the system call. This ID corresponds to the personality system call. As execution enters the OS kernel, Seccomp checks if this combination of system call ID and argument value is allowed. If it is allowed, it lets the system call execution proceed (Line 8); otherwise, it terminates the process (Line 11).

    Process (user space):
     1. ...
     2. movq $0xffffffff,%rdi
     3. movq $135,%rax
     4. SYSCALL
     5. ...
    Seccomp (kernel space):
     6. ...
     7. if (syscallID == 135 &&
            (arg0 == 0xffffffff ||
             arg0 == 0x20008)) {
     8.   return SCMP_ACT_ALLOW          /* execute the system call */
     9. }
    10. ...
    11. return SCMP_ACT_KILL_PROCESS     /* terminate the user process */

Fig. 1: Checking a system call with Seccomp.

Figure 1 shows that a Seccomp profile is a long list of if statements that are executed in sequence. Finding the target system call and the target combination of argument values in the list can take a long time, which is the reason for the often high overhead of Seccomp.

Seccomp does not check the values of arguments that are pointers. This is because such a check does not provide any protection: a malicious user could change the contents of the location pointed to by the pointer after the check, creating a Time-Of-Check-Time-Of-Use (TOCTOU) attack [37], [38].

Seccomp could potentially do advanced checks such as checking the value range of an argument, or the result of an operation on the arguments. However, our study of real-world Seccomp profiles shows that most real-world profiles simply check system call IDs and argument values based on a whitelist of exact IDs and values [19], [39]–[43].
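The sequential nature of the check in Figure 1 can be sketched in a few lines of Python. This is only an illustrative model, not Seccomp's BPF implementation: the `Rule` type, the `PROFILE` list, and the `check` function are hypothetical names, and only the personality rule (ID 135 with the two allowed argument values) is taken from Figure 1. The sketch returns the number of rules examined to show why a long whitelist makes each check slow.

```python
from typing import NamedTuple, Optional, Tuple

ALLOW = "SCMP_ACT_ALLOW"
KILL = "SCMP_ACT_KILL_PROCESS"


class Rule(NamedTuple):
    syscall_id: int
    # Allowed values for the first argument; None means "any value".
    allowed_arg0: Optional[Tuple[int, ...]]


# Hypothetical profile. Only the personality rule comes from Fig. 1;
# the read/write rules are assumptions added to give the list some length.
PROFILE = [
    Rule(0, None),                      # read (x86-64 ID 0)
    Rule(1, None),                      # write (x86-64 ID 1)
    Rule(135, (0xFFFFFFFF, 0x20008)),   # personality
]


def check(profile, syscall_id, arg0):
    """Walk the rules in order; return (action, number of rules examined)."""
    for examined, rule in enumerate(profile, start=1):
        if rule.syscall_id != syscall_id:
            continue
        if rule.allowed_arg0 is None or arg0 in rule.allowed_arg0:
            return ALLOW, examined
    # No rule matched: the whole list was scanned before rejecting.
    return KILL, len(profile)
```

In this model, a call whose rule sits late in the list, or a call that is rejected, pays for a scan of every preceding rule on every invocation.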
C. System Call Checking in Modern Systems

System call checking with Seccomp is performed in almost all of the modern Linux-based system infrastructure. This includes modern container and lightweight VM runtimes such as Docker [17], Google's gVisor [19], Amazon's Firecracker [20], [21], and CoreOS rkt [22]. Other environments, such as Android and Chrome OS, also use Seccomp [15], [44]–[46]. The systemd Linux service manager [16] has adopted Seccomp to further increase the isolation of user containers.

Google's Sandboxed API [26] is an initiative that aims to isolate C/C++ libraries using Seccomp. This initiative, together with others [47]–[50], significantly reduces the barrier to applying Seccomp profiles customized for applications.

Seccomp profiles. In this paper, we focus on container technologies, which have both high performance and high security requirements. The default Seccomp profiles provided by existing container runtimes typically contain hundreds of system call IDs and fewer argument values. For example, Docker's default profile allows 358 system calls, and only checks 7 unique argument values (of the clone and personality system calls). This profile is widely used by other container technologies such as CoreOS rkt and Singularity, and is applied by container management systems such as Kubernetes, Mesos, and Red Hat's OpenShift. In this paper, we use Docker's default Seccomp profile as our baseline profile.

Other profiles include the default gVisor profile, which is a whitelist of 74 system calls and 130 argument checks. Also, the profile for the AWS Firecracker microVMs contains 37 system calls and 8 argument checks [20].

In this paper, we also explore more secure Seccomp profiles tailored for some applications. Application-specific profiles are supported by Docker and other projects [17], [24], [50]–[53].

III. THREAT MODEL

We adopt the same threat model as existing system call checking systems such as Seccomp, where untrusted user space processes can be adversarial and attack the OS kernel. In the context of containers, the containerized applications are deployed in the cloud, and an adversary tries to compromise a container and exploit vulnerabilities in the host OS kernel to achieve privilege escalation and arbitrary code execution. The system call interface is the major attack vector for user processes to attack the OS kernel.

Prior work [54]–[56] has shown that system call checking is effective in defending against real-world attacks, because most applications only use a small number of different system calls and argument values [2]–[6]. For example, the mitigation of CVE-2014-3153 [3] is to disallow FUTEX_REQUEUE as the value of the futex_op argument of the futex system call.

Our focus is not to invent new security analyses or policies, but to minimize the execution overhead that hinders the deployment of comprehensive security checks. As will be shown in this paper, existing software-based security checking is often costly. This forces users to sacrifice either performance or security. Draco provides both high performance and a high level of security.

IV. MEASURING OVERHEAD & LOCALITY

To motivate our proposal, we benchmark the overhead of state-of-the-art system call checking. We focus on understanding the overhead of: 1) using generic system call profiles, 2) using smaller, application-specific system call profiles, and 3) using profiles that also check system call arguments.

A. Methodology

We run macro and micro benchmarks in Docker containers with each of the following five Seccomp profiles:

insecure: Seccomp is disabled. It is an insecure baseline where no checks are performed.

docker-default: Docker's default Seccomp profile. It is automatically used in all Docker containers and other container technologies (e.g., CoreOS rkt and Singularity) as part of the Moby project [57]. This profile is deployed by container management systems like Kubernetes, Red Hat OpenShift, and Mesos Containerizers.

syscall-noargs: Application-specific profiles without argument checks, where the filter whitelists the exact system call IDs used by the application.

syscall-complete: Application-specific profiles with argument checks, where the filter whitelists the exact system call IDs and the exact argument values used by the application. These profiles are the most secure filters that include both system call IDs and their arguments.

syscall-complete-2x: Application-specific profiles that consist of running the above syscall-complete profile twice in a row. Hence, these profiles perform twice the number of checks as syscall-complete. The goal is to model a near-future environment that performs more extensive security checks.

Section X-B describes our toolkits for automatically generating the syscall-noargs, syscall-complete, and syscall-complete-2x profiles for a given application. The workloads are described in Section X-A, and are grouped into macro benchmarks and micro benchmarks. All workloads are run on an Intel Xeon (E5-2660 v3) system at 2.60GHz with 64 GB of DRAM, using Ubuntu 18.04 with the Linux 5.3.0 kernel. We disable the spec_store_bypass, spectre_v2, mds, pti, and l1tf vulnerability mitigations due to their heavy performance impact; many of these kernel patches will be removed in the future anyway.

We run the workloads with the Linux BPF JIT compiler enabled. Enabling the JIT compiler can achieve 2–3× performance increases [30]. Note that JIT compilers are disabled by default in many Linux distributions to prevent kernel JIT spraying attacks [58]–[60]. Hence, the Seccomp performance that we report in this section represents the highest performance for the baseline system.

B. Execution Overhead

Figure 2 shows the latency or execution time of the workloads using the five types of Seccomp profiles described in Section IV-A. For each workload, the results are normalized to insecure (i.e., Seccomp disabled).
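As a rough illustration of how the application-specific profiles of Section IV-A relate to one another, the following Python sketch derives a syscall-noargs style whitelist and a syscall-complete style whitelist from a recorded trace. It is not the toolkit of Section X-B: the trace format, `build_profiles`, `allowed_noargs`, and `allowed_complete` are hypothetical, and the sketch ignores the fact that pointer arguments are not checked by Seccomp.

```python
from collections import defaultdict


def build_profiles(trace):
    """Derive two whitelists from a trace of (syscall_id, args) pairs.

    Returns (noargs, complete):
      noargs   -- set of system call IDs seen in the trace
      complete -- dict mapping each ID to the set of exact argument tuples seen
    """
    noargs = set()
    complete = defaultdict(set)
    for sid, args in trace:
        noargs.add(sid)
        complete[sid].add(tuple(args))
    return noargs, dict(complete)


def allowed_noargs(profile, sid, args):
    # ID-only check: the argument values are ignored.
    return sid in profile


def allowed_complete(profile, sid, args):
    # ID and exact-argument check.
    return tuple(args) in profile.get(sid, set())
```

For a trace such as `[(0, (3, 4096)), (135, (0xFFFFFFFF,))]`, the ID-only profile accepts any read call, whereas the complete profile accepts read only with the argument tuple `(3, 4096)`, which is why it needs more comparisons per system call.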
[Figure 2: bar chart. Y-axis: latency/execution time, normalized to insecure (0.00 to 2.00). One bar per profile: insecure, docker-default, syscall-noargs, syscall-complete, syscall-complete-2x. Macro benchmarks: httpd, nginx, elasticsearch, mysql, cassandra, redis, grep, pwgen, average macro. Micro benchmarks: sysbench fio, hpcc, unixbench syscall, fifo (ipc), pipe (ipc), domain (ipc), mq (ipc), average micro.]

Fig. 2: Latency or execution time of the workloads using different Seccomp profiles. For each workload, the results are normalized to insecure (i.e., Seccomp disabled).
Overhead of checking system call IDs. To assess the overhead of checking system call IDs, we compare insecure to docker-default. As shown in Figure 2, enabling Seccomp using the Docker default profile increases the execution time of the macro and micro benchmarks to 1.05× and 1.12× that of insecure on average, respectively.

Comparing docker-default to syscall-noargs, we see the impact of using application-specific profiles. Sometimes, the overhead is visibly reduced. This is because the number of instructions needed to execute the syscall-noargs profile is smaller than that of docker-default. Overall, the average execution time with syscall-noargs profiles is 1.04× that of insecure for macro benchmarks and 1.09× for micro benchmarks.

Overhead of checking system call arguments. To assess the overhead of checking system call arguments, we first compare syscall-noargs with syscall-complete. The number of system calls in these two profiles is the same, but syscall-complete additionally checks argument values. We can see that checking arguments brings significant overhead. On average, compared to syscall-noargs, the macro and micro benchmarks increase their execution time from 1.04× to 1.14× and from 1.09× to 1.25×, respectively.

We now double the number of checks by going from syscall-complete to syscall-complete-2x. We can see that the benchmark overhead often nearly doubles. On average, compared to syscall-complete, the macro and micro benchmarks increase their execution time from 1.14× to 1.21× and from 1.25× to 1.42×, respectively.

Implications. Our results lead to the following conclusions. First, checking system call IDs can introduce noticeable performance overhead to applications running in Docker containers. This is the case even with Seccomp, which is the most efficient implementation of the checks.

Second, checking system call arguments is significantly more expensive than checking only system call IDs. This is because the number of arguments per system call and the number of distinct values per argument are often large.

Finally, doubling the security checks in the profiles almost doubles the performance overhead.

C. System Call Locality

We measured all the system calls and arguments issued by our macro benchmarks. Figure 3 shows the frequency of the top calls which, together, account for 86% of all the calls. For example, read accounts for about 18% of all system calls. We further break down each bar into the different argument sets used by the system call. Each color in a bar corresponds to the frequency of a different argument set (or none).

[Figure 3: bar chart. Y-axis: fraction of all system calls (0.00 to 0.18). One bar per system call, starting with read; each bar is split by argument set (no_arg, arg_set_one, arg_set_two, arg_set_three, arg_set_four, arg_set_five, arg_set_other), and the number on top of each bar is its average reuse distance.]

Fig. 3: Frequency of the top system calls and average reuse distance collected from the macro benchmarks.

We see that system calls have a high locality: 20 system calls account for 86% of all the calls. Further, individual system calls are often called with three or fewer different argument sets. At the top of each bar, we show the average reuse distance, defined as the number of other system calls between two system calls with the same ID and argument set. As shown in Figure 3, the average distance is often only a few tens of system calls, indicating high locality in reusing system call checking results.

V. DRACO SYSTEM CALL CHECKING

To minimize the overhead of system call checking, this paper proposes a new architecture called Draco. Draco avoids executing the many if statements of a Seccomp profile at every system call (Figure 1). Draco caches system call IDs and argument set values after they have been checked and validated once. System calls are first looked up in the cache and, on a hit, the system call and its arguments are declared validated, skipping the execution of the Seccomp filter.

Figure 4 shows the workflow. On reception of a system call, a table of validated system call IDs and argument values is checked. If the current system call and arguments match an entry in the table, the system call is allowed to proceed. Otherwise, the OS kernel executes the Seccomp filter and decides if the system call is allowed. If so, the table is updated with the new entry and the system call proceeds; otherwise the program is killed or other actions are taken (Section II-B).

This approach is correct because Seccomp profiles are stateless. This means that the output of the Seccomp filter
execution depends only on the input system call ID and arguments, not on some other state. Hence, a validation that succeeded in the past does not need to be repeated.

[Figure 4: flowchart. syscall(arg1 ... argN) -> Check Table -> Present? If yes, proceed with the system call. If no, execute the Seccomp profile -> Allowed? If yes, update the table and proceed with the system call. If no, terminate the user program.]

Fig. 4: Workflow of Draco system call checking.

Draco can be implemented in software or in hardware. In the following, we first describe the basic design, and then how we implement it in software and in hardware. We start with a design that checks system call IDs only, and then extend it to check system call argument values as well.

A. Checking System Call IDs Only

If we only want to check system call IDs, the design is simple. It uses a table called System Call Permissions Table (SPT), with as many entries as different system calls. Each entry stores a single Valid bit. If the Valid bit is set, the system call is allowed. In Draco's hardware implementation, each core has a hardware SPT. In both hardware and software implementations, the SPT contains information for one process.

When the System Call ID (SID) of the system call is known, Draco finds the SPT entry for that SID, and checks if the Valid bit is set. If it is, the system call proceeds. Otherwise, the Seccomp filter is executed.

B. Checking System Call Arguments

To check system call arguments, we enhance the SPT and couple it with a software structure called Validated Argument Table (VAT). Both SPT and VAT are private to the process. The VAT is the same for both software and hardware implementations of Draco.

Figure 5 shows the structures. The SPT is still indexed with the SID. An entry now includes, in addition to the Valid bit, a Base and an Argument Bitmask field. The Base field stores the virtual address of the section of the VAT that holds information about this system call. The Argument Bitmask stores information to determine what arguments are used by this system call. Recall that a system call can take up to 6 arguments, each 1 to 8 bytes long. Hence, the Argument Bitmask has one bit per argument byte, for a total of 48 bits. A given bit is set if this system call uses this byte as an argument. For example, for a system call that uses two arguments of one byte each, the Argument Bitmask has bits 0 and 8 set.

[Figure 5: the System Call Permissions Table (SPT, hardware or software) is indexed by SID; each entry holds Valid, Base, and Argument Bitmask fields. A Selector applies the Argument Bitmask to Arg 1 ... Arg N, and hash functions H1 and H2 index the Validated Argument Table (VAT, software), which holds the validated argument sets.]

Fig. 5: Checking a system call and its arguments.

The VAT of a process has one structure for each system call allowed for this process. Each structure is a hash table of limited size, indexed with two hash functions. If an entry in the hash table is filled, it holds the values of an argument set that has been found to be safe for this particular system call.

When a system call is encountered, to find the correct entry in the VAT, Draco hashes the argument values. Specifically, when the system call's SID and argument set are known, the SPT is indexed with the SID. Draco uses the Argument Bitmask to select which parts of the arguments to pass to the hash functions and generate hash table indices for the VAT. For example, if a system call uses two arguments of one byte each, only those eight bits of each argument are used to generate the hash table indices.

The address in Base is added to the hash table indices to access the VAT. Draco fetches the contents of the two entries, and compares them to the values of the arguments of the system call. If either of the two comparisons produces a match, the system call is allowed.

Draco uses two hash functions (H1 and H2 in Figure 5) for the following reason. To avoid having to deal with hash table collisions in a VAT structure, which would result in multiple VAT probes, each VAT structure uses 2-ary cuckoo hashing [61], [62]. Such a design resolves collisions gracefully. However, it needs to use two hash functions to perform two accesses to the target VAT structure in parallel. On a read, the resulting two entries are checked for a match. On an insertion, the cuckoo hashing algorithm is used to find a spot.

C. A Software Implementation

Draco can be completely implemented in software. Both the SPT and the VAT are software structures in memory. We build Draco as a Linux kernel component. In the OS kernel, at the entry point of system calls, we insert instructions to read the SID and argument set values (if argument checking is configured). Draco uses the SID to index into the SPT and decides if the system call is allowed or not based on the Valid bit. If argument checking is configured, Draco further takes the correct bits from the arguments to perform the hashes, then reads the VAT, compares the argument values, and decides whether the system call passes or not.

D. An Initial Hardware Implementation

An initial hardware implementation of Draco makes the SPT a hardware table in each core. The VAT remains a software structure in memory. The checks can only be performed when either the SID or both the SID and argument set values are available. To simplify the hardware, Draco waits until the system call instruction reaches the Reorder Buffer (ROB)
head. At that point, all the information about the arguments ROB Head
is guaranteed to be available in specific registers. Hence, the
Draco hardware indexes the SPT and checks the Valid bit. If Reorder Buffer (ROB) Syscall
argument checking is enabled, the hardware further takes the Argument
correct bits from the arguments, performs the hashes, reads 1 SID Syscall Permissions Table (SPT) Bitmask Arg 1 … Arg N
the VAT, compares the argument values, and determines if #Args Selector
the system call passes. If the combination of system call and Base
2 SID Syscall Look-aside Buffer (SLB)
argument set are found not to have been validated, the OS is
invoked to run the Seccomp filter. 4 Arg 1… Arg N
+ H1
VI. D RACO H ARDWARE I MPLEMENTATION Validated Argument Table (VAT) 3
+ H2
The initial hardware implementation of Section V-D has
the shortcoming that it requires memory accesses to read Fig. 7: Flow of operations in an SLB access.
the VAT—in fact, two accesses, one for each of the hash On an SLB miss, the corresponding VAT is accessed. Draco
functions. While some of these accesses may hit in caches, takes the current system call argument set and the Argument
the performance is likely to suffer. For this reason, this section Bitmask from the SPT and, using hash functions H1 and
extends the hardware implementation of Draco to perform the H2 , generates two hash values ( 3 ). Such hash values are
system call checks in the pipeline with very low overhead. combined with the Base address provided by the SPT to access
A. Caching the Results of Successful Checks the VAT in memory. The resulting two VAT locations are
To substantially reduce the overhead of system call check- fetched in parallel, and their contents are compared to the
ing, Draco introduces a new hardware cache of recently- actual system call argument set. On a match, the SLB entry
encountered and validated system call argument sets. The is filled with the SID, the validated argument set, and the one
cache is called System Call Lookaside Buffer (SLB). It is hash value that fetched the correct entry from the VAT ( 4 ).
inspired by the TLB (Translation Lookaside Buffer). The system call is then allowed to resume execution.
Figure 6 shows the SLB. It is indexed with the system call’s SID and the number of arguments that it requires. Since different system calls have different numbers of arguments, the SLB has a set-associative sub-structure for each group of system calls that take the same number of arguments. This design minimizes the space needed to cache arguments, since each sub-structure can be sized individually. Each entry in the SLB has an SID field, a Valid bit, a Hash field, and a validated argument set denoted as <Arg1 ... ArgN>. The Hash field contains the hash value generated using this argument set via either the H1 or H2 hash functions mentioned in Section V-B.
[Figure 6 diagram: the SLB is indexed by SID and #Args. There is one subtable per number of arguments (1 to 6); each subtable is set-associative, with entries of the form SID, Valid, Hash, Arg 1 ... Arg N.]
Fig. 6: System Call Lookaside Buffer (SLB) structure.
Figure 7 describes the SLB operations performed by Draco alongside the SPT and VAT. When a system call is detected at the head of the ROB, its SID is used to access the SPT (1). The SPT uses the Argument Bitmask (Figure 5) to generate the argument count used by the system call. This information, together with the SID, is used to index the SLB (2).
On subsequent accesses to the SLB with the same system call’s SID and argument set, the SLB will hit. A hit occurs when an entry is found in the SLB that has the same SID and argument set values as the incoming system call. On a hit, the system call is allowed to proceed without requiring any memory hierarchy access. In this case, which we assume is the most frequent one, the checking of the system call and arguments has negligible overhead.
B. Preloading Argument Sets
The SLB can significantly reduce the system call checking time by caching recently validated arguments. However, on a miss, the system call stalls at the ROB head until its arguments are successfully checked against data coming from the VAT in memory. To avoid this problem, we want to hide all the stall time by preloading the SLB entry early.
This function is performed by the System Call Target Buffer (STB). The STB is inspired by the Branch Target Buffer (BTB). Its goal is to preload a potentially useful entry in the SLB as soon as a system call is placed in the ROB. Specifically, it preloads in the SLB an argument set that is in the VAT and, therefore, has been validated in the past for the same system call. When the system call reaches the head of the ROB and tries to check its arguments, it is likely that the correct argument set is already preloaded into the SLB.
Fundamentally, the STB operates like the BTB. While the BTB predicts the target location that the upcoming branch will jump to, the STB predicts the location in the VAT that stores the validated argument set that the upcoming system call will require. Knowing this location allows the hardware to preload the information into the SLB in advance.
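As an illustration only, the SLB organization and hit condition described above can be modeled in a few lines of Python. The set-index function (SID modulo the number of sets), the LRU replacement, and all class and method names are assumptions of this sketch rather than details given in the text; the subtable sizes are the ones listed in Table II, and system calls without checked arguments are not modeled.

```python
from collections import OrderedDict

# (entries, ways) per argument count, as listed in Table II.
SUBTABLE_GEOMETRY = {1: (32, 4), 2: (64, 4), 3: (64, 4),
                     4: (32, 4), 5: (32, 4), 6: (16, 4)}


class SLB:
    """Toy software model of the System Call Lookaside Buffer."""

    def __init__(self, geometry=SUBTABLE_GEOMETRY):
        self.ways = {}
        self.sets = {}
        for nargs, (entries, ways) in geometry.items():
            self.ways[nargs] = ways
            # One OrderedDict per set; its insertion order doubles as LRU order.
            self.sets[nargs] = [OrderedDict() for _ in range(entries // ways)]

    def _set_for(self, sid, nargs):
        table = self.sets[nargs]
        return table[sid % len(table)]  # assumed index function

    def access(self, sid, args):
        """Return True on a hit: an entry with the same SID and argument set."""
        s = self._set_for(sid, len(args))
        key = (sid, tuple(args))
        if key in s:
            s.move_to_end(key)  # mark as most recently used
            return True
        return False

    def fill(self, sid, args, hash_value):
        """Install a validated argument set, evicting the LRU entry if the set is full."""
        nargs = len(args)
        s = self._set_for(sid, nargs)
        key = (sid, tuple(args))
        if key not in s and len(s) >= self.ways[nargs]:
            s.popitem(last=False)  # drop the least recently used entry
        s[key] = hash_value
        s.move_to_end(key)
```

In this model, a first `access` for a given SID and argument set misses, a `fill` after validation installs the entry, and later accesses with the same SID and argument values hit without touching any other structure.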
Authorized licensed use limited to: Univ of Calif Santa Barbara. Downloaded on May 15,2021 at 21:10:36 UTC from IEEE Xplore. Restrictions apply.
System Call Target Buffer and Operation. The STB is shown in Figure 8. Each entry stores the program counter (PC) of a system call, a Valid bit, the SID of the system call, and a Hash field. The latter contains the one hash value (of the two possible) that fetched this argument set from the VAT when the entry was loaded into the STB.
[Figure 8 diagram: the PC of a system call in the Reorder Buffer (ROB) is matched against STB entries of the form PC, Valid, SID, Hash.]
Fig. 8: System Call Target Buffer (STB) structure.
The preloading operation into the SLB is shown in Figure 9. As soon as an instruction is inserted in the ROB, Draco uses its PC to access the STB (1). A hit in the STB is declared when the PC in the ROB matches one in the STB. This means that the instruction is a system call. At that point, the STB returns the SID and the predicted hash value. Note that the SID is the correct one because there is only one type of system call in a given PC. At this point, the hardware knows the system call. However, it does not know the actual argument values because, unlike in Section VI-A, the system call is not at the ROB head, and the actual arguments may not be ready yet.
[Figure 9 diagram: (1) the PC of a system call in the Reorder Buffer (ROB) accesses the Syscall Target Buffer (STB), which provides a hash (H1 or H2); (2) the SID accesses the Syscall Permissions Table (SPT), which provides the Base and #Args.]
[...] hash value provided by the STB matches the one stored in the SLB entry. This is shown in Figure 6. We call this an SLB Preload hit. No further action is needed because the SLB likely has the correct entry. Later, when the system call reaches the head of the ROB and the system call argument set is known, the SLB is accessed again with the SID and the argument set. If the argument set matches the one in the SLB entry, it is an SLB Access hit; the system call has been checked without any memory system access at all.
If, instead, no SLB Preload hit occurs, Draco performs the action (ii) above. Specifically, Draco reads from the SPT the base address of the structure in the VAT, reads from the STB the hash value, combines them, and indexes the VAT (4). If a valid entry is found in the VAT, its argument set is preloaded into the SLB (5). Again, when the system call reaches the head of the ROB, this SLB entry will be checked for a match.
C. Putting it All Together
Figure 10 shows the Draco hardware implementation in a multi-core chip. It has three per-core hardware tables: SLB, STB, and SPT. In our conservatively-sized design, they use 8KB, 4KB, and 4KB, respectively. The SLB holds a process’ most popular checked system calls and their checked argument sets. When a system call reaches the head of the ROB, the SLB is queried. The SLB information is backed up in the per-process software VAT in memory. VAT entries are loaded into the SLB on demand by hardware.
[Figure 10 diagram: each core has an SLB, STB, and SPT next to its ROB, TLB, and L1 and L2 caches; the cores share the L3 cache; the per-process VATs (software) reside in main memory.]
Fig. 10: Multicore with Draco structures in a shaded pattern.
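The preload step described in this section can be sketched as a small Python function. This is a simplified model under stated assumptions: the STB, SPT, SLB, and VAT are represented as plain dictionaries, the function name and return labels are hypothetical, and the speculation safeguards discussed in Section IX are not modeled.

```python
def preload(pc, stb, spt, slb, vat):
    """Model the preload attempt made when an instruction enters the ROB.

    stb: dict pc -> (sid, hash)             STB entries
    spt: dict sid -> base                   base of the SID's structure in the VAT
    slb: dict (sid, hash) -> argument set   SLB entries, keyed as the preload check needs
    vat: dict (base, hash) -> argument set  validated argument sets in memory
    Returns "stb_miss", "preload_hit", "preloaded", or "vat_miss".
    """
    entry = stb.get(pc)
    if entry is None:
        return "stb_miss"            # PC not known to be a system call: no preload
    sid, predicted_hash = entry
    if (sid, predicted_hash) in slb:
        return "preload_hit"         # the SLB likely already holds the right entry
    # Combine the SPT base address with the STB hash to index the VAT.
    args = vat.get((spt[sid], predicted_hash))
    if args is None:
        return "vat_miss"
    slb[(sid, predicted_hash)] = args  # preload the validated argument set
    return "preloaded"
```

A preload hit or a successful preload only makes a later SLB access hit likely; the actual argument values are still compared when the system call reaches the ROB head.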
Execution Flow | STB Access | SLB Preload | SLB Access | Action | Speed
1 | Hit | Hit | Hit | None. | Fast
2 | Hit | Hit | Miss | After the SLB access miss, fetch the argument set from the VAT, and fill an entry in the STB with the correct hash, and in the SLB with the correct hash and arguments. | Slow
3 | Hit | Miss | Hit | After the SLB preload miss, fetch the argument set from the VAT, and fill an entry in the SLB with the correct SID, hash and arguments. | Fast
4 | Hit | Miss | Miss | After the SLB preload miss, do as 3. After the SLB access miss, do as 2. | Slow
5 | Miss | N/A | Hit | After the SLB access hit, fill an entry in the STB with the correct SID and hash. | Fast
6 | Miss | N/A | Miss | After the SLB access miss, fetch the argument set from the VAT, and fill an entry in the STB with correct SID and hash, and in the SLB with correct SID, hash, and arguments. | Slow
TABLE I: Possible Draco execution flows. The Slow cases can have different latency, depending on whether the VAT accesses
hit or miss in the caches. If the VAT does not have the requested entry, the OS is invoked and executes the Seccomp filter.
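The six flows of Table I are fully determined by the three hit/miss outcomes, which the following sketch makes explicit. The function name and the string labels are hypothetical and chosen only for illustration.

```python
def classify_flow(stb_hit, preload_hit, access_hit):
    """Return (flow number, speed) for one system call, following Table I.

    preload_hit is ignored when the STB misses, since no preload is
    attempted in that case (N/A in the table).
    """
    if stb_hit:
        if preload_hit:
            flow = 1 if access_hit else 2
        else:
            flow = 3 if access_hit else 4
    else:
        flow = 5 if access_hit else 6
    # In Table I, a flow is fast exactly when the SLB access at the ROB head hits.
    speed = "fast" if access_hit else "slow"
    return flow, speed
```

The sketch shows that the speed of a flow depends only on the final SLB access: the STB and the preload matter because they raise the chance that this access hits.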
set may be found in the caches, which saves accesses to main memory. Finally, if the correct argument set is not found in the VAT, the OS is invoked and executes the Seccomp filter. If such execution validates the system call, the VAT is updated as well with the validated SID and argument set.
Flow 3 occurs when the STB access hits, the SLB preload misses, and Draco initiates an SLB preload that eventually delivers an SLB access hit. As soon as the SLB preload miss is declared, Draco fetches the argument set from the VAT, and fills an entry in the SLB with the correct SID, hash, and arguments. When the system call reaches the ROB head and checks the SLB, it declares an SLB access hit. This is a fast case.
Flow 4 occurs when the STB access hits, the SLB preload misses, Draco’s SLB preload brings incorrect data, and the actual SLB access misses. In this case, after the SLB preload miss, Draco performs the actions in Flow 3; after the SLB access miss, Draco performs the actions in Flow 2.
Flows 5 and 6 start with an STB miss. Draco does not preload the SLB because it does not know the SID. When the system call reaches the head of the ROB, the SLB is accessed. In Flow 5, the access hits. In this case, after the SLB access hit, Draco fills an entry in the STB with the correct SID and hash. In Flow 6, the SLB access misses, and Draco has to fill an entry in both STB and SLB. Flow 5 is fast and Flow 6 is slow.
VII. SYSTEM SUPPORT
A. VAT Design and Implementation
The OS kernel is responsible for filling the VAT of each process. The VAT of a process has a two-way cuckoo hash table for each system call allowed to the process. The OS sizes each table based on the number of argument sets used by the corresponding system call (e.g., based on the given Seccomp profile). To minimize insertion failures in the hash tables, each table is over-provisioned to two times the estimated number of argument sets. On an insertion to a table, if the cuckoo hashing fails after a threshold number of attempts, the OS makes room by evicting one entry.
During a table lookup, either the OS or the hardware (in the hardware implementation) accesses the two ways of the cuckoo hash table. For the hash functions, we use the ECMA [63] and the ¬ECMA polynomials to compute the Cyclic Redundancy Check (CRC) code of the system call argument set (Figure 5).
The base addresses in the SPT are stored as virtual addresses. In the hardware implementation of Draco, the hardware translates this base address before accessing a VAT structure. Due to the small size of the VAT (several KBs for a process), this design enjoys good TLB translation locality, as well as natural caching of translations in the cache hierarchy. If a page fault occurs on a VAT access, the OS handles it as a regular page fault.
B. Implementation Aspects
Invocation of Software Checking. When the Draco hardware does not find the correct SID or argument set in the VAT, it sets a register called SWCheckNeeded. As the system call instruction completes, the system call handler in the OS first checks the SWCheckNeeded register. If it is set, the OS runs system call checking (e.g., Seccomp). If the check succeeds, the OS inserts the appropriate entry in the VAT and continues; otherwise the OS does not allow the system call to execute.
Data Coherence. System call filters are not modified during process runtime to limit attackers’ capabilities. Hence, there is no need to add support to keep the three hardware structures (SLB, STB, and SPT) coherent across cores. Draco only provides a fast way to clear all these structures in one shot.
Context Switches. A simple design would invalidate the three hardware structures on a context switch. To reduce the start-up overhead after a context switch, Draco improves on this design with two simple supports. First, on a context switch, the OS saves to memory a few key SPT entries for the process being preempted, and restores from memory the saved SPT entries for the incoming process. To pick which SPT entries to save, Draco enhances the SPT with an Accessed bit per entry. When a system call hits on one of the SPT entries, the entry’s Accessed bit is set. Periodically (e.g., every 500µs), the hardware clears the Accessed bits. On a context switch, only the SPT entries with the Accessed bit set are saved.
The second support is that, if the process that will be scheduled after the context switch is the same as the one that is being preempted, the structures are not invalidated. This is safe with respect to side-channels. If a different process is to be scheduled, the hardware structures are invalidated.
Simultaneous Multithreading (SMT) Support. Draco can support SMT by partitioning the three hardware structures
and giving one partition to each SMT context. Each context accesses its partition.
VIII. GENERALITY OF DRACO
The discussions so far presented a Draco design that is broadly compatible with Linux’s Seccomp, so we could describe a whole-system solution. In practice, it is easy to apply the Draco ideas to other system call checking mechanisms such as OpenBSD’s Pledge and Tame [12], [13], and Windows’ System Call Disable Policy [64]. Draco is generic to modern OS-level sandboxing and containerization technologies.
Specifically, recall that the OS populates the SPT with system call information. Each SPT entry corresponds to a system call ID and has argument information. Hence, different OS kernels will have different SPT contents due to different system calls and different arguments.
In our design, the Draco hardware knows which registers contain which arguments of system calls. However, we can make the design more general and usable by other OS kernels. Specifically, we can add an OS-programmable table that contains the mapping between the system call argument number and the general-purpose register that holds it. This way, we can use arbitrary registers.
The hardware structures proposed by Draco can further support other security checks that relate to the security of transitions between different privilege domains. For example, Draco can support security checks in virtualized environments, such as when the guest OS invokes the hypervisor through hypercalls. Similarly, Draco can be applied to user-level container technologies such as Google’s gVisor [19], where a user-level guardian process such as the Sentry or Gofer is invoked to handle requests of less privileged application processes. Draco can also augment the security of library calls, such as in the recently-proposed Google Sandboxed API project [26].
In general, the Draco hardware structures are most attractive in processors used in domains that require both high performance and high security. A particularly relevant domain is interactive web services, such as on-line retail. Studies by Akamai, Google, and Amazon have shown that even short delays can impact online revenue [65], [66]. Furthermore, in this domain, security is paramount, and the expectation is that security checks will become more extensive in the near future.
IX. SECURITY ISSUES
To understand the security issues in Draco, we consider two types of side-channels: those in the Draco hardware structures and those in the cache hierarchy.
Draco hardware structures could potentially provide side channels for two reasons: (i) Draco uses speculation as it preloads the SLB before the system call instruction reaches the head of the ROB, and (ii) the Draco hardware structures are shared by multiple processes. Consider the first reason. An adversary could trigger SLB preloading followed by a squash, which could then speed up a subsequent benign access that uses the same SLB entry and reveal a secret. To shield Draco from this speculation-based attack, we design the preloading mechanism carefully. The idea is to ensure that preloading leaves no side effect in the Draco hardware structures until the system call instruction reaches the ROB head.
Specifically, if an SLB preload request hits in the SLB, the LRU state of the SLB is not updated until the corresponding non-speculative SLB access. Moreover, if an SLB preload request misses, the requested VAT entry is not immediately loaded into the SLB; instead, it is stored in a Temporary Buffer. When the non-speculative SLB access is performed, the entry is moved into the SLB. If, instead, the system call instruction is squashed, the temporary buffer is cleared. Fortunately, this temporary buffer only needs to store a few entries (i.e., 8 in our design), since the number of system call instructions in the ROB at a time is small.
The second potential reason for side channels is the sharing of the Draco hardware structures by multiple processes. This case is eliminated as follows. First, in the presence of SMT, the SLB, STB, and SPT structures are partitioned on a per-context basis. Second, in all cases, when a core (or hardware context) performs a context switch to a different process, the SLB, STB, and SPT are invalidated.
Regarding side channels in the cache hierarchy, they can be eliminated using existing proposals against them. Specifically, for cache accesses due to speculative SLB preload, we can use any of the recent proposals that protect the cache hierarchy against speculation attacks (e.g., [67]–[72]). Further, for state left in the cache hierarchy as Draco hardware structures are used, we can use existing proposals like cache partitioning [73], [74]. Note that this type of potential side channel also occurs in Seccomp. Indeed, on a context switch, Draco may leave VAT state in the caches, but Seccomp may also leave state in the caches that reveals what system calls were checked. For these reasons, we consider side channels in the cache hierarchy beyond the scope of this paper.
X. EVALUATION METHODOLOGY
We evaluate both the software and the hardware implementations of Draco. We evaluate the software implementation on the Intel Xeon E5-2660 v3 multiprocessor system described in Section IV-A; we evaluate the hardware implementation of Draco with cycle-level simulations.
A. Workloads and Metrics
We use fifteen workloads split into macro and micro benchmarks. The macro benchmarks are long-running applications, including the Elasticsearch [75], HTTPD, and NGINX web servers, Redis (an in-memory cache), Cassandra (a NoSQL database), and MySQL. We use the Yahoo! Cloud Serving Benchmark (YCSB) [76] using workloada and workloadc with 10 and 30 clients to drive Elasticsearch and Cassandra, respectively. For HTTPD and NGINX, we use ab, the Apache HTTP server benchmarking tool [77] with 30 concurrent requests. We drive MySQL with the OLTP workload of SysBench [78] with 10 clients. For Redis, we use the redis-benchmark [79] with 30 concurrent requests. We also evaluate Function-as-a-Service scenarios, using functions similar to
the sample functions of OpenFaaS [80]. Specifically, we use a pwgen function that generates 10K secure passwords and a grep function that searches patterns in the Linux source tree.
For micro benchmarks, we use FIO from SysBench [78] with 128 files of a total size of 512MB, GUPS from the HPC Challenge Benchmark (HPCC) [81], Syscall from UnixBench [82] in mix mode, and fifo, pipe, domain, and message queue from IPC Bench [83] with 1000B packets.
For macro benchmarks, we measure the mean request latency in HTTPD, NGINX, Elasticsearch, Redis, Cassandra, and MySQL, and the total execution time in the functions. For micro benchmarks, we measure the message latency in the benchmarks from IPC Bench, and the execution time in the remaining benchmarks.
Processor Parameters
Multicore chip | 10 OOO cores, 128-entry Reorder Buffer, 2GHz
L1 (D, I) cache | 32KB, 8 way, write back, 2 cycle access time (AT)
L2 cache | 256KB, 8 way, write back, 8 cycle AT
L3 cache | 8MB, 16 way, write back, shared, 32 cycle AT
Per-Core Draco Parameters
STB | 256 entries, 2 way, 2 cycle AT
SLB (1 arg) | 32 entries, 4 way, 2 cycle AT
SLB (2 arg) | 64 entries, 4 way, 2 cycle AT
SLB (3 arg) | 64 entries, 4 way, 2 cycle AT
SLB (4 arg) | 32 entries, 4 way, 2 cycle AT
SLB (5 arg) | 32 entries, 4 way, 2 cycle AT
SLB (6 arg) | 16 entries, 4 way, 2 cycle AT
Temporary Buffer | 8 entries, 4 way, 2 cycle AT
SPT | 384 entries, 1 way, 2 cycle AT
Main-Memory Parameters
Capacity; Channels | 32GB; 2
Ranks/Channel | 8
Banks/Rank | 8
Freq; Data rate | 1GHz; DDR
Host and Docker Parameters
Host OS | CentOS 7.6.1810 with Linux 3.10
Docker Engine | 18.09.3
TABLE II: Architectural configuration used for evaluation.
B. System Call Profile Generation
There are a number of ways to create application-specific system call profiles using both dynamic and static analysis. They include system call profiling (where system calls not observed during profiling are not allowed in production) [49], [56], [84]–[86] and binary analysis [47], [48], [54].
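To make the profiling-based approach concrete, the sketch below builds an allow-list from recorded trace lines and then rejects anything that was not observed. It is an illustrative assumption, not the toolkit used in this work: the input is assumed to be strace-style text of the form `name(arg, ...) = result`, the argument split is deliberately naive, and the function names are hypothetical.

```python
import re

# Matches lines such as: read(3, "abc", 4096) = 3
CALL_RE = re.compile(r"^(\w+)\((.*)\)\s+=\s+")


def build_profile(trace_lines):
    """Map each observed system call name to the set of argument tuples seen."""
    profile = {}
    for line in trace_lines:
        m = CALL_RE.match(line.strip())
        if m is None:
            continue  # skip signal and exit lines, and anything unparsable
        name, raw_args = m.groups()
        # Naive split: does not handle commas inside quoted strings or structs.
        args = tuple(a.strip() for a in raw_args.split(",")) if raw_args else ()
        profile.setdefault(name, set()).add(args)
    return profile


def is_allowed(profile, name, args=(), check_args=True):
    """Allow a call only if it (and, optionally, its arguments) was observed."""
    if name not in profile:
        return False
    return (not check_args) or tuple(args) in profile[name]
```

With `check_args=False` the check corresponds to an ID-only profile, while `check_args=True` corresponds to a profile that also restricts argument values.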
We build our own software toolkit for automatically generating the syscall-noargs, syscall-complete, and syscall-complete-2x profiles described in Section IV-A for target applications. The toolkit has components for (1) attaching strace onto a running application to collect the system call traces, and (2) generating the Seccomp profiles that only allow the system call IDs and argument sets that appeared in the recorded traces. We choose to build our own toolkit because we find that no existing system call profiling tool includes arguments in the profiles.
For syscall-complete-2x, we run the syscall-complete profile twice in a row. Hence, the resulting profile performs twice the number of checks as syscall-complete. The goal of syscall-complete-2x is to model a near-future environment where we need more extensive security checks.
C. Modeling the Hardware Implementation of Draco
We use cycle-level full-system simulations to model a server architecture with 10 cores and 32 GB of main memory. The configuration is shown in Table II. Each core is out-of-order (OOO) and has private L1 instruction and data caches, and a private unified L2 cache. The banked L3 cache is shared.
We integrate the Simics [87] full-system simulator with the SST framework [88], together with the DRAMSim2 [89] memory simulator. Moreover, we utilize Intel SAE [90] on Simics for OS instrumentation. We use CACTI [91] for energy and access time evaluation of memory structures and the Synopsys Design Compiler [92] for evaluating the RTL implementation of the hash functions. The system call interface follows the semantics of x86 [35]. The Simics infrastructure provides the actual memory and control register contents for each system call. To evaluate the hardware components of Draco, we model them in detail in SST. For the software components, we modify the Linux kernel and instrument the system call handler and Seccomp.
For HTTPD, NGINX, Elasticsearch, Redis, Cassandra, and MySQL, we instrument the applications to track the beginning of the steady state. We then warm up the architectural state by running 250 million instructions, and finally measure for two billion instructions. For functions and micro benchmarks, we warm up the architectural state for 250 million instructions and measure for two billion instructions.
The software stack for the hardware simulations uses CentOS 7.6.1810 with Linux 3.10 and Docker Engine 18.09.3. This Linux version is older than the one used for the real-system measurements of Section IV and the evaluation of the software implementation of Draco in Section XI-A, which uses Ubuntu 18.04 with Linux 5.3.0. Our hardware simulation infrastructure could not boot the newer Linux kernel. However, note that the Draco hardware evaluation is mostly independent of the kernel version. The only exception is during the cold-start phase of the application, when the VAT structures are populated. However, our hardware simulations mostly model steady state and, therefore, the actual kernel version has negligible impact.
Appendix A repeats the real-system measurements of Section IV and the evaluation of the software implementation of Draco for CentOS 7.6.1810 with Linux 3.10.
XI. EVALUATION
A. Performance of Draco
Draco Software Implementation. Figure 11 shows the performance of the workloads using the software implementation of Draco. The figure takes three Seccomp profiles from Section IV-A (syscall-noargs, syscall-complete, and syscall-complete-2x) and compares the performance of the workloads on a conventional system (labeled Seccomp in the figure) to a system augmented with software Draco (labeled DracoSW in the figure). The figure is organized as
[Figure 11 plot: Latency/Execution Time (Normalized to Insecure) for the macro-benchmarks (httpd, nginx, elasticsearch, mysql, cassandra, redis, grep, pwgen, average macro) and the micro-benchmarks (sysbench fio, hpcc, unixbench syscall, fifo (ipc), pipe (ipc), domain (ipc), mq (ipc), average micro). Bars: syscall-noargs, syscall-complete, and syscall-complete-2x, each with Seccomp and with DracoSW.]
Fig. 11: Latency or execution time of the workloads using the software implementation of Draco and other environments. For each workload, the results are normalized to insecure.
[Figure 12 plot: Latency/Execution Time (Normalized to Insecure) for the same macro-benchmarks and micro-benchmarks as in Figure 11. Bars: insecure, syscall-noargs (DracoHW), syscall-complete (DracoHW), and syscall-complete-2x (DracoHW).]
Fig. 12: Latency or execution time of the workloads using the hardware implementation of Draco. For each workload, the results are normalized to insecure.
Figure 2, and all the bars are normalized to the insecure baseline that performs no security checking.
We can see that software Draco reduces the overhead of security checking relative to Seccomp, especially for complex argument checking. Specifically, when checking both system call IDs and argument sets with syscall-complete, the average execution times of the macro and micro benchmarks with Seccomp are 1.14× and 1.25× higher, respectively, than insecure. With software Draco, these execution times are only 1.10× and 1.18× higher, respectively, than insecure. When checking more complex security profiles with syscall-complete-2x, the reductions are higher. The corresponding numbers with Seccomp are 1.21× and 1.42×; with software Draco, these numbers are 1.10× and 1.23× only.
However, we also see that the checking overhead of software Draco is still substantial, especially for some workloads. This is because the software implementation of argument checking
[...] call IDs and argument set values (including the double-size checks). In all cases, the average overhead of hardware Draco over insecure is 1%. Hence, hardware Draco is a secure, overhead-free design.
B. Hardware Structure Hit Ratios
To understand hardware Draco’s performance, Figure 13 shows the hit rates of the STB and SLB structures. For the SLB, we show two bars: Access and Preload. SLB Access occurs when the system call is at the head of the ROB. SLB Preload occurs when the system call is inserted in the ROB and the STB is looked-up. An SLB Preload hit means only that the SLB likely contains the desired entry. An SLB Preload miss triggers a memory request to preload the entry.
[Figure 13 plot: Hit Rate (%) of the STB, SLB Access, and SLB Preload for the macro-benchmarks and micro-benchmarks.]
close to 99%. This means that, in most cases, the working set of the system call IDs and argument set values being used fits in the SLB.
However, even for these four applications, the SLB Access hit rates are 75–93%. This means that SLB preloading is successful in bringing most of the needed entries into the SLB on time, to deliver a hit when the entries are needed. Hence, we recommend the use of SLB preloading.
C. Draco Resource Overhead
Hardware Components. Hardware Draco introduces three hardware structures (SPT, STB, and SLB), and requires a CRC hash generator. In Table III, we present the CACTI analysis for the three hardware structures, and the Synopsys Design Compiler results for the hash generator implemented in VHDL using a linear-feedback shift register (LFSR) design. For each unit, the table shows the area, access time, dynamic energy of a read access, and leakage power. In the SLB, the area and leakage analysis includes all the subtables for the different argument counts and the temporary buffer. For the access time and dynamic energy, we show the numbers for the largest structure, namely, the three-argument subtable.
Parameter | SPT | STB | SLB | CRC Hash
Area (mm²) | 0.0036 | 0.0063 | 0.01549 | 0.0019
Access time (ps) | 105.41 | 131.61 | 112.75 | 964
Dyn. rd energy (pJ) | 1.32 | 1.78 | 2.69 | 0.98
Leak. power (mW) | 1.39 | 2.63 | 3.96 | 0.106
TABLE III: Draco hardware analysis at 22 nm.
Since all the structures are accessed in less than 150 ps, we conservatively use a 2-cycle access time for these structures. Further, since 964 ps are required to compute the CRC hash function, we account for 3 cycles in our evaluation.
SLB Sizing. Figure 14 uses a violin plot to illustrate the probability density of the number of arguments of system calls. The first entry (linux) corresponds to the complete
[...] sets used by that system call (e.g., based on the given Seccomp profile). Since the number of system calls is bounded, the total size of the VAT is bounded. In our evaluation, we find that the geometric mean of the VAT size for a process is 6.98KB across all evaluated applications.
D. Assessing the Security of Application-Specific Profiles
We compare the security benefits of an application-specific profile (syscall-complete from Section IV-A) to the generic, commonly deployed docker-default profile. Recall that syscall-complete checks both syscall IDs and arguments, while docker-default only checks syscall IDs.
Figure 15(a) shows the number of different system calls allowed by the different profiles. First, linux shows the total number of system calls in Linux, which is 403. The next bar is docker-default, which allows 358 system calls. For the remaining bars, the total height is the number allowed by syscall-complete for each application. We can see that syscall-complete only allows 50–100 system calls, which increases security substantially. Moreover, the figure shows that not all of these system calls are application-specific. There is a fraction of about 20% (remaining in dark color) that are required by the container runtime. Note that, as shown in Section IV, while application-specific profiles are smaller, their checking overhead is still substantial.
[Figure 15 plots: (a) Number of system calls allowed (# Syscalls; legend: Total Allowed Syscalls, Application-Specific Syscalls) for linux, docker-default, and each workload; a second panel with legend entries # Arguments Checked and # Arguments and Values, with off-scale annotations 1.3K, 1.6K, and 2.4K.]
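As a quick check of the cycle counts chosen in Section XI-C, the short calculation below converts the access times of Table III into cycles. It assumes the 2 GHz core clock of Table II (a 500 ps cycle); the dictionary and function names are hypothetical.

```python
import math

CLOCK_GHZ = 2.0                 # core frequency from Table II
CYCLE_PS = 1000.0 / CLOCK_GHZ   # 500 ps per cycle

# Access or computation time in ps, from Table III.
LATENCY_PS = {"SPT": 105.41, "STB": 131.61, "SLB": 112.75, "CRC Hash": 964.0}
# Cycle counts used in the evaluation, as stated in Section XI-C.
MODELED_CYCLES = {"SPT": 2, "STB": 2, "SLB": 2, "CRC Hash": 3}


def min_cycles(latency_ps, cycle_ps=CYCLE_PS):
    """Smallest whole number of cycles that covers the given latency."""
    return math.ceil(latency_ps / cycle_ps)


for unit, ps in LATENCY_PS.items():
    print(f"{unit}: {ps} ps needs at least {min_cycles(ps)} cycle(s); "
          f"modeled as {MODELED_CYCLES[unit]}")
```

Under this assumption each table fits within one cycle and the CRC hash within two, so the modeled values of 2 and 3 cycles each leave one cycle of margin.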
[Figure 16 plot: Latency/Execution Time (Normalized to Insecure) for the macro-benchmarks (httpd, nginx, elasticsearch, mysql, cassandra, redis, grep, pwgen, average macro) and the micro-benchmarks (sysbench fio, hpcc, unixbench syscall, fifo (ipc), pipe (ipc), domain (ipc), mq (ipc), average micro). Bars: insecure, docker-default, syscall-noargs, and syscall-complete. Off-scale bars are annotated with the values 2.2, 2.6, 2.7, 4.3, 4.1, 2.7, 3.7, and 2.4.]
Fig. 16: Latency or execution time of the workloads using different Seccomp profiles. For each workload, the results are normalized to insecure (i.e., Seccomp disabled). This experiment uses the older CentOS 7.6.1810 with Linux 3.10.
[Figure 17 plot: Latency/Execution Time (Normalized to Insecure) for the same macro-benchmarks and micro-benchmarks as in Figure 16. Bars: insecure, syscall-noargs (Seccomp), syscall-noargs (DracoSW), syscall-complete (Seccomp), and syscall-complete (DracoSW). Off-scale bars are annotated with the values 2.2, 2.7, 4.3, 2.7, 3.7, and 2.4.]
Fig. 17: Latency or execution time of the workloads using the software implementation of Draco and other environments. For each workload, the results are normalized to insecure. This experiment uses the older CentOS 7.6.1810 with Linux 3.10.
XII. OTHER RELATED WORK 1.10× and 1.18× higher, respectively, than on the insecure
baseline. Finally, with hardware Draco, the execution time was
There is a rich literature on system call checking [93]–
within 1% of the insecure baseline.
[104]. Early implementations based on kernel tracing [93]–
We expect more complex security profiles in the near future.
[99] incur excessive performance penalty, as every system call
In this case, we found that the overhead of conventional
is penalized by at least two additional context switches.
Seccomp checking increases substantially, while the overhead
A current proposal to reduce the overhead of Seccomp is
of software Draco goes up only modestly. The overhead of
to use a binary tree in libseccomp, to replace the branch
hardware Draco remains within 1% of the insecure baseline.
instructions of current system call filters [27]. However, this
does not fundamentally address the overhead. In Hromatka’s ACKNOWLEDGMENTS
own measurement, the binary tree-based optimization still
leads to 2.4× longer system call execution time compared This work was funded in part by NSF under grants CNS-
to Seccomp disabled [27]. Note that this result is without 1956007, CNS-1763658, CCF-1725734, CCF-1816615, CCF-
any argument checking. As shown in Section IV, argument 2029049, and a gift from Facebook. The authors thank Andrea
checking brings further overhead, as it leads to more complex Arcangeli from RedHat, Hubertus Franke and Tobin Feldman-
filter programs. Speculative methods [105]–[107] may not Fitzthum from IBM for discussions on Seccomp performance,
help reduce the overhead either, as their own overhead may and Seung Won Min from UIUC for assisting with the
surpass that of the checking code—they are designed for heavy Synopsys toolchain.
security analysis such as virus scanning and taint analysis.
A PPENDIX
A few architectural security extensions have been proposed
for memory-based protection, such as CHERI [108]–[110], This appendix repeats the real-system measurements of Sec-
CODOMs [111], [112], PUMP [113], REST [114], and Cal- tion IV and the evaluation of software Draco of Section XI-A
iforms [115]. While related, Draco has different goals and for the older CentOS 7.6.1810 with Linux 3.10 (except for
resulting design—it checks system calls rather than load/store the syscall-complete-2x profiles). The KPTI and Spectre
instructions, and its goal is to protect the OS. patches are enabled. The BPF JIT is enabled but Seccomp
does not make use of its functionality. These experiments were
XIII. C ONCLUSION performed for the initial submission of this paper.
To minimize system call checking overhead, we proposed If we compare Figure 16 to Figure 2, we see that the newer
Draco, a new architecture that caches validated system call kernel significantly improves the performance of Seccomp.
IDs and argument values. System calls first check a special Several pathological cases were eliminated and the overhead
cache and, on a hit, are declared validated. We presented of docker-default is reduced. Despite these improvements,
both a software and a hardware implementation of Draco. the overhead of syscall-complete remains significant.
Our evaluation showed that the average execution time of If we compare Figure 17 to Figure 11, we see that
macro and micro benchmarks with conventional Seccomp the improvement attained by software Draco over Seccomp
checking was 1.14× and 1.25× higher, respectively, than on with syscall-complete has decreased. However, software
an insecure baseline that performs no security checks. With Draco still significantly reduces overhead, especially for
software Draco, the average execution time was reduced to syscall-complete-2x.
REFERENCES

[1] “CVE-2017-5123,” https://nvd.nist.gov/vuln/detail/CVE-2017-5123.
[2] “CVE-2016-0728,” https://nvd.nist.gov/vuln/detail/CVE-2016-0728.
[3] “CVE-2014-3153,” https://nvd.nist.gov/vuln/detail/CVE-2014-3153.
[4] “CVE-2017-18344,” https://nvd.nist.gov/vuln/detail/CVE-2017-18344.
[5] “CVE-2018-18281,” https://nvd.nist.gov/vuln/detail/CVE-2018-18281.
[6] “CVE-2015-3290,” https://nvd.nist.gov/vuln/detail/CVE-2015-3290.
[7] “CVE-2016-5195,” https://nvd.nist.gov/vuln/detail/CVE-2016-5195.
[8] “CVE-2014-9529,” https://nvd.nist.gov/vuln/detail/CVE-2014-9529.
[9] “CVE-2014-4699,” https://nvd.nist.gov/vuln/detail/CVE-2014-4699.
[10] “CVE-2016-2383,” https://nvd.nist.gov/vuln/detail/CVE-2016-2383.
[11] J. Edge, “A seccomp overview,” https://lwn.net/Articles/656307/, Sep. 2015.
[12] “OpenBSD Pledge,” https://man.openbsd.org/pledge.
[13] “OpenBSD Tame,” https://man.openbsd.org/OpenBSD-5.8/tame.2.
[14] “PROCESS_MITIGATION_SYSTEM_CALL_DISABLE_POLICY structure,” https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-process_mitigation_system_call_disable_policy.
[15] P. Lawrence, “Seccomp filter in Android O,” https://android-developers.googleblog.com/2017/07/seccomp-filter-in-android-o.html, Jul. 2017.
[16] J. Corbet, “Systemd Gets Seccomp Filter Support,” https://lwn.net/Articles/507067/.
[17] Docker, “Seccomp security profiles for Docker,” https://docs.docker.com/engine/security/seccomp/.
[18] Ubuntu, “LXD,” https://help.ubuntu.com/lts/serverguide/lxd.html#lxd-seccomp.
[19] Google, “gVisor: Container Runtime Sandbox,” https://github.com/google/gvisor/blob/master/runsc/boot/filter/config.go.
[20] AWS, “Firecracker Design,” https://github.com/firecracker-microvm/firecracker/blob/master/docs/design.md.
[21] A. Agache, M. Brooker, A. Iordache, A. Liguori, R. Neugebauer, P. Piwonka, and D.-M. Popa, “Firecracker: Lightweight virtualization for serverless applications,” in 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20). Santa Clara, CA: USENIX Association, 2020, pp. 419–434.
[22] Rtk Documentation, “Seccomp Isolators Guide,” https://coreos.com/rkt/docs/latest/seccomp-guide.html.
[23] Singularity, “Security Options in Singularity,” https://sylabs.io/guides/3.0/user-guide/security_options.html.
[24] Kubernetes Documentation, “Configure a Security Context for a Pod or Container,” https://kubernetes.io/docs/tasks/configure-pod-container/security-context/, Jul. 2019.
[25] Mesos, “Linux Seccomp Support in Mesos Containerizer,” http://mesos.apache.org/documentation/latest/isolators/linux-seccomp/.
[26] “Sandboxed API,” https://github.com/google/sandboxed-api, 2018.
[27] T. Hromatka, “Seccomp and Libseccomp performance improvements,” in Linux Plumbers Conference 2018, Vancouver, BC, Canada, Nov. 2018.
[28] T. Hromatka, “Using a cBPF Binary Tree in Libseccomp to Improve Performance,” in Linux Plumbers Conference 2018, Vancouver, BC, Canada, Nov. 2018.
[29] M. Kerrisk, “Using seccomp to Limit the Kernel Attack Surface,” in Linux Plumbers Conference 2015, Seattle, WA, USA, Aug. 2015.
[30] A. Grattafiori, “Understanding and Hardening Linux Containers,” NCC Group, Tech. Rep., Jun. 2016.
[31] T. Kim and N. Zeldovich, “Practical and Effective Sandboxing for Non-root Users,” in Proceedings of the 2013 USENIX Conference on Annual Technical Conference (USENIX ATC’13), San Jose, CA, Jun. 2013.
[32] C. Kruegel, D. Mutz, F. Valeur, and G. Vigna, “On the Detection of Anomalous System Call Arguments,” in Proceedings of the 8th European Symposium on Research in Computer Security, Gjøvik, Norway, Oct. 2003.
[33] D. Mutz, F. Valeur, G. Vigna, and C. Kruegel, “Anomalous System Call Detection,” ACM Transactions on Information and System Security (TISSEC), vol. 9, no. 1, pp. 61–93, Feb. 2006.
[34] F. Maggi, M. Matteucci, and S. Zanero, “Detecting Intrusions through System Call Sequence and Argument Analysis,” IEEE Transactions on Dependable and Secure Computing (TDSC), vol. 7, no. 4, pp. 381–395, Oct. 2010.
[35] Intel, “Intel 64 and IA-32 Architectures Software Developer’s Manual,” 2019.
[36] Linux, “Secure Computing with filters,” https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt.
[37] T. Garfinkel, “Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools,” in Proceedings of the 2004 Network and Distributed System Security Symposium (NDSS’04), San Diego, California, Feb. 2003.
[38] R. N. M. Watson, “Exploiting Concurrency Vulnerabilities in System Call Wrappers,” in Proceedings of the 1st USENIX Workshop on Offensive Technologies (WOOT’07), Boston, MA, USA, Aug. 2007.
[39] AWS, “Firecracker microVMs,” https://github.com/firecracker-microvm/firecracker/blob/master/vmm/src/default_syscalls/filters.rs.
[40] Moby Project, “A collaborative project for the container ecosystem to assemble container-based systems,” https://github.com/moby/moby/blob/master/profiles/seccomp/default.json.
[41] Singularity, “Singularity: Application Containers for Linux,” https://github.com/sylabs/singularity/blob/master/etc/seccomp-profiles/default.json.
[42] Subgraph, “Repository of maintained OZ profiles and seccomp filters,” https://github.com/subgraph/subgraph-oz-profiles.
[43] “QEMU,” https://github.com/qemu/qemu/blob/master/qemu-seccomp.c.
[44] Julien Tinnes, “A safer playground for your Linux and Chrome OS renderers,” https://blog.cr0.org/2012/09/introducing-chromes-next-generation.html, Nov. 2012.
[45] Julien Tinnes, “Introducing Chrome’s next-generation Linux sandbox,” https://blog.cr0.org/2012/09/introducing-chromes-next-generation.html, Sep. 2012.
[46] “OZ: a sandboxing system targeting everyday workstation applications,” https://github.com/subgraph/oz/blob/master/oz-seccomp/tracer.go, 2018.
[47] “binctr: Fully static, unprivileged, self-contained, containers as executable binaries,” https://github.com/genuinetools/binctr.
[48] “go2seccomp: Generate seccomp profiles from go binaries,” https://github.com/xfernando/go2seccomp.
[49] oci-seccomp-bpf-hook, “OCI hook to trace syscalls and generate a seccomp profile,” https://github.com/containers/oci-seccomp-bpf-hook.
[50] “Kubernetes Seccomp Operator,” https://github.com/kubernetes-sigs/seccomp-operator.
[51] S. Kerner, “The future of Docker containers,” https://lwn.net/Articles/788282/.
[52] Red Hat, “Configuring OpenShift Container Platform for a Custom Seccomp Profile,” https://docs.openshift.com/container-platform/3.5/admin_guide/seccomp.html#seccomp-configuring-openshift-with-custom-seccomp.
[53] Rtk Documentation, “Overriding Seccomp Filters,” https://coreos.com/rkt/docs/latest/seccomp-guide.html#overriding-seccomp-filters.
[54] Q. Zeng, Z. Xin, D. Wu, P. Liu, and B. Mao, “Tailored Application-specific System Call Tables,” The Pennsylvania State University, Tech. Rep., 2014.
[55] C.-C. Tsai, B. Jain, N. A. Abdul, and D. E. Porter, “A Study of Modern Linux API Usage and Compatibility: What to Support When You’re Supporting,” in Proceedings of the 11th European Conference on Computer Systems (EuroSys’16), London, United Kingdom, Apr. 2016.
[56] Z. Wan, D. Lo, X. Xia, L. Cai, and S. Li, “Mining Sandboxes for Linux Containers,” in Proceedings of 2017 IEEE International Conference on Software Testing, Verification and Validation (ICST’17), Tokyo, Japan, Mar. 2017.
[57] Moby Project, “An open framework to assemble specialized container systems without reinventing the wheel,” https://mobyproject.org/, 2019.
[58] K. McAllister, “Attacking hardened Linux systems with kernel JIT spraying,” https://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html, Nov. 2012.
[59] E. Reshetova, F. Bonazzi, and N. Asokan, “Randomization can’t stop BPF JIT spray,” https://www.blackhat.com/docs/eu-16/materials/eu-16-Reshetova-Randomization-Can’t-Stop-BPF-JIT-Spray-wp.pdf, Nov. 2016.
[60] R. Gawlik and T. Holz, “SoK: Make JIT-Spray Great Again,” in Proceedings of the 12th USENIX Workshop on Offensive Technologies (WOOT’18), 2018.
[61] R. Pagh and F. F. Rodler, “Cuckoo Hashing,” Journal of Algorithms, vol. 51, no. 2, pp. 122–144, May 2004.
[62] D. Fotakis, R. Pagh, P. Sanders, and P. Spirakis, “Space Efficient Hash Tables with Worst Case Constant Access Time,” Theory of Computing Systems, vol. 38, no. 2, pp. 229–248, Feb. 2005.
[63] ECMA International, “Standard ECMA-182,” https://www.ecma-international.org/publications/standards/Ecma-182.htm.
[64] Microsoft, “Windows System Call Disable Policy,” https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-process_mitigation_system_call_disable_policy.
[65] Akamai, “Akamai Online Retail Performance Report: Milliseconds Are Critical,” https://www.akamai.com/uk/en/about/news/press/2017-press/akamai-releases-spring-2017-state-of-online-retail-performance-report.jsp.
[66] Google, “Find out how you stack up to new industry benchmarks for mobile page speed,” https://www.thinkwithgoogle.com/marketing-resources/data-measurement/mobile-page-speed-new-industry-benchmarks/.
[67] M. Yan, J. Choi, D. Skarlatos, A. Morrison, C. W. Fletcher, and J. Torrellas, “InvisiSpec: Making Speculative Execution Invisible in the Cache Hierarchy,” in Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-51), 2018.
[68] M. Taram, A. Venkat, and D. Tullsen, “Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization,” in Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’19), 2019.
[69] C. Sakalis, S. Kaxiras, A. Ros, A. Jimborean, and M. Själander, “Efficient Invisible Speculative Execution Through Selective Delay and Value Prediction,” in Proceedings of the 46th International Symposium on Computer Architecture (ISCA’19), 2019.
[70] K. N. Khasawneh, E. M. Koruyeh, C. Song, D. Evtyushkin, D. Ponomarev, and N. Abu-Ghazaleh, “SafeSpec: Banishing the Spectre of a Meltdown with Leakage-Free Speculation,” in Proceedings of the 56th ACM/IEEE Design Automation Conference (DAC’19), 2019.
[71] G. Saileshwar and M. K. Qureshi, “CleanupSpec: An “Undo” Approach to Safe Speculation,” in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), 2019.
[72] P. Li, L. Zhao, R. Hou, L. Zhang, and D. Meng, “Conditional Speculation: An Effective Approach to Safeguard Out-of-Order Execution Against Spectre Attacks,” in International Symposium on High Performance Computer Architecture, 2019.
[73] F. Liu, Q. Ge, Y. Yarom, F. Mckeen, C. Rozas, G. Heiser, and R. B. Lee, “Catalyst: Defeating last-level cache side channel attacks in cloud computing,” in 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2016, pp. 406–418.
[74] Y. Wang, A. Ferraiuolo, D. Zhang, A. C. Myers, and G. E. Suh, “SecDCP: Secure Dynamic Cache Partitioning for Efficient Timing Channel Protection,” in 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), 2016.
[75] Elastic, “Elasticsearch: A Distributed RESTful Search Engine,” https://github.com/elastic/elasticsearch, 2019.
[76] Yahoo!, “Yahoo! Cloud Serving Benchmark,” https://github.com/brianfrankcooper/YCSB, 2019.
[77] Apache, “ab - Apache HTTP server benchmarking tool,” https://httpd.apache.org/docs/2.4/programs/ab.html, 2019.
[78] SysBench, “Scriptable database and system performance benchmark,” https://github.com/akopytov/sysbench.
[79] Redis, “How fast is Redis?” https://redis.io/topics/benchmarks, 2019.
[80] OpenFaaS, “OpenFaaS Sample Functions,” https://github.com/openfaas/faas/tree/master/sample-functions.
[81] HPCC, “HPC Challenge Benchmark,” https://icl.utk.edu/hpcc/.
[82] UnixBench, “BYTE UNIX benchmark suite,” https://github.com/kdlucas/byte-unixbench.
[83] IPCBench, “Benchmarks for Inter-Process-Communication Techniques,” https://github.com/goldsborough/ipc-bench.
[84] L. Lei, J. Sun, K. Sun, C. Shenefiel, R. Ma, Y. Wang, and Q. Li, “SPEAKER: Split-Phase Execution of Application Containers,” in Proceedings of the 14th Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA’17), Bonn, Germany, 2017.
[85] Heroku Engineering Blog, “Applying Seccomp Filters at Runtime for Go Binaries,” https://blog.heroku.com/applying-seccomp-filters-on-go-binaries, Aug. 2018.
[86] Adobe Security, “Better Security Hygiene for Containers,” https://blogs.adobe.com/security/2018/08/better-security-hygiene-for-containers.html, 2018.
[87] P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hållberg, J. Högberg, F. Larsson, A. Moestedt, and B. Werner, “Simics: A Full System Simulation Platform,” IEEE Computer, vol. 35, no. 2, pp. 50–58, Feb. 2002.
[88] A. F. Rodrigues, J. Cook, E. Cooper-Balis, K. S. Hemmert, C. Kersey, R. Riesen, P. Rosenfeld, R. Oldfield, and M. Weston, “The Structural Simulation Toolkit,” in Proceedings of the 2006 ACM/IEEE conference on Supercomputing (SC’06), Tampa, Florida, Nov. 2006.
[89] P. Rosenfeld, E. Cooper-Balis, and B. Jacob, “DRAMSim2: A Cycle Accurate Memory System Simulator,” IEEE Computer Architecture Letters, vol. 10, no. 1, pp. 16–19, Jan. 2011.
[90] N. Chachmon, D. Richins, R. Cohn, M. Christensson, W. Cui, and V. J. Reddi, “Simulation and Analysis Engine for Scale-Out Workloads,” in Proceedings of the 2016 International Conference on Supercomputing (ICS’16), 2016.
[91] R. Balasubramonian, A. B. Kahng, N. Muralimanohar, A. Shafiee, and V. Srinivas, “CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories,” ACM Transactions on Architecture and Code Optimization, vol. 14, no. 2, pp. 14:1–14:25, Jun. 2017.
[92] Synopsys, “Synopsys Design Compiler,” https://www.synopsys.com/.
[93] I. Goldberg, D. Wagner, R. Thomas, and E. A. Brewer, “A Secure Environment for Untrusted Helper Applications Confining the Wily Hacker,” in Proceedings of the 6th USENIX Security Symposium, San Jose, California, Jul. 1996.
[94] D. A. Wagner, “Janus: an Approach for Confinement of Untrusted Applications,” EECS Department, University of California, Berkeley, Tech. Rep. UCB/CSD-99-1056, 2016.
[95] N. Provos, “Improving Host Security with System Call Policies,” in Proceedings of the 12th USENIX Security Symposium, Washington, DC, USA, Aug. 2003.
[96] K. Jain and R. Sekar, “User-Level Infrastructure for System Call Interposition: A Platform for Intrusion Detection and Confinement,” in Proceedings of the 2000 Network and Distributed System Security Symposium (NDSS’00), San Diego, California, USA, Feb. 2000.
[97] T. Garfinkel, B. Pfaff, and M. Rosenblum, “Ostia: A Delegating Architecture for Secure System Call Interposition,” in Proceedings of the 2004 Network and Distributed System Security Symposium (NDSS’04), San Diego, California, Feb. 2004.
[98] A. Acharya and M. Raje, “MAPbox: Using Parameterized Behavior Classes to Confine Untrusted Applications,” in Proceedings of the 9th USENIX Security Symposium, Denver, Colorado, USA, Aug. 2000.
[99] A. Alexandrov, P. Kmiec, and K. Schauser, “Consh: Confined Execution Environment for Internet Computations,” The University of California, Santa Barbara, Tech. Rep., 1999.
[100] D. S. Peterson, M. Bishop, and R. Pandey, “A Flexible Containment Mechanism for Executing Untrusted Code,” in Proceedings of the 11th USENIX Security Symposium, San Francisco, CA, USA, Aug. 2002.
[101] T. Fraser, L. Badger, and M. Feldman, “Hardening COTS Software with Generic Software Wrappers,” in Proceedings of the 1999 IEEE Symposium on Security and Privacy, 1999.
[102] C. M. Linn, M. Rajagopalan, S. Baker, C. Collberg, S. K. Debray, and J. H. Hartman, “Protecting Against Unexpected System Calls,” in Proceedings of the 14th USENIX Security Symposium, Baltimore, MD, USA, Jul. 2005.
[103] A. Dan, A. Mohindra, R. Ramaswami, and D. Sitara, “ChakraVyuha (CV): A Sandbox Operating System Environment for Controlled Execution of Alien Code,” IBM Research Division, T.J. Watson Research Center, Tech. Rep. RC 20742 (2/20/97), Feb. 1997.
[104] D. P. Ghormley, D. Petrou, S. H. Rodrigues, and T. E. Anderson, “SLIC: An Extensibility System for Commodity Operating Systems,” in Proceedings of the Annual Conference on USENIX Annual Technical Conference (USENIX ATC’98), New Orleans, Louisiana, Jun. 1998.
[105] E. B. Nightingale, D. Peek, P. M. Chen, and J. Flinn, “Parallelizing Security Checks on Commodity Hardware,” in Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XIII), Seattle, WA, USA, Mar. 2008.
[106] Y. Oyama, K. Onoue, and A. Yonezawa, “Speculative Security Checks in Sandboxing Systems,” in Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS’05), Denver, CO, USA, Apr. 2005.
[107] H. Chen, X. Wu, L. Yuan, B. Zang, P.-c. Yew, and F. T. Chong, “From Speculation to Security: Practical and Efficient Information Flow Tracking Using Speculative Hardware,” in Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA’08), Beijing, China, Jun. 2008.
[108] J. Woodruff, R. N. Watson, D. Chisnall, S. W. Moore, J. Anderson, B. Davis, B. Laurie, P. G. Neumann, R. Norton, and M. Roe, “The CHERI Capability Model: Revisiting RISC in an Age of Risk,” in Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA’14), 2014.
[109] H. Xia, J. Woodruff, S. Ainsworth, N. W. Filardo, M. Roe, A. Richardson, P. Rugg, P. G. Neumann, S. W. Moore, R. N. M. Watson, and T. M. Jones, “CHERIvoke: Characterising Pointer Revocation Using CHERI Capabilities for Temporal Memory Safety,” in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), 2019.
[110] R. N. M. Watson, R. M. Norton, J. Woodruff, S. W. Moore, P. G. Neumann, J. Anderson, D. Chisnall, B. Davis, B. Laurie, M. Roe, N. H. Dave, K. Gudka, A. Joannou, A. T. Markettos, E. Maste, S. J. Murdoch, C. Rothwell, S. D. Son, and M. Vadera, “Fast Protection-Domain Crossing in the CHERI Capability-System Architecture,” IEEE Micro, vol. 36, no. 5, pp. 38–49, Jan. 2016.
[111] L. Vilanova, M. Ben-Yehuda, N. Navarro, Y. Etsion, and M. Valero, “CODOMs: Protecting Software with Code-Centric Memory Domains,” in Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA’14), 2014.
[112] L. Vilanova, M. Jordà, N. Navarro, Y. Etsion, and M. Valero, “Direct Inter-Process Communication (DIPC): Repurposing the CODOMs Architecture to Accelerate IPC,” in Proceedings of the 12th European Conference on Computer Systems (EuroSys’17), 2017.
[113] U. Dhawan, C. Hritcu, R. Rubin, N. Vasilakis, S. Chiricescu, J. M. Smith, T. F. Knight, B. C. Pierce, and A. DeHon, “Architectural Support for Software-Defined Metadata Processing,” in Proceedings of the 20th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’15), 2015.
[114] K. Sinha and S. Sethumadhavan, “Practical Memory Safety with REST,” in Proceedings of the 45th Annual International Symposium on Computer Architecture (ISCA’18), 2018.
[115] H. Sasaki, M. A. Arroyo, M. T. I. Ziad, K. Bhat, K. Sinha, and S. Sethumadhavan, “Practical Byte-Granular Memory Blacklisting Using Califorms,” in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-52), 2019.