Beyond The PDP-11: Architectural Support For A Memory-Safe C Abstract Machine
looking for cases where programs rely on implementation- structure [20]. This may or may not be permitted behav-
defined behavior related to a specific understanding of the ior according to the standard, due to the ambiguous def-
machine’s memory model, and thus would be fragile in the inition of ‘object’. Use of this idiom would break with
presence of different interpretations. When a compiler en- any implementation that associates strict bounds check-
counters undefined behavior, it is free to do whatever it ing with a pointer based on its static type, but not one that
wishes. For example, if the value of a is the result of un- employs the bounds of the original object.
defined behavior, then it is acceptable for the compiler to This is a special case of pointer subtraction, but its
optimize a == b and a != b to the same value. In contrast, use breaks assumptions that we would like to make for
implementation-defined behavior must be self-consistent. pointer bounds: A compiler can statically insert bounds
For example, the value of sizeof(int) is implementation information on a pointer to a structure field, which can be
defined and may be 4, 8, or some other value, but must be enforced by the underlying substrate—preventing some
consistent within a program. categories of pointer error with a fine granularity.
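The Container idiom (recovering a pointer to an enclosing structure from a pointer to one of its members) can be sketched as follows. This is a minimal illustration, not code taken from any of the surveyed kernels: the CONTAINER_OF macro, struct list_node, struct packet, and the helper functions are hypothetical names chosen for the example. It shows why bounds derived from the static type of the member pointer would break the idiom, since the computed address lies before the start of the member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive list node embedded in a larger structure. */
struct list_node {
    struct list_node *next;
};

/* Hypothetical enclosing structure; the names are illustrative only. */
struct packet {
    int id;
    struct list_node link;
};

/*
 * Recover a pointer to the enclosing structure by subtracting the
 * member's offset from the member's address. The result points outside
 * the member itself, so per-field bounds would reject it.
 */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct packet *packet_from_link(struct list_node *node)
{
    return CONTAINER_OF(node, struct packet, link);
}

/* Returns the id read back through the recovered pointer. */
int container_demo(void)
{
    struct packet p = { 42, { NULL } };
    struct packet *q = packet_from_link(&p.link);

    assert(q == &p);   /* holds when bounds follow the original object */
    return q->id;
}
```

With bounds taken from the original object (here, p), the subtraction stays in range; with bounds narrowed to the link field, the same access would be refused.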
We assume that all of the code that we inspected works
correctly, that all of the idioms that we find are intentional, Sub refers to any arbitrary pointer subtraction. High-level
and that the code depends on them working as expected. We languages commonly lack pointer subtraction, preferring
identified the following idioms: a model where pointers are always references to objects
(a base and a bounds), and accesses to object fields or ar-
Deconst refers to programs that remove the const qualifier ray elements require that the programmer have a pointer
from a pointer. This will break with any implementation to the object. With pointer subtraction, bounds checking
that enforces the const at run time. §6.7.3.4 [16] states: requires an offset as well as the base and bounds and so
If an attempt is made to modify an object defined with bounds-checked pointers are larger.
a const-qualified type through use of an lvalue with non- II refers to computation of invalid intermediate results. The
const-qualified type, the behavior is undefined. C specification allows pointers to point one element after
This means that such removal is permitted unless the the end of an array, but not be dereferenced when point-
object identified by the pointer is declared const, but this ing outside of a valid object. This case refers to pointer
guarantee is very hard to make statically and the removal arithmetic where the end result is within the bounds of an
can violate programmer intent. object, but intermediate results are not. This is undefined
behavior according to the C specification and makes even
We would like to be able to make a const pointer a
conservative garbage collection impossible unless a valid
guarantee that nothing that receives the pointer may write
pointer to the object is guaranteed to also exist.
to the resulting memory. This allows const pointers to be
passed across security-domain boundaries. Without this being expected to work, we could trap as
soon as a pointer went out of range, rather than waiting
Container describes behavior in a macro common in the until it is dereferenced. This would be useful mostly as a
Linux, BSD, and Windows kernels that, given a pointer
to a structure member, returns a pointer to the enclosing
debugging aid, as there are no extra security or reliability guarantees from preventing these operations.

Int refers to storing a pointer in an integer variable in memory, which is implementation-defined behavior in C. We ignore variables where the store-load sequence is optimized away in calculating these. Disallowing this behavior makes accurate garbage collection possible, as the compiler can statically track every pointer use. It also eliminates a category of error where a value is manipulated as an integer and later interpreted as a pointer.

IA refers to performing integer arithmetic on pointers, such as storing a pointer in an integer value and then performing arbitrary arithmetic on it. This is a more general case of the Int idiom and relies on the same implementation-defined behavior. It also makes conservative garbage collection impossible, as the garbage collector can assume that every integer may be a pointer and still do a reasonable job, but cannot do any collection if pointers can be easily hidden.

Mask refers to simple masking of pointers, for example to store some other data in the low bits. This relies on pointers having a known representation. Breaking this allows more efficient pointer representations, for example the “low-fat pointers” [18] representation for fat pointers.

Wide refers to storing a pointer in an integer variable of a smaller size. This is undefined according to the C specification, but may work if you are able to guarantee that pointers are within a certain range, for example by allocating memory with mmap() and the MAP_32BIT flag.
Code using this idiom is broken by existing implementations, and most likely reflects bugs in the code. We were surprised to see examples of this in programs that we inspected, but fortunately it is sufficiently rare that fixing all of the cases would be easy in these codebases.
This idiom is the result of assuming that sizeof(int) == sizeof(void*) or sizeof(int32_t) == sizeof(void*). This assumption was true for desktop computers for a long time, and mobile devices until recently. Had we run the same experiment 15 years ago, we expect we would have seen many more instances of this assumption. We conclude that C codebases adapt (over time) to changes in pointer behavior, and that additional small changes are not impossible to support.

Last Word refers to accessing an object as aligned words without regard for the fact that the object’s extent may not include all of the last word. This is used as an optimization for strlen() in FreeBSD libc. While this is undefined behavior in C, it works in systems with page-based memory protection mechanisms, but not in CHERI where objects have byte granularity. We have found this idiom only in FreeBSD’s libc, as reported by valgrind.

The relatively large number of instances of pointer subtraction in C code that we observed (Table 1) supports our anecdotal observation that the lack of this support in the original C implementation for CHERI (described in detail in the next section) would be problematic. We have not been able to find the Last Word pattern by static analysis and thus have not included it. Note that most of the cases of invalid intermediates also involve subtraction; we have predominantly classified instances as subtraction if the pointers are dereferenced immediately after the subtraction, indicating either that the code is buggy or the pointer is valid.

Arbitrary arithmetic on integers that are then cast to pointers is rare. The majority of the occurrences that we have seen are in the malloc() implementation in FreeBSD libc (jemalloc [13]). This is difficult to avoid, as malloc() is outside of the C abstract machine. The C specification indicates that each block of memory returned by malloc() is an object and that it is undefined behavior to use it after calling free(). This means that, with a strict interpretation of the specification, it is impossible for the code inside free() to recycle the memory. Similarly, the memory that has not yet been returned by malloc() is not yet part of the C abstract machine. In real implementations, the compiler makes sufficient allowances to permit these functions to be implemented in C atop some more primitive functionality (mmap() or brk()), which deals with pages of memory.

We investigate tcpdump more closely in §5.2, where we discuss porting it to two CHERI variants. It is worth noting that numerous cases of invalid intermediates involve bounds-checking code: These are not required at all if the underlying implementation supports bounds checking. We therefore observe that, for at least some code, modifications to allow compatibility with a restricted memory-safe implementation of C would simplify the code.

3. The C abstract machine

Having identified a set of idioms that an implementation must support, we now look at the requirements imposed by the language. It is possible to implement the C abstract machine for any Turing-complete target, but that does not mean it is easy or efficient to do so. When considering a low-level language like C, there are three important requirements on a compilation target:

Expressiveness: It must be sufficiently expressive to capture the semantics of the abstract machine.

Efficiency: The primitive operations in the abstract machine must map simply to primitive operations in the underlying system, allowing low-overhead implementation.

Exposure: The features of the underlying system must be exposed to the programmer. For example, the language should expose the register types of the underlying architecture as primitive types.
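Returning to the idioms surveyed above, the Mask idiom can be illustrated with a short sketch. It assumes a conventional implementation in which uintptr_t exists, converting a pointer to uintptr_t exposes its address bits, and the object is 8-byte aligned; none of this is guaranteed by the C standard. The names tag_pointer, pointer_tag, untag_pointer, and mask_demo are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Low three bits are free when the pointee is 8-byte aligned (an assumption). */
#define TAG_MASK ((uintptr_t)7)

/* Hide a small tag in the unused low bits of an aligned pointer. */
void *tag_pointer(void *p, unsigned tag)
{
    uintptr_t bits = (uintptr_t)p;

    assert((bits & TAG_MASK) == 0);   /* pointer must be 8-byte aligned */
    assert(tag <= TAG_MASK);          /* tag must fit in three bits */
    return (void *)(bits | (uintptr_t)tag);
}

/* Extract the tag without touching the pointee. */
unsigned pointer_tag(const void *p)
{
    return (unsigned)((uintptr_t)p & TAG_MASK);
}

/* Clear the tag bits to recover the original address. */
void *untag_pointer(void *p)
{
    return (void *)((uintptr_t)p & ~TAG_MASK);
}

/* Returns the recovered tag, or -1 if the round trip did not work. */
int mask_demo(void)
{
    static _Alignas(8) long long cell = 1234;
    void *tagged = tag_pointer(&cell, 5);
    long long *plain = untag_pointer(tagged);

    if (plain != &cell || *plain != 1234)
        return -1;
    return (int)pointer_tag(tagged);
}
```

On an implementation that attaches bounds or validity information to pointers, the masked value may no longer be usable as a pointer, which is why this idiom constrains the choice of pointer representation.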
A language can be described as low level with regard to arithmetic operation to be followed by a branch-on-carry
a particular platform only if it meets these requirements. It with a trap instruction (ud2 on x86) at the destination. The
must be possible to compile the language to run on the target clang variant also permits a handler function to recover from
system, but also to do so without introducing significant the overflow. This was used to implement a prototype of
abstraction layers that hinder performance or hide the details the As-if Infinitely Ranged (AIR) proposal by CERT [11].
of the underlying architecture. Similar work was done later by MIT’s KINT [31] system,
This section discusses the flexibility that the C abstract defining an integer equivalent of the not-a-number (NaN)
machine allows implementers, both with respect to the prim- values found in floating point systems. These efforts have all
itive types used for memory access (integers and pointers), been purely focused on the compiler, and not on extending
and the layout and semantics of memory itself. We take par- the underlying architecture.
ticular note of changes that can be made to enforce mem-
ory safety without compromising the low-level nature of C. 3.1.2 Pointers
The C abstract machine does not include an idea of address In BCPL [24], an ancestor of C, pointers and integers were
translation—intentionally, as not all targets have MMUs— both words: pointers were words that could be derefer-
so, from the perspective of userspace C programs running on enced. In contrast, C was intended to support minicomputer
an operating system, we always mean virtual memory when and mainframe architectures, including segmented memory
we refer to memory as described by the abstract machine. models and microcontrollers with separate integer and ad-
dress registers. The PDP-11 model of C follows closely from
3.1 Primitive types BCPL: pointers are integers, any pointer can be cast to any
The C specification is intentionally vague on the representa- sufficiently large integer type, and results of casting can be
tion of most primitive types, allowing significant variety. used as pointers. Pointer arithmetic is just integer arithmetic.
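The PDP-11 model of pointers can be made concrete with a small sketch. It assumes a flat-address implementation that provides uintptr_t and on which integer arithmetic on an address matches pointer arithmetic; the standard guarantees only the round trip through void *, and the function name pdp11_model_demo is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the element reached through integer arithmetic, or -1 on mismatch. */
int pdp11_model_demo(void)
{
    int values[4] = { 10, 20, 30, 40 };

    /* Pointer -> integer -> pointer round trip through void *. */
    uintptr_t addr = (uintptr_t)(void *)values;
    int *same = (void *)addr;
    assert(same == values);   /* the one step the standard does guarantee */

    /* Implementation-defined: treat the address as a plain integer and
       step two elements forward with integer arithmetic. */
    uintptr_t third = addr + 2 * sizeof(int);
    int *p = (int *)third;

    return (p == &values[2]) ? *p : -1;
}
```

The second half of the function is exactly the behavior that a bounds-carrying pointer representation has to decide whether to preserve.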
This behavior is not mandated by the C standard. The
3.1.1 Ranges and representations
abstract machine divides memory into objects: regions of
Before we can consider modifying how pointers are im- memory with an associated type. Pointer arithmetic that ends
plemented in C, we should examine the requirements that outside of the original object is undefined, with the exception
the specification places on the implementations of primi- that pointers may point one element past the end of arrays,
tive types. §5.2.4.2.1 [16] is devoted to the ranges of integer but such a pointer is valid only for comparison, not for deref-
types. It defines a minimum range that each can represent. erencing. §7.20.1.4 [16] defines an optional intptr_t as an
Implementations may support larger ranges, with the excep- integer type that can store a pointer and have the pointer
tion of the char type. C11 requires it to be at least 8 bits, value preserved. There is no guarantee that any arithmetic
but POSIX requires that it is exactly 8 bits, providing a con- on intptr_t will result in a valid pointer and dereferencing
crete practical limit for implementations wishing to run code any such pointer is implementation defined. Pointer compar-
designed for any UNIX-like system. isons between pointers to different objects are undefined, al-
Within this size, the only constraint is that a signed char lowing the implementation to move objects as long as it does
must be able to represent integers between -127 and 127. so atomically with respect to the running code.
Most modern machines opt to use two’s complement arith-
metic, as it simplifies many operations. Two’s complement 3.1.3 The null pointer
representation allows values from -128 to 127 to be repre- As per §6.3.2.3 [16], the integer value 0 has a special mean-
sented. This leaves one value that can be represented, but ing when cast to a pointer: It must be distinct from any valid
which is outside of the range required by the specification. object in the system. This value is relevant (according to the
Most arithmetic operations can overflow. §6.2.5 [16] specification) only when it is an integer constant expression
defines that overflow of unsigned values should wrap. that evaluates to 0. This distinction is important. For exam-
Signed overflow is undefined and may produce a trap rep- ple, the specification does not require that the following be a
resentation: A value that may trap if used and whose use null pointer:
makes any subsequent behavior undefined.
int i = 0; void *p = (void *)i;
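The overflow rules quoted above (unsigned arithmetic wraps, signed overflow is undefined) can be illustrated in portable C. The sketch below only illustrates the wording of the specification, not any hardware trapping scheme; the function names wrapping_add, checked_add, and overflow_demo are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Unsigned overflow is defined to wrap modulo UINT_MAX + 1. */
unsigned wrapping_add(unsigned a, unsigned b)
{
    return a + b;
}

/* Signed overflow is undefined, so test the operands before adding.
   Returns false (leaving *out untouched) if a + b would overflow. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Small self-check of both behaviors. */
void overflow_demo(void)
{
    int sum = 0;

    assert(wrapping_add(UINT_MAX, 1u) == 0u);
    assert(checked_add(2, 3, &sum) && sum == 5);
    assert(!checked_add(INT_MAX, 1, &sum));
    assert(!checked_add(INT_MIN, -1, &sum));
}
```

A software check like checked_add is the cost that a trapping signed-add instruction would remove.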
These two facts lead to a potential implementation that
allows cheap trapping on overflow in hardware. All signed This distinction is difficult to support in modern compil-
arithmetic that overflowed would be set to this trap value, ers, as discussed in a recent LLVM mailing list thread [1].
and all signed arithmetic operations that took this value and
an input would raise an exception. This would necessitate 3.2 State of the unions
different instructions for signed and unsigned addition, but Footnote 95 in §6.5.2.3 [16] contains the exception to the
there is some precedent for this. MIPS, for example, includes normal aliasing rules:
operations that differ only by whether they trap on overflow. If the member used to read the contents of a union object is
Trapping on signed overflow is not a new idea. Both clang not the same as the member last used to store a value in the
and gcc provide a -ftrapv flag that causes every signed object, the appropriate part of the object representation of
the value is reinterpreted as an object representation in the new type... This might be a trap representation.

This requirement is one of the things that makes C useful in low-level contexts: It is possible to subvert the type system and interpret memory as different forms. Supporting unions can be difficult in environments with guarantees about type safety but is important for our exposure requirement.

3.3 Code and data memory

C is intended to be usable on microcontrollers with separate address spaces for code and data. A numerical pointer value may be different when interpreted as referring to data and code. In particular, a void* may refer to any kind of data, but is not guaranteed to be able to represent function pointers. POSIX breaks this separation by introducing the void *dlsym(...) function, used to look up a symbol in a shared library. This is beyond the scope of the C language specification, which has no notion of shared libraries, but is very important, e.g., for finding plugin interfaces. Unfortunately, looking up function pointers is a common use for dlsym, and is not defined behavior in C.

3.4 Const enforcement

The const qualifier indicates that a pointer should not be used to write memory. In typical C implementations, the immutability of const-qualified objects varies. Literal strings are mapped read-only into the running process. Attempting to modify one will cause a segmentation fault. It is therefore safe to remove a const qualifier only if you know exactly where it was inserted. Unfortunately, the C specification contains functions such as memchr, which takes a const-qualified pointer as the first argument, and returns a (non-const) pointer derived from it, stripping the const qualifier. This function signature exists because the C type system cannot express the real requirement: that the function has a contract not to modify the buffer, and that the mutability associated with the returned pointer should be equal to the rights that the caller has to the first argument.

C. This idiom is now very uncommon, due to its poor interaction with pipelined processors.

Relying on the alignment to identify unused bits in a pointer is more common. On a 64-bit platform, most values are 8-byte aligned, and so the low 3 bits in a pointer are zero. It is therefore safe for a program to store information in these bits. The ARMv8 architecture [3] extends this with a mode guaranteeing that the MMU will not use the top 8 bits in a virtual address when performing address translation, allowing them to be used for storing program-specific data.

3.6 Garbage hoarding

To add full memory safety to C would require temporal as well as spatial safety. We can add spatial safety within the confines of the abstract machine by changing the representation of pointers to include bounds. Efficiently adding temporal safety is more difficult.

Existing conservative garbage collectors for C, such as the Boehm-Demers-Weiser collector [5], fail in cases outlined in §3.5 because they make assumptions about pointer representations. The ability to recover pointers from integer arithmetic makes accurate garbage collection impossible because any integer value may potentially be combined with others to form a valid address.

It is not possible to implement a copying or relocating garbage collector if it is possible for object addresses to escape from the collector. The conservative garbage collectors that are possible allow garbage objects to be accidentally hoarded by integers that contain valid addresses.

We believe that efficient implementations of full temporal safety will require C implementations that are valid within the requirements of the C abstract machine, but have unexpected behavior for much existing code. It is undefined behavior in C to compare two pointers to different addresses, but in spite of this it is common to use addresses for comparison in trees or as input to hashes for hash tables. This works in current implementations but would break if a collector is allowed to modify addresses.
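The memchr signature discussed in §3.4 can be seen in a short sketch. It relies on the C11 declaration void *memchr(const void *s, int c, size_t n); the helper names first_comma and deconst_demo are hypothetical, and the explicit cast is included only to make the loss of the qualifier visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Accepts a const-qualified buffer but hands back a writable pointer into
   it, mirroring how memchr itself drops the qualifier. */
char *first_comma(const char *text, size_t len)
{
    return (char *)memchr(text, ',', len);
}

/* Returns 1 if the write through the de-const'ed pointer took effect. */
int deconst_demo(void)
{
    char buffer[] = "a,b";   /* a writable array, so the store below is defined */
    char *comma = first_comma(buffer, sizeof buffer);

    assert(comma != NULL);
    *comma = ';';
    return strcmp(buffer, "a;b") == 0;
}
```

Had buffer been a string literal or an object defined const, the same store would be undefined behavior; that is the case a run-time const enforcement mechanism would catch.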
Table 4. Lines of code changed to port from MIPS to CHERIv2 and CHERIv3
[Figure: benchmark plots. Axes: Dhrystones (per second); Time (seconds). Series and categories: MIPS, CHERIv2, CHERIv3; Bisort, MST, Treeadd, Perimeter.]
The effort required to port C code to a memory-safe environment depends significantly on the coding style employed. We have observed that the code that could most benefit from a memory-safe substrate (judging from published vulnerabilities related to memory errors) is also the code that is harder to port. This observation is one of the motivations for allowing unmodified MIPS code in our CHERI implementation: it allows us to apply coarse-grained sandboxing to such code. This means that even if we have to use non-memory-safe code, the damage it can cause can be restricted.

The two benchmark suites that we modified are both conservative in their pointer use. They required changes only to annotate some values with __capability, and performed little or no unusual pointer arithmetic. In a pure capability environment, no annotation would be required. This is fairly common in benchmark code. The performance difference between them is primarily due to the larger pointers causing more cache misses. In spite of the relatively small cache on CHERI (16KB L1, 64KB L2), this overhead is significant only in the Olden benchmarks, which create and walk pointer-based data structures. Neither a CPU-focused benchmark nor a real application showed significant slowdown.

The tcpdump codebase is very different from the benchmarks in terms of porting effort. Packet dissection involves substantial pointer arithmetic, ironically often in service of hand-crafted software bounds checking. Unfortunately, tcpdump typically runs as root (to access raw packet data from a network interface), and is often used for inspecting suspicious network traffic. This means that its packet parsers, written using extensive pointer arithmetic, are exposed to malicious data. Crafting malicious packets that crash tcpdump (if not execute arbitrary code) to blind opponents is a common strategy in capture-the-flag competitions. The code would benefit substantially from strong memory safety, yet it is written in a style that makes this very difficult.

Porting tcpdump to CHERIv2 required changes to around 1.6K lines to avoid pointer subtraction: around 2.5% of the total codebase. CHERIv3 is fully able to support the parts of the C standard expected by these codebases. Only two lines of code had to be changed, in just one file, to ensure that tcpdump has read-only access to the packet being parsed rather than the whole packet buffer. This change was not strictly required, but provided stronger and finer-grained protection than would otherwise be possible.

We then modified the compiler to support a new ABI in which all pointers are implemented as capabilities, including references to on-stack objects, which are derived from a stack capability. We then compiled two versions of zlib using this mode. The first is entirely unmodified internally, but requires some annotations in a header to allow it to work with code using the MIPS ABI. This was limited to a single pragma at the start and end of the library header, and was required because zlib passes pointers across the library interface. This approach breaks binary compatibility for the library (though it retains source compatibility), so we also implemented a second version that preserves binary compatibility by copying structures whose layouts have changed whenever they are passed across the library boundary.

Figure 4 shows the results for compressing files of varying sizes with the two modified and one unmodified version of zlib, linked to the gzip program. Simply using capabilities incurs no measurable overhead for large files and a small overhead for small files (there is a small constant overhead for setting up the capability environment). Copying data at the library boundaries to maintain binary compatibility incurs around a 21% overhead, independent of file size.

6. Related work

Substantial prior work targets improving C memory safety. Early examples include Cyclone [17], which explicitly broke compatibility with C to define a safer C dialect. Cyclone’s abstraction is close to our model, but adds many static annotations. Although Cyclone was not widely adopted, it influenced pointer annotation in current C compilers.

The copious buffer overflow vulnerabilities in C codebases have spurred development of a number of other bounds-checking systems. Research tools HardBound [12] and SoftBound [22] add fine-grained bounds checking to C, followed by commercial work, such as Intel’s Memory Protection Extensions [15]. None of these approaches handles some of the complex cases (for example, xor linked lists) outlined here, the key difference being that the HardBound and SoftBound approaches will fail closed (preventing potentially dangerous memory accesses if they can’t track pointer provenance), whereas the Intel solution will fail open (allowing unsafe access if the bounds can’t be determined).

Memory-related exploit techniques have spawned substantial work in vulnerability scanning and dynamic mitigation [27]. Static analysis tools such as lint [14] and general bug pattern-matching tools [4] are effective for local analyses, but are limited by weak type safety, global program complexity, and difficult-to-analyze dynamic behaviors. Dynamic protections such as stack canaries and address-space layout randomization offer probabilistic protection for specific exploits [7], but suffer a constant escalation [29].

To run untrusted C code within a process that contains other data rights, SFI [30] and Google’s Native Client (NaCl) [36] provide coarse-grained isolation for native code. Similar techniques like Robusta [28] (and earlier work on process isolation [10]) isolate almost a million lines of C code libraries from JVMs. Without isolation or a memory-safe substrate, a single pointer error in the native code can damage the entire VM, including the typesafe Java code.

SAFECode and SVA [8, 9] provide control flow integrity and memory safety in TCB code, specifically in kernels, via compiler transforms. We believe that the run-time overhead of these techniques would be lower with hardware assistance described in this paper, and would provide powerful tools for quickly moving existing code to such a substrate.

Emscripten [37] is an interesting example of an unusual deployment target for C, running C code in a JavaScript VM. The runtime violates our exposure requirement by creating the C heap in a single JavaScript object and not exposing the JavaScript memory model. Microsoft’s various approaches to running C++ code in the .NET CLR [19] also provide some inspiration. These introduce different types of pointer to differentiate between fully GC’d memory and “pinned” pointers, which have stable integer values. With better hardware support for protection, we argue that it becomes much easier to implement systems like these, with low-level code running in the same environment as managed code.

7. Conclusion

Lack of memory safety in C leads to real and significant vulnerabilities, particularly as near-ubiquitous networking for conventional computers and mobile devices has exposed users to unprecedented malicious activity. This has reinvigorated interest in fine-grained memory protection to mitigate exploit techniques. However, widespread assumptions about pointer behavior, unwarranted by the C abstract machine but consistent with the PDP-11 memory model, produce substantial compatibility problems in real-world programs that might most benefit from additional protection.

To address these problems, we survey C-pointer idioms across a large open-source corpus to understand the effective consensus on pointer use that constrains memory-protection schemes. We are then able to describe a range of points in the protection and compatibility tradeoff space.

Observing potentially significant source-code compatibility, we propose a new model, fully described and implemented as the CHERIv3 ISA, that combines CHERI’s capability-system model with results from recent fat-pointer research. Whole-program testing against conventional RISC, CHERIv2, and CHERIv3 illustrates significantly improved source-code compatibility relative to a pure capability-system model, as well as modest performance improvement. These results confirm that it is possible to retain the strong semantics of a capability-system memory model (which provides non-bypassable memory protection) without sacrificing the advantages of a low-level language.

Acknowledgments

We thank our colleagues, especially Ross Anderson, Gregory Chadwick, Nirav Dave, Khilan Gudka, Steve Hand, Stephen Kell, Ben Laurie, Ilias Marinos, A. Theodore Markettos, Ed Maste, Andrew W. Moore, Prashanth Mundkur, Steven J. Murdoch, Robert Norton, Hassen Saidi, Peter Sewell, Stacey Son, and Bjoern Zeeb; we also thank our anonymous reviewers for their feedback.

This work is part of the CTSRD and MRC2 projects that are sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contracts FA8750-10-C-0237 and FA8750-11-C-0249. The views, opinions, and/or findings contained in this paper are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of the Department of Defense or the U.S. Government. We gratefully acknowledge Google, Inc. for its sponsorship.
References

[1] Is address space 1 reserved? URL http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/080288.html.

[2] Aleph One. Smashing the stack for fun and profit. Phrack Magazine, 7:14–16, 1996.

[3] ARM Architecture Reference Manual. ARMv8, for ARMv8-A architecture profile. ARM Limited, 110 Fulbourn Road, Cambridge, England CB1 9NJ, 2013.

[4] A. Bessey, K. Block, B. Chelf, A. Chou, B. Fulton, S. Hallem, C. Henri-Gros, A. Kamsky, S. McPeak, and D. Engler. A few billion lines of code later: Using static analysis to find bugs in the real world. Commun. ACM, 53(2):66–75, Feb. 2010. ISSN 0001-0782. URL http://doi.acm.org/10.1145/1646353.1646374.

[5] H.-J. Boehm and M. Weiser. Garbage collection in an uncooperative environment. Softw. Pract. Exper., 18(9):807–820, Sept. 1988. ISSN 0038-0644. URL http://dx.doi.org/10.1002/spe.4380180902.

[6] R. Chandra, V. Padmanabhan, and M. Zhang. CRAWDAD data set microsoft/osdi2006 (v. 2007-05-23). Downloaded from http://crawdad.org/microsoft/osdi2006/, May 2007.

[7] C. Cowan, P. Wagle, C. Pu, S. Beattie, and J. Walpole. Buffer overflows: attacks and defenses for the vulnerability of the decade. In DARPA Information Survivability Conference and Exposition, 2000. DISCEX ’00. Proceedings, volume 2, pages 119–129 vol.2, 2000.

[8] J. Criswell, A. Lenharth, D. Dhurjati, and V. Adve. Secure virtual architecture: A safe execution environment for commodity operating systems. In SOSP ’07: Proceedings of the Twenty First ACM Symposium on Operating Systems Principles, October 2007.

[9] J. Criswell, N. Geoffray, and V. Adve. Memory safety for low-level software/hardware interactions. In Proceedings of the Eighteenth Usenix Security Symposium, August 2009.

[10] G. Czajkowski, L. Daynes, and M. Wolczko. Automated and portable native code isolation. In Software Reliability Engineering, 2001. ISSRE 2001. Proceedings. 12th International Symposium on, pages 298–307, Nov 2001.

[11] R. Dannenberg, W. Dormann, D. Keaton, R. Seacord, D. Svoboda, A. Volkovitsky, T. Wilson, and T. Plum. As-if infinitely ranged integer model. In Software Reliability Engineering (ISSRE), 2010 IEEE 21st International Symposium on, pages 91–100, Nov 2010.

[12] J. Devietti, C. Blundell, M. M. K. Martin, and S. Zdancewic. Hardbound: Architectural support for spatial safety of the C programming language. SIGPLAN Not., 43(3):103–114, Mar. 2008. ISSN 0362-1340. URL http://doi.acm.org/10.

introduction-to-intel-memory-protection-extensions, July 2013.

[16] ISO. ISO/IEC 9899:2011 Information technology - Programming languages - C. International Organization for Standardization, Geneva, Switzerland, Dec. 2011. URL http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=57853.

[17] T. Jim, J. G. Morrisett, D. Grossman, M. W. Hicks, J. Cheney, and Y. Wang. Cyclone: A safe dialect of C. In Proceedings of the General Track of the Annual Conference on USENIX Annual Technical Conference, ATEC ’02, pages 275–288, Berkeley, CA, USA, 2002. USENIX Association. ISBN 1-880446-00-6. URL http://dl.acm.org/citation.cfm?id=647057.713871.

[18] A. Kwon, U. Dhawan, J. M. Smith, T. F. Knight, Jr., and A. DeHon. Low-fat pointers: Compact encoding and efficient gate-level implementation of fat pointers for spatial safety and capability-based security. In Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS ’13, pages 721–732, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2477-9. URL http://doi.acm.org/10.1145/2508859.2516713.

[19] Managed C++. Managed extensions for C++ specification. http://msdn.microsoft.com/en-us/library/Aa712867 (accessed 2014/07/14).

[20] Microsoft Corporation. CONTAINING_RECORD macro. URL http://msdn.microsoft.com/en-us/library/windows/hardware/ff542043%28v=vs.85%29.aspx.

[21] Mitre. CWE/SANS top 25 most dangerous software errors, 2011. URL http://cwe.mitre.org/top25.

[22] S. Nagarakatte, J. Zhao, M. M. Martin, and S. Zdancewic. Softbound: Highly compatible and complete spatial memory safety for C. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’09, pages 245–258, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-392-1. URL http://doi.acm.org/10.1145/1542476.1542504.

[23] G. C. Necula, S. McPeak, and W. Weimer. Ccured: Type-safe retrofitting of legacy code. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’02, pages 128–139, New York, NY, USA, 2002. ACM. ISBN 1-58113-450-9. URL http://doi.acm.org/10.1145/503272.503286.

[24] M. Richards. BCPL: A Tool for Compiler Writing and System Programming. In Proceedings of the May 14-16, 1969, Spring Joint Computer Conference, AFIPS ’69 (Spring), pages 557–566, New York, NY, USA, 1969. ACM. URL http://doi.acm.org/10.1145/1476793.1476880.
1145/1353536.1346295.
[25] D. Ritchie, S. Johnson, M. Lesk, and B. Kernighan. UNIX
[13] J. Evans. A scalable concurrent malloc(3) implementation for
time-sharing system: The C programming language. Bell
FreeBSD. In BSDCan, 2006.
System Technical Journal, 57(6):1991–2019, July-Aug 1978.
[14] Gimpel Software. FlexeLint for C/C++, August 2014. URL [26] J. Saltzer and M. Schroeder. The protection of information
http://www.gimpel.com/html/flex.htm. in computer systems. Proceedings of the IEEE, 63(9):1278–
[15] Intel Plc. Introduction to Intel R memory protection exten- 1308, September 1975. URL http://www.multicians.
sions. http://software.intel.com/en-us/articles/ org.
[27] H. Shahriar and M. Zulkernine. Mitigating program security vulnerabilities: Approaches and challenges. ACM Comput. Surv., 44(3):11:1–11:46, June 2012. ISSN 0360-0300. URL http://doi.acm.org/10.1145/2187671.2187673.
[28] M. Sun, G. Tan, J. Siefers, B. Zeng, and G. Morrisett. Bringing Java’s wild native world under control. ACM Trans. Inf. Syst. Secur., 16(3):9:1–9:28, Dec. 2013. ISSN 1094-9224. URL http://doi.acm.org/10.1145/2535505.
[29] L. Szekeres, M. Payer, T. Wei, and D. Song. Eternal war in memory. In IEEE Symposium on Security and Privacy, 2013.
[30] R. Wahbe, S. Lucco, T. E. Anderson, and S. L. Graham. Efficient software-based fault isolation. In SOSP ’93: Proceedings of the fourteenth ACM Symposium on Operating Systems Principles, pages 203–216, New York, NY, USA, 1993. ACM. ISBN 0-89791-632-8.
[31] X. Wang, H. Chen, Z. Jia, N. Zeldovich, and M. F. Kaashoek. Improving integer security for systems with KINT. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pages 163–177, Berkeley, CA, USA, 2012. USENIX Association. ISBN 978-1-931971-96-6. URL http://dl.acm.org/citation.cfm?id=2387880.2387897.
[32] X. Wang, N. Zeldovich, M. F. Kaashoek, and A. Solar-Lezama. Towards optimization-safe systems: Analyzing the impact of undefined behavior. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP ’13, pages 260–275, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2388-8. URL http://doi.acm.org/10.1145/2517349.2522728.
[33] R. N. Watson, P. G. Neumann, J. Woodruff, J. Anderson, D. Chisnall, B. Davis, B. Laurie, S. W. Moore, S. J. Murdoch, and M. Roe. Capability Hardware Enhanced RISC Instructions: CHERI Instruction-set architecture. Technical Report UCAM-CL-TR-850, University of Cambridge, Computer Laboratory, Apr. 2014. URL http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-850.pdf.
[34] R. N. M. Watson, P. G. Neumann, J. Woodruff, J. Anderson, D. Chisnall, B. Davis, B. Laurie, S. W. Moore, S. J. Murdoch, and M. Roe. Capability Hardware Enhanced RISC Instructions: CHERI Instruction-set architecture. Technical Report UCAM-CL-TR-864, University of Cambridge, Computer Laboratory, 15 JJ Thomson Avenue, Cambridge CB3 0FD, United Kingdom, phone +44 1223 763500, Dec. 2014. URL http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-864.pdf.
[35] J. Woodruff, R. N. M. Watson, D. Chisnall, S. W. Moore, J. Anderson, B. Davis, B. Laurie, P. G. Neumann, R. Norton, and M. Roe. The CHERI capability model: Revisiting RISC in an age of risk. In Proceedings of the 41st International Symposium on Computer Architecture (ISCA 2014), June 2014.
[36] B. Yee, D. Sehr, G. Dardyk, J. B. Chen, R. Muth, T. Ormandy, S. Okasaka, N. Narula, and N. Fullagar. Native client: A sandbox for portable, untrusted x86 native code. Commun. ACM, 53(1):91–99, Jan. 2010. ISSN 0001-0782. URL http://doi.acm.org/10.1145/1629175.1629203.
[37] A. Zakai. Emscripten: An LLVM-to-JavaScript Compiler. In Proceedings of the ACM International Conference Companion on Object Oriented Programming Systems Languages and Applications Companion, SPLASH ’11, pages 301–312, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0942-4. URL http://doi.acm.org/10.1145/2048147.2048224.