Background
We are experimenting with CFI in Fuchsia’s kernel using the normal suite of CFI schemes. We’ve run into a few issues around functions, each of which falls into one of these categories:
1. We jump from one kernel module to another but the destination address isn’t a normal function pointer known in LTO’d code. CFI will trap during these calls because we’re effectively jumping to an arbitrary address outside the current module.
2. We indirectly call functions from a previous kernel module in the current kernel module. CFI will also trap when calling these functions because they are defined outside the current module.
3. We compare addresses of functions defined outside the LTO unit. These comparisons fail because CFI unconditionally replaces direct references to each function with its corresponding entry in the CFI jump table, but there’s no guaranteed order between different jump table entries.
extern "C" void region_start(); // Defined outside the LTO unit.
extern "C" void region_end(); // Defined outside the LTO unit.
void check(uintptr_t addr) {
  assert((uintptr_t)region_start <= addr && addr < (uintptr_t)region_end);
}
None of these indicate actual bugs: in each case CFI can’t possibly know about the functions involved. However, the current workarounds for each of them aren’t very desirable.
Existing Solutions
[[clang::no_sanitize("cfi")]] on functions that would invoke indirect calls
That is, we have a large function body that does a handful of indirect calls and we suppress them by disabling CFI for the whole function. This is not very desirable because it omits checks for places where we would want CFI checks. If we ever move around the code that would do indirect calls, we’d need to also double check that this attribute would be attached to the new functions.
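A minimal sketch of this workaround, assuming Clang's `[[clang::no_sanitize("cfi")]]` spelling. The names `NO_SANITIZE_CFI`, `Handler`, `process`, `double_it`, and `add_one` are hypothetical and only illustrate how the attribute suppresses more checks than intended:

```cpp
// Hypothetical helper macro: expands to the Clang attribute, and to nothing
// on compilers that do not recognize it.
#if defined(__clang__)
#define NO_SANITIZE_CFI [[clang::no_sanitize("cfi")]]
#else
#define NO_SANITIZE_CFI
#endif

using Handler = int (*)(int);

int double_it(int x) { return 2 * x; }
int add_one(int x) { return x + 1; }

// The attribute covers the whole body: the call through `trusted` also loses
// its CFI check, even though only the call through `external` needs the
// exemption.
NO_SANITIZE_CFI int process(Handler external, Handler trusted, int value) {
  int intermediate = external(value);  // Needs to be unchecked.
  return trusted(intermediate);        // Would ideally stay checked.
}
```

If the call through `external` later moves into another function, the attribute has to move with it, which is the maintenance burden described above.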
-fno-sanitize-cfi-canonical-jump-tables
This flag ensures that references to functions defined inside the LTO-unit refer to the real function definition rather than the CFI jump table. However, this doesn’t do the reciprocal that we want for issue (3). That is, references in the LTO-unit to functions outside the LTO-unit still refer to the jump table.
function_nocfi
This is a macro used by Linux that expands to asm for directly getting the address of a function. It works and addresses issue (3) but isn’t very clean and requires an implementation for each unique arch. Linux maintainers weren’t very interested in this solution either.
__builtin_function_start
This was originally added for the ClangBuiltLinux project as an alternative to function_nocfi. Wrapping a direct reference to a function with this ensures we can get the address of the function definition rather than the jump table entry. This is a cleaner approach that helps address issue (3) but doesn’t address issues (1) or (2).
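As a rough illustration, the builtin might be used as below. This is a sketch that assumes a Clang version providing `__builtin_function_start` and falls back to the ordinary function address elsewhere. `region_marker` and `region_marker_address` are hypothetical names, and `region_marker` is defined locally only so the snippet is self-contained:

```cpp
#include <cstdint>

// __has_builtin is itself a compiler extension; treat it as "no" if absent.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Hypothetical stand-in for a symbol that would normally live outside the
// LTO unit.
extern "C" void region_marker() {}

// Returns the address of the function body where the builtin is available.
// Otherwise returns the ordinary function address, which under CFI may be a
// jump table entry.
inline uintptr_t region_marker_address() {
#if __has_builtin(__builtin_function_start)
  return reinterpret_cast<uintptr_t>(__builtin_function_start(region_marker));
#else
  return reinterpret_cast<uintptr_t>(&region_marker);
#endif
}
```

The builtin has to be applied at each site that names the function directly, so every address comparison like the `check` example needs to be wrapped individually.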
Wrapper Class
This would involve having something akin to std::function that would just have [[clang::no_sanitize("cfi")]] on its Call method. This addresses issues (1) and (2) but it would be nicer if there was a compiler-based approach that would do this rather than having a downstream solution used by one project.
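One possible shape for such a wrapper is sketched below. This is not an existing Fuchsia or LLVM class: `UncheckedFunction`, `NO_SANITIZE_CFI`, and `sum_two` are hypothetical names, and the wrapper only handles plain function pointers.

```cpp
#include <utility>

// Hypothetical helper macro: expands to the Clang attribute, and to nothing
// on compilers that do not recognize it.
#if defined(__clang__)
#define NO_SANITIZE_CFI [[clang::no_sanitize("cfi")]]
#else
#define NO_SANITIZE_CFI
#endif

template <typename Signature>
class UncheckedFunction;

// Wraps a plain function pointer so that the unchecked indirect call is
// confined to a single small method.
template <typename R, typename... Args>
class UncheckedFunction<R(Args...)> {
 public:
  explicit UncheckedFunction(R (*target)(Args...)) : target_(target) {}

  // The only place where the CFI check on the indirect call is suppressed.
  NO_SANITIZE_CFI R Call(Args... args) const {
    return target_(std::forward<Args>(args)...);
  }

 private:
  R (*target_)(Args...);
};

// Hypothetical callee used to exercise the wrapper.
int sum_two(int a, int b) { return a + b; }
```

A caller would construct `UncheckedFunction<int(int, int)> f(sum_two);` and invoke `f.Call(2, 3)`. Callers keep their own CFI instrumentation because the attribute sits only on `Call`.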
Proposed Attribute + Semantics
We would like to propose a new attribute called no_cfi that is only applied to function types. The new attribute would have the following semantics:
- Indirect calls to a function type with this attribute will not be instrumented with CFI. That is, the indirect call will not be checked. Note that this only changes the behavior for indirect calls on pointers to function types having this attribute. It does not prevent all indirect function calls for a given type from being checked.
- All direct references to a function whose type has this attribute will always reference the true function definition rather than an entry in the CFI jump table.
- When a pointer to a function with this attribute is implicitly cast to a pointer to a function without this attribute, the compiler will give a warning saying this attribute is discarded. This warning can be silenced with an explicit C-style cast or C++ static_cast.
Having an attribute should allow us to only make changes on appropriate function signatures and function pointer types.
Example Usage
#define NO_CFI __attribute__((no_cfi))
struct Args {
  void (NO_CFI *no_cfi_func)();
};
// Module 1
void local_func() {}
// CFI is still enabled for the whole function but the `unchecked_handoff` invocation is not checked.
void do_handoff(void (NO_CFI *unchecked_handoff)(struct Args *)) {
  Args args = {&local_func};
  ...
  unchecked_handoff(&args);
}
// Module 2 - We came here from the call to `unchecked_handoff`.
void handoff_entry(struct Args *args) {
  args->no_cfi_func(); // No CFI check here. We can safely call `local_func` from Module 1.
  ...
  void (*func)() = args->no_cfi_func; // warning: Cast discards `no_cfi` attribute.
}
extern "C" void NO_CFI region_start(); // Defined outside the LTO unit.
extern "C" void NO_CFI region_end(); // Defined outside the LTO unit.
void check(uintptr_t addr) {
  // References to these functions are not instrumented and refer to the real function definition.
  assert((uintptr_t)region_start <= addr && addr < (uintptr_t)region_end);
}
Implementation
I think the implementation should be straightforward. At the AST level, we track any indirect calls to a function whose type has this attribute and simply skip CFI instrumentation at that callsite. Inside CodeGenFunction::EmitCall, we do a check against the CallExpr function type. There’s a block that emits the type test on an indirect call. We essentially just don’t enter this block if the CallExpr function type has this attribute. No unique IR emission should be needed here, and all later passes that depend on the type test intrinsic will ignore this callsite. If we ever take the address of a symbol (via a DeclRefExpr) and that symbol has a function type with this attribute, we can wrap it with a no_cfi constant so that LowerTypeTestsModule::replaceCfiUses will ignore this reference.