Constant-Time Coding Support
Summary
We (@kumarak, @frabert, @hbrodin, @wizardengineer, and I, all of Trail of Bits) propose a Clang “constant-time selection” builtin that cryptographers can use to ensure that their compiled C and C++ code selects between values in constant time, consistently across target architectures. Our builtin will selectively bypass optimizations that are beneficial for most compiled code but that can replace intended constant-time operations with variable-time jumps or branching, contrary to cryptographers’ needs and expectations. We would love feedback on our approach.
Motivation
An attacker who finds interesting variable-time control flow in compiled code can repeatedly time it to learn when it processes sensitive values, and potentially even what those values are. Rather than using the ternary operator, as is typical in non-cryptographic source code, a cryptographic library developer writing a selection between two values based on some condition often uses a bitwise recipe intended to protect this data from timing leaks:
mask = -(cond);
result = (mask & a) | (~mask & b);
Ideally, this recipe would ensure that the resulting compiled selection between a and b based on cond lacks variable-time target instructions like branches, jumps, or secret-dependent memory accesses, because such variable-time instructions can expose sensitive data to any attacker who can take timing measurements. But recipes like this not only obfuscate code for non-cryptographers; they also cannot bypass all current and potential future LLVM IR and backend optimizations, which means their use does not always result in code that executes in constant time.
Recent work [Geimer 2025, Schneider 2024] shows that early iterations of the InstCombine pass (which runs during the -O1 opt pipeline) replace the constant-time selection recipe’s mask creation and bitwise operations with an IR select. Then, for x86-64 (for instance), select IR instructions may initially be lowered to conditional-move (cmovcc) target instructions. This is good, since cmov-family instructions run in constant time. But backend optimizations like the x86-cmov-conversion pass then replace cmovcc with a conditional jump or branch, which means the pass output is no longer constant-time.
The source code developer can use verification tools [crocs-muni] to identify introduced variable-time instructions like these and try to prevent their use. But the preventative measures the developer can take generally amount to one of the following:
- writing raw constant-time assembly directly, which is not portable;
- learning and using an academic language [Bacelar Almeida 2017, Cauligi 2017] that either produces unportable assembly or produces LLVM bitcode vulnerable to the same backend optimizations as code otherwise compiled with Clang; or
- turning off core LLVM optimizations entirely [Geimer 2025], which is generally impractical, since operations that should be constant-time normally run in the context of code that otherwise benefits from optimization.
Threat model
An attacker can remotely monitor time-sensitive code execution on some host over a network connection. They can choose their own program inputs. They can take multiple measurements, enabling them to overcome any noise or network jitter introduced by their remote position. A local attacker has all the remote attacker’s capabilities and can additionally run their own code on the same host where the code being timed runs. This capability also enables the local attacker to take high-precision measurements directly on the host. Both types of attackers (local, remote) are in scope for this work [Pornin 2025]. Types of timing attacks other than those that exploit variable-time branching are out of scope.
Examples
These are some uses of the constant-time selection recipe in C and C++ where a constant-time Clang builtin, if one existed today, could be used instead to prevent variable execution timings.
uint64_t constant_time_lookup(const size_t secret_idx,
                              const uint64_t table[8]) {
uint64_t result = 0;
for (size_t i = 0; i < 8; i++) {
const bool cond = i == secret_idx;
const uint64_t mask = (-(int64_t)cond);
result |= table[i] & mask;
}
return result;
}
constant_time_lookup:
xor eax, eax
xor ecx, ecx
jmp .LBB0_1
.LBB0_3:
mov rdx, qword ptr [rsi + 8*rcx]
.LBB0_4:
or rax, rdx
inc rcx
cmp rcx, 8
je .LBB0_5
.LBB0_1:
cmp rdi, rcx
je .LBB0_3
xor edx, edx
jmp .LBB0_4
.LBB0_5:
ret
Example 1: C code (top) reproduced from [Sprenkels 2019] and the corresponding assembly (bottom) from Compiler Explorer using Clang (20.1.0) for x86-64 with -O1. The C code was written so that its execution time does not depend on secret_idx. Unfortunately, the resulting x86-64 asm includes a conditional jump based on this secret value (cmp rdi, rcx followed by je .LBB0_3) and a secret-dependent memory access (mov rdx, qword ptr [rsi + 8*rcx]). This means it is vulnerable to timing attacks. Compiler Explorer link here.
void cmovznz4(uint64_t cin, uint64_t *x, uint64_t *y, uint64_t *r) {
uint64_t mask = ~FStar_UInt64_eq_mask(cin, (uint64_t)0U);
uint64_t r0 = (y[0U] & mask) | (x[0U] & ~mask);
uint64_t r1 = (y[1U] & mask) | (x[1U] & ~mask);
uint64_t r2 = (y[2U] & mask) | (x[2U] & ~mask);
uint64_t r3 = (y[3U] & mask) | (x[3U] & ~mask);
r[0U] = r0;
r[1U] = r1;
r[2U] = r2;
r[3U] = r3;
}
cmovznz4:
mv a5, a1
beqz a0, .LBB0_2
mv a5, a2
.LBB0_2:
beqz a0, .LBB0_5
addi a6, a2, 8
bnez a0, .LBB0_6
.LBB0_4:
addi a4, a1, 16
j .LBB0_7
.LBB0_5:
addi a6, a1, 8
beqz a0, .LBB0_4
.LBB0_6:
addi a4, a2, 16
.LBB0_7:
ld a7, 0(a5)
ld a5, 0(a6)
ld a6, 0(a4)
beqz a0, .LBB0_9
addi a1, a2, 24
j .LBB0_10
.LBB0_9:
addi a1, a1, 24
.LBB0_10:
ld a0, 0(a1)
sd a7, 0(a3)
sd a5, 8(a3)
sd a6, 16(a3)
sd a0, 24(a3)
ret
Example 2: A C function intended to be constant-time, reproduced from the appendix of [Schneider 2024] (top). The corresponding RISC-V rv64gc assembly (bottom) was produced in Compiler Explorer with Clang (trunk) at -O1. This asm runs in variable time, since it includes branching dependent on the values of inputs to cmovznz4. Compiler Explorer link here.
#define SIZE 256
#define CONSTANT 1665
void expand_secure(int16_t r[SIZE], const uint8_t msg[32]) {
unsigned int i,j;
int16_t mask;
for(i=0; i < SIZE/8; i++) {
for(j=0; j < 8; j++) {
mask = -(int16_t)((msg[i] >> j)&1);
r[8*i+j] = mask & CONSTANT;
}
}
}
expand_secure:
xor eax, eax
jmp .LBB0_1
.LBB0_5:
inc rax
add rdi, 16
cmp rax, 32
je .LBB0_6
.LBB0_1:
xor ecx, ecx
jmp .LBB0_2
.LBB0_4:
mov word ptr [rdi + 2*rcx], dx
inc rcx
cmp rcx, 8
je .LBB0_5
.LBB0_2:
movzx r8d, byte ptr [rsi + rax]
xor edx, edx
bt r8d, ecx
jae .LBB0_4
mov edx, 1665
jmp .LBB0_4
.LBB0_6:
ret
Example 3: C code intended to be constant-time, reproduced from [Purnal 2024] (top). The resulting assembly (bottom), produced in Compiler Explorer with Clang (trunk) for x86-64 with -O1, includes an input-dependent, variable-time sequence of bt, jae, mov. Compiler Explorer link here.
Technical Approach
Proposed work
Our paired Clang builtin and LLVM intrinsic will enable the source developer to bypass any optimizations that do not respect the constant-time selection recipe. __builtin_ct_select(cond, a, b) will call a co-developed LLVM intrinsic, llvm.ct.select(cond, a, b), so that optimizations that would otherwise introduce variable-time, data-dependent control flow can be bypassed just for the selection. The intrinsic will then introduce a ctselect pseudo-instruction. Later IR transformations or machine-IR (target instruction) transformations will recognize this pseudo-instruction and emit constant-time lowerings for it.
A previous effort from around ten years ago [Simon 2018] implemented a similarly motivated Clang builtin for constant-time selection, but supported it only in the x86-64 backend. Unfortunately, this earlier work was neither proposed to this community for feedback nor upstreamed. We also observed that it does not persist the constant-time property through modern x86-64 backend optimizations to the final x86-64 output [Simon 2017]. Closing this gap, each backend that sees our ctselect pseudo-instruction will lower it using cmovcc or the target-appropriate equivalent (like csel for ARMv8 and later) when such instructions are available, or will otherwise lower it using constant-time bitwise target instructions.
Example (pseudocode) usage
Before
mask = -(cond);
result = (a & mask) | (b & ~mask);
Proposed After
#define HAS_CT_SELECT __has_builtin(__builtin_ct_select)
#if HAS_CT_SELECT
#define CTSELECT(mask, a, b) __builtin_ct_select((mask), (a), (b))
#else
#define CTSELECT(mask, a, b) (((a) & (mask)) | ((b) & ~(mask)))
#endif
result = CTSELECT(mask, a, b);
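To make the intended effect concrete, here is a hedged sketch of Example 1 rewritten with the CTSELECT macro above (illustrative only: __builtin_ct_select does not exist yet, and when it is unavailable this code still compiles via the bitwise fallback, with the caveat that the fallback carries no compiler guarantee):
#include <stddef.h>
#include <stdint.h>

uint64_t constant_time_lookup(const size_t secret_idx,
                              const uint64_t table[8]) {
  uint64_t result = 0;
  for (size_t i = 0; i < 8; i++) {
    // All-ones when i == secret_idx, all-zeros otherwise.
    const uint64_t mask = 0 - (uint64_t)(i == secret_idx);
    // Every iteration reads table[i], so the access pattern is uniform;
    // the proposed builtin would keep the selection itself branch-free.
    result |= CTSELECT(mask, table[i], (uint64_t)0);
  }
  return result;
}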
Limitations
This work is scoped to address only branching-related timing attacks. This means that:
- As previously mentioned, there are other types of timing side-channel attacks, such as those based on cache-related operations, that this work cannot mitigate.
- This work is not intended to, and in fact cannot, prevent hardware-level security issues like Spectre or Rowhammer.
Constant-time selection alone cannot handle all constant-time cryptographic coding needs (e.g., division). This means that further constant-time functionality, either independent of this work or built on top of __builtin_ct_select, may need to be added in the future.
Our initial implementation does not use ARM DIT or Intel DOIT. Not every ARM target supports DIT, and not every Intel target supports DOIT. Adding support for DIT and DOIT to the ARM and x86 LLVM backends would require further significant changes and discussion. Moreover, enabling the DOIT/DIT features may require OS-level privileges, e.g., to write to the related MSR (model-specific register) on x86. We think support for these features could be added at the source library level, outside LLVM, once our initial implementation is in place.
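For example, a source library could expose a helper along these lines (a minimal sketch, not part of this proposal's deliverables; it assumes an AArch64 core with FEAT_DIT and a compiler targeting armv8.4-a or later, where PSTATE.DIT is writable from user mode):
#if defined(__aarch64__)
// Sketch: set PSTATE.DIT so that DIT-covered instructions execute with
// data-independent timing. Unlike the x86 DOIT MSR mentioned above, this
// PSTATE bit is architecturally writable from EL0 on cores with FEAT_DIT.
static inline void set_dit(void) {
  __asm__ volatile("msr dit, #1");
}
#endif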
Open Implementation Questions
Avoiding node merging
To maintain the constant-time guarantees described here, we must generate machine instructions that are guaranteed not to be merged by later peephole optimizations. The implementation we are currently sketching achieves this with custom emission logic that uses the instruction lowering APIs:
MachineInstr *MI = BuildMI(…, TII->get(AArch64::CSELWr), …);
MI->setFlag(MachineInstr::NoMerge); // keep later passes from merging this instruction
We’d love to hear about methods that might exist today to obtain the same effect via regular instruction selection patterns. Below we sketch roughly what we are thinking, as a pseudocode TableGen target definition containing an example “NoMerge” annotation that prevents further node merging wherever it appears:
def : Pat<
  (AArch64ctselect GPR32:$tval, GPR32:$fval, (i32 imm:$cc), NZCV),
  (CSELWr !!NoMerge!! GPR32:$tval, GPR32:$fval, (i32 imm:$cc))>;
This method would provide a way to specify target-specific lowerings for the new pseudo-instruction without writing custom expansion logic. However, the issue remains of how to safely handle the intrinsic on targets for which the pseudo-instruction does not yet have custom logic. In an ideal world, we’d handle this at the SelectionDAG level by writing generic expansion logic that turns ctselect into a tree of bitwise operations, tagged with something similar to the NoMerge example tag. The idea would be to ensure that regular patterns cannot match against nodes tagged with this flag, and that the flag is then maintained in the generated machine instructions. To the best of our knowledge, such a mechanism does not currently exist in LLVM, and adding one would probably be a large architectural change. Any feedback on possible alternatives is welcome.
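For reference, a rough sketch of the generic expansion we have in mind follows, written against current SelectionDAG APIs (the expandCTSelect helper is hypothetical, and the crucial part, tagging the resulting nodes and machine instructions so later passes cannot fold them back into a select or branch, is exactly the mechanism that does not exist today):
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Hypothetical generic expansion of ctselect into straight-line bitwise
// nodes: mask = 0 - zext(cond); result = (a & mask) | (b & ~mask).
static SDValue expandCTSelect(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
                              SDValue Cond, SDValue A, SDValue B) {
  SDValue Zext = DAG.getNode(ISD::ZERO_EXTEND, DL, VT, Cond);
  SDValue Mask = DAG.getNode(ISD::SUB, DL, VT,
                             DAG.getConstant(0, DL, VT), Zext);
  SDValue AM = DAG.getNode(ISD::AND, DL, VT, A, Mask);
  SDValue BM = DAG.getNode(ISD::AND, DL, VT, B, DAG.getNOT(DL, Mask, VT));
  return DAG.getNode(ISD::OR, DL, VT, AM, BM);
}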
Fallback support
After implementing our core mechanism, initially for at least the x86 and ARM backends, we plan to expand backend support to more architectures, e.g., AArch64, MIPS, and RISC-V. Since a fail-open strategy could result in unintuitive behaviour for the source developer, we currently plan to fail closed, meaning that any backend that does not yet implement ctselect but receives source that includes the builtin would fail to compile it and produce an error. The source developer could use __has_builtin to check for the builtin and provide an alternative implementation at the source level for cases where ctselect is not yet supported. We would appreciate thoughts on whether fail-closed or something else is most suitable here.
ARM Thumb support
While a source code developer may use the -mthumb command-line flag, or specify a target triple that includes Thumb, to force Thumb instruction generation, the Clang default at present is to generate A32 (ARM) assembly unless the source developer has used the Thumb attribute in their code. For these reasons, for now we plan to implement ARM target lowering for our intrinsic only for A32. We would also like input on whether Thumb-mode support would be useful before we add it.
Future Work
According to [Bernstein 2024], constant-time boolean selection alone cannot support all cryptographic coding needs. Once we have implemented __builtin_ct_select, we will additionally publish a source library of constant-time helpers that use our selection builtin; this source-level work will also demonstrate how to correctly use the builtin in source code.
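As one hedged sketch of what a helper in that library might look like (the name ct_memeq and its shape are illustrative, not the library's final API):
#include <stddef.h>
#include <stdint.h>

// Illustrative constant-time buffer comparison built on the proposed
// builtin: returns 1 if x and y are equal, 0 otherwise, without
// branching on the contents of either buffer.
int ct_memeq(const uint8_t *x, const uint8_t *y, size_t len) {
  uint8_t acc = 0;
  for (size_t i = 0; i < len; i++)
    acc |= (uint8_t)(x[i] ^ y[i]); // nonzero iff any byte differs
  // Select between 1 and 0 in constant time based on the accumulator.
  return __builtin_ct_select(acc == 0, 1, 0);
}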
Beyond the initial implementation, we also see several promising directions for extending this work:
Further builtins
If there is appetite, we would propose extending our work at the LLVM and Clang level with builtins for more constant-time recipes [Aumasson 2019, Intel 2022].
Further languages
For example, Rust could also benefit from our implementation. This would provide Rust’s cryptographic ecosystem, including projects like RustCrypto and ring, with the same constant-time guarantees without extensive implementation effort. The modular nature of this work means that any language targeting LLVM IR could potentially reuse parts of our implementation, such as the llvm.ct.select intrinsic.
Further architectures
Beyond the architectures currently targeted (x86-64, ARMv7, AArch64, RISC-V, MIPS-32), our approach naturally extends to other LLVM backends like WebAssembly (WASM), where constant-time guarantees are especially challenging due to the abstract nature of the execution environment and the variety of runtime implementations.