Given a function such as

    ; RUN: llc < %s
    define void @big_stack() "probe-stack"="inline-asm" uwtable {
    start:
      %_two_page_stack = alloca [8192 x i8], align 1
      ret void
    }

the following assembly is generated:

    big_stack:
            .cfi_startproc
            subq    $4096, %rsp
            movq    $0, (%rsp)
            subq    $3968, %rsp
            .cfi_def_cfa_offset 8072
            addq    $8064, %rsp
            .cfi_def_cfa_offset 8
            retq

Here the unwind tables are not accurate while stack probing is in progress: `rsp` is adjusted, but the CFA offset is not updated until the whole probing sequence has finished. As a result, attempts to obtain a stack trace will fail if the current instruction is anywhere in between the instructions implementing the stack probing.

This also occurs with the non-unrolled (loop) implementation of stack probing:

    ; RUN: llc < %s
    define void @big_stack() "probe-stack"="inline-asm" uwtable {
    start:
      %_two_page_stack = alloca [64000 x i8], align 1
      ret void
    }

which generates:

    big_stack:
            .cfi_startproc
            movq    %rsp, %r11
            subq    $61440, %r11
    .LBB0_1:
            subq    $4096, %rsp
            movq    $0, (%rsp)
            cmpq    %r11, %rsp
            jne     .LBB0_1
            subq    $2432, %rsp
            .cfi_def_cfa_offset 63880
            addq    $63872, %rsp
            .cfi_def_cfa_offset 8
            retq

In the loop case, however, inserting `.cfi` directives inside the loop won't help in any way, because the same instructions execute with a different `rsp` on every iteration. The solution there needs to involve allocating a separate register to base the CFA on.
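One possible shape for the loop case is sketched below. It is only an illustration under assumptions, not necessarily what the fix should emit: it assumes `%r11`, which already holds the loop bound, can double as the CFA register while the loop runs, and it uses the standard `.cfi_def_cfa_register` and `.cfi_adjust_cfa_offset` directives.

```asm
big_stack:
        .cfi_startproc                  # on entry: CFA = rsp + 8
        movq    %rsp, %r11
        .cfi_def_cfa_register %r11      # CFA = r11 + 8 (r11 == rsp here)
        subq    $61440, %r11
        .cfi_adjust_cfa_offset 61440    # CFA = r11 + 61448; r11 stays fixed from now on
.LBB0_1:
        subq    $4096, %rsp             # rsp moves every iteration, r11 does not
        movq    $0, (%rsp)
        cmpq    %r11, %rsp
        jne     .LBB0_1
        .cfi_def_cfa_register %rsp      # loop finished, rsp == r11: CFA = rsp + 61448
        subq    $2432, %rsp
        .cfi_def_cfa_offset 63880
        addq    $63872, %rsp
        .cfi_def_cfa_offset 8
        retq
```

Because the CFA is described relative to a register that does not change inside the loop, the unwind information stays valid at every instruction of the probing sequence without needing any directive in the loop body.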
The correct assembly for the unrolled case would probably look a lot like this (the CFA offset is 8 on entry, so it becomes 4104 after the first `subq`):

    big_stack:
            .cfi_startproc
            subq    $4096, %rsp
            .cfi_def_cfa_offset 4104
            movq    $0, (%rsp)
            subq    $3968, %rsp
            .cfi_def_cfa_offset 8072
            addq    $8064, %rsp
            .cfi_def_cfa_offset 8
            retq

or an equivalent using `.cfi_adjust_cfa_offset` directives.
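For illustration, the `.cfi_adjust_cfa_offset` variant might look roughly like the following sketch. Each directive records the change made by the preceding stack adjustment, relative to the offset of 8 that is in effect on entry; the comments show the resulting absolute offsets.

```asm
big_stack:
        .cfi_startproc                  # on entry: CFA = rsp + 8
        subq    $4096, %rsp
        .cfi_adjust_cfa_offset 4096     # CFA = rsp + 4104
        movq    $0, (%rsp)
        subq    $3968, %rsp
        .cfi_adjust_cfa_offset 3968     # CFA = rsp + 8072
        addq    $8064, %rsp
        .cfi_adjust_cfa_offset -8064    # CFA = rsp + 8
        retq
```

The relative form has the advantage that each directive mirrors the immediate operand of the instruction before it, so the entry offset never has to be folded in by hand.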