49600 – probe-stack=inline-asm will produce invalid uwtables


Status:	NEW

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Backend: X86
Version:	trunk
Hardware:	PC All

Importance:	P normal
Assignee:	Unassigned LLVM Bugs


Reported:	2021-03-15 17:53 PDT by simonas+llvm.org
Modified:	2021-03-15 17:56 PDT
CC List:	5 users


Description simonas+llvm.org 2021-03-15 17:53:21 PDT

Given a function such as this:

; RUN: llc < %s
define void @big_stack() "probe-stack"="inline-asm" uwtable {
start:
  %_two_page_stack = alloca [8192 x i8], align 1
  ret void
}

the following assembly will be generated:

big_stack:
	.cfi_startproc
	subq	$4096, %rsp
	movq	$0, (%rsp)
	subq	$3968, %rsp
	.cfi_def_cfa_offset 8072
	addq	$8064, %rsp
	.cfi_def_cfa_offset 8
	retq


Here the unwind tables are inaccurate while stack probing is in progress: `rsp` is adjusted by the first `subq`, but the CFA offset is not updated until after the second one. Attempts to obtain a stack trace will therefore fail if the current instruction lies between the instructions that implement the stack probing.
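To make the mismatch concrete, the following Python sketch replays the unrolled sequence and compares, before each instruction, the CFA offset the unwind table declares with the offset actually required. The tuple encoding and the helper name `audit_cfa` are illustrative assumptions, not part of LLVM; the only facts relied on are that the CFA is `rsp + 8` on entry to an x86-64 function and that `subq`/`addq` on `rsp` move it by the immediate.

```python
# Toy model of the unrolled probe sequence from the report (not LLVM code).
# Each entry is ("sub", n) or ("add", n) acting on rsp, ("store",) for the
# probe write, ("ret",), or ("cfi", n) standing for `.cfi_def_cfa_offset n`.
UNROLLED = [
    ("sub", 4096),
    ("store",),
    ("sub", 3968),
    ("cfi", 8072),
    ("add", 8064),
    ("cfi", 8),
    ("ret",),
]

def audit_cfa(program, entry_offset=8):
    """Return (index, op, declared, actual) for every instruction that would
    be unwound with a wrong CFA offset if execution stopped just before it."""
    declared = entry_offset  # what the unwind table says: CFA = rsp + declared
    actual = entry_offset    # what is really true:        CFA = rsp + actual
    mismatches = []
    for index, op in enumerate(program):
        if op[0] == "cfi":
            declared = op[1]  # applies to the instructions that follow
            continue
        if declared != actual:
            mismatches.append((index, op[0], declared, actual))
        if op[0] == "sub":
            actual += op[1]   # rsp moves down, so the CFA is further above it
        elif op[0] == "add":
            actual -= op[1]
    return mismatches

for index, name, declared, actual in audit_cfa(UNROLLED):
    print(f"entry {index} ({name}): table says {declared}, should be {actual}")
```

In this model the probe store and the second `subq` are flagged; inserting a `("cfi", 4104)` entry right after the first `sub` clears both.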

This also occurs with the non-unrolled implementation of the stack probing:

; RUN: llc < %s
define void @big_stack() "probe-stack"="inline-asm" uwtable {
start:
  %_two_page_stack = alloca [64000 x i8], align 1
  ret void
}

--->

big_stack:
	.cfi_startproc
	movq	%rsp, %r11
	subq	$61440, %r11
.LBB0_1:
	subq	$4096, %rsp
	movq	$0, (%rsp)
	cmpq	%r11, %rsp
	jne	.LBB0_1
	subq	$2432, %rsp
	.cfi_def_cfa_offset 63880
	addq	$63872, %rsp
	.cfi_def_cfa_offset 8
	retq

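However, in the loop case the solution needs to allocate a separate register to serve as the CFA base, because inserting `.cfi` directives into the loop body cannot help: the unwind table records a single rule per instruction address, while `rsp` differs on every iteration.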
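Why a dedicated register is needed can be sketched with a small toy model. The Python below is illustrative only (the function name and the relative-address convention are assumptions, not LLVM code); it uses the constants from the loop assembly and collects, for the address of `.LBB0_1`, which offset an `rsp`-based rule and an `r11`-based rule would each need on every pass.

```python
# Toy model of the probing loop from the report (not LLVM code).
PAGE = 4096
LOOP_BYTES = 61440          # subtracted from r11 before the loop
ENTRY_CFA_OFFSET = 8        # CFA = rsp + 8 on entry (return address)

def required_offsets_at_loop_head():
    """Offsets (CFA - reg) needed at .LBB0_1 on each pass, for rsp and r11."""
    rsp = 0                           # rsp at function entry, as a relative value
    cfa = rsp + ENTRY_CFA_OFFSET
    r11 = rsp - LOOP_BYTES            # movq %rsp, %r11 ; subq $61440, %r11
    rsp_based, r11_based = set(), set()
    while True:
        rsp_based.add(cfa - rsp)      # rule of the form CFA = rsp + N
        r11_based.add(cfa - r11)      # rule of the form CFA = r11 + N
        rsp -= PAGE                   # subq $4096, %rsp (probe store omitted)
        if rsp == r11:                # cmpq %r11, %rsp ; jne .LBB0_1
            break
    return rsp_based, r11_based

rsp_based, r11_based = required_offsets_at_loop_head()
print(len(rsp_based), "distinct rsp-based offsets needed at one address")
print("single r11-based offset:", r11_based)
```

In this model, a rule of the form CFA = `r11` + 61448 stays valid for the whole loop. Since `rsp` equals `r11` when the loop exits, the rule could switch back to `rsp` with the same offset before the final `subq $2432, %rsp`, giving 61448 + 2432 = 63880, which matches the offset already emitted.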

Comment 1 simonas+llvm.org 2021-03-15 17:56:11 PDT

The correct assembly for the unrolled case would probably look a lot like this:

big_stack:
	.cfi_startproc
	subq	$4096, %rsp
	.cfi_def_cfa_offset 4104
	movq	$0, (%rsp)
	subq	$3968, %rsp
	.cfi_def_cfa_offset 8072
	addq	$8064, %rsp
	.cfi_def_cfa_offset 8
	retq

or an equivalent using `.cfi_adjust_cfa_offset` directives.