LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 40232 - [MergeFuncs] Thunk calls may be invalid under calling convention
Status: NEW
Alias: None
Product: libraries
Classification: Unclassified
Component: Interprocedural Optimizations
Version: trunk
Hardware: PC All
Importance: P enhancement
Assignee: Unassigned LLVM Bugs
 
Reported: 2019-01-05 02:07 PST by Nikita Popov
Modified: 2019-01-18 11:43 PST

Description Nikita Popov 2019-01-05 02:07:08 PST
Originally reported at https://github.com/rust-lang/rust/pull/57268.

Running

define ptx_kernel i32 @func1(i32 %a, i32 %b) {
  %x = add i32 %a, %b
  %y = add i32 %x, %b
  ret i32 %y
}
define ptx_kernel i32 @func2(i32 %a, i32 %b) {
  %x = add i32 %a, %b
  %y = add i32 %x, %b
  ret i32 %y
}

through opt -mergefunc results in

define ptx_kernel i32 @func1(i32 %a, i32 %b) {
  %x = add i32 %a, %b
  %y = add i32 %x, %b
  ret i32 %y
}

define ptx_kernel i32 @func2(i32, i32) {
  %3 = tail call ptx_kernel i32 @func1(i32 %0, i32 %1)
  ret i32 %3
}

However, while LLVM does not flag this as an error, calling a ptx_kernel function from another ptx_kernel function is not legal and will produce an error during assembly.

Leaving aside the question of whether merging functions on the NVPTX target makes sense, the correct way to merge these functions is probably the same as for interposable functions:

define i32 @0(i32 %a, i32 %b) {
  %x = add i32 %a, %b
  %y = add i32 %x, %b
  ret i32 %y
}

define ptx_kernel i32 @func1(i32, i32) {
  %3 = tail call i32 @0(i32 %0, i32 %1)
  ret i32 %3
}

define ptx_kernel i32 @func2(i32, i32) {
  %3 = tail call i32 @0(i32 %0, i32 %1)
  ret i32 %3
}

Apart from ptx_kernel, there are probably other cases that are affected by this as well (I'd imagine amdgpu_kernel and spir_kernel may have similar restrictions.)
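
To make the proposal concrete, here is a minimal C++ sketch of the decision it implies: thunk directly to the surviving function when the calling convention allows it, otherwise move the body into a separate function and turn both originals into thunks. This is not the MergeFunctions implementation. The CallConv enum, isDirectlyCallable and chooseMergeStrategy are hypothetical names invented for illustration, and treating amdgpu_kernel and spir_kernel as non-callable is only the guess made above, not something verified here.

```cpp
#include <cassert>

// Hypothetical stand-in for a few LLVM calling conventions (not the real
// llvm::CallingConv enum); the names mirror the IR keywords.
enum class CallConv { C, Fast, PTXKernel, AMDGPUKernel, SPIRKernel };

// Assumption: kernel entry points must not be the target of a direct call.
// Only ptx_kernel is confirmed by this report; the other two are a guess.
bool isDirectlyCallable(CallConv CC) {
  switch (CC) {
  case CallConv::PTXKernel:
  case CallConv::AMDGPUKernel:
  case CallConv::SPIRKernel:
    return false;
  default:
    return true;
  }
}

enum class MergeStrategy {
  ThunkToOriginal,      // func2 becomes a thunk that calls func1
  SharedBodyTwoThunks   // body moves to a new function; func1 and func2 both become thunks
};

// Pick how two equivalent functions sharing calling convention CC get merged.
MergeStrategy chooseMergeStrategy(CallConv CC) {
  return isDirectlyCallable(CC) ? MergeStrategy::ThunkToOriginal
                                : MergeStrategy::SharedBodyTwoThunks;
}
```

With this split, the ptx_kernel example above would take the SharedBodyTwoThunks path, matching the IR suggested for interposable functions, while ordinary functions keep the current behavior.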