Originally reported at https://github.com/rust-lang/rust/pull/57268.

Running

    define ptx_kernel i32 @func1(i32 %a, i32 %b) {
      %x = add i32 %a, %b
      %y = add i32 %x, %b
      ret i32 %y
    }

    define ptx_kernel i32 @func2(i32 %a, i32 %b) {
      %x = add i32 %a, %b
      %y = add i32 %x, %b
      ret i32 %y
    }

through opt -mergefunc results in

    define ptx_kernel i32 @func1(i32 %a, i32 %b) {
      %x = add i32 %a, %b
      %y = add i32 %x, %b
      ret i32 %y
    }

    define ptx_kernel i32 @func2(i32, i32) {
      %3 = tail call ptx_kernel i32 @func1(i32 %0, i32 %1)
      ret i32 %3
    }

However, while LLVM does not flag this as an error, calling a ptx_kernel function from another ptx_kernel function is not legal and will fail when the output is assembled.

Leaving aside the question of whether merging functions on the NVPTX target makes sense, the correct way to merge these functions is probably the same as for interposable functions: move the shared body into a separate non-kernel function and turn both kernels into thunks that call it.

    define i32 @0(i32 %a, i32 %b) {
      %x = add i32 %a, %b
      %y = add i32 %x, %b
      ret i32 %y
    }

    define ptx_kernel i32 @func1(i32, i32) {
      %3 = tail call i32 @0(i32 %0, i32 %1)
      ret i32 %3
    }

    define ptx_kernel i32 @func2(i32, i32) {
      %3 = tail call i32 @0(i32 %0, i32 %1)
      ret i32 %3
    }

Apart from ptx_kernel, there are probably other cases that are affected by this as well (I'd imagine amdgpu_kernel and spir_kernel may have similar restrictions).
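
The rule proposed in this report can be sketched as a small decision function. This is a hypothetical, self-contained model and not LLVM's actual MergeFunctions code: the names CallConv, MergeStrategy, isKernelEntryPoint and chooseMergeStrategy are made up for illustration, and treating amdgpu_kernel and spir_kernel the same way as ptx_kernel is only the guess made above, not a confirmed restriction.

```cpp
// Hypothetical model of the proposed rule; NOT LLVM's MergeFunctions code.
#include <cassert>

// Stand-ins for the calling conventions named in the report.
enum class CallConv { C, PtxKernel, AmdgpuKernel, SpirKernel };

enum class MergeStrategy {
    SingleThunk,  // the duplicate becomes a thunk that calls the original directly
    DoubleThunk   // body moves to a separate function; both originals become thunks
};

// Assumption: a kernel entry point must not be the target of a call.
// The report confirms this for ptx_kernel only; the other two are a guess.
constexpr bool isKernelEntryPoint(CallConv cc) {
    return cc == CallConv::PtxKernel ||
           cc == CallConv::AmdgpuKernel ||
           cc == CallConv::SpirKernel;
}

// Use the two-thunk form whenever calling the original directly is not
// allowed: for interposable functions and, as proposed here, for kernels.
constexpr MergeStrategy chooseMergeStrategy(CallConv cc, bool isInterposable) {
    return (isInterposable || isKernelEntryPoint(cc))
               ? MergeStrategy::DoubleThunk
               : MergeStrategy::SingleThunk;
}
```

Under this model, two identical ptx_kernel functions get the two-thunk form, while two identical ordinary non-interposable functions keep the existing single-thunk merge.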