I filed this bug in May. I’m reposting it here on Discourse because my fix landed and @efriedma-quic wanted to discuss it.
The following combination of coverage, HWASan, and LTO fails at link time:
```
$ cat test.cc
__attribute__((weak)) bool foo = false;
__attribute__((weak)) void bar() {}
int main() {
  if (foo) bar();
}
$ clang test.cc -O3 --target=aarch64-linux-android30 -flto -fsanitize=hwaddress -coverage --sysroot=$NDK/toolchains/llvm/prebuilt/linux-x86_64/sysroot --gcc-toolchain=$NDK/toolchains/llvm/prebuilt/linux-x86_64
ld.lld: error: a.out.lto.o:(function __llvm_gcov_writeout: .text.__llvm_gcov_writeout+0x10): relocation R_AARCH64_ADR_PREL_PG_HI21 out of range: 9079256848778895360 is not in [-4294967296, 4294967295]; references section '.rodata..L.hwasan'
>>> referenced by ld-temp.o
ld.lld: error: a.out.lto.o:(function __llvm_gcov_writeout: .text.__llvm_gcov_writeout+0x3c): relocation R_AARCH64_ADR_PREL_PG_HI21 out of range: 8935141660703096832 is not in [-4294967296, 4294967295]; references section '.data..L__llvm_gcov_ctr.hwasan'
>>> referenced by ld-temp.o
ld.lld: error: a.out.lto.o:(function __llvm_gcov_writeout: .text.__llvm_gcov_writeout+0x64): relocation R_AARCH64_ADR_PREL_PG_HI21 out of range: 9007199254741024768 is not in [-4294967296, 4294967295]; references section '.bss..L__llvm_gcov_ctr.1.hwasan'
>>> referenced by ld-temp.o
ld.lld: error: a.out.lto.o:(function __llvm_gcov_reset: .text.__llvm_gcov_reset+0x0): relocation R_AARCH64_ADR_PREL_PG_HI21 out of range: 8935141660703096832 is not in [-4294967296, 4294967295]; references section '.data..L__llvm_gcov_ctr.hwasan'
>>> referenced by ld-temp.o
ld.lld: error: a.out.lto.o:(function __llvm_gcov_reset: .text.__llvm_gcov_reset+0xc): relocation R_AARCH64_ADR_PREL_PG_HI21 out of range: 9007199254741024768 is not in [-4294967296, 4294967295]; references section '.bss..L__llvm_gcov_ctr.1.hwasan'
>>> referenced by ld-temp.o
```
The cause is that the compiler-generated functions __llvm_gcov_writeout and __llvm_gcov_reset do not have target-features=+tagged-globals attributes. These functions are created by Function::createWithDefaultAttr. The general form of this problem is known, and as such there’s already an attempt in that function to copy certain attributes from the module flags onto the function. But for target features there’s no module flag since the whole point of target features is that they’re per-function.
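To illustrate why the linker rejects these relocations, here is a small sketch that splits the out-of-range values from the errors above into a top-byte tag and a small remainder. It assumes that HWASan encodes a global's tag in bits 56 to 63 of the symbol address, so that a plain page-relative reference generated without +tagged-globals sees a displacement dominated by the tag. The names `TaggedValue`, `splitTagged`, and `fitsAdrpRange` are made up for this example and are not part of LLVM or lld.

```cpp
#include <cstdint>

// Hypothetical helper types for this illustration only.
struct TaggedValue {
  unsigned tag;  // assumed HWASan tag (top byte of the value)
  int64_t low;   // what remains once the tag is removed
};

// Split a relocation value into the nearest multiple of 2^56 (the tag)
// and a signed remainder. Rounding to nearest lets small negative
// displacements show up as negative remainders. Only meant for values
// well below 2^64, such as the ones in the linker errors.
TaggedValue splitTagged(uint64_t value) {
  unsigned tag = static_cast<unsigned>((value + (1ULL << 55)) >> 56);
  int64_t low =
      static_cast<int64_t>(value - (static_cast<uint64_t>(tag) << 56));
  return {tag, low};
}

// The range quoted by ld.lld for R_AARCH64_ADR_PREL_PG_HI21.
bool fitsAdrpRange(int64_t displacement) {
  return displacement >= -4294967296LL && displacement <= 4294967295LL;
}
```

Under that assumption, 9079256848778895360 splits into tag 126 and remainder -24576, 8935141660703096832 into tag 124 and remainder 32768, and 9007199254741024768 into tag 125 and remainder 32768. The remainders fit comfortably in the permitted range; only the tag pushes the values out of it.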
I feel like this is probably a bug in Function::createWithDefaultAttr (which GCOV uses to create its compiler-generated functions). It should probably set the target-features attribute to a list that it reads from the LLVMContext, or something like that (which Clang would fill in before running the pass pipeline). As things are, these generated functions won’t respect things like -ffixed-x* flags either. Because only the frontend would populate the LLVMContext, this approach would imply that Function::createWithDefaultAttr shouldn’t be used during LTO or in the backend. It looks like that’s mostly already the case, except for a couple of the GPU backends that call this function to create constructors.
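The idea can be sketched with a toy model. This is not LLVM code: `ToyContext`, `ToyFunction`, and the free function `createWithDefaultAttr` below are hypothetical stand-ins for LLVMContext, Function, and Function::createWithDefaultAttr, and the field names are assumptions made for illustration.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for LLVMContext: holds the defaults that the
// frontend would fill in before running the pass pipeline.
struct ToyContext {
  std::string DefaultTargetCPU;
  std::string DefaultTargetFeatures;
};

// Hypothetical stand-in for a function with string attributes.
struct ToyFunction {
  std::string Name;
  std::map<std::string, std::string> Attrs;
};

// Copy the context's defaults onto a newly created function. When the
// context was never populated (the LTO case in this model), no
// attribute is added at all.
ToyFunction createWithDefaultAttr(const ToyContext &Ctx, std::string Name) {
  ToyFunction F{std::move(Name), {}};
  if (!Ctx.DefaultTargetCPU.empty())
    F.Attrs["target-cpu"] = Ctx.DefaultTargetCPU;
  if (!Ctx.DefaultTargetFeatures.empty())
    F.Attrs["target-features"] = Ctx.DefaultTargetFeatures;
  return F;
}
```

In this model, a context whose `DefaultTargetFeatures` is `"+tagged-globals"` yields a `__llvm_gcov_writeout` carrying that attribute, while an unpopulated context yields a function with no target attributes.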
I implemented my proposal in llvm/llvm-project pull request #96721, “CodeGen, IR: Add target-{cpu,features} attributes to functions created via createWithDefaultAttr()”. Eli posted a comment disagreeing with the approach, and I’ll reply to him below:
> This seems like an easy way to unintentionally pass state between different compilations.
Yes, that’s possible. It seems unlikely to me though because the code that prepares for each module would presumably set the fields beforehand. To avoid retaining state we could use an RAII object active during the compilation (Clang already does this for the diagnostic handler).
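A minimal sketch of what such an RAII object could look like, again on a toy model rather than real LLVM types. `ToyContext` and `ScopedDefaultTargetFeatures` are hypothetical names invented for this example.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for LLVMContext.
struct ToyContext {
  std::string DefaultTargetFeatures;
};

// Sets the default features for the lifetime of the guard and restores
// the previous value on destruction, so nothing leaks from one
// compilation into the next.
class ScopedDefaultTargetFeatures {
  ToyContext &Ctx;
  std::string Saved;

public:
  ScopedDefaultTargetFeatures(ToyContext &C, std::string Features)
      : Ctx(C),
        Saved(std::exchange(C.DefaultTargetFeatures, std::move(Features))) {}

  ~ScopedDefaultTargetFeatures() {
    Ctx.DefaultTargetFeatures = std::move(Saved);
  }

  // The guard owns the restore step, so it must not be copied.
  ScopedDefaultTargetFeatures(const ScopedDefaultTargetFeatures &) = delete;
  ScopedDefaultTargetFeatures &
  operator=(const ScopedDefaultTargetFeatures &) = delete;
};
```

The guard would be created around the per-module compilation, and the context returns to its previous state as soon as the guard goes out of scope.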
> It seems very easy to try to use this API during LTO, and have it do nothing.
Yes, this is a potential concern. I also considered making this API fail if there was not an attribute setting in the LLVMContext, but this would cause the existing usage in the GPU backends to break. Another possibility is that we let the GPU backend users opt out of the RAII object requirement for now with a separate API call.
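The stricter variant with an opt-out could look roughly like this on a toy model. Everything here is hypothetical: `ToyContext`, `ToyFunction`, `createWithDefaultAttrStrict`, and the `AllowMissingDefaults` flag are invented for illustration, and the exception stands in for whatever failure mechanism a real implementation would use (most likely an assertion or a fatal error).

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for LLVMContext. An empty optional means that
// nobody configured the defaults for this compilation.
struct ToyContext {
  std::optional<std::string> DefaultTargetFeatures;
  bool AllowMissingDefaults = false;  // opt-out, e.g. for GPU backends
};

struct ToyFunction {
  std::string Name;
  std::map<std::string, std::string> Attrs;
};

// Fails loudly when the defaults were never set, unless the caller has
// explicitly opted out of the requirement.
ToyFunction createWithDefaultAttrStrict(const ToyContext &Ctx,
                                        std::string Name) {
  ToyFunction F{std::move(Name), {}};
  if (!Ctx.DefaultTargetFeatures) {
    if (!Ctx.AllowMissingDefaults)
      throw std::logic_error("default target features were not set");
    return F;  // opted out: keep the old behavior, no attribute
  }
  F.Attrs["target-features"] = *Ctx.DefaultTargetFeatures;
  return F;
}
```

With this shape, accidental use during LTO would be caught immediately, while existing callers that cannot provide defaults yet keep working by setting the opt-out flag.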
> I’m pretty sure this breaks existing workflows involving AMDGPUCtorDtorLowering/NVPTXCtorDtorLowering.
I don’t see how it breaks anything. The behavior after my patch with LTO is the same: with or without my patch, we don’t set target-cpu/target-features. The behavior after my patch without LTO is “better” than it was before because now we set target-cpu/target-features on the functions (but Clang would have constructed the TargetMachine with the same default CPU/features as those on the LLVMContext, so there wouldn’t have been a user-visible behavior change at least for Clang).
> And please don’t merge changes to core IR datastructures less than an hour after you post the patch. Please revert and start a Discourse discussion so we can discuss the right direction here.
I’m not sure if we should revert because that would reintroduce the bug, which I hope we can agree is a bug that ought to be fixed. The issue and the approach that I was considering were posted on the issue tracker for almost a month because I was hoping to get feedback on the approach before implementing anything, but unfortunately nobody replied. The time between when the patch was posted and landed doesn’t seem important because I implemented more or less what I said I would implement a month ago, so people had plenty of time to make an objection.
From a user’s perspective it seems better to have a compiler with the bug fixed (with a perhaps imperfect internal implementation) than a compiler with the bug. Any improvements that we might make to our internal API as a result of this discussion can be made when we agree on them.
> Off the top of my head, it might make sense to pass the relevant attributes as arguments to the constructor for the relevant IR passes.
I considered passing information to the individual passes, but that seems like it would mean more API churn every time we want to pass these attributes to a new pass. Also, these attributes are not really associated with the passes but more with the module being built, so a context object seems like the best place to store them.