Functions generated by Function::createWithDefaultAttr should respect -target-features

I filed this bug in May. I’m reposting it here on Discourse because my fix landed and @efriedma-quic wanted to discuss it.


The following example usage of coverage + HWASan + LTO will fail:

$ cat test.cc
__attribute__((weak)) bool foo = false;

__attribute__((weak)) void bar() {}

int main() {
  if (foo) bar();
}
$ clang test.cc -O3 --target=aarch64-linux-android30 -flto -fsanitize=hwaddress -coverage --sysroot=$NDK/toolchains/llvm/prebuilt/linux-x86_64/sysroot --gcc-toolchain=$NDK/toolchains/llvm/prebuilt/linux-x86_64
ld.lld: error: a.out.lto.o:(function __llvm_gcov_writeout: .text.__llvm_gcov_writeout+0x10): relocation R_AARCH64_ADR_PREL_PG_HI21 out of range: 9079256848778895360 is not in [-4294967296, 4294967295]; references section '.rodata..L.hwasan'
>>> referenced by ld-temp.o

ld.lld: error: a.out.lto.o:(function __llvm_gcov_writeout: .text.__llvm_gcov_writeout+0x3c): relocation R_AARCH64_ADR_PREL_PG_HI21 out of range: 8935141660703096832 is not in [-4294967296, 4294967295]; references section '.data..L__llvm_gcov_ctr.hwasan'
>>> referenced by ld-temp.o

ld.lld: error: a.out.lto.o:(function __llvm_gcov_writeout: .text.__llvm_gcov_writeout+0x64): relocation R_AARCH64_ADR_PREL_PG_HI21 out of range: 9007199254741024768 is not in [-4294967296, 4294967295]; references section '.bss..L__llvm_gcov_ctr.1.hwasan'
>>> referenced by ld-temp.o

ld.lld: error: a.out.lto.o:(function __llvm_gcov_reset: .text.__llvm_gcov_reset+0x0): relocation R_AARCH64_ADR_PREL_PG_HI21 out of range: 8935141660703096832 is not in [-4294967296, 4294967295]; references section '.data..L__llvm_gcov_ctr.hwasan'
>>> referenced by ld-temp.o

ld.lld: error: a.out.lto.o:(function __llvm_gcov_reset: .text.__llvm_gcov_reset+0xc): relocation R_AARCH64_ADR_PREL_PG_HI21 out of range: 9007199254741024768 is not in [-4294967296, 4294967295]; references section '.bss..L__llvm_gcov_ctr.1.hwasan'
>>> referenced by ld-temp.o

The cause is that the compiler-generated functions __llvm_gcov_writeout and __llvm_gcov_reset lack the "target-features"="+tagged-globals" attribute. These functions are created by Function::createWithDefaultAttr. The general form of this problem is known, and that function already attempts to copy certain attributes from module flags onto the function. But for target features there is no module flag, since the whole point of target features is that they are per-function.
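To make the missing attribute concrete, here is a hand-written IR sketch (not output from the reproducer) of the difference between a user function and a generated one:

```llvm
; Sketch: a user function gets the HWASan tagged-globals feature from
; Clang's command-line flags, but the GCOV-generated function does not.
define void @bar() #0 {
  ret void
}

define internal void @__llvm_gcov_writeout() {  ; no attribute group
  ret void
}

attributes #0 = { "target-features"="+tagged-globals" }
```

Without the feature, the backend lowers global references in __llvm_gcov_writeout with the usual ADRP sequence, which cannot reach the tagged addresses, hence the out-of-range relocations above.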

I feel like this is probably a bug in Function::createWithDefaultAttr (which GCOV uses to create its compiler-generated functions). It should probably set the target-features attribute to a list that it reads from the LLVMContext, or something like that (which Clang would fill in before running the pass pipeline). As things are, these generated functions won't respect things like -ffixed-x* flags either. That would imply that Function::createWithDefaultAttr shouldn't be used during LTO or in the backend. It looks like that's mostly already the case, except for a couple of GPU backends that call this function to create constructors.


I implemented my proposal in CodeGen, IR: Add target-{cpu,features} attributes to functions created via createWithDefaultAttr(). by pcc · Pull Request #96721 · llvm/llvm-project · GitHub . Eli posted a comment disagreeing with the approach, and I’ll reply to him below:

This seems like an easy way to unintentionally pass state between different compilations.

Yes, that’s possible. It seems unlikely to me, though, because the code that prepares to compile each module would presumably set the fields beforehand. To avoid retaining state we could use an RAII object that is active during the compilation (Clang already does this for the diagnostic handler).

It seems very easy to try to use this API during LTO, and have it do nothing.

Yes, this is a potential concern. I also considered making this API fail if there was not an attribute setting in the LLVMContext, but this would cause the existing usage in the GPU backends to break. Another possibility is that we let the GPU backend users opt out of the RAII object requirement for now with a separate API call.

I’m pretty sure this breaks existing workflows involving AMDGPUCtorDtorLowering/NVPTXCtorDtorLowering.

I don’t see how it breaks anything. The behavior after my patch with LTO is the same: with or without my patch, we don’t set target-cpu/target-features. The behavior after my patch without LTO is “better” than it was before because now we set target-cpu/target-features on the functions (but Clang would have constructed the TargetMachine with the same default CPU/features as those on the LLVMContext, so there wouldn’t have been a user-visible behavior change at least for Clang).

And please don’t merge changes to core IR datastructures less than an hour after you post the patch. Please revert and start a Discourse discussion so we can discuss the right direction here.

I’m not sure if we should revert because that would reintroduce the bug, which I hope we can agree is a bug that ought to be fixed. The issue and the approach that I was considering were posted on the issue tracker for almost a month because I was hoping to get feedback on the approach before implementing anything, but unfortunately nobody replied. The time between when the patch was posted and landed doesn’t seem important because I implemented more or less what I said I would implement a month ago, so people had plenty of time to make an objection.

From a user’s perspective it seems better to have a compiler with the bug fixed (with a perhaps imperfect internal implementation) than a compiler with the bug. Any improvements that we might make to our internal API as a result of this discussion can be made when we agree on them.

Off the top of my head, it might make sense to pass the relevant attributes as arguments to the constructor for the relevant IR passes.

I considered passing information to the individual passes, but that seems like it would mean more API churn every time we want to pass these attributes to a new pass. Also, these attributes are not really associated with the passes but rather with the module being built, so a context object seems like the best place to store them.


Is there some reason why we don’t carry default target features on the module? I think Eli is objecting to storing this data on the LLVMContext, and putting it on the Module would resolve his concerns.

There’s a general expectation that what goes onto the module is serialized. But in this case we specifically don’t want to serialize it. I’d be fine with putting it on the module but we’d need to make the serialization aspect clear.

Well, I think the motivation for not serializing it is to better support seamless full LTO module merging. I guess that means we want some kind of module flag that gets dropped during a module merge, but then any post-LTO instrumentation pass will lack a useful set of default target feature flags, which is the problem you’re solving in the non-LTO case. Hm.

A strawman solution might be to use a module flag with ModFlagBehaviorFirstVal to just pick the first set of target attributes, similar to the way the linker typically picks a prevailing symbol.

I guess that means we want some kind of module flag that gets dropped during a module merge, but then any post-LTO instrumentation pass will lack a useful set of default target feature flags, which is the problem you’re solving in the non-LTO case. Hm.

A strawman solution might be to use a module flag with ModFlagBehaviorFirstVal to just pick the first set of target attributes, similar to the way the linker typically picks a prevailing symbol.

This may work, but it would require some care and cooperation from the backend to separate features like “this function must use different instructions to take the address of a global” and “this function must not touch this GPR” from features like “this function is allowed to use the fancy new vector instructions”. Call them “must” and “may” features. Only the “must” features would go on the module flag. Then I think we would be able to choose any set of features in any compilation unit via something like ModFlagBehaviorFirstVal, because of the general requirement that the entire program must comply with the “must” features.

But this seems subtle and difficult to get right. It also has the downside that we could end up miscompiling a program on a technicality just because of one translation unit that didn’t pass the correct flags (and maybe without LTO it wouldn’t have mattered that the flags were wrong, because the module didn’t contain any code, or the code was dynamically unreachable, etc.).

A simpler and easier-to-understand policy would be to say that LTO passes do not have access to a set of default target feature flags. This implies that it shouldn’t be possible to ask for the flags at LTO time, either by construction of the API design or because asking for them causes an assertion failure or some other error. So if a pass needs to create “new” functions and the compilation flags cannot be determined from context, the pass would need to be moved to compile time. For example, an outliner pass may copy the flags from the caller, but it would be a bug to create an initializer function at LTO time (at least in the general case; maybe those GPU backend passes are fine because those backends don’t have “must” subtarget features, in which case the backend could pass in a known-good set of features).

This could be implemented with a module flag with a new behavior that gets dropped on serialization/merging but that seems like unnecessary additional complexity compared to having plain old non-serialized string fields on the Module or LLVMContext, which straightforwardly implements the behavior that we want in a way that would be obvious from reading the code.

Oh, for the non-target-features attributes, it maintains the existing behavior of deriving it from module-level flags. So now half the attributes come from module flags (LLVM Language Reference Manual — LLVM 19.0.0git documentation) and the other half come from the transient LLVMContext flags.

From my perspective, I didn’t see any of the discussion until the pull request, so it seemed quick. The issue tracker is too high traffic for anyone to read everything.

I guess this doesn’t actively break anything, so leaving it pending the conclusion of the discussion is okay.


Part of making sure people working on developing LLVM passes are productive is making things serializable. I understand that serialization in any form adds some complexity, but the benefit to developer productivity makes it worth it even in contexts where that serialization isn’t actually used by clang normally. Being able to write an “opt” command that mimics parts of what clang does is generally a good thing.

There are basically two approaches to allow this: some form of module metadata, or making them parameters to the passes. I don’t have a strong preference for which one is better. I don’t want to have a new kind of parameter for passes that’s passed differently just because that’s slightly more convenient.

Part of making sure people working on developing LLVM passes are productive is making things serializable. I understand that serialization in any form adds some complexity, but the benefit to developer productivity makes it worth it even in contexts where that serialization isn’t actually used by clang normally. Being able to write an “opt” command that mimics parts of what clang does is generally a good thing.

That makes sense. I generally avoid “opt” when working on passes and use the pipeline set up by Clang instead (mostly because I don’t want to manually write out the pipeline command), but I imagine that some people might prefer to use “opt” and we shouldn’t create a special case for them.

So I think we can agree on this:

  • Add default-target-cpu and default-target-features module flags.
  • Add a new module flag behavior that causes the flag to get dropped by the IR linker, and have Clang use that for the module flags.
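A hypothetical sketch of what Clang might emit under this scheme (the flag names come from the bullets above, but the behavior ID is a placeholder, since the “dropped by the IR linker” behavior doesn’t exist yet):

```llvm
; Placeholder behavior ID 99 stands in for the proposed
; dropped-on-IR-link module flag behavior.
!llvm.module.flags = !{!0, !1}
!0 = !{i32 99, !"default-target-cpu", !"cortex-a53"}
!1 = !{i32 99, !"default-target-features", !"+tagged-globals"}
```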

[RISCV] Add canonical ISA string as Module metadata in IR. by topperc · Pull Request #80760 · llvm/llvm-project · GitHub (and the rest of that thread) seems relevant

So that RISC-V PR seems like it’s about collecting certain information from subtarget attributes and emitting it into the object file, either into the e_flags field of the ELF or a RISC-V attribute section, by taking the union of all of the subtarget attributes. I imagine this is emulating what a linker would do when linking object files together.

This is solving a different problem from ours: they were trying to figure out how the backend should prepare object file information for functions compiled with different subtarget attributes, while we’re trying to figure out how the compiler-hosted passes should access the singular default subtarget attributes, specifically without letting the backend access them, because it makes no sense to ask for the singular default subtarget attributes in that context.

In the RISC-V case, there doesn’t seem to be a reason to store the information in module flags, though. Instead, the object file emitter could enumerate the functions in the module, collect the functions’ subtarget attributes and use the union of the subtarget attributes to emit the information into the object file.

Maybe my suggestion wouldn’t work, though? In that thread, @topperc said:

We can’t just once since it might have been compiler with an attribute likes target_verson.

I think there might have been a word or two missing, so I don’t understand what this is trying to say.

The problem here is that we allow programs to do runtime CPU checks. So the minimum set of features required to run a given function might be different from the minimum set of features required to run a program which contains that function.

To make that work, the set of features marked in the module needs to come exclusively from command-line flags, not frontend attributes on a specific function.


In any case, it’s not quite the same thing as the set of target features for sanitizer constructors: some “target features” refer to things that aren’t actually architectural features.

The problem here is that we allow programs to do runtime CPU checks. So the minimum set of features required to run a given function might be different from the minimum set of features required to run a program which contains that function.

To make that work, the set of features marked in the module needs to come exclusively from command-line flags, not frontend attributes on a specific function.

Doesn’t that assume that users consistently use target attributes for functions guarded by runtime CPU checks? I’ve seen a combination of that and building the translation units containing the guarded functions with -mcpu or -march. Of course, in the latter case the object file attributes will end up wrong.

If we want these object file attributes to be more useful, I think a different approach is necessary. For example, the attributes could be derived from the subtarget features of the main function and functions in .init_array, because those functions will be called unconditionally if the binary is run.

But if we want to make the assumption for now that target attributes are used consistently, then I suppose it would be valid to use a module flag to convey the information. The module flag could perhaps just directly contain the information that would go into the object file (e.g. the e_flags bitmask), so there’s less temptation to use it for a different purpose.

In any case, it’s not quite the same thing as the set of target features for sanitizer constructors: some “target features” refer to things that aren’t actually architectural features.

Indeed, I cited a couple of examples in my first message. I think that is a point in favor of only encoding in module flags what will actually get written to the object file.

I looked at implementing this using module flags. It ended up being significantly more complicated than implementing it using global named metadata, so I used a global named metadata node instead. I posted the WIP patch here for feedback: