Change codegen of LLVM intrinsics to be name-based, and add llvm linkage support for bf16(xN), i1xN and x86amx #140763
Some changes occurred in compiler/rustc_codegen_ssa
Some changes occurred in compiler/rustc_codegen_gcc
I think you can use |
That can be used to improve performance, but I am not really focusing on performance in this PR. For now I want to emphasize the correctness of the codegen.
Oh wait, I probably misunderstood your comment; you meant using the LLVM declaration by itself. Yeah, that would be better, thanks for the info. I will update the impl when I get the chance.
I think you can just focus on non-overloaded functions for this PR. Overloaded functions, and checking Rust function signatures against the ones LLVM defines, can be subsequent PRs. @rustbot author
@sayantn Taking the address of an intrinsic is invalid LLVM IR.
Simplify implementation of Rust intrinsics by using type parameters in the cache
The current implementation of intrinsics has a lot of duplication to handle different overloads of overloaded LLVM intrinsics. This PR uses the **base name and the type parameters** in the cache instead of the full, overloaded name. This has the benefit that `call_intrinsic` doesn't need to provide the full name, only the type parameters (which are usually more readily available). This uses `LLVMIntrinsicCopyOverloadedName2` to get the overloaded name from the base name and the type parameters, and only uses it to declare the function. (Originally part of #140763, split off later.) `@rustbot` label A-codegen A-LLVM r? codegen
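As a rough illustration of the caching idea in that commit message, here is a minimal sketch of a cache keyed by base name plus type parameters. Everything in it (`Ty`, `IntrinsicCache`, `get`) is hypothetical and not rustc's actual code, and the name construction is only a simplified stand-in for what `LLVMIntrinsicCopyOverloadedName2` would return.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the few LLVM types used in this sketch.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Ty {
    F32,
    F64,
    I32,
}

impl Ty {
    /// Mangled spelling of the type as it appears in overloaded intrinsic names.
    fn mangled(&self) -> &'static str {
        match self {
            Ty::F32 => "f32",
            Ty::F64 => "f64",
            Ty::I32 => "i32",
        }
    }
}

/// Hypothetical cache: the key is (base name, type parameters),
/// the value is the full name used for the declaration.
#[derive(Default)]
struct IntrinsicCache {
    declared: HashMap<(&'static str, Vec<Ty>), String>,
    declarations: usize,
}

impl IntrinsicCache {
    /// Callers pass only the base name and the type parameters.
    fn get(&mut self, base: &'static str, tys: &[Ty]) -> String {
        let declarations = &mut self.declarations;
        self.declared
            .entry((base, tys.to_vec()))
            .or_insert_with(|| {
                // Runs only on a cache miss, i.e. when a declaration is needed.
                *declarations += 1;
                // Simplified stand-in for LLVMIntrinsicCopyOverloadedName2.
                let mut name = base.to_string();
                for ty in tys {
                    name.push('.');
                    name.push_str(ty.mangled());
                }
                name
            })
            .clone()
    }
}

fn main() {
    let mut cache = IntrinsicCache::default();
    println!("{}", cache.get("llvm.sqrt", &[Ty::F32]));
    println!("{}", cache.get("llvm.sqrt", &[Ty::F64]));
    println!("{}", cache.get("llvm.ctpop", &[Ty::I32]));
}
```

The point of the design is that the full overloaded name is only computed when a declaration is actually created; lookups never need it.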
☔ The latest upstream changes (presumably #142259) made this pull request unmergeable. Please resolve the merge conflicts.
☔ The latest upstream changes (presumably #142521) made this pull request unmergeable. Please resolve the merge conflicts.
- Remove redundant bitcasts at callsite
- Correct usage of invalid intrinsics in tests
Here are my high level thoughts on what this PR does:
I think to handle something like I also think that this can be split into multiple changes. I think that the intrinsic validation can be implemented independently of the auto-casting bits. |
Yeah, it would be better with a toggle-able warning (I was also facing problems with too many warnings, that's why I put it behind Could you link to some refs on what the redesign will look like, from an API perspective? I can see your reasoning for not adding autocasts, but to actually error on signature mismatches, we need autocasts for structs at least (https://github.com/rust-lang/rust/pull/140763/files#r2128418128). If this issue was resolved, and we don't care about other autocasts, we can probably use Nonetheless, I will open another PR for the AMX,
Mostly the same, with the caveat that intrinsics have to be constructed from the intrinsic ID and either the function type or the type overloads. Construction from a string name is possible only via auto-upgrade. It's possible to verify whether a function type is valid for an intrinsic ID, but it's no longer associated with a unique name.
Do you have an example of an intrinsic currently using struct return? But generally, doing the struct auto-cast is fine, as long as it's approached from the angle of "the intrinsic ABI always requires literal structs". So the signature we would be generating on the Rust side would be the correct one in the first place, and the re-packing into a different type is part of the ABI adjustments.
There are actually quite a few of them in stdarch. One example otoh will be
Will the intrinsic names still be mangled in IR with the overload types? Or is it a more fundamental change so that LLVM will automatically pull out the correct overloading when it sees an intrinsic call?
The intrinsic names will not be mangled in IR. It will be printed using the intrinsic base name without the mangling suffix (internally it's just an entirely unnamed function, only identified by the intrinsic ID).
Ah, so all the overloads will be printed with just the base name in IR. This certainly complicates my work, and makes it a lot harder to add autocasts for overloaded intrinsics without going through the IITDesc table, which is a lot of effort.
This PR changes how LLVM intrinsics are codegen'd
Explanation of the changes
Current procedure
This is the same for all functions; LLVM intrinsics are not treated specially.
- … `f32 (f32)` due to the Rust signature
Pros
Cons
- … `-Zverify-llvm-ir` to it will fail compilation). I would expect this code to not compile at all instead of generating invalid IR.
- … `x86amx` type, and (almost) all intrinsics that have vectors of `i1` types) can't be linked to at all. This is a (major?) roadblock in the AMX and AVX512 support in stdarch.
- … `-Zverify-llvm-ir` won't complain. Eventually it will error out due to the non-existing function (courtesy of the linker). I don't think this is a behavior we want.
What this PR does
- … `LLVMIntrinsicGetType` to directly get the function type of the intrinsic from LLVM.
Note
This PR only focuses on non-overloaded intrinsics; overloaded ones can be handled in a future PR.
Regardless, the functionality described below works for all intrinsics.
- … `AutoUpgrade`d by LLVM. If not, that means it is an invalid intrinsic, and we error out.
Pros
- … `x86amx` and injecting `llvm.x86.cast.vector.to.tile` and `llvm.x86.cast.tile.to.vector`s in callsite)
Note
I don't intend for these bypasses to be permanent (at least the `bf16` and `i1` ones; the `x86amx` bypass seems inevitable). A better approach would be introducing a `bf16` type in Rust, and allowing `repr(simd)` with `bool`s to get Rust-native `i1xN`s. These are meant to be short-term, as I mentioned, "bypass"es. They shouldn't cause any major breakage even if removed, as `link_llvm_intrinsics` is perma-unstable.
This PR adds bypasses for `bf16` (via `i16`), `bf16xN` (via `i16xN`), `i1xN` (via `iM`, where `M` is the smallest power of 2 s.t. `M >= N`, unless `N <= 4`, where we use `M = 8`), and `x86amx` (via 8192-bit vectors). This will unblock AVX512-VP2INTERSECT, AMX and a lot of bf16 intrinsics in stdarch. This PR also automatically destructures structs if the types don't exactly match (this is required for us to start emitting hard errors on mismatches).
Cons
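The width rule for the `i1xN` bypass described above can be written down as a small function. This is only a sketch of the rule as stated in the description; `mask_int_bits` is a hypothetical helper name, not something taken from the PR's code.

```rust
/// Width in bits of the integer `iM` that stands in for an `i1xN` mask.
/// Hypothetical helper illustrating the rule from the description.
fn mask_int_bits(lanes: u32) -> u32 {
    assert!(lanes > 0, "a vector needs at least one lane");
    // Smallest power of two >= lanes, but never narrower than 8 bits
    // (this covers the `N <= 4` special case).
    lanes.next_power_of_two().max(8)
}

fn main() {
    for lanes in [1, 2, 4, 8, 16, 32, 64] {
        println!("i1x{} -> i{}", lanes, mask_int_bits(lanes));
    }
    // The x86amx stand-in: 256 lanes of i32 make up an 8192-bit vector.
    println!("i32x256 = {} bits", 256 * 32);
}
```

Since the smallest power of two that is at least `N` is already 8 for `N` between 5 and 8, clamping to a minimum of 8 expresses the whole rule in one step.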
Possible ways to extend this to overloaded intrinsics (future)
Parse the mangled intrinsic name to get the type parameters
LLVM has a stable mangling of intrinsic names with type parameters (in `LLVMIntrinsicCopyOverloadedName2`), so we can parse the name to get the type parameters, and then just do the same thing.
Pros
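To make the name-parsing idea concrete, here is a minimal sketch that recovers type parameters from a mangled name. It assumes the base name is already known and handles only a small subset of the mangling (integers, a few float types, and fixed-length vectors). The `Ty` enum and both function names are hypothetical, and pointer, struct and `TargetExt` parameters are deliberately left out.

```rust
/// A small, hypothetical subset of LLVM types that can appear as type parameters.
#[derive(Debug, PartialEq)]
enum Ty {
    Int(u32),
    Half,
    BFloat,
    Float,
    Double,
    Vector(u32, Box<Ty>),
}

/// Parse one mangled token such as `f32`, `i16` or `v4bf16`.
fn parse_ty(tok: &str) -> Option<Ty> {
    match tok {
        "f16" => return Some(Ty::Half),
        "bf16" => return Some(Ty::BFloat),
        "f32" => return Some(Ty::Float),
        "f64" => return Some(Ty::Double),
        _ => {}
    }
    if let Some(bits) = tok.strip_prefix('i') {
        return bits.parse::<u32>().ok().map(Ty::Int);
    }
    if let Some(rest) = tok.strip_prefix('v') {
        // `v<lanes><element>`: split at the first non-digit character.
        let split = rest.find(|c: char| !c.is_ascii_digit())?;
        let lanes = rest[..split].parse::<u32>().ok()?;
        let elem = parse_ty(&rest[split..])?;
        return Some(Ty::Vector(lanes, Box::new(elem)));
    }
    // Anything else (pointers, structs, target types) is unsupported here.
    None
}

/// Recover the type parameters of `full_name`, given its known base name.
fn type_params(full_name: &str, base_name: &str) -> Option<Vec<Ty>> {
    let suffix = full_name.strip_prefix(base_name)?;
    if suffix.is_empty() {
        return Some(Vec::new());
    }
    suffix.strip_prefix('.')?.split('.').map(parse_ty).collect()
}

fn main() {
    println!("{:?}", type_params("llvm.sqrt.f32", "llvm.sqrt"));
    println!("{:?}", type_params("llvm.sqrt.bf16", "llvm.sqrt"));
    println!("{:?}", type_params("llvm.fma.v4f32", "llvm.fma"));
}
```

Returning `None` for anything outside the supported subset keeps the sketch honest about the cases where the mangling is hard to reverse.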
Cons
- … `TargetExt` types or identified structs, their name is a part of the mangling, making it impossible to reverse. Even more complexities arise when there are unnamed identified structs, as LLVM adds more mangling to the names.
Use the `IITDescriptor` table and the Rust function signature
We can use the base name to get the `IITDescriptor`s of the corresponding intrinsic, and then manually implement the matching logic based on the Rust signature.
Pros
- … `TargetExt` types. Also, fun fact, Rust exports all struct types as literal structs (unless it is emitting LLVM IR, then it always uses named identified structs, with mangled names)
Cons
- … `llvm.sqrt.bf16` until we have `bf16` types in Rust. Because if we are using `u16`s (or any other type) as `bf16`s, then the matcher will deduce that the signature is `u16 (u16)`, not `bf16 (bf16)` (which would lead to an error because `u16` is not a valid type parameter for `llvm.sqrt`), even though the intended type parameter is specified in the name.
- … `IITDescriptorKind`s
These 2 approaches might give different results for the same function. Let's take
The name-based approach will decide that the type parameter is `bf16` and the LLVM signature is `i1 (bf16)`, and will inject some bitcasts at the callsite.
The `IITDescriptor`-based approach will decide that the LLVM signature is `i1 (u16)`, will see that the name given doesn't match the expected name (`llvm.is.constant.u16`), and will error out.
Other things that this PR does
- … `unadjusted` ABI to facilitate the implementation of AMX (otherwise passing 8192-bit vectors to the intrinsic won't be allowed). This is "safe" because this ABI is only used to link to LLVM intrinsics, and passing vectors of any length to LLVM intrinsics is fine, because they don't exist at the machine level.
- … `bitcast`s in `cg_llvm/builder::check_call` (now renamed to `cast_arguments` due to its new counterpart `cast_return`). This was old code from when Rust used to pass non-erased lifetimes to LLVM.
Reviews are welcome, as this is my first time actually contributing to `rustc`
After CI is green, we would need a try build and a rustc-perf run.
@rustbot label T-compiler A-codegen A-LLVM
r? codegen