Prepare portable packed vector types for RFCs #338
stdsimd/mod.rs (outdated)
@@ -240,101 +241,6 @@
/// we'll be using SSE4.1 features to implement hex encoding.
///
/// ```
/// #![feature(cfg_target_feature, target_feature, stdsimd)]
I have no idea about what happened here. Did rustfmt delete all of this?
Yep, it looks like rustfmt just deleted all of this (the fmt commit equals the previous one + `cargo fmt --all`). @nrc does this look familiar? It looks like rustfmt deleted a chunk of a comment :/
This looks awesome, thanks @gnzlbg!
coresimd/ppvt/api/bool_vectors.rs (outdated)
#[cfg_attr(feature = "cargo-clippy", allow(expl_impl_clone_on_copy))]
impl Clone for $id {
#[inline] // currently needed for correctness
FWIW I don't think this should be necessary any more w/ the changes in upstream rust-lang/rust
The tests passed without this; I just saw that the types in the `x86` module were still doing this and decided to add it for consistency. I can remove them there as well.
Oh yeah I think at this point in time they can all move to #[derive(Clone)]
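Because the portable vector types are `Copy`, the hand-written `Clone` impl (and the clippy `expl_impl_clone_on_copy` allowance) can go away in favor of a derive. A minimal sketch, using a hypothetical array-backed `F32x4` as a stand-in because the real `#[repr(simd)]` types are nightly-only:

```rust
// Hypothetical stand-in for a portable vector type. The real types use
// `#[repr(simd)]` (nightly-only), so a plain array field is used here.
#[derive(Copy, Clone, Debug, PartialEq)]
struct F32x4([f32; 4]);

// Deriving `Clone` next to `Copy` replaces a manual
// `impl Clone for $id { #[inline] fn clone(&self) -> Self { *self } }`.
fn duplicate(v: F32x4) -> (F32x4, F32x4) {
    (v.clone(), v)
}
```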
impl $id {
/// Lane-wise addition of the vector elements.
#[inline(always)]
pub fn add(self) -> $elem_ty {
For these reductions I'd personally only expect `add` and `mul` to be here (but called `sum` and `product` to avoid shadowing `Add::add` and `Mul::mul`). The `sub`, `div`, and `rem` reductions seem odd, although perhaps someone's requested them before?
So LLVM only provides `llvm.experimental.vector.reduce.{add, fadd, mul, fmul, and, or, xor, smax, smin, umax, umin}`, which means `sub`, `div`, and `rem` cannot really be implemented here any better than in a third-party crate. I provided them for completeness, but I think it makes sense to leave them out. Writing tests for them felt weird.
Done, I've renamed add/mul to sum/product (just like the Iterator methods) and removed the sub/div/rem reductions.
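With the rename, the arithmetic reductions read like `Iterator::sum`/`Iterator::product`. The following is a naive portable sketch on a hypothetical array-backed `I32x4`; the PR's implementation is meant to lower to the LLVM reduction intrinsics instead, and wrapping on overflow is an assumption made here so debug builds don't panic:

```rust
/// Hypothetical array-backed stand-in for a 4-lane `i32` vector.
#[derive(Copy, Clone, Debug, PartialEq)]
struct I32x4([i32; 4]);

impl I32x4 {
    /// Horizontal sum of all lanes (assumed to wrap on overflow).
    #[inline]
    fn sum(self) -> i32 {
        self.0.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
    }

    /// Horizontal product of all lanes (assumed to wrap on overflow).
    #[inline]
    fn product(self) -> i32 {
        self.0.iter().fold(1i32, |acc, &x| acc.wrapping_mul(x))
    }
}
```

Because the names differ from `add`/`mul`, method-call syntax like `a.add(b)` keeps resolving to the lane-wise `ops::Add` impl rather than to a reduction.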
($id:ident, $elem_ty:ident) => {
impl $id {
/// Lane-wise bitwise `and` of the vector elements.
pub fn and(self) -> $elem_ty {
Sort of like the div/rem reductions above, are we sure these make sense to add as well? There's certainly nothing wrong with them; they just seem a little odd in terms of functionality.
I think we'll also want to perhaps select different names to avoid conflicts with `Or::or` and such.
Oh, in the meantime though, I think we'll want `#[inline]` on these methods.
So all these 3 are provided by LLVM. They are necessary to implement the reductions of boolean vectors (`all`, `any`, `none`), but maybe we shouldn't provide them for the integer types (LLVM supports that, though).
I am leaving these here for now since I can just map them directly to the LLVM intrinsics. We might just decide to never stabilize these and expose them only for boolean vectors via `all`, `any`, `none`.
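The bitwise reductions are what the boolean-vector reductions build on. A sketch assuming lanes are stored as `i32` masks (0 for `false`, -1 for `true`); the type `B32x4` and the method names `and_reduce`/`or_reduce` are hypothetical, picked to sidestep the naming question raised above:

```rust
/// Hypothetical boolean vector: 4 lanes stored as `i32` masks,
/// where `true` is all bits set (-1) and `false` is 0.
#[derive(Copy, Clone, Debug, PartialEq)]
struct B32x4([i32; 4]);

impl B32x4 {
    fn new(a: bool, b: bool, c: bool, d: bool) -> Self {
        let m = |x: bool| if x { -1 } else { 0 };
        B32x4([m(a), m(b), m(c), m(d)])
    }

    /// Bitwise-`and` reduction over the mask lanes.
    fn and_reduce(self) -> i32 {
        self.0.iter().fold(-1, |acc, &x| acc & x)
    }

    /// Bitwise-`or` reduction over the mask lanes.
    fn or_reduce(self) -> i32 {
        self.0.iter().fold(0, |acc, &x| acc | x)
    }

    /// True if every lane is set (the `and` of all masks is non-zero).
    fn all(self) -> bool {
        self.and_reduce() != 0
    }

    /// True if at least one lane is set.
    fn any(self) -> bool {
        self.or_reduce() != 0
    }

    /// True if no lane is set.
    fn none(self) -> bool {
        !self.any()
    }
}
```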
coresimd/ppvt/api/arithmetic_ops.rs (outdated)
}
impl ops::AddAssign for $id {
#[inline(always)]
FWIW technically `#[inline(always)]` isn't needed for anything, but there's also not much harm vs `#[inline]`, I think, for such small methods.
I thought about this and recalled that with ThinLTO `#[inline(always)]` shouldn't be necessary anymore, but then I looked at what the types in the `x86` module were doing (and the vector types before that), and they were all still using `#[inline(always)]`.
I think we should do a pass through the library and see if we can replace most of the `#[inline(always)]` attributes with just `#[inline]`.
I've filed #340 for this.
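For reference, a sketch of lane-wise `Add`/`AddAssign` impls annotated with plain `#[inline]`, as proposed in #340. The array-backed `F32x4` is a hypothetical stand-in, and the loop is a portable fallback rather than what the real `#[repr(simd)]` types compile to:

```rust
use std::ops;

/// Hypothetical array-backed stand-in for a 4-lane `f32` vector.
#[derive(Copy, Clone, Debug, PartialEq)]
struct F32x4([f32; 4]);

impl ops::Add for F32x4 {
    type Output = Self;

    // Plain `#[inline]` makes the body available for cross-crate
    // inlining; `#[inline(always)]` is rarely needed for bodies this small.
    #[inline]
    fn add(self, rhs: Self) -> Self {
        let mut out = self.0;
        for (o, r) in out.iter_mut().zip(rhs.0.iter()) {
            *o += *r;
        }
        F32x4(out)
    }
}

impl ops::AddAssign for F32x4 {
    #[inline]
    fn add_assign(&mut self, rhs: Self) {
        // Delegate to the lane-wise `Add` impl above.
        *self = *self + rhs;
    }
}
```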
coresimd/ppvt/api/fmt.rs (outdated)
if i > 0 {
write!(f, ", ")?;
}
write!(f, "{:#x}", self.extract(i))?;
I think here if you do `self.extract(i).fmt(f)` it'll automatically forward formatting flags like `#`, which means we may not need to hardcode the `#`.
Done.
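The suggestion is to forward each lane to the element type's own formatting impl so flags pass through. A sketch on a hypothetical `U32x4` wrapper; it calls `fmt::LowerHex::fmt` by its full path, which is equivalent to the suggested `self.extract(i).fmt(f)` but does not need the trait imported:

```rust
use std::fmt;

/// Hypothetical array-backed stand-in for a 4-lane `u32` vector.
struct U32x4([u32; 4]);

impl fmt::LowerHex for U32x4 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "u32x4(")?;
        for (i, lane) in self.0.iter().enumerate() {
            if i > 0 {
                write!(f, ", ")?;
            }
            // Forward to the lane's own impl so flags such as `#`
            // (the `0x` prefix) are honored instead of hardcoded.
            fmt::LowerHex::fmt(lane, f)?;
        }
        write!(f, ")")
    }
}
```

With this, `{:#x}` prints each lane with a `0x` prefix, while `{:x}` prints them without one.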
coresimd/ppvt/api/hash.rs (outdated)
}
unsafe {
let mut bytes: A = mem::uninitialized();
self.store_aligned_unchecked(&mut bytes.data);
Could this function be simplified to `A { vec: *self }.data.hash(state)`?
I'll give it a try.
I should file an issue to discuss the semantics of this: it basically just hashes the vector as a slice of bytes. This differs a bit from how `Hash` works for arrays, where the length is hashed first and then each element is hashed.
Done.
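The union approach from the review can be sketched as follows: reinterpret the vector as a byte array and hash that. The names `U32x4` and `Bytes` are hypothetical; `#[repr(C)]` on the union is what guarantees both fields start at offset 0:

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical array-backed stand-in for a 4-lane `u32` vector
/// (16 bytes, no padding).
#[derive(Copy, Clone)]
struct U32x4([u32; 4]);

/// Overlays the vector with its raw bytes; `repr(C)` places both
/// fields at offset 0.
#[repr(C)]
union Bytes {
    vec: U32x4,
    data: [u8; 16],
}

impl Hash for U32x4 {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Sound because all 16 bytes of `vec` are initialized and
        // `[u8; 16]` accepts any bit pattern.
        let data = unsafe { Bytes { vec: *self }.data };
        data.hash(state);
    }
}
```

Hashing the `[u8; 16]` still goes through the slice `Hash` impl, which writes the length before the bytes, so the hasher sees a length of 16 rather than the 4 it would see when hashing the lanes as a `[u32; 4]`.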
stdsimd/mod.rs (outdated)
//! slots[1] = hex(*byte & 0xf);
//! }
//! }
//! ```
This was all actually intended to be the rustdoc documentation for the `arch` module (which will show up in libstd's docs soon). Mind leaving it on the `arch` module instead of the `stdsimd` module?
Ah! I got a complaint about the `stdsimd` module not having any documentation and thought this was the other way around. I'll change this.
Done.
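The `stdsimd/mod.rs` diff context in this thread ends with the scalar fallback loop of the `arch` module's hex-encoding example. A standalone sketch of just that fallback (not the SSE4.1 path), with hypothetical helper names `hex` and `hex_encode_fallback` chosen to match the `slots[1] = hex(*byte & 0xf)` line:

```rust
/// Maps a nibble (0..=15) to its lowercase ASCII hex digit.
fn hex(nibble: u8) -> u8 {
    if nibble < 10 {
        b'0' + nibble
    } else {
        b'a' + nibble - 10
    }
}

/// Scalar fallback: writes two hex digits per input byte into `dst`.
fn hex_encode_fallback(src: &[u8], dst: &mut [u8]) {
    assert!(dst.len() >= src.len() * 2, "destination too small");
    for (byte, slots) in src.iter().zip(dst.chunks_mut(2)) {
        slots[0] = hex((*byte >> 4) & 0xf);
        slots[1] = hex(*byte & 0xf);
    }
}
```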
@alexcrichton so I was able to implement most of the reductions on top of …
Hm, I wonder if the added tests are stressing out rustc a bit much? Travis looks like it's timing out quite a lot :(
Looks good to me to merge modulo CI.
@alexcrichton yes, build times have doubled. I could split the simd types into their own crate, and import it from coresimd.
Hm, splitting crates I don't think will be feasible due to integration into libstd. Do you know why this takes so long to compile?
I think it's because of the tests (compiling …
I am also unable to do re-compilations of the crate without doing a …
This commit cleans up the implementation of the Portable Packed Vector Types (PPVT), adds some new features, and makes some breaking changes. The implementation is moved to `coresimd/src/ppvt` (the types are still exposed via `coresimd::simd`). As before, the vector types of a certain width are implemented in the `v{width}` submodules. The `macros.rs` file has been rewritten as an `api` module that exposes the macros to implement each API. It should now hopefully be really clear where each API is implemented, and which types implement these APIs. It should also now be really clear which APIs are tested and how.

- boolean vectors of the form `b{element_size}x{number_of_lanes}`.
- reductions: arithmetic, bitwise, min/max, and boolean. Only the facade and a naive working implementation are provided; these need to be implemented with `llvm.experimental.vector.reduce.{...}`, but this needs rustc support first.
- a `FromBits` trait, analogous to `{f32,f64}::from_bits`, that performs "safe" transmutes. Instead of writing `From::from`/`x.into()` (see below for breaking changes), you now write `FromBits::from_bits`/`x.into_bits()`.
- portable vector types implement `Default` and `Hash`.
- tests for all portable vector types and all portable operations (~2000 new tests).
- a (hopefully) comprehensive implementation of bitwise transmutes and lane-wise casts (before, `From` and the `.as_...` methods were implemented "when they were needed").
- documentation for PPVT (not great yet, but better than nothing).
- conversions/transmutes from/to x86 architecture-specific vector types.
- the `store/load` API has been replaced with `{store,load}_{aligned,unaligned}`.
- `eq,ne,lt,le,gt,ge` APIs now return boolean vectors.
- the `.as_{...}` methods have been removed. Lane-wise casts are now performed by `From`.
- `From` now performs casts (see above). It used to perform bitwise transmutes.
- `simd` vectors' `replace` method's result is now `#[must_use]`.
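The distinction between the new `FromBits` and the repurposed `From` can be sketched on hypothetical array-backed `F32x4`/`U32x4` types. The trait below only mirrors the shape described in the list above; the PR's actual definition may differ:

```rust
/// Hypothetical array-backed stand-ins for 4-lane vectors.
#[derive(Copy, Clone, Debug, PartialEq)]
struct F32x4([f32; 4]);
#[derive(Copy, Clone, Debug, PartialEq)]
struct U32x4([u32; 4]);

/// Sketch of a `FromBits`-style trait: a "safe" bitwise transmute.
trait FromBits<T> {
    fn from_bits(t: T) -> Self;
}

impl FromBits<F32x4> for U32x4 {
    /// Reinterprets each lane's bits, like `f32::to_bits`.
    fn from_bits(v: F32x4) -> Self {
        let mut out = [0u32; 4];
        for (o, x) in out.iter_mut().zip(v.0.iter()) {
            *o = x.to_bits();
        }
        U32x4(out)
    }
}

/// `From` performs a lane-wise numeric cast instead.
impl From<U32x4> for F32x4 {
    fn from(v: U32x4) -> Self {
        let mut out = [0f32; 4];
        for (o, x) in out.iter_mut().zip(v.0.iter()) {
            *o = *x as f32;
        }
        F32x4(out)
    }
}
```

So `U32x4::from_bits` keeps the bit pattern (1.0 becomes `0x3f80_0000`), while `F32x4::from` converts the numeric value of each lane.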
Why are boolean vectors represented as integer vectors containing 0 or -1? I know that some popular platforms don't have native vectors with one-bit elements, but why does that impact the portable types?
@rkruppe Those are the values that the SIMD comparison instructions return on at least … The API of boolean vectors doesn't expose these values (it translates these values to/from … In any case, we should document this, since this is relevant for those explicitly calling …
I know, that's what I was alluding to, but again, why does that have to impact the portable types? This is like saying i64 arithmetic doesn't exist on 32-bit targets so the portable …
That is not true. Clang does lower C-language vector compares to …
I'll give this a try.
@rkruppe so I tried to use bools, but got a "SIMD vector element type should be machine type" error: https://play.rust-lang.org/?gist=a0331d2eb68fec6c5e32b8a49356cd8d&version=nightly
Come to think of it, I actually doubt there is a portable way to expose … Probably best to use a memory layout compatible with …
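To make the representation under discussion concrete: a sketch in which a comparison returns a boolean vector whose lanes are stored as 0/-1 `i32` masks, while `extract` only ever exposes `bool`. The types and methods are hypothetical stand-ins, not the PR's definitions:

```rust
/// Hypothetical boolean vector: lanes stored as `i32` masks, with
/// `true` = -1 (all bits set) and `false` = 0, the values x86 SSE
/// comparison instructions produce.
#[derive(Copy, Clone, Debug, PartialEq)]
struct B32x4([i32; 4]);

/// Hypothetical array-backed stand-in for a 4-lane `i32` vector.
#[derive(Copy, Clone, Debug, PartialEq)]
struct I32x4([i32; 4]);

impl B32x4 {
    /// Reads lane `i` as a `bool`, hiding the mask representation.
    fn extract(self, i: usize) -> bool {
        self.0[i] != 0
    }
}

impl I32x4 {
    /// Lane-wise `<`, returning a boolean vector rather than a `bool`.
    fn lt(self, other: Self) -> B32x4 {
        let mut out = [0i32; 4];
        for ((o, a), b) in out.iter_mut().zip(self.0.iter()).zip(other.0.iter()) {
            *o = if a < b { -1 } else { 0 };
        }
        B32x4(out)
    }
}
```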
Cargo.toml (outdated)
lto = false
debug-assertions = true
codegen-units = 1
panic = 'unwind'
Minor nitpick: should be a newline at the end of this file.
I will revert these changes before merging. These profiles were added only to test whether setting `codegen-units` to 1 would either improve compile times or remove some issues with incremental compilation.
@rkruppe the question is, should we only expose … From the POV of portable operations exposing … From the POV of the architecture-specific intrinsics, some of them return "boolean vectors" of a larger width stored in either integer or floating-point registers. I don't know if …
Could you elaborate on this (or are you on IRC?)? My question is: why does this matter?
@alexcrichton so this should be good to go modulo compile times.
OK, thanks! I'd like to dig into this a bit first and investigate compile times; I'll do that now.
Hm, OK, there may have been a recent rustc regression that was fixed. In any case, it looks like it's not too slow now, and yeah, it's almost entirely tests, which we can of course move around later if need be. Thanks again @gnzlbg!
This commit cleans up the implementation of the Portable Packed SIMD Vectors (PPSV), adds some new features, and makes some breaking changes.
The implementation is moved to `coresimd/ppsv` (they are still exposed via `coresimd::simd`).
As before, the vector types of a certain width are implemented in the `v{width}` submodules. The `macros.rs` file has been rewritten as an `api` module that exposes the macros to implement each API.
It should now hopefully be really clear where each API is implemented, and which types implement these APIs. It should also now be really clear which APIs are tested and how.
Additions

- boolean vectors of the form `b{element_size}x{number_of_lanes}`.
- reductions: arithmetic, bitwise, min/max, and boolean … `llvm.experimental.vector.reduce.{...}` modulo bugs.
- `FromBits` trait analogous to `{f32,f64}::from_bits` that performs "safe" transmutes. Instead of writing `From::from`/`x.into()` (see below for breaking changes) now one writes `FromBits::from_bits(x)`/`x.into_bits()`.
- portable vector types implement `Default` and `Hash`.
- tests for all portable vector types and all portable operations … `cargo test` a lot, and increases the memory requirements...).
- (hopefully) comprehensive implementation of bitwise transmutes and lane-wise casts (before, `From` and the `.as_...` methods were implemented "when they were needed").

Breaking changes

- the `{store,load}{,_unchecked}` API has been replaced with `{store,load}_{aligned,unaligned}{,_unchecked}`.
- `eq,ne,lt,le,gt,ge` APIs now return boolean vectors.
- the `.as_{...}` methods have been removed. Lane-wise casts are now performed via `From`/`Into`.
- the `From`/`Into` traits now perform lane-wise casts (see above). Previously they performed bitwise transmutes.
- `simd` vectors' `replace` method's result is now `#[must_use]`; executing `replace` and dropping the result is an easy mistake to make.