Prepare portable packed vector types for RFCs #338


Merged
merged 32 commits into from
Mar 5, 2018

Conversation

gnzlbg
Contributor

@gnzlbg gnzlbg commented Mar 2, 2018

This commit cleans up the implementation of the Portable Packed SIMD Vectors
(PPSV), adds some new features, and makes some breaking changes.

The implementation is moved to coresimd/ppsv (they are
still exposed via coresimd::simd).

As before, the vector types of a certain width are implemented in the v{width}
submodules. The macros.rs file has been rewritten as an api module that
exposes the macros to implement each API.

It should now hopefully be really clear where each API is implemented, and which types
implement these APIs. It should also now be really clear which APIs are tested and how.

Additions

  • boolean vectors of the form b{element_size}x{number_of_lanes}.
  • reductions: arithmetic (sum, product), bitwise (and, or, xor), min/max, and boolean (all, any, none) - mainly implemented via llvm.experimental.vector.reduce.{...} modulo bugs.
  • FromBits trait analogous to {f32,f64}::from_bits that performs "safe" transmutes.
    Instead of writing From::from/x.into() (see below for breaking changes), one now writes
    FromBits::from_bits(x)/x.into_bits().
  • portable vector types implement Default and Hash
  • tests for all portable vector types and all portable operations (~2000 new tests; this hurts the compile time of cargo test a lot, and increases the memory requirements...).
  • (hopefully) comprehensive implementation of bitwise transmutes and lane-wise
    casts (before, From and the .as_... methods were implemented "when they were needed").
  • documentation for PPTV (not great yet, but better than nothing)
  • conversions/transmutes from/to x86 architecture specific vector types

Breaking changes

  • {store,load}{,_unchecked} API has been replaced with {store,load}_{aligned,unaligned}{,_unchecked}
  • eq,ne,lt,le,gt,ge APIs now return boolean vectors
  • The .as_{...} methods have been removed. Lane-wise casts are now performed via From/Into.
  • From/Into traits now perform lane-wise casts (see above). Previously they used to perform bitwise transmutes.
  • simd vectors' replace method's result is now #[must_use]; executing replace and dropping the result is an easy mistake to make.

stdsimd/mod.rs Outdated
@@ -240,101 +241,6 @@
/// we'll be using SSE4.1 features to implement hex encoding.
///
/// ```
/// #![feature(cfg_target_feature, target_feature, stdsimd)]
Contributor Author

I have no idea about what happened here. Did rustfmt delete all of this?

@gnzlbg gnzlbg Mar 2, 2018

Yep, it looks like rustfmt just deleted all of this (the fmt commit equals the previous one + cargo fmt --all). @nrc does this look familiar? It looks like rustfmt deleted a chunk of a comment :/

@alexcrichton alexcrichton left a comment

This looks awesome, thanks @gnzlbg!


#[cfg_attr(feature = "cargo-clippy", allow(expl_impl_clone_on_copy))]
impl Clone for $id {
#[inline] // currently needed for correctness
Member

FWIW I don't think this should be necessary any more w/ the changes in upstream rust-lang/rust

Contributor Author

The tests passed without this, I just saw that the types in the x86 module were still doing this and decided to add it for consistency. I can remove them there as well.

Member

Oh yeah I think at this point in time they can all move to #[derive(Clone)]

impl $id {
/// Lane-wise addition of the vector elements.
#[inline(always)]
pub fn add(self) -> $elem_ty {
Member

For these reductions I'd personally only expect add and mul to be here (but called sum and product to avoid shadowing Add::add and Mul::mul). The sub, div, and rem reductions seem odd, although perhaps someone's requested them before?

@gnzlbg gnzlbg Mar 3, 2018

So llvm only provides llvm.experimental.vector.reduce.{add, fadd, mul, fmul, and, or, xor, smax, smin, umax, umin}, which means sub, div, and rem cannot really be implemented here any better than in a third-party crate. I provided them for completeness, but I think it makes sense to leave them out. Writing tests for them felt weird.

Contributor Author

Done, I've renamed add/mul to sum/product (just like the Iterator methods) and removed the sub/div/rem reductions.

($id:ident, $elem_ty:ident) => {
impl $id {
/// Lane-wise bitwise `and` of the vector elements.
pub fn and(self) -> $elem_ty {
Member

Sort of like the div/rem reductions above are we sure these make sense to add as well? There's certainly nothing wrong with them they just seem a little odd I think in terms of functionality.

I think we'll also want to perhaps select different names to avoid conflicts with Or::or and such.

Member

Oh in the meantime though I think we'll want #[inline] on these methods.

Contributor Author

So all three of these are provided by llvm. They are necessary to implement the reductions of boolean vectors (all, any, none), but maybe we shouldn't provide them for the integer types (llvm supports that, though).

Contributor Author

I am leaving these here for now since I can just map them directly to the llvm intrinsics. We might just decide to never stabilize these and expose them only for boolean vectors via all,any,none.

}

impl ops::AddAssign for $id {
#[inline(always)]
Member

FWIW technically #[inline(always)] isn't needed for anything, but there's also not much harm vs #[inline] I think for such small methods.

Contributor Author

I thought about this and recalled that with ThinLTO #[inline(always)] shouldn't be necessary anymore, but then looked at what the types in the x86 module were doing (and the vector types before that), and they were all still using #[inline(always)].

I think we should do a pass through the library and see if we can replace most of the #[inline(always)] attributes with just #[inline].

Contributor Author

I've filed #340 for this.

if i > 0 {
write!(f, ", ")?;
}
write!(f, "{:#x}", self.extract(i))?;
Member

I think here if you do self.extract(i).fmt(f) it'll automatically forward formatting flags like # which means we may not need to hardcode the #

Contributor Author

Done.

}
unsafe {
let mut bytes: A = mem::uninitialized();
self.store_aligned_unchecked(&mut bytes.data);
Member

Could this function be simplified to:

A { vec: *self }.data.hash(state)

Contributor Author

I'll give it a try.

I should file an issue to discuss the semantics of this: it basically just hashes the vector as a slice of bytes. This is a bit different from how Hash works for arrays, where first the length is hashed, and then each element is hashed.

Contributor Author

Done.

stdsimd/mod.rs Outdated
//! slots[1] = hex(*byte & 0xf);
//! }
//! }
//! ```
Member

This was all actually intended to be the rustdoc documentation for the arch module (which will show up in libstd's docs soon), mind leaving it on the arch module instead of the stdsimd module?

Contributor Author

Ah! I got a complaint about the stdsimd module not having any documentation and thought this was the other way around, I'll change this.

Contributor Author

Done.

@gnzlbg
Contributor Author

gnzlbg commented Mar 3, 2018

@alexcrichton so I was able to implement most of the reductions on top of llvm.experimental.vector.reduce.{...}. However, for floating-point vector types, sum and product produce code-gen errors with everything I've tried (passing 0. as $elem_ty and mem::uninitialized() as an accumulator).

@alexcrichton
Member

Hm I wonder if the added tests are stressing out rustc a bit much? Travis looks like it's timing out quite a lot :(

@alexcrichton
Member

Looks good to me to merge modulo CI

@gnzlbg
Contributor Author

gnzlbg commented Mar 4, 2018

@alexcrichton yes build times have doubled. I could split the simd types into their own crate, and import it from coresimd.

@alexcrichton
Member

Hm, I don't think splitting crates will be feasible due to integration into libstd. Do you know why this takes so long to compile?

@gnzlbg
Contributor Author

gnzlbg commented Mar 5, 2018

I think it's because of the tests (compiling coresimd without tests is slower than before, but still pretty quick).

@gnzlbg
Contributor Author

gnzlbg commented Mar 5, 2018

I am also unable to recompile the crate without doing a cargo clean first. Doing a cargo test, a one-line change, then cargo test again makes the second cargo test require huge amounts of memory and HDD space.

gnzlbg added 19 commits March 5, 2018 11:26
This commit cleans up the implementation of the Portable Packed Vector Types
(PPTV), adds some new features, and makes some breaking changes.

The implementation is moved to `coresimd/src/ppvt` (they are
still exposed via `coresimd::simd`).

As before, the vector types of a certain width are implemented in the `v{width}`
submodules. The `macros.rs` file has been rewritten as an `api` module that
exposes the macros to implement each API.

It should now hopefully be really clear where each API is implemented, and which types
implement these APIs. It should also now be really clear which APIs are tested and how.

- boolean vectors of the form `b{element_size}x{number_of_lanes}`.
- reductions: arithmetic, bitwise, min/max, and boolean - only the facade,
  and a naive working implementation. These need to be implemented
  as `llvm.experimental.vector.reduce.{...}` but this needs rustc support first.
- FromBits trait analogous to `{f32,f64}::from_bits` that performs "safe" transmutes.
  Instead of writing `From::from`/`x.into()` (see below for breaking changes) now you write
  `FromBits::from_bits`/`x.into_bits()`.
- portable vector types implement `Default` and `Hash`
- tests for all portable vector types and all portable operations (~2000 new tests).
- (hopefully) comprehensive implementation of bitwise transmutes and lane-wise
  casts (before, `From` and the `.as_...` methods were implemented "when they were needed").
- documentation for PPTV (not great yet, but better than nothing)
- conversions/transmutes from/to x86 architecture specific vector types

- `store/load` API has been replaced with `{store,load}_{aligned,unaligned}`
- `eq,ne,lt,le,gt,ge` APIs now return boolean vectors
- The `.as_{...}` methods have been removed. Lane-wise casts are now performed by `From`.
- `From` now performs casts (see above). It used to perform bitwise transmutes.
- `simd` vectors' `replace` method's result is now `#[must_use]`.
@hanna-kruppe

Why are boolean vectors represented as integer vectors containing 0 or -1? I know that some popular platforms don't have native vectors with one bit elements but why does that impact the portable types?

@gnzlbg
Contributor Author

gnzlbg commented Mar 5, 2018

@rkruppe Those are the values that the simd comparison instructions return on at least x86 to represent true (0xFFF...) and false (0x000...), and those are the values that the portable vector comparisons (and the reductions) of LLVM also return. Since the boolean vectors are currently implemented as vectors of iX, !(0 as iX) and 0 as iX are what they use.

The API of the boolean vectors doesn't expose these values (it translates them to/from bool), and it also doesn't expose any conversions or bitwise transmutes from other vector types.

In any case, we should document this, since this is relevant for those explicitly calling mem::transmute, which is something one might want to do to transmute the result of an intrinsic returning a comparison into a boolean vector type.

@hanna-kruppe

Those are the values that the simd comparison instructions return on at least x86

I know, that's what I was alluding to, but again, why does that have to impact the portable types? This is like saying i64 arithmetic doesn't exist on 32-bit targets so the portable i64xN types should be implemented in terms of i32xM for M = 2N. When a target doesn't support some portable vector type, the type should be legalized by the backend. (Sometimes that doesn't work in practice, like for i128 on some targets, but I know that i1 vectors can be legalized on x86 at least.)

and those are the values that the portable vector comparisons (and the reductions) of LLVM also return.

That is not true. icmp eq <N x i32> returns <N x i1>, for example. I haven't checked reductions but since they are overloaded, in principle they should also work with i1.

Clang does lower C-language vector compares to icmp (returning an i1 vector) + sext so that the end result is an integer vector containing 0 and -1, but this is a front end decision (inherited from GCC), not anything inherent about LLVM IR.

@gnzlbg
Contributor Author

gnzlbg commented Mar 5, 2018

That is not true. icmp eq <N x i32> returns <N x i1>, for example. I haven't checked reductions but since they are overloaded, in principle they should also work with i1.

I'll give this a try.

@gnzlbg
Contributor Author

gnzlbg commented Mar 5, 2018

@rkruppe so I tried to use bools, but got a "SIMD vector element type should be machine type" error: https://play.rust-lang.org/?gist=a0331d2eb68fec6c5e32b8a49356cd8d&version=nightly

@hanna-kruppe

Come to think of it, I actually doubt there is a portable way to expose <N x i1> to Rust -- because Rust types can be stored in memory, but <N x i1> is stored either like <N x i8> (one byte per element) or iN (packed, individual bits not addressable) depending on the target. And indeed rustc won't let you do #[repr(simd)] struct BoolVec(bool, bool, ...) currently.

Probably best to use a memory layout compatible with [bool; N] or (bool, bool, ...): u8 elements, 0 or 1.

Cargo.toml Outdated
lto = false
debug-assertions = true
codegen-units = 1
panic = 'unwind'
Contributor

Minor nitpick: should be a newline at the end of this file.

Contributor Author

I will revert these changes before merging. These profiles were added only to test if setting codegen-units to 1 would improve either compiletimes or remove some issues of incremental compilation.

@gnzlbg
Contributor Author

gnzlbg commented Mar 5, 2018

@rkruppe the question is, should we only expose b8x{2,4,6,8,32,74} or should we also expose wider boolean vector types?

From the POV of portable operations exposing b8xN should be enough, because the actual size of the boolean vector type is irrelevant.

From the POV of the architecture specific intrinsics, some of them return "boolean vectors" of a larger width stored in either integer or floating-point registers. I don't know if b8xN is enough to provide type-safe, zero-runtime-cost wrappers around these intrinsics. The original simd crate had types like b32fx8 probably for this purpose.

@gnzlbg
Contributor Author

gnzlbg commented Mar 5, 2018

Come to think of it, I actually doubt there is a portable way to expose to Rust -- because Rust types can be stored in memory, but is stored either like (one byte per element) or iN (packed, individual bits not addressable) depending on the target.

Could you elaborate on this? (or are you on IRC?). My question is: why does this matter?

@gnzlbg
Contributor Author

gnzlbg commented Mar 5, 2018

@alexcrichton so this should be good to go modulo compile-times.

@alexcrichton
Member

Ok thanks! I'd like to dig in a bit first and investigate compile times; I'll do that now.

@alexcrichton
Member

Hm, ok, there may have been a recent rustc regression that was since fixed; in any case it looks like it's not too slow now. And yeah, it's almost entirely tests, which we can of course move around later if need be. Thanks again @gnzlbg!
