revise #[inline(always)] vs #[inline] #340

Closed
gnzlbg opened this issue Mar 3, 2018 · 12 comments
Comments

@gnzlbg
Contributor

gnzlbg commented Mar 3, 2018

With ThinLTO, #[inline(always)] should no longer be necessary; most of its uses can be just #[inline].
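For illustration, here is a minimal sketch of the two attributes side by side. The wrapper functions `add_hint` and `add_always` are hypothetical and are not taken from stdsimd; they only show where the attribute goes. Both attributes are hints to the compiler: `#[inline]` makes the body available for inlining across crates and leaves the decision to the optimizer, while `#[inline(always)]` asks for inlining at every call site.

```rust
// Hypothetical wrappers for illustration only; they are not stdsimd code.

/// `#[inline]` exposes the body to other crates and suggests inlining,
/// but the optimizer makes the final call.
#[inline]
pub fn add_hint(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

/// `#[inline(always)]` is a much stronger request to inline at every
/// call site, which tends to cost more compile time when used widely.
#[inline(always)]
pub fn add_always(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}
```

Both functions behave identically at run time; only the inlining hint given to the compiler differs.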

@alexcrichton
Member

FWIW this was already done for all the vendor intrinsics, and I think it'll work to just switch everything to #[inline]

@gnzlbg
Contributor Author

gnzlbg commented Mar 5, 2018

So I've changed all uses of #[inline(always)] to #[inline] in #338

@gnzlbg gnzlbg closed this as completed Mar 5, 2018
@jasondavies
Contributor

I just ran into some issues relating to lack of inlining. In order to test the SHA extensions work, I've converted some C code to Rust (it's mostly just calls to SSE/SHA intrinsics). However, a couple of SSE intrinsics were not being inlined and caused a 2x slowdown relative to the original C code. I determined the lack of inlining by manually inspecting the generated assembly.

Changing to #[inline(always)] on these couple of intrinsic functions fixed the slowdown issue. The intrinsics were:

  • _mm_blend_epi16
  • _mm_alignr_epi8

I'm using nightly (75af15ee6 2018-03-20).

@gnzlbg
Contributor Author

gnzlbg commented Mar 22, 2018

@jasondavies Thanks for the feedback.

Could you tell us how you are compiling your code?

Are you just using cargo --release with the "default" release profile, or have you manually tuned the release profile in your Cargo.toml or .cargo/config.toml? If you are using cargo --release + default profile, could you try enabling "fat" LTO and see if that improves things?

@gnzlbg gnzlbg reopened this Mar 22, 2018
@gnzlbg
Contributor Author

gnzlbg commented Mar 22, 2018

You can try that with RUSTFLAGS="-C lto=fat" cargo build --release

@jasondavies
Contributor

Sorry, forgot to include that: yes, I'm using --release (I created a new example in examples/ for testing): cargo build --release --examples.

I tried RUSTFLAGS="-C lto=fat" cargo build --release --examples just now and it did not resolve the slowdown issue.

@gnzlbg
Contributor Author

gnzlbg commented Mar 22, 2018

OK, one final question: are you using stdsimd, or the SIMD facilities via std/core?

If you are using the SIMD facilities via std/core, could you check whether using stdsimd with LTO enabled solves your issue? If it doesn't, it would be cool if you had a GitHub repo where you could just push a branch that we can use to reproduce this, or, if you prefer, you could post a minimal working example (e.g. in the playground) that reproduces the issue.


EDIT: we have had problems with inlining in the past, which is why everything used to be #[inline(always)], but that hits compile times really hard. ThinLTO should be able to inline the intrinsics in all cases, although it might sometimes decide not to do so. ThinLTO is not that old, so there might still be some pieces that need ironing out.

@alexcrichton
Member

If intrinsics aren't being inlined, it's almost certainly due to mismatched target features enabled on functions rather than #[inline] vs #[inline(always)]. @jasondavies are you sure that the function you'd like them inlined into has the right features enabled? If so, is there something we could poke around in to see the codegen?

@gnzlbg
Contributor Author

gnzlbg commented Mar 22, 2018

+1 to what @alexcrichton said: if you call, for example, an AVX function from an SSE one, the AVX function will never be inlined.

@jasondavies
Contributor

jasondavies commented Mar 22, 2018

Ah, that could be it! I'm using stdsimd. Looks like I need to enable some more target features as I only had enable = "sha".
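As a rough sketch of what that looks like, the function below enables every feature required by the intrinsics it calls, so the compiler is allowed to inline them into it. This assumes today's `std::arch` API (the stabilized successor of stdsimd). The names `mix_words` and `mix_words_simd` are hypothetical, and the `sha` feature is deliberately left out so the sketch also runs on CPUs without SHA extensions. A real SHA kernel would list it alongside the others.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Hypothetical kernel. `_mm_alignr_epi8` needs ssse3 and `_mm_blend_epi16`
// needs sse4.1, so both are enabled on the caller. If the caller enabled
// only an unrelated feature, the intrinsics could stay as real calls.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse2,ssse3,sse4.1")]
unsafe fn mix_words_simd(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
    let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
    // Concatenate a:b and shift right by 8 bytes -> [b2, b3, a0, a1].
    let shifted = _mm_alignr_epi8::<8>(va, vb);
    // Keep the low 64 bits of `shifted`, take the high 64 bits from `va`.
    let mixed = _mm_blend_epi16::<0xF0>(shifted, va);
    let mut out = [0u32; 4];
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, mixed);
    out
}

/// Safe wrapper: returns `None` when the CPU (or architecture) lacks the
/// required features instead of executing unsupported instructions.
pub fn mix_words(a: [u32; 4], b: [u32; 4]) -> Option<[u32; 4]> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse2")
            && is_x86_feature_detected!("ssse3")
            && is_x86_feature_detected!("sse4.1")
        {
            // SAFETY: the required CPU features were detected just above.
            return Some(unsafe { mix_words_simd(a, b) });
        }
    }
    None
}
```

On a CPU with SSSE3 and SSE4.1, `mix_words([1, 2, 3, 4], [5, 6, 7, 8])` returns `Some([7, 8, 3, 4])`. Inspecting the generated assembly, as described earlier in the thread, is still the way to confirm that the intrinsics were actually inlined.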

@jasondavies
Contributor

Yes, that fixed it. Thanks!

@alexcrichton
Member

Awesome, good to know!
