We've recently been working on a change [1] to how rustc implements LTO. The change leverages LLVM's support for an object file format where the object file contains native machine code but also has LLVM bitcode embedded in a custom section. This allows us to ship binaries which can be linked natively but also LTO'd against, without any auxiliary files of our own. The WebAssembly target, however, does not support this feature. This bug is a feature request to implement this for the wasm target, allowing producers to embed LLVM bitcode in a blessed section name that would be recognized by LLVM's LTO pipeline and stripped by default in LLD. I'm currently filing this on LLD because it seems like it may mostly affect that, but I can also move it around if that would help! [1]: https://github.com/rust-lang/rust/pull/70458
Interesting. Is this a feature that currently exists in lld for other targets? Or is this something completely new? Currently LTO is used in the linker when you pass it bitcode object files. If you pass object files that contain both bitcode and native code, how does the linker know which one you want to use? Would we need to add a new flag? Does such a flag already exist in the ELF version of lld?
I believe this feature has existed for around 5 years or so at this point (doing some googling around). Looking back through the history and poking around the codebase, though, it looks like this was primarily introduced for LLVM bitcode distribution for the app store on iOS (it's the -fembed-bitcode option on clang). In some sense this is generally just a desire to get -fembed-bitcode working for the wasm target (which may be why LLD is the wrong place to file this?). For LLD we could also use a "please ignore this custom section" option, because for now we're making our own custom section in wasm object files which we want LLD to ignore.
Certainly I can look at making `-fembed-bitcode` work with the wasm object format. I'm still a little confused though: who is supposed to read this section if not the linker? I.e. what are the current consumers of this section? I'm happy to implement this in the linker too, if that is something other backends do.
The current consumer that we're thinking of is rustc itself, not any LLVM tooling. From what I've seen LLVM has various APIs to read bitcode from an object file, but it's up to embedders to make use of them. I thought that linkers would transparently pick up the bitcode when LTO was being performed, but I haven't confirmed this. For context, rustc has its own LTO passes which would use this support. Currently it works for all platforms other than wasm, but implementing support for bitcode in wasm object files would enable it to work on wasm as well!
Ok, sounds good to me. How do you extract the LTO section within rustc? Do you rely on some API such as LLVM's libObject and look for a certain named section?
It looks like `-fembed-bitcode` does currently work with wasm. However, the two sections `.llvmbc` and `.llvmcmd` that it generates appear in the object file as data segments:

```
Data[3]:
 - segment[0] <.data.bardata> memory=0 size=4 - init i32=0
  - 0000000: 0100 0000                                ....
 - segment[1] <.llvmbc> memory=0 size=2296 - init i32=16
  - 0000010: 4243 c0de 3514 0000 0500 0000 620c 3024  BC..5.......b.0$
  - 0000020: 4d59 be66 5dfb b44f 1bc8 2444 0132 0500  MY.f]..O..$D.2..
  - 0000030: 210c 0000 ee01 0000 0b02 2100 0200 0000  !.........!.....
  ...
 - segment[2] <.llvmcmd> memory=0 size=117 - init i32=2320
  - 0000910: 2d74 7269 706c 6500 7761 736d 3332 002d  -triple.wasm32.-
  - 0000920: 656d 6974 2d6f 626a 002d 6d72 656c 6178  emit-obj.-mrelax
  - 0000930: 2d61 6c6c 002d 6665 6d62 6564 2d62 6974  -all.-fembed-bit
  - 0000940: 636f 6465 3d61 6c6c 002d 6665 6d62 6564  code=all.-fembed
  - 0000950: 2d62 6974 636f 6465 3d61 6c6c 002d 6469  -bitcode=all.-di
  - 0000960: 7361 626c 652d 6c6c 766d 2d70 6173 7365  sable-llvm-passe
  - 0000970: 7300 2d66 6e6f 2d72 6f75 6e64 696e 672d  s.-fno-rounding-
  - 0000980: 6d61 7468 00                             math.
```

This means that they are part of the same `section`, which is the data section. We had a similar issue with the `.clangast` section:

https://bugs.llvm.org/show_bug.cgi?id=35928
https://reviews.llvm.org/D74531

I believe the real long-term solution is:

https://github.com/WebAssembly/tool-conventions/issues/138

which would mean the data segments and individual functions would appear to libObject as "sections". In the short term we hardcode certain section names to appear instead as custom sections (as in https://reviews.llvm.org/D7453).
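To illustrate the difference between a data segment and a custom section, here is a minimal sketch in Rust of a tool that only walks the top-level sections of a wasm binary. It is not how libObject or LLD is implemented; the helper names `read_uleb32` and `custom_sections` are made up for this example. A walker like this sees a custom section named `.llvmbc` directly, whereas a `.llvmbc` data segment stays hidden inside the single data section (id 11) unless the tool also parses segment names from the linking metadata.

```rust
/// Decode an unsigned LEB128 integer starting at `pos`.
/// Returns the value and the position just past it, or None if the
/// input is truncated or the encoding is longer than five bytes.
fn read_uleb32(bytes: &[u8], mut pos: usize) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    let mut shift = 0;
    loop {
        let byte = *bytes.get(pos)?;
        pos += 1;
        if shift >= 32 {
            return None;
        }
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((result, pos));
        }
        shift += 7;
    }
}

/// Walk the top-level sections of a wasm binary and return
/// (name, payload) for every custom section (section id 0).
/// Hypothetical helper for illustration only.
fn custom_sections(module: &[u8]) -> Option<Vec<(String, &[u8])>> {
    // Header: "\0asm" magic followed by a 4-byte version field.
    if module.len() < 8 || !module.starts_with(b"\0asm") {
        return None;
    }
    let mut pos = 8;
    let mut found = Vec::new();
    while pos < module.len() {
        let id = module[pos];
        let (size, body_start) = read_uleb32(module, pos + 1)?;
        let body_end = body_start.checked_add(size as usize)?;
        let body = module.get(body_start..body_end)?;
        if id == 0 {
            // Custom section: length-prefixed UTF-8 name, then opaque payload.
            let (name_len, name_start) = read_uleb32(body, 0)?;
            let name_end = name_start.checked_add(name_len as usize)?;
            let name = std::str::from_utf8(body.get(name_start..name_end)?).ok()?;
            found.push((name.to_string(), &body[name_end..]));
        }
        // Non-custom sections (including the data section) are skipped whole.
        pos = body_end;
    }
    Some(found)
}
```

Calling `custom_sections` on the bytes of an object file returns only what is visible at section granularity, which is the level at which the short-term fix above exposes the hardcoded names.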
Ah yeah so we have some custom C++ bindings in rustc, and we're calling `object::IRObjectFile::findBitcodeInObject` to find bitcode in object files right now. Also nice! If it already works then I think we just need a way to get `object::IRObjectFile::findBitcodeInObject` to find it and to have LLD drop the data/sections by default during linking.
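The two pieces asked for here can be sketched in a few lines of Rust, assuming the object file has already been parsed into named sections. This is not LLVM's or LLD's actual code: the `Section` type and both function names are hypothetical, and the lookup only checks the section name plus the raw bitcode magic `BC 0xC0 0xDE` that is visible at the start of the `.llvmbc` dump above.

```rust
/// Magic bytes that open a raw LLVM bitcode stream: 'B', 'C', 0xC0, 0xDE.
const BITCODE_MAGIC: [u8; 4] = [0x42, 0x43, 0xC0, 0xDE];

/// Hypothetical in-memory view of one named section of an object file.
struct Section<'a> {
    name: &'a str,
    data: &'a [u8],
}

/// Return the payload of the first section named `.llvmbc` whose
/// contents start with the bitcode magic.
fn find_embedded_bitcode<'a>(sections: &[Section<'a>]) -> Option<&'a [u8]> {
    sections
        .iter()
        .find(|s| s.name == ".llvmbc" && s.data.starts_with(&BITCODE_MAGIC))
        .map(|s| s.data)
}

/// Drop the embedded-bitcode sections, as a linker might do by default
/// when producing its final output.
fn strip_embedded_bitcode<'a>(sections: Vec<Section<'a>>) -> Vec<Section<'a>> {
    sections
        .into_iter()
        .filter(|s| s.name != ".llvmbc" && s.name != ".llvmcmd")
        .collect()
}
```

A consumer such as rustc would use the first function on each input object to feed its own LTO passes, while the second models the "drop by default" behavior requested of the linker.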