LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 36096 - Regression: Invalid *.lib produced on MSVC upgrading from 4.0.1
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Object
Version: trunk
Hardware: PC Windows NT
Importance: P enhancement
Assignee: Unassigned LLVM Bugs
Reported: 2018-01-25 11:52 PST by Alex Crichton
Modified: 2018-07-20 13:50 PDT



Attachments
Object to insert in archive (46.02 KB, application/octet-stream)
2018-01-25 11:52 PST, Alex Crichton
archive link.exe thinks is corrupt (55.00 KB, application/x-archive)
2018-01-25 11:57 PST, Alex Crichton

Description Alex Crichton 2018-01-25 11:52:28 PST
Created attachment 19750 [details]
Object to insert in archive

Over at rust-lang/rust we've been a little slow to pick up new LLVM releases, but we're now getting around to upgrading from LLVM 4.0.1 to 6.0.0. Unfortunately, we're seeing what I think is a regression in the behavior of the `llvm-ar.exe` tool on Windows MSVC.

I've run a bisection and it pointed at https://reviews.llvm.org/D29892, so I'm cc'ing those involved there in the hope that y'all can help out with this issue! Unfortunately I don't have a small reproduction to share here, but I can certainly work to make it smaller if need be!

The Rust compiler isn't necessarily standard in how it uses LLVM; for example, we don't literally use `llvm-ar.exe` but rather call `writeArchive` directly. I believe the two are equivalent, however, in terms of reproducing this at the command line.

In any case, the bug that I'm seeing happens when we do something like:

* First we compile a C++ library to an archive using its own build system (CMake). That project is https://github.com/WebAssembly/binaryen.
* Next, the Rust compiler opens up this archive using the `Archive` class in LLVM and iterates over it with `Archive::child_iterator`.
* We then create a new archive with `writeArchive` where some members are freshly generated object files (aka the Rust code) and the remaining members are those from the previous `Archive` opened earlier. We're using `NewArchiveMember::getOldMember` to insert these preexisting members into the new archive.
* Later this archive that was generated is then passed to `link.exe`, MSVC's linker
* The bug happens here where MSVC's linker spits out a message "library is corrupt" (with no other information).

With archives written by LLVM 4.0.1, `link.exe` doesn't emit this error message, but with archives written by LLVM 6.0.0 it does.

While I haven't managed to create a tiny test case I have managed to reduce this somewhat. Specifically I've been taking the object file attached to this bug and executing:

    llvm-ar.exe crus libfoo.a archive.obj

When the archive produced by LLVM 4.0.1 is fed to `link.exe`, `link.exe` spits out a ton of undefined symbol errors (as the archive is missing all the other objects). With the archive produced by LLVM 6.0.0, however, `link.exe` simply spits out "library is corrupt". Unfortunately a direct invocation of `link.exe` on this archive alone isn't working for me; I've been using it with all the other libraries and files from the originally failing `link.exe` command. I'm not too familiar with `link.exe`, though, so y'all may know how to create a more directly failing test case!

Upon running a bisection it ended up pointing at https://github.com/llvm-mirror/llvm/commit/5d7d0e869f7abea7d1022e4b65a75a97dc2e54a6. This commit was reverted a few hours later, but it was relanded with a slight tweak and I believe it has stayed in since. The "suspicious" part for archive writing was the change to `ArchiveWriter.cpp` where `SF_Indirect` symbols started being included in the archive.

I've tested out LLVM 6.0.0 with just the change to `ArchiveWriter.cpp` reverted from that patch (aka remove the branch that checks `SF_Indirect`). That seems to at least restore the old behavior and produces a working archive which isn't considered corrupt by `link.exe`.

Ok so that's all the information that I currently have at this time, but I realize it's not a great amount of information as it can't be trivially reproduced yet! I'm hoping though that y'all cc'd here can help me out and either recognize what's going on here or suggest a way to help minimize.
Comment 1 Alex Crichton 2018-01-25 11:56:02 PST
Oh I should point out that this may also just be uncovering a preexisting bug, I'm not actually certain that the change in question caused a regression per se. What I am hoping though is that y'all are a little more familiar with the archive format on MSVC than I and can hopefully spot a regression more easily!
Comment 2 Alex Crichton 2018-01-25 11:57:32 PST
Created attachment 19751 [details]
archive link.exe thinks is corrupt
Comment 3 Rui Ueyama 2018-01-25 12:01:08 PST
Yeah, that `SF_Indirect` line is suspicious, but I'm not sure if it's the cause of the issue. CCing Rafael as he knows the archiver very well.
Comment 4 Martell Malone 2018-02-15 13:53:09 PST
I used SymbolRef::SF_Indirect as a marker for denoting a weak external under direction from pcc.
I can't seem to reproduce this issue by using llvm-lib or llvm-dlltool.
Alex,I'm not sure about the rust code paths when using llvm but is it possible that the rust compiler be using this flag for something that should be specifically for ELF targets but just not protected by some condition because it never affected COFF before?

I'm going to add Peter here to ask if it would be a good idea to switch to a different marker.
Comment 5 Martell Malone 2018-02-15 13:56:30 PST
Ugh, adding a user to the CC list automatically posted my message.
Here is a more grammatically correct version.

I used SF_Indirect as a marker for denoting a weak external under direction from pcc.
I can't seem to reproduce this issue by using llvm-lib or llvm-dlltool.
Is it possible that the rust compiler is using this flag for something that should be specifically for ELF targets but just not protected by some condition because it never affected COFF before?

I'm going to add Peter here to ask if it would be a good idea to switch to a different marker.
Comment 6 Peter Collingbourne 2018-02-15 15:02:40 PST
I think the problem here is that llvm-ar includes weak externals in the archive's symbol table, while neither lib.exe nor mingw32 ar do. link.exe does not expect weak externals to appear in the symbol table, so it considers the file corrupt. So we should at least stop adding weak externals to the symbol table when the file is being created by llvm-ar.

That leaves what to do about the weak externals created by llvm-dlltool. I'm not actually sure that llvm-dlltool is doing the right thing here. The test case  llvm/test/tools/llvm-dlltool/coff-weak-exports.def currently creates an import library with a weak external TestFunction and an external reference to WeakTestFunction, but if I use binutils dlltool or lib.exe to create an import library from that I see a strong definition of TestFunction which also appears in the archive symbol table. So maybe llvm-dlltool should be doing the same and we should remove the SF_Indirect part from llvm-ar.
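
To see which names a given archive actually lists in its symbol table, the first archive member can be decoded by hand. The following is a minimal Python sketch, assuming the common layout where the first member is named `/` and holds a big-endian symbol count, that many big-endian member offsets, and then NUL-terminated names. The helper names (`archive_index_symbols`, `make_index_only_archive`) are made up for illustration and are not part of LLVM or any other tool mentioned here.

```python
import struct

AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # name 16, date 12, uid 6, gid 6, mode 8, size 10, end 2


def archive_index_symbols(data):
    """Return the symbol names listed in the leading '/' member, if any."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    start = len(AR_MAGIC)
    header = data[start:start + HEADER_SIZE]
    if len(header) < HEADER_SIZE or header[58:60] != b"`\n":
        raise ValueError("truncated or malformed member header")
    if header[0:16].rstrip(b" ") != b"/":
        return []  # the archive does not start with a symbol table member
    size = int(header[48:58].decode("ascii").strip())
    body = data[start + HEADER_SIZE:start + HEADER_SIZE + size]
    (count,) = struct.unpack_from(">I", body, 0)
    # Skip the count and the per-symbol member offsets, then read the names.
    names = body[4 + 4 * count:].split(b"\0")[:count]
    return [n.decode("ascii", "replace") for n in names]


def make_index_only_archive(symbols):
    """Build a toy archive holding only a symbol table (offsets are dummies)."""
    body = struct.pack(">I", len(symbols))
    body += b"".join(struct.pack(">I", 0) for _ in symbols)
    body += b"".join(s.encode("ascii") + b"\0" for s in symbols)
    header = b"%-16s%-12s%-6s%-6s%-8s%-10d`\n" % (
        b"/", b"0", b"", b"", b"0", len(body))
    return AR_MAGIC + header + body
```

Calling `archive_index_symbols` on the bytes of two archives built from the same object by different tools is one way to compare which weak externals each tool lists.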
Comment 7 Alex Crichton 2018-02-16 19:42:09 PST
(In reply to Martell Malone from comment #5)
> Is it possible that the rust compiler is using this flag for something that
> should be specifically for ELF targets but just not protected by some
> condition because it never affected COFF before?


AFAIK we don't do anything fancy with archives or flags or anything like that, I'm not even sure what SF_Indirect/weak means on MSVC so we at least shouldn't be using it intentionally!

That being said Peter's last response also sounds like the problem we saw!
Comment 8 Martell Malone 2018-02-20 22:26:56 PST
(In reply to Peter Collingbourne from comment #6)
> I think the problem here is that llvm-ar includes weak externals in the
> archive's symbol table, while neither lib.exe nor mingw32 ar do. link.exe
> does not expect weak externals to appear in the symbol table, so it
> considers the file corrupt. So we should at least stop adding weak externals
> to the symbol table when the file is being created by llvm-ar.
> 
> That leaves what to do about the weak externals created by llvm-dlltool. I'm
> not actually sure that llvm-dlltool is doing the right thing here. The test
> case  llvm/test/tools/llvm-dlltool/coff-weak-exports.def currently creates
> an import library with a weak external TestFunction and an external
> reference to WeakTestFunction, but if I use binutils dlltool or lib.exe to
> create an import library from that I see a strong definition of TestFunction
> which also appears in the archive symbol table. So maybe llvm-dlltool should
> be doing the same and we should remove the SF_Indirect part from llvm-ar.

The public version of lib.exe does not support creating weak aliases but I think it is safe to assume that MS have an internal tool or internal version of lib.exe that does support the creation of weak externals when building the msvcrt and I assume when building windows in general.
Mingw-w64 needs this support to build the CRT.

When we create a library with llvm-dlltool that has weak externals lib.exe does not consider the library to be corrupt or invalid.
Which is why I think 

Removing support from llvm-ar to create weak aliases will break mingw-w64 with clang and lld.

Binutils dlltool does its own hackery and creates a format that lib.exe does not support and that is out of spec; we cannot use this with MSVC LINK or LLD, and the behavior is generally undefined.

I think the issue here is more about the difference between a COFF weak alias and a normal weak external.
This is why I was asking Alex if rust was doing something for COFF that it should only be doing for ELF.

Maybe putting this behind a flag for COFF when using llvm-ar is an option.
We use llvm-ar to combine weak external libs with regular libraries for example libcxx and libpsapi to support libcxx on mingw-w64.
We would then still have the ability to combine valid libs correctly.

Adding Martin here because he has also done a lot of work on mingw-w64 support.

Martin, any thoughts on the impact of having llvm-ar ignore weak externals?
Comment 9 Martin Storsjö 2018-02-21 00:38:38 PST
(In reply to Martell Malone from comment #8)
> The public version of lib.exe does not support creating weak aliases but I
> think it is safe to assume that MS have an internal tool or internal version
> of lib.exe that does support the creation of weak externals when building
> the msvcrt and I assume when building windows in general.

Yes, the msvc provided msvcrt.lib contains a bunch of weak symbols that work like the aliases you can create with llvm-dlltool like this.

> Mingw-w64 needs this support to build the CRT.
> 
> When we create a library with llvm-dlltool that has weak externals lib.exe
> does not consider the library to be corrupt or invalid.
> Which is why I think 
> 
> Removing support from llvm-ar to create weak aliases will break mingw-w64
> with clang and lld.
> 
> Binutils dlltool does its own hackery and creates a format lib.exe does not
> support and is out of spec, we can not use this with MSVC LINK or LLD and
> the behavior is generally undefined.
> 
> I think the issue here is more about the difference between a COFF weak
> alias and a normal weak external.
> This is why I was asking Alex if rust was doing something for COFF that it
> should only be doing for ELF.
> 
> Maybe putting this behind a flag for COFF when using llvm-ar is an option.
> We use llvm-ar to combine weak external libs with regular libraries for
> example libcxx and libpsapi to support libcxx on mingw-w64.
> We would then still have the ability to combine valid libs correctly.
> 
> Adding Martin here because he has also done a lot of work on mingw-w64
> support.
> 
> Martin, any thoughts on the impact of having llvm-ar ignore weak externals?

No, I don't think that's the right way to go.

In my llvm-mingw setup, I'm quite reliant on llvm-ar preserving these symbols in the library symbol index. I did a test build with this hunk of ArchiveWriter.cpp reverted, and it fails when lld fails to find some symbols that were provided by aliases. And differentiating between llvm-dlltool and llvm-ar isn't the right thing to do either, because you can use llvm-ar to later update the import library produced by llvm-dlltool.


I think the main issue is to differentiate between the weak aliases (which are ok to keep in the library symbol index) and the plain weak undefined symbols (which can't be in the index, and which make link.exe error out).


To add some more detail on the matter: If you run llvm-nm on the provided archive.obj, you'll find (among other things) this:
00000000 R ??_C@_1O@OKIDBCFO@?$AA?$CB?$AAe?$AAr?$AAr?$AAo?$AAr?$AA?$AA@
         w ??_Eexception@std@@UAEPAXI@Z
00000000 T ??_G_Generic_error_category@std@@UAEPAXI@Z
(The `exception` one is the relevant one; I included some others for context.)

On the other hand, if you create an import library with a weak alias like this:
$ cat weak.def 
LIBRARY weak.dll
EXPORTS
printf
foobar
_alias == foobar
$ llvm-dlltool -d weak.def -l weak.lib -m i386
$ llvm-nm weak.lib
weak.dll:
00000000 T __imp__printf
00000000 T _printf

weak.dll:
00000000 T __imp__foobar
00000000 T _foobar

weak.dll:
00000000 a @comp.id
00000000 a @feat.00
         w _alias
         U _foobar

weak.dll:
00000000 a @comp.id
00000000 a @feat.00
         w __imp__alias
         U __imp__foobar

In this case, the symbols _alias and __imp__alias look like the problematic ones from archive.obj, but to link.exe they're different.
Comment 10 Peter Collingbourne 2018-02-21 11:56:11 PST
> Yes, the msvc provided msvcrt.lib contains a bunch of weak symbols that work like the aliases you can create with llvm-dlltool like this.

I couldn't find anything like that in my copy of msvcrt.lib, but I did find some object files in oldnames.lib that seem to fit what I think you are describing. If I run one of them through obj2yaml it looks like this:

--- !COFF
header:          
  Machine:         IMAGE_FILE_MACHINE_UNKNOWN
  Characteristics: [  ]
sections:        
  - Name:            .drectve
    Characteristics: [ IMAGE_SCN_LNK_INFO, IMAGE_SCN_LNK_REMOVE ]
    Alignment:       1
    SectionData:     ''
symbols:         
  - Name:            '@comp.id'
    Value:           13082782
    SectionNumber:   -1
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_STATIC
  - Name:            '@feat.00'
    Value:           17
    SectionNumber:   -1
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_STATIC
  - Name:            __imp__utime32
    Value:           0
    SectionNumber:   0
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
  - Name:            __imp_utime
    Value:           0
    SectionNumber:   0
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_WEAK_EXTERNAL
    WeakExternal:    
      TagIndex:        2
      Characteristics: IMAGE_WEAK_EXTERN_SEARCH_ALIAS
...

And indeed, if I use lib.exe to create a .lib from that I see an archive symbol table entry for __imp_utime. I then edited the .yaml to replace IMAGE_WEAK_EXTERN_SEARCH_ALIAS with IMAGE_WEAK_EXTERN_SEARCH_LIBRARY and ran lib.exe again. This time, there was no archive symbol table entry.
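
The `WeakExternal` fields in the dump above come from the auxiliary record that follows the symbol's own 18-byte record in the COFF symbol table. As a rough illustration of where `TagIndex` and `Characteristics` live, here is a small Python sketch that decodes them from raw bytes. It assumes the standard (non-bigobj) 18-byte record layout and a short inline name; the function name `weak_external_info` is hypothetical, and the sample record in the usage note is hand-built rather than taken from oldnames.lib.

```python
import struct

IMAGE_SYM_CLASS_WEAK_EXTERNAL = 105

IMAGE_WEAK_EXTERN_SEARCH_NOLIBRARY = 1
IMAGE_WEAK_EXTERN_SEARCH_LIBRARY = 2
IMAGE_WEAK_EXTERN_SEARCH_ALIAS = 3

# Name[8], Value, SectionNumber, Type, StorageClass, NumberOfAuxSymbols
SYMBOL = struct.Struct("<8sIhHBB")
# TagIndex, Characteristics, then 10 unused bytes
AUX_WEAK = struct.Struct("<II10x")


def weak_external_info(records):
    """Return (tag_index, characteristics) for a weak external, else None.

    `records` must start at a symbol record and include its aux record.
    """
    _name, _value, section, _type, storage_class, n_aux = \
        SYMBOL.unpack_from(records, 0)
    if storage_class != IMAGE_SYM_CLASS_WEAK_EXTERNAL:
        return None
    if section != 0 or n_aux < 1:
        return None  # not shaped like the weak externals shown in the dump
    return AUX_WEAK.unpack_from(records, SYMBOL.size)
```

For example, packing a record with `SYMBOL.pack(b"_alias", 0, 0, 0, IMAGE_SYM_CLASS_WEAK_EXTERNAL, 1) + AUX_WEAK.pack(2, IMAGE_WEAK_EXTERN_SEARCH_ALIAS)` and passing it to `weak_external_info` yields the same `TagIndex` and `Characteristics` pair that obj2yaml prints for `__imp_utime`.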

I suspect that the right fix here would be to stop setting SF_Undefined if characteristics == IMAGE_WEAK_EXTERN_SEARCH_ALIAS, and remove the SF_Indirect check from ArchiveWriter.
Comment 11 Rafael Ávila de Espíndola 2018-02-22 15:49:46 PST
(In reply to Peter Collingbourne from comment #10)
> I suspect that the right fix here would be to stop setting SF_Undefined if
> characteristics == IMAGE_WEAK_EXTERN_SEARCH_ALIAS, and remove the
> SF_Indirect check from ArchiveWriter.

At what level should this be done? Should COFFSymbolRef::isWeakExternal return false for aliases for example?
Comment 12 Peter Collingbourne 2018-02-28 13:32:34 PST
(In reply to Rafael Ávila de Espíndola from comment #11)
> At what level should this be done? Should COFFSymbolRef::isWeakExternal
> return false for aliases for example?

The symbol is still a weak external (and, as far as I can tell, has the same semantics at the object file level), so I don't think we should do that. We probably just want to add an accessor function that exposes the characteristics to COFFSymbolRef and use that in the function that computes the flags.
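
The approach suggested in comments 10 and 12 can be modeled outside LLVM. The sketch below is Python, not LLVM's C++, and it is only a model of the suggestion as described in this thread, not the change that eventually landed. The `CoffSymbol` class, its `weak_characteristics` field (standing in for the proposed accessor), the function names, and the flag bit values are all assumptions made for illustration; only the flag names and the COFF constants mirror real identifiers. Common symbols and other storage classes are ignored for brevity.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative bit values; only the names mirror LLVM's symbol flags.
SF_UNDEFINED = 1 << 0
SF_GLOBAL = 1 << 1
SF_WEAK = 1 << 2

IMAGE_SYM_CLASS_EXTERNAL = 2
IMAGE_SYM_CLASS_WEAK_EXTERNAL = 105
IMAGE_WEAK_EXTERN_SEARCH_LIBRARY = 2
IMAGE_WEAK_EXTERN_SEARCH_ALIAS = 3


@dataclass
class CoffSymbol:
    name: str
    storage_class: int
    section_number: int
    # Hypothetical accessor result: Characteristics from the aux record.
    weak_characteristics: Optional[int] = None


def symbol_flags(sym):
    """Compute flags, treating weak aliases as defined (the proposal)."""
    flags = 0
    if sym.storage_class == IMAGE_SYM_CLASS_WEAK_EXTERNAL:
        flags |= SF_GLOBAL | SF_WEAK
        # Still a weak external either way; only aliases skip SF_UNDEFINED.
        if sym.weak_characteristics != IMAGE_WEAK_EXTERN_SEARCH_ALIAS:
            flags |= SF_UNDEFINED
    elif sym.storage_class == IMAGE_SYM_CLASS_EXTERNAL:
        flags |= SF_GLOBAL
        if sym.section_number == 0:
            flags |= SF_UNDEFINED
    return flags


def belongs_in_archive_index(sym):
    """An archive writer with no SF_Indirect special case: defined globals only."""
    flags = symbol_flags(sym)
    return bool(flags & SF_GLOBAL) and not (flags & SF_UNDEFINED)
```

Under this model a weak alias such as `_alias` stays in the archive index while a plain weak undefined symbol is left out, which matches what lib.exe was observed to do in comment 10.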
Comment 13 Martell Malone 2018-03-11 14:48:35 PDT
Currently working on a fix here:
https://reviews.llvm.org/D44357
Comment 14 Martin Storsjö 2018-07-20 13:50:13 PDT
This should be fixed now in SVN r337613.