Object: Don't error out on malformed bitcode files. #96848

Merged

Contributor

@pcc pcc commented Jun 27, 2024

An error reading a bitcode file most likely indicates that the file
was created by a compiler from the future. Normally we don't try to
implement forwards compatibility for bitcode files, but when creating
an archive we can implement best-effort forwards compatibility by
treating the file as a blob and not creating symbol index entries for
it. lld and mold ignore the archive symbol index, so provided that
you use one of these linkers, LTO will work as long as lld or the
gold plugin is newer than the compiler. We only ignore errors if the
archive format is one that is supported by a linker that is known to
ignore the index, otherwise there's no chance of this working so we
may as well error out. We print a warning on read failure so that
users of linkers that rely on the symbol index can diagnose the issue.

This is the same behavior as GNU ar when the linker plugin returns
an error when reading the input file. If the bitcode file is actually
malformed, it will be diagnosed at link time.
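The policy described above can be summarized in a small standalone sketch. It does not use the LLVM API: `ArchiveKind`, `MemberOutcome`, `mayOmitIndexEntries` and `classifyBitcodeMember` are hypothetical names chosen for illustration, and the message format only approximates what llvm-ar prints.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for object::Archive::Kind (illustration only).
enum class ArchiveKind { BSD, GNU, GNU64, AIXBig, COFF, Darwin, Darwin64 };

// What happens to one bitcode member while the archive is written.
enum class MemberOutcome { Indexed, StoredAsBlob, Rejected };

// True for the formats the description treats as usable with linkers that
// ignore the archive symbol index.
inline bool mayOmitIndexEntries(ArchiveKind Kind) {
  switch (Kind) {
  case ArchiveKind::BSD:
  case ArchiveKind::GNU:
  case ArchiveKind::GNU64:
    return true;
  case ArchiveKind::AIXBig:
  case ArchiveKind::COFF:
  case ArchiveKind::Darwin:
  case ArchiveKind::Darwin64:
    return false;
  }
  return false; // Not reached for valid enumerators.
}

// ReadError is empty when the bitcode reader succeeded. Otherwise the member
// is either kept as an opaque blob (with a warning) or rejected (with an
// error), depending on the archive format.
inline MemberOutcome classifyBitcodeMember(ArchiveKind Kind,
                                           const std::string &Name,
                                           const std::string &ReadError,
                                           std::ostream &Diag) {
  if (ReadError.empty())
    return MemberOutcome::Indexed;
  if (mayOmitIndexEntries(Kind)) {
    Diag << "warning: " << Name << ": " << ReadError << '\n';
    return MemberOutcome::StoredAsBlob;
  }
  Diag << "error: " << Name << ": " << ReadError << '\n';
  return MemberOutcome::Rejected;
}
```

With this sketch, an unreadable member in a GNU archive yields a warning and no index entries, while the same member in a Darwin archive is rejected.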

Created using spr 1.3.6-beta.1
Member

llvmbot commented Jun 27, 2024

@llvm/pr-subscribers-llvm-binary-utilities

Author: None (pcc)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/96848.diff

2 Files Affected:

  • (modified) llvm/lib/Object/ArchiveWriter.cpp (+35-4)
  • (modified) llvm/test/Object/archive-malformed-object.test (+21-5)
diff --git a/llvm/lib/Object/ArchiveWriter.cpp b/llvm/lib/Object/ArchiveWriter.cpp
index 913b74c110b36..c6d443ff9d15a 100644
--- a/llvm/lib/Object/ArchiveWriter.cpp
+++ b/llvm/lib/Object/ArchiveWriter.cpp
@@ -482,7 +482,8 @@ static uint64_t computeHeadersSize(object::Archive::Kind Kind,
 }
 
 static Expected<std::unique_ptr<SymbolicFile>>
-getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context) {
+getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context,
+                object::Archive::Kind Kind) {
   const file_magic Type = identify_magic(Buf.getBuffer());
   // Don't attempt to read non-symbolic file types.
   if (!object::SymbolicFile::isSymbolicFile(Type, &Context))
@@ -490,8 +491,38 @@ getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context) {
   if (Type == file_magic::bitcode) {
     auto ObjOrErr = object::SymbolicFile::createSymbolicFile(
         Buf, file_magic::bitcode, &Context);
-    if (!ObjOrErr)
-      return ObjOrErr.takeError();
+    // An error reading a bitcode file most likely indicates that the file
+    // was created by a compiler from the future. Normally we don't try to
+    // implement forwards compatibility for bitcode files, but when creating an
+    // archive we can implement best-effort forwards compatibility by treating
+    // the file as a blob and not creating symbol index entries for it. lld and
+    // mold ignore the archive symbol index, so provided that you use one of
+    // these linkers, LTO will work as long as lld or the gold plugin is newer
+    // than the compiler. We only ignore errors if the archive format is one
+    // that is supported by a linker that is known to ignore the index,
+    // otherwise there's no chance of this working so we may as well error out.
+    // We print a warning on read failure so that users of linkers that rely on
+    // the symbol index can diagnose the issue.
+    //
+    // This is the same behavior as GNU ar when the linker plugin returns an
+    // error when reading the input file. If the bitcode file is actually
+    // malformed, it will be diagnosed at link time.
+    if (!ObjOrErr) {
+      switch (Kind) {
+      case object::Archive::K_BSD:
+      case object::Archive::K_GNU:
+      case object::Archive::K_GNU64:
+        llvm::logAllUnhandledErrors(ObjOrErr.takeError(), llvm::errs(),
+                                    "warning: " + Buf.getBufferIdentifier() +
+                                        ": ");
+        return nullptr;
+      case object::Archive::K_AIXBIG:
+      case object::Archive::K_COFF:
+      case object::Archive::K_DARWIN:
+      case object::Archive::K_DARWIN64:
+        return ObjOrErr.takeError();
+      }
+    }
     return std::move(*ObjOrErr);
   } else {
     auto ObjOrErr = object::SymbolicFile::createSymbolicFile(Buf);
@@ -820,7 +851,7 @@ computeMemberData(raw_ostream &StringTable, raw_ostream &SymNames,
   if (NeedSymbols != SymtabWritingMode::NoSymtab || isAIXBigArchive(Kind)) {
     for (const NewArchiveMember &M : NewMembers) {
       Expected<std::unique_ptr<SymbolicFile>> SymFileOrErr =
-          getSymbolicFile(M.Buf->getMemBufferRef(), Context);
+          getSymbolicFile(M.Buf->getMemBufferRef(), Context, Kind);
       if (!SymFileOrErr)
         return createFileError(M.MemberName, SymFileOrErr.takeError());
       SymFiles.push_back(std::move(*SymFileOrErr));
diff --git a/llvm/test/Object/archive-malformed-object.test b/llvm/test/Object/archive-malformed-object.test
index a92762975bda6..7492dc513492e 100644
--- a/llvm/test/Object/archive-malformed-object.test
+++ b/llvm/test/Object/archive-malformed-object.test
@@ -1,5 +1,6 @@
 ## Show that the archive library emits error messages when adding malformed
-## objects.
+## object files and skips symbol tables for "malformed" bitcode files, which
+## are assumed to be bitcode files generated by compilers from the future.
 
 # RUN: rm -rf %t.dir
 # RUN: split-file %s %t.dir
@@ -9,19 +10,28 @@
 # RUN: llvm-as input.ll -o input.bc
 # RUN: cp input.bc good.bc
 # RUN: %python -c "with open('input.bc', 'a') as f: f.truncate(10)"
-# RUN: not llvm-ar rc bad.a input.bc 2>&1 | FileCheck %s --check-prefix=ERR1
+# RUN: llvm-ar rc bad.a input.bc 2>&1 | FileCheck %s --check-prefix=WARN1
+
+# llvm-nm will fail when it tries to read the malformed bitcode file, but
+# it's supposed to print the archive map first, which in this case it
+# won't because there won't be one.
+# RUN: not llvm-nm --print-armap bad.a | count 0
 
 ## Malformed bitcode object is the last file member of archive if the symbol table is required.
 # RUN: rm -rf bad.a
-# RUN: not llvm-ar rc bad.a good.bc input.bc 2>&1 | FileCheck %s --check-prefix=ERR1
+# RUN: llvm-ar rc bad.a good.bc input.bc 2>&1 | FileCheck %s --check-prefix=WARN1
+# RUN: not llvm-nm --print-armap bad.a | FileCheck %s --check-prefix=ARMAP
 
 ## Malformed bitcode object if the symbol table is not required for big archive.
+## For big archives we print an error instead of a warning because the AIX linker
+## presumably requires the index.
 # RUN: rm -rf bad.a
 # RUN: not llvm-ar --format=bigarchive rcS bad.a input.bc 2>&1 | FileCheck %s --check-prefix=ERR1
 # RUN: rm -rf bad.a
 # RUN: not llvm-ar --format=bigarchive rcS bad.a good.bc input.bc 2>&1 | FileCheck %s --check-prefix=ERR1
 
 # ERR1: error: bad.a: 'input.bc': Invalid bitcode signature
+# WARN1: warning: input.bc: Invalid bitcode signature
 
 ## Non-bitcode malformed file.
 # RUN: yaml2obj input.yaml -o input.o
@@ -29,17 +39,23 @@
 
 # ERR2: error: bad.a: 'input.o': section header table goes past the end of the file: e_shoff = 0x9999
 
-## Don't emit an error if the symbol table is not required for formats other than the big archive format.
-# RUN: llvm-ar --format=gnu rcS good.a input.o input.bc
+## Don't emit an error or warning if the symbol table is not required for formats other than the big archive format.
+# RUN: llvm-ar --format=gnu rcS good.a input.o input.bc 2>&1 | count 0
 # RUN: llvm-ar t good.a | FileCheck %s --check-prefix=CONTENTS
 
 # CONTENTS:      input.o
 # CONTENTS-NEXT: input.bc
 
+# ARMAP: Archive map
+# ARMAP-NEXT: foo in good.bc
+# ARMAP-EMPTY:
+
 #--- input.ll
 target datalayout = "e-m:w-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-pc-linux"
 
+@foo = global i32 1
+
 #--- input.yaml
 --- !ELF
 FileHeader:

# RUN: not llvm-ar rc bad.a input.bc 2>&1 | FileCheck %s --check-prefix=ERR1
# RUN: llvm-ar rc bad.a input.bc 2>&1 | FileCheck %s --check-prefix=WARN1

# llvm-nm will fail when it tries to read the malformed bitcode file, but
Member

##

Contributor Author

Done

# llvm-nm will fail when it tries to read the malformed bitcode file, but
# it's supposed to print the archive map first, which in this case it
# won't because there won't be one.
# RUN: not llvm-nm --print-armap bad.a | count 0
Member

2>&1 and check the stderr as well?

Contributor Author

Here we only want to test that there is no archive map. The test happens to use the functionality of llvm-nm. We don't particularly care whether (or how) llvm-nm fails afterwards, because this is not a test of llvm-nm. We need the `not` prefix, but only to prevent the llvm-nm failure from causing a test failure.

With 2>&1 we would need to test that the first line of llvm-nm output is something like

llvm-nm: error: bad.a(input.bc): Invalid bitcode signature

But I don't think there's a way to use FileCheck to check that the first line of output matches a pattern.

Collaborator

I must admit, I thought a CHECK-FIRST directive was added to FileCheck, to allow that, but evidently not. If you did want to check that something is the first line though, I think you can do something like this:

# CHECK: {{^}}
# CHECK-SAME: thing I'm interested in

This works, because the {{^}} matches the start of the line, and since every line has a start, it matches specifically the first line. The CHECK-SAME then pins the thing it's checking to the previously-matched line, i.e. the first line.

(I am ambivalent on whether you should do this for this specific case, but I think it may be worth a comment explaining why you're doing this llvm-nm invocation here either way)

Contributor Author

I added a comment. That's an interesting trick, but it's probably not worth it here.

@@ -9,37 +10,52 @@
# RUN: llvm-as input.ll -o input.bc
# RUN: cp input.bc good.bc
# RUN: %python -c "with open('input.bc', 'a') as f: f.truncate(10)"
# RUN: not llvm-ar rc bad.a input.bc 2>&1 | FileCheck %s --check-prefix=ERR1
# RUN: llvm-ar rc bad.a input.bc 2>&1 | FileCheck %s --check-prefix=WARN1
Member

llvm-ar.cpp:1045 says the default archive format is decided by the default target triple. If it is COFF/AIXBIG/DARWIN, there would still be an error.

We need to enumerate different archive formats...

rm -f bad.a && llvm-ar --format=gnu rc bad.a input.bc
rm -f bad.a && llvm-ar --format=darwin rc bad.a input.bc
...

Contributor Author

Done

Collaborator

Maybe I'm missing something, but I don't see this being addressed?

Contributor Author

Yeah, I had done the second part but not the first. Done now.

case object::Archive::K_BSD:
case object::Archive::K_GNU:
case object::Archive::K_GNU64:
llvm::logAllUnhandledErrors(ObjOrErr.takeError(), llvm::errs(),
Collaborator

I'm not keen on this warning being printed inside the library. This will prevent, e.g., client tools from treating the warning as an error. I would prefer one of two options:

  1. Continue returning it up the stack, and report the warning further up (perhaps using something in the error code within the error to identify it as opposed to other genuine errors).

  2. Pass in a callback function that is called when this case is hit. Client code can choose then to handle the error in whatever way it chooses.
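The second option can be sketched without any LLVM dependency. Everything below is hypothetical (`WarningHandler`, `tryIndexMember`, `CollectingClient`); it only illustrates how a callback leaves the choice of reporting to the client and is not the interface that was eventually merged.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback type: invoked once per recoverable problem.
using WarningHandler = std::function<void(const std::string &Message)>;

// Hypothetical library function. Returns true if symbol index entries were
// created for the member. If the member cannot be read, it reports through
// the callback instead of printing, and returns false so that the caller
// stores the member as an opaque blob.
inline bool tryIndexMember(const std::string &Name, bool Readable,
                           const WarningHandler &Warn) {
  if (Readable)
    return true;
  Warn(Name + ": Invalid bitcode signature");
  return false;
}

// One possible client: collect warnings instead of printing them, so that
// they can later be shown in a GUI, buffered, or promoted to errors.
struct CollectingClient {
  std::vector<std::string> Warnings;

  WarningHandler handler() {
    return [this](const std::string &Message) { Warnings.push_back(Message); };
  }

  bool sawWarnings() const { return !Warnings.empty(); }
};
```

A command-line tool would pass a handler that prints to stderr; a client that wants warnings to be fatal can check `sawWarnings()` after the call and discard the output.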

Contributor Author

We don't really have a system for warnings with different warning levels though, at least not outside of Clang. Taking your example of a client that wants warnings to be errors, it would not be sufficient to return it up the stack from here because Expected is a sum type not a product type. Such a client would likely not want the file to be created at all if there were a warning, so the callback may be a better choice to support such a client (which would do something like return true if the passed-in error should actually be treated as an error) but to some degree that's just an elaborate way of passing in a bool WarningsAreErrors argument, and maybe that's all that the client would want or need.

So I think we should avoid trying to design for hypothetical clients and instead add features on an as-needed basis, especially if the way to support the hypothetical client is not obvious. All in-tree clients want a warning printed to stderr, so that's what I implemented here.

Member

I agree with the analysis. llvm::logAllUnhandledErrors is probably not the most elegant approach, but it is the most practical one. Passing in bool WarningsAreErrors would require updating quite a few functions and end up adding lots of complexity.

writeArchive is primarily used by in-tree clients. A hypothetical client, even if one exists, would be unlikely to create an archive with a new bitcode member that would cause the warning.

Collaborator

> but to some degree that's just an elaborate way of passing in a bool WarningsAreErrors argument, and maybe that's all that the client would want or need.

I disagree here - the important difference is that not just the choice of whether to continue, but also the choice of where/how to render the warning, is in the hands of the client - this is core to LLVM's API-centric design. If an API user is multithreaded and wants to buffer output, or render it differently (in a GUI), etc., they should be able to.

> So I think we should avoid trying to design for hypothetical clients and instead add features on an as-needed basis, especially if the way to support the hypothetical client is not obvious. All in-tree clients want a warning printed to stderr, so that's what I implemented here.

I don't think we need to design for hypothetical clients - but we should hold to the fairly generic goal that LLVM is designed as reusable library components, and it's not too speculative/arbitrary to avoid printing directly from a library, because clients may have different needs for how their output is produced/rendered/handled.

Contributor Author

> I don't think we need to design for hypothetical clients - but hold to a fairly generic goal that LLVM is designed as reusable library components and it's not too speculative/arbitrary to not print directly from a library because clients may have different needs for how their output is produced/rendered/handled.

I hear what you're saying but if we say "don't write to stderr from a library" then we need to answer "what should we do instead". If LLVM already had a diagnostic API with programmatic control over diagnostic levels and writeArchive were already using it then it would be a no-brainer to expose the warning via that API. But if that isn't actually implemented yet then at that point it becomes non-obvious what the right thing to do is, and we need to do something -- and that's when we start needing to think about what the client might need (i.e. "design for hypothetical clients"). If/when the actual client comes along that needs this we'd probably discover that what we designed was not sufficient and would need to redo the API for the client's needs, which would take more effort overall than just designing it for the actual client in the first place. And if a client doesn't care about any of this and just wants us to write to stderr they would need to change their usage code multiple times for no benefit to them.

I suppose the minimal thing we could do here is add a llvm::raw_ostream &Warn to writeArchive and that's where the warnings would go. A client that wants warnings as errors can error out and delete the output file if anything was written to the stream, a buffered client can save the output and write it in one go and a GUI can split on newlines. That seems fairly minimal, doesn't require writing dead code and should be easy enough to replace with something else if it becomes necessary. But really I think we should just wait for an actual client.
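The stream-based proposal can be sketched as follows, using `std::ostream` in place of `llvm::raw_ostream`. `Member`, `indexMembers` and `indexMembersStrict` are hypothetical names for illustration; the strict wrapper shows how a warnings-as-errors client might treat a non-empty buffer as failure.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical archive member: a name plus whether its bitcode can be read.
struct Member {
  std::string Name;
  bool Readable;
};

// Hypothetical library entry point. Warnings go to the caller-supplied
// stream; the return value is the number of members that got index entries.
inline std::size_t indexMembers(const std::vector<Member> &Members,
                                std::ostream &Warn) {
  std::size_t Indexed = 0;
  for (const Member &M : Members) {
    if (M.Readable)
      ++Indexed;
    else
      Warn << "warning: " << M.Name << ": Invalid bitcode signature\n";
  }
  return Indexed;
}

// A warnings-as-errors client: buffer the stream and fail if anything was
// written. The buffered text is passed through as-is, followed by a final
// error line.
inline bool indexMembersStrict(const std::vector<Member> &Members,
                               std::string &Diagnostics) {
  std::ostringstream Buffer;
  indexMembers(Members, Buffer);
  Diagnostics = Buffer.str();
  if (Diagnostics.empty())
    return true;
  Diagnostics += "error: warnings treated as errors\n";
  return false;
}
```

The trade-off raised in the thread is visible here: the client receives preformatted text rather than structured diagnostics, which is simple to consume but hard to re-render.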

Contributor Author

> I personally don't like it, as I don't think a stream passed in is any better than passing in the suggested callback.

I'm sorry that you don't like it. I don't like it either (I don't think we'll find an API that all of us like, this seems like a very bikesheddable topic), but it seems like a reasonable compromise to me. The reason for the stream based API is that we know that Error is the wrong API for warning use cases because of its poor handling of contexts, so we should avoid propagating it into the caller. The stream is a minimal API extension that'll be easier to change later.

> Having clients have to trawl through text to find the "warning: " text embedded in the original text, or somehow have to add it inline in the stream after each newline, or something else seems pretty messy compared to this ^.

That's not what a client would need to do. As I said, the messages added to the buffer will already contain "warning: ", so they will just print the buffer text as-is followed by "error: warnings treated as errors". That's not ideal but it's good enough, and if it doesn't work for someone they are welcome to propose their own patch with a better error handling API.

> doesn't quite explain the /different/ context in the example - one with the full path name, one with the short name...

That's Buf.getBufferIdentifier() (full name) vs M.MemberName (short name). The latter isn't passed into getSymbolicFile but I suppose it could be passed in as well to make the warning a bit more consistent.

> I wouldn't be averse to omitting the pass-through behavior on the first pass of this - and having warnings-as-errors be implemented entirely in the caller (they see a warning, emit it as an error (so I think the callback should still pass an Error) but provide no return value from the handler to allow it to be transformed into an error - caller can then record they emitted an error and combine that with any error result/failure code coming from the main API)

So when getSymbolicFile returns a warning it will do

Warn(createFileError(MemberName, ObjOrErr.takeError()));
return nullptr;

and when it returns an error it will do

return ObjOrErr.takeError();

?

I'm not sure about that. If we only add the context sometimes it will read like a mistake. Another approach would be to remove the call to createFileError from the caller of getSymbolicFile and add calls to createFileError to getSymbolicFile whenever it creates an error or warning. So now warnings and errors look the same:

return errorOrValue(Warn(createFileError(MemberName, ObjOrErr.takeError())), nullptr); // warning

or

return createFileError(MemberName, ObjOrErr.takeError()); // error

But that's generally more error prone, it'll be easier to miss adding a call to createFileError especially if we follow the pattern elsewhere in the code to return warnings.

Collaborator

Could we have each layer of the callstack do the minimal context addition necessary so that context isn't lost? In this particular case, the getSymbolicFile function wouldn't need to do anything other than pass the error to the callback directly, the caller would add the member name via a wrapper around the callback, since it is the function that knows it, and then this process would proceed up the stack as more additional context is gained at each level (obviously cases in the stack that don't have any additional context to add don't need to wrap the callback themselves). We do something similar in the Object library in some places, e.g. here: in this case the lower code reports the low-level error, the code in the link adds information about the referencing section, and the calling code will ultimately add the file name, before it all ends up getting printed as a warning in llvm-readobj.

I acknowledge the situation is slightly different here, since we're distinguishing between warnings and errors, whereas llvm-readobj treats most things as warnings, so can continue to pass Error instances up the stack, but the overall approach should still apply, just using callbacks (with each layer wrapping the callback if needed), or even a string passed back upwards that is added to then ultimately passed to a callback.
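The layering described in this comment can be sketched with plain strings instead of `Error` values. All names are hypothetical (`readBitcode`, `addMember`, `writeArchiveSketch`); the point is only that each layer wraps the callback it received and adds the context that it alone knows.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback type carrying an already-formatted message.
using WarningHandler = std::function<void(const std::string &)>;

// Hypothetical archive member: a name plus whether its bitcode can be read.
struct Member {
  std::string Name;
  bool Readable;
};

// Lowest layer: knows only the low-level failure, not which member it is.
inline void readBitcode(bool Readable, const WarningHandler &Warn) {
  if (!Readable)
    Warn("Invalid bitcode signature");
}

// Middle layer: knows the member name, so it wraps the callback to add it.
inline void addMember(const Member &M, const WarningHandler &Warn) {
  readBitcode(M.Readable, [&](const std::string &Msg) {
    Warn("'" + M.Name + "': " + Msg);
  });
}

// Top layer: knows the archive name and adds it in the same way.
inline void writeArchiveSketch(const std::string &ArchiveName,
                               const std::vector<Member> &Members,
                               const WarningHandler &Warn) {
  for (const Member &M : Members)
    addMember(M, [&](const std::string &Msg) {
      Warn(ArchiveName + ": " + Msg);
    });
}
```

Under these assumptions the client receives a message of the form `bad.a: 'input.bc': Invalid bitcode signature`, with each piece of context added exactly once at the level that owns it.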

Collaborator

> > doesn't quite explain the /different/ context in the example - one with the full path name, one with the short name...
>
> That's Buf.getBufferIdentifier() (full name) vs M.MemberName (short name). The latter isn't passed into getSymbolicFile but I suppose it could be passed in as well to make the warning a bit more consistent.

I was thinking of the opposite direction - if the current error handling doesn't have its context passed in, but attached later when going up the callstack, the analogous behavior with a warning callback would be to attach the context in the callback (at whatever layer was adding it to the error, we'd wrap the callback in a "context adding callback" as discussed in a few places in this thread)

But it does present issues if that warning can then be passed back out of the callback and become an error - which then gets the context added to it again.

I would be OK with not addressing that issue for now - having a callback of void(Error), doing the context wrapping as described ^ (so it has the same context as the error used to have, added in the same level in the call stack).

I think it's reasonable to leave the "if I want a werror-like behavior that also early-exits, I should figure out how to deal with context adding during warning without then duplicating that context addition when the warning-promoted-to-error gets propagated up the stack too" - I can think of a few possible solutions to that problem, but I think they're worth having as a separate discussion closer to the use case.

Contributor Author

I had the same idea about wrapping the lambda but I'm not a fan of the amount of boilerplate needed when calling a function that might warn.

return createFileError(FileName, foo([&](Error Err) { Warn(createFileError(FileName, std::move(Err))); }));

And you still need a post-facto fixup in the caller -- except now it's "was a callback called" and not "is a buffer empty". So I think this makes the code worse than raw_ostream &Warn because it's more subtle, more verbose in the callee and probably the caller, less intuitive, and shoehorns bad APIs into places where they don't fit, making it harder to remove them. But I guess it's not that much worse, it only happens in one place for now so I guess I'm not entirely opposed to doing that.

Side note, here is the usage code for an API that I would consider replacing Error with:

void foo(StringRef FileName, DiagHandler &Diag) {
  auto FDiag = Diag.withFileName(FileName);
  bar(FDiag);
  if (FDiag.hasErrors())
    return;
  baz(FDiag);
}

void bar(DiagHandler &Diag) {
  Diag.warn("some warning"); // could be promoted to error by Diag implementation
  Diag.error("some error");
}
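For illustration only, a handler supporting the usage above might look like the sketch below. No such `DiagHandler` class is described in this thread beyond the usage code, so every member here is an assumption; in particular, this version does not propagate a child's error count back to the handler it was derived from.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical diagnostic handler matching the usage sketched above.
class DiagHandler {
public:
  explicit DiagHandler(std::ostream &Out, bool PromoteWarnings = false)
      : OS(Out), WarningsAsErrors(PromoteWarnings) {}

  // Returns a handler that prefixes every message with the file name.
  DiagHandler withFileName(const std::string &FileName) const {
    DiagHandler Child(OS, WarningsAsErrors);
    Child.Prefix = Prefix + FileName + ": ";
    return Child;
  }

  // Emits a warning, or an error if warnings are being promoted.
  void warn(const std::string &Message) {
    if (WarningsAsErrors) {
      error(Message);
      return;
    }
    OS << "warning: " << Prefix << Message << '\n';
  }

  // Emits an error and records that one occurred on this handler.
  void error(const std::string &Message) {
    OS << "error: " << Prefix << Message << '\n';
    ++NumErrors;
  }

  bool hasErrors() const { return NumErrors != 0; }

private:
  std::ostream &OS;
  bool WarningsAsErrors;
  std::string Prefix;
  unsigned NumErrors = 0;
};
```

Because the file-name context lives in the handler rather than in each message, callees such as `bar` never need to know which file they are working on.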

Contributor Author

Done

Created using spr 1.3.6-beta.1
github-actions bot commented Jul 9, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Created using spr 1.3.6-beta.1

## Malformed bitcode object if the symbol table is not required for big archive.
## For big archives we print an error instead of a warning because the AIX linker
## presumably requires the index.
Collaborator

@diggerlin, could you confirm that the big archive format does require the archive symbol index, please?

Collaborator

@EsmeYi / @hubert-reinterpretcast are either of you able to advise?

Member

I am not familiar with XCOFF, but it presumably requires the index.
I believe all of PE/COFF, XCOFF, Mach-O linkers require the index.

ELF linkers like lld and mold have ignored the index completely (https://maskray.me/blog/2022-01-16-archives-and-start-lib). lld's wasm port has followed up, but other ports keep using the index. (I do want to help, but changing the other ports has a very low priority in my task list...)

Created using spr 1.3.6-beta.1
Collaborator

@jh7370 jh7370 left a comment

LGTM, but probably worth giving it a bit for others to respond to the latest round of changes.


Collaborator

@dwblaikie dwblaikie left a comment

error-handling-wise, WFM - thanks for your patience/iteration/discussion


Contributor Author

pcc commented Jul 18, 2024

Let me land this as-is. The behavior for big archives is unchanged. If the AIX folks want to change the behavior, they can do that in a followup.

@pcc pcc merged commit c675a9b into main Jul 18, 2024
7 checks passed
@pcc pcc deleted the users/pcc/spr/object-dont-error-out-on-malformed-bitcode-files branch July 18, 2024 23:05
5 participants