[RFC] Improve map-files for effective analysis and debugging

A map-file describes the image memory layout, along with symbols and sections from individual input files. A map-file is important for analyzing the output image layout. We believe map-files would be much more useful for analysis and debugging if they contained more information about the link.

The goal of the RFC is to discuss whether we should add more information in the map-file to make it more useful. To initiate the discussion, we will present examples and reasoning in favor of making more information about the build, particularly linking, available to users.

The new information does not necessarily have to be in a map-file. Having the linker emit a new file with detailed information about the link can be explored as well. However, we think the map-file is the best place to store this information, as there is precedent for more detailed map-files: GNU LD map-files and the map-files of many proprietary linkers contain significantly more information than LLD map-files.

If we do decide to add more information to the map-file, we should be careful not to add too much. Map-files should balance conciseness and detail. A map-file should be useful for both quick verification and in-depth debugging. Perhaps we can explore an option such as ‘-MapDetail’ that would add additional information to the map-file when required.

Motivation

Making the map-file richer in detail has several benefits. Let’s look at some of them.

  • Better understanding of the output image and linker decisions. With more information in the map-file, we can better understand the output image layout and linker decisions. For example, LLD map-file does not contain memory configuration, segment information and padding details.

  • Key component for debugging. LLD map-files lack information about input section descriptions, section properties and so on. This lack of information undermines the map-file’s potential as a key component for debugging. In contrast, the GNU LD map-file provides much richer information. The following section describes why additional information would be useful for debugging. A more detailed map-file can significantly expedite the debugging of embedded images.

    A detailed map-file would also facilitate rich comparisons between layouts of different variants of the same project. This is particularly useful in analyzing code size regressions.

  • Empower both novices and experts to analyze builds and debug issues. A detailed map-file with clear documentation on how to navigate and utilize it will empower users to independently analyze and debug issues. It will also help non-expert users to gain a deeper understanding of the linker through practical experience.

A few examples of map-file improvements

Here, by improvements, I mean more information in the map-file that would be beneficial for analysis and debugging.

To keep the RFC concise, this section contains only a brief description of each improvement. We have created a gist, map-file-improvements, with more details and additional improvements. We have also added a GNU-inspired map-file format there to help initiate discussion of a more detailed map-file format. Please visit the gist if you are interested in more details.

  • Input and output section properties: section types and section flags. Example: if an output section type changes from SHT_NOBITS to SHT_PROGBITS after a build change, it can be difficult to determine the exact cause of the change. By including section type information for both output and input sections in the map-file, we can easily identify which input section is responsible for changing the output section type to SHT_PROGBITS.

  • Input section description mapping with input sections. Example: if an input section ‘foo’, which is expected to match KEEP(*foo*), was unexpectedly garbage-collected, then the user may want to check which input section description foo is inadvertently being matched to.

  • Linker script expression evaluations. There is no clear way to debug linker script expression evaluations. For example, if a linker script assert is failing unexpectedly, then there is no way to debug why the assert is failing. One way to facilitate debugging of linker script expression evaluations is to annotate linker script expressions with their values in the map-file.

  • Padding information. There is no clear way to locate all the padding in the output image using the LLD map-file. A rogue input section with an invalid alignment might perturb an image badly. In such a case, it is beneficial to be able to quickly find all the padding in the output image layout, which makes it easier to locate any undesirable padding. With the GNU ld map-file, this information is accessible by searching for *fill*.
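To illustrate the padding point, here is a minimal sketch, assuming a hypothetical list of (name, size, alignment) input sections that are placed in order. The `layout` helper and the section names are made up for illustration and are not LLD or GNU ld APIs; the only thing borrowed from GNU ld is the `*fill*` label for gaps.

```python
def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (alignment >= 1)."""
    return (addr + alignment - 1) // alignment * alignment


def layout(sections, base=0):
    """Place (name, size, alignment) sections in order, starting at base.

    Returns a list of (name, address, size) entries. Every gap introduced
    by alignment is recorded as a '*fill*' entry, so all padding can be
    found with a single search.
    """
    entries = []
    addr = base
    for name, size, alignment in sections:
        start = align_up(addr, alignment)
        if start > addr:
            entries.append(("*fill*", addr, start - addr))
        entries.append((name, start, size))
        addr = start + size
    return entries


# Hypothetical inputs: 'rogue.o' requests an unusually large alignment.
sections = [
    ("a.o:(.text)", 0x14, 4),
    ("rogue.o:(.text)", 0x10, 0x1000),
    ("b.o:(.text)", 0x8, 4),
]
entries = layout(sections, base=0x1000)
for name, addr, size in entries:
    print(f"{addr:#010x} {size:#8x} {name}")

total_padding = sum(size for name, _, size in entries if name == "*fill*")
print(f"total padding: {total_padding:#x}")
```

In this made-up layout, the single `*fill*` entry immediately before `rogue.o:(.text)` accounts for all of the padding, which is the kind of quick answer a padding-aware map-file would give.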

Please let us know your thoughts on making the map-file more detailed.


You have managed to catch me just before I go on vacation for the week before Easter. Will have to keep this brief.

I’ve added @MaskRay as the LLD maintainer too.

The LLVM Embedded Toolchains working group meets this Thursday (LLVM Embedded Toolchains Working Group sync up); you might want to join that call to discuss. I sadly won’t be there but I can catch up on the notes.

There’s been a group of people wanting to improve the LLD map file output at each of the most recent LLVM Dev meetings. I think the suggestions are broadly in line with suggestions at those meetings. What we’ve been lacking so far is someone with enough bandwidth to make an implementation and propose a patch.

Some thoughts for the discussion:

  • Should all of those extra outputs be enabled at once? Could people opt in to extra diagnostics to keep the Map file small? If this were the case each individual topic could be discussed individually with more focus.
  • The topic of a machine readable map file output format has come up before. Would this be a good format for some of the additional information?
  • The expression evaluation may need more significant effort. I can remember a discussion suggesting that an AST format that is walked may be required to do this well.

Happy to participate in more detailed discussions, but would be good to separate that gist into smaller parts to make it easier to start.

Thanks for looking into this and providing your valuable feedback, @smithp35

Yes, the topic of a machine readable map file output format is definitely useful when you want to compare map files between builds.
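As a minimal sketch of such a comparison, assume a machine-readable map has already been reduced to a mapping from output section name to size. The `diff_sizes` helper and the data below are hypothetical and do not reflect any existing map file schema.

```python
def diff_sizes(old, new):
    """Return {section: new_size - old_size} for every section whose size changed.

    Sections that exist in only one build are treated as having size 0
    in the other build.
    """
    deltas = {}
    for name in sorted(set(old) | set(new)):
        delta = new.get(name, 0) - old.get(name, 0)
        if delta != 0:
            deltas[name] = delta
    return deltas


# Hypothetical output section sizes (in bytes) from two builds of one project.
old_build = {".text": 0x4000, ".rodata": 0x800, ".data": 0x100}
new_build = {".text": 0x4200, ".rodata": 0x800, ".bss": 0x40}

deltas = diff_sizes(old_build, new_build)
for name, delta in deltas.items():
    print(f"{name:10} {delta:+d} bytes")
```

With a structured format, a code size regression check becomes a few lines of scripting rather than text parsing of two map files.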

We have a custom protobuf schema (a relational schema) that we have used for storing map file information in an easy-to-use way.

We have found that Google protobuf would be a good place to start for machine readable formats (especially given the wide range of language hooks).

We don’t want to introduce a dependency on external libraries such as protobuf. LLVM already has a JSON serialization/deserialization library, and a number of LLVM tools already support JSON as an input/output format. We already discussed support for emitting map files in JSON format at past meetings and sync ups, and there was general agreement that this is the direction we want to take.

Regarding the expression evaluation, I don’t think this information belongs in a link map. Instead, we’d like to improve the linker script parser and introduce a better internal representation that could be serialized to a file (you can think of it as linker IR) to help with linker script debugging. I believe this would address the common issues you described.

Thanks @petrhosek

There would be some advantages to choosing binary map files based on Google protobuf:

  • Compresses pretty well when building large projects
  • Easy to debug issues and able to store varying amounts of optional information
  • Easy to check for toolchain compatibility and tools that rely on backwards compatibility
  • Easy to write tools such as summarizing information, computing differences of images
  • Create microservices using gRPC and hook them into modern IDE environments such as “MS Visual Studio Code”

I see the linker map file as one source of truth for debugging linker scripts and the memory layout of an image.

Linkers do print the values of expressions, but only the final values. It would be useful to print the individual sub-expressions and how the linker evaluated them when a user wants to debug the image layout.

Serializing the linker script to a linker script IR is an interesting idea.

I know the AMDGPU target uses msgpack for a binary metadata format. We actually used to have some protobuf code in the openmp project but I deleted it a while back, so there’s no one that uses it AFAIK.

Proto3 has JSON encoding so if we design the JSON schema carefully to avoid unsupported constructs, you’ll be able to deserialize the generated JSON into a protobuf without needing a dependency on the protobuf library. This is the approach we used for GoogleTest JSON output.

Hi @smithp35

Thank you for the quick response.

You have managed to catch me just before I go on vacation for the week before Easter.

I am glad we were able to catch you :). I wish you a great vacation!

The LLVM Embedded Toolchains working group meets this Thursday (LLVM Embedded Toolchains Working Group sync up); you might want to join that call to discuss.

That’s great to know. Would it be possible to discuss this RFC in the upcoming embedded toolchain meeting? I can briefly present the RFC and some of the improvements to initiate the discussion for map-file format.

There’s been a group of people wanting to improve the LLD map file output at each of the most recent LLVM Dev meetings. I think the suggestions are broadly in line with suggestions at those meetings.

That’s great to know!

  • Should all of those extra outputs be enabled at once? Could people opt in to extra diagnostics to keep the Map file small? If this were the case each individual topic could be discussed individually with more focus.

Yes, we should definitely allow users to selectively enable the extra information they want in the map-file. We can perhaps have a core set of information that is always part of the map-file, and users can opt in to additional information with options such as --MapDetail linker-script-expression, --MapDetail input-section-description and so on. I think if we go this route, then we should also add a wildcard option, --MapDetail all, to make the map-file as detailed as possible.

  • The topic of a machine readable map file output format has come up before. Would this be a good format for some of the additional information?

This sounds great! As @Shankar_Easwaram mentioned, we have a custom protobuf schema for storing the map-file and making it more parse-friendly :). We have had a good experience with protobuf for this so far. That said, we should definitely explore and discuss more options.

  • The expression evaluation may need more significant effort. I can remember a discussion suggesting that an AST format that is walked may be required to do this well.

Yes, expression evaluation would need significant effort, but in our experience, it’s definitely worth the effort. Having this information significantly speeds up debugging in some cases.

Thank you for mentioning the AST format. This sounds interesting.

Thank you for your feedback.

That’s great to know. Would it be possible to discuss this RFC in the upcoming embedded toolchain meeting? I can briefly present the RFC and some of the improvements to initiate the discussion for map-file format.

Should be possible. Assuming the meeting is on (day before Easter Friday) and the agenda isn’t already full, @voltur01 can add it.

Nice to see a summary of improvements that could be made to the -Map file output.

One requirement that I would like to point out is performance. We have customers who always generate a map file when linking and whilst more features are useful we wouldn’t want to degrade performance (actually we would like to improve it!).

We also have a proprietary binary map file section that we output to linked ELF files in our downstream toolchain to facilitate hot-patching. We did discuss upstreaming this previously: [llvm-dev] [RFC] Debug sections for hot-patching LLD's ELF output. IMO a nice implementation would be that the map file information is efficiently stored in binary form to a section in the linked ELF file. Then customers could use external tools to generate text “views” from that data (for example the current -Map file output form).

FYI I added this to the WG sync up agenda: LLVM Embedded Toolchains Working Group call this Thursday, Mar 28th

Coming from a server background, I lack experience with embedded programming. This might make me unique among the folks involved in this discussion thread.
While using -Map has been helpful with large binaries, I know when to rely on a debugger instead of getting lost in linker output.

I agree that adding information to the map file can be valuable. However, coming from a performance-focused background, I’m cautious about linker complexity and output verbosity.
There’s a sweet spot where adding more details might not be as beneficial.

Yes, expression evaluation would need significant effort, but in our experience, it’s definitely worth the effort. Having this information significantly speeds up debugging in some cases.

The expression evaluation sounds valuable for certain tasks, but the internal format (using Expr = std::function<ExprValue()>;) makes it challenging to dump intermediate results.
I’m concerned about the trade-off between complexity and potential benefit.

#SHT_PROGBITS,SHF_ALLOC|SHF_EXECINSTR,16

The section type and flags look very verbose to me. If this really needs to be dumped, perhaps just dump input sections with different type/flags from the output section. I am less certain about the value of alignments.
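A minimal sketch of that filtering idea, assuming hypothetical section records with `type` and `flags` fields. The numeric values are the standard ELF `SHT_*`/`SHF_*` constants, but the record layout and the `differing_inputs` helper are made up for illustration.

```python
# Standard ELF section type and flag values.
SHT_PROGBITS = 1
SHF_WRITE = 0x1
SHF_ALLOC = 0x2


def differing_inputs(output, inputs):
    """Return the input sections whose type or flags differ from the output section."""
    expected = (output["type"], output["flags"])
    return [s for s in inputs if (s["type"], s["flags"]) != expected]


# Hypothetical data: a read-only input section placed in a writable output section.
output = {"name": ".data", "type": SHT_PROGBITS, "flags": SHF_WRITE | SHF_ALLOC}
inputs = [
    {"name": "a.o:(.data)", "type": SHT_PROGBITS, "flags": SHF_WRITE | SHF_ALLOC},
    {"name": "b.o:(.data.ro)", "type": SHT_PROGBITS, "flags": SHF_ALLOC},
]

odd = differing_inputs(output, inputs)
for section in odd:
    print(f"{section['name']}: type={section['type']} flags={section['flags']:#x}")
```

Only the mismatching input section is printed, which keeps the output short while still pointing at the sections most likely to matter.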

We did discuss upstreaming this previously: [llvm-dev] [RFC] Debug sections for hot-patching LLD’s ELF output . IMO a nice implementation would be that the map file information is efficiently stored in binary form to a section in the linked ELF file.

Thank you for sharing this @bd1976bris. This was an interesting read. Storing the map file as a debug section in the ELF file can certainly be beneficial. If we write a binary map file to an external file (instead of a text map file), would we get performance benefits similar to those of writing the map file to a debug section in the ELF file?

One issue with storing the map file as a debug section in the output ELF file is that it can significantly increase the size of the output ELF file. For large projects, the map-file can take up a lot of storage space. I know we can always strip the section out when needed, but it might be a little inconvenient at times. It’s also nice and handy to read the map-file directly in a favorite editor without any processing. That said, this approach certainly has nice benefits as well. I really liked the implicit association of the ELF and the map, which we get from this idea.

Coming from a server background, I lack experience with embedded programming. This might make me unique among the folks involved in this discussion thread.

A different background always helps in discussions :).

The linker output can help in discovering the source of the issues much more quickly. It’s also easier than using a debugger for understanding linker decisions.

The expression evaluation sounds valuable for certain tasks, but the internal format (using Expr = std::function<ExprValue()>;) makes it challenging to dump intermediate results.
I’m concerned about the trade-off between complexity and potential benefit.

I think it’s a valid concern and definitely warrants a discussion. I think the benefit is significant here. This should be discussed in detail.

The section type and flags look very verbose to me. If this really needs to be dumped, perhaps just dump input sections with different type/flags from the output section.

It’s verbose but it’s also very helpful. We can discuss just dumping sections with different type or flags. I think it would be nice to have the complete information at hand for quick verification.

Hi everyone,

I am posting an update with details of the discussion of this RFC at the last Embedded Toolchain Meeting (28th March 2024). First, I want to thank everyone who took part in the discussion for their time, and @voltur01 for allowing me to present the RFC.

There were many interesting questions and points raised during the discussion.
We discussed three main topics:

  • Where to put detailed link information?
  • Map file format
  • Map file structure

Regarding ‘Where to put detailed link information?’, we briefly discussed the viable options: the map file, a completely new file, and a separate section such as ‘.linkmap’ in the output file, as suggested by @bd1976bris in the comment above. The consensus was to use a map file for the detailed link information. Someone suggested that we can perhaps explore different files for different information.

Regarding ‘Map file structure’, we need to carefully decide what information to include in the map file. We need to discuss it further. We also discussed conditionally reporting information in the map file so that the map file stays concise, and we only emit computationally expensive information if requested. There was general agreement on this.

Regarding ‘Map file format’, the consensus was to use JSON format for the map file. Someone suggested that we do not necessarily need to have one map file format that satisfies both human-readability and machine-parse-friendly requirements, and perhaps we can support two different map file formats for the two needs. I think it’s a good idea and we should explore it further.

I think the immediate action item for us in the upcoming Embedded Toolchain meeting would be to discuss and decide which information should always be present in the map file and which should be made available conditionally.

Hi everyone,

I am posting the highlights of the discussion of this RFC at the last Embedded Toolchain Meeting (25th April 2024).

Map file format
We have decided to use JSON format as the native LLD map file format. LLVM will provide utility scripts for transforming JSON map files to the old LLD map file format and GNU LD map file format. These utility scripts will also demonstrate to users how to effectively extract information and transform the new JSON map file.
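To make the utility script idea concrete, here is a minimal sketch that renders a JSON map as column text. The JSON key names (`output_sections`, `vma`, `lma`, `size`, `align`, `inputs`) are assumptions made for this example, not a schema agreed in the meeting, and the column layout is only loosely modeled on the VMA/LMA/Size/Align columns of the current LLD -Map output rather than an exact reproduction of it.

```python
import json

# Hypothetical JSON map; the key names are assumptions for illustration only.
MAP_JSON = """
{
  "output_sections": [
    {"name": ".text", "vma": 4096, "lma": 4096, "size": 24, "align": 16,
     "inputs": [
       {"name": "a.o:(.text)", "vma": 4096, "lma": 4096, "size": 16, "align": 16},
       {"name": "b.o:(.text)", "vma": 4112, "lma": 4112, "size": 8, "align": 4}
     ]}
  ]
}
"""


def row(entry, indent):
    """Format one entry: hex VMA, LMA and size, decimal alignment, indented name."""
    return (f"{entry['vma']:16x} {entry['lma']:16x} {entry['size']:8x} "
            f"{entry['align']:5} {' ' * indent}{entry['name']}")


def to_text(map_data):
    """Render output sections, each followed by its input sections, as text rows."""
    lines = [f"{'VMA':>16} {'LMA':>16} {'Size':>8} {'Align':>5} Out     In"]
    for out in map_data["output_sections"]:
        lines.append(row(out, 0))
        for inp in out["inputs"]:
            lines.append(row(inp, 8))
    return "\n".join(lines)


text = to_text(json.loads(MAP_JSON))
print(text)
```

Because the conversion lives in a script, each text layout (old LLD style, GNU LD style) would only need a different `row` and header, while the linker itself emits a single JSON format.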

Details to add in the map file
The first goal is to bring the level of detail in the new map file format on par with that of the GNU LD map file. This is a good start, as the GNU LD map file is richer in detail than the (old) LLD map file. Any further information should be discussed separately before adding it to the map file.

Linker script expression annotation

Annotating linker script expressions with their values in the map file was accepted as valuable functionality. This feature will be implemented after LLD’s linker script expression representation has been redesigned to be friendlier to diagnostics.

LLD’s current lambda-based linker script expression evaluation design is not well suited to diagnostics or to retrieving intermediate expression values. @petrhosek mentioned that there is a plan to redo the LLD parser to create a proper AST and then build an IR. This new design will facilitate the annotation of linker script expressions with their values in the map file.
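The difference between the two designs can be sketched as follows. This is an illustration in Python, not LLD code: the node classes, `evaluate`, and the symbol values are all hypothetical. The point is that an AST keeps the structure of the expression, so each sub-expression can be printed next to its value, whereas an opaque closure only yields the final result.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Sym:
    name: str


@dataclass
class BinOp:
    op: str
    lhs: object
    rhs: object


OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "<=": lambda a, b: int(a <= b),
}


def show(node):
    """Render a node back to linker-script-like text."""
    if isinstance(node, Num):
        return hex(node.value)
    if isinstance(node, Sym):
        return node.name
    return f"({show(node.lhs)} {node.op} {show(node.rhs)})"


def evaluate(node, symbols, trace):
    """Evaluate node, appending (text, value) to trace for symbols and operations."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Sym):
        value = symbols[node.name]
    else:
        lhs = evaluate(node.lhs, symbols, trace)
        rhs = evaluate(node.rhs, symbols, trace)
        value = OPS[node.op](lhs, rhs)
    trace.append((show(node), value))
    return value


# Condition of a hypothetical ASSERT(__end - __start <= 0x100, "...").
expr = BinOp("<=", BinOp("-", Sym("__end"), Sym("__start")), Num(0x100))
trace = []
result = evaluate(expr, {"__start": 0x1000, "__end": 0x1180}, trace)
for text, value in trace:
    print(f"{text} = {value:#x}")
```

The trace shows the assert failing because `(__end - __start)` evaluates to 0x180, which is the intermediate value a user debugging the script would want to see.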

Tools version and the link command-line

We decided to add the tools version and the link command line to the new map file. This is one piece of information that will be in the new map file but is not in the GNU LD map file.

Conditionally emit some information in the map file

We decided to support conditionally emitting (expensive-to-compute) information, which users request through command-line options.


@MaskRay Please let us know your thoughts on these decisions.

LG.
The link map has no compatibility goal, but we can provide scripts (not in C++, not linked into the executable) to help users to migrate.

Thanks for suggesting we align the new map file format with GNU ld. I agree we should have a richer format providing information closer to GNU ld.
Let’s prioritize the most relevant information but not blindly copy everything GNU ld’s map files provide.

I see potential benefits in rearchitecting ExprValue/Expr to use an AST instead of std::function.
This investigation is valuable, but I also want to ensure we carefully consider the complexity involved, particularly about “building an IR”.
A more concrete analysis of the trade-offs would be helpful.

LG.

I agree. The option can support + separated values --xxx=x+y+z+w for a curated list of optional information.
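A minimal sketch of parsing such a `+`-separated value, written in Python for brevity rather than as LLD option-handling code. The detail names `linker-script-expression`, `input-section-description` and `all` come from earlier in this thread; `padding` and `section-properties` are hypothetical additions, and no such option exists in LLD today.

```python
# Curated list of optional map details (names are assumptions for illustration).
KNOWN_DETAILS = {
    "linker-script-expression",
    "input-section-description",
    "padding",
    "section-properties",
}


def parse_map_detail(value):
    """Parse 'a+b+c' into a set of detail names; 'all' selects every known detail."""
    requested = {item for item in value.split("+") if item}
    if "all" in requested:
        return set(KNOWN_DETAILS)
    unknown = requested - KNOWN_DETAILS
    if unknown:
        raise ValueError(f"unknown map detail(s): {', '.join(sorted(unknown))}")
    return requested


selected = parse_map_detail("padding+linker-script-expression")
print(sorted(selected))
```

Rejecting unknown names keeps the list curated: a typo produces an error instead of silently emitting a map file without the requested detail.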

Hi All. Thanks for the proposal. My understanding is that the current GNU-like output format will be removed? I am unsure about the consequences of this; it seems that we have many users of the -Map file output at Sony. Our team will provide feedback here as soon as we can.

There may be some more information in the meeting notes: LLVM Embedded Toolchains Working Group sync up - #62 by voltur01

One proposal was that if lld had a JSON output format for the map file then the existing lld map file could switch to GNU ld format, with a conversion script from JSON to “old” lld format for those that preferred or relied on the old format. I expect retaining the existing lld format wouldn’t be too onerous, but may help with maintenance in the long term.