# Headline

libunwind miscalculates the length of the `.eh_frame` section and sometimes reads off the end of the containing segment, leading to segfaults and sadness.

# Discovery

We discovered this in a Rust project at our company. The program would segfault deep in `_Unwind_Backtrace`, but only in a few specific circumstances:

* Compiling against the musl libc, which implies:
  * Static linking of all libraries
  * Using a static build of libunwind from LLVM (version 3.9)
* Only one of our codebases
  * Attempts to reproduce in a more controlled environment have all failed
  * The codebase in question is **big**, resulting in an 84 MB binary

# Backtrace

Courtesy of GDB, with line-number links reverse-engineered by hand (rustc builds libunwind without debug symbols):

```
#0 0x0000000000fa733b in libunwind::LocalAddressSpace::get32(unsigned long) ()
   https://github.com/llvm-mirror/libunwind/blob/release_39/src/AddressSpace.hpp#L131
#1 0x0000000000faa0a2 in libunwind::CFI_Parser<libunwind::LocalAddressSpace>::findFDE(libunwind::LocalAddressSpace&, unsigned long, unsigned long, unsigned int, unsigned long, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::FDE_Info*, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::CIE_Info*) ()
   https://github.com/llvm-mirror/libunwind/blob/release_39/src/DwarfParser.hpp#L184
#2 0x0000000000fa983d in libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::getInfoFromDwarfSection(unsigned long, libunwind::UnwindInfoSections const&, unsigned int) ()
   https://github.com/llvm-mirror/libunwind/blob/release_39/src/UnwindCursor.hpp#L900
#3 0x0000000000fa923d in libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::setInfoBasedOnIPRegister(bool) ()
   https://github.com/llvm-mirror/libunwind/blob/release_39/src/UnwindCursor.hpp#L1241
#4 0x0000000000fa8fff in libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::step() ()
#5 0x0000000000fa820e in unw_step ()
#6 0x0000000000fa6ac0 in _Unwind_Backtrace ()
```

# Analysis

* Looking at frame #1, it appears that although `p < ehSectionEnd`, `p` points into invalid memory. Playing around suggests that `ehSectionEnd` is massively beyond the range of readable memory.
* Tracking back, we see that `ehSectionEnd` is derived from `sects.dwarf_section_length`. That value is populated by the code at https://github.com/llvm-mirror/libunwind/blob/release_39/src/AddressSpace.hpp#L390-L441, which is called from https://github.com/llvm-mirror/libunwind/blob/release_39/src/UnwindCursor.hpp#L1214.
* This code is a little weird, and I'm not 100% sure it's doing anything sensible, but I've assumed it's doing the following:
  * Searching for a `LOAD` segment that contains the address we're interested in, and remembering that segment's base address and length
  * Searching for a `GNU_EH_FRAME` segment and parsing it to find the location of the `.eh_frame` section. This segment points at the `.eh_frame_hdr` section, whose `eh_frame_ptr` field holds the address of `.eh_frame`.
  * Assuming that `.eh_frame` lies inside the `LOAD` segment we found (not sure this is guaranteed, but maybe?)
  * Storing the `.eh_frame` section's length by assuming it's the `LOAD` segment's `memsz`
* This last bullet is where I think the bug lies: `.eh_frame` is not always (never?)
  the first section in the `LOAD` segment, so the length of that section is strictly less than the size of the `LOAD` segment. Measuring `memsz` bytes from the start of `.eh_frame` therefore runs past the end of the segment by the offset of `.eh_frame` within it.

# Proposed fix

Change https://github.com/llvm-mirror/libunwind/blob/release_39/src/AddressSpace.hpp#L434 to:

    cbdata->sects->dwarf_section_length = object_length - (cbdata->sects->dwarf_section - cbdata->sects->dso_base);

This is better than the existing code. The span from `cbdata->sects->dwarf_section` to `cbdata->sects->dwarf_section + cbdata->sects->dwarf_section_length` is contained entirely within the `LOAD` segment, so it is valid memory.

It's still not perfectly correct, though, because that span also contains any sections that follow `.eh_frame` in the `LOAD` segment. I'm not sure there is a way to determine the end of a `.eh_frame` section; otherwise I'd recommend using that here. OTOH, this change prevented the segfault I was seeing without affecting backtrace generation.

# Affected versions

I found this on version 3.9, but the issue is still present in version 6.0. It seems to have been around since at least version 3.7, and in all versions in between.

# Platform

Linux Mint 18.1 Serena (based on Ubuntu 16.04 Xenial Xerus) on x86_64. Also seen on CentOS 7 on x86_64.

The segfaulting code was compiled with rustc versions 1.22.0, 1.22.1, 1.23.0, 1.24.0-beta and 1.25.0-nightly.

# Open Questions

* Why doesn't this always crash?
* Why do we only see this in `musl`-targeting Rust builds?
  * Is it simply that musl binaries are statically linked, and thus bigger, so the section layout is different?
* Is there a way to spot the end of the `.eh_frame` section?
  * Is it always followed by a `.eh_frame_ptr` section? Can we distinguish the two?
  * Is there a "this section is XXX long" or "this section contains XXX records" header?
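The C++ sketch below illustrates the arithmetic from the Analysis and Proposed fix sections. It is not libunwind code. `LoadSegment`, `currentLength`, `proposedLength` and `overrun` are hypothetical names.

`base` and `memsz` stand for the segment start and size. As far as I can tell, the real callback takes these from `dlpi_addr + p_vaddr` and `p_memsz`.

The sketch assumes, as the Analysis does, that `.eh_frame` lies inside the matched `LOAD` segment. When it doesn't, the sketch returns 0 rather than letting the unsigned subtraction wrap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the PT_LOAD segment libunwind matched.
struct LoadSegment {
  uintptr_t base;  // start address of the segment
  size_t memsz;    // its in-memory size (p_memsz)
};

// What libunwind 3.9 stores as dwarf_section_length: the whole segment size,
// regardless of where .eh_frame starts inside the segment.
size_t currentLength(const LoadSegment &seg, uintptr_t /*ehFrame*/) {
  return seg.memsz;
}

// The proposed fix: measure from .eh_frame to the end of the segment.
// Returns 0 if .eh_frame is not inside the segment, instead of letting the
// unsigned subtraction wrap around.
size_t proposedLength(const LoadSegment &seg, uintptr_t ehFrame) {
  if (ehFrame < seg.base || ehFrame - seg.base >= seg.memsz)
    return 0;
  return seg.memsz - (ehFrame - seg.base);
}

// How far [ehFrame, ehFrame + length) extends past the end of the segment.
size_t overrun(const LoadSegment &seg, uintptr_t ehFrame, size_t length) {
  uintptr_t segEnd = seg.base + seg.memsz;
  uintptr_t scanEnd = ehFrame + length;
  return scanEnd > segEnd ? scanEnd - segEnd : 0;
}
```

Take a 4 KiB segment at `0x400000` with `.eh_frame` at `0x400800`. The current length runs `0x800` bytes past the segment, which is exactly the offset of `.eh_frame` within it. The proposed length stops at the segment end, although that end can still include sections that follow `.eh_frame`.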
More extensive write-up in https://github.com/rust-lang/rust/issues/47551#issuecomment-406709987.

The address we're looking for has no unwind information. It's pretty much `_start` or similar, and our `crt1.o` doesn't have unwind information. In this case, libunwind reads at least the first 32-bit number after the `.eh_frame` section. It might also read another 32-bit number some (unspecified) distance beyond that, which might be in invalid memory.

I've realised that my proposed fix below is insufficient: it would allow libunwind to read random sections of the binary as if they were `.eh_frame` entries. Normally this would be OK, but a sufficiently unlucky (or carefully crafted) binary might cause the `FDE` search to spuriously succeed within that section, leading to undefined behaviour.

A proper fix (in the unwinder) is to set the section length to the length of the section, rather than the length of the segment. I'm not sure of the best (portable) way to do this, but I assume it's fairly easy; `readelf` and `dwarfdump` can both do it, for example.
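The read pattern described above can be modelled with a small C++ simulation. This is a simplified sketch, not the real `findFDE`, and `lengthReads`, `read32le` and `put32le` are hypothetical names.

The model records where each length word is read and follows each length to the next record. It stops at one of three points:

* a zero terminator,
* the claimed end,
* a read that would fall outside `image`. `image` stands in for mapped memory; a real process would fault at that point.

It leaves out the 4-byte CIE id/pointer read and the 64-bit extended-length form.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode a little-endian 32-bit word (the x86_64 layout) at offset off.
static uint32_t read32le(const std::vector<uint8_t> &image, size_t off) {
  return uint32_t(image[off]) | uint32_t(image[off + 1]) << 8 |
         uint32_t(image[off + 2]) << 16 | uint32_t(image[off + 3]) << 24;
}

// Simplified model of the record walk in findFDE when no FDE matches.
// Returns the offsets of every 32-bit length word the walk touches.
// 'claimedEnd' is where the unwinder believes .eh_frame ends. Offsets whose
// 4-byte word would extend past image.size() stand for unmapped memory: they
// are recorded, *faulted is set, and the walk stops instead of reading them.
std::vector<size_t> lengthReads(const std::vector<uint8_t> &image, size_t start,
                                size_t claimedEnd, bool *faulted) {
  std::vector<size_t> reads;
  *faulted = false;
  size_t p = start;
  while (p < claimedEnd) {
    reads.push_back(p);
    if (p + 4 > image.size()) {  // the real code would read here and crash
      *faulted = true;
      break;
    }
    uint32_t len = read32le(image, p);
    if (len == 0)  // zero terminator: end of .eh_frame
      break;
    p += 4 + size_t(len);  // skip the length word and the record body
  }
  return reads;
}

// Store a little-endian 32-bit word; used to build test images.
void put32le(std::vector<uint8_t> &image, size_t off, uint32_t v) {
  for (int i = 0; i < 4; ++i)
    image[off + i] = uint8_t(v >> (8 * i));
}
```

Consider a `.eh_frame` holding a single 16-byte CIE with no terminator, and a claimed end far past the real section. The model reads the word at offset 16, just after `.eh_frame`, treats it as a length, and jumps to offset 52, beyond a 40-byte image.

With the true end of 16, the walk never touches offset 16. With a zero terminator at offset 16, it reads that word and stops.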
I suspect that the right fix for this issue is to link against a `crtend*.o` object file that provides an `.eh_frame` terminator. In other words, maybe this is really an Alpine Linux and/or rustc driver bug.

I believe that the libgcc unwinder behaves similarly to libunwind. The length of `.eh_frame` is unknown, so the unwinder scans `.eh_frame` until it hits a terminator.

I uploaded a proposed libunwind change that disables `.eh_frame` scanning when `.eh_frame_hdr` is present, but that might be too aggressive (D86256).

Back in July, I poked at Alpine Linux a bit and posted this comment: https://github.com/rust-lang/rust/issues/47551#issuecomment-663800293

> I've realised that my proposed fix below is insufficient, it would
> allow libunwind to read random sections of the binary as if they were
> `.eh_frame` entries, normally this would be ok, but a sufficiently
> unlucky/carefully crafted binary might cause the `FDE` search to
> spuriously succeed within that section, leading to undefined
> behaviour.

There are some code paths that crash on invalid CIE/FDE entries, e.g.:

- `LocalAddressSpace::getULEB128`
- `LocalAddressSpace::getEncodedP`
- `assert()` in `CFI_Parser<A>::parseCIE`

Even if the entries appear valid, libunwind can/will read beyond the `dwarf_section_length` mark:

- `CFI_Parser<A>::findFDE` doesn't verify that `nextCFI` fits within `ehSectionEnd`.
- `LocalAddressSpace::getEncodedP` ignores its `end` parameter, except for LEB encodings. For an LEB encoding, exceeding the end marker aborts the process.
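Here is a C++ sketch of the kind of bounds checks the two lists above point at. It is written as standalone helpers, not as a patch to libunwind, and `readULEB128`, `nextRecord` and `Step` are hypothetical names.

* `readULEB128` is a ULEB128 decoder that honours an `end` pointer.
* `nextRecord` is a record step that refuses to produce a `nextCFI`-style pointer past the section end.

64-bit extended lengths (`0xffffffff`) are reported as unsupported rather than decoded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode an unsigned LEB128 value from [p, end). On success, store it in out,
// advance p past it and return true. Return false (leaving p unchanged) if the
// encoding runs into end or is longer than 10 bytes.
bool readULEB128(const uint8_t *&p, const uint8_t *end, uint64_t &out) {
  uint64_t result = 0;
  unsigned shift = 0;
  for (const uint8_t *q = p; q < end; ++q) {
    if (shift >= 64)
      return false;
    result |= uint64_t(*q & 0x7f) << shift;
    shift += 7;
    if ((*q & 0x80) == 0) {  // high bit clear: this is the last byte
      out = result;
      p = q + 1;
      return true;
    }
  }
  return false;
}

enum class Step { Record, Terminator, Malformed, Unsupported };

// Inspect the CIE/FDE record starting at p. On Step::Record, next points at
// the following record and is guaranteed to satisfy next <= end.
Step nextRecord(const uint8_t *p, const uint8_t *end, const uint8_t *&next) {
  if (end - p < 4)
    return Step::Malformed;  // no room for the 32-bit length word
  uint32_t len = uint32_t(p[0]) | uint32_t(p[1]) << 8 |
                 uint32_t(p[2]) << 16 | uint32_t(p[3]) << 24;
  if (len == 0)
    return Step::Terminator;
  if (len == 0xffffffff)
    return Step::Unsupported;  // 64-bit length form, not handled here
  // The body must hold at least the 4-byte CIE id / CIE pointer and must
  // finish at or before 'end'.
  if (len < 4 || uint64_t(len) > uint64_t(end - (p + 4)))
    return Step::Malformed;
  next = p + 4 + len;
  return Step::Record;
}
```

The design choice here is to report failure to the caller instead of asserting or aborting. This lets an unwinder treat a bad entry as "no unwind info here" rather than taking down the process.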
I committed D86256, which:

- fixes the end-of-`.eh_frame` calculation when there is an FDE hint, and
- for `PT_GNU_EH_FRAME`, relies on the zero terminator instead of attempting to calculate the `.eh_frame` section length.

I think there is still a rustc driver issue, in that rustc should link a `crtend*.o` file on Alpine Linux.

FWIW, the section table isn't generally part of a `PT_LOAD` segment, which I think makes reading it in the unwinder infeasible. In some typical cases the unwinder could perhaps open the `dl_phdr_info::dlpi_name` file, but I can think of many ways that would break. Examples include the vDSO or dynamic loader, filesystem changes, a DSO loaded from a file offset with `android_dlopen_ext`, etc.
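The C++ sketch below shows the terminator-based approach in the commit description. It is an illustrative model under stated assumptions, not the code from the commit, and `extentUntilTerminator` and `le32` are hypothetical names.

It walks CIE/FDE records from the start of `.eh_frame` until it finds the zero length word that a `crtend*.o` object would provide. It then reports the section length, including that terminator.

The `limit` parameter exists only to keep the sketch safe. Per the description above, the unwinder itself relies on the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Little-endian 32-bit load, matching the x86_64 .eh_frame layout.
static uint32_t le32(const uint8_t *p) {
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
         uint32_t(p[3]) << 24;
}

// Walk CIE/FDE records from start until a zero length word. On success,
// store the section length including the 4-byte terminator and return true.
// 'limit' is only a safety net for this sketch: the walk gives up instead of
// reading past it.
bool extentUntilTerminator(const uint8_t *start, const uint8_t *limit,
                           size_t &length) {
  const uint8_t *p = start;
  while (limit - p >= 4) {
    uint32_t len = le32(p);
    if (len == 0) {
      length = size_t(p + 4 - start);
      return true;
    }
    if (len == 0xffffffff)
      return false;  // 64-bit extended length: not handled in this sketch
    if (uint64_t(len) > uint64_t(limit - (p + 4)))
      return false;  // record claims to run past the limit
    p += 4 + len;
  }
  return false;  // no terminator before the limit
}
```

Without the terminator, this sketch gives up at `limit`. An unwinder with no such limit would keep reading whatever follows `.eh_frame`, which is the situation in the original report.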
> I committed D86256 [...]

Sorry, I committed D87750, and abandoned D86256.