Hines CompilingAndroidKeynote
*This was/is a really HUGE effort by many other people/teams/companies. We are just the messengers. :)
Making large changes is an adventure
● Change via decree/mandate can work, …
● But we found it much easier to build up through sub-quests.
○ Initial Clang/LLVM work was not intended to replace GCC.
○ Eventually, a small group of people saw change as the only reasonable path forward.
○ Small, incremental improvements/changes are easier.
○ Got partners, vendors, and even teams from other parts of Google involved early.
○ Eventually, the end goal was clear:
■ “It’s time to have just one compiler for Android. One that can help find (and mitigate)
security problems.”
Grow your support
A Brief History of LLVM and Android
● 2010 — RenderScript project begins
○ Used LLVM bitcode as portable IR (despite repeated warnings NOT to). :P
○ On-device bitcode JIT (later becomes AOT, but actual code generation is done on device).
○ Uses same LLVM on-device as for building host code with Clang/LLVM - we <3 bootstrapping!
● March 2012 — LOCAL_CLANG appears (Gitiles).
○ Compiler-rt (for ASan), libpng, and OpenSSL are among the first users.
○ Other users appear as extension-related ABI issues spring up.
● April 2014 — Clang for platform != LLVM on-device (AOSP / Gitiles).
● July 2014 — All host builds use Clang (AOSP / Gitiles).
LOCAL_CLANG
● Flag for Android’s build system.
● If set to true, use Clang to compile this module.
● If not defined, use the regular compiler.
● Pretty simple, right?
● If set to false, use GCC to compile this module.
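The three cases above can be modeled as a small decision function. This is only a sketch of the semantics described on this slide, not the build system's actual code; the names LocalClang, Compiler, and select_compiler are hypothetical, and platform_default stands in for "the regular compiler".

```cpp
// Hypothetical model of the tri-state LOCAL_CLANG flag.
enum class LocalClang { Unset, True, False };
enum class Compiler { Gcc, Clang };

// platform_default is an assumption standing in for "the regular compiler",
// i.e. whatever the platform build currently uses by default.
Compiler select_compiler(LocalClang flag, Compiler platform_default) {
    switch (flag) {
        case LocalClang::True:
            return Compiler::Clang;      // module opts in to Clang
        case LocalClang::False:
            return Compiler::Gcc;        // module pins itself to GCC
        case LocalClang::Unset:
            break;
    }
    return platform_default;             // not defined: regular compiler
}
```

The point of the third state is that a module can stay on GCC even after the platform default flips to Clang.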
LOCAL_CLANG := false
● Need to retain some instances of GCC-specific testing.
○ Bionic (libc) needed to check that headers/libraries could still work for native application
developers using GCC (NDK).
● Some tests were a little too dependent on GCC implementation details:
○ __stack_chk_guard explicitly extern-ed in and mutated in bionic (libc) tests!
● Other areas where we just didn’t know how to fix bugs yet.
○ Valgrind was the last instance of this escape to be fixed in AOSP.
■ Wrong clobbers for inline assembly in 1 case.
■ ABI + runtime library issues (we’ll chat about aeabi later).
Escape hatches are vital
● If we had to turn off Clang entirely each time we hit a bug, none of us would
be here right now.
● We would still be chained to our desks fixing bugs.
● Lots of people working on this makes it parallel, so long as everyone can
make progress — all or nothing is a bottleneck you can’t afford.
Two Builds for the Price of Two
● A simultaneous, obvious extension of LOCAL_CLANG was the concept of the
default platform build.
● Original default was GCC.
● We were eventually able to set up a separate build target (actually multiple
device targets) that used Clang as the default toolchain.
● Why didn’t we do this first?
○ Because devices didn’t boot with Clang...
○ And many things didn’t even compile successfully with Clang!
Example: aeabi functions
void __aeabi_memcpy(void *dest, void *src, int size) // Please ignore the ‘int’. ;)
{
    memcpy(dest, src, size);
}
● Looks pretty harmless, but GCC and Clang treat the Android ABI differently, at
least when lowering calls to the runtime memcpy (RTLIB::MEMCPY).
● Clang lowers the memcpy call above to __aeabi_memcpy, so the function effectively becomes:
void __aeabi_memcpy(void *dest, void *src, int size)
{
    __aeabi_memcpy(dest, src, size); // Infinite loop!!!
}
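One generic way to keep a runtime helper from being lowered back into a call to itself is to write the copy so the compiler cannot treat it as a memcpy. The slides do not say how Android actually fixed this, so the sketch below is an assumption. The name sketch_aeabi_memcpy is hypothetical, chosen so it does not clash with the real runtime symbol, and the volatile byte loop trades speed for predictability.

```cpp
#include <cstddef>

// Hypothetical name; the real symbol is provided by the runtime library.
// Copying through volatile pointers forces one load and one store per byte,
// so the compiler cannot replace the loop with a runtime-library memcpy call.
extern "C" void sketch_aeabi_memcpy(void* dest, const void* src, std::size_t size) {
    volatile unsigned char* d = static_cast<volatile unsigned char*>(dest);
    const volatile unsigned char* s = static_cast<const volatile unsigned char*>(src);
    for (std::size_t i = 0; i < size; ++i) {
        d[i] = s[i];
    }
}
```

A production version would more likely be written in assembly or built with flags that disable the lowering; this only illustrates the shape of the problem.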
Side-by-side builds are great
● The ability to measure and “compare” things is why software engineering isn’t
just an art*.
○ Correctness/Conformance Testing
○ Code size
○ Performance
○ …
● Helped prevent early regressions — compiler-dependent build breaks go to
code submitters, and not just the wacky toolchain folks.
Bugs happen ...
Sometimes it is the compiler
Assembly parsing is hard
● What does the following assembly code do?
Undefined Behavior
● Signed integer overflow :(
○ -fwrapv makes this defined.
○ Can expose other bugs (in addition to harming performance).
● Nonnull manifested a few ways in Android:
○ Removal of null checks on "this" in Binder. (AOSP / Gitiles)
■ sp<IBinder> IInterface::asBinder()
{
    return this ? onAsBinder() : NULL;
}
■ Except people had been calling (nullptr)->asBinder() in lots of places.
■ Further cleanup replaced this with a static method. (AOSP / Gitiles)
○ Calling memcpy() with a null pointer, which is undefined even when size is 0:
// src == nullptr
if (!src || !dst) size = 0;
memcpy(dst, src, size);
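The undefined-behavior cases above each have a well-defined rewrite. The block below is a minimal sketch of those rewrites, not the actual AOSP changes: checked_add, copy_if_valid, and the Interface struct are hypothetical names, and Interface is only a stand-in for a Binder-like class.

```cpp
#include <climits>
#include <cstddef>
#include <cstring>

// Signed overflow: compare against the limits before adding, so the
// addition itself can never overflow. Returns false if a + b would overflow.
bool checked_add(int a, int b, int* out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}

// Nonnull and memcpy: skip the call entirely instead of zeroing the size,
// because the pointer arguments must be valid even for a zero-length copy.
void* copy_if_valid(void* dst, const void* src, std::size_t size) {
    if (dst == nullptr || src == nullptr || size == 0) {
        return dst;
    }
    return std::memcpy(dst, src, size);
}

// Nonnull and `this`: a static helper checks an ordinary pointer argument,
// which the compiler is not allowed to assume is non-null.
struct Interface {
    int id;
    int onAsId() const { return id; }
    static int asId(const Interface* iface) {
        return iface != nullptr ? iface->onAsId() : -1;
    }
};
```

Callers then write Interface::asId(ptr) rather than ptr->asId(), so a null ptr never reaches a member call.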
Inline Assembly Revisited
● Legacy wrapper functions:
○ Do some minor action up front.
○ Pass existing caller arguments through to another (possibly tail) call.
○ Maybe return a different value (always 0 in these cases).
● Input/Output/Clobber constraints might not matter until one day the compiler
says that they do. (AOSP / Gitiles)
● SWEs work to make the compiler happy, even if it isn’t correct (enough).
○ Clang stomped all the arguments/returns for the inline assembly, while GCC didn’t bother
touching any of the argument/return registers.
○ Nobody noticed until we tried to switch to Clang.
○ Even a GCC update or slight change to the source files (due to inlining) could have caused a
bug that would likely be misattributed as a “miscompile”.
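For reference, here is what fully declared constraints look like in GCC/Clang extended inline assembly. This is a generic sketch, not the wrapper code from the slide; add_with_asm is a hypothetical name, and the plain C++ fallback is only there so the example builds on other targets.

```cpp
// Adds b to a using inline assembly where a known target is detected.
int add_with_asm(int a, int b) {
#if defined(__x86_64__) || defined(__i386__)
    __asm__("addl %1, %0"
            : "+r"(a)    // output: a is both read and written
            : "r"(b)     // input: b is only read
            : "cc");     // clobber: the add modifies the flags register
#elif defined(__aarch64__)
    __asm__("add %w0, %w0, %w1"
            : "+r"(a)    // output: a is both read and written
            : "r"(b));   // input: b is only read; this add leaves flags alone
#else
    a += b;              // fallback for targets not covered by this sketch
#endif
    return a;
}
```

If a constraint is missing, the compiler is free to keep unrelated values in the registers the assembly touches, which is why code can appear to work with one compiler and break with another.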
Lots of empathy for other teams
● They are going to have undefined behavior.
● They are going to have general bugs that got exposed by the transition.
● They need support, not an adversary. C++ is a worthy enough adversary for all
of us.
● You’re going to want their empathy/understanding when it is a compiler bug.
A Continued History of LLVM and Android
● 2012 - 2016 — Everything you just saw.
● December 2014 — First side-by-side (mostly) Clang build for Nexus 5.
● January 2016 — Android Platform defaults to Clang.
● April 2016 — 99% Android Platform Clang (valgrind was the last!)
● August 2016 — Forbid non-Clang builds (AOSP / Gitiles).
○ Whitelist for legacy projects (started in AOSP / Gitiles).
● October 2016 — 100% Clang userland for Google Pixel.
The Platform Numbers
● 597 git projects in aosp/master (10/18/2017).
○ 37M LOC C/C++ source/header files in aosp/master alone.
○ 2M LOC assembly additional!
○ 25.3M LOC of C/C++ is in aosp/master external/*.
The above data was generated using David A. Wheeler's 'SLOCCount' on a fresh checkout of aosp/master. It does not include
duplicates or generated source files either.
BONUS - How to deprecate something in a short time!
● STLPort (a C++ runtime library) was a blocker for switching to Clang (and
libc++).
● “Unbundled” Android 1st party apps didn’t want to switch to libc++/Clang.
● It’s hard to incentivize good behavior.
○ “Nothing really changes”, maintenance is viewed as “unnecessary churn”, ...
○ But we want/need to remove deprecated components in a reasonable timeframe.
○ Sound familiar yet? This story probably resonates with many of us here.
● Enter the “Sleep Constructor”.
The Sleep Constructor
__attribute__((constructor))
void incentivize_stlport_users() {
    ALOGE("Hi! I see you're still using stlport. Please stop doing that.\n");
    ALOGE("All you have to do is delete the stlport lines from your makefile\n");
    ALOGE("and then you'll get the shiny new libc++\n");
    sleep(8);
}
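The trick works because GCC and Clang run functions marked with the constructor attribute when the containing binary or library is loaded, before main() starts. A minimal sketch of that mechanism, with hypothetical names and no logging or sleeping:

```cpp
// Constant-initialized flag, so it is valid before any constructor runs.
static bool g_ran_before_main = false;

// Runs at load time, before main(), in every process that links this code.
__attribute__((constructor))
static void run_at_load() {
    g_ran_before_main = true;
}

// Lets callers observe that the constructor has already run.
bool ran_before_main() {
    return g_ran_before_main;
}
```

Put the same attribute on a function inside a shared library and every process that loads the library pays the cost, which is what made the 8-second sleep hard to ignore.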
Platform Takeaways
● Grow your support.
● Escape hatches are vital!
● Side-by-side builds are great.
● Bugs happen — Sometimes it is the compiler.
○ People are going to be upset when this happens, so ...
● Lots of empathy for other teams
○ s/other teams/everyone/ for when it is actually the compiler.
● When being nice fails — Sleep Constructor!
Linux Kernel in 2014/2015
● Patches provided by LLVMLinux (http://llvm.linuxfoundation.org)
● Some work upstreamed
● Large out-of-tree patchstack, last updated in January 2015
(source: include/linux/llist.h)
Sometimes It’s the Kernel ...
The loop only terminates if a pointer underflow and a pointer overflow cancel each
other out. That is undefined behavior!
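The slide's code is not reproduced in this text, so the following is a hypothetical miniature rather than kernel code. It assumes the pattern in question is an intrusive-list loop whose exit test depends on a container pointer derived from a possibly-NULL node. The well-defined alternative is to test the node pointer itself and derive the container only for nodes that exist. Node, Item, item_of, and sum_items are illustrative names.

```cpp
#include <cstddef>

// Hypothetical intrusive singly linked list node.
struct Node {
    Node* next;
};

// The node is embedded at a non-zero offset inside the payload struct.
struct Item {
    int value;
    Node link;
};

// Recover the enclosing Item from its embedded node.
// Must only be called with a non-null node.
inline Item* item_of(Node* node) {
    return reinterpret_cast<Item*>(
        reinterpret_cast<char*>(node) - offsetof(Item, link));
}

// Terminate on the node pointer. No arithmetic is ever done on NULL,
// so nothing has to underflow and overflow back to a sentinel value.
int sum_items(Node* first) {
    int total = 0;
    for (Node* n = first; n != nullptr; n = n->next) {
        total += item_of(n)->value;
    }
    return total;
}
```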
(source: kernel/futex.c)
… But Sometimes It’s the Compiler
Clang assigns the NULL constant to a register that can’t be loaded from:
CC kernel/futex.o
/tmp/futex-f1b216.s: Assembler messages:
/tmp/futex-f1b216.s:14498: Error: integer 64-bit register expected at operand 2 -- `prfm pstl1strm,[xzr]'
/tmp/futex-f1b216.s:14499: Error: operand 2 should be an address with base register (no offset) -- `ldxr w12,[xzr]'
/tmp/futex-f1b216.s:14502: Error: operand 3 should be an address with base register (no offset) -- `stlxr w13,w10,[xzr]'
● One out-of-tree patch still needed for ARM64 (LLVM bug 30792)
● Production ready?
Pixel 2
Benefits
● Consistent toolchain for kernel and userspace
● LLVM development beyond critical mass
● Better static analysis + dynamic analysis (sanitizers)
○ Sanitizers developed first in LLVM, have significantly more features
○ KASAN+ramdumps helps A LOT, recommended for dedicated dogfooders
● Additional compiler warning flag coverage
● More tools planned in the future (control-flow integrity, LTO, PGO)
● Shake out undefined behaviors
● Improve both kernel and compiler code bases
LLVM bugs found/hit from Linux Kernel effort
● [AArch64] -mgeneral-regs-only inconsistent with gcc
● false(?) -Wsequence warning
● typeof(const members of struct), -std=gnu89, and -Wduplicate-decl-specifier
vs gcc7.1
● Wrong relocation type in relocatable LTO link
● Clang integrated assembler doesn't accept asm macro defined in one asm
directive and used in another
● Invalid LDR instruction with XZR
New warnings for our kernel (that found bugs)
● -Wlogical-not-parentheses
● -Warray-bounds
● -Wunused-function
● -Wimplicit-enum-conversion
● -Wformat-extra-args
● -Wframe-larger-than=
● -Wignored-attributes
● -Wduplicate-decl-specifier
● -Wshift-overflow
● -Wself-assign
● -Wsection
● -Wtautological-pointer-compare
● -Wparentheses-equality
● -Wenum-conversion
● -Wliteral-conversion
● -Wheader-guard
● -Wnon-literal-null-conversion
● -Waddress-of-packed-member disabled :(
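To make one of these concrete, here is the kind of mistake that the logical-not warning (-Wlogical-not-parentheses in Clang) catches. The function names are hypothetical and the example is not taken from the kernel. The first function spells out what the unparenthesized expression really computes; the second is what was almost certainly meant.

```cpp
// What `!status == expected` actually evaluates: the `!` binds to `status`
// alone, so a 0-or-1 result is compared against `expected`.
// (The explicit parentheses here make that binding visible and silence the warning.)
bool differs_as_written(int status, int expected) {
    return (!status) == expected;
}

// What the author intended: negate the whole comparison.
bool differs_as_intended(int status, int expected) {
    return !(status == expected);
}
```

For status = 0 and expected = 2 the two disagree: the first compares 1 with 2 and reports no difference, while the second correctly reports that the values differ.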
* 4.4 and 4.9 LTS Chromium/Android forks, ToT (4.14-rc5) (assuming no one broke anything since this morning)
§ ARCH=arm64 || ARCH=x86_64
Testing
● Presubmit (compile+boot tests)
○ Clang
○ GCC
○ KASAN
○ lint
● Postsubmit
○ fuzzing
○ regression testing
Try it today!
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git && \