CompilerTalk 2019

21 compilers and

3 orders of magnitude
in 60 minutes
a wander through a weird landscape to the heart of compilation

Spring 2019

!1
Hello!
• I am someone who has worked (for pay!) on some compilers: rustc, swiftc, gcc, clang, llvm, tracemonkey, etc.

• Ron asked if I could talk about compiler stuff I know, give perspective on the field a bit.

• This is for students who already know roughly how to write compilers; not going to cover that!

!2
the speaker, in 1979
I like compilers!

• Relationship akin to "child with many toy dinosaurs".

• Some are bigger and scarier. We will look at them first.

• Some are weird and wonderful. We will visit them along the way.

• Some are really tiny!

!3
Borrowsaur fighting a Thunkasaur
Goal for talk
• I expect the gap between class projects and industrial compilers is overwhelming.

• Want to explore the space between, demystify and make more design choices clear.

• Reduce terror, spark curiosity, encourage trying it as a career!

• If I can compiler, so can you!

!4
Plan of talk

• Describe a few of the giants.

• Talk a bit about what makes them so huge & complex.

• Wander through the wilderness (including history) looking for ways compilers can vary, and examining specimens.

• Also just point out stuff I think is cool / underappreciated.

!5
Caveats
• I'm not a teacher or very good at giving talks.

• Lots of material, not ideal to stop for questions unless you're absolutely lost. Gotta keep pace!

• But: time at end for questions and/or email followup. Happy to return to things you're curious about. Slides are numbered! Jot down any you want to ask about.

• Apologies: not as much industry-talk as I promised. Will try for some. But too many dinosaurs for show and tell!

!6
Part 1: some giants

!7
Specimen #1

Clang
• ~2m lines of C++: 800k lines clang plus 1.2m LLVM. Self-hosting, bootstrapped from GCC.

• C-language family (C, C++, ObjC), multi-target (23).

• Single AST + LLVM IR.

• 2007-now, large multi-org team.

• Good diagnostics, fast code.

• Originally Apple, more permissively licensed than GCC.

(Slide shows a code excerpt from clang's CodeGenFunction::EmitLValue.)

!8
Specimen #2

Swiftc
• ~530k lines of C++ plus 2m lines clang and LLVM. Many same authors. Not self-hosting.

• Newer app-dev language.

• Tightly integrated with clang, interop with C/ObjC libraries.

• Extra SIL IR for optimizations.

• Multi-target, via LLVM.

• 2014-now, mostly Apple.

(Slide shows a code excerpt from SILGen's RValueEmitter::visitIfExpr.)

!9
Specimen #3

Rustc
• ~360k lines of Rust, plus 1.2m lines LLVM. Self-hosting, bootstrapped from OCaml.

• Newer systems language.

• Two extra IRs (HIR, MIR).

• Multi-target, via LLVM.

• 2010-now, large multi-org team.

• Originally mostly Mozilla. And yes I did a lot of the initial bring-up so my name is attached to it forever; glad it worked out!

(Slide shows a code excerpt from rustc's MIR building: expr_as_rvalue.)

!10
Aside: what is this "LLVM"?
• Notice the last 3 languages all end in LLVM. "Low Level Virtual Machine" https://github.com/llvm/llvm-project

• Strongly typed IR, serialization format, library of optimizations, lowerings to many target architectures.

• "One-stop-shop" for compiler backends.

• 2003-now, UIUC at first, many industrial contributors now.

• Longstanding dream of the compiler engineering world, possibly the most successful attempt at it yet.

• Here is a funny diagram of modern compilers from Andi McClure (https://runhello.com/)

!11
Specimen #4

GNU Compiler Collection (GCC)


• ~2.2m lines of mostly C, C++. 600k lines Ada. Self-hosting, bootstrapped from other C compilers.

• Multi-language (C, C++, ObjC, Ada, D, Go, Fortran), multi-target (21).

• Language & target-agnostic TREE AST and RTL IR. Challenging to work on.

• 1987-present, large multi-org team.

• Generates quite fast code.

• Originally a political project to free software from proprietary vendors. Licensed somewhat protectively.

(Slide shows a code excerpt from GCC's reload pass: find_reusable_reload.)

!12
Part 2: why so big?

!13
Size and economics
• Compilers get big because the development costs are seen as justified by the benefits, at least to the people paying the bills.

• Developer productivity: highly expressive languages, extensive diagnostics, IDE integration, legacy interop.

• Every drop of runtime performance: shipping on billions of devices or gigantic multi-warehouse fleets.

• Covering & exploiting all the hardware: someone makes a new chip, they pay for an industrial compiler to make use of it.

• Writing compilers in verbose languages: for all the usual reasons (compatibility, performance, familiarity).

!14
Tradeoffs and balance
• This is ok!

• The costs and benefits are context dependent.

• Different contexts, weightings: different compilers.

• Remainder of talk will be exploring those differences.

• Always remember: balancing cost tradeoffs by context.

• Totally biased subset of systems: stuff I think is interesting and worth knowing, might give hope / inspire curiosity.

!15
Part 3: variations
(this part is much longer)

!16
Variation #1

Fewer optimizations

• In some contexts, "all the optimizations" is too much.

• Too slow to compile, too much memory, too much development / maintenance effort, too inflexible.

• Common in JITs, or languages with lots of indirection anyway (dynamic dispatch, pointer chasing): the optimizer can't do too well anyway.

!17
Proebsting's law
• "Compiler Advances Double Computing Power Every 18 Years"

• Sarcastic joke / real comparison to Moore's law: hardware doubles power every 18 months. Swamps compilers.

• Empirical observation though! Optimizations seem to only win ~3-5x, after 60+ years of work.

• Less true as a language gains more abstractions to eliminate (i.e. specialize / de-virtualize). More true if lower-level.

Scott, Kevin. On Proebsting's Law. 2001

!18
Frances Allen
Got All The Good Ones
• 1971: "A Catalogue of Optimizing Transformations".

• The ~8 passes to write if you're going to bother.

• Inline, Unroll (& Vectorize), CSE, DCE, Code Motion, Constant Fold, Peephole.

• That's it. You're welcome.

• Many compilers just do those, get ~80% best-case perf.
https://commons.wikimedia.org/wiki/File:Allen_mg_2528-3750K-b.jpg - CC BY-SA 2.0

!19
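Two entries from Allen's catalogue, constant folding and dead-code elimination, can be sketched over a toy three-address IR in a few lines. This is an illustrative sketch, not any particular compiler's code, and it assumes straight-line, single-assignment instructions:

```python
# Toy passes from Allen's catalogue: constant folding and dead-code
# elimination over a tiny three-address IR. Instructions are
# (dest, op, args); "const" loads a literal. Assumes straight-line,
# single-assignment code (no loops, no reassignment).

def constant_fold(instrs):
    consts, out = {}, []
    for dest, op, args in instrs:
        if op == "const":
            consts[dest] = args[0]
            out.append((dest, op, args))
        elif op == "add" and all(a in consts for a in args):
            val = consts[args[0]] + consts[args[1]]
            consts[dest] = val
            out.append((dest, "const", (val,)))  # replace with folded constant
        else:
            out.append((dest, op, args))
    return out

def dead_code_elim(instrs, live_out):
    # Walk backwards, keeping only instructions whose dest is live.
    live, out = set(live_out), []
    for dest, op, args in reversed(instrs):
        if dest in live:
            live.discard(dest)
            if op != "const":
                live.update(args)
            out.append((dest, op, args))
    return list(reversed(out))

prog = [("a", "const", (2,)), ("b", "const", (3,)),
        ("c", "add", ("a", "b")), ("d", "add", ("a", "a"))]
folded = constant_fold(prog)
optimized = dead_code_elim(folded, live_out={"c"})
print(optimized)  # only c = const 5 survives
```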
Specimen #5

V8
• 660k lines C++ including backends. Not self-hosting.

• JavaScript compiler in Chrome, Node.

• Multi-target (7), multi-tier JIT. Optimizations a mix of classical stuff and dynamic-language stuff from Smalltalk.

• Multiple generations of optimizations and IRs. Always adjusting for the sweet spot of runtime perf vs. compile time, memory, maintenance cost, etc.

• Recently added slower (non-JIT) interpreter tier, removed others.

• 2008-present, mostly Google, open source.

(Slide shows a code excerpt from V8's InstructionSelector::VisitWordCompareZero.)

!20
Variation #2

Compiler-friendly implementation
(and input) languages
• Note: your textbook has 3 implementation flavours. Java, C, ML. No coincidence.

• ML designed as implementation language for a symbolic logic (expression-tree wrangling) system: LCF (1972).

• LCF written in Lisp. Lisp also designed as implementation language for a symbolic logic system: Advice Taker (1959).

• Various family members: Haskell, OCaml, Scheme, Racket.

• All really good at defining and manipulating trees. ASTs, types, IRs, etc. Usually make for much smaller/simpler compilers.

!21
Specimen #6

Glasgow Haskell Compiler (GHC)


• 180k lines Haskell, self-hosting, bootstrapped from the Chalmers Lazy ML compiler.

• Pure-functional language, very advanced type-system.

• Several tidy IRs after AST: Core, STG, CMM. Custom backends.

• 1991-now, initially academic researchers, lately Microsoft after they were hired there.

(Slide shows a code excerpt from GHC's native code generator: stmtToInstrs.)

!22
Specimen #7

Chez Scheme
• 87k lines Scheme (a Lisp), self-hosting, bootstrapped from C-Scheme.

• 4 targets, good performance, incremental compilation.

• Written on the "nanopass framework" for compilers with many similar IRs. Chez has 27 different IRs!

• 1984-now, academic-industrial, mostly single developer. Getting down to the size-range where a compiler is small enough for that.

(Slide shows a code excerpt from Chez's backend: asm-size and asm-move.)

!23
Specimen #8

Poly/ML
• 44k lines SML, self-hosting.

• Single machine target (plus byte-code), AST + IR, classical optimizations. Textbook style.

• Standard platform for symbolic logic packages in Isabelle and HOL.

• 1986-now, academic, mostly single developer.

(Slide shows a code excerpt from Poly/ML's code generator: cgOp.)

!24
Specimen #9

CakeML
• 58k lines SML, 5 targets, self-hosting.

• 9 IRs, many simplifying passes.

• 160k lines HOL proofs: verified!

• Language semantics proven to be preserved through compilation!!!

• Cannot emphasize enough. This was science fiction when I was young.

• CompCert first serious one, now several.

• 2012-now, deeply academic.

(Slide shows a code excerpt from CakeML's word-size lowering: WordOp64_on_32_def and WordShift64_on_32_def.)

!25
Variation #3

Meta-languages
• Notice Lisp / ML code looks a bit like grammar productions: recursive branching tree-shaped type definitions, pattern matching.

• There's a language lineage that took that idea ("programs as grammars") to its logical conclusion: metacompilers (a.k.a. "compiler-compilers"). Ultimate in "compiler-friendly" implementation languages.

• More or less: a parser glued to an "un-parser".

• Many times half a metacompiler lurks in more-normal compilers:

• YACCs ("yet another compiler-compiler"): parser-generators

• BURGs ("bottom-up rewrite generators"): code-emitter-generators

• See also: GCC ".md" files, LLVM TableGen. Common pattern!

!26
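The "parser glued to an un-parser" shape can be sketched in miniature: a hand-written parser builds trees, and per-node output templates drive the un-parsing. Illustrative only; a real metacompiler generates both halves from grammar descriptions:

```python
# A metacompiler in miniature: a parser producing trees, glued to an
# "un-parser" driven by per-node output templates (the TREE-META idea,
# heavily simplified; the real thing generated both halves from grammars).
import re

def parse(tokens):
    # expr := num (('+'|'-') num)*  -- builds ('add'/'sub', lhs, rhs) trees
    node = ("num", tokens.pop(0))
    while tokens and tokens[0] in "+-":
        op = {"+": "add", "-": "sub"}[tokens.pop(0)]
        node = (op, node, ("num", tokens.pop(0)))
    return node

# Output rules: one template per tree-node kind, like TREE-META's => rules.
TEMPLATES = {
    "num": "PUSH {0}",
    "add": "{0}\n{1}\nADD",
    "sub": "{0}\n{1}\nSUB",
}

def unparse(node):
    kind, *kids = node
    if kind == "num":
        return TEMPLATES[kind].format(kids[0])
    return TEMPLATES[kind].format(*map(unparse, kids))

toks = re.findall(r"\d+|[+-]", "1 + 2 - 3")
print(unparse(parse(toks)))
```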
Aside: SRI-ARC
• Stanford Research Institute - Augmentation Research Center (ARC). US Air Force R&D project. Very famous for its NLS ("oNLine System").

• History of that project too big to tell here. Highly influential in forms of computer-human interaction, hypertext, collaboration, visualization.

• Less well-known is their language tech: TREE-META and MPS/MPL.

!27
Specimen #10

TREE-META
• 184 lines of TREE-META. Bootstrapped from META-II.

• In the Schorre metacompiler family (META, META-II).

• SRI-ARC, 1967. Made to support language tools in the NLS project.

• "Syntax-directed translation": parse input to trees, un-parse to machine code. Only guided by grammars.

• Hard to provide diagnostics, type-checking, optimization, really anything other than straight translations.

• But: extremely small, simple compilers. Couple pages. Ideal for bootstrap phase.

(Slide shows a TREE-META grammar excerpt: tree-pattern rules with output templates.)

!28
Specimen #11 (Segue)

Mesa
• 42k lines of Mesa (bootstrapped from MPL, itself from TREE-META).

• One of my favourite languages!

• Strongly typed, modules with separate compilation and type-checked linking. Highly influential (Modula, Java).

• Co-designed language, OS, and byte-code VM implemented in CPU microcode, adapted to the compiler.

• Xerox PARC, 1976-1981, small team left SRI-ARC, took MPL.
!29
https://www.flickr.com/photos/microsoftpdc/4119070676/ - CC BY 2.0


Variations #4, #5, and #6

leverage interpreters

• Mesa and Xerox PARC is a nice segue into the next few points: all involve compilers interacting with interpreters.

• Interpreters & compilers actually have a long relationship!

• In fact interpreters predate compilers.

• Let us travel back in time to the beginning, to illustrate!
• Let us travel back in time to the beginning, to illustrate!

!30
Origins of "computer"
• 1940s: First digital computers.

• Before: fixed-function machines and/or humans (largely women) doing a job called "computer".

• Computing power literally measured in "kilo-girls" and "kilo-girl-hours".

!31
ENIAC: general hardware
• 1945: ENIAC built for US Army, Ordnance Corps. Artillery calculations in WWII.

• "Programmers" drawn from "computer" staff, all women.

• "Programming" meant physically rewiring per-task.

!32
Stored Programs
• 1948: Jean Bartik leads team to convert ENIAC to "stored programs", instructions (called "orders") held in memory.

• Interpreted by hardware. Faster to reconfigure than rewiring; but ran slower.

• Subroutine concept developed for factoring stored programs.

!33
First software pseudo codes:
interpreters on ENIAC, BINAC, UNIVAC

• 1949: "Short Code" software interpreters for higher-level "pseudo-code" instructions (non-HW-interpreted) that denote subroutine calls and expressions. ~50x slower than HW-interpreted.

!34
Specimen #12

A-0: the first compiler


• Reads interpreter-like pseudo-
codes, then emits "compilation"
program with all codes resolved
to their subroutines.

• Result runs almost as fast as


manually coded; but as easy to
write-for as interpreter. An
interpreter "fast mode".

• Rationale all about balancing time


tradeoffs (coding-time, compiler-
execution-time, run-time).

• 1951, Grace Hopper, Univac

http://commons.wikimedia.org/wiki/File:Grace_Murray_Hopper,_in_her_office_in_Washington_DC,_1978,_©Lynn_Gilbert.jpg
!35 - CC BY-SA 4.0
Balance between
interpretation and compilation
is context dependent too!

!36
Variation #4

Only compile from frontend to IR,


interpret residual VM code
• Can stop before real machine code. Emit IR == "virtual machine" code.

• Can further compile or just interpret that VM code.

• Residual VM interpreter has several real advantages:

• Easier to port to new hardware, or bootstrap compiler. "Just get something running".

• Fast compilation & program startup, keeps interactive user engaged.

• Simply easier to write, less labor. Focus your time on frontend semantics.

https://xavierleroy.org/talks/zam-kazam05.pdf

!37
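The "stop at VM code" design in miniature: a frontend lowers expression trees to bytecode for a tiny stack VM, and the residual interpreter runs it. Porting means rewriting only `run`; the frontend and bytecode format stay put. All names here are invented for illustration:

```python
# Frontend lowers an expression tree to bytecode; a residual stack-VM
# interpreter executes it. The split is the point: the interpreter is
# the only machine-dependent part.

def compile_expr(e, code=None):
    # e is nested tuples: ("add", l, r), ("mul", l, r), or an int literal.
    if code is None:
        code = []
    if isinstance(e, int):
        code.append(("PUSH", e))
    else:
        op, l, r = e
        compile_expr(l, code)
        compile_expr(r, code)
        code.append((op.upper(), None))
    return code

def run(code):
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

bytecode = compile_expr(("mul", ("add", 1, 2), 4))
print(run(bytecode))  # 12
```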
Specimen #13

Roslyn
• 350k lines C#, 320k lines VB. Self-hosting, bootstrapped off previous gen.

• Multi-language framework (C#, VB.NET). Rich semantics, good diagnostics, IDE integration.

• Lowers from AST to CIL IR. Separate CLR project interprets or compiles the IR.

• 2011-now, Microsoft, OSS.

(Slide shows a code excerpt from Roslyn's EmitBinaryOperatorInstruction.)

!38
Specimen #14

Eclipse Compiler for Java (ECJ)


• 146k lines Java, self-hosting, bootstrapped off Javac.

• In Eclipse! Also in many Java products (eg. IntelliJ IDEA). Rich semantics, good diagnostics, IDE integration.

• Lowers from AST to JVM IR. Separate JVM projects interpret or compile the IR.

• 2001-now, IBM, OSS.

(Slide shows a code excerpt from ECJ's generateCode for the conditional operator ?:.)

!39
Variation #5

Only compile some functions, interpret the rest

• Cost of interpreter only bad at inner loops or fine grain. Outer loops or coarse grain (eg. function calls) similar to virtual dispatch.

• Design option: interpret by default, selectively compile hot functions ("fast mode") at coarse grain. Best of both worlds!

• Keep interpreter-speed immediate feedback to user.

• Interpreter may be low-effort, portable, can bootstrap.

• Defer effort on compiler until needed.

• Anything hard to compile, just call back to interpreter.

!40
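The interpret-by-default, compile-when-hot design can be sketched with a per-function call counter. Here the "fast mode" is Python source handed to `exec()`, standing in for real machine code; all names and the threshold are invented for illustration:

```python
# Tiering in miniature: interpret by default, and when a function gets
# hot, emit code for it and swap it in. The "machine code" here is
# Python source passed to exec(), standing in for a JIT backend.
HOT_THRESHOLD = 3

def interpret(body, x):
    # body is a list of ("add", n) / ("mul", n) steps applied to x.
    for op, n in body:
        x = x + n if op == "add" else x * n
    return x

def compile_fn(body):
    # Unroll the interpreter loop into a single expression, once.
    expr = "x"
    for op, n in body:
        expr = f"({expr} + {n})" if op == "add" else f"({expr} * {n})"
    ns = {}
    exec(f"def fast(x):\n    return {expr}", ns)
    return ns["fast"]

class Function:
    def __init__(self, body):
        self.body, self.calls, self.fast = body, 0, None

    def __call__(self, x):
        if self.fast:                      # fast mode, if tiered up
            return self.fast(x)
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:    # hot: compile for next time
            self.fast = compile_fn(self.body)
        return interpret(self.body, x)

f = Function([("add", 1), ("mul", 2)])
print([f(i) for i in range(5)])  # [2, 4, 6, 8, 10]
```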
Specimen #15

Pharo/Cog
• 54k line VM interpreter and 18k line JIT: C code generated from Smalltalk metaprograms. Bootstrapped from Squeak.

• Smalltalk is what you'll actually hear people mention coming from Xerox PARC.

• Very simple language. "Syntax fits on a postcard". Easy to interpret.

• Complete GUI, IDE, powerful tools.

• Standard Smalltalk style: interpret by default, JIT for "fast mode". Compiler bootstraps-from and calls-into the VM whenever convenient.

• Targets ARM, x86, x64, MIPS.

• 2008-now, academic-industrial consortium.

!41
Specimen #16

Franz Lisp
• 20k line C interpreter, 7,752 line Lisp compiler.

• Older command-line system, standard Unix Lisp for years.

• Like Smalltalk: very simple language. Actually an AST/IR that escaped from the lab. Easy to interpret.

• Frequent Lisp style: interpret by default; compile for "fast mode". Compiler bootstraps-from and calls-into the interpreter whenever convenient.

• Targets m68k and VAX.

• 1978-1988, UC Berkeley.

(Slide shows a code excerpt from the Franz Lisp compiler: e-move and d-move.)

!42
Variation #6

Partial Evaluation Tricks

• Consider a program in terms of parts that are static (will not change anymore) or dynamic (may change).

• Partial evaluator (a.k.a. "specializer") runs the parts that depend only on static info, emits a residual program that only depends on dynamic info.

• Note: an interpreter takes two inputs: the program to interpret, and the program's own input. The first is static, but redundantly treated as dynamic.

• So: compiling is like partially evaluating an interpreter, eliminating the redundant dynamic treatment of its first input.

!43
Futamura Projections
• Famous work relating programs P, interpreters I, partial evaluators E, and compilers C. The so-called "Futamura Projections":

• 1: E(I,P) → partially evaluate I(P) → emit C(P), a compiled program

• 2: E(E,I) → partially evaluate λP.I(P) → emit C, a compiler!

• 3: E(E,E) → partially evaluate λI.λP.I(P) → emit a compiler-compiler!

• Futamura, Yoshihiko, 1971. Partial Evaluation of Computation Process—An Approach to a Compiler-Compiler. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.10.2747

• Formal strategy for building compilers from interpreters and specializers.
!44
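The first projection can be made concrete with a toy interpreter and a hand-written specializer: running the dispatch loop at specialization time leaves a residual program that touches only the dynamic input. An illustrative sketch; a real partial evaluator derives this automatically:

```python
# First Futamura projection in miniature: specializing an interpreter
# with respect to its static input (the program) yields a compiled
# program. The residual program is a chain of closures with the
# instruction dispatch already executed away.

def interp(prog, x):
    # The ordinary interpreter: dispatches on each instruction at run time.
    for op, n in prog:
        if op == "add":
            x = x + n
        elif op == "mul":
            x = x * n
    return x

def specialize(prog):
    # Runs the static part (the dispatch loop) now; leaves only the
    # dynamic part (arithmetic on x) in the residual program.
    steps = []
    for op, n in prog:
        if op == "add":
            steps.append(lambda x, n=n: x + n)
        elif op == "mul":
            steps.append(lambda x, n=n: x * n)

    def residual(x):
        for step in steps:
            x = step(x)
        return x
    return residual

prog = [("add", 3), ("mul", 2)]
compiled = specialize(prog)
assert compiled(5) == interp(prog, 5) == 16
```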
Specimen #17

Truffle/Graal
• 240k lines of Java for Graal (VM); 90k lines for Truffle (interpreter-writing framework).

• Actual real system based on the first Futamura Projection.

• Seriously competitive! Potential future Oracle JVM.

• Multi-language (JavaScript, Python, Ruby, R, JVM byte code, LLVM bitcode), multi-target (3).

• "Write an interpreter with some machinery to help the partial evaluator, get a compiler for free."

• Originally academic, now Oracle.

(Slide shows a code excerpt from Graal's emitConditional.)

!45
Variation #7

Forget IR and/or AST!


• In some contexts, even building an AST or IR is overkill.

• Small hardware, tight budget, one target, bootstrapping.

• Avoiding the AST is tricky; languages can be designed to help. So-called "single-pass" compilation: emit code line-at-a-time, while reading.

• Likely means no optimization aside from peephole.

!46
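Single-pass compilation in miniature: a recursive-descent parser that emits stack-machine code as it reads, never building an AST. An illustrative sketch only; Turbo Pascal did this for a full Pascal, in assembly:

```python
# Single-pass compilation: emit code during parsing, no AST, no IR.
# Each grammar procedure appends instructions the moment it has
# recognized enough input to know what to emit.
import re

def compile_single_pass(src):
    toks = re.findall(r"\d+|[-+*()]", src)
    out, pos = [], 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def factor():
        nonlocal pos
        if peek() == "(":
            pos += 1          # consume "("
            expr()
            pos += 1          # consume ")"
        else:
            out.append(f"PUSH {toks[pos]}")  # emit immediately, no AST node
            pos += 1

    def term():
        nonlocal pos
        factor()
        while peek() == "*":
            pos += 1
            factor()
            out.append("MUL")

    def expr():
        nonlocal pos
        term()
        while peek() in ("+", "-"):
            op = {"+": "ADD", "-": "SUB"}[peek()]
            pos += 1
            term()
            out.append(op)

    expr()
    return out

print(compile_single_pass("(1 + 2) * 3"))
```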
Specimen #18

Turbo Pascal
• 14k instructions including editor. x86 assembly. 39kb on disk.

• Famous early personal-micro compiler. Single-pass, no AST or IR. Single target.

• Proprietary ($65) so I don't have source. Here's an ad!

• 1983-1992; lineage continues into the modern Delphi compiler.

!47
Specimen #19

Manx Aztec C
• 21k instructions, 50kb on disk.

• Contemporary to Turbo Pascal, one of many competitors.

• Unclear if AST or not, no source. Probably no IR.

• Multi-target, Z80 and 8080.

• 1980-1990s, small team.

!48
Specimen #20

Not just the past: 8cc


• 6,740 lines of C, self-hosting, compiles to ~110kb via clang, 220kb via self.

• Don't have to use assembly to get this small! Quite readable and simple. Works.

• Single target, AST but no IR, few diagnostics.

• 2012-2016, mostly one developer.

(Slide shows a code excerpt from 8cc's emit_binop_int_arith.)

!49
Grand Finale

!50
Specimen #21

JonesForth
• 692 instruction VM, 1,490 lines Forth for compiler, REPL, debugger, etc.

• Educational implementation of Forth.

• Forth, like Lisp, is nearly VM code at input (postfix not prefix).

• Minimal partial-compiler turns user "words" into chains of indirect-jumps. Machine-code primitive words.

• Interactive system with quote, eval, control flow, exceptions, debug inspector. Pretty high expressivity!

• 2009, one developer.

(Slide shows a JonesForth excerpt: the IF / THEN / ELSE immediate words, which compile 0BRANCH/BRANCH with dummy offsets and back-fill them.)
• 2009, one developer.

!51
Coda

!52
There have been a lot of languages
http://hopl.info catalogues 8,945 programming languages from the 18th century to the present
!53
Go study them: past and present!
Many compilers possible!
Pick a future you like!

!54
The End!

https://en.wikipedia.org/wiki/Dinosaur#/media/File:Ornithopods_jconway.jpg

(I also probably ought to mention that due to using some CC BY-SA pictures, this talk is licensed CC BY-SA 4.0 international)

!55
