1 Lec01
COMPILER DESIGN
Adapted from slides by Steve Zdancewic, UPenn
Welcome!
• I’m glad you’re here!
• This is a rough time for everyone. If there’s anything I can
do to make things more survivable, please let me know.
• Instructor: William Mansky (he/him)
• Office hours Tuesday 1-2, Friday 10:30-11:30, and by
appointment, on Blackboard Collaborate
• Office hours are great for homework help, or just to say hi!
What is a Compiler?
• CPUs don’t actually understand programming languages!
• A compiler is a program that translates from one
programming language to another.
• Typically: high-level source code to low-level machine code
• Provides the abstraction that computers understand C,
Java, etc.
C program → gcc → a.out
Why Study Compilers?
• They’re what makes programming possible!
• Useful if you’ve ever wanted to make your own language,
or tweak an existing one.
• But you don’t have to know engine design to drive a car
(anymore)
– If you’re going to be a professional driver, maybe you should.
– When things go wrong, the abstraction breaks.
When Things Go Wrong, part 1
• (demo)
When Things Go Wrong, part 2
https://gcc.gnu.org/bugzilla/buglist.cgi?component=c&product=gcc&resolution=---
Textbook
• Course textbook: Modern Compiler Implementation in C (Appel)
– Green tiger book (there are also Java and ML versions)
– Can get an ebook via RedShelf (might be cheaper)
– Small number of copies at the library
– Errata, etc. at https://www.cs.princeton.edu/~appel/modern/c/
Grading
In-Class Exercises
• One question every class, submitted through Gradescope
• Answer them in-class if you attend live, or whenever you
watch the lecture if you’re watching the recordings
• You don’t have to get them right to get credit!
Assignments
• We will have six programming assignments, each over ~2
weeks
• Each assignment will be submitted twice
– First submission: write as much as you can; you’ll receive full
credit as long as you submit anything, and I’ll give you feedback
on your code
– Second submission: I’ll actually test your code and grade you on
how well it works
• Put together, the assignments will be most of a compiler for a
simple C-like language
• Final project: add another feature to the compiler
• On Piazza
– Can ask/answer anonymously
– Can post privately to instructors
– Can answer other students’ questions
INTRO TO COMPILERS
What is a Compiler?
• A compiler is a program that translates from one
programming language to another.
• Typically: high-level source code to low-level machine code
(object code)
– Not always: source-to-source translators, Java bytecode compilers, Java ⇒ JavaScript, etc.
High-level Code → ? → Low-level Code
History of Compilers
• 1945: ENIAC, the first programmable, general-purpose electronic
computer, is built; it is programmed by setting switches (machine
language, 1’s and 0’s)
• 1947: Kathleen Booth invents assembly language
• 1951-1952: Grace Hopper and her team invent A-0, a very
low-level compiler
• 1955-1959: Hopper invents FLOW-MATIC, the first
programming language with English-like syntax, which later
becomes part of COBOL
• 1957: an IBM team led by John Backus
releases the FORTRAN compiler, the first
commercially available compiler
• 1960: FORTRAN, COBOL, ALGOL, and LISP
become the four main programming languages
• 1970 onward: They inspire an explosion of new
languages, almost all of which use compilers!
Source Code
• Optimized for human readability
– Expressive: matches human ideas of grammar / syntax /
meaning
– Redundant: more information than needed to help catch errors
– Abstract: exact computation possibly not fully determined by
code
• Example C source:
#include <stdio.h>

int factorial(int n) {
    int acc = 1;
    while (n > 0) {
        acc = acc * n;
        n = n - 1;
    }
    return acc;
}
• Optimized for hardware
– Hard for people to read
– Many steps, each one simple
– Redundancy, ...
• Example assembly (x86, from the factorial function above):
    pushl %ebp
    movl %esp, %ebp
    subl $8, %esp
    movl 8(%ebp), %eax
    movl %eax, -4(%ebp)
    movl $1, -8(%ebp)
LBB0_1:
    cmpl $0, -4(%ebp)
    jle LBB0_3
## BB#2:
How to translate?
• Source code and target code aren’t just different languages
– they’re trying to express different things
• Some languages are farther from machine code than
others:
– Consider: C, C++, Java, Lisp, F#, Ruby, Python, JavaScript,
Prolog
• Goals of translation:
– Correctly convey what the source code meant to do
– Best performance for the concrete computation
– Reasonable translation efficiency (< O(n^3))
– Maintainable implementation
Idea: Translate in Steps
• Compile via a series of program representations
(Simplified) Compiler Structure
• Source code: if (b == 0) a = 0;
• Front End (machine independent)
– Lexical Analysis produces a Token Stream
– Parsing produces an Abstract Syntax Tree
• Middle End (compiler dependent)
– Translation and Optimization produce Intermediate Code
Typical Compiler Stages
• Lexing → token stream
• Parsing → abstract syntax
• Semantic analysis → annotated abstract syntax
• Translation → intermediate code
• Control flow analysis → control-flow graph
• Dataflow analysis → interference graph
• Register allocation → assembly
• Code emission
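The idea of moving through a series of representations can be sketched on a much smaller language than C. The example below is only an illustration, not the course compiler: it handles integer expressions with + and *, folds lexing into the parser, and skips semantic analysis, optimization, and register allocation entirely. The names Expr, Instr, parse_sum, translate, and run are hypothetical, and the "stack machine" target is made up for the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Representation 1: abstract syntax tree for expressions over + and *. */
typedef struct Expr {
    char op;                 /* '+', '*', or 'n' for a number leaf */
    int value;               /* used only when op == 'n' */
    struct Expr *lhs, *rhs;  /* used only when op is '+' or '*' */
} Expr;

static Expr *mk(char op, int value, Expr *lhs, Expr *rhs) {
    Expr *e = malloc(sizeof *e);
    if (!e) abort();
    e->op = op; e->value = value; e->lhs = lhs; e->rhs = rhs;
    return e;
}

void free_expr(Expr *e) {
    if (e) { free_expr(e->lhs); free_expr(e->rhs); free(e); }
}

static void skip_ws(const char **s) {
    while (isspace((unsigned char)**s)) (*s)++;
}

/* Stage 1: source text -> AST. No error handling: malformed input is
   not diagnosed (a missing number simply parses as 0). */
static Expr *parse_number(const char **s) {
    int v = 0;
    skip_ws(s);
    while (isdigit((unsigned char)**s)) v = v * 10 + (*(*s)++ - '0');
    return mk('n', v, NULL, NULL);
}

static Expr *parse_product(const char **s) {
    Expr *e = parse_number(s);
    for (skip_ws(s); **s == '*'; skip_ws(s)) {
        (*s)++;                              /* consume '*' */
        e = mk('*', 0, e, parse_number(s));
    }
    return e;
}

Expr *parse_sum(const char **s) {
    Expr *e = parse_product(s);
    for (skip_ws(s); **s == '+'; skip_ws(s)) {
        (*s)++;                              /* consume '+' */
        e = mk('+', 0, e, parse_product(s));
    }
    return e;
}

/* Representation 2: instructions for a made-up stack machine. */
typedef enum { PUSH, ADD, MUL } Opcode;
typedef struct { Opcode op; int arg; } Instr;

/* Stage 2: AST -> stack code by post-order traversal. Appends at index n
   and returns the new instruction count; the caller sizes the array. */
size_t translate(const Expr *e, Instr *code, size_t n) {
    if (e->op == 'n') {
        code[n++] = (Instr){ PUSH, e->value };
    } else {
        n = translate(e->lhs, code, n);
        n = translate(e->rhs, code, n);
        code[n++] = (Instr){ e->op == '+' ? ADD : MUL, 0 };
    }
    return n;
}

/* A tiny interpreter for the stack code, used only to check the output. */
int run(const Instr *code, size_t n) {
    int stack[64];
    size_t sp = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].op == PUSH) {
            assert(sp < 64);
            stack[sp++] = code[i].arg;
        } else {
            assert(sp >= 2);
            int b = stack[--sp];
            int a = stack[--sp];
            stack[sp++] = (code[i].op == ADD) ? a + b : a * b;
        }
    }
    assert(sp == 1);
    return stack[0];
}
```

For the input "1 + 2 * 3", parse_sum builds a '+' node whose right child is a '*' node, translate emits PUSH 1, PUSH 2, PUSH 3, MUL, ADD, and run returns 7. Each stage only needs to understand the representation directly before it, which is the point of translating in steps.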
First Step: Lexical Analysis
• Change the character stream “if (b == 0) a = 0;” into
tokens:
if ( b == 0 ) a = 0 ;
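A minimal sketch of this step in C, assuming a hypothetical Token type and next_token function (neither comes from the course materials or the textbook). It recognizes only the keyword if, identifiers, decimal numbers, and punctuation, with == as the single two-character operator; a real lexer handles many more cases and is often generated from regular expressions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; a real lexer has many more. */
typedef enum { TOK_IF, TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];   /* the characters that make up the token */
} Token;

/* Reads one token starting at *src and advances *src past it.
   Lexemes longer than the text buffer are consumed but truncated. */
Token next_token(const char **src) {
    assert(src != NULL && *src != NULL);
    Token tok = { TOK_EOF, "" };
    const char *p = *src;
    size_t len = 0;

    while (isspace((unsigned char)*p)) p++;            /* skip whitespace */

    if (*p == '\0') {
        /* end of input: tok is already TOK_EOF */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* keyword or identifier */
        while (isalnum((unsigned char)*p) || *p == '_') {
            if (len < sizeof tok.text - 1) tok.text[len++] = *p;
            p++;
        }
        tok.text[len] = '\0';
        tok.kind = (strcmp(tok.text, "if") == 0) ? TOK_IF : TOK_IDENT;
    } else if (isdigit((unsigned char)*p)) {
        /* decimal number */
        while (isdigit((unsigned char)*p)) {
            if (len < sizeof tok.text - 1) tok.text[len++] = *p;
            p++;
        }
        tok.text[len] = '\0';
        tok.kind = TOK_NUMBER;
    } else {
        /* punctuation: "==" is two characters, everything else is one */
        tok.text[len++] = *p++;
        if (tok.text[0] == '=' && *p == '=') tok.text[len++] = *p++;
        tok.text[len] = '\0';
        tok.kind = TOK_PUNCT;
    }

    *src = p;
    return tok;
}

/* Writes the tokens of src into out, separated by single spaces.
   Stops early if out is too small to hold the next token. */
void tokens_to_string(const char *src, char *out, size_t out_size) {
    size_t used = 0;
    if (out_size == 0) return;
    out[0] = '\0';
    for (Token t = next_token(&src); t.kind != TOK_EOF; t = next_token(&src)) {
        int n = snprintf(out + used, out_size - used, "%s%s",
                         used ? " " : "", t.text);
        if (n < 0 || (size_t)n >= out_size - used) break;
        used += (size_t)n;
    }
}
```

Calling tokens_to_string on "if (b == 0) a = 0;" fills the buffer with "if ( b == 0 ) a = 0 ;", matching the token sequence above. Note that whitespace disappears at this stage: the parser only ever sees tokens.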