Introduction to Programming Languages and Compilers
CS164
4:00-5:30 TT
1 LeConte
Prof. Bodik CS 164 Lecture 1

Administrivia
• Course home page: http://www-inst.eecs.berkeley.edu/~cs164
• If you are on the waiting list, follow the normal procedures (see class web page)
  – The course staff is not involved!
  – If you are enrolled, you don’t need to do anything
• Discussion sections meet this week!
• Pick up class accounts
  – At the end of lecture today, and from Bowei Du afterwards

Course Structure
• Course has theoretical and practical aspects
• Need both in programming languages!
• Written assignments = theory, practice
  – Class hand-in, right before lecture
• Programming assignments = practice
  – Electronic hand-in
• Strict deadlines (three free late days)

Academic Honesty
• Don’t use work from uncited sources
  – Including old code
• We use plagiarism detection software
  – 10 cases in last course offerings

The Course Project
• A big project … in 5 easy parts
• Start early!
• Programming Assignment 1
  – handed out today
  – due in 13 days.
• more on the project later in this lecture

How are Languages Implemented?
• Two major strategies:
  – Interpreters (older, less studied)
  – Compilers (newer, much more studied)
• Interpreters run programs “as is”
  – Little or no preprocessing
• Compilers do extensive preprocessing

Language Implementations
• Batch compilation systems dominate
  – E.g., gcc
• Some languages are primarily interpreted
  – E.g., Java bytecode
• Some environments (Lisp) provide both
  – Interpreter for development
  – Compiler for production

(Short) History of High-Level Languages
• 1953 IBM develops the 701
• All programming done in assembly
• Problem: Software costs exceeded hardware costs!
• John Backus: “Speedcoding”
  – An interpreter
  – Ran 10-20 times slower than hand-written assembly

FORTRAN I
• 1954 IBM develops the 704
• John Backus
  – Idea: translate high-level code to assembly
  – Many thought this impossible
    • Had already failed in other projects
• 1954-7 FORTRAN I project
• By 1958, >50% of all software is in FORTRAN
• Cut development time dramatically
  – (2 wks → 2 hrs)

FORTRAN I
• The first compiler
  – Produced code almost as good as hand-written
  – Huge impact on computer science
• Led to an enormous body of theoretical work
• Modern compilers preserve the outlines of FORTRAN I

The Structure of a Compiler
1. Lexical Analysis
2. Parsing
3. Semantic Analysis
4. Optimization
5. Code Generation
The first 3, at least, can be understood by analogy to how humans comprehend English.

Lexical Analysis
• First step: recognize words.
  – Smallest unit above letters
      This is a sentence.
• Note the
  – Capital “T” (start of sentence symbol)
  – Blank “ ” (word separator)
  – Period “.” (end of sentence symbol)

More Lexical Analysis
• Lexical analysis is not trivial. Consider:
      ist his ase nte nce
• Plus, programming languages are typically more cryptic than English:
      *p->f ++ = -.12345e-5

And More Lexical Analysis
• Lexical analyzer divides program text into “words” or “tokens”
      if x == y then z = 1; else z = 2;
• Units:
      if, x, ==, y, then, z, =, 1, ;, else, z, =, 2, ;

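As a rough illustration of how program text is split into the units listed above, here is a minimal sketch of a hand-written lexer in Java. The class name `TinyLexer`, the regular expression, and the choice of token classes (identifiers, integers, and the operators `==`, `=`, `;`) are assumptions made for this example; this is not the course's Decaf lexer, and keywords such as `if` are simply treated as identifier-shaped words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; not part of the course code.
class TinyLexer {
    // Optional leading whitespace, then exactly one token.
    // "==" is listed before "=" so the longer operator wins.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:([A-Za-z_][A-Za-z_0-9]*)|([0-9]+)|(==|=|;))");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        String input = text.trim();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "unexpected character at or after offset " + pos);
            }
            // Exactly one of the three groups matched; take whichever it was.
            String lexeme = m.group(1) != null ? m.group(1)
                          : m.group(2) != null ? m.group(2)
                          : m.group(3);
            tokens.add(lexeme);
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if x == y then z = 1; else z = 2;"));
    }
}
```

Calling `TinyLexer.tokenize("if x == y then z = 1; else z = 2;")` yields the fourteen units listed on the slide. A real lexer would also attach a token class (keyword, identifier, operator) to each lexeme.
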
Parsing
• Once words are understood, the next step is to understand sentence structure
• Parsing = Diagramming Sentences
  – The diagram is a tree

Diagramming a Sentence
      This     line  is    a        longer     sentence
      article  noun  verb  article  adjective  noun
      subject              object
      sentence

Parsing Programs
• Parsing program expressions is the same
• Consider:
      if x == y then z = 1; else z = 2;
• Diagrammed:
      x == y     z  1       z  2
      relation   assign     assign
      predicate  then-stmt  else-stmt
      if-then-else

Semantic Analysis
• Once sentence structure is understood, we can try to understand “meaning”
  – But meaning is too hard for compilers
• Compilers perform limited analysis to catch inconsistencies
• Some do more analysis to improve the performance of the program

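The if-then-else diagram can be produced mechanically. Below is a minimal sketch of a recursive-descent parser in Java that takes the token list for `if x == y then z = 1; else z = 2;` and returns the tree as a parenthesized string. The class name `IfParser` and the three grammar rules in the comments are assumptions chosen to match the slide's diagram; they are not the Decaf grammar.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; the grammar is a guess that fits the slide.
class IfParser {
    private final List<String> tokens;
    private int pos = 0;

    IfParser(List<String> tokens) { this.tokens = tokens; }

    // stmt ::= "if" relation "then" assign "else" assign
    String parseIf() {
        expect("if");
        String predicate = parseRelation();
        expect("then");
        String thenStmt = parseAssign();
        expect("else");
        String elseStmt = parseAssign();
        return "(if-then-else " + predicate + " " + thenStmt + " " + elseStmt + ")";
    }

    // relation ::= operand "==" operand
    private String parseRelation() {
        String left = next();
        expect("==");
        String right = next();
        return "(relation " + left + " " + right + ")";
    }

    // assign ::= name "=" operand ";"
    private String parseAssign() {
        String target = next();
        expect("=");
        String value = next();
        expect(";");
        return "(assign " + target + " " + value + ")";
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private void expect(String wanted) {
        String got = next();
        if (!got.equals(wanted)) {
            throw new IllegalStateException("expected " + wanted + " but got " + got);
        }
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "if", "x", "==", "y", "then", "z", "=", "1", ";",
            "else", "z", "=", "2", ";");
        // prints (if-then-else (relation x y) (assign z 1) (assign z 2))
        System.out.println(new IfParser(tokens).parseIf());
    }
}
```

Each method corresponds to one node kind in the diagram, so the call structure of the parser mirrors the shape of the tree it builds.
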
Semantic Analysis in English
• Example:
      Jack said Jerry left his assignment at home.
      What does “his” refer to? Jack or Jerry?
• Even worse:
      Jack said Jack left his assignment at home?
      How many Jacks are there?
      Which one left the assignment?

Semantic Analysis in Programming
• Programming languages define strict rules to avoid such ambiguities
• This Java code prints “4”; the inner definition is used
      {
        int Jack = 3;
        {
          int Jack = 4;
          System.out.print(Jack);
        }
      }
  (Strictly, javac rejects this snippet when both declarations are local variables; it behaves as described if the outer Jack is a field, or in C or C++.)

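The rule that the innermost definition is used can be sketched with a stack of scopes. The following Java class is an illustrative assumption, not the course's symbol table: `ScopedSymbols` and its method names are made up for this example, and it maps names directly to integer values rather than to declarations.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class; names and design are assumptions.
class ScopedSymbols {
    // The innermost scope sits at the head of the deque.
    private final Deque<Map<String, Integer>> scopes = new ArrayDeque<>();

    ScopedSymbols() { enterScope(); }  // start with one outermost scope

    void enterScope() { scopes.push(new HashMap<>()); }

    void exitScope() { scopes.pop(); }

    // Bind a name in the current (innermost) scope.
    void declare(String name, int value) { scopes.peek().put(name, value); }

    // Search from the innermost scope outward; the first binding found wins.
    Integer lookup(String name) {
        for (Map<String, Integer> scope : scopes) {
            if (scope.containsKey(name)) {
                return scope.get(name);
            }
        }
        return null;  // name is not declared in any enclosing scope
    }

    public static void main(String[] args) {
        ScopedSymbols table = new ScopedSymbols();
        table.declare("Jack", 3);
        table.enterScope();
        table.declare("Jack", 4);
        System.out.println(table.lookup("Jack"));  // prints 4
        table.exitScope();
        System.out.println(table.lookup("Jack"));  // prints 3
    }
}
```

Entering a block pushes a scope and leaving it pops one, so after the inner block ends the outer `Jack` becomes visible again.
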
More Semantic Analysis
• Compilers perform many semantic checks besides variable bindings
• Example:
      Jack left her homework at home.
• A “type mismatch” between her and Jack; we know they are different people
  – Presumably Jack is male

Optimization
• No strong counterpart in English, but akin to editing
• Automatically modify programs so that they
  – Run faster
  – Use less memory
  – Or conserve some other resource
• Our project has no optimization component

Optimization Example
      X = Y * 0 is the same as X = 0
YES and NO!
Valid for integers, but not necessarily for floating point numbers

Code Generation
• Produces assembly code (usually)
• A translation into another language
  – Analogous to human translation

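To see why rewriting `X = Y * 0` as `X = 0` is safe for integers but not for floating point, here is a small Java sketch. The class and method names are hypothetical; the behavior shown follows from Java's integer arithmetic and IEEE 754 floating-point rules.

```java
// Hypothetical example class illustrating the slide's point.
class MulByZero {
    static int intTimesZero(int y) { return y * 0; }

    static double doubleTimesZero(double y) { return y * 0.0; }

    public static void main(String[] args) {
        // For every int value the product is 0, so the rewrite is valid.
        System.out.println(intTimesZero(42));                            // prints 0

        // For doubles, three inputs give a result that is not +0.0.
        System.out.println(doubleTimesZero(Double.NaN));                 // prints NaN
        System.out.println(doubleTimesZero(Double.POSITIVE_INFINITY));   // prints NaN
        System.out.println(doubleTimesZero(-1.0));                       // prints -0.0
    }
}
```

An optimizer may therefore fold the multiplication away only when it knows the operand type is an integer, or when it can prove the floating-point operand is finite and the sign of zero does not matter.
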
Intermediate Languages
• Many compilers perform translations between successive intermediate forms
  – All but first and last are intermediate languages internal to the compiler
  – Typically there is 1 IL
• IL’s generally ordered in descending level of abstraction
  – Highest is source
  – Lowest is assembly

Intermediate Languages (Cont.)
• IL’s are useful because lower levels expose features hidden by higher levels
  – registers
  – memory layout
  – etc.
• But lower levels obscure high-level meaning

Issues
• Compiling is almost this simple, but there are many pitfalls.
• Example: How are erroneous programs handled?
• Language design has big impact on compiler
  – Determines what is easy and hard to compile
  – Course theme: many trade-offs in language design

Compilers Today
• The overall structure of almost every compiler adheres to our outline
• The proportions have changed since FORTRAN
  – Early: lexing, parsing most complex, expensive
  – Today: optimization dominates all other phases, lexing and parsing are cheap

Trends in Compilation
• Compilation for speed is less interesting. But:
  – scientific programs
  – advanced processors (Digital Signal Processors, advanced speculative architectures)
  – implementation of modern languages (Java, C#)
• Ideas from compilation used for improving code reliability:
  – memory safety
  – detecting concurrency errors (data races)
  – ...

Programming Assignments
• the project
  – implements a compiler for Decaf, a subset of Java
  – has five parts:
  – PA1: interpreter of a subset of Decaf
  – PA2-5: the compiler of Decaf, in four pieces
    • PA2: lexical analysis (a.k.a. scanner, lexer)
    • PA3: syntactic analysis (a.k.a. parser)
    • PA4: semantic analyzer (a.k.a. type checker)
    • PA5: code generator

The Decaf compiler
      Decaf program (stream of characters)
        PA2: lexer
      stream of tokens
        PA3: parser
      Abstract Syntax Tree (AST)
        PA1: interpreter, then run!
        or PA4: checker
      AST with annotations (types, declarations)
        PA5: code gen
      MIPS code (maybe x86)
        MIPS simulator
      run!

How we will implement the scanner, parser
[Diagram labels: lexer description, lexer generator, Java lexer code, lexer, Decaf pgm.]
Lexer implementation options (same for the parser):
• old and tedious:
  • implement the lexer completely in Java
• the modern practice:
  • write a lexer description in a domain-specific language,
  • but you wouldn’t learn how lexer generators work
• cs164:
  • write our own lexer generator
  • simple, but good enough and fun