Chapter 1
Introduction to Compilers
Compilers and Interpreters
Compilation
Translation of a program written in a source language into a semantically equivalent program written in a target language
Oversimplified view:
Source Program -> Compiler -> Target Program (the compiler also emits error messages)
Input -> Target Program -> Output
Interpretation
Performing the operations implied by the source program
Oversimplified view:
Source Program + Input -> Interpreter -> Output (the interpreter also emits error messages)
Compiler: a program that translates an executable program in one language into an executable program in another language
Interpreter: a program that reads an executable program and produces the results of running that program
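The difference between the two definitions can be sketched on a toy expression language. Everything here is hypothetical and chosen only for illustration: the tuple-based source form, the `interpret`, `compile_expr`, and `run` functions, and the imaginary stack machine used as the target.

```python
# Hypothetical mini-language: an expression is a number,
# or a tuple (op, left, right) with op being "+" or "*".

def interpret(expr):
    """Interpreter: perform the operations implied by the source program."""
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    a, b = interpret(left), interpret(right)
    return a + b if op == "+" else a * b


def compile_expr(expr):
    """Compiler: translate the source program into an equivalent target
    program, here a list of instructions for an imaginary stack machine."""
    if isinstance(expr, (int, float)):
        return [("PUSH", expr)]
    op, left, right = expr
    code = compile_expr(left) + compile_expr(right)
    code.append(("ADD",) if op == "+" else ("MUL",))
    return code


def run(code):
    """Execute a target program on the imaginary stack machine."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()


source = ("+", 1, ("*", 2, 3))
target = compile_expr(source)
print(target)                       # the translated program
print(interpret(source), run(target))  # both routes give the same result
```

The interpreter produces the result directly, while the compiler produces another program that must still be run to obtain it.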
The Analysis-Synthesis Model of Compilation
There are two parts to compilation:
Analysis
Breaks up the source program into pieces and imposes a grammatical structure
Creates an intermediate representation of the source program
Determines the operations and records them in a tree structure, the syntax tree
Known as the front end of the compiler
Synthesis
Constructs the target program from the intermediate representation
Takes the tree structure and translates the operations into the target program
Known as the back end of the compiler
Other Tools that Use the Analysis-Synthesis Model
Editors (syntax highlighting)
Pretty printers (e.g. Doxygen)
Static checkers (e.g. Lint and Splint)
Interpreters
Text formatters (e.g. TeX and LaTeX)
Silicon compilers (e.g. VHDL)
Query interpreters/compilers (Databases)
A language-processing system
Skeletal Source Program -> Preprocessor -> Source Program -> Compiler -> Target Assembly Program -> Assembler -> Relocatable Object Code -> Linker -> Absolute Machine Code
The linker also takes libraries and relocatable object files as input.
Try for example:
gcc -v myprog.c
Analysis
In compiling, analysis has three phases:
Linear analysis: stream of characters read from left to right and grouped into tokens; known as lexical analysis or scanning
Hierarchical analysis: tokens grouped hierarchically with collective meaning; known as parsing or syntax analysis
Semantic analysis: check if the program components fit together meaningfully
Lexical analysis
Characters grouped into tokens.
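As a minimal sketch of this grouping, the scanner below splits a statement into (kind, text) pairs using regular expressions. The token names and patterns in `TOKEN_SPEC` are assumptions made for this example, not a fixed standard.

```python
import re

# Assumed token classes for a small assignment language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r":=|="),
    ("OP",     r"[+\-*/]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Read characters left to right and group them into tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {text[pos]!r} at position {pos}"
            )
        if match.lastgroup != "SKIP":  # whitespace separates tokens only
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("A = B + C;"))
```

For the input `A = B + C;` this yields six tokens: three identifiers, the assignment sign, the plus operator, and the semicolon.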
Syntax analysis (Parsing)
Grouping tokens into grammatical phrases
Character groups recorded in symbol table
Represented by a parse tree
Syntax analysis (contd)
Hierarchical structure usually expressed by recursive rules
Rules for definition of expression:
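One common textbook formulation of such rules, assumed here for illustration, is: any identifier or number is an expression, and if e1 and e2 are expressions, then so are e1 + e2, e1 * e2, and (e1). The sketch below turns these rules into a recursive-descent parser. The `expr`/`term`/`factor` layering, which gives `*` higher precedence than `+`, and the tuple-shaped tree are choices made for this example.

```python
import re


def parse(text):
    """Parse an expression and return a tree of nested (op, left, right) tuples."""
    # Simplified scanner for this sketch: characters that match no pattern
    # are silently skipped.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():  # expr -> term { "+" term }
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():  # term -> factor { "*" factor }
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():  # factor -> identifier | number | "(" expr ")"
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            advance()
            node = expr()  # the rule refers back to expr: recursion
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok in ("+", "*", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return advance()

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("a + b * (c + 2)"))
```

Each grammar rule becomes one function, and the recursion in the rules shows up as `factor` calling `expr` for a parenthesized subexpression.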
Semantic analysis
Checks the source program for semantic errors
Gathers type information for subsequent code generation (type checking)
Identifies operator and operands of expressions and statements
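A minimal sketch of type checking on an expression tree follows. It assumes a tiny type system with only `int` and `float`, a hypothetical `check` function, and an `int2fp` conversion node that is inserted when an integer operand meets a floating-point one.

```python
def check(node, types):
    """Return (annotated_node, type) for an expression tree.

    Leaves are identifier names (looked up in `types`) or numeric literals;
    inner nodes are (op, left, right) tuples.
    """
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, float):
        return node, "float"
    if isinstance(node, str):
        if node not in types:
            raise NameError(f"undeclared identifier {node!r}")
        return node, types[node]

    op, left, right = node
    left, left_type = check(left, types)
    right, right_type = check(right, types)
    if left_type != right_type:
        # Mixed int/float operands: widen the int side to float.
        if left_type == "int":
            left = ("int2fp", left)
        else:
            right = ("int2fp", right)
        left_type = "float"
    return (op, left, right), left_type


declared = {"B": "int", "C": "float"}  # assumed declarations
annotated, result_type = check(("+", "B", "C"), declared)
print(annotated, result_type)
```

The returned tree is the annotated form that later phases can use without repeating the type analysis.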
Phases of a compiler
Symbol-Table Management
Symbol table: data structure with a record for each identifier and its attributes
Attributes include storage allocation, type, scope, etc.
All the compiler phases insert and modify the symbol table
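One possible shape for such a table is sketched below: a stack of dictionaries, one per nested scope, holding a record per identifier. The `Symbol` and `SymbolTable` names, the byte sizes in `SIZES`, and the offset-based storage scheme are assumptions for this example.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    type: str
    scope_level: int  # 0 is the outermost scope
    offset: int       # storage allocation: byte offset within its scope


class SymbolTable:
    SIZES = {"int": 4, "float": 8}  # assumed storage sizes in bytes

    def __init__(self):
        self.scopes = [{}]       # innermost scope is last
        self.next_offset = [0]   # next free offset per scope

    def enter_scope(self):
        self.scopes.append({})
        self.next_offset.append(0)

    def exit_scope(self):
        self.scopes.pop()
        self.next_offset.pop()

    def insert(self, name, type_):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} already declared in this scope")
        symbol = Symbol(name, type_, len(self.scopes) - 1, self.next_offset[-1])
        self.next_offset[-1] += self.SIZES[type_]
        scope[name] = symbol
        return symbol

    def lookup(self, name):
        """Search from the innermost scope outward; None if undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.insert("A", "float")
table.insert("B", "int")
table.enter_scope()
table.insert("B", "float")     # shadows the outer B
inner_b = table.lookup("B")
table.exit_scope()
outer_b = table.lookup("B")
print(inner_b, outer_b)
```

Looking up `B` inside the nested scope finds the inner declaration, and after leaving that scope the outer record is visible again.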
Intermediate code generation
Program representation for an abstract machine
Should have two properties:
Easy to produce
Easy to translate into target program
Three-address code is a commonly used form similar to assembly language
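To illustrate, the sketch below walks an expression tree and emits three-address code as quadruples of the form (operator, argument 1, argument 2, result). The `gen_tac` function, the tuple tree shape, and the `t1`, `t2`, ... naming of temporaries are assumptions for this example.

```python
import itertools


def gen_tac(target, expr):
    """Emit quadruples that evaluate `expr` and assign the result to `target`.

    `expr` is an identifier name, ("int2fp", operand), or (op, left, right).
    """
    code = []
    temps = itertools.count(1)

    def emit(node):
        if isinstance(node, str):      # identifier: already an address
            return node
        if node[0] == "int2fp":        # unary conversion
            arg = emit(node[1])
            result = f"t{next(temps)}"
            code.append(("int2fp", arg, None, result))
            return result
        op, left, right = node         # binary operator
        a = emit(left)
        b = emit(right)
        result = f"t{next(temps)}"
        code.append((op, a, b, result))
        return result

    value = emit(expr)
    code.append((":=", value, None, target))
    return code


quads = gen_tac("A", ("+", ("int2fp", "B"), "C"))
for quad in quads:
    print(quad)
```

Each quadruple names at most three addresses, which is what makes this form close to assembly language while staying independent of a particular machine.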
Code optimization and generation
Code Optimization
Improve the intermediate code so that the resulting target code runs faster
Code Generation
Generate target code, which is machine code or assembly code
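As one small example of an optimization on intermediate code, the sketch below removes a redundant copy: when an operation stores into a temporary that is immediately copied to a variable and never used again, the operation can store into the variable directly. The quadruple layout (operator, argument 1, argument 2, result) and the convention that temporaries are named `t1`, `t2`, ... are assumptions for this example. Real optimizers apply many such transformations.

```python
def is_temp(name):
    """Assumed convention: compiler temporaries are named t1, t2, ..."""
    return name.startswith("t") and name[1:].isdigit()


def eliminate_copies(code):
    """Merge `op a b t` followed by `:= t _ x` into `op a b x`
    when temporary t is not read by any later quadruple."""
    out = []
    i = 0
    while i < len(code):
        quad = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        mergeable = (
            nxt is not None
            and quad[0] != ":="
            and nxt[0] == ":="
            and nxt[1] == quad[3]
            and is_temp(quad[3])
            and not any(quad[3] in (q[1], q[2]) for q in code[i + 2:])
        )
        if mergeable:
            out.append((quad[0], quad[1], quad[2], nxt[3]))
            i += 2  # the copy has been absorbed
        else:
            out.append(quad)
            i += 1
    return out


before = [
    ("int2fp", "B", None, "t1"),
    ("+", "t1", "C", "t2"),
    (":=", "t2", None, "A"),
]
after = eliminate_copies(before)
for quad in after:
    print(quad)
```

The three quadruples shrink to two, so the code generator has one fewer instruction to translate.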
The Phases of a Compiler
Phase: Programmer (source code producer)
Output: Source string
Sample: A=B+C;

Phase: Scanner (performs lexical analysis)
Output: Token string, and symbol table with names
Sample: A, =, B, +, C, ;

Phase: Parser (performs syntax analysis based on the grammar of the programming language)
Output: Parse tree or abstract syntax tree
Sample:
      ;
      |
      =
     / \
    A   +
       / \
      B   C

Phase: Semantic analyzer (type checking, etc.)
Output: Annotated parse tree or abstract syntax tree

Phase: Intermediate code generator
Output: Three-address code, quads, or RTL
Sample:
    int2fp  B         t1
    +       t1  C     t2
    :=      t2        A

Phase: Optimizer
Output: Three-address code, quads, or RTL
Sample:
    int2fp  B         t1
    +       t1  #2.3  A

Phase: Code generator
Output: Assembly code
Sample:
    MOVF #2.3,r1
    ADDF2 r1,r2
The Grouping of Phases
Compiler front and back ends:
Front end:
Analysis steps + Intermediate code generation
Depends primarily on the source language
Machine independent
Back end:
Code optimization and generation
Independent of source language
Machine dependent
Compiler passes:
A collection of phases is done only once (single pass) or multiple times (multi pass)
Single pass: reading input, processing, and producing output by one large compiler program; usually runs faster
Multi pass: compiler split into smaller programs, each making a pass over the source; performs better code optimization
Compiler-Construction Tools
Software development tools are available to implement one or more compiler phases
Scanner generators
Parser generators
Syntax-directed translation engines
Automatic code generators
Data-flow engines