Compiler Construction Principles and Practice
Kenneth C. Louden, San Jose State University
1. INTRODUCTION
2. SCANNING
3. CONTEXT-FREE GRAMMARS AND PARSING
4. TOP-DOWN PARSING
5. BOTTOM-UP PARSING
6. SEMANTIC ANALYSIS
7. RUNTIME ENVIRONMENT
8. CODE GENERATION
Chapter 1
Introduction
Emphasis:
History of the compiler
Description of programs related to compilers
The compilation (translation) process
Major data structures of a compiler
Other related issues
Bootstrapping and porting

What and Why Compilers?
Compilers: computer programs that translate one language into another, from the source language (input) to the target language (output).
[Figure: Source program → compiler → Target program]
Source language: a high-level language such as C or C++
Target language: object code or machine code (machine instructions)

Purposes of learning compilers:
1. Basic knowledge (theoretical techniques, e.g. automata theory)
2. Tools and practical experience to design and program an actual compiler

Additional uses of compiling techniques: developing command interpreters and interface programs.

TINY: the language used for discussion in the text.
C-Minus: a small but sufficiently complex subset of C; it is more extensive than TINY and suitable for a class project.
[Figure: the phases of a compiler. scanner → tokens → parser → syntax tree → semantic analyzer → annotated tree → source code optimizer → intermediate code → code generator → target code → target code optimizer → target code. The literal table, symbol table, and error handler are shared by the phases.]
1. The scanner
Lexical analysis: reads a stream of characters and outputs tokens.
Example: a[index] = 4 + 2
Tokens: a, [, index, ], =, 4, +, 2
The tasks of the scanner: recognizing tokens, entering identifiers into the symbol table, and entering literals into the literal table.

2. The parser
Determines the structure of the program.
Input: the sequence of tokens
Output: a parse tree or a syntax tree
A syntax tree is a condensation of the information contained in the parse tree.
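As a minimal sketch of the scanner's first task, the C function below splits a[index] = 4 + 2 into the eight tokens listed above. The token names (TK_ID, TK_NUM, ...) and the function scan are hypothetical, and the sketch only recognizes the characters this example needs; a real scanner would also enter identifiers into the symbol table and literals into the literal table.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token kinds needed for the example statement (hypothetical names). */
typedef enum { TK_ID, TK_NUM, TK_LBRACKET, TK_RBRACKET, TK_ASSIGN, TK_PLUS, TK_ERROR } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];  /* the lexeme, e.g. "index" or "4" */
} Token;

/* Scan src into at most max tokens; returns the number of tokens produced. */
size_t scan(const char *src, Token *out, size_t max) {
    size_t n = 0;
    const char *p = src;
    while (*p != '\0' && n < max) {
        if (isspace((unsigned char)*p)) { p++; continue; }  /* skip white space */
        const char *start = p;
        TokenKind kind;
        if (isalpha((unsigned char)*p)) {                   /* identifier: letter (letter|digit)* */
            while (isalnum((unsigned char)*p)) p++;
            kind = TK_ID;
        } else if (isdigit((unsigned char)*p)) {            /* number: digit+ */
            while (isdigit((unsigned char)*p)) p++;
            kind = TK_NUM;
        } else {
            switch (*p++) {                                  /* single-character tokens */
            case '[': kind = TK_LBRACKET; break;
            case ']': kind = TK_RBRACKET; break;
            case '=': kind = TK_ASSIGN; break;
            case '+': kind = TK_PLUS; break;
            default:  kind = TK_ERROR; break;
            }
        }
        size_t len = (size_t)(p - start);
        assert(len < sizeof out[n].text);  /* lexemes in this sketch are short */
        memcpy(out[n].text, start, len);
        out[n].text[len] = '\0';
        out[n].kind = kind;
        n++;
    }
    return n;
}
```

Calling scan("a[index] = 4 + 2", tokens, 16) yields eight tokens; white space separates tokens but never becomes one.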
[Figure: parse tree for a[index] = 4 + 2, built from expression, assign-expression, subscript-expression, and additive-expression nodes, with leaves such as identifier a and number 2]
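A syntax tree for a[index] = 4 + 2 can be represented in C with one node per operator or operand, as in this sketch (the type and function names are hypothetical). Building the tree directly shows the condensation: the chain of intermediate expression nodes from the parse tree does not appear.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { N_ASSIGN, N_SUBSCRIPT, N_ADD, N_ID, N_NUM } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *name;           /* N_ID: identifier name */
    int value;                  /* N_NUM: constant value */
    struct Node *left, *right;  /* operands of interior nodes */
} Node;

Node *new_node(NodeKind kind, Node *left, Node *right) {
    Node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->left = left;
    n->right = right;
    return n;
}

Node *new_id(const char *name) {
    Node *n = new_node(N_ID, NULL, NULL);
    n->name = name;
    return n;
}

Node *new_num(int value) {
    Node *n = new_node(N_NUM, NULL, NULL);
    n->value = value;
    return n;
}

/* Syntax tree for a[index] = 4 + 2: only operators and operands remain. */
Node *build_example(void) {
    return new_node(N_ASSIGN,
                    new_node(N_SUBSCRIPT, new_id("a"), new_id("index")),
                    new_node(N_ADD, new_num(4), new_num(2)));
}

/* Render the tree in prefix form, e.g. (+ 4 2), into buf. */
void tree_to_string(const Node *n, char *buf, size_t size) {
    if (n->kind == N_ID)  { snprintf(buf, size, "%s", n->name); return; }
    if (n->kind == N_NUM) { snprintf(buf, size, "%d", n->value); return; }
    char l[128], r[128];
    tree_to_string(n->left, l, sizeof l);
    tree_to_string(n->right, r, sizeof r);
    const char *op = n->kind == N_ASSIGN ? "=" : n->kind == N_SUBSCRIPT ? "[]" : "+";
    snprintf(buf, size, "(%s %s %s)", op, l, r);
}

void free_tree(Node *n) {
    if (n == NULL) return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

Printing the example tree gives (= ([] a index) (+ 4 2)): seven nodes in total, one for each operator and operand.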
3. The semantic analyzer
Static semantics: properties that cannot be conveniently expressed as syntax and analyzed by the parser, but can be determined prior to execution, for example declarations and type checking (data types).
Dynamic semantics: properties that can be determined only by executing the program; they cannot be determined by a compiler.
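[Figure: annotated syntax tree for a[index] = 4 + 2: an assign-expression whose subscript-expression and additive-expression children are annotated integer, with leaves number 4 and number 2 also annotated integer]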
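The annotation step can be sketched as a post-order traversal that computes each node's type from its children's types. The declarations of a (an integer array) and index (an integer) are assumed for the example, and the names Type, check, and lookup are hypothetical; a real analyzer would consult the symbol table built from the program's declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { T_ERROR, T_INT, T_INT_ARRAY } Type;
typedef enum { N_ASSIGN, N_SUBSCRIPT, N_ADD, N_ID, N_NUM } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *name;           /* N_ID only */
    int value;                  /* N_NUM only */
    struct Node *left, *right;
    Type type;                  /* filled in by check(): the annotation */
} Node;

/* Assumed declarations for the example: a is an integer array, index an integer. */
static Type lookup(const char *name) {
    if (strcmp(name, "a") == 0) return T_INT_ARRAY;
    if (strcmp(name, "index") == 0) return T_INT;
    return T_ERROR;  /* undeclared identifier */
}

/* Annotate the children first, then compute this node's type. */
Type check(Node *n) {
    if (n == NULL) return T_ERROR;
    Type l = check(n->left), r = check(n->right);
    switch (n->kind) {
    case N_NUM:       n->type = T_INT; break;
    case N_ID:        n->type = lookup(n->name); break;
    case N_ADD:       n->type = (l == T_INT && r == T_INT) ? T_INT : T_ERROR; break;
    case N_SUBSCRIPT: n->type = (l == T_INT_ARRAY && r == T_INT) ? T_INT : T_ERROR; break;
    case N_ASSIGN:    n->type = (l == T_INT && r == T_INT) ? T_INT : T_ERROR; break;
    }
    return n->type;
}
```

With this rule set, a[index] is annotated integer, while a swapped subscript such as index[a] is annotated T_ERROR.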
4. The source code optimizer
Source-level optimization: for example, 4 + 2 can be precomputed by the compiler as 6 (constant folding).
Three-address code (intermediate code: any internal representation for the source code used by the compiler):
    t = 4 + 2
    a[index] = t
The optimizer improves this code in two steps:
1. t = 6
   a[index] = t
2. a[index] = 6
Intermediate code can take several forms: syntax trees, three-address code, four-address code, and so on.
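The two optimizer steps can be sketched on a tiny array of three-address instructions: constant folding turns t = 4 + 2 into t = 6, and constant propagation substitutes 6 for t and drops the now-dead copy. This sketch assumes straight-line code in which each temporary is assigned exactly once; the Operand/Instr layout and all function names are hypothetical, and production optimizers rely on data-flow analysis rather than this simple forward substitution.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* An operand is a constant, a compiler temporary, or a program location. */
typedef struct {
    bool is_const;
    bool is_temp;
    int value;      /* meaningful when is_const */
    char name[16];  /* e.g. "t" or "a[index]" when not constant */
} Operand;

/* dst = lhs op rhs, or the copy dst = lhs when op == 0. */
typedef struct {
    Operand dst, lhs, rhs;
    char op;        /* '+', '*', or 0 */
} Instr;

/* Step 1: constant folding, e.g. t = 4 + 2 becomes t = 6. */
void fold_constants(Instr *code, size_t n) {
    for (size_t i = 0; i < n; i++) {
        Instr *in = &code[i];
        if (in->op == 0 || !in->lhs.is_const || !in->rhs.is_const) continue;
        int v;
        if (in->op == '+') v = in->lhs.value + in->rhs.value;
        else if (in->op == '*') v = in->lhs.value * in->rhs.value;
        else continue;  /* operator not handled in this sketch */
        in->lhs = (Operand){ .is_const = true, .value = v };
        in->rhs = (Operand){ 0 };
        in->op = 0;     /* now a copy: dst = v */
    }
}

/* Step 2: replace later uses of a temporary holding a constant, then drop
   the copy. Returns the new instruction count. */
size_t propagate_constants(Instr *code, size_t n) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        Instr in = code[i];
        if (in.op == 0 && in.lhs.is_const && in.dst.is_temp) {
            for (size_t j = i + 1; j < n; j++) {
                Operand *uses[2] = { &code[j].lhs, &code[j].rhs };
                for (int k = 0; k < 2; k++)
                    if (!uses[k]->is_const && strcmp(uses[k]->name, in.dst.name) == 0)
                        *uses[k] = in.lhs;
            }
            continue;  /* the copy into the temporary is now dead */
        }
        code[out++] = in;
    }
    return out;
}

static void operand_str(const Operand *o, char *buf, size_t size) {
    if (o->is_const) snprintf(buf, size, "%d", o->value);
    else snprintf(buf, size, "%s", o->name);
}

/* Format one instruction, e.g. "t = 4 + 2". */
void instr_str(const Instr *in, char *buf, size_t size) {
    char d[32], l[32], r[32];
    operand_str(&in->dst, d, sizeof d);
    operand_str(&in->lhs, l, sizeof l);
    if (in->op == 0) { snprintf(buf, size, "%s = %s", d, l); return; }
    operand_str(&in->rhs, r, sizeof r);
    snprintf(buf, size, "%s = %s %c %s", d, l, in->op, r);
}
```

Running fold_constants and then propagate_constants on the two-instruction example reproduces the two steps: first t = 6 followed by a[index] = t, then the single instruction a[index] = 6.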
5. The code generator
Input: intermediate code (IR)
Output: machine code for the target machine

6. The target code optimizer
Improves the target code generated by the code generator. Tasks:
Choosing addressing modes to improve performance
Replacing slow instructions with faster ones
Eliminating redundant or unnecessary operations
Code generator output:
    MOV R0, index
    MUL R0, 2
    MOV R1, &a
    ADD R1, R0
    MOV *R1, 6
After target code optimization:
    MOV R0, index
    SHL R0
    MOV &a[R0], 6
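The two improvements in this example can be sketched as peephole rules over a small instruction array. The first rule replaces the slow MUL R, 2 with the cheaper SHL R. The second rule folds the address computation MOV R, &x / ADD R, Ri / MOV *R, c into the indexed addressing mode MOV &x[Ri], c. The instruction format follows the pseudo-assembly above; the rule functions are hypothetical, and the second rule assumes the scratch register R is not used afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A pseudo-assembly instruction: op dst, src (src empty for one operand). */
typedef struct {
    char op[8];
    char dst[16];
    char src[16];
} Instr;

/* Rule 1 (replace a slow instruction): MUL R, 2  ->  SHL R.
   Returns the number of instructions rewritten. */
int peephole_mul2(Instr *code, size_t n) {
    int changed = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(code[i].op, "MUL") == 0 && strcmp(code[i].src, "2") == 0) {
            strcpy(code[i].op, "SHL");
            code[i].src[0] = '\0';
            changed++;
        }
    }
    return changed;
}

/* Rule 2 (choose an addressing mode):
   MOV R, &x ; ADD R, Ri ; MOV *R, c  ->  MOV &x[Ri], c.
   Returns the new instruction count. */
size_t peephole_indexed(Instr *code, size_t n) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 2 < n &&
            strcmp(code[i].op, "MOV") == 0 && code[i].src[0] == '&' &&
            strcmp(code[i + 1].op, "ADD") == 0 &&
            strcmp(code[i + 1].dst, code[i].dst) == 0 &&
            strcmp(code[i + 2].op, "MOV") == 0 && code[i + 2].dst[0] == '*' &&
            strcmp(code[i + 2].dst + 1, code[i].dst) == 0) {
            Instr merged = { "MOV", "", "" };
            snprintf(merged.dst, sizeof merged.dst, "%s[%s]", code[i].src, code[i + 1].src);
            strcpy(merged.src, code[i + 2].src);
            code[out++] = merged;
            i += 2;  /* skip the two instructions that were merged */
            continue;
        }
        code[out++] = code[i];
    }
    return out;
}
```

Applied to the five generated instructions, the two rules leave the three optimized instructions shown above.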
Advantage: portability

3. Passes
A pass processes the entire source program once; a compiler may make several passes.
The initial pass constructs a syntax tree or intermediate code from the source.
A pass may consist of several phases. For example, a compiler with three passes:
scanning and parsing;
semantic analysis and source-level optimization;
code generation and target-level optimization.

4. Language definition and compilers
The relation between the language definition and the compiler: a language's semantics can be given a formal definition in mathematical terms; one common method is denotational semantics.
The structure and behavior of the runtime environment of the language affect compiler construction.

5. Compiler options and interfaces
Interfaces with the operating system
Options provided to the user for various purposes
Consider the following situations: (1) The existing compiler for B runs on the target machine. (2) The existing compiler for B runs on a machine different from the target machine. (3) How were the first compilers written, when no compilers existed yet? At first, compilers were written in machine language. Today, a compiler is usually written in another language. T-diagram:
[T-diagram: S → T, written in H]
A T-diagram represents a compiler written in language H that translates language S into language T. T-diagrams can be combined in two ways:
[T-diagrams: (A → B in H) + (B → C in H) = (A → C in H)]
On the same machine H, a compiler from A to C can be obtained by combining a compiler from A to B with a compiler from B to C.
[T-diagrams: (A → B in H) translated by (H → K running on M) = (A → B in K)]
A compiler from H to K can be used to translate the implementation language of another compiler (here, one from A to B) from H into K.
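The two ways of combining T-diagrams can be modeled by treating each diagram as a triple (source, target, implementation language). In this hypothetical sketch (TDiagram, chain, and translate are names invented here), the functions check only that the languages line up; the sketch does not model the machine M on which the translating compiler runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* A T-diagram: a compiler from `source` to `target`, written in `impl`. */
typedef struct {
    const char *source;
    const char *target;
    const char *impl;
} TDiagram;

/* Way 1: run A -> B and then B -> C on the same machine H, giving A -> C in H.
   Returns false when the two diagrams do not fit together. */
bool chain(TDiagram first, TDiagram second, TDiagram *result) {
    assert(result != NULL);
    if (strcmp(first.target, second.source) != 0 || strcmp(first.impl, second.impl) != 0)
        return false;
    *result = (TDiagram){ first.source, second.target, first.impl };
    return true;
}

/* Way 2: compile the implementation (written in H) of compiler c = A -> B
   with translator = H -> K; the result is A -> B written in K. */
bool translate(TDiagram c, TDiagram translator, TDiagram *result) {
    assert(result != NULL);
    if (strcmp(c.impl, translator.source) != 0)
        return false;
    *result = (TDiagram){ c.source, c.target, translator.target };
    return true;
}
```

For example, chaining {"A", "B", "H"} with {"B", "C", "H"} gives {"A", "C", "H"}, and translating {"A", "B", "H"} with {"H", "K", "M"} gives {"A", "B", "K"}. Chaining in the wrong order fails because the middle languages do not match.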
It is common to write a compiler in its own source language. The solution to the third situation mentioned above is bootstrapping:
[T-diagram: A → H, a compiler written in its own language A]
[T-diagrams: (A → H in A) compiled by (A → H in H) = (A → H in H)]
Solution to porting: to port the compiler from the old host H to a new host K, retarget the compiler source code to K, use the old compiler to produce a cross compiler, and then recompile the compiler with the cross compiler to generate the new one.
Step 1
[T-diagrams: compiler source code retargeted to K (A → K in A), compiled by the original compiler (A → H in H), yields a cross compiler (A → K in H)]
Step 2
[T-diagrams: compiler source code retargeted to K (A → K in A), compiled by the cross compiler (A → K in H), yields the retargeted compiler (A → K in K)]
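Bootstrapping and porting are both repeated applications of the "translate the implementation language" combination, which this hypothetical sketch models with T-diagram triples. Two assumptions are worth stating. First, the first-stage compiler for bootstrapping is taken to be a quick-and-dirty compiler already running on H. Second, the triple model cannot tell the inefficient first bootstrapped compiler from the final one, since both are A → H in H.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* A T-diagram: a compiler from `source` to `target`, written in `impl`. */
typedef struct {
    const char *source;
    const char *target;
    const char *impl;
} TDiagram;

/* Compile the implementation of c with `compiler`; the result is written in
   the compiler's target language. Requires c.impl == compiler.source. */
bool translate(TDiagram c, TDiagram compiler, TDiagram *result) {
    assert(result != NULL);
    if (strcmp(c.impl, compiler.source) != 0)
        return false;
    *result = (TDiagram){ c.source, c.target, compiler.target };
    return true;
}

/* Bootstrapping: compile the compiler's own source (A -> H written in A) with
   the first-stage compiler (A -> H in H), then recompile the same source with
   the compiler produced by that first run. */
bool bootstrap(TDiagram source, TDiagram first_stage, TDiagram *final_version) {
    TDiagram running;  /* runs on H, but its code may be inefficient */
    return translate(source, first_stage, &running)
        && translate(source, running, final_version);
}

/* Porting from H to K. Step 1: the original compiler (A -> H in H) compiles the
   retargeted source (A -> K in A) into a cross compiler (A -> K in H).
   Step 2: the cross compiler compiles the same source into A -> K in K. */
bool port(TDiagram retargeted, TDiagram original, TDiagram *cross, TDiagram *ported) {
    return translate(retargeted, original, cross)
        && translate(retargeted, *cross, ported);
}
```

Porting with the retargeted source {"A", "K", "A"} and the original compiler {"A", "H", "H"} produces the cross compiler {"A", "K", "H"} and then the retargeted compiler {"A", "K", "K"}, matching Steps 1 and 2.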
1. A program is a sequence of statements separated by semicolons.
2. There are no procedures and no declarations.
3. All variables are integer variables.
4. There are two control statements: if-else and repeat.
5. There are read and write statements.
6. Comments are enclosed in curly brackets and cannot be nested.
7. Expressions are Boolean expressions (using < and =) or integer arithmetic expressions (using +, -, *, /, parentheses, constants, and variables); Boolean expressions appear only as tests in control statements.
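A scanner for TINY needs a token type for each feature listed above. In this sketch (the enum, table, and function names are hypothetical), reserved words are recognized by first scanning an identifier-shaped lexeme and then looking it up in a table. Because comments cannot be nested, a comment simply ends at the first closing brace.

```c
#include <assert.h>
#include <string.h>

/* A possible token classification for TINY (hypothetical names). */
typedef enum {
    /* reserved words */
    IF, THEN, ELSE, END, REPEAT, UNTIL, READ, WRITE,
    /* multicharacter tokens */
    ID, NUM,
    /* special symbols: := = < + - * / ( ) ; */
    ASSIGN, EQ, LT, PLUS, MINUS, TIMES, OVER, LPAREN, RPAREN, SEMI
} TokenType;

static const struct { const char *word; TokenType type; } reserved[] = {
    {"if", IF}, {"then", THEN}, {"else", ELSE}, {"end", END},
    {"repeat", REPEAT}, {"until", UNTIL}, {"read", READ}, {"write", WRITE},
};

/* Classify an identifier-shaped lexeme: reserved word or ordinary ID. */
TokenType reserved_lookup(const char *lexeme) {
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(lexeme, reserved[i].word) == 0)
            return reserved[i].type;
    return ID;
}

/* Skip a comment that starts at p (which must point at '{'). Comments do not
   nest, so the first '}' ends it. Returns the position after the comment,
   or NULL if the comment is unterminated. */
const char *skip_comment(const char *p) {
    assert(*p == '{');
    const char *close = strchr(p, '}');
    return close ? close + 1 : NULL;
}
```

For instance, skipping "{ a { b } c }" stops after the first '}', leaving " c }" for the scanner, which is why nested comments do not work in TINY.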
One sample program in TINY (factorial function):
    read x; { input an integer }
    if 0 < x then { don't compute if x <= 0 }
      fact := 1;
      repeat
        fact := fact * x;
        x := x - 1
      until x = 0;
      write fact { output factorial of x }
    end
The TINY compiler
C files: globals.h, util.h, scan.h, parse.h, symtab.h, analyze.h, code.h, cgen.h; main.c, util.c, scan.c, parse.c, symtab.c, analyze.c, code.c, cgen.c
Four passes:
1. The scanner and the parser
2. Semantic analysis: constructing the symbol table
3. Semantic analysis: type checking
4. The code generator
main.c drives these passes. The central code is as follows:
    syntaxTree = parse();
    buildSymtab(syntaxTree);
    typeCheck(syntaxTree);
    codeGen(syntaxTree, codefile);

The TM machine
The target language is the assembly language of the TM machine, which has some of the properties of Reduced Instruction Set Computers (RISC):
1. All arithmetic and testing must take place in registers.
2. The addressing modes are extremely limited.
The TM simulator can directly execute the assembly files.
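To make the register-machine properties concrete, here is a toy simulator for a handful of instructions loosely modeled on TM. Arithmetic (ADD, MUL) works only on registers, memory is reached only through LD and ST with a single d(s) addressing mode, and register 7 serves as the program counter. This is a hypothetical subset for illustration: the encoding, the struct layout, and the run function are assumptions. The real TM instruction set is larger and also includes IN, OUT, SUB, DIV, LDA, and conditional jumps.

```c
#include <assert.h>

enum { NUM_REGS = 8, DMEM_SIZE = 64, PC_REG = 7 };

typedef enum { HALT, ADD, MUL, LDC, LD, ST } Opcode;

/* Register-only ops use r, s, t; register-memory ops use r, d(s). */
typedef struct { Opcode op; int r, s, t, d; } Instr;

typedef struct {
    int reg[NUM_REGS];
    int dmem[DMEM_SIZE];
} Machine;

/* Run prog until HALT; returns the number of instructions executed. */
int run(Machine *m, const Instr *prog, int len) {
    int steps = 0;
    m->reg[PC_REG] = 0;
    for (;;) {
        int pc = m->reg[PC_REG];
        assert(pc >= 0 && pc < len);
        const Instr *in = &prog[pc];
        m->reg[PC_REG] = pc + 1;  /* the PC is incremented before execution */
        steps++;
        switch (in->op) {
        case HALT: return steps;
        case ADD:  m->reg[in->r] = m->reg[in->s] + m->reg[in->t]; break;
        case MUL:  m->reg[in->r] = m->reg[in->s] * m->reg[in->t]; break;
        case LDC:  m->reg[in->r] = in->d; break;  /* load constant d */
        case LD: {                                /* reg[r] = dmem[d + reg[s]] */
            int addr = in->d + m->reg[in->s];
            assert(addr >= 0 && addr < DMEM_SIZE);
            m->reg[in->r] = m->dmem[addr];
            break;
        }
        case ST: {                                /* dmem[d + reg[s]] = reg[r] */
            int addr = in->d + m->reg[in->s];
            assert(addr >= 0 && addr < DMEM_SIZE);
            m->dmem[addr] = m->reg[in->r];
            break;
        }
        }
    }
}
```

As a usage example, a[index] = 4 + 2 with a stored at address 10 and index = 3 becomes: LDC 0,4; LDC 1,2; ADD 0,0,1; LDC 2,3; ST 0,10(2); HALT. Both constants must be loaded into registers before ADD can use them, and the result reaches memory only through ST, ending up in dmem[13].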