
Compiler Construction Principles and Practice

This document summarizes the key stages and components of a compiler construction process. It discusses: 1) The major phases of a compiler including scanning, parsing, semantic analysis, code generation and optimization. 2) Important data structures used in compilers like the symbol table, syntax tree and intermediate code. 3) Historical developments in compilers from early assembly languages to modern optimizing compilers. 4) Other related tools like assemblers, linkers, loaders, debuggers and profilers. 5) Techniques for bootstrapping compilers and porting them to new machines or languages.

Uploaded by

Tarun Behera
Copyright
© Attribution Non-Commercial (BY-NC)

COMPILER CONSTRUCTION Principles and Practice

Kenneth C. Louden
San Jose State University

1. INTRODUCTION
2. SCANNING
3. CONTEXT-FREE GRAMMARS AND PARSING
4. TOP-DOWN PARSING
5. BOTTOM-UP PARSING
6. SEMANTIC ANALYSIS
7. RUNTIME ENVIRONMENT
8. CODE GENERATION

Main References: 1 Kenneth C. Louden 2 3

Chapter 1

Introduction

Emphasis:
  History of the compiler
  Description of programs related to compilers
  The compiling (translation) process
  Major data structures of a compiler
  Other related issues
  Bootstrapping and porting

What are compilers, and why study them?

Compilers are computer programs that translate one language into another: from a source language (input) to a target language (output).

  Source program -> compiler -> Target program

Source language: a high-level language such as C or C++.
Target language: object code or machine code (machine instructions).

Purposes of learning about compilers:
1. Basic knowledge (theoretical techniques such as automata theory).
2. Tools and practical experience for designing and programming an actual compiler.

Additional uses of compiling techniques: developing command interpreters and interface programs.

Languages discussed in the text:
TINY: the sample language used as a running example.
C-Minus: a small but sufficiently complex subset of C. It is more extensive than TINY and suitable for a class project.

1.1 A brief history of compilers


1. In the late 1940s, the stored-program computer was pioneered by John von Neumann. Programs were written in machine language. For example, on the Intel 8x86 processors used in IBM PCs,
     C7 06 0000 0002
   means: move the number 2 to location 0000.

2. Assembly language: numeric codes were replaced by symbolic forms, for example
     MOV X, 2
   An assembler translates the symbolic codes and memory locations of assembly language into the corresponding numeric codes.
   Defects of assembly language: it is difficult to read, write and understand, and it depends on the particular machine.

3. FORTRAN and its compiler were developed between 1954 and 1957 by a team at IBM led by John Backus. This was the first compiler.

4. Noam Chomsky studied the structure of natural language and classified languages according to the complexity of their grammars and the power of the algorithms needed to recognize them. There are four levels of grammars:
     Type 0: unrestricted grammars (recognized by Turing machines)
     Type 1: context-sensitive grammars
     Type 2: context-free grammars, the most useful for programming languages
     Type 3: right-linear (regular) grammars, equivalent to regular expressions and finite automata

5. Parsing problems were studied in the 1960s and 1970s.
   Code improvement techniques (optimization techniques) improve the efficiency of the code that a compiler generates.
   Compiler-compilers (parser generators) automate only one part of the compilation process. Yacc was written in 1975 by Steve Johnson for the UNIX system. Lex was written in 1975 by Mike Lesk.

6. Recent advances in compiler design:
   The application of more sophisticated algorithms for inferring and/or simplifying the information contained in a program, together with the development of more sophisticated programming languages that allow this kind of analysis.
   The development of standard windowing environments, in which the compiler is part of an interactive development environment (IDE).

1.2 Programs related to compilers


1. Interpreters
   Another kind of language translator: an interpreter executes the source program immediately. Whether an interpreter or a compiler is used depends on the language in use and the situation. BASIC, LISP and similar languages are often interpreted; compilers are preferred when speed of execution matters.

2. Assemblers
   Translators that turn assembly language into object code.

3. Linkers
   Collect code that was separately compiled or assembled in different object files into a single file. They also connect the code for standard library functions and the resources supplied by the operating system of the computer.

4. Loaders
   Relocatable code is code whose addresses are not completely fixed. A loader resolves all relocatable addresses relative to the starting address.

5. Preprocessors
   Delete comments, include other files and perform macro substitutions.

6. Editors
   Produce a standard file that the compiler accepts as source; some are structure-based editors.

7. Debuggers
   Determine execution errors in a compiled program.

8. Profilers
   Collect statistics on the behavior of an object program during execution, such as the number of times each procedure is called and the percentage of execution time spent in each procedure.

9. Project managers
   Coordinate the files being worked on by different people. sccs (source code control system) and rcs (revision control system) are project manager programs on Unix systems.

1.3 The translation process


The phases of a compiler:

  Source code
    -> scanner -> tokens
    -> parser -> syntax tree
    -> semantic analyzer -> annotated tree
    -> source code optimizer -> intermediate code
    -> code generator -> target code
    -> target code optimizer -> target code

Auxiliary components that interact with the phases: literal table, symbol table, error handler.

1. The scanner
   Lexical analysis: the input is a stream of characters and the output is a sequence of tokens. For example,
     a[index] = 4 + 2
   yields the tokens: a, [, index, ], =, 4, +, 2
   The tasks of the scanner: recognize tokens, and enter identifiers into the symbol table or literals into the literal table.

2. The parser
   Determines the structure of the program.
   Input: the tokens produced by the scanner.
   Output: a parse tree or a syntax tree. A syntax tree is a condensation of the information contained in the parse tree.
Parse tree for a[index] = 4 + 2:

  expression
    assign-expression
      expression
        subscript-expression
          expression
            identifier a
          [
          expression
            identifier index
          ]
      =
      expression
        additive-expression
          expression
            number 4
          +
          expression
            number 2
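The scanner's task from item 1 above can be sketched in C as follows. This is only an illustration for the example statement a[index] = 4 + 2: the names Token, TokenKind, next_token and scan_all are made up here and are not taken from the TINY scanner, and the symbol table and literal table are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token kinds for the small example statement (hypothetical names). */
typedef enum { TOK_ID, TOK_NUM, TOK_LBRACKET, TOK_RBRACKET,
               TOK_ASSIGN, TOK_PLUS, TOK_ERROR, TOK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];   /* the lexeme, truncated if longer than 31 characters */
} Token;

/* Read one token starting at *src and advance *src past it. */
Token next_token(const char **src) {
    Token tok = { TOK_EOF, "" };
    const char *p = *src;
    size_t n = 0;

    while (isspace((unsigned char)*p)) p++;      /* skip white space */
    if (*p == '\0') { *src = p; return tok; }    /* end of input */

    if (isalpha((unsigned char)*p)) {            /* identifier: letter (letter|digit)* */
        tok.kind = TOK_ID;
        while (isalnum((unsigned char)*p)) {
            if (n < sizeof tok.text - 1) tok.text[n++] = *p;
            p++;
        }
    } else if (isdigit((unsigned char)*p)) {     /* number: digit+ */
        tok.kind = TOK_NUM;
        while (isdigit((unsigned char)*p)) {
            if (n < sizeof tok.text - 1) tok.text[n++] = *p;
            p++;
        }
    } else {                                     /* single-character tokens */
        switch (*p) {
        case '[': tok.kind = TOK_LBRACKET; break;
        case ']': tok.kind = TOK_RBRACKET; break;
        case '=': tok.kind = TOK_ASSIGN;   break;
        case '+': tok.kind = TOK_PLUS;     break;
        default:  tok.kind = TOK_ERROR;    break;
        }
        tok.text[n++] = *p++;
    }
    tok.text[n] = '\0';
    *src = p;
    return tok;
}

/* Scan the whole input into out[]; returns the number of tokens before EOF. */
size_t scan_all(const char *src, Token *out, size_t max) {
    size_t count = 0;
    for (;;) {
        Token t = next_token(&src);
        if (t.kind == TOK_EOF) break;
        assert(count < max);                     /* caller must supply enough room */
        out[count++] = t;
    }
    return count;
}
```

Calling scan_all on the string "a[index] = 4 + 2" should produce the eight tokens listed in item 1. A real scanner would normally be driven by the parser one token at a time rather than scanning everything up front.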

3. The semantic analyzer
   Static semantics: properties of a program that cannot be conveniently expressed as syntax and analyzed by the parser, but that can be determined prior to execution. Examples: declarations and type checking (data types).
   Dynamic semantics: properties that can be determined only by executing the program, and so cannot be determined by a compiler.
Annotated syntax tree for a[index] = 4 + 2:

  assign-expression
    subscript-expression : integer
      identifier a : array of integer
      identifier index : integer
    additive-expression : integer
      number 4 : integer
      number 2 : integer
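The type annotations in the tree above can be computed by a bottom-up walk over the syntax tree. The following C sketch shows one possible way to do that. The names Expr, Type and check are hypothetical, the declared type of each identifier is stored directly in its node instead of being looked up in a symbol table, and giving the assignment node the type integer is an assumption made here (the tree above leaves that node unannotated).

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TY_INT, TY_INT_ARRAY, TY_ERROR } Type;
typedef enum { EX_NUM, EX_ID, EX_SUBSCRIPT, EX_ADD, EX_ASSIGN } ExprKind;

typedef struct Expr {
    ExprKind kind;
    Type declared;              /* for EX_ID: type from the declaration (assumed known) */
    Type type;                  /* annotation filled in by check() */
    struct Expr *left, *right;  /* children; NULL for leaves */
} Expr;

/* Annotate e and all of its descendants with types; returns the type of e. */
Type check(Expr *e) {
    Type l, r;
    assert(e != NULL);
    switch (e->kind) {
    case EX_NUM:                /* a number is always an integer */
        e->type = TY_INT;
        break;
    case EX_ID:                 /* an identifier has its declared type */
        e->type = e->declared;
        break;
    case EX_SUBSCRIPT:          /* array[integer] yields an integer */
        l = check(e->left);
        r = check(e->right);
        e->type = (l == TY_INT_ARRAY && r == TY_INT) ? TY_INT : TY_ERROR;
        break;
    case EX_ADD:                /* both operands must be integers */
    case EX_ASSIGN:             /* both sides must be integers (assumption) */
        l = check(e->left);
        r = check(e->right);
        e->type = (l == TY_INT && r == TY_INT) ? TY_INT : TY_ERROR;
        break;
    }
    return e->type;
}
```

Building the tree for a[index] = 4 + 2 and calling check on its root should annotate the subscript and additive nodes with integer, as in the tree above. Swapping the array and the index makes the subscript node an error.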

4. The source code optimizer
   Source-level optimization: 4 + 2 -> 6 (constant folding).
   Intermediate code: any internal representation of the source code used by the compiler (syntax tree, three-address code, four-address code and so on).
   Three-address code for the example:
     t = 4 + 2
     a[index] = t
   A two-phase optimizer:
     1. Fold the constant:
          t = 6
          a[index] = t
     2. Replace t by its value:
          a[index] = 6
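Constant folding, as in 4 + 2 -> 6, can be sketched as a bottom-up rewrite of an expression tree. The C code below is a minimal illustration with hypothetical names (Node, fold, num, var, add, mul). It handles only addition and multiplication and ignores integer overflow.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { NODE_NUM, NODE_VAR, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                  /* used when kind == NODE_NUM */
    char name;                  /* used when kind == NODE_VAR */
    struct Node *left, *right;  /* used by NODE_ADD and NODE_MUL */
} Node;

static Node *make_node(NodeKind kind, int value, char name, Node *l, Node *r) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind; n->value = value; n->name = name;
    n->left = l; n->right = r;
    return n;
}

Node *num(int v)           { return make_node(NODE_NUM, v, 0, NULL, NULL); }
Node *var(char c)          { return make_node(NODE_VAR, 0, c, NULL, NULL); }
Node *add(Node *l, Node *r) { return make_node(NODE_ADD, 0, 0, l, r); }
Node *mul(Node *l, Node *r) { return make_node(NODE_MUL, 0, 0, l, r); }

/* Fold constant subexpressions in place, children first; returns n. */
Node *fold(Node *n) {
    if (n == NULL || n->kind == NODE_NUM || n->kind == NODE_VAR)
        return n;                               /* leaves are already folded */
    n->left = fold(n->left);
    n->right = fold(n->right);
    if (n->left->kind == NODE_NUM && n->right->kind == NODE_NUM) {
        int v = (n->kind == NODE_ADD) ? n->left->value + n->right->value
                                      : n->left->value * n->right->value;
        free(n->left);
        free(n->right);
        n->kind = NODE_NUM;                     /* the operator node becomes a constant */
        n->value = v;
        n->left = n->right = NULL;
    }
    return n;
}
```

For example, fold(add(num(4), num(2))) turns the tree into a single number node with value 6, while in x + 3 * 5 only the right operand is folded.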

5. The code generator
   Input: intermediate code (IR).
   Output: machine code, that is, code for the target machine.

6. The target code optimizer
   Improves the target code generated by the code generator. Tasks:
     choosing addressing modes to improve performance;
     replacing slow instructions by faster ones;
     eliminating redundant or unnecessary operations.

   Target code before optimization:
     MOV R0, index
     MUL R0, 2
     MOV R1, &a
     ADD R1, R0
     MOV *R1, 6

   Target code after optimization:
     MOV R0, index
     SHL R0
     MOV &a[R0], 6
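Replacing the multiplication by 2 with a shift, as in the target code example above, is an instance of replacing slow instructions by faster ones. The C sketch below shows that single rewrite on a made-up instruction record. The Instr layout and the name reduce_strength are assumptions for illustration only, and the shift count is stored explicitly here, whereas the SHL in the example shifts by one implicitly.

```c
#include <assert.h>

typedef enum { OP_MOV, OP_MUL, OP_SHL, OP_ADD } Opcode;

/* Hypothetical instruction format: "op Rreg, imm". */
typedef struct {
    Opcode op;
    int reg;
    int imm;
} Instr;

/* Rewrite every multiplication by a power of two into a left shift.
   Returns the number of instructions that were changed. */
int reduce_strength(Instr *code, int n) {
    int changed = 0;
    assert(code != 0 && n >= 0);
    for (int i = 0; i < n; i++) {
        int k = code[i].imm;
        /* k is a power of two iff it is positive and has one bit set */
        if (code[i].op == OP_MUL && k > 0 && (k & (k - 1)) == 0) {
            int shift = 0;
            while ((1 << shift) < k) shift++;   /* shift = log2(k) */
            code[i].op = OP_SHL;
            code[i].imm = shift;
            changed++;
        }
    }
    return changed;
}
```

Applied to MUL R0, 2 this yields a shift of R0 by one position. A multiplication by 6 is left unchanged because 6 is not a power of two.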

1.4 Major data structures in a compiler


1. Tokens: a token is a value of an enumerated data type that represents the set of tokens.

2. The syntax tree: each node is a record whose fields represent the information collected by the parser and the semantic analyzer.

3. The symbol table: holds the information associated with identifiers: functions, variables, constants and data types. The symbol table interacts with the scanner, the parser and the semantic analyzer, and with the optimization and code generation phases. Operations: insertion, deletion and access.

4. The literal table: stores constants and strings. It needs quick insertion and lookup, but need not allow deletions.

5. Intermediate code: may be kept as an array of text strings, a temporary text file or a linked list of structures.

6. Temporary files: hold the products of intermediate steps. For example, addresses must be backpatched during code generation, because the location of a jump target is not yet known when the jump is generated. For
     if x = 0 then ... else ...
   the code is
     CMP x, 0
     JNE NEXT
     ...
     NEXT: ...
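The symbol table in item 3 needs fast insertion and access. One common choice, assumed here and not prescribed by the text, is a hash table with chained buckets. The following C sketch uses hypothetical names (SymTab, Entry, st_insert, st_lookup) and stores a single made-up attribute, a memory location, per identifier. Deletion and scoping are omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211   /* number of buckets; an arbitrary prime */

typedef struct Entry {
    char *name;          /* identifier (owned copy) */
    int location;        /* hypothetical attribute: memory location */
    struct Entry *next;  /* next entry in the same bucket */
} Entry;

typedef struct {
    Entry *buckets[TABLE_SIZE];
} SymTab;

/* Simple multiplicative string hash reduced to a bucket index. */
static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31u + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Access: returns the entry for name, or NULL if it is not in the table. */
Entry *st_lookup(SymTab *t, const char *name) {
    for (Entry *e = t->buckets[hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0) return e;
    return NULL;
}

/* Insertion: adds name unless it is already present; returns its entry. */
Entry *st_insert(SymTab *t, const char *name, int location) {
    Entry *e = st_lookup(t, name);
    if (e != NULL) return e;                 /* keep the first definition */
    unsigned h = hash(name);
    e = malloc(sizeof *e);
    assert(e != NULL);
    e->name = malloc(strlen(name) + 1);
    assert(e->name != NULL);
    strcpy(e->name, name);
    e->location = location;
    e->next = t->buckets[h];                 /* push onto the bucket's chain */
    t->buckets[h] = e;
    return e;
}
```

A table declared as SymTab t = { { NULL } }; starts out empty. A literal table could use the same structure without any delete operation, since the text notes that it need not allow deletions.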

1.5 Other issues in compiler structure


The structure of a compiler can be viewed from different angles:

1. Analysis and synthesis
   Analysis: lexical analysis, syntax analysis, semantic analysis (and optimization).
   Synthesis: code generation (and optimization).

2. Front end and back end
   The phases are separated according to whether they depend on the source language or on the target language.
   The front end: the scanner, parser, semantic analyzer and intermediate code synthesis.
   The back end: the code generator and some optimization.

     Source code -> Front end -> Intermediate code -> Back end -> Target code

   Advantage: portability.

3. Passes
   A compiler often processes the entire source program several times; each such repetition is a pass.
   The initial pass constructs a syntax tree or intermediate code from the source.
   A pass may consist of several phases.
   Example: one compiler with three passes: (1) scanning and parsing; (2) semantic analysis and source-level optimization; (3) code generation and target-level optimization.

4. Language definition and compilers
   The language definition and the compiler are closely related. The semantics of a language can be given a formal definition in mathematical terms; one common method is denotational semantics.
   The structure and behavior of the runtime environment of the language also affect compiler construction.

5. Compiler options and interfaces
   A compiler interfaces with the operating system and provides options to the user for various purposes.


1.6 Bootstrapping and porting


Host language: the language in which the compiler itself is written.
  [Compiler for language A, written in language B] -> [existing compiler for language B] -> [running compiler for language A]

Consider the following situations:
(1) The existing compiler for B runs on the target machine.
(2) The existing compiler for B runs on a machine different from the target machine.
(3) How were the first compilers written, when no compilers existed yet?

At first, compilers were written in machine language. Today, a compiler is written in another language.

T-diagram:

  (S -> T, written in H)

A T-diagram stands for a compiler written in language H that translates language S into language T. H is typically expected to be the same as T.

T-diagrams can be combined in two ways:
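1. (A -> B, written in H) followed by (B -> C, written in H) gives (A -> C, written in H).
   On the same machine H, a compiler from A to C can be obtained by combining the compiler from A to B with the compiler from B to C.

2. (A -> B, written in H) compiled by (H -> K, written in M) gives (A -> B, written in K).
   A compiler from H to K can be used to translate the implementation language of another compiler from H to K.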
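The two ways of combining T-diagrams can be stated as small functions over (source, target, host) triples. The C sketch below is only an illustration with hypothetical names (TDiagram, chain, recompile). Languages and machines are represented by single letters, and each function returns 0 when the diagrams do not fit together.

```c
#include <assert.h>

/* A compiler from `source` to `target`, written in (or running on) `host`. */
typedef struct {
    char source, target, host;
} TDiagram;

/* First combination: A -> B on H followed by B -> C on H gives A -> C on H. */
int chain(TDiagram first, TDiagram second, TDiagram *out) {
    assert(out != 0);
    if (first.host != second.host || first.target != second.source)
        return 0;                       /* the diagrams do not fit */
    out->source = first.source;
    out->target = second.target;
    out->host = first.host;
    return 1;
}

/* Second combination: a compiler written in H, run through a translator
   from H to K, gives the same compiler written in K. */
int recompile(TDiagram compiler, TDiagram translator, TDiagram *out) {
    assert(out != 0);
    if (compiler.host != translator.source)
        return 0;                       /* translator cannot read the compiler */
    out->source = compiler.source;      /* what the compiler translates is unchanged */
    out->target = compiler.target;
    out->host = translator.target;      /* only the implementation language changes */
    return 1;
}
```

With this encoding, recompile applied to a compiler for A written in B and an existing compiler for B that runs on machine K yields a compiler for A that runs on K. Its target is whatever the original source said, which may differ from K.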


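The solution to the first situation mentioned above: compile the compiler for A with the existing compiler for B, which runs on the target machine H.

  (A -> H, written in B) compiled by (B -> H, written in H) gives (A -> H, written in H)

The solution to the second situation mentioned above: the existing compiler for B runs on a different machine K, so the result is a cross compiler that runs on K and generates code for H.

  (A -> H, written in B) compiled by (B -> K, written in K) gives (A -> H, written in K)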

The apparent blunder of circularity:

  (S -> T, written in S)

It is common to write a compiler in its own source language. The solution to the third situation mentioned above is bootstrapping.

First step:
  (A -> H, written in A)   compiler written in its own language A
  compiled by
  (A -> H, written in H)   compiler in machine language
  gives
  (A -> H, written in H)   running but inefficient compiler

Second step:
  (A -> H, written in A)   compiler written in its own language A
  compiled by
  (A -> H, written in H)   running but inefficient compiler
  gives
  (A -> H, written in H)   final version of the compiler

Solution to porting: to port the compiler from the old host H to the new host K, use the old compiler to produce a cross compiler, then recompile the compiler with the cross compiler to generate the new one.

Step 1
  (A -> K, written in A)   compiler source code retargeted to K
  compiled by
  (A -> H, written in H)   original compiler
  gives
  (A -> K, written in H)   cross compiler

Step 2
  (A -> K, written in A)   compiler source code retargeted to K
  compiled by
  (A -> K, written in H)   cross compiler
  gives
  (A -> K, written in K)   retargeted compiler


1.7 The TINY sample language and compiler


Language TINY: used as a running example (as the source language).
Target language: the assembly language of the TM machine.

The TINY language

The features of a program in TINY:

1. A program is a sequence of statements separated by semicolons.
2. There are no procedures and no declarations.
3. All variables are integer variables.
4. There are two control statements: if-else and repeat.
5. There are read and write statements.
6. Comments are enclosed in curly brackets and cannot be nested.
7. Expressions are Boolean and integer arithmetic expressions (using <, = and +, -, *, /, parentheses, constants and variables). Boolean expressions appear only as tests in control statements.

One sample program in TINY (the factorial function):

  read x; { input an integer }
  if 0 < x then { don't compute if x <= 0 }
    fact := 1;
    repeat
      fact := fact * x;
      x := x - 1
    until x = 0;
    write fact { output factorial of x }
  end
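To make the meaning of the sample program concrete, here is a rough C equivalent. It is a reading aid only, not output of the TINY compiler: the function name tiny_factorial and its interface are invented, read is modelled by the parameter x and write by storing into *out. Note that repeat ... until runs its body at least once and stops when the test becomes true, which corresponds to a do-while loop with the negated test.

```c
#include <assert.h>

/* Returns 1 and stores the factorial in *out if the TINY program would
   reach its write statement; returns 0 (nothing written) when x <= 0. */
int tiny_factorial(int x, int *out) {
    assert(out != 0);
    if (0 < x) {                 /* if 0 < x then */
        int fact = 1;            /* fact := 1 */
        do {                     /* repeat */
            fact = fact * x;     /*   fact := fact * x */
            x = x - 1;           /*   x := x - 1 */
        } while (!(x == 0));     /* until x = 0 */
        *out = fact;             /* write fact */
        return 1;
    }
    return 0;                    /* test failed: the program writes nothing */
}
```

For an input of 5 the loop body runs five times and the value written is 120. For an input of 0 or less the body of the if statement is skipped entirely.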


The TINY compiler

C files:
  globals.h, util.h, scan.h, parse.h, symtab.h, analyze.h, code.h, cgen.h
  main.c, util.c, scan.c, parse.c, symtab.c, analyze.c, code.c, cgen.c

Four passes:
1. The scanner and the parser
2. Semantic analysis: constructing the symbol table
3. Semantic analysis: type checking
4. The code generator

main.c drives these passes. The central code is as follows:

  syntaxTree = parse();
  buildSymtab(syntaxTree);
  typeCheck(syntaxTree);
  codeGen(syntaxTree, codefile);

The TM Machine

The target language is the assembly language of the TM machine. The TM machine has some of the properties of Reduced Instruction Set Computers (RISC):
1. All arithmetic and testing must take place in registers.
2. The addressing modes are extremely limited.

The simulator of the TM machine can directly execute the assembly files.
