What Is A Compiler?
- Is a program that translates one language to another
- Takes as input a source program, typically written in a high-level language
- Produces an equivalent target program, typically in assembly or machine language
- Reports error messages as part of the translation process
[Figure: source program -> Compiler -> target program, with Error messages as a second output of the Compiler]
- The first computers of the late 1940s were programmed in machine language
- Machine language was soon replaced by assembly language
  - Instructions and memory locations are given symbolic names
  - An assembler translates the symbolic assembly code into equivalent machine code
  - Assembly language improved programming, but is still machine dependent
Compiler Design Muhammed Mudawwar
Introduction to Compiling 1
Brief History
- The term compiler was coined in the early 1950s by Grace Murray Hopper
  - Translation was then viewed as the compilation of a sequence of routines selected from a library
- ... proved the viability of high-level, and thus less machine-dependent, languages
- The study of the scanning and parsing problems was pursued in the 1960s and ...
  - Became a standard part of compiler theory
  - Resulted in scanner and parser generators that automate part of compiler development
- ... optimization techniques, is still an area of ongoing research
- Compiler technology was also applied in rather unexpected areas:
  - Text-formatting languages
  - Hardware description languages for the automatic creation of VLSI circuits
[Figure: phases of a compiler. Components: Scanner, Parser, Semantic Analyzer, Literal Table, Symbol Table, Error Handler. Intermediate forms: Tokens, Syntax Tree, Annotated Tree.]
Scanner
- The scanner begins the analysis of the source program by:
  - Reading the file character by character
  - Grouping characters into tokens
  - Eliminating unneeded information (comments and white space)
  - Entering preliminary information into literal or symbol tables
  - Processing compiler directives by setting flags
- Tokens represent basic program entities such as:
  - Identifiers, literals, reserved words, operators, delimiters, etc.
- Example: a := x + y * 2.5 ; is scanned as

  a      identifier
  :=     assignment operator
  x      identifier
  +      plus operator
  y      identifier
  *      multiplication operator
  2.5    real literal
  ;      semicolon
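
The scanning steps above can be sketched in a few lines of Python. This is only an illustration, not the scanner used in the course: the token names, the regular expressions, and the function name `scan` are assumptions chosen to mirror the example, and comments, reserved words, and compiler directives are left out.

```python
import re

# Hypothetical token categories; the patterns are assumptions for this sketch.
TOKEN_SPEC = [
    ("real_literal", r"\d+\.\d+"),
    ("integer_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("assignment_operator", r":="),
    ("plus_operator", r"\+"),
    ("multiplication_operator", r"\*"),
    ("semicolon", r";"),
    ("whitespace", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source):
    """Group the characters of `source` into (category, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        # White space is recognized but eliminated, as described above.
        if match.lastgroup != "whitespace":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for category, text in scan("a := x + y * 2.5 ;"):
    print(f"{text:<4} {category}")
```

Listing the real-literal pattern before the integer pattern lets `2.5` be grouped as one token rather than as `2` followed by an error at the period.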
Parser
- Receives tokens from the scanner
- Recognizes the structure of the program as a parse tree
  - The parse tree is recognized according to a context-free grammar
  - Syntax errors are reported if the program is syntactically incorrect
- A parse tree is an inefficient representation of the structure of a program
- A syntax tree is a more condensed version of the parse tree
- A syntax tree is usually generated as output by the parser
Example: a := x + y * 2.5 ;

Parse Tree

  assign-stmt
    id a
    :=
    expr
      expr
        id x
      +
      expr
        expr
          id y
        *
        expr
          literal 2.5
    ;

Syntax Tree

  :=
    id a
    +
      id x
      *
        id y
        literal 2.5
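
As a rough illustration of how a parser can produce the syntax tree above directly from tokens, here is a small recursive-descent sketch. It rests on several assumptions that are not in the slides: tokens arrive as `(kind, text)` pairs with the kinds `id`, `literal`, `:=`, `+`, `*`, and `;`; the grammar is layered into `expr`, `term`, and `factor` so that `*` binds tighter than `+`; and the tree is stored as nested tuples. The function name `parse_assignment` is hypothetical.

```python
def parse_assignment(tokens):
    """Parse `id := expr ;` and return the syntax tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", "")

    def expect(kind):
        nonlocal pos
        found_kind, text = peek()
        if found_kind != kind:
            # Syntax errors are reported as soon as the structure breaks.
            raise SyntaxError(f"expected {kind}, found {found_kind} {text!r}")
        pos += 1
        return text

    def factor():
        kind, _ = peek()
        if kind in ("id", "literal"):
            return (kind, expect(kind))
        raise SyntaxError(f"expected id or literal, found {kind}")

    def term():
        node = factor()
        while peek()[0] == "*":
            expect("*")
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek()[0] == "+":
            expect("+")
            node = ("+", node, term())
        return node

    target = ("id", expect("id"))
    expect(":=")
    value = expr()
    expect(";")
    if pos != len(tokens):
        raise SyntaxError("unexpected input after ';'")
    return (":=", target, value)


tokens = [("id", "a"), (":=", ":="), ("id", "x"), ("+", "+"),
          ("id", "y"), ("*", "*"), ("literal", "2.5"), (";", ";")]
print(parse_assignment(tokens))
```

The result nests the multiplication under the addition, matching the shape of the syntax tree shown for a := x + y * 2.5 ;.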
Semantic Analyzer
- The semantics of a program are its meaning, as opposed to its syntax or structure
- The semantics consist of:
  - Runtime semantics: behavior of the program at runtime
  - Static semantics: checked by the compiler
    - Declarations of variables and constants before use
    - Calling functions that exist (predefined in a library or defined by the user)
    - Passing parameters properly
    - Type checking
- Static semantics are difficult to check by the parser
- The semantic analyzer does the following:
  - Checks the static semantics of the language
  - Annotates the syntax tree with type information

Annotated tree for a := x + y * 2.5 ;

  := (real)
    id a (real)
    + (real)
      id x (real)
      * (real)
        int2real
          id y (integer)
        literal 2.5 (real)
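
The two jobs of the semantic analyzer can be sketched as one tree walk. The sketch below assumes a syntax tree of nested tuples, a plain dictionary mapping declared names to the types `integer` or `real`, and a convention that the last element of every annotated node is its type; none of these representations come from the slides, and `annotate` and `widen` are hypothetical names.

```python
def widen(node):
    """Wrap an integer-typed node in an int2real conversion."""
    if node[-1] == "integer":
        return ("int2real", node, "real")
    return node


def annotate(node, symbols):
    """Check static semantics and return a copy of the tree annotated with types."""
    kind = node[0]
    if kind == "id":
        name = node[1]
        if name not in symbols:
            raise NameError(f"{name} is used before it is declared")
        return ("id", name, symbols[name])
    if kind == "literal":
        text = node[1]
        return ("literal", text, "real" if "." in text else "integer")
    if kind in ("+", "*"):
        left = annotate(node[1], symbols)
        right = annotate(node[2], symbols)
        if left[-1] != right[-1]:
            # Mixed integer/real operands: convert the integer side.
            left, right = widen(left), widen(right)
        return (kind, left, right, left[-1])
    if kind == ":=":
        target = annotate(node[1], symbols)
        value = annotate(node[2], symbols)
        if target[-1] == "real" and value[-1] == "integer":
            value = widen(value)
        elif target[-1] != value[-1]:
            raise TypeError(f"cannot assign {value[-1]} to {target[-1]}")
        return (":=", target, value, target[-1])
    raise ValueError(f"unknown node kind {kind!r}")


tree = (":=", ("id", "a"),
        ("+", ("id", "x"), ("*", ("id", "y"), ("literal", "2.5"))))
print(annotate(tree, {"a": "real", "x": "real", "y": "integer"}))
```

With `y` declared as an integer, the walk inserts `int2real` above `id y`, which is the conversion node that appears in the annotated tree.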
- Intermediate code should be easy to produce
- It should be easy to translate into the target program
- Forms include three-address code, P-code for an abstract machine, and tree or DAG representations
Annotated tree

  := (real)
    id a (real)
    + (real)
      id x (real)
      * (real)
        int2real
          id y (integer)
        literal 2.5 (real)

Three-address code

  temp1 := int2real(y)
  temp2 := temp1 real* 2.5
  temp3 := x real+ temp2
  a := temp3
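
One way to obtain three-address code like the listing above is a post-order walk of the annotated tree that creates a new temporary for every interior node. The sketch below assumes the annotated tree is stored as nested tuples whose last element is the node's type; that layout and the name `to_three_address` are illustrative assumptions, not part of the slides.

```python
def to_three_address(tree):
    """Translate an annotated assignment tree into three-address code lines."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def emit_expr(node):
        kind = node[0]
        if kind in ("id", "literal"):
            return node[1]                      # leaves need no temporary
        if kind == "int2real":
            operand = emit_expr(node[1])
            temp = new_temp()
            code.append(f"{temp} := int2real({operand})")
            return temp
        # Binary operator: operands first, then the typed operation.
        left = emit_expr(node[1])
        right = emit_expr(node[2])
        temp = new_temp()
        code.append(f"{temp} := {left} {node[-1]}{kind} {right}")
        return temp

    _, target, value, _ = tree
    result = emit_expr(value)
    code.append(f"{target[1]} := {result}")
    return code


annotated = (
    ":=",
    ("id", "a", "real"),
    ("+",
     ("id", "x", "real"),
     ("*",
      ("int2real", ("id", "y", "integer"), "real"),
      ("literal", "2.5", "real"),
      "real"),
     "real"),
    "real",
)
print("\n".join(to_three_address(annotated)))
```

Each emitted line has at most one operator and at most three addresses, which is what makes this form easy to translate into target instructions.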
Code Generator
- Generates code for the target machine, typically:
  - Assembly code, or
  - Relocatable machine code
- Properties of the target machine become a major factor
- Code generator selects appropriate machine instructions
- Allocates memory locations for variables
- Allocates registers for intermediate computations
Three-address code

  temp1 := int2real(y)
  temp2 := temp1 * 2.5
  temp3 := x + temp2
  a := temp3

Assembly code (hypothetical)

  ;; R1 <- y
  ;; F1 <- int2real(R1)
  ;; F2 <- F1 * 2.5
  ;; F3 <- x
  ;; F4 <- F3 + F2
  ;; a <- F4
Code Improvement
- Code improvement techniques can be applied to:
  - Intermediate code, independent of the target machine
  - Target code, dependent on the target machine
- Intermediate code improvements include:
  - Constant folding
  - Elimination of common sub-expressions
  - Identification and elimination of unreachable code (called dead code)
  - Improving loops
  - Improving function calls
- Target code improvements include:
  - Allocation and use of registers
  - Selection of better (faster) instructions and addressing modes
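
Constant folding, the first intermediate-code improvement listed, can be illustrated on an expression tree. This is a minimal sketch under assumed conventions: nodes are tuples of the form `("literal", number)`, `("id", name)`, or `(operator, left, right)`, only `+` and `*` are supported, and `fold_constants` is a hypothetical name.

```python
import operator

# Operators this sketch knows how to evaluate at compile time.
OPS = {"+": operator.add, "*": operator.mul}


def fold_constants(node):
    """Replace operator nodes whose operands are both literals by one literal."""
    if node[0] in ("id", "literal"):
        return node
    op, left, right = node
    left = fold_constants(left)
    right = fold_constants(right)
    if left[0] == "literal" and right[0] == "literal":
        return ("literal", OPS[op](left[1], right[1]))
    return (op, left, right)


# x + 2 * 2.5: the multiplication has only constant operands, so it is folded.
print(fold_constants(("+", ("id", "x"), ("*", ("literal", 2), ("literal", 2.5)))))
```

Folding children before their parent lets constants propagate upward, so a chain such as 1 + 2 * 3 collapses to a single literal in one pass.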
Interpreter
- Is a program that reads a source program and executes it
- Works by analyzing and executing the source program commands one at a time
- Does not translate the source program into object code
- Interpretation is sensible when:
  - The programmer is working in interactive mode and needs to view and update variables
  - Running speed is not important
  - Commands have simple formats, and thus can be quickly analyzed and executed
  - Modification or addition to user programs is required as execution proceeds
- Examples: Basic interpreter, Lisp interpreter, UNIX shell command interpreter, SQL interpreter
- Some languages are designed to be interpreted, others are designed to be compiled
- Execution speed degradation can vary from 10:1 to 100:1
- Substantial space overhead may be involved
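
The contrast with a compiler can be made concrete with a tiny interpreter that executes an assignment directly instead of emitting code for it. The sketch assumes statements arrive as nested tuples such as `(":=", ("id", "a"), expression)` and that variables live in a dictionary; these representations and the names `evaluate` and `execute` are assumptions for illustration only.

```python
def evaluate(node, env):
    """Compute the value of an expression tree using the variables in `env`."""
    kind = node[0]
    if kind == "literal":
        return float(node[1])
    if kind == "id":
        return env[node[1]]
    left = evaluate(node[1], env)
    right = evaluate(node[2], env)
    if kind == "+":
        return left + right
    if kind == "*":
        return left * right
    raise ValueError(f"unknown operator {kind!r}")


def execute(statement, env):
    """Execute one assignment command; no object code is produced."""
    _, target, value = statement
    env[target[1]] = evaluate(value, env)


env = {"x": 1.0, "y": 2}
execute((":=", ("id", "a"),
         ("+", ("id", "x"), ("*", ("id", "y"), ("literal", "2.5")))), env)
print(env["a"])
```

Because `env` stays available between commands, a user working interactively can inspect or change variables after each statement, which is the first situation listed above where interpretation is sensible.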
[Figure: C or C++ Program -> Preprocessor -> C or C++ Program with macro substitutions and file inclusions -> Compiler -> Assembly code -> Assembler -> Relocatable object module -> Linker -> Executable code]

- Assembler
  - Output is relocatable machine code
- Linkers
  - Link object files that were separately compiled or assembled
  - Link object files to standard library functions
  - Generate a file that can be loaded and executed
- Debuggers
- Editors
- Token
  - Represented by an integer value or an enumeration literal
  - Sometimes, it is necessary to preserve the string of characters that was scanned
    - For example, the name of an identifier or the value of a literal
- Syntax Tree
  - Constructed as a pointer-based structure
  - Dynamically allocated as parsing proceeds
  - Nodes have fields containing information collected by the parser and semantic analyzer
- Symbol Table
  - Keeps information associated with all kinds of identifiers:
    - Constants, variables, functions, parameters, types, fields, etc.
  - Identifiers are entered by the scanner, parser, or semantic analyzer
  - Semantic analyzer adds type information and other attributes
  - Code generation and optimization phases use the information in the symbol table
  - Insertion, deletion, and search operations need to be efficient because they are frequent
  - A hash table with constant-time operations is usually the preferred choice
  - More than one symbol table may be used
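
A hash-based symbol table along the lines described above might look as follows. This is a sketch under assumptions: Python's built-in `dict` stands in for the hash table, "more than one symbol table" is modeled as one table per scope linked to its enclosing scope, and the class name, method names, and attribute keys are all hypothetical.

```python
class SymbolTable:
    """One scope's identifiers, stored in a hash table (a Python dict)."""

    def __init__(self, parent=None):
        self.entries = {}
        self.parent = parent          # enclosing scope, if any

    def insert(self, name, **attributes):
        """Enter an identifier; later phases may add more attributes."""
        if name in self.entries:
            raise KeyError(f"{name} is already declared in this scope")
        self.entries[name] = dict(attributes)
        return self.entries[name]

    def lookup(self, name):
        """Search this scope, then the enclosing scopes; None if not found."""
        table = self
        while table is not None:
            if name in table.entries:
                return table.entries[name]
            table = table.parent
        return None

    def delete(self, name):
        """Remove an identifier from this scope only."""
        self.entries.pop(name, None)


outer = SymbolTable()
outer.insert("x", kind="variable", type="real")
inner = SymbolTable(parent=outer)
inner.insert("x", kind="parameter", type="integer")
print(inner.lookup("x")["type"], outer.lookup("x")["type"])
```

Insertion, deletion, and lookup within one scope are each a single dictionary operation, which is the constant-time behavior that makes a hash table the usual choice.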
  - ... the replication of constants and strings
  - Quick insertion and lookup are essential. Deletion is not necessary.
- Temporary Files
  - Used historically by old compilers due to memory constraints
  - Hold the data of various stages