
WCU

College of Engineering and Technology


Dep’t : Computer Science

Course: Compiler Design

Course Outline
Instructor: Agegnehu Ashenafi (MSc.)
Course Objective:
 To learn basic techniques used in compiler
construction such as lexical analysis, top-down
and bottom-up parsing, and intermediate code
generation.
 To learn basic data structures used in compiler
construction such as abstract syntax trees,
symbol tables, and three-address code.
 To learn software tools used in compiler
construction such as lexical analyzer generators
and parser generators.
Course Content:
Chapter 1: Introduction
– 1.0. Compilers
– 1.1. Language Processing System
– 1.2. Analysis of the Source Program
– 1.3. Phases of a Compiler
– 1.4. Compiler Construction Tools
Chapter 2: Lexical analysis
– The role of the lexical analyzer
– Token Specification
– Recognition of Tokens
Chapter 3: Syntax analysis
– Role of a parser
– Syntax error handling
– Top down parsing
– Bottom up parsing
Course Content … cont
Chapter 4: Intermediate code generation
– Intermediate languages
– Declarations
– Assignment statements
Chapter 5: Code optimization and Code Generation
– Issues in design of a code generator
– Simple Code generator
– Introduction to code optimization
– Optimization of basic block

Text Books:
Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman, "Compilers: Principles,
Techniques, and Tools", Addison-Wesley, 1988.
Chapter 1: Introduction to Compilers
What is Compiler Design?
 A Compiler is computer software that transforms source
program code which is written in a high-level language into
low-level machine code.
 Compiler design is the process of developing a program or
software that converts human-written code into machine
code. It involves many stages like lexical analysis, parsing,
semantic analysis, code generation, optimization, etc.
 Compiler Design is the structure and set of principles that
guide the translation, analysis, and optimization process of a
compiler.
 More generally, design is the process of deciding (arranging) how the
different parts of something (in this case, a compiler) are put together and
work (Oxford ALD).
Chapter 1: Introduction to Compilers …con’t
Questions
 What is a program, programming, a programmer,
a translator?
 What makes compilers special compared to
interpreters, and vice versa?
 Why are lexical analysis, syntax analysis, and
semantic analysis needed?
 Why do you learn compiler design?
 How does compilation time compare with
interpretation time?
 As compiler design learners, how does the
machine detect lexical and syntax errors,
and how does it recover from them?
Chapter 1: Introduction to Compilers …con’t
1.1. Language Processing System
a) Translator
 A translator is a program that takes as input a program
written in one language and produces as output a program
in another language.
 Besides program translation, the translator performs
another very important role: error detection.
 During translation, any violation of the high-level language
specification is detected and reported to the
programmer.
1. Preprocessors
A preprocessor is a computer program that modifies its input to conform to the
input requirements of another program. It is a macro processor which
automatically transforms a program before actual compilation.
They may perform the following functions:
i. Macro Processing: - A preprocessor may allow a user to define macros that
are shorthands for longer constructs. Example: #define MaxNo 4
ii. File Inclusion: - A preprocessor may include header files into the program
text. For example, the C preprocessor causes the contents of the file <global.h>
to replace the statement #include <global.h> when it processes a file
containing this statement.
iii. Rational Preprocessors: - These processors augment older languages
with more modern flow-of-control and data-structuring facilities. For
example, such a preprocessor might provide the user with built-in macros for
constructs like while-statements or if-statements, where none exist in the
programming language itself.
iv. Language Extensions: - These processors attempt to add capabilities
to the language by what amounts to built-in macros. For example, the
language Equel is a database query language embedded in C.
 Statements beginning with ## are taken by the preprocessor
to be database-access statements, unrelated to C, and are
translated into procedure calls on routines that perform the
database access.
 Macro processors deal with two kinds of statement:
 macro definition
 macro use
 Definitions are normally indicated by some unique character or keyword,
like define or macro.
 They consist of a name for the macro being defined and a body, forming
its definition.
 The use of a macro consists of naming the macro and supplying actual
parameters, that is, values for its formal parameters.
 The macro processor substitutes the actual parameters for the formal
parameters in the body of the macro; the transformed body then replaces
the macro use itself.
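The definition-and-use mechanism above can be sketched in a few lines of Python. This is an illustrative toy, not how a real macro processor works (a real one tokenizes its input rather than doing plain string replacement), and all names here are invented for the example:

```python
macros = {}  # macro name -> (formal parameters, body)

def define_macro(name, formals, body):
    # A macro definition: a name, its formal parameters, and a body.
    macros[name] = (formals, body)

def expand(name, actuals):
    # A macro use: substitute the actual parameters for the formal
    # parameters in the body; the result replaces the macro use itself.
    formals, body = macros[name]
    for formal, actual in zip(formals, actuals):
        body = body.replace(formal, actual)
    return body

define_macro("MaxNo", [], "4")                # parameterless macro
define_macro("SQUARE", ["x"], "((x) * (x))")  # one formal parameter, x

print(expand("MaxNo", []))        # -> 4
print(expand("SQUARE", ["a+1"]))  # -> ((a+1) * (a+1))
```

Note how the expansion of SQUARE parenthesizes the actual parameter a+1 wherever the formal x occurred, which is exactly why C programmers wrap macro bodies and parameters in parentheses.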
2. Compiler
 A Compiler is computer software that transforms source program
code which is written in a high-level language into low-level
machine code.
 In order to reduce the complexity of designing and building
computers, nearly all computers are made to execute relatively
simple commands (but do so very quickly).
 A program for a computer must be built by combining these
very simple commands into a program in what is called
machine language.
 Since this is a tedious and error-prone process, most
programming is instead done using a high-level programming
language.
 This language can be very different from the machine
language that the computer can execute, so some means
of bridging the gap is required.
 This is where the compiler comes in.
2. Compiler …cont
 A compiler translates (or compiles) a program written in a
high-level programming language that is suitable for human
programmers into the low-level assembly language.
 During this process, the compiler will also attempt to
detect and report obvious programmer mistakes.
 Using a high-level language for programming has a large
impact on how fast programs can be developed.
 The main reasons for this are:
 Compared to machine language, the notation used by
programming languages is closer to the way humans think about
problems.
PARTS and Modules OF COMPILATION
There are two parts to compilation:
i) Analysis part (lexical analysis, syntax analysis, semantic analysis)
ii) Synthesis part(intermediate code gen., code optimization and code generation)

 The analysis part breaks up the source program into constituent pieces
and creates an intermediate representation of the source program.
 The synthesis part constructs the desired target program from the
intermediate representation. From the two parts, synthesis requires the
most specialized techniques.

 The compiler has two modules namely the front end and the back end.
 In compilers, the frontend translates a computer programming source
code into an intermediate representation.
 Front-end constitutes lexical analysis, syntax analysis, semantic analysis,
intermediate code generation and creation of symbol table;
 Whereas, the back-end(code optimization and code generation) works
with the intermediate representation to produce code in a computer
output language.
 The backend usually optimizes to produce code that runs faster.
PARTS OF COMPILATION …cont
During analysis, the operations implied by the
source program are determined and recorded
in a hierarchical structure called a tree.
Often, a special kind of tree called a syntax
tree is used, in which each node represents an
operation and the children of a node
represent the arguments of the operation.
Many software tools that manipulate source
programs first perform some kind of analysis.
Some examples of such tools are
Structure editor
Pretty printers
Static checkers
Interpreters
Structure editor
 A structure editor takes as input a sequence of
commands to build a source program.
 The structure editor not only performs the text-
creation and modification functions of an ordinary text
editor, but it also analyzes the program text, putting an
appropriate hierarchical structure on the source
program.
 For example, it can check that the input is correctly
formed, can supply keywords automatically (e.g., when
the user types while, the editor supplies the matching do
and reminds the user that a conditional must come
between them), and can jump from a begin or left
parenthesis to its matching end or right parenthesis.
Pretty printers
 A pretty printer analyzes a program and prints it in such a
way that the structure of the program becomes clearly
visible.
 For example, comments may appear in a special font, and
statements may appear with an amount of indentation
proportional to the depth of their nesting in the
hierarchical organization of the statements.

Static checkers
 A static checker reads a program, analyzes it, and attempts
to discover potential bugs without running the program.
 For example, a static checker may detect that parts of the
source program can never be executed.
 It can catch logical errors such as trying to use a real
variable as a pointer.
3. Interpreters
 An interpreter, like a compiler, translates high-level language
into low-level language. Example: Ruby, PHP, JavaScript, Java
 The difference lies in the way they read the source code or input.

 A compiler reads the whole source code at once, creates
tokens, checks semantics, generates intermediate code, and
translates the whole program; it may involve many passes.
Example: C, C++, C#
 In contrast, an interpreter reads a statement from the input,
converts it to an intermediate code, executes it, then takes the
next statement in sequence.
 If an error occurs, an interpreter stops execution and reports
it; whereas, a compiler reads the whole program even if it
encounters several errors.
Languages such as BASIC, SNOBOL, and LISP can be
translated using interpreters. Java also uses an interpreter.
The process of interpretation can be carried out in
following phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Direct Execution
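The statement-at-a-time behaviour described above can be caricatured in Python. This is a toy sketch assuming a language of simple assignment statements only; Python's built-in eval stands in for the analysis phases, and the function and error message are invented for the example:

```python
# Sketch of interpretation: each statement is read, minimally analyzed,
# and directly executed before the next statement is taken; on an error,
# execution stops and the error is reported at once.

def interpret(program, env):
    for lineno, stmt in enumerate(program, start=1):
        var, _, expr = stmt.partition("=")      # trivial "syntax analysis"
        var, expr = var.strip(), expr.strip()
        if not var.isidentifier():              # trivial "semantic" check
            return f"error at statement {lineno}: bad target '{var}'"
        env[var] = eval(expr, {}, env)          # direct execution
    return "ok"

env = {}
print(interpret(["a = 2", "b = a * 3", "c = a + b"], env))  # -> ok
print(env["c"])                                             # -> 8
```

The error case mirrors the text: interpret(["2x = 1"], {}) stops at the first statement and reports the error, whereas a compiler would continue reading the rest of the program to find further errors.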
4. Assemblers
 An Assembler translates assembly language programs into
machine code. The output of an assembler is called an object
file, which contains a combination of machine instructions as
well as the data required to place these instructions in
memory.
 Some compilers produce assembly code that is passed to an
assembler for further processing; other compilers perform the
job of the assembler, producing relocatable machine code
that can be passed directly to the loader/linker-editor.
 Assembly code is a mnemonic version of machine code, in
which names are used instead of binary codes for operations,
and names are also given to memory addresses.
 A typical sequence of assembly instructions might be:
MOV a, R1
ADD #2, R1
MOV R1, b
 This code moves the contents of the address a into register 1,
then adds the constant 2 to it, and finally stores the result in
the location named by b. Thus, it computes b := a + 2.
Two Passes in Assembler
 The simplest form of assembler makes two passes over the
input, where a pass consists of reading an input file once.
 In the first pass, all the identifiers that denote storage
locations (addresses) are found and stored in a symbol table.
 Identifiers are assigned storage locations as they are
encountered for the first time, so after reading the input, the symbol
table might contain the entries below:
Identifier  Address
a           0
b           4
 Here we have assumed that a word, consisting of 4
bytes, is set aside for each identifier, and that addresses are
assigned starting from byte 0, for the assembler code of the above
example b := a + 2.
Two Pass Assembler … cont
In the second pass, the assembler scans the input
again. This time, it translates each operation code
into the sequence of bits representing that
operation in machine language, and
it translates each identifier representing a location
into the address given for that identifier in the
symbol table.
The output of the second pass is usually relocatable
machine code, meaning that it can be loaded
starting at any location L in memory.
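The two passes can be sketched in Python for the b := a + 2 example. The operand syntax and the rule used to recognize registers (names beginning with R) are simplifying assumptions of the sketch, not part of any real assembler:

```python
# Two-pass assembler sketch for instructions like "MOV a, R1":
# pass 1 assigns each identifier a 4-byte storage location starting at
# address 0; pass 2 replaces each identifier with its address.

def assemble(lines):
    symtab, next_addr = {}, 0
    # Pass 1: enter identifiers denoting storage locations into the table.
    for line in lines:
        for op in line.split(None, 1)[1].split(","):
            op = op.strip()
            if op.isidentifier() and not op.startswith("R") and op not in symtab:
                symtab[op] = next_addr
                next_addr += 4          # one 4-byte word per identifier
    # Pass 2: translate each identifier into its symbol-table address.
    out = []
    for line in lines:
        opcode, operands = line.split(None, 1)
        ops = [str(symtab.get(o.strip(), o.strip())) for o in operands.split(",")]
        out.append(opcode + " " + ", ".join(ops))
    return symtab, out

symtab, code = assemble(["MOV a, R1", "ADD #2, R1", "MOV R1, b"])
print(symtab)   # -> {'a': 0, 'b': 4}
print(code)     # -> ['MOV 0, R1', 'ADD #2, R1', 'MOV R1, 4']
```

Pass 2's output corresponds to the relocatable-code idea: the addresses 0 and 4 are relative, and a loader may later add a load address L to each of them.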
5. Loaders and linker-editors
 Usually, a program called a loader performs the two functions
of loading and linker-editing.
– The process of loading consists of taking relocatable
machine code, altering the relocatable addresses, and
placing the altered instructions and data in memory at the
proper locations.
– The linker-editor allows us to make a single program from
several files of relocatable machine code, these files may
have been the result of several different compilations, and
one or more may be library files of routines provided by
the system and available to any program that needs them.
 The relocatable machine code file must retain the
information in the symbol table for each data location or
instruction label that is referred to externally.
 If we do not know in advance what might be referred to, we in
effect must include the entire assembler symbol table as part
of the relocatable machine code.
1.2. Analysis of the Source Program
Analysis consists of 3-parts
i) Linear Analysis –
 In a compiler, linear analysis is also called lexical analysis or scanning.
 It is the process of reading the stream of characters from left to right and
grouping them into tokens, which are sequences of characters having a collective meaning.
ii) Syntax/Hierarchical analysis –
 Hierarchical analysis is also called syntax analysis or parsing.
 In this analysis, the characters or tokens are grouped hierarchically into
nested collections with collective meaning.
iii) Semantic analysis:
 The semantic analysis phase checks the source program for semantic
errors and gathers type information for the subsequent code generation
phase.
 It uses the hierarchical structure determined by the syntax-analysis phase to
identify the operators and operands of expressions and statements.
 In semantic analysis, certain checks are performed to ensure that the components of
a program fit together meaningfully.
 Here the compiler checks that each operator has operands that are permitted
by the source language specification; this is known as type checking.
1.3. The Phases of a Compiler
 A compiler operates in phases, each of which transforms the source
program from one representation to another.
 A typical decomposition of a compiler is shown in the following
figure.
 The first three phases form the bulk of the analysis portion of a
compiler.
 Symbol-table management and error handling are shown interacting
with the six phases of the compiler.
a. Symbol-table management
 An essential function of a compiler is to record the identifiers used
in the source program and collect information about various
attributes of each identifier.
 These attributes may provide information about the storage
allocated for an identifier, its type, its scope and, in the case of
procedure names, such things as the number and types of its
arguments, the method of passing each argument and the type
returned.
 A symbol table is a data structure containing a record for each
identifier, with fields for the attributes of the identifier.
 The data structure allows us to find the record for each
identifier quickly and to store or retrieve data from that
record quickly.
 When an identifier in the source program is detected by the
lexical analyzer, the identifier is entered into the symbol table.
 However, the attributes of an identifier cannot normally be
determined during lexical analysis. For example, in a Pascal
declaration like
var position, initial, rate : real ;
the statement position := initial + rate * 60 becomes id1 := id2 + id3 * 60,
 and the type real is not known when position, initial, and rate are seen by the
lexical analyzer.
 The remaining phases enter information about identifiers into the symbol
table and then use this information in various ways.
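The record-per-identifier idea can be sketched as follows. The field names and helper functions are invented for the example; a real compiler's symbol table carries many more attributes (scope, storage, parameter lists, etc.):

```python
# Symbol-table sketch: one record per identifier, with attribute fields
# that later phases fill in. The lexical analyzer enters the name; the
# type only becomes known when the declaration is processed.

symbol_table = {}

def enter(name):
    # Called by the lexical analyzer when an identifier is detected.
    if name not in symbol_table:
        symbol_table[name] = {"name": name, "type": None, "scope": None}
    return symbol_table[name]

def set_type(name, ty):
    # Called by a later phase, e.g. while processing the declaration
    # "var position, initial, rate : real;".
    symbol_table[name]["type"] = ty

for ident in ("position", "initial", "rate"):
    enter(ident)              # type is still unknown at this point
for ident in ("position", "initial", "rate"):
    set_type(ident, "real")   # filled in once the declaration is seen

print(symbol_table["rate"])   # -> {'name': 'rate', 'type': 'real', 'scope': None}
```

A dictionary gives the quick find/store/retrieve access the text asks of the data structure; real compilers use hash tables or scoped stacks of tables for the same purpose.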
b. Error detection and reporting
 Each phase can encounter errors. However, after
detecting an error, a phase must deal with that error,
so that compilation can proceed, allowing further
errors in the source program to be detected.
 The lexical phase can detect errors where the
characters remaining in the input do not form any
token of the language.
 Errors where the token stream violates the structure
rules of the language are determined by the syntax
analysis phase
 During semantic analysis the compiler tries to detect
constructs that have the right syntactic structure but
no meaning to the operation involved.
c. Lexical analysis
 The lexical analysis phase reads the characters in the source
program and groups them into a stream of tokens.
 Each token represents a logically cohesive sequence of characters,
such as an identifier, a keyword (if, while, etc,), a punctuation
character, or a multi-character operator like :=.
 The character sequence forming a token is called the lexeme for the
token.
 Certain tokens will be augmented by a "lexical value."
 The lexical analyzer not only generates a token, say id, but also
enters the lexeme (e.g., rate) into the symbol table, if it is not already there.
 Consider the above expression
position := initial + rate * 60
 The representation of the above expression after lexical analysis
is
id1 := id2 + id3 * 60
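The grouping of characters into tokens can be sketched with a small scanner. The token names and the regular expressions are assumptions of the sketch (Chapter 2 treats token specification properly):

```python
# Lexical-analysis sketch: group the characters of
#   position := initial + rate * 60
# into tokens, entering each identifier into a symbol table so that the
# token stream reads id1 := id2 + id3 * 60.

import re

TOKEN_SPEC = [("NUM", r"\d+"), ("ID", r"[A-Za-z_]\w*"),
              ("ASSIGN", r":="), ("OP", r"[+\-*/]"), ("WS", r"\s+")]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def scan(source):
    symtab, tokens = {}, []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "WS":
            continue                     # whitespace separates tokens
        if kind == "ID":
            # Enter the lexeme into the symbol table on first sight.
            symtab.setdefault(lexeme, f"id{len(symtab) + 1}")
            tokens.append(symtab[lexeme])
        else:
            tokens.append(lexeme)
    return tokens, symtab

tokens, symtab = scan("position := initial + rate * 60")
print(tokens)   # -> ['id1', ':=', 'id2', '+', 'id3', '*', '60']
```

Each lexeme (position, initial, rate) maps to a symbol-table entry, and the token stream carries only the references id1, id2, id3, matching the representation shown above.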
d. Syntax analysis
It groups tokens together into syntactic
structures. (Fig. 1.11(a): syntax tree)
A typical data structure for the tree is shown
in Fig. 1.11(b) in which an interior node is a
record with a field for the operator and two
fields containing pointers to the records for
the left and right children.
A leaf is a record with two or more fields, one
to identify the token at the leaf, and the
others to record information about the token.
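The node layout just described, an operator field plus pointers to the left and right children, can be sketched as a small class. The tree below is built by hand (no parser) for id1 := id2 + id3 * 60, and the printing format is an invention of the sketch:

```python
# Syntax-tree sketch: an interior node is a record with an operator field
# and pointers to its left and right children; a leaf records its token.

class Node:
    def __init__(self, op, left=None, right=None):
        self.op, self.left, self.right = op, left, right

    def __repr__(self):
        if self.left is None:            # a leaf: just its token
            return self.op
        return f"({self.left} {self.op} {self.right})"

# Multiplication binds tighter than addition, so * sits deeper in the
# tree than + does.
tree = Node(":=", Node("id1"),
            Node("+", Node("id2"),
                 Node("*", Node("id3"), Node("60"))))
print(tree)  # -> (id1 := (id2 + (id3 * 60)))
```

The printed parenthesization makes the hierarchy visible: the children of each interior node are the arguments of its operation, exactly as the text describes.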
e. Semantic analysis
 An important component of semantic analysis is type checking. Here the compiler
checks that each operator has operands that are permitted by the source language
specification.
 For example, many programming language definitions require a compiler to report an
error every time a real number is used to index an array.
 However, the language specification may permit some operand coercions; for
example, when a binary arithmetic operator is applied to an integer and a real,
the compiler may need to convert the integer to a real. E.g., int a[50.0] is an
error, but int a[50] is correct.
f. Intermediate Code Generation
 After syntax and semantic analysis, some compilers
generate an explicit intermediate representation of the
source program.
 This intermediate representation should have two
important properties: it should be easy to produce and easy
to translate into the target program.
 The intermediate representation can have a variety of
forms and one of the forms is called “Three address code”,
which is like the assembly language for a machine in which
every memory location can act like a register.
 Three-address code consists of a sequence of instructions,
each of which has at most three operands.
 Three-address code for the statement position := initial + rate * 60 is:
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
Three address code
 It is a linearized representation of a syntax tree, where the names
of the temporaries correspond to the nodes.
 The use of names for intermediate values allows three-address
code to be easily rearranged which is convenient for optimization.
 This allows the compiler to analyze the code and perform
optimizations that can improve the performance of the
generated code.
Intermediate form has several properties.
 First, each three-address instruction has at most one operator in
addition to the assignment.
 Thus, when generating these instructions, the compiler has to
decide on the order in which operations are to be done; the
multiplication precedes the addition in the source program of (1.1).
 Second, the compiler must generate a temporary name to hold the
value computed by each instruction.
 Third, some "three address" instructions have fewer than three
operands, e.g., the first and last instructions in ( 1.3).
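The way a fresh temporary is generated for each interior node can be sketched by linearizing an expression tree. For brevity this sketch leaves the constant 60 unconverted instead of emitting an inttoreal instruction, and the tuple encoding of the tree is an assumption of the example:

```python
# Sketch of linearizing an expression tree into three-address code:
# each interior node gets a fresh temporary, and each emitted
# instruction has at most one operator besides the assignment.

def gen(node, code, counter):
    """Emit code for node; return the name holding its value."""
    if isinstance(node, str):            # leaf: identifier or constant
        return node
    op, left, right = node
    l = gen(left, code, counter)
    r = gen(right, code, counter)
    counter[0] += 1                      # fresh temporary for this node
    temp = f"temp{counter[0]}"
    code.append(f"{temp} := {l} {op} {r}")
    return temp

# Tree for id2 + id3 * 60: tuples are interior nodes, strings are leaves.
tree = ("+", "id2", ("*", "id3", "60"))
code, counter = [], [0]
code.append(f"id1 := {gen(tree, code, counter)}")
for line in code:
    print(line)
# temp1 := id3 * 60
# temp2 := id2 + temp1
# id1 := temp2
```

Because the deepest node is visited first, the multiplication is emitted before the addition, illustrating how the compiler fixes the order in which operations are done.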
g. Code Optimization
The code optimization phase attempts to improve
the intermediate code, so that faster running
machine code will result.
The above intermediate code is optimized to:
temp1 := id3 * 60.0
id1 := id2 + temp1
The int-to-real operation can be eliminated by converting
the integer 60 into the real 60.0 at compile time; and temp3 is
used only once, to transmit its value to id1, so it
can be eliminated.
h. Code Generation
 The final phase of the compiler is the generation of target
code, consisting normally of relocatable machine code or
assembly code. Memory locations are selected for each of
the variables used by the program.
 Then, intermediate instructions are each translated into a
sequence of machine instructions that perform the same
task.
 A crucial aspect is the assignment of variables to registers.
For example, using registers 1 and 2, the translation of the
above code might become:
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
1.4. Compiler-construction tools
 In addition to general software-development tools, other more specialized
tools have been developed for helping to implement various phases of a
compiler.
 Some general tools have been created for the automatic design of specific
compiler components. These tools use specialized languages for specifying
and implementing the component, and many use algorithms that are
quite sophisticated.
The following is a list of some useful compiler-construction tools:
1. Parser generators - These produce syntax analyzers, normally from input
that is based on a context-free grammar. In early compilers, syntax analysis
consumed not only a large fraction of the running time of a compiler, but a
large fraction of the intellectual effort of writing a compiler. This phase is now
considered one of the easiest to implement. Many parser generators utilize
powerful parsing algorithms that are too complex to be carried out by hand.
2. Scanner generators - These automatically generate lexical analyzers,
normally from a specification based on regular expressions. The basic
organization of the resulting lexical analyzer is in effect a finite automaton.
3. Syntax-directed translation engines - These produce collections of routines
that walk the parse tree, generating output such as intermediate code. The basic
idea is that one or more "translations" are associated with each node of the
parse tree, and each translation is defined in terms of the translations at its
neighbor nodes in the tree.
1.4. Compiler-construction tools …con’t
4. Automatic code generators - Such a tool takes a collection of rules
that define the translation of each operation of the intermediate
language into the machine language for the target machine.
The rules must include sufficient detail so that we can handle the
different possible access methods for data; e.g., variables may be in
registers, in a fixed (static) location in memory, or may be allocated a
position on a stack.
The basic technique is "template matching." The intermediate code
statements are replaced by "templates" that represent sequences of
machine instructions, in such a way that the assumptions about
storage of variables match from template to template.
5. Data flow engines - Much of the information needed to perform
good code optimization involves "data-flow analysis," the gathering of
information about how values are transmitted from one part of a
program to each other part. Different tasks of this nature can be
performed by essentially the same routine, with the user supplying
details of the relationship between intermediate code statements and
the information being gathered.
Thank You !
