Module 2 & 3
Assembler: translates an assembly-language program into relocatable machine code.
- Compiler: translates a source program written in a High-Level Language (HLL) such as Pascal or C++ into the computer's machine language (Low-Level Language, LLL).
  * The time at which the source program is converted into the object program is called compile time.
  * The object program is executed at run time.
A compiler can be treated as a single box that maps a source program into a semantically equivalent target program.
The analysis part (Front End) breaks up the source program into its constituent pieces and imposes a grammatical structure on them.
For the assignment statement position = initial + rate * 60, the lexical analyzer produces the following tokens (a small lexer sketch follows this list):
1. "position" is a lexeme mapped into the token <id, 1>, where id is an abstract symbol standing for identifier and 1 points to the symbol-table entry for position. The symbol-table entry for an identifier holds information about the identifier, such as its name and type.
2. = is a lexeme that is mapped into the token <=>. Since this token needs no attribute value, we have omitted the second component. For notational convenience, the lexeme itself is used as the name of the abstract symbol.
3. "initial" is a lexeme that is mapped into the token <id, 2>, where 2 points to the symbol-table entry for initial.
4. + is a lexeme that is mapped into the token <+>.
5. "rate" is a lexeme mapped into the token <id, 3>, where 3 points to the symbol-table entry for rate.
6. * is a lexeme that is mapped into the token <*>.
7. 60 is a lexeme that is mapped into the token <60>.
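As an illustration only (not from the slides), here is a minimal Python sketch of a lexer that produces this token stream; the token layout and symbol-table representation are assumptions:

    import re

    # Token specification: patterns are tried in order.
    TOKEN_SPEC = [
        ("number", r"\d+"),
        ("id",     r"[A-Za-z_]\w*"),
        ("op",     r"[=+\-*/]"),
        ("skip",   r"\s+"),
    ]

    def tokenize(source):
        """Return (tokens, symbol_table) for a simple assignment statement."""
        symbol_table = []                 # one entry per distinct identifier
        tokens = []
        pos = 0
        while pos < len(source):
            for name, pattern in TOKEN_SPEC:
                m = re.match(pattern, source[pos:])
                if m:
                    lexeme = m.group(0)
                    if name == "id":
                        if lexeme not in symbol_table:
                            symbol_table.append(lexeme)
                        # <id, k>: k points to the symbol-table entry
                        tokens.append(("id", symbol_table.index(lexeme) + 1))
                    elif name in ("number", "op"):
                        tokens.append((lexeme,))   # e.g. <60>, <=>, <+>, <*>
                    pos += len(lexeme)             # whitespace is simply skipped
                    break
            else:
                raise SyntaxError(f"unexpected character {source[pos]!r}")
        return tokens, symbol_table

    print(tokenize("position = initial + rate * 60"))
    # ([('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('60',)],
    #  ['position', 'initial', 'rate'])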
The semantic analyzer uses the syntax tree and the information in the symbol
table to check the source program for semantic consistency with the language
definition.
Gathers type information and saves it in either the syntax tree or the symbol table,
for subsequent use during intermediate-code generation.
After syntax and semantic analysis of the source program, many compilers
generate an explicit low-level or machine-like intermediate representation
(a program for an abstract machine). This intermediate representation
should have two important properties:
◦ It should be easy to produce and
◦ It should be easy to translate into the target machine.
The intermediate form considered here is called three-address code, which consists of a sequence of assembly-like instructions with three operands per instruction; each operand can act like a register.
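For the running example position = initial + rate * 60, a conventional three-address rendering (referenced again in the optimization notes below; the temporary names t1, t2, t3 are illustrative) is:

    t1 = inttofloat(60)
    t2 = id3 * t1
    t3 = id2 + t2
    id1 = t3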
The machine-independent code-optimization phase attempts to improve the
intermediate code so that better target code will result.
Usually better means:
◦ faster, shorter code, or target code that consumes less power.
The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time, so the int-to-float operation can be eliminated by replacing the integer 60 by the floating-point number 60.0. Moreover, t3 is used only once, to transmit its value to id1, so the two statements can be combined.
There are simple optimizations that significantly improve the running time
of the target program without slowing down compilation too much.
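Applying the two observations above to the three-address code shown earlier yields the shorter sequence below (again only an illustration; the temporary name is arbitrary):

    t1 = id3 * 60.0
    id1 = id2 + t1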
If the target language is machine code, registers or memory
locations are selected for each of the variables used by the
program.
Then, the intermediate instructions are translated into sequences
of machine instructions that perform the same task.
A crucial aspect of code generation is the judicious assignment of registers to hold variables.
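As an illustration only, one possible machine-like rendering of the optimized intermediate code shown above (register names and instruction mnemonics are assumptions, not a specific real instruction set):

    LDF  R2, id3          # load rate into a floating-point register
    MULF R2, R2, #60.0    # R2 = rate * 60.0
    LDF  R1, id2          # load initial
    ADDF R1, R1, R2       # R1 = initial + rate * 60.0
    STF  id1, R1          # store the result into position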
The symbol table is a data structure containing a
record for each variable name, with fields for the
attributes of the name.
The data structure should be designed to allow the
compiler to find the record for each name quickly and
to store or retrieve data from that record quickly.
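A minimal sketch of such a structure in Python (not from the slides); a hash map keyed by name gives fast store and retrieve, and the attribute fields shown are assumptions:

    class SymbolTable:
        """One record per name; lookup and insert are O(1) on average."""
        def __init__(self):
            self.records = {}                    # name -> dict of attributes

        def insert(self, name, **attributes):
            self.records.setdefault(name, {}).update(attributes)

        def lookup(self, name):
            return self.records.get(name)        # None if the name is unknown

    table = SymbolTable()
    table.insert("position", type="float", kind="variable")
    print(table.lookup("position"))   # {'type': 'float', 'kind': 'variable'}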
Compiler-optimization design objectives:
◦ Optimization must be correct, that is, it must preserve the meaning of the compiled program,
◦ Optimization must improve the performance of programs,
◦ The compilation time must be kept reasonable, and
◦ The engineering effort required must be manageable.
Optimizations that speed up execution time also conserve power.
Compilation time should be kept short to support a rapid development and debugging cycle.
Higher-level languages are easier to program in, but are less efficient; that is, the target programs generated run more slowly. Programmers using a low-level language have more control over a computation and can, in principle, produce more efficient code.
Object-oriented languages add features such as:
◦ Data abstraction
◦ Inheritance of properties
both of which have been found to make programs more modular and easier to maintain.
Compiler optimizations have been developed to reduce this overhead, e.g., by eliminating unnecessary range checks and reclaiming unreachable objects.
Effective algorithms have been developed for these optimizations. Compiler technology is also used to translate programs from one machine to another (binary translation), for example to provide backward compatibility.
Hardware Synthesis: hardware designs are mostly described in a hardware description language and are synthesized into circuits by compiler-like tools.
Secondary tasks of the lexical analyzer:
◦ Stripping out comments and white space
◦ Correlating error messages with the source program
◦ Keeping track of line numbers
◦ Expansion of macros
Sometimes, lexical analyzers are divided into
a cascade of two processes:
◦ Scanning consists of the simple processes that
do not require tokenization of the input, such as
deletion of comments and compaction of
consecutive whitespace characters into one.
◦ Lexical analysis proper is the more complex portion, which produces the sequence of tokens from the output of the scanner.
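A tiny Python sketch of the scanning stage described above (not from the slides; the // comment syntax is an assumption):

    import re

    def scan(source):
        """Pre-lexing pass: delete // comments and compact runs of whitespace."""
        no_comments = re.sub(r"//[^\n]*", "", source)
        return re.sub(r"\s+", " ", no_comments).strip()

    print(scan("position = initial   // starting value\n    + rate * 60"))
    # 'position = initial + rate * 60'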
The separation of lexical and syntactic analysis often
allows us to simplify at least one of these tasks.
Simplicity of design is the most important
consideration.
Compiler efficiency is improved.
Regular expressions (REs) are built recursively out of smaller regular expressions, using the rules described below.
Each RE r denotes a language L(r), which is also defined recursively from the languages denoted by r's subexpressions.
Here are the rules that define the regular expressions over an alphabet ∑ and the languages that those expressions denote.
Basis: two rules form the basis:
1. ε is a regular expression, and L(ε) is {ε}, that is, the language whose sole member is the empty string.
2. If a is a symbol in ∑, then a is a regular expression, and L(a) = {a}, the language with one string, of length one.
Induction:
Larger regular expressions are built from smaller ones. Suppose r and s are regular expressions denoting languages L(r) and L(s), respectively.
1. (r) | (s) is a regular expression denoting the language L(r) ∪ L(s).
2. (r)(s) is a regular expression denoting the language L(r)L(s).
3. (r)* is a regular expression denoting (L(r))*.
4. (r) is a regular expression denoting L(r).
This last rule says that we can add additional pairs of parentheses around expressions without changing the language they denote.
Under the usual precedence conventions (* highest, then concatenation, then |), unnecessary parentheses can be dropped; for example, we may replace the regular expression (a)|((b)*(c)) by a|b*c.
Example: Let ∑ = {a, b}.
The regular expression a|b denotes the language {a, b}.
(a|b)(a|b) denotes {aa, ab, ba, bb}.
(a|b)* denotes the language of all strings of instances of a or b, that is, all strings of a's and b's: {ε, a, b, aa, ab, ba, bb, aaa, ...}.
a|a*b denotes the language {a, b, ab, aab, aaab, ...}, that is, the string a and all strings consisting of zero or more a's followed by a b.
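As a quick illustration (not from the slides), Python's re module can be used to check which strings belong to the language of the last expression:

    import re

    pattern = re.compile(r"a|a*b")        # the RE a|a*b from the example above
    for s in ["a", "b", "ab", "aab", "aaab", "ba"]:
        print(s, bool(pattern.fullmatch(s)))
    # a, b, ab, aab and aaab match; ba does not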
A language that can be defined by a regular expression is called a regular set.
If two REs r and s denote the same regular set, we say they are equivalent and write r = s.
The terminals of the grammar, which are if, then,
else, relop, id, and number, are the names of tokens
as used by the lexical analyzer.
The lexical analyzer also has the job of stripping out whitespace and comments.
Transition diagrams have a collection of nodes or circles, called states. Each state represents a condition that could occur while scanning the input for a lexeme that matches one of several patterns.
[Transition diagram for relational operators: from start state 0, '<' leads to a state whose edges on '=', '>', and any other character return (relop, LE), (relop, NE), and (relop, LT) respectively; '=' returns (relop, EQ); '>' leads to a state whose edges on '=' and any other character return (relop, GE) and (relop, GT). States marked * retract the input by one character.]
[Transition diagram for identifiers: from start state 9, a letter moves to state 10, which loops on letters and digits; any other character leads to accepting state 11 (marked *), which returns (getToken(), installID()).]
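A minimal Python sketch of the relop diagram above (illustrative only; the token and attribute names follow the figure, everything else is an assumption):

    def relop(src, pos):
        """Simulate the relop transition diagram starting at src[pos].
        Returns ((token, attribute), next_pos), or None if no relop starts here."""
        c = src[pos] if pos < len(src) else ""
        nxt = src[pos + 1] if pos + 1 < len(src) else ""
        if c == "<":
            if nxt == "=":
                return ("relop", "LE"), pos + 2
            if nxt == ">":
                return ("relop", "NE"), pos + 2
            return ("relop", "LT"), pos + 1    # 'other': retract, only '<' consumed
        if c == "=":
            return ("relop", "EQ"), pos + 1
        if c == ">":
            if nxt == "=":
                return ("relop", "GE"), pos + 2
            return ("relop", "GT"), pos + 1    # 'other': retract, only '>' consumed
        return None

    print(relop("<= b", 0))    # (('relop', 'LE'), 2)
    print(relop("a > b", 2))   # (('relop', 'GT'), 3)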
Two questions remain.
1. How do we distinguish between identifiers and
keywords such as if, then and else, which also match
the pattern in the transition diagram?
2. What is (getToken(), installID())?
1) Install the reserved words in the symbol table initially. installID() checks whether the lexeme is already in the symbol table; if it is not present, the lexeme is installed (placed in the symbol table) as an id token. In either case a pointer to the entry is returned. getToken() examines the symbol-table entry for the lexeme found and returns the token name (a sketch of this approach follows).
2) Create a separate transition diagram for each keyword.
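A small Python sketch of approach 1; the helper names installID and getToken mirror the slide, while the table layout is an assumption:

    # Reserved words are installed in the symbol table up front, each carrying
    # its own token name; ordinary identifiers get the token name "id".
    symbol_table = {"if": "if", "then": "then", "else": "else"}

    def installID(lexeme):
        """Return the symbol-table key for lexeme, installing it as an id if absent."""
        if lexeme not in symbol_table:
            symbol_table[lexeme] = "id"
        return lexeme

    def getToken(entry):
        """Return the token name stored in the symbol-table entry."""
        return symbol_table[entry]

    for word in ["if", "rate", "then", "rate"]:
        print(word, "->", getToken(installID(word)))
    # if -> if, rate -> id, then -> then, rate -> id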
The transition diagram for unsigned numbers has multiple accepting states, e.g., one accepting a float such as 12.31E4.
There are several ways in which a collection of transition diagrams can be used to build a lexical analyzer (LA).
Parameter-passing mechanisms:
1. Call-by-Value
2. Call-by-Reference
3. Call-by-Name: actual parameters are substituted literally for the formal parameters (as if a macro expansion of the actual parameter) in the code of the callee.
Two formal parameters can refer to the same location; such variables are called aliases of one another.
Ex: if a is an array of procedure p, and p calls q(a, a), then the two formal parameters of q are aliases (see the illustration below).
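A small Python illustration of aliasing (Python passes object references, so passing the same array for both parameters makes the two formals aliases):

    def q(x, y):
        # x and y are aliases when the caller passes the same array for both
        x[0] = 99
        return y[0]          # sees the change made through x

    a = [1, 2, 3]
    print(q(a, a))           # 99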
Role of the parser:
◦ Verifies that the string of tokens can be generated by the grammar and imposes a syntactic structure on it
◦ Reports syntax errors
◦ Recovers from commonly occurring errors
◦ Constructs a parse tree and passes it to the rest of the compiler
We categorize parsers into three groups:
1. Universal parsers: can parse any grammar, but are too inefficient to use in production compilers.
2. Top-Down parsers: the parse tree is created top to bottom, starting from the root (see the sketch after this list).
3. Bottom-Up parsers: the parse tree is created bottom to top, starting from the leaves.
Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).
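As an illustration of top-down parsing only (not from the slides), here is a minimal recursive-descent sketch in Python for the toy grammar E -> T ('+' T)*, T -> id, over tokens like those produced earlier; all names are assumptions:

    def parse_expression(tokens):
        """Recursive-descent (top-down) parse: the tree is built from the root
        down while the token stream is scanned left to right."""
        pos = 0

        def expect(kind):
            nonlocal pos
            if pos < len(tokens) and tokens[pos][0] == kind:
                tok = tokens[pos]
                pos += 1
                return tok
            raise SyntaxError(f"expected {kind!r} at token {pos}")

        def T():                       # T -> id
            return ("T", expect("id"))

        def E():                       # E -> T ('+' T)*
            children = [T()]
            while pos < len(tokens) and tokens[pos][0] == "+":
                expect("+")
                children.append(T())
            return ("E", children)

        tree = E()
        if pos != len(tokens):
            raise SyntaxError("unexpected trailing tokens")
        return tree

    # e.g. the tokens for "initial + rate":
    print(parse_expression([("id", 2), ("+",), ("id", 3)]))
    # ('E', [('T', ('id', 2)), ('T', ('id', 3))])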
Efficient top-down and bottom-up parsers can be constructed only for certain subclasses of grammars.
Such parsing methods are used in several error-repairing compilers.