
SYSTEM SOFTWARE & COMPILERS 18CS61

MODULE-2
INTRODUCTION TO COMPILER DESIGN
1.1 Language Processors:
✓ Compiler: A compiler is a program that can read a program in one language — the source
language — and translate it into an equivalent program in another language — the target
language. See Fig. 1.1.
✓ An important role of the compiler is to report any errors in the source program that it detects during
the translation process.

Figure 1.1: A compiler


✓ If the target program is an executable machine-language program, it can then be called by the user
to process inputs and produce outputs as in Fig. 1.2.

Figure 1.2: Running the target program


✓ An interpreter is another language processor. Instead of producing a target program as a translation,
an interpreter appears to directly execute the operations specified in the source program on
inputs supplied by the user, as shown in Fig. 1.3.

Figure 1.3: An interpreter


✓ Example: Java language processors combine compilation and interpretation, as shown in Fig.
1.4. A Java source program may first be compiled into an intermediate form called bytecodes.
✓ The bytecodes are then interpreted by a virtual machine. A benefit of this arrangement is that
bytecodes compiled on one machine can be interpreted on another machine.
✓ In order to achieve faster processing of inputs to outputs, some Java compilers, called just-in-time
compilers, translate the bytecodes into machine language immediately before they run the
intermediate program to process the input.

Figure 1.4: A hybrid compiler
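✓ To make the hybrid idea concrete, the toy sketch below (Python, purely illustrative; it is not Java and not how the JVM works, and it borrows Python's own ast parser as its front end) compiles an arithmetic expression into portable stack-machine "bytecodes" once, and a separate tiny virtual machine then interprets them:

    # Toy hybrid language processor: compile once to stack "bytecodes",
    # then interpret the bytecodes on a small virtual machine.
    import ast

    def compile_to_bytecode(source):
        ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}
        code = []
        def emit(node):
            if isinstance(node, ast.BinOp):
                emit(node.left); emit(node.right)
                code.append((ops[type(node.op)],))
            elif isinstance(node, ast.Constant):
                code.append(("PUSH", node.value))
            else:
                raise ValueError("unsupported construct")
        emit(ast.parse(source, mode="eval").body)
        return code

    def interpret(code):
        stack = []
        for instr in code:
            if instr[0] == "PUSH":
                stack.append(instr[1])
            else:
                b, a = stack.pop(), stack.pop()
                stack.append({"ADD": a + b, "SUB": a - b,
                              "MUL": a * b, "DIV": a / b}[instr[0]])
        return stack.pop()

    bytecode = compile_to_bytecode("1 + 2 * 3")   # compiled once ...
    print(bytecode)   # [('PUSH', 1), ('PUSH', 2), ('PUSH', 3), ('MUL',), ('ADD',)]
    print(interpret(bytecode))                    # ... interpreted anywhere: 7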


COMPILER vs. INTERPRETER

1. Compiler: works on the complete program at once (takes the entire program as input).
   Interpreter: works line by line (takes one statement at a time as input).
2. Compiler: generates intermediate code, called object code or machine code.
   Interpreter: does not generate intermediate object code or machine code.
3. Compiler: executes conditional control statements (if-else, switch-case) and logical constructs faster than an interpreter.
   Interpreter: executes conditional control statements and logical constructs more slowly than a compiler.
4. Compiler: the compiled program takes more memory (the entire object code has to reside in memory).
   Interpreter: interpreted programs are more memory efficient (no object code is generated).
5. Compiler: compile the program once and run it any number of times.
   Interpreter: the program is interpreted line by line every time it is run.
6. Compiler: errors are reported after the entire program is checked.
   Interpreter: an error is reported as soon as it is encountered, and the rest of the program is not checked until the error is removed.
7. Compiler: compiled languages are more difficult to debug.
   Interpreter: easier to debug, because it stops and reports errors as it encounters them.
8. Compiler examples: C, C++, COBOL.
   Interpreter examples: BASIC, PHP, MATLAB, LISP.
✓ A language-processing system:
1. A source program may be divided into modules stored in separate files. The task of collecting
the source program is sometimes entrusted to a separate program, called a preprocessor.
2. The compiler may produce an assembly-language program as its output, because assembly
language is easier to produce as output and is easier to debug.
3. The assembly language is then processed by a program called an assembler that produces
relocatable machine code as its output.
4. Large programs are often compiled in pieces, so the relocatable machine code may have to be
linked together with other relocatable object files and library files into the code that actually
runs on the machine.
5. The linker resolves external memory addresses, where the code in one file may refer to a location
in another file.
6. The loader then puts all of the executable object files together in memory for execution.

Figure 1.5: A language-processing system
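✓ As a concrete illustration of this pipeline, the short Python driver below invokes the separate tools of a typical GCC installation one stage at a time; it assumes a Unix-like system with gcc and as installed and a source file main.c, all of which are assumptions of the sketch rather than part of these notes:

    # Illustrative driver for the language-processing system of Fig. 1.5,
    # assuming a GCC toolchain and a file main.c exist on the machine.
    import subprocess

    def run(cmd):
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)

    run(["gcc", "-E", "main.c", "-o", "main.i"])   # preprocessor: expanded source
    run(["gcc", "-S", "main.i", "-o", "main.s"])   # compiler proper: assembly output
    run(["as", "main.s", "-o", "main.o"])          # assembler: relocatable machine code
    run(["gcc", "main.o", "-o", "main"])           # linker (via the gcc driver): executable
    run(["./main"])                                # the OS loader places it in memory and runs it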


1.2 The Structure of a Compiler


✓ A compiler has two parts: analysis and synthesis.
✓ The analysis part breaks up the source program into constituent pieces and imposes a grammatical
structure on them. The analysis part also collects information about the source program and stores
it in a data structure called a symbol table, which is passed along with the intermediate
representation to the synthesis part.
✓ The synthesis part constructs the desired target program from the intermediate representation and
the information in the symbol table.
✓ The analysis part is often called the front end of the compiler; the synthesis part is the back end.
✓ Symbol table: It is a data structure containing a record for each variable name, with fields for the
attributes of the name. The data structure should be designed to allow the compiler to find the record
for each name quickly and to store or retrieve data from that record quickly.
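✓ A minimal sketch of such a symbol table in Python is shown below; the class and field names are invented for the sketch, not prescribed by any compiler. A dictionary gives fast lookup by name, and each record holds the attributes of that name.

    # Illustrative symbol table: one record per name, with fields for its
    # attributes, and fast lookup by name via a dictionary.
    class SymbolTable:
        def __init__(self):
            self._records = {}     # name -> record of attributes
            self._entry = {}       # name -> entry number, as used in tokens like (id, 1)

        def insert(self, name, **attributes):
            """Add a record for name (or reuse it) and return its entry number."""
            if name not in self._records:
                self._records[name] = dict(attributes)
                self._entry[name] = len(self._entry) + 1
            return self._entry[name]

        def set_attribute(self, name, key, value):
            self._records[name][key] = value

        def lookup(self, name):
            return self._records.get(name)

    symtab = SymbolTable()
    symtab.insert("position")              # entry 1
    symtab.insert("initial")               # entry 2
    symtab.insert("rate")                  # entry 3
    symtab.set_attribute("rate", "type", "float")
    print(symtab.lookup("rate"))           # {'type': 'float'}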
✓ A typical decomposition of a compiler into phases is shown in Fig. 1.6.

Figure 1.6: Phases of a compiler
Figure 1.7: Translation of an assignment statement


1.2.1 Lexical Analysis or Scanning


Input: The lexical analyzer reads the stream of characters making up the source program and
groups the characters into meaningful sequences called lexemes.
Output: For each lexeme, the lexical analyzer produces as output a token of the form
(token-name, attribute-value) that it passes on to the subsequent phase, syntax analysis.
✓ In the token, the first component token-name is an abstract symbol that is used during syntax
analysis, and the second component attribute-value points to an entry in the symbol table for this
token.
Example: Input → position = initial + rate * 60 (1.1)
✓ The characters in this assignment could be grouped into the following lexemes and mapped into the
following tokens passed on to the syntax analyzer:
1. position is a lexeme that would be mapped into a token (id, 1), where id is an abstract symbol
standing for identifier and 1 points to the symbol table entry for position. The symbol-table entry for
an identifier holds information about the identifier, such as its name and type.
2. The assignment symbol = is a lexeme that is mapped into the token (=); this token needs no
attribute value.
3. initial is a lexeme that is mapped into the token (id, 2), where 2 points to the symbol-table entry
for initial.
4. + is a lexeme that is mapped into the token (+).
5. rate is a lexeme that is mapped into the token (id, 3), where 3 points to the symbol-table entry for
rate.
6. * is a lexeme that is mapped into the token (*).
7. 60 is a lexeme that is mapped into the token (60).
✓ After lexical analysis, the assignment statement is represented as the sequence of tokens:
Output → (id, 1) (=) (id, 2) (+) (id, 3) (*) (60) (1.2)
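✓ The sketch below shows one way such a scanner could be written in Python; the regular expressions and the dictionary used as a symbol table are assumptions of the sketch, but on input (1.1) it produces exactly the token sequence (1.2).

    # Illustrative lexical analyzer for the assignment statement (1.1).
    import re

    token_spec = [
        ("id",     r"[A-Za-z_]\w*"),   # identifiers
        ("number", r"\d+"),            # integer literals
        ("op",     r"[=+\-*/]"),       # operators
        ("skip",   r"\s+"),            # whitespace between lexemes
    ]
    master = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in token_spec))

    def tokenize(source, symtab):
        tokens = []
        for m in master.finditer(source):
            kind, lexeme = m.lastgroup, m.group()
            if kind == "skip":
                continue
            if kind == "id":
                entry = symtab.setdefault(lexeme, len(symtab) + 1)
                tokens.append(("id", entry))   # identifier: (token-name, symbol-table entry)
            else:
                tokens.append((lexeme,))       # operator or number: no attribute value
        return tokens

    symtab = {}
    print(tokenize("position = initial + rate * 60", symtab))
    # [('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('60',)]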

1.2.2 Syntax Analysis or Parsing


Input: The parser uses the first components of the tokens produced by the lexical analyzer.
Output: It creates a tree-like intermediate representation that depicts the grammatical structure
of the token stream.
✓ A typical representation is a syntax tree in which each interior node represents an operation and
the children of the node represent the arguments of the operation.
Example: A syntax tree for the token stream (1.2) is shown as the output of the syntactic analyzer
in Fig. 1.7.
✓ The tree has an interior node labeled * with (id, 3) as its left child and the integer 60 as its right
child. The node (id, 3) represents the identifier rate.


✓ The node labeled * makes it explicit that we must first multiply the value of rate by 60. The node
labeled + indicates that we must add the result of this multiplication to the value of initial.
✓ The root of the tree, labeled =, indicates that we must store the result of this addition into the location
for the identifier position.
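✓ A minimal way to represent this tree in Python is sketched below (the Node class is an assumption of the sketch); the tree is built by hand here to mirror Fig. 1.7, whereas a real parser would build it from the token stream (1.2).

    # Illustrative syntax tree for the token stream (1.2): each interior node
    # is an operator and its children are the operands.
    class Node:
        def __init__(self, label, *children):
            self.label = label
            self.children = children

        def __repr__(self):
            if not self.children:
                return str(self.label)
            return f"({self.label} {' '.join(map(repr, self.children))})"

    # position = initial + rate * 60
    tree = Node("=",
                Node(("id", 1)),                   # position
                Node("+",
                     Node(("id", 2)),              # initial
                     Node("*",
                          Node(("id", 3)),         # rate
                          Node(60))))
    print(tree)   # (= ('id', 1) (+ ('id', 2) (* ('id', 3) 60)))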

1.2.3 Semantic Analysis


Input: This uses the syntax tree and the information in the symbol table.
Output: The semantic analyzer checks the source program for semantic consistency with the language
definition. It also gathers type information and saves it in either the syntax tree or the symbol table,
for subsequent use during intermediate-code generation.
✓ An important part of semantic analysis is type checking, where the compiler checks that each
operator has matching operands.
✓ Example: If an operator is applied to a floating-point number and an integer, the compiler may
convert or coerce the integer into a floating-point number.
✓ Suppose that position, initial, and rate have been declared to be floating-point numbers, and that the
lexeme 60 by itself forms an integer. The type checker in the semantic analyzer in Fig. 1.7 discovers
that the operator * is applied to a floating-point number rate and an integer 60.
✓ The output of the semantic analyzer has an extra node for the operator inttofloat, which explicitly
converts its integer argument into a floating-point number.
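✓ The sketch below continues the previous one (it assumes the Node class and the tree built in section 1.2.2); the coercion rule and helper names are assumptions of the sketch, but on this example it inserts the inttofloat node exactly as described above.

    # Illustrative type checker: identifiers have the type recorded for their
    # symbol-table entry, integer literals are int, and an int operand used
    # together with a float operand is wrapped in an explicit inttofloat node.
    def leaf_type(node, types_by_entry):
        if isinstance(node.label, tuple):            # (id, n) leaf
            return types_by_entry[node.label[1]]
        return "int"                                 # integer literal leaf

    def check(node, types_by_entry):
        if not node.children:
            return node, leaf_type(node, types_by_entry)
        children, types = [], []
        for child in node.children:
            c, t = check(child, types_by_entry)
            children.append(c); types.append(t)
        if "float" in types:                         # coerce any int operand
            children = [Node("inttofloat", c) if t == "int" else c
                        for c, t in zip(children, types)]
            result_type = "float"
        else:
            result_type = "int"
        return Node(node.label, *children), result_type

    types_by_entry = {1: "float", 2: "float", 3: "float"}   # position, initial, rate
    typed_tree, _ = check(tree, types_by_entry)
    print(typed_tree)
    # (= ('id', 1) (+ ('id', 2) (* ('id', 3) (inttofloat 60))))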

1.2.4 Intermediate Code Generation


Input: Syntax trees that are used during syntax and semantic analysis.
Output: Compilers generate an explicit low-level or machine-like intermediate representation.
✓ This intermediate representation should have two important properties: it should be easy to produce
and it should be easy to translate into the target machine.
✓ One common intermediate form is three-address code, which consists of a sequence of
assembly-like instructions with three operands per instruction; each operand can act like a register.
The output of the intermediate code generator in Fig. 1.7 consists of the three-address code sequence.
t1 = inttofloat 60
t2 = id3 * t1
t3 = id2 + t2
id1 = t3 (1.3)
✓ There are several points about three-address instructions.
a) Each three-address assignment instruction has at most one operator on the right side. These
instructions fix the order in which operations are to be done; the multiplication precedes the
addition in the source program (1.1).


b) The compiler must generate a temporary name to hold the value computed by a three-address
instruction.
c) Some "three-address instructions" like the first and last in the sequence (1.3), above, have
fewer than three operands.
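✓ Continuing the same sketch (it assumes the Node class and the typed_tree from sections 1.2.2 and 1.2.3), a post-order walk that hands out a fresh temporary name for every operator node reproduces sequence (1.3); the function names are assumptions of the sketch.

    # Illustrative three-address code generator: a post-order tree walk.
    from itertools import count

    def gen(node, code, temps):
        if not node.children:                        # leaf: identifier or literal
            if isinstance(node.label, tuple):
                return f"id{node.label[1]}"          # (id, 3) -> "id3"
            return str(node.label)
        args = [gen(child, code, temps) for child in node.children]
        if node.label == "=":                        # assignment needs no temporary
            code.append(f"{args[0]} = {args[1]}")
            return args[0]
        temp = f"t{next(temps)}"                     # fresh compiler-generated name
        if node.label == "inttofloat":
            code.append(f"{temp} = inttofloat {args[0]}")
        else:
            code.append(f"{temp} = {args[0]} {node.label} {args[1]}")
        return temp

    code = []
    gen(typed_tree, code, temps=count(1))
    print("\n".join(code))
    # t1 = inttofloat 60
    # t2 = id3 * t1
    # t3 = id2 + t2
    # id1 = t3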

1.2.5 Code Optimization


Input: the low-level, machine-like intermediate code produced by the previous phase.
Output: improved intermediate code. This machine-independent optimization phase attempts to
improve the intermediate code so that better target code will result. Usually better means faster,
but other objectives may be desired, such as shorter code or target code that consumes less power.
✓ A simple intermediate code generator followed by code optimization is a reasonable way to
generate good target code.
✓ The optimizer can deduce that the conversion of 60 from integer to floating point can be done once
and for all at compile time, so the inttofloat operation can be eliminated by replacing the integer 60
by the floating-point number 60.0. Moreover, t3 is used only once to transmit its value to id1 so the
optimizer can transform (1.3) into the shorter sequence.
t1 = id3 * 60.0
id1 = id2 + t1 (1.4)
✓ Optimizing compilers improve the running time of the target program without slowing down
compilation too much.
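✓ The sketch below performs just these two improvements on the three-address code of (1.3), treating each instruction as a string; the pass structure and helper logic are assumptions of the sketch, and apart from the numbering of the remaining temporary the result is sequence (1.4).

    # Illustrative optimizer for (1.3): fold inttofloat of an integer constant,
    # then merge "tN = expr" immediately followed by "idK = tN"
    # (safe here because each temporary is used only once).
    def optimize(code):
        consts, folded = {}, []
        for instr in code:                           # pass 1: constant folding
            dest, rhs = instr.split(" = ", 1)
            parts = rhs.split()
            if parts[0] == "inttofloat" and parts[1].isdigit():
                consts[dest] = parts[1] + ".0"       # e.g. t1 stands for 60.0
                continue                             # the conversion disappears
            for name, value in consts.items():
                rhs = rhs.replace(name, value)       # naive textual substitution
            folded.append(f"{dest} = {rhs}")

        result = []
        for instr in folded:                         # pass 2: merge adjacent copy
            dest, rhs = instr.split(" = ", 1)
            if result and rhs == result[-1].split(" = ", 1)[0]:
                result[-1] = f"{dest} = " + result[-1].split(" = ", 1)[1]
                continue
            result.append(instr)
        return result

    code = ["t1 = inttofloat 60", "t2 = id3 * t1", "t3 = id2 + t2", "id1 = t3"]
    print("\n".join(optimize(code)))
    # t2 = id3 * 60.0
    # id1 = id2 + t2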

1.2.6 Code Generation


Input: an intermediate representation of the source program.
Output: Maps intermediate code into the target language. If the target language is machine code,
registers or memory locations are selected for each of the variables used by the program.
Example: using registers R1 and R2, the intermediate code in (1.4) might get translated into the
machine code:
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1 (1.5)
✓ The first operand of each instruction specifies a destination. The F in each instruction tells us that
it deals with floating-point numbers.
a) The code in (1.5) loads the contents of address id3 into register R2, then multiplies it with
floating-point constant 60.0. The # signifies that 60.0 is to be treated as an immediate constant.
b) The third instruction moves id2 into register R1 and the fourth adds to it the value previously
computed in register R2.
c) Finally, the value in register R1 is stored into the address of id1, so the code correctly
implements the assignment statement (1.1).
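✓ A sketch of this final step in Python is shown below; the register-allocation rule (keep every loaded value in a register and let each result overwrite the left operand's register) is the simplest one that works for this example and is an assumption of the sketch, as is the textual form of the input.

    # Illustrative code generator: map the optimized three-address code (1.4)
    # onto the LDF/MULF/ADDF/STF instructions of (1.5).
    def codegen(tac):
        asm, loc = [], {}              # loc: which register currently holds a value
        free = ["R1", "R2"]            # tiny register pool; pop() hands out R2 first
        def operand(name):
            if name in loc:                            # value already in a register
                return loc[name]
            if name.replace(".", "").isdigit():        # numeric literal -> immediate
                return "#" + name
            reg = free.pop()
            asm.append(f"LDF {reg}, {name}")           # load a variable from memory
            loc[name] = reg
            return reg
        for instr in tac:
            dest, rhs = instr.split(" = ", 1)
            left, op, right = rhs.split()
            l, r = operand(left), operand(right)
            asm.append(f"{'MULF' if op == '*' else 'ADDF'} {l}, {l}, {r}")
            loc[dest] = l                              # result overwrites left register
            if dest.startswith("id"):                  # program variable: store it back
                asm.append(f"STF {dest}, {l}")
        return asm

    print("\n".join(codegen(["t1 = id3 * 60.0", "id1 = id2 + t1"])))
    # LDF R2, id3
    # MULF R2, R2, #60.0
    # LDF R1, id2
    # ADDF R1, R1, R2
    # STF id1, R1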
