
Phases of A Compiler

The compiler phases are: 1. Lexical analysis scans the source code and produces tokens from lexemes. 2. Syntax analysis uses tokens to build a parse tree representing the program structure. 3. Semantic analysis checks the tree for semantic correctness using symbol tables. 4. Intermediate code generation transforms the tree into 3-address code for optimization and code generation.

Uploaded by Sam Alex
Copyright © All Rights Reserved

Phases of a compiler

Lexical Analysis

– The first phase of a compiler is called lexical analysis or scanning.
– It reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes.
– For each lexeme, the lexical analyzer produces a token as output.
• Tokens are of two kinds:
– Specific strings: keywords, punctuation symbols
– Classes of strings: identifiers, constants, labels
– A token has the form
(token-name/type, attribute-value)
– Specific strings have a token type but no attribute value.
– Classes of strings have both a token type and an attribute value.
– The token-name is an abstract symbol used during syntax analysis.
– The attribute-value points to an entry in the symbol table for this token.
– Information from the symbol-table entry is needed for semantic analysis and code generation.
• For example, suppose a source program contains the assignment statement
position = initial + rate * 60
• The lexical analyzer produces a sequence of 7 tokens:
(id, 1) (=) (id, 2) (+) (id, 3) (*) (60)
• Here id is an abstract symbol standing for identifier, and 1, 2, and 3 point to the symbol-table entries for position, initial, and rate.
• The token value is an index into the symbol table, where information about classes of strings is kept.
• The symbol-table entry for an identifier holds information about the identifier, such as its name and type.
• The lexical analyzer passes the tokens to the subsequent phase, syntax analysis.
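The scanning step above can be sketched with Python's `re` module. This is a minimal illustration, not a production lexer: the token patterns (`TOKEN_SPEC`), the helper names `tokenize` and `show`, and the dictionary layout of symbol-table entries are all assumptions made for this example. Identifiers become `(id, k)` tokens whose value is a 1-based index into the symbol table; operators carry no attribute, and, following the slide's notation, the constant is written by its value alone.

```python
import re

# Token patterns for a toy expression language (an assumption for this sketch).
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?"),
    ("id", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("op", r"[=+\-*/]"),
    ("skip", r"\s+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for one line of source text."""
    tokens = []
    symbol_table = []  # entry k-1 describes the identifier whose token value is k
    index_of = {}      # lexeme -> 1-based symbol-table index
    pos = 0
    while pos < len(source):
        match = MASTER_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue
        if kind == "id":
            # Class of strings: token name plus an index into the symbol table.
            if lexeme not in index_of:
                symbol_table.append({"name": lexeme, "type": None})  # type filled in later
                index_of[lexeme] = len(symbol_table)
            tokens.append(("id", index_of[lexeme]))
        else:
            # Operators (specific strings) carry no attribute; the constant is
            # written by its value alone, as on the slide.
            tokens.append((lexeme,))
    return tokens, symbol_table


def show(tokens):
    return " ".join("(" + ", ".join(map(str, tok)) + ")" for tok in tokens)


tokens, table = tokenize("position = initial + rate * 60")
print(show(tokens))                        # (id, 1) (=) (id, 2) (+) (id, 3) (*) (60)
print([entry["name"] for entry in table])  # ['position', 'initial', 'rate']
```

Because `index_of` remembers each lexeme, a repeated identifier reuses its existing symbol-table entry instead of creating a new one.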
Syntax Analysis

• The second phase of the compiler is syntax analysis or parsing.
• The parser uses the tokens to create a tree-like intermediate representation that depicts the grammatical structure of the token stream.
• The grammatical structure is represented by a parse tree.
• Another typical representation is a syntax tree:
– It is a compressed representation of the parse tree.
– Operators appear as interior nodes, and the operands of an operator are the children of that operator's node.
[Figure: Parse tree for position := initial + rate * 60, shown alongside the corresponding syntax tree, which has = at the root with children (id,1) and +; + has children (id,2) and *; * has children (id,3) and 60.]
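A recursive-descent parser is one common way to turn the token stream into the syntax tree described above. The sketch below is illustrative only: the grammar in the docstring, the `Node` class, and the `Parser`/`show` names are assumptions for this example. The input is the 7-token sequence from the lexical-analysis slide, and the output is printed in prefix form `(op left right)`.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Interior syntax-tree node: an operator with two children."""
    op: str
    left: object
    right: object


class Parser:
    """Recursive-descent parser over tokens such as ("id", 1), ("=",), ("60",).

    Grammar assumed for this sketch (* binds tighter than +):
        assign -> id '=' expr
        expr   -> term ('+' term)*
        term   -> factor ('*' factor)*
        factor -> id | number
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse(self):
        target = self.advance()
        if target[0] != "id":
            raise SyntaxError("an assignment must start with an identifier")
        if self.advance() != ("=",):
            raise SyntaxError("expected '='")
        tree = Node("=", target, self.expr())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() == ("+",):
            self.advance()
            node = Node("+", node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() == ("*",):
            self.advance()
            node = Node("*", node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok[0] == "id" or tok[0][0].isdigit():
            return tok  # leaves are the tokens themselves
        raise SyntaxError(f"unexpected token {tok}")


def show(tree):
    """Print a syntax tree in prefix form: (op left right)."""
    if isinstance(tree, Node):
        return f"({tree.op} {show(tree.left)} {show(tree.right)})"
    return "(" + ", ".join(map(str, tree)) + ")"


tokens = [("id", 1), ("=",), ("id", 2), ("+",), ("id", 3), ("*",), ("60",)]
print(show(Parser(tokens).parse()))  # (= (id, 1) (+ (id, 2) (* (id, 3) (60))))
```

Giving `*` its own grammar level (`term`) below `+` (`expr`) is what makes `rate * 60` a subtree of the `+` node, matching the syntax tree in the figure.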
Semantic Analysis

• The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition.
• An important part of semantic analysis is type checking,
– where the compiler checks that each operator has matching operands.
• For example, many programming language definitions require an array index to be an integer; the compiler must report an error if a floating-point number is used to index an array.
• The language specification may permit some type conversions called coercions.
• For example, suppose position, initial, and rate are declared as floating-point numbers, and the lexeme 60 is an integer. The type checker finds that * is applied to a floating-point number (rate) and an integer (60), so it converts the integer into a floating-point number.

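[Figure: syntax tree for position = initial + rate * 60 after semantic analysis, with an inttofloat node inserted above 60 so that both operands of * are floating-point.]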
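The coercion above can be sketched as a recursive pass that computes the type of each subtree and wraps an integer operand in an `inttofloat` node when it meets a float. This is a hedged illustration: the `BinOp`/`IntToFloat` classes, the `types` dictionary standing in for the symbol table, and the rule that a float may not be implicitly assigned to an int variable are all assumptions made for this sketch, and only the types `int` and `float` are modelled.

```python
from dataclasses import dataclass


@dataclass
class BinOp:
    op: str        # "=", "+" or "*"
    left: object
    right: object


@dataclass
class IntToFloat:
    operand: object  # coercion node inserted by the type checker


def check(tree, types):
    """Return (annotated_tree, type).

    Leaves are tokens: ("id", k) looks up symbol-table entry k in `types`;
    a constant such as ("60",) is int unless it contains a decimal point.
    """
    if isinstance(tree, BinOp):
        left, lt = check(tree.left, types)
        right, rt = check(tree.right, types)
        if lt == rt:
            return BinOp(tree.op, left, right), lt
        if {lt, rt} != {"int", "float"}:
            raise TypeError(f"operands of {tree.op!r} have incompatible types {lt} and {rt}")
        if tree.op == "=":
            if lt == "int":
                # Assumed rule: no implicit narrowing from float to int.
                raise TypeError("cannot assign a float to an int variable")
            return BinOp("=", left, IntToFloat(right)), "float"
        # Coercion: widen whichever operand is the int.
        if lt == "int":
            left = IntToFloat(left)
        else:
            right = IntToFloat(right)
        return BinOp(tree.op, left, right), "float"
    if tree[0] == "id":
        return tree, types[tree[1]]
    return tree, ("float" if "." in tree[0] else "int")


def show(tree):
    if isinstance(tree, BinOp):
        return f"({tree.op} {show(tree.left)} {show(tree.right)})"
    if isinstance(tree, IntToFloat):
        return f"(inttofloat {show(tree.operand)})"
    return "(" + ", ".join(map(str, tree)) + ")"


# position, initial and rate (symbol-table entries 1, 2, 3) are declared float.
types = {1: "float", 2: "float", 3: "float"}
tree = BinOp("=", ("id", 1), BinOp("+", ("id", 2), BinOp("*", ("id", 3), ("60",))))
annotated, ty = check(tree, types)
print(show(annotated))  # (= (id, 1) (+ (id, 2) (* (id, 3) (inttofloat (60)))))
```

Only the `*` node sees mismatched operand types, so it is the only place a conversion node is inserted; the `+` and `=` nodes then see float on both sides.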
Intermediate Code Generation

• This phase transforms the syntax tree into an intermediate-language representation of the source program.

• A common intermediate form is three-address code,
– which consists of a sequence of instructions with at most three operands per instruction.
• For the example above, the three-address code is as follows:
t1=inttofloat(60)
t2=id3*t1
t3=id2+t2
id1=t3
• Some properties of three-address code:
– It is easy to produce.
– It is easy to translate into the target program.
– Each three-address assignment instruction has at most one operator on the right side.
– The compiler must generate a temporary name to hold the value computed by a three-address instruction.
– Some "three-address instructions" have fewer than three operands (like the first and last instructions in the example above).
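The properties listed above fall out naturally from a post-order walk of the syntax tree: each interior node emits one instruction with at most one operator, and its value is stored in a fresh temporary. The sketch below is an assumption-laden illustration, not the author's method: the `BinOp`/`IntToFloat` node classes, the `ThreeAddressGen` name, and the convention of writing symbol-table entry k as `idk` are introduced here. The input is the annotated tree from the semantic-analysis example, built by hand.

```python
import itertools
from dataclasses import dataclass


@dataclass
class BinOp:
    op: str
    left: object
    right: object


@dataclass
class IntToFloat:
    operand: object


class ThreeAddressGen:
    """Walk a syntax tree and emit three-address instructions as strings."""

    def __init__(self):
        self.code = []
        self._temps = itertools.count(1)

    def new_temp(self):
        # Each computed value gets a fresh temporary name: t1, t2, ...
        return f"t{next(self._temps)}"

    def gen(self, tree):
        """Emit code for `tree` and return the name that holds its value."""
        if isinstance(tree, IntToFloat):
            arg = self.gen(tree.operand)
            temp = self.new_temp()
            self.code.append(f"{temp} = inttofloat({arg})")
            return temp
        if isinstance(tree, BinOp):
            if tree.op == "=":
                value = self.gen(tree.right)
                target = self.gen(tree.left)
                self.code.append(f"{target} = {value}")  # copy: fewer than three operands
                return target
            left = self.gen(tree.left)
            right = self.gen(tree.right)
            temp = self.new_temp()
            self.code.append(f"{temp} = {left} {tree.op} {right}")  # one operator on the right
            return temp
        if tree[0] == "id":
            return f"id{tree[1]}"  # symbol-table entry k is written idk, as on the slide
        return tree[0]  # constants are used directly


tree = BinOp("=", ("id", 1),
             BinOp("+", ("id", 2), BinOp("*", ("id", 3), IntToFloat(("60",)))))
gen = ThreeAddressGen()
gen.gen(tree)
print("\n".join(gen.code))
# t1 = inttofloat(60)
# t2 = id3 * t1
# t3 = id2 + t2
# id1 = t3
```

Leaves emit no code, so temporaries are numbered only for interior nodes, in the order their values are computed.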
Code Optimization

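• Attempts to improve the intermediate code so that better target code will result.
– It is an optional phase.
– The goal is a better target program, for example one that runs faster or takes less memory.
– In the example above, the conversion of 60 to a floating-point number can be done once at compile time, and t3 is used only to pass its value to id1, so the code can be optimized to
t1 = id3 * 60.0
id1 = id2 + t1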
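The two improvements in the example can be sketched as small passes over a list of instructions. This is only an illustration under stated assumptions: instructions are hypothetical `(dest, op, arg1, arg2)` tuples with `"copy"` meaning `dest = arg1`, temporaries are assumed to be named `t1`, `t2`, ... and never clash with program variables, and the pass names are invented here. Real optimizers use data-flow analysis rather than these local pattern checks.

```python
def is_temp(name):
    """Compiler-generated temporaries are named t1, t2, ... in this sketch."""
    return name is not None and name[0] == "t" and name[1:].isdigit()


def count_uses(code, name):
    return sum((a == name) + (b == name) for _, _, a, b in code)


def optimize(code):
    """code is a list of (dest, op, arg1, arg2); op "copy" means dest = arg1."""
    # Pass 1: evaluate inttofloat of an integer literal at compile time and
    # substitute the resulting constant into later instructions.
    consts, folded = {}, []
    for dest, op, a, b in code:
        a, b = consts.get(a, a), consts.get(b, b)
        if op == "inttofloat" and a.isdigit() and is_temp(dest):
            consts[dest] = str(float(a))  # "60" -> "60.0"
            continue
        folded.append((dest, op, a, b))

    # Pass 2: for "x = t" where t was computed by the previous instruction and is
    # used nowhere else, compute directly into x and drop the copy.
    result = []
    for dest, op, a, b in folded:
        if (op == "copy" and is_temp(a) and result and result[-1][0] == a
                and count_uses(folded, a) == 1):
            _, prev_op, prev_a, prev_b = result.pop()
            result.append((dest, prev_op, prev_a, prev_b))
        else:
            result.append((dest, op, a, b))

    # Pass 3: renumber the surviving temporaries t1, t2, ... in order of definition.
    names = {}
    for dest, *_ in result:
        if is_temp(dest) and dest not in names:
            names[dest] = f"t{len(names) + 1}"

    def rename(x):
        return names.get(x, x)

    return [(rename(d), op, rename(a), rename(b)) for d, op, a, b in result]


def fmt(instr):
    dest, op, a, b = instr
    if op == "copy":
        return f"{dest} = {a}"
    if b is None:
        return f"{dest} = {op}({a})"
    return f"{dest} = {a} {op} {b}"


code = [
    ("t1", "inttofloat", "60", None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
for instr in optimize(code):
    print(fmt(instr))
# t1 = id3 * 60.0
# id1 = id2 + t1
```

The use count in pass 2 is the safety check: if the temporary were read again later, removing its definition would change the program.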
Code Generation
• The code generator takes as input an intermediate
representation of the source program and maps it
into the target language.
• For example, using registers R1 and R2, the optimized intermediate code above can be translated into the following machine code:

MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
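A naive code generator for the optimized three-address code can be sketched as follows. It assumes the listing above reads as `OP source, destination` (so `MULF #60.0, R2` means R2 = R2 * 60.0), that `#` marks an immediate constant, that registers are unlimited, and that each temporary is used exactly once. The `codegen` and `operand` helpers and the `(dest, op, arg1, arg2)` tuple format are hypothetical. Registers are handed out in the order R1, R2, so the output swaps R1 and R2 relative to the listing but has the same instruction sequence.

```python
OPCODES = {"+": "ADDF", "-": "SUBF", "*": "MULF", "/": "DIVF"}


def is_temp(name):
    return name[0] == "t" and name[1:].isdigit()


def operand(x, regs):
    """Register holding a temporary, #literal for a constant, else a memory name."""
    if x in regs:
        return regs[x]
    if x[0].isdigit():
        return f"#{x}"
    return x


def codegen(code):
    """Translate (dest, op, arg1, arg2) instructions with binary float operators.

    Assumptions: unlimited registers R1, R2, ...; each temporary is used exactly
    once, so the register holding it may be overwritten by the instruction using it.
    """
    asm, regs, next_reg = [], {}, 1
    for dest, op, a, b in code:
        if a in regs:
            reg = regs[a]              # a's value is already in a register
        else:
            reg = f"R{next_reg}"
            next_reg += 1
            asm.append(f"MOVF {operand(a, regs)}, {reg}")  # load first operand
        asm.append(f"{OPCODES[op]} {operand(b, regs)}, {reg}")  # reg = reg op b
        if is_temp(dest):
            regs[dest] = reg           # keep the temporary in its register
        else:
            asm.append(f"MOVF {reg}, {dest}")  # store a program variable to memory
    return asm


optimized = [("t1", "*", "id3", "60.0"), ("id1", "+", "id2", "t1")]
print("\n".join(codegen(optimized)))
# MOVF id3, R1
# MULF #60.0, R1
# MOVF id2, R2
# ADDF R1, R2
# MOVF R2, id1
```

Keeping temporaries in registers and storing only program variables back to memory is what avoids a load and store for t1.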
Symbol-Table Management

• The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name.
• Its purpose is to record the variable names used in the source program and to collect information about the various attributes of each name.
– These attributes may provide information about the storage allocated for a name, its type, and its scope.
– In the case of procedures, they also include the number and types of its arguments, the method of passing each argument (by value or by reference), and the type returned.
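One common way to organize such a table is a stack of per-scope dictionaries, sketched below. The field names (`offset`, `params`, `returns`, `level`), the `SymbolTable` methods, and the procedure name `area` are hypothetical choices for this example; they simply mirror the attributes listed above: storage, type, scope, and, for procedures, argument types, passing modes, and return type.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Symbol:
    name: str
    type: str                          # e.g. "float", "int", "procedure"
    offset: Optional[int] = None       # storage allocated for the name (hypothetical field)
    params: List[Tuple[str, str]] = field(default_factory=list)  # (type, "value"/"reference")
    returns: Optional[str] = None      # return type, for procedures
    level: int = 0                     # scope nesting depth, set by declare()


class SymbolTable:
    """A stack of scopes; lookups search from the innermost scope outward."""

    def __init__(self):
        self.scopes = [{}]

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, sym):
        scope = self.scopes[-1]
        if sym.name in scope:
            raise NameError(f"{sym.name!r} is already declared in this scope")
        sym.level = len(self.scopes) - 1
        scope[sym.name] = sym
        return sym

    def lookup(self, name):
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare(Symbol("rate", "float", offset=0))
table.declare(Symbol("area", "procedure",
                     params=[("float", "value"), ("float", "reference")],
                     returns="float"))
table.enter_scope()
table.declare(Symbol("rate", "int", offset=0))   # shadows the outer rate
print(table.lookup("rate").type, table.lookup("rate").level)  # int 1
table.exit_scope()
print(table.lookup("rate").type, table.lookup("rate").level)  # float 0
```

Searching the innermost scope first implements the usual shadowing rule; a hash table per scope keeps each lookup cheap.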
Error Handler
– Invoked when an error in the program is detected.
– Common errors include:
• The lexical analyzer may be unable to proceed because the next token in the source program is misspelled.
• The parser may be unable to infer a structure because of a syntactic error, such as a missing semicolon.
• The semantic analyzer may detect a construct that has no meaning, e.g., adding an array to a procedure.
• The code optimizer may detect that certain statements can never be reached.
• While entering names into the symbol table, the bookkeeping routine may discover an identifier that has been declared more than once with contradictory attributes.
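A minimal sketch of an error handler that the phases share: each error is recorded with its phase and line number, and the phase then continues instead of stopping at the first problem. The `ErrorHandler` class, the message format, the toy `scan` and `declare` routines, and the "skip the bad character" recovery strategy are assumptions for illustration; they cover two of the cases listed above (an unrecognizable character in the lexer and a contradictory redeclaration in the symbol table).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ErrorHandler:
    """Collects diagnostics so that every phase can report and keep going."""
    errors: List[str] = field(default_factory=list)

    def report(self, phase, line, message):
        self.errors.append(f"{phase} error, line {line}: {message}")


def scan(lines, handler):
    """Toy scanner for identifiers, integers and the characters = + - * / ;.

    An illegal character is reported and skipped (a simple panic-mode recovery),
    so one bad character does not stop the whole scan.
    """
    tokens = []
    for lineno, text in enumerate(lines, start=1):
        i = 0
        while i < len(text):
            ch = text[i]
            if ch.isspace():
                i += 1
            elif ch.isalpha() or ch == "_":
                j = i
                while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                    j += 1
                tokens.append(("id", text[i:j]))
                i = j
            elif ch.isdigit():
                j = i
                while j < len(text) and text[j].isdigit():
                    j += 1
                tokens.append(("num", text[i:j]))
                i = j
            elif ch in "=+-*/;":
                tokens.append((ch, ch))
                i += 1
            else:
                handler.report("lexical", lineno, f"illegal character {ch!r}")
                i += 1
    return tokens


def declare(table, name, type_, lineno, handler):
    """Bookkeeping routine: flag a redeclaration with a contradictory type."""
    previous = table.get(name)
    if previous is not None and previous != type_:
        handler.report("symbol-table", lineno, f"{name!r} redeclared as {type_} (was {previous})")
    else:
        table[name] = type_


handler = ErrorHandler()
tokens = scan(["x = y @ 2;", "z = $;"], handler)
symbols = {}
declare(symbols, "x", "int", 3, handler)
declare(symbols, "x", "float", 4, handler)
print("\n".join(handler.errors))
# lexical error, line 1: illegal character '@'
# lexical error, line 2: illegal character '$'
# symbol-table error, line 4: 'x' redeclared as float (was int)
```

Collecting errors rather than aborting lets a single compilation report several independent problems at once.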
