
Unit-1 CD


Phases of a compiler
[Figure: Source program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Target program. The Symbol Table Manager and the Error Handler interact with every phase.]
Let us consider the statement
position:=initial + rate*60

• Lexical Analyzer:
• It reads the characters of the source program and groups them into a stream of tokens, in which each token represents an identifier, a keyword (if, while, …), a punctuation character, an operator such as :=, and so on.
• Lexical analysis is also referred to as linear analysis or scanning.
• The character sequence forming a token is called the lexeme for the token.
• The identifiers in the statement are entered into the symbol table.
• Symbol Table
Identifier    Token    Type of identifier
position      id1      int
initial       id2      int
rate          id3      int
60            const    int

• After lexical analysis, the statement is converted into a token stream:
• position := initial + rate*60
• id1 := id2 + id3*60
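The conversion above can be sketched in a few lines of Python. This is a minimal illustration, not a complete scanner: the names `TOKEN_RE` and `tokenize` are hypothetical, and the pattern only covers identifiers, integer constants, `:=` and a handful of single-character operators.

```python
import re

# Hypothetical pattern: identifiers, integer constants, ':=' and a few operators.
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>:=|[+\-*/()]))")

def tokenize(source):
    """Return (tokens, symbol_table) for one statement."""
    symbol_table = {}            # lexeme -> 'idN', numbered in order of first appearance
    tokens = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        if match.group("id"):
            lexeme = match.group("id")
            if lexeme not in symbol_table:      # first sighting: enter into the symbol table
                symbol_table[lexeme] = f"id{len(symbol_table) + 1}"
            tokens.append(symbol_table[lexeme])
        elif match.group("num"):
            tokens.append(match.group("num"))
        else:
            tokens.append(match.group("op"))
    return tokens, symbol_table

tokens, table = tokenize("position := initial + rate*60")
print(" ".join(tokens))   # id1 := id2 + id3 * 60
print(table)
```

Each identifier is numbered the first time it is seen, which is why position, initial and rate become id1, id2 and id3.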
Syntax Analysis:
• It imposes a hierarchical structure on the token stream, which is shown by a syntax tree.
• It is also referred to as hierarchical analysis or parsing.
• It involves grouping the tokens of the source program into grammatical phrases that are used by the compiler.
• It builds the syntax tree for the given statement:
        :=
       /  \
    id1    +
          / \
       id2   *
            / \
         id3   60
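One way to build such a tree is a small recursive-descent parser. The sketch below assumes a tiny grammar chosen for this example only (an assignment whose right-hand side uses + and *, with * binding tighter) and represents the tree as nested tuples; `parse_assignment` is a hypothetical name.

```python
def parse_assignment(tokens):
    """Parse `id := expr` into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return token

    def factor():                       # factor -> id | number
        token = advance()
        if token in (":=", "+", "*"):
            raise SyntaxError(f"operand expected, found {token!r}")
        return token

    def term():                         # term -> factor ('*' factor)*
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():                         # expr -> term ('+' term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if advance() != ":=":
        raise SyntaxError("':=' expected after the target identifier")
    tree = (":=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

tree = parse_assignment(["id1", ":=", "id2", "+", "id3", "*", "60"])
print(tree)   # (':=', 'id1', ('+', 'id2', ('*', 'id3', '60')))
```

Because `term` is called from `expr`, the multiplication ends up deeper in the tree than the addition, matching the tree for the statement.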
Semantic Analyzer
• It checks the source program for semantic errors and gathers type information for the subsequent code generation phase.
• It uses the hierarchical structure produced by the syntax analysis phase to identify the operators and operands of the statements.
• Where the operand types of an operator do not match, it converts the data type of an operand and records the conversion in the tree.
• Eg: the integer 60 is converted into a real by an inttoreal node:
        :=
       /  \
    id1    +
          / \
       id2   *
            / \
         id3   inttoreal
                  |
                  60
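The type conversion step can be sketched as a recursive walk over the tree. The example below is illustrative only: it assumes the tree is stored as nested tuples and that id1, id2 and id3 are declared real, which is the case in which the conversion of 60 is needed. The function name `check_types` and the `types` dictionary are hypothetical.

```python
def check_types(node, types):
    """Return (annotated_tree, type), wrapping int operands in inttoreal where a real is needed."""
    if isinstance(node, str):                     # leaf: integer constant or identifier
        return (node, "int") if node.isdigit() else (node, types[node])
    op, left, right = node
    left, left_type = check_types(left, types)
    right, right_type = check_types(right, types)
    if op == ":=":
        if left_type == "real" and right_type == "int":
            right = ("inttoreal", right)
        return (op, left, right), left_type
    if left_type != right_type:                   # mixed arithmetic: promote the int side
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return (op, left, right), "real"
    return (op, left, right), left_type

# Assumed declarations for this example only.
types = {"id1": "real", "id2": "real", "id3": "real"}
tree = (":=", "id1", ("+", "id2", ("*", "id3", "60")))
annotated, _ = check_types(tree, types)
print(annotated)
# (':=', 'id1', ('+', 'id2', ('*', 'id3', ('inttoreal', '60'))))
```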
Intermediate Code Generator
• After syntax and semantic analysis, the compiler generates an explicit intermediate representation of the source program. The form used here is three-address code (TAC), which has two properties.
TAC Properties

• Each instruction has at most one operator in addition to the assignment.
• The compiler must generate a temporary name to hold the value computed by each instruction.
• Eg:
• id1 := id2 + id3 * inttoreal(60)
• temp1 := inttoreal(60)
• temp2 := id3 * temp1
• temp3 := id2 + temp2
• id1 := temp3
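The two TAC properties suggest a simple post-order walk: visit the operands first, then emit one instruction with a fresh temporary for the result. The following is a minimal sketch under the assumption that the annotated tree is stored as nested tuples; `generate_tac` is a hypothetical name.

```python
def generate_tac(tree):
    """Translate an annotated assignment tree into a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def emit(node):
        if isinstance(node, str):                 # identifier or constant: no code needed
            return node
        if node[0] == "inttoreal":                # unary conversion node
            operand = emit(node[1])
            temp = new_temp()
            code.append(f"{temp} := inttoreal({operand})")
            return temp
        op, left, right = node                    # binary operator node
        left_name = emit(left)
        right_name = emit(right)
        temp = new_temp()
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    _, target, expr = tree
    code.append(f"{target} := {emit(expr)}")
    return code

tree = (":=", "id1", ("+", "id2", ("*", "id3", ("inttoreal", "60"))))
tac = generate_tac(tree)
for line in tac:
    print(line)
```

For this tree the walk prints the same four instructions as the example, from temp1 := inttoreal(60) down to id1 := temp3.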
Code Optimization
• It attempts to improve the intermediate code so that faster-running machine code will result.
• It refines the output of the previous phase into fewer instructions:
• temp1 := id3 * 60.0
• id1 := id2 + temp1
• The inttoreal conversion is eliminated in this phase: the compiler converts 60 to the real constant 60.0 once, at compile time.
• The optimization works on the intermediate representation produced by the previous phase.
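The two improvements in this example (converting the constant at compile time, and removing the final copy through a temporary) can be sketched as a single pass over the instructions. This is an illustration only: the tuple layout `(dest, op, arg1, arg2)`, the `copy` operator name and the function `optimize` are assumptions, and the copy elimination is only safe because each temporary here is used exactly once.

```python
def optimize(code):
    """code: list of (dest, op, arg1, arg2) tuples; returns a shorter equivalent list."""
    constants = {}                                # temporary name -> folded constant
    result = []
    for dest, op, arg1, arg2 in code:
        arg1 = constants.get(arg1, arg1)          # substitute known constants
        arg2 = constants.get(arg2, arg2)
        if op == "inttoreal" and arg1.isdigit():
            constants[dest] = f"{arg1}.0"         # do the conversion at compile time
            continue
        if (op == "copy" and result and result[-1][0] == arg1
                and arg1.startswith("temp")):
            # `t := a op b` followed by `x := t` becomes `x := a op b`.
            _, prev_op, prev1, prev2 = result.pop()
            result.append((dest, prev_op, prev1, prev2))
            continue
        result.append((dest, op, arg1, arg2))
    return result

tac = [
    ("temp1", "inttoreal", "60", None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]
optimized = optimize(tac)
for dest, op, arg1, arg2 in optimized:
    print(f"{dest} := {arg1} {op} {arg2}")
```

The sketch does not renumber temporaries, so the surviving temporary is called temp2 rather than temp1; the two remaining instructions are otherwise the ones listed above.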
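Code Generation:
• The final phase of the compiler generates the target code, normally relocatable machine code or assembly code.
• Each instruction of the optimized code is translated into a sequence of machine instructions, and the variables are assigned to registers.
• Eg:
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
• The F in each instruction indicates a floating-point operation, and # marks 60.0 as a constant. Registers R1 and R2 hold the intermediate values, and the last instruction stores the result in id1.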
Symbol Table Management

• The symbol table is a data structure containing a record for each identifier, organized so that records can be stored and retrieved quickly.
• The fields of a record hold the attributes of the identifier. These attributes may provide information about the storage allocated for the identifier, its type, and its scope.
• Whenever an identifier is detected by the lexical analyzer, it is entered into the symbol table.
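A hash table gives the fast store and retrieve operations described above. The sketch below uses a Python dictionary; the class name `SymbolTable`, its methods and the attribute names are hypothetical, chosen only to illustrate that the lexical analyzer creates the record and later phases fill in the attributes.

```python
class SymbolTable:
    """One record (a dict of attributes) per identifier, keyed by name."""

    def __init__(self):
        self._records = {}                        # hash table: name -> attribute record

    def insert(self, name, **attributes):
        """Create the record on first sight of `name`; later calls update its attributes."""
        record = self._records.setdefault(name, {"name": name})
        record.update(attributes)
        return record

    def lookup(self, name):
        """Return the record for `name`, or None if it was never entered."""
        return self._records.get(name)

table = SymbolTable()
for lexeme in ("position", "initial", "rate"):
    table.insert(lexeme)                          # entered by the lexical analyzer
table.insert("rate", type="int", scope="global")  # attributes added by a later phase
print(table.lookup("rate"))      # {'name': 'rate', 'type': 'int', 'scope': 'global'}
print(table.lookup("velocity"))  # None
```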
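• ERROR DETECTION AND REPORTING
• Each phase can encounter errors. After detecting an error, a phase must deal with it in some way so that compilation can proceed and further errors can be detected.
• A compiler that stops when it finds the first error is not as helpful as it could be.
• The syntax and semantic analysis phases usually handle a large fraction of the errors.
• The lexical phase can detect errors where the characters remaining in the input do not form any token of the language.
• Errors where the token stream violates the syntax of the language are detected during syntax analysis.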
LEXICAL ANALYZER
• The lexical analyzer (LA), or scanner, is the first phase of the compiler.
• It converts the high-level input program into a sequence of tokens.
• The LA is implemented with a DFA (deterministic finite automaton).
• Its output is a sequence of tokens that is sent to the parser for syntax analysis.
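[Figure: Input → Lexical analyzer → Syntax analyzer. The lexical analyzer reads characters from the input and pushes back any extra character it has read; the syntax analyzer asks for tokens and the lexical analyzer returns them.]
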
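The interaction in the figure can be sketched as a function that the parser calls each time it asks for a token. This is a minimal hand-coded DFA with three states, assuming only identifiers, integer constants and single-character operators; the names `next_token` and `all_tokens` are hypothetical. Pushing back the extra character is modelled by returning a position that does not include it.

```python
def next_token(text, pos):
    """Run a small DFA from `pos`; return (token, lexeme, new_pos), or None at end of input."""
    while pos < len(text) and text[pos].isspace():
        pos += 1                                  # skip white space
    if pos == len(text):
        return None
    state, start = "start", pos
    while True:
        ch = text[pos] if pos < len(text) else ""   # "" stands for end of input
        if state == "start":
            if ch.isalpha():
                state = "in_id"
            elif ch.isdigit():
                state = "in_num"
            else:
                return ("op", ch, pos + 1)        # single-character token
        elif state == "in_id" and not ch.isalnum():
            return ("id", text[start:pos], pos)   # extra character is pushed back
        elif state == "in_num" and not ch.isdigit():
            return ("num", text[start:pos], pos)  # extra character is pushed back
        pos += 1

def all_tokens(text):
    """Play the role of the parser: keep asking for tokens until the input ends."""
    pos, out = 0, []
    while (result := next_token(text, pos)) is not None:
        token, lexeme, pos = result
        out.append((token, lexeme))
    return out

print(all_tokens("rate*60"))   # [('id', 'rate'), ('op', '*'), ('num', '60')]
```

The scanner only knows that the identifier rate has ended once it has read the `*`, which is why that character has to be handed back before the next request.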
Tokens
• A token is a sequence of characters that can be treated as a unit in the grammar of the programming language.
• Eg:
• 1. Type tokens (id, number, real, …)
• 2. Keywords, which are alphabetic tokens: if, void, return, for, while, etc.
• 3. Identifiers: variable names, function names, etc.
• 4. Operators: +, ++, -, etc.
• 5. Separators (punctuation): "," , ";" etc.
Examples of non-tokens:
• Comments
• Preprocessor directives
• Macros
• Blanks
• Tabs
• Newline
• Lexeme:
The sequence of characters matched by a pattern to form the corresponding token, that is, the sequence of input characters that comprises a single token, is called a lexeme.
• Eg: float, =, 273, abs_zero
• Functions of LA:
• Tokenization, that is, dividing the program into valid tokens
• Removing white space characters
• Removing comments
• Helping to generate error messages
• For example, a = b + c is converted into id = id + id
• Eg:
int main()
{
    // 2 variables
    int a, b;
    a = 10;
    return 0;
}
The valid tokens are:
'int', 'main', '(', ')', '{', 'int', 'a', ',', 'b', ';', 'a', '=', '10', ';', 'return', '0', ';', '}'
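The token list for this program can be reproduced with a short regular-expression tokenizer. The sketch below is illustrative and covers only what the example needs (words, integer constants, a few punctuation characters, comments and white space); `C_TOKEN` and `c_tokens` are hypothetical names.

```python
import re

C_TOKEN = re.compile(r"""
    (?P<skip>//[^\n]*|/\*.*?\*/|\s+)      # comments and white space: discarded
  | (?P<word>[A-Za-z_]\w*)                # keywords and identifiers
  | (?P<number>\d+)                       # integer constants
  | (?P<punct>[(){};,=])                  # punctuation and '='
""", re.VERBOSE | re.DOTALL)

def c_tokens(source):
    """Return the lexemes of the valid tokens, dropping comments and white space."""
    tokens, pos = [], 0
    while pos < len(source):
        match = C_TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        if match.lastgroup != "skip":
            tokens.append(match.group())
        pos = match.end()
    return tokens

program = """int main()
{
    // 2 variables
    int a, b;
    a = 10;
    return 0;
}"""
result = c_tokens(program)
print(result)
print(len(result))   # 18
```

The comment and every blank, tab and newline match the `skip` group and never reach the output, which leaves the 18 tokens of the example.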
• Patterns:
• There is a set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern.
Token       Lexeme                 Pattern
const       const                  const
if          if                     if
relation    <, <=, >, >=, <>       < or <= or > or >= or <>
id          pi, count, D2          a letter followed by letters and digits
num         3.146, 0.6             any numeric constant
literal     "core dumped"          any characters between " and "
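Each row of the table can be written as a regular expression. The sketch below is one possible encoding, not a definitive one: the regular expressions are assumptions that follow the informal descriptions in the table, and `PATTERNS` and `classify` are hypothetical names. Keywords are listed before `id` so that `if` is not reported as an identifier.

```python
import re

# (token, pattern) pairs, tried in order.
PATTERNS = [
    ("const",    r"const"),
    ("if",       r"if"),
    ("relation", r"<=|>=|<>|<|>"),
    ("id",       r"[A-Za-z][A-Za-z0-9]*"),       # a letter followed by letters and digits
    ("num",      r"\d+(\.\d+)?"),                # a simple numeric constant
    ("literal",  r'"[^"]*"'),                    # any characters between " and "
]

def classify(lexeme):
    """Return the token whose pattern matches the whole lexeme, or None."""
    for token, pattern in PATTERNS:
        if re.fullmatch(pattern, lexeme):
            return token
    return None

for lexeme in ["const", "if", "<=", "count", "D2", "3.146", '"core dumped"']:
    print(lexeme, "->", classify(lexeme))
```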
• Issues in LA:
• 1. A large amount of time is spent reading the source program and partitioning it into tokens.
• 2. Specialized buffering techniques for reading input characters and processing tokens can significantly speed up the performance of a compiler.
ERROR RECOVERY IN LA:
1.Deleting an extraneous character
2.Inserting a missing character
3.Replacing an incorrect character by a correct character
4.Transposing two adjacent characters
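One way to use these four transformations is to try each of them on the erroneous lexeme and keep the results that are valid lexemes. The sketch below is only an illustration of that idea: the keyword set, the lowercase alphabet and the name `repairs` are assumptions, and a real lexical analyzer would usually restrict the search much more.

```python
import string

KEYWORDS = {"if", "while", "for", "return", "int", "float"}   # assumed set of valid lexemes

def repairs(lexeme, valid=KEYWORDS, alphabet=string.ascii_lowercase):
    """Return the valid lexemes reachable from `lexeme` by one of the four transformations."""
    splits = [(lexeme[:i], lexeme[i:]) for i in range(len(lexeme) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}                          # 1. delete
    inserts = {a + c + b for a, b in splits for c in alphabet}             # 2. insert
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}   # 3. replace
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}  # 4. transpose
    return sorted((deletes | inserts | replaces | transposes) & valid)

print(repairs("whlie"))   # ['while']   (transposing two adjacent characters)
print(repairs("retrn"))   # ['return']  (inserting a missing character)
print(repairs("intt"))    # ['int']     (deleting an extraneous character)
print(repairs("foat"))    # ['float']   (inserting a missing character)
```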
