CD Unit-1 (Part-1)
Lexical Analysis:
Lexical analysis is the first phase. In this phase the compiler scans the source code
character by character and groups the characters into tokens.
Example:
x = y + 10
Tokens:
<x , Identifier>
<= , Assignment operator>
<y , Identifier>
<+ , Addition operator>
<10 , Number>
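The token list above can be reproduced with a small scanner. The following is only a minimal sketch: the table TOKEN_SPEC and the function tokenize are names assumed for illustration, and only the four token classes of this example are covered.

```python
import re

# Hypothetical token classes, limited to those used in the example above.
TOKEN_SPEC = [
    ("Number", re.compile(r"\d+")),
    ("Identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("Assignment operator", re.compile(r"=")),
    ("Addition operator", re.compile(r"\+")),
]


def tokenize(source):
    """Split source into (lexeme, token class) pairs, skipping white space."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        for name, pattern in TOKEN_SPEC:
            match = pattern.match(source, pos)
            if match:
                tokens.append((match.group(), name))
                pos = match.end()
                break
        else:
            # No pattern matched at this position: a lexical error.
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
    return tokens


for lexeme, token_class in tokenize("x = y + 10"):
    print(f"<{lexeme} , {token_class}>")
```

A character that fits no pattern raises an error here; a production lexer would normally report it and continue.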
Syntax Analyzer:
Syntax analysis is the second phase of the compilation process. It takes tokens as input
and generates a parse tree as output. In this phase, the parser checks whether the
expression formed by the tokens is syntactically correct.
Example:
KHADAR SIR 9966648277
Semantic Analyzer:
The semantic analyzer checks for type mismatches, incompatible operands, a
function called with improper arguments, an undeclared variable, etc.
Example:
float x = 20;
In the above code, the semantic analyzer typecasts the integer 20 to the float 20.0
before the assignment.
Example:
int a = 5.5;
Assigning a float value to an integer variable is a type mismatch. A strictly typed language
such as Java rejects this with a compile-time error, while a C compiler converts the value
implicitly (a becomes 5) and may only issue a warning.
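The two cases above can be sketched as one checking function. This is an illustration under an assumed strict rule set (int widens to float, float to int is rejected); check_assignment is a hypothetical name, not part of any real compiler.

```python
def check_assignment(declared_type, value):
    """Return the value to store, or raise TypeError on a type mismatch.

    Assumed rules: an int may be widened to float implicitly,
    but a float may not be narrowed to int.
    """
    value_type = "float" if isinstance(value, float) else "int"
    if declared_type == value_type:
        return value
    if declared_type == "float" and value_type == "int":
        return float(value)  # implicit typecast, as in: float x = 20;
    raise TypeError(
        f"cannot assign {value_type} value {value} to {declared_type} variable"
    )


print(check_assignment("float", 20))  # prints 20.0

try:
    check_assignment("int", 5.5)      # as in: int a = 5.5;
except TypeError as error:
    print("semantic error:", error)
```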
Intermediate Code Generation:
After the semantic analysis phase is over, the compiler generates
intermediate code.
Intermediate code lies between the high-level language and machine language. It
should be designed in a way that makes it easy to translate into
target machine code.
Example:
The statement total = count + rate * 5 is translated into:
t1 := int_to_float(5)
t2 := rate * t1
t3 := count + t2
total := t3
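A sketch of how such three-address code could be produced from an expression tree. The nested-tuple tree format and the function gen_tac are assumptions made for this illustration; the int_to_float node is taken as already inserted by the semantic analyzer.

```python
def gen_tac(target, expr):
    """Emit three-address code for `target = expr`.

    expr is an identifier, a constant, ("int_to_float", operand),
    or (operator, left, right). This tree format is hypothetical.
    """
    code = []
    temp_count = 0

    def new_temp():
        nonlocal temp_count
        temp_count += 1
        return f"t{temp_count}"

    def walk(node):
        if not isinstance(node, tuple):
            return str(node)  # leaf: identifier or constant, no code needed
        if node[0] == "int_to_float":
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} := int_to_float({operand})")
            return temp
        operator, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = new_temp()
        code.append(f"{temp} := {left_name} {operator} {right_name}")
        return temp

    result = walk(expr)
    code.append(f"{target} := {result}")
    return code


tree = ("+", "count", ("*", "rate", ("int_to_float", 5)))
print("\n".join(gen_tac("total", tree)))
```

Each interior node of the tree gets one temporary, so every emitted instruction has at most one operator on its right-hand side.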
Code Optimizer :
This phase removes unnecessary lines of code and rearranges the sequence of statements to speed up
the execution of the program without wasting resources.
Example:
Consider the following code
a = int_to_float(10)
b = c * a
d = e + b
f = d
can become
b = c * 10.0
f = e + b
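The rewrite above combines two simple optimizations: folding a conversion of a constant at compile time, and removing a copy of a value computed by the previous instruction. The sketch below assumes a hypothetical instruction format (dest, op, arg1, arg2) and also assumes that a and d are not used after this block; a real optimizer would confirm that with liveness analysis.

```python
def optimize(code):
    """Fold int_to_float on constants and remove trivial copies.

    Assumption: names that are folded away or copied from are not
    used after this block (no liveness analysis is performed here).
    """
    constants = {}  # name -> value known at compile time
    out = []
    for dest, op, arg1, arg2 in code:
        # Substitute operands whose values are already known.
        arg1 = constants.get(arg1, arg1)
        arg2 = constants.get(arg2, arg2)
        if op == "int_to_float" and isinstance(arg1, int):
            constants[dest] = float(arg1)  # fold the conversion, emit nothing
            continue
        constants.pop(dest, None)  # dest is redefined, old constant is stale
        if op == "copy" and out and out[-1][0] == arg1:
            # The copied name was produced by the previous instruction:
            # let that instruction write straight into dest.
            _, prev_op, prev_a, prev_b = out.pop()
            out.append((dest, prev_op, prev_a, prev_b))
            continue
        out.append((dest, op, arg1, arg2))
    return out


before = [
    ("a", "int_to_float", 10, None),
    ("b", "*", "c", "a"),
    ("d", "+", "e", "b"),
    ("f", "copy", "d", None),
]
for instruction in optimize(before):
    print(instruction)
```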
Code Generator:
This is the last phase of the compiler. It takes the optimized code as input and generates the target
code for the machine.
Example:
consider the following code
b =c * 10.0
f = e+b
can become
Mul
Add
Store
Symbol table: It is a data structure created and maintained by the compiler. It contains all
the identifier names along with their types, and it helps the compiler work efficiently by
letting it find identifiers quickly.
Consider the following statement:
x = a+b*50
The symbol table for the above statement records each variable name together with its
type.
Error handler: Errors may be encountered in any of the above phases. After finding an error,
the phase must deal with it so that the compilation process can continue. Errors are
reported to the error handler, which handles them so that compilation can proceed.
Generally, errors are reported in the form of messages.
In the compilation process, errors may occur in any of the phases.
Example2:
Input: int p=0, d=1, c=2;
ans: total no. of tokens = 13
Example3:
void main()
{
i/*nt*/a=10;
return;
}
ans: total tokens = 13; they are: void main ( ) { i a = 10 ; return ; }
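The counts in Example2 and Example3 can be checked mechanically. The sketch below makes simplifying assumptions: every operator or punctuation mark is a single character, and there are no string or character literals. Note that a comment is replaced by a blank, which is why i and a in Example3 are two separate tokens. The name list_tokens is hypothetical.

```python
import re

COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
# An identifier or keyword, a number, or any single non-space symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def list_tokens(source):
    """Return the tokens of source, treating each comment as one blank."""
    without_comments = COMMENT.sub(" ", source)
    return TOKEN.findall(without_comments)


example2 = "int p=0, d=1, c=2;"
example3 = "void main()\n{\ni/*nt*/a=10;\nreturn;\n}"

print(len(list_tokens(example2)))  # prints 13
print(len(list_tokens(example3)))  # prints 13
```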
INPUT BUFFERING
The lexical analyzer scans the input string one character at a time, from left
to right.
It uses two pointers, the begin pointer (bp) and the forward pointer (fp), to identify each
token.
Initially both pointers point to the first character of the input string, as shown
below.
The bp stays at the beginning of the lexeme being read while the fp moves forward
in search of the end of the lexeme. A blank space indicates the end
of the lexeme.
In the above example, as soon as fp reaches a blank space the lexeme “int” is identified.
Both pointers then move to the first character of the next lexeme, as shown below.
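The movement of the two pointers can be sketched as follows. This is a simplified illustration in which lexemes are separated only by blanks, as in the description above; the input string and the name scan_lexemes are assumptions.

```python
def scan_lexemes(text):
    """Split text into lexemes using a begin pointer and a forward pointer."""
    lexemes = []
    n = len(text)
    bp = 0
    while bp < n:
        if text[bp] == " ":
            bp += 1  # skip blanks between lexemes
            continue
        fp = bp  # both pointers start at the first character of the lexeme
        while fp < n and text[fp] != " ":
            fp += 1  # fp moves forward until a blank or the end of input
        lexemes.append(text[bp:fp])
        bp = fp  # bp catches up with fp and the search restarts
    return lexemes


print(scan_lexemes("int i = 10 ;"))
```

A real scanner ends a lexeme on any character that cannot extend the current token, not only on a blank, so `i=10;` would still split correctly there.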
RECOGNITION OF TOKENS
Tokens can be recognized by finite automata. A finite automaton is a simple
computational model with a very small amount of memory. It produces no output
other than an indication of whether the input string is accepted or rejected. There
are two notations for representing finite automata:
• Transition diagram
• Transition table
A transition diagram is a directed, labelled graph made up of nodes and edges.
Nodes represent the states and edges represent the transitions between states.
Every transition diagram has exactly one initial state, marked by an incoming arrow (-->),
and zero or more final states, each drawn as a double circle.
Example:
Finite Automata for recognizing identifiers
regular expression:
letter -> [A-Za-z_]
digit -> [0-9]
id -> letter (letter|digit)*
transition diagram:
An input file, which we call lex.l, is written in the Lex language and describes the
lexical analyzer to be generated. The Lex compiler translates lex.l into
a C program, lex.yy.c.
Finally, the C compiler compiles lex.yy.c and produces an object program, a.out.
a.out is the lexical analyzer, which transforms an input stream into a sequence of tokens.
A Lex program is separated into three sections by %% delimiters. The format of a Lex
source file is as follows:
{ declarations }
%%
{ rules }
%%
{ user subroutines }
Declarations: this section includes declarations of constants, variables and regular definitions.
Rules: this section contains the translation rules. Each rule has the syntax
pattern action
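%{
#include <stdio.h>
%}
%%
if|else|while|int|switch|for|char   { printf("keyword"); }
[a-zA-Z_][a-zA-Z0-9_]*              { printf("\n identifier"); }
[0-9]+                              { printf("\n number"); }
[ \t\n]+                            { /* skip white space */ }
.                                   { printf("invalid"); }
%%
int main(void)
{
    yylex();
    return 0;
}
int yywrap(void)
{
    return 1;   /* no further input files */
}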
Transition table for the identifier automaton (→ marks the initial state, * the final state):
State   Letter  Digit  Other
→S1     S2      -      -
S2      S2      S2     S3
*S3     -       -      -
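A transition table like the one above can drive a recognizer directly. The sketch below encodes the table as a dictionary; the helper names classify and match_identifier are assumptions. Reaching S3 on an "other" character means the identifier ended just before that character.

```python
import string

# (state, input class) -> next state, copied from the transition table.
TRANSITIONS = {
    ("S1", "letter"): "S2",
    ("S2", "letter"): "S2",
    ("S2", "digit"): "S2",
    ("S2", "other"): "S3",
}
START = "S1"
FINAL = {"S3"}


def classify(ch):
    """Map a character to the input class used by the table."""
    if ch in string.ascii_letters or ch == "_":
        return "letter"
    if ch in string.digits:
        return "digit"
    return "other"


def match_identifier(text):
    """Return the identifier at the start of text, or None if there is none."""
    state = START
    # A sentinel character plays the role of 'other' at the end of the input.
    for index, ch in enumerate(text + "\0"):
        state = TRANSITIONS.get((state, classify(ch)))
        if state is None:
            return None  # no transition defined: reject
        if state in FINAL:
            return text[:index]  # the 'other' character is not part of the lexeme
    return None


print(match_identifier("count1 = 5"))  # prints count1
print(match_identifier("9lives"))      # prints None
```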
The NFAs for all the patterns are combined into a single NFA by adding a new common initial
state, as shown below. The finite automaton recognizes the tokens of the input stream and the
action model takes the appropriate action.
SPECIFICATION OF TOKENS:
Tokens are specified using three notions:
1) Strings
2) Languages
3) Regular expressions
1. Alphabet: any finite set of symbols. For example,
{0,1} is the binary alphabet,
{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the hexadecimal alphabet,
{a-z, A-Z} is the set of letters of the English alphabet.
Example:
i. The language {w : w contains exactly two 0’s} is described by the expression
1*01*01*
Some of the strings of this language are {00, 100, 1010, 10101, 001, 010, ...}
ii. The language {w : the length of w is even} is described by the
expression ((0 U 1)(0 U 1))*
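Both expressions can be checked against their set descriptions with Python's re module, where union is written | instead of U. The function names below are chosen for this illustration only.

```python
import re
from itertools import product

EXACTLY_TWO_ZEROS = re.compile(r"1*01*01*")
EVEN_LENGTH = re.compile(r"((0|1)(0|1))*")


def has_exactly_two_zeros(w):
    """True if the whole string w matches 1*01*01*."""
    return EXACTLY_TWO_ZEROS.fullmatch(w) is not None


def has_even_length(w):
    """True if the whole string w matches ((0|1)(0|1))*."""
    return EVEN_LENGTH.fullmatch(w) is not None


def binary_strings(max_length):
    """Yield every string over {0,1} of length 0..max_length."""
    for length in range(max_length + 1):
        for symbols in product("01", repeat=length):
            yield "".join(symbols)


# Compare each expression with the definition of its language.
for w in binary_strings(8):
    assert has_exactly_two_zeros(w) == (w.count("0") == 2)
    assert has_even_length(w) == (len(w) % 2 == 0)

print(all(has_exactly_two_zeros(w) for w in ["00", "100", "1010", "10101", "001", "010"]))
```

The loop is an exhaustive check up to length 8 only, not a proof of equivalence.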
For every regular expression there is a regular language, and for every regular
language there is a regular expression.
Every regular expression or regular language can be represented by a finite
automaton, and every finite automaton can be converted to a regular expression.
Let <digit> = 0|1|2|3|...|8|9
Integer = <digit><digit>*
TRANSLATION PROCESS
A translator is used to translate a high-level language program into efficient
machine code.
It divides the translation process into a series of phases.
Each phase manages one particular aspect of translation.
Phases of Translation:
Lexical Analysis:
Lexical analysis is the first phase. In this phase the compiler scans the source code and
groups the characters into tokens.
Syntax Analysis:
Syntax analysis is the second phase of the compilation process. It takes tokens as
input and generates a parse tree as output.
Semantic Analyzer:
The semantic analyzer checks for type mismatches, incompatible operands, a
function called with improper arguments, an undeclared variable, etc.
Intermediate Code Generation:
Once the semantic analysis phase is over, the compiler generates intermediate code,
which is later translated into code for the target machine.
Code Optimizer :
This phase removes unnecessary lines of code and rearranges the sequence of statements to speed up
the execution of the program without wasting resources.
Code Generator:
This is the last phase of the compiler. It takes the optimized code as input and generates the target
code for the machine.
Example:
MAJOR DATA STRUCTURES IN A COMPILER:
The symbol table is an important data structure created and maintained by
compilers to store information about the occurrence of various entities
such as variable names, function names, objects, classes, interfaces, etc.
Implementation: a symbol table can be implemented using any of the following data structures:
• Linear list.
• Binary Search Tree.
• Hash table.
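Of the three options listed, the hash table is sketched below, using a Python dict (which is itself a hash table). The class name, its methods and the types given to x, a and b are assumptions made for illustration; the notes do not state them.

```python
class SymbolTable:
    """A minimal symbol table that maps each identifier name to its type."""

    def __init__(self):
        self._entries = {}  # dict gives average constant-time insert and lookup

    def insert(self, name, type_name):
        """Record a newly declared identifier; reject a second declaration."""
        if name in self._entries:
            raise KeyError(f"{name} is already declared")
        self._entries[name] = type_name

    def lookup(self, name):
        """Return the type of name, or None if it has not been declared."""
        return self._entries.get(name)


table = SymbolTable()
for name in ("x", "a", "b"):      # identifiers of the statement x = a+b*50
    table.insert(name, "int")     # the type int is assumed here

print(table.lookup("a"))  # prints int
print(table.lookup("y"))  # prints None
```

A linear list needs a scan over all entries for each lookup and a binary search tree needs time proportional to its height, which is why hash tables are a common choice when lookups dominate.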
BOOTSTRAPPING AND PORTING
Bootstrapping is a process used to create a new compiler. It is also used to create a cross
compiler.
For example, if programming language L2 is used to write a compiler for language L1,
this is represented by a T-diagram as shown below.