CD Lexical
The lexical analyzer uses two pointers, the begin pointer (bp) and the forward pointer (fp), to keep
track of the portion of the input that has been scanned.
The forward pointer moves ahead to search for the end of the lexeme. As soon as a blank space is
encountered, it indicates the end of the lexeme. In the above example, as soon as fp encounters a
blank space, the lexeme “int” is identified. When fp encounters white space, it ignores it and
moves ahead; then both bp and fp are set to the start of the next token.
The input characters are thus read from secondary storage, but reading in this way from secondary
storage is costly, hence a buffering technique is used. A block of data is first read into a
buffer and then scanned by the lexical analyzer.
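The two-pointer scan described above can be sketched as follows. This is only a minimal sketch, assuming the block of input is already held in an in-memory buffer (a Python string) and that lexemes are separated only by white space; the function name scan_lexemes is hypothetical.

```python
def scan_lexemes(buffer):
    """Split buffer into lexemes using a begin pointer (bp) and a forward pointer (fp)."""
    lexemes = []
    bp = 0
    n = len(buffer)
    while bp < n:
        # Skip white space so that bp sits on the first character of the next lexeme.
        if buffer[bp].isspace():
            bp += 1
            continue
        # Move fp ahead until it meets white space or the end of the buffer.
        fp = bp
        while fp < n and not buffer[fp].isspace():
            fp += 1
        lexemes.append(buffer[bp:fp])
        # Both pointers restart just after the lexeme that was recognized.
        bp = fp
    return lexemes


print(scan_lexemes("int a = 10 ;"))  # ['int', 'a', '=', '10', ';']
```

A real scanner would end a lexeme on pattern boundaries as well (for example between "a" and "="), not only on white space.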
Finite Automata (FA) is the simplest machine to recognize patterns. It takes a string of symbols as
input and changes its state accordingly. When the desired symbol is found, a transition
occurs. At the time of transition, the automaton can either move to the next state or stay in the
same state. A finite automaton gives one of two outcomes for an input string: accept or reject.
When the input string is processed successfully and the automaton has reached a final state, the
string is accepted; otherwise it is rejected.
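The behaviour described above can be sketched as a table-driven automaton. The example below is an assumption made for illustration: a small deterministic automaton that accepts identifiers (a letter or underscore followed by letters, underscores or digits). The state names and the helper classify are hypothetical.

```python
def classify(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# Transition table: (state, input class) -> next state.
# A missing entry means the automaton has no move and the string is rejected.
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("in_id", "letter"): "in_id",  # stays in the same state
    ("in_id", "digit"): "in_id",
}
ACCEPTING = {"in_id"}


def accepts(text):
    state = "start"
    for ch in text:
        key = (state, classify(ch))
        if key not in TRANSITIONS:
            return False
        state = TRANSITIONS[key]
    # Accept only if the whole input was consumed and a final state was reached.
    return state in ACCEPTING


print(accepts("count1"))  # True
print(accepts("1count"))  # False
```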
REGULAR EXPRESSION-
Regular expressions are used to denote regular languages. An expression is regular if:
• ɸ is a regular expression for the regular language ɸ (the empty language).
• ɛ is a regular expression for the regular language {ɛ}.
• If a ∈ Σ (Σ represents the input alphabet), a is a regular expression with language {a}.
• If a and b are regular expressions, a + b is also a regular expression whose language is the
union of the languages of a and b (for single symbols a and b, this is {a,b}).
• If a and b are regular expressions, ab (concatenation of a and b) is also a regular expression.
• If a is a regular expression, a* (zero or more repetitions of a) is also a regular expression.
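The operators above map directly onto the syntax of Python's re module, with one notational difference: union is written | instead of +. The snippet below is a small illustration, assuming the example expression (a + b)*abb, which is not taken from these notes.

```python
import re

# (a + b)* a b b in the notation above: any string of a's and b's ending in "abb".
pattern = re.compile(r"(a|b)*abb")


def in_language(s):
    """True if the whole string s belongs to the language of the pattern."""
    return pattern.fullmatch(s) is not None


print(in_language("abb"))      # True
print(in_language("babaabb"))  # True
print(in_language("abba"))     # False

# a* matches zero or more a's, so the empty string is in its language.
print(re.fullmatch(r"a*", "") is not None)  # True
```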
A Directed Acyclic Graph (DAG) is a tool that depicts the structure of basic blocks, helps to see
the flow of values among the basic blocks, and offers optimization too. A DAG provides easy
transformation on basic blocks. A DAG can be understood here:
• Leaf nodes represent identifiers, names or constants.
• Interior nodes represent operators.
• Interior nodes also represent the results of expressions or the identifiers/name where the
values are to be stored or assigned.
Q7. Parameter passing is the communication medium among the procedures. By using some
mechanism, the variable values from the calling procedure are transferred to the called
procedure. Some of the terms related to value are:
r-value
r-value is the value of an expression. If the value of a single variable appears on the right-hand
side of the assignment operator, the variable is used as an r-value. R-values are always assigned
to some other variable.
l-value
The memory location where the value of an expression is stored is known as the l-value of the
expression. It always appears on the left-hand side of an assignment operator.
For instance:
day=1;
week=day*7;
month=1;
year=month*12;
It is understood that the constant values like 1, 7, 12, and variables like day, week, month and
year, all have r-values. Only the variables have l-values, as they also represent the memory
locations assigned to them.
For instance:
7=x+y;
is an l-value error, as the constant 7 does not represent any memory location.
Q10. Step-01:
S → bSS’ / a
S’ → SaaS / SaSb / b
Again, this is a grammar with common prefixes.
Step-02:
S → bSS’ / a
S’ → SaA / b
A → aS / Sb
This is a left factored grammar.
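The factoring step from Step-01 to Step-02 can be sketched as below. This is a minimal sketch under stated assumptions: each alternative is a non-empty tuple of symbols, only the first group of alternatives sharing a first symbol is factored, and the function names are hypothetical.

```python
def common_prefix(alternatives):
    """Longest common prefix of a list of symbol tuples."""
    prefix = []
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor_once(head, alternatives, new_name):
    """Factor the first group of alternatives of head that share a first symbol."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0], []).append(alt)
    for group in groups.values():
        if len(group) > 1:
            prefix = common_prefix(group)
            rest = [alt for alt in alternatives if alt not in group]
            # An alternative equal to the prefix leaves the empty string behind.
            suffixes = [alt[len(prefix):] or ("ε",) for alt in group]
            return {head: [prefix + (new_name,)] + rest, new_name: suffixes}
    return {head: list(alternatives)}


# S' -> SaaS / SaSb / b   becomes   S' -> SaA / b,  A -> aS / Sb
result = left_factor_once(
    "S'", [("S", "a", "a", "S"), ("S", "a", "S", "b"), ("b",)], "A"
)
for name, alts in result.items():
    print(name, "->", " / ".join("".join(alt) for alt in alts))
```

Running it prints S' -> SaA / b and A -> aS / Sb, which agrees with Step-02.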
PART-B
Syntax trees are abstract or compact representations of parse trees. They are also called Abstract
Syntax Trees.
Syntax trees are called Abstract Syntax Trees because-
• They are abstract representations of the parse trees.
• They do not provide every characteristic information from the real syntax.
• For example- no rule nodes, no parentheses etc.
i. ( a + b ) * ( c - d ) + ( ( e / f ) * ( a + b ))
Step-01:
We convert the given arithmetic expression into a postfix expression as-
(a+b)*(c-d)+((e/f)*(a+b))
ab+ * ( c - d ) + ( ( e / f ) * ( a + b ) )
ab+ * cd- + ( ( e / f ) * ( a + b ) )
ab+ * cd- + ( ef/ * ( a + b ) )
ab+ * cd- + ( ef/ * ab+ )
ab+ * cd- + ef/ab+*
ab+cd-* + ef/ab+*
ab+cd-*ef/ab+*+
Step-02:
We draw a syntax tree for the above postfix expression.
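The conversion in Step-01 is done by hand, one sub-expression at a time. A mechanical alternative is the operator-stack (shunting-yard) method, sketched below. It is not the method used in these notes. The sketch assumes single-character operands, the four left-associative operators + - * /, and the ASCII minus sign; the function name to_postfix is hypothetical.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(expression):
    """Convert an infix expression with single-character operands to postfix."""
    output = []
    stack = []
    for ch in expression:
        if ch.isspace():
            continue
        if ch.isalnum():
            output.append(ch)
        elif ch == "(":
            stack.append(ch)
        elif ch == ")":
            # Pop operators back to the matching "(" and discard the "(".
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()
        else:
            # Left-associative operators: pop while the top has higher or equal precedence.
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]:
                output.append(stack.pop())
            stack.append(ch)
    while stack:
        output.append(stack.pop())
    return "".join(output)


print(to_postfix("(a+b)*(c-d)+((e/f)*(a+b))"))  # ab+cd-*ef/ab+*+
```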
ii. A+4-b+3
Step-01:
A+4-b+3
A4+ - b + 3
A4+b- + 3
A4+b-3+
Step-02:
We draw a syntax tree for the above postfix expression.
Q2. The design of a compiler can be decomposed into several phases, each of which converts one form
of the source program into another.
The different phases of a compiler are as follows:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generation
5. Code optimization
6. Code generation
All of the aforementioned phases involve the following tasks:
• Symbol table management.
• Error handling.
Lexical Analysis
• Lexical analysis is the first phase of a compiler, also termed scanning.
• The source program is scanned as a stream of characters, and those characters are grouped into
sequences called lexemes; a token is produced as output for each lexeme.
• Token: A token is a sequence of characters that represents a lexical unit and matches a
pattern, such as keywords, operators, identifiers etc.
• Lexeme: A lexeme is an instance of a token, i.e., a group of characters forming a token.
• Pattern: A pattern describes the rule that the lexemes of a token follow. It is the structure
that must be matched by strings.
• Once a token is generated, the corresponding entry is made in the symbol table.
Input: stream of characters
Output: Token
Token Template: <token-name, attribute-value>
(eg.) c=a+b*5;
Lexemes and tokens
Lexemes    Tokens
c          identifier
=          assignment symbol
a          identifier
+          + (addition symbol)
b          identifier
*          * (multiplication symbol)
5          5 (number)
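The grouping of characters into lexemes and tokens can be sketched with a small pattern-driven scanner. This is an illustrative sketch only: the token names and patterns in TOKEN_SPEC are assumptions, and each token is returned as a (token-name, lexeme) pair rather than with a symbol-table attribute.

```python
import re

# Token patterns, tried in order. Names and patterns are illustrative assumptions.
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("assignment", r"="),
    ("operator", r"[+\-*/]"),
    ("semicolon", r";"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token-name, lexeme) pairs for the source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":  # white space produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("c=a+b*5;"):
    print(token)
```

For the input c=a+b*5; this yields identifier, assignment, identifier, operator, identifier, operator, number and semicolon tokens, in that order.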
Semantic Analysis
• Semantic analysis is the third phase of a compiler.
• It checks for semantic consistency.
• Type information is gathered and stored in the symbol table or in the syntax tree.
• It performs type checking.
Code Optimization
• The code optimization phase gets the intermediate code as input and produces optimized
intermediate code as output.
• It results in faster-running machine code.
• It can be done by reducing the number of instructions in the program.
• This phase removes redundant code and attempts to improve the intermediate code so that
faster-running machine code will result.
• During code optimization, the result of the program is not affected.
• To improve the code generation, the optimization involves
o Detection and removal of dead code (unreachable code).
o Calculation of constants in expressions and terms.
o Collapsing of repeated expressions into a temporary variable.
o Loop unrolling.
o Moving code outside the loop.
o Removal of unwanted temporary variables.
t1 = id3* 5.0
id1 = id2 + t1
Code Generation
• Code generation is the final phase of a compiler.
• It gets its input from the code optimization phase and produces the target code or object code as
the result.
• Intermediate instructions are translated into a sequence of machine instructions that perform the
same task.
• The code generation involves
o Allocation of registers and memory.
o Generation of correct references.
o Generation of correct data types.
o Generation of missing code.
LDF R2, id3
MULF R2, # 5.0
LDF R1, id2
ADDF R1, R2
STF id1, R1
Q3. Part-01:
Part-02:
Step-01:
We identify the leader statements as-
• prod = 0 is a leader because the first statement is a leader.
• T1 = 4 x i is a leader because the target of a conditional or unconditional goto is a leader.
Step-02:
The above generated three address code can be partitioned into 2 basic blocks as-
Step-03:
The flow graph is:
TYPE CONVERSION:
Type conversion generally means converting a value of one type to another type when values of
different types are combined. In other words, when two variables of different data types are
used together or are assigned to each other, we need to convert the type of one variable to match
the type of the other. In programming languages this generally concerns basic data types such
as:
int
char
float
long int
double and so on.
Type conversion: The process of converting one predefined type to another type is called type
conversion. Type conversions are of two types:
1. IMPLICIT- Implicit conversion is when the compiler automatically converts one predefined data
type to another data type.
2. EXPLICIT- Explicit type conversion is a type conversion which is done explicitly, i.e., by the
programmer instead of leaving it up to the compiler to perform automatically.
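The two kinds of conversion can be illustrated as follows. This is only an illustration in Python, where conversions happen at run time; in a compiled language the compiler inserts the implicit conversion during semantic analysis. The variable names are arbitrary.

```python
# Implicit conversion: the int operand is converted to float automatically.
total = 2 + 3.5
print(type(total).__name__, total)  # float 5.5

# Explicit conversion: the programmer requests the conversion by name.
truncated = int(3.99)   # float -> int, the fractional part is dropped
as_text = str(42)       # int -> str
parsed = float("2.5")   # str -> float
print(truncated, as_text, parsed)  # 3 42 2.5
```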
Q5. ACTIVATION RECORD
o The control stack is a run-time stack which is used to keep track of the live procedure
activations, i.e., it is used to find the procedures whose execution has not been
completed.
o When a procedure is called (activation begins), the procedure name is pushed onto the stack,
and when it returns (activation ends), it is popped.
o An activation record is used to manage the information needed by a single execution of a
procedure.
o An activation record is pushed onto the stack when a procedure is called and it is popped
when control returns to the caller.
ACTIVATION TREE
A program is a sequence of instructions combined into a number of procedures, and the instructions
in a procedure are executed sequentially. A procedure has a start and an end delimiter, and
everything between them is the body of the procedure. The procedure identifier and the sequence of
finite instructions inside it make up the body of the procedure.
The process of executing a procedure is called its activation. The necessary information for
calling a procedure is available in an activation record. Depending upon the source language used,
the activation record contains some of the following units:
Machine Status      Stores machine status such as registers, program counter etc., before the procedure is called.
Control Link        Stores the address of the activation record of the caller procedure.
Access Link         Stores the information of data which is outside the local scope.
Actual Parameters   Stores actual parameters, i.e., parameters which are used to send input to the called procedure.
A basic block is a sequence of consecutive statements in which flow of control enters at the
beginning and leaves at the end without halt or possibility of branching except at the end.
Method:
1. We first determine the set of leaders, for that we use the following rules:
I) The first statement is a leader.
II) Any statement that is the target of a conditional or unconditional goto is a leader.
III) Any statement that immediately follows a goto or conditional goto statement is a leader.
2. For each leader, its basic block consists of the leader and all statements up to but not
including the next leader or the end of the program.
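The method above can be sketched as follows. The sketch assumes that statements are numbered from 1 and that jump targets are written in the form goto (n); the sample code is a twelve-statement dot-product loop, and the function names are hypothetical.

```python
import re


def find_leaders(code):
    """code is a list of three-address statements; statement i is code[i - 1]."""
    leaders = {1}                                  # rule I
    for number, stmt in enumerate(code, start=1):
        match = re.search(r"goto \((\d+)\)", stmt)
        if match:
            leaders.add(int(match.group(1)))       # rule II
            if number < len(code):
                leaders.add(number + 1)            # rule III
    return sorted(leaders)


def basic_blocks(code):
    """Each block runs from a leader up to, but not including, the next leader."""
    leaders = find_leaders(code)
    ends = leaders[1:] + [len(code) + 1]
    return [list(range(start, end)) for start, end in zip(leaders, ends)]


CODE = [
    "prod := 0", "i := 1", "t1 := 4*i", "t2 := a[t1]", "t3 := 4*i",
    "t4 := b[t3]", "t5 := t2*t4", "t6 := prod+t5", "prod := t6",
    "t7 := i+1", "i := t7", "if i<=20 goto (3)",
]
print(find_leaders(CODE))   # [1, 3]
print(basic_blocks(CODE))   # [[1, 2], [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
```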
1) STRUCTURE-PRESERVING TRANSFORMATIONS
The primary structure-preserving transformations on basic blocks are:
1. common sub-expression elimination
2. dead-code elimination
3. renaming of temporary variables
4. interchange of two independent adjacent statements
• Common sub-expression elimination
✓ Consider the basic block
a:= b+c
b:= a-d
c:= b+c
d:= a-d
✓ The second and fourth statements compute the same expression, hence this basic block may
be transformed into the equivalent block
a:= b+c
b:= a-d
c:= b+c
d:= b
✓ Although the 1st and 3rd statements in both cases appear to have the same expression on the
right, the second statement redefines b. Therefore, the value of b in the 3rd statement is
different from the value of b in the 1st, and the 1st and 3rd statements do not compute the same
expression.
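The reasoning above, including the way a redefinition of b invalidates b+c, can be sketched as a single forward pass over the block. This is a minimal sketch under stated assumptions: every statement has the form target := left op right, commutativity is ignored, and the tuple representation and function name are hypothetical.

```python
def eliminate_common_subexpressions(block):
    """block is a list of (target, left, op, right) tuples."""
    available = {}  # (left, op, right) -> variable that currently holds its value
    result = []
    for target, left, op, right in block:
        expr = (left, op, right)
        if expr in available:
            result.append((target, available[expr]))   # copy instead of recomputing
        else:
            result.append((target, left, op, right))
        # Assigning to target kills expressions that use target or are held in it.
        available = {
            e: holder
            for e, holder in available.items()
            if target not in (e[0], e[2]) and holder != target
        }
        # The new value is reusable only if target is not one of its own operands.
        if expr not in available and target not in (left, right):
            available[expr] = target
    return result


block = [
    ("a", "b", "+", "c"),
    ("b", "a", "-", "d"),
    ("c", "b", "+", "c"),
    ("d", "a", "-", "d"),
]
for stmt in eliminate_common_subexpressions(block):
    print(stmt)
```

The last statement comes out as ('d', 'b'), i.e. d := b, while the third statement is kept because b was redefined after the first one.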
• Dead-code elimination
✓ Suppose x is dead, that is, never subsequently used, at the point where the statement x := y+z
appears in a basic block. Then this statement may be safely removed without changing the
value of the basic block.
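Dead-code elimination inside one basic block can be sketched as a backward pass that tracks which names are still needed. The sketch assumes that statements have no side effects and that the set of names live on exit from the block is given; the (target, operands) representation and the function name are hypothetical.

```python
def eliminate_dead_code(block, live_out):
    """Remove assignments whose target is dead at that point.

    block: list of (target, operands) pairs; live_out: names live after the block.
    """
    live = set(live_out)
    kept = []
    for target, operands in reversed(block):
        if target in live:
            kept.append((target, operands))
            live.discard(target)      # this definition satisfies the later uses
            live.update(operands)     # its operands are needed before this point
        # Otherwise the target is never used afterwards and the statement is dropped.
    kept.reverse()
    return kept


block = [
    ("x", ("y", "z")),   # x := y+z, dead: x is overwritten before any use
    ("a", ("b", "c")),   # a := b+c
    ("x", ("a", "d")),   # x := a-d
]
print(eliminate_dead_code(block, live_out={"x"}))
# [('a', ('b', 'c')), ('x', ('a', 'd'))]
```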
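• Renaming of temporary variables
✓ Suppose we have a statement t := b+c, where t is a temporary. If we change this statement to
u := b+c, where u is a new temporary variable, and change all uses of this instance of t to u,
then the value of the basic block is not changed.
✓ In fact, we can always transform a basic block into an equivalent block in which each
statement that defines a temporary defines a new temporary. We call such a basic block a
normal-form block.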
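• Interchange of statements
✓ Suppose a block has the two adjacent statements t1 := b+c and t2 := x+y. The two statements can
be interchanged without affecting the value of the block if and only if neither x nor y is t1
and neither b nor c is t2.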
2) ALGEBRAIC TRANSFORMATIONS
✓ Countless algebraic transformations can be used to change the set of expressions computed by
the basic block into an algebraically equivalent set. The useful ones are those that simplify
expressions or replace expensive operations by cheaper ones.
✓ Example: statements such as x = x+0 or x = x*1 can be eliminated.
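A few such transformations can be sketched as a rewrite on a single statement. The sketch assumes statements of the form target := left op right with operands given as strings. The identities it uses (adding or subtracting 0, multiplying or dividing by 1) and the replacement of a multiplication by 2 with an addition are chosen for illustration; the function name simplify is hypothetical.

```python
def simplify(target, left, op, right):
    """Return None if the statement is a no-op, else a possibly cheaper statement."""
    # x := x+0, x := x-0, x := x*1 and x := x/1 leave x unchanged.
    if target == left and (op, right) in {("+", "0"), ("-", "0"), ("*", "1"), ("/", "1")}:
        return None
    # x := 0+x and x := 1*x leave x unchanged as well.
    if target == right and (op, left) in {("+", "0"), ("*", "1")}:
        return None
    # Replace an expensive operation by a cheaper one: y := x*2 becomes y := x+x.
    if op == "*" and right == "2":
        return (target, left, "+", left)
    return (target, left, op, right)


print(simplify("x", "x", "+", "0"))  # None
print(simplify("x", "x", "*", "1"))  # None
print(simplify("y", "x", "*", "2"))  # ('y', 'x', '+', 'x')
print(simplify("x", "x", "+", "1"))  # ('x', 'x', '+', '1')
```

The last call shows that x := x+1 changes x and therefore has to stay.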
3) FLOW GRAPH
✓ A graph representation of three-address statements, called a flow graph, is useful for
understanding code-generation algorithms.
✓ Nodes in the flow graph represent computations, and the edges represent the flow of control.
✓ Example of a flow graph for the following three-address code:
(1) prod=0
(2) i=1
(3) t1 := 4*i
(4) t2 := a [ t1 ]
(5) t3 := 4*i
(6) t4 :=b [ t3 ]
(7) t5 := t2*t4
(8) t6 := prod +t5
(9) prod := t6
(10) t7 := i+1
(11) i := t7
(12) if i<=20 goto (3)
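The edges of the flow graph for this code can be computed from the basic blocks, as sketched below. The sketch assumes that the blocks are already known (statements 1 to 2 and statements 3 to 12), that the names B1 and B2 are used for them, and that jumps are written in the form goto (n); the function name flow_graph is hypothetical.

```python
import re


def flow_graph(blocks):
    """blocks: dict name -> list of (number, statement). Returns name -> successors."""
    names = list(blocks)
    block_of = {number: name for name, stmts in blocks.items() for number, _ in stmts}
    edges = {}
    for index, name in enumerate(names):
        _, last = blocks[name][-1]
        successors = []
        match = re.search(r"goto \((\d+)\)", last)
        if match:
            successors.append(block_of[int(match.group(1))])   # edge to the jump target
        # Control falls through unless the block ends in an unconditional goto.
        falls_through = not last.startswith("goto")
        if falls_through and index + 1 < len(names):
            successors.append(names[index + 1])
        edges[name] = successors
    return edges


BLOCKS = {
    "B1": [(1, "prod := 0"), (2, "i := 1")],
    "B2": [
        (3, "t1 := 4*i"), (4, "t2 := a[t1]"), (5, "t3 := 4*i"), (6, "t4 := b[t3]"),
        (7, "t5 := t2*t4"), (8, "t6 := prod+t5"), (9, "prod := t6"),
        (10, "t7 := i+1"), (11, "i := t7"), (12, "if i<=20 goto (3)"),
    ],
}
print(flow_graph(BLOCKS))  # {'B1': ['B2'], 'B2': ['B2']}
```

B1 falls through into B2, and B2 has an edge back to itself because its last statement jumps to statement (3).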
Q7. Step-01:
      id    +    x    $
id          >    >    >
Step-01:
We insert $ symbol at both ends of the string as-
$ id + id x id $
We insert precedence operators between the string symbols as-
$ < id > + < id > x < id > $
Step-02:
We scan and parse the string as-
$ < id > + < id > x < id > $
$ E + < id > x < id > $
$ E + E x < id > $
$ E + E x E $
$ + x $
$ < + < x > $
$ < + > $
$ $
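The shift and reduce steps traced above can be sketched as a small operator-precedence parser. Two things are assumptions here: the grammar E → E + E | E x E | id, and the rows of the precedence table for +, x and $, which are the usual relations for these operators (x binds tighter than +, both left-associative) and are filled in only for illustration, since the table in these notes shows the id row alone. All function names are hypothetical.

```python
# PREC[a][b] is the relation between the top stack terminal a and the input symbol b.
# Only the "id" row comes from the notes; the other rows are assumed.
PREC = {
    "id": {"+": ">", "x": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "x": "<", "$": ">"},
    "x": {"id": "<", "+": ">", "x": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "x": "<"},
}
HANDLES = {("id",), ("E", "+", "E"), ("E", "x", "E")}


def top_terminal(stack):
    """Topmost stack symbol that is not the nonterminal E."""
    for symbol in reversed(stack):
        if symbol != "E":
            return symbol


def parse(tokens):
    """Return True if the token list (without end markers) is accepted."""
    stack = ["$"]
    tokens = list(tokens) + ["$"]
    pos = 0
    while True:
        top, current = top_terminal(stack), tokens[pos]
        if top == "$" and current == "$":
            return stack == ["$", "E"]
        relation = PREC.get(top, {}).get(current)
        if relation == "<":
            stack.append(current)          # shift
            pos += 1
        elif relation == ">":
            handle = []                    # reduce: pop the handle off the stack
            while True:
                symbol = stack.pop()
                handle.append(symbol)
                if symbol != "E" and PREC[top_terminal(stack)].get(symbol) == "<":
                    break
            if stack[-1] == "E":
                handle.append(stack.pop())
            handle.reverse()
            if tuple(handle) not in HANDLES:
                return False
            stack.append("E")
        else:
            return False                   # no relation: syntax error


print(parse(["id", "+", "id", "x", "id"]))  # True
print(parse(["id", "+"]))                   # False
```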
PART-C
Q3. POSTFIX
a+a*(b-c)+(b-c)*d
a+a*bc-+bc-*d
a+abc-*+bc-d*
aabc-*++bc-d*
aabc-*+bc-d*+
SYNTAX TREE
DAG
QUADRUPLE
TRIPLE
INDIRECT TRIPLE
Q4. (a)
(b) Parsing input string “id*id+id”
Parse tree-
Q5. (a)
(b).