CD Lexical


Q1. The lexical analyzer scans the input from left to right, one character at a time.

It uses two pointers, the begin pointer (bp) and the forward pointer (fp), to keep track of the portion of the input scanned.

Initially both pointers point to the first character of the input.

The forward pointer moves ahead to search for the end of the lexeme. A blank space indicates the end of a lexeme: in the example, as soon as fp encounters a blank space, the lexeme “int” is identified. fp then skips over the white space, and both bp and fp are set to the start of the next token. Reading the input one character at a time from secondary storage is costly, so a buffering technique is used: a block of data is first read into a buffer and then scanned by the lexical analyzer.
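
The two-pointer scan described above can be sketched as follows. This is a minimal illustration, not a real scanner: it assumes the whole input is already in memory as a string and that lexemes are separated only by white space, and the function name scan_lexemes is hypothetical.

```python
def scan_lexemes(text):
    """Yield whitespace-delimited lexemes using a begin pointer (bp)
    and a forward pointer (fp)."""
    bp = 0
    n = len(text)
    while bp < n:
        # Skip white space so that bp rests on the first character of a lexeme.
        while bp < n and text[bp].isspace():
            bp += 1
        if bp == n:
            break
        fp = bp
        # Advance fp until white space (or end of input) ends the lexeme.
        while fp < n and not text[fp].isspace():
            fp += 1
        yield text[bp:fp]
        bp = fp  # both pointers restart from the end of this lexeme


print(list(scan_lexemes("int i , j ;")))  # ['int', 'i', ',', 'j', ';']
```

A production lexer would refill a fixed-size buffer from the file instead of holding the whole input, which is the buffering technique the answer refers to.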

Q2. The lexical analyzer (LA) is the first phase of a compiler. Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis.
Upon receiving a ‘get next token’ command from the parser, the lexical analyzer reads input characters until it can identify the next token. The LA returns to the parser a representation of the token it has found. The representation will be an integer code if the token is a simple construct such as a parenthesis, comma or colon. The LA may also perform certain secondary tasks at the user interface. One such task is stripping out from the source program the comments and white space in the form of blank, tab and newline characters. Another is correlating error messages from the compiler with the source program.

Q3. FINITE AUTOMATA-

A finite automaton (FA) is the simplest machine for recognizing patterns. It takes a string of symbols as input and changes its state accordingly: each input symbol triggers a transition, which either moves the automaton to another state or leaves it in the same state. After the whole input has been read, the automaton either accepts or rejects the string. It accepts if it has reached a final state and rejects otherwise.

A finite automaton consists of the following:


Q : Finite set of states.
∑ : set of Input Symbols.
q : Initial state.
F : set of Final States.
δ : Transition Function.

The formal specification of the machine is the 5-tuple (Q, ∑, q, F, δ).
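
A deterministic finite automaton given as such a 5-tuple can be simulated in a few lines. The sketch below is illustrative only: the example machine (states q0 and q1 over the alphabet {0, 1}, accepting strings that end in 1) and the names dfa_accepts and DELTA are assumptions, not part of the original answer.

```python
def dfa_accepts(delta, start, finals, string):
    """Run a DFA: delta maps (state, symbol) to the next state."""
    state = start
    for symbol in string:
        key = (state, symbol)
        if key not in delta:
            return False  # no transition defined: reject
        state = delta[key]
    return state in finals  # accept only if we stop in a final state


# Hypothetical DFA over {0, 1} accepting strings that end in 1.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q1",
}

print(dfa_accepts(DELTA, "q0", {"q1"}, "0101"))  # True
print(dfa_accepts(DELTA, "q0", {"q1"}, "10"))    # False
```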

REGULAR EXPRESSION-

Regular expressions are used to denote regular languages. They are defined by the following rules:
• ɸ is a regular expression denoting the empty language ɸ.
• ɛ is a regular expression denoting the language {ɛ}.
• If a ∈ Σ (Σ represents the input alphabet), a is a regular expression denoting the language {a}.
• If a and b are regular expressions, a + b is also a regular expression, denoting the union of their languages (for single symbols a and b, the language {a, b}).
• If a and b are regular expressions, ab (concatenation of a and b) is also a regular expression.
• If a is a regular expression, a* (zero or more repetitions of a) is also a regular expression.

Q4. ACTIVATION RECORD

o The control stack is a run-time stack used to keep track of live procedure activations, i.e. the procedures whose execution has not yet completed.
o When a procedure is called (activation begins), its name is pushed onto the stack; when it returns (activation ends), it is popped.
o An activation record holds the information needed by a single execution of a procedure.
o An activation record is pushed onto the stack when a procedure is called and popped when control returns to the caller.

Q5. DAG (Directed Acyclic Graph)

A directed acyclic graph (DAG) is a tool that depicts the structure of a basic block, shows how values flow among the statements of the block, and exposes opportunities for optimization. A DAG makes transformations on basic blocks easy to apply. A DAG can be understood as follows:
• Leaf nodes represent identifiers, names or constants.
• Interior nodes represent operators.
• Interior nodes also represent the results of expressions or the identifiers/name where the
values are to be stored or assigned.

Q6. Different types of errors in a compiler-


Program errors are detected and reported by the parser. The parser handles the errors it encounters and continues parsing the rest of the input. Errors may be encountered at various stages of the compilation process. The following kinds of errors occur:
• Lexical : the name of some identifier typed incorrectly
• Syntactic : a missing semicolon or unbalanced parentheses
• Semantic : an incompatible value assignment
• Logical : unreachable code, an infinite loop

Q7. Parameter passing is the communication medium among procedures. By some mechanism, the values of variables from the calling procedure are transferred to the called procedure. Some of the terms related to values are:
r-value
The r-value is the value of an expression. If a variable appears on the right-hand side of the assignment operator, its value is used as an r-value. R-values can always be assigned to some other variable.
l-value
The memory location where the value of an expression is stored is known as the l-value of the expression. It always appears on the left-hand side of an assignment operator.
For instance:
day = 1;
week = day * 7;
month = 1;
year = month * 12;
Constant values such as 1, 7 and 12, and variables such as day, week, month and year, all have r-values. Only the variables have l-values, as they represent the memory locations assigned to them.
For instance:
7 = x + y;
is an l-value error, as the constant 7 does not represent any memory location.

Q8: PEEPHOLE OPTIMIZATION:


A statement-by-statement code-generation strategy often produces target code that contains redundant instructions and suboptimal constructs. The quality of such target code can be improved by applying “optimizing” transformations to the target program.
A simple but effective technique for improving the target code is peephole optimization, a method for trying to improve the performance of the target program by examining a short sequence of target instructions (called the peephole) and replacing these instructions by a shorter or faster sequence whenever possible.
The peephole is a small, moving window on the target program. The code in the peephole need not be contiguous, although some implementations do require this. It is characteristic of peephole optimization that each improvement may spawn opportunities for additional improvements.
Characteristic peephole optimizations:
Redundant-instruction elimination
Flow-of-control optimizations
Algebraic simplifications
Use of machine idioms
Unreachable code elimination
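
A few of these rules can be sketched over a toy instruction list. Everything about the instruction format here is an assumption made for illustration: each instruction is a hypothetical (op, destination, source) triple, STORE a, R means a := R, LOAD R, a means R := a, and the sketch assumes no instruction is a jump target (otherwise the redundant load could not be dropped safely).

```python
def peephole(code):
    """Apply three peephole rules repeatedly until none of them fires."""
    changed = True
    while changed:
        changed = False
        out = []
        for ins in code:
            op, dst, src = ins
            prev = out[-1] if out else None
            # Redundant-instruction elimination:
            # STORE a, R immediately followed by LOAD R, a.
            if (op == "LOAD" and prev is not None and prev[0] == "STORE"
                    and prev[1] == src and prev[2] == dst):
                changed = True
                continue
            # Algebraic simplification: adding 0 or multiplying by 1.
            if (op == "ADD" and src == "#0") or (op == "MUL" and src == "#1"):
                changed = True
                continue
            # Machine idiom: replace ADD R, #1 by an increment.
            if op == "ADD" and src == "#1":
                out.append(("INC", dst, None))
                changed = True
                continue
            out.append(ins)
        code = out
    return code


example = [
    ("LOAD", "R0", "b"),
    ("ADD", "R0", "#0"),
    ("STORE", "a", "R0"),
    ("LOAD", "R0", "a"),
    ("ADD", "R0", "#1"),
]
optimized = peephole(example)
print(optimized)
# [('LOAD', 'R0', 'b'), ('STORE', 'a', 'R0'), ('INC', 'R0', None)]
```

The outer loop reruns the rules after every change, which reflects the point that one improvement may expose another.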

Q9. The grammar after eliminating left recursion is-


S → (L) / a
L → SL’
L’ → ,SL’ / ε
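
The standard rule for immediate left recursion rewrites A → Aα / β as A → βA’ and A’ → αA’ / ε. The sketch below applies that rule; it assumes the original rule for L was L → L,S / S (the question's grammar is not reproduced in these notes, so this starting rule is a guess), and the function name is hypothetical.

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """productions is a list of right-hand sides, each a list of symbols.
    Returns a dict mapping each non-terminal to its new right-hand sides."""
    new_nt = nonterminal + "'"
    # alpha parts: bodies of the left-recursive productions A -> A alpha
    recursive = [p[1:] for p in productions if p and p[0] == nonterminal]
    # beta parts: productions that do not start with A
    others = [p for p in productions if not p or p[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}  # nothing to do
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [["ε"]],
    }


# Assumed original rule: L -> L , S / S
rules = eliminate_immediate_left_recursion("L", [["L", ",", "S"], ["S"]])
for head, bodies in rules.items():
    print(head, "→", " / ".join(" ".join(body) for body in bodies))
```

Under that assumption the printed rules agree with the grammar given in the answer; the rule for S has no left recursion and is left unchanged.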

Q10. Step-01:

S → bSS’ / a
S’ → SaaS / SaSb / b
Again, this is a grammar with common prefixes.

Step-02:

S → bSS’ / a
S’ → SaA / b
A → aS / Sb
This is a left factored grammar.

PART-B

Q1. ABSTRACT SYNTAX TREE-

Syntax trees are an abstract, compact representation of parse trees. They are also called Abstract Syntax Trees.
Syntax trees are called Abstract Syntax Trees because-
• They are an abstract representation of the parse trees.
• They do not reproduce every detail of the real syntax.
• For example, they have no rule nodes, no parentheses, etc.
i. ( a + b ) * ( c – d ) + ( ( e / f ) * ( a + b ))
Step-01:
We convert the given arithmetic expression into a postfix expression as-
(a+b)*(c–d)+((e/f)*(a+b))
ab+ * ( c – d ) + ( ( e / f ) * ( a + b ) )
ab+ * cd- + ( ( e / f ) * ( a + b ) )
ab+ * cd- + ( ef/ * ( a + b ) )
ab+ * cd- + ( ef/ * ab+ )
ab+ * cd- + ef/ab+*
ab+cd-* + ef/ab+*
ab+cd-*ef/ab+*+
Step-02:
We draw a syntax tree for the above postfix expression.

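ii. A+4-b+3
Step-01:
Since + and - have equal precedence and associate to the left, the expression is grouped as ((A+4)-b)+3.
A+4-b+3
A4+ - b + 3
A4+b- + 3
A4+b-3+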
Step-02:
We draw a syntax tree for the above postfix expression.
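
The infix-to-postfix conversion used in Step-01 can be checked mechanically with the shunting-yard method. This is a sketch under stated assumptions: operands are single letters or digits, only + - * / and parentheses appear, all operators are left-associative, and the names PRECEDENCE and to_postfix are hypothetical.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(expr):
    """Convert an infix expression with single-character operands to postfix."""
    output, stack = [], []
    for ch in expr.replace(" ", ""):
        if ch.isalnum():
            output.append(ch)  # operands go straight to the output
        elif ch == "(":
            stack.append(ch)
        elif ch == ")":
            # Pop operators back to the matching open parenthesis.
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()
        else:
            # Pop operators of higher or equal precedence (left associativity).
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]):
                output.append(stack.pop())
            stack.append(ch)
    while stack:
        output.append(stack.pop())
    return "".join(output)


print(to_postfix("(a+b)*(c-d)+((e/f)*(a+b))"))  # ab+cd-*ef/ab+*+
```

The syntax tree can then be built from the postfix string by pushing a leaf for each operand and, for each operator, popping two subtrees and pushing a new interior node.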
Q2. The design of a compiler can be decomposed into several phases, each of which converts one form of the source program into another.
The different phases of a compiler are as follows:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generation
5. Code optimization
6. Code generation
All of the aforementioned phases involve the following tasks:
• Symbol table management.
• Error handling.

Lexical Analysis
• Lexical analysis is the first phase of a compiler and is also termed scanning.
• The source program is scanned as a stream of characters, and those characters are grouped into sequences called lexemes, for each of which a token is produced as output.
• Token: A token is a sequence of characters that represents a lexical unit and matches a pattern, such as a keyword, operator or identifier.
• Lexeme: A lexeme is an instance of a token, i.e., the group of characters forming a token.
• Pattern: A pattern describes the rule that the lexemes of a token follow. It is the structure that must be matched by strings.
• Once a token is generated, the corresponding entry is made in the symbol table.
Input: stream of characters
Output: Token
Token Template: <token-name, attribute-value>
(e.g.) c=a+b*5;
Lexemes and tokens

Lexeme    Token
c         identifier
=         assignment symbol
a         identifier
+         + (addition symbol)
b         identifier
*         * (multiplication symbol)
5         5 (number)

Hence, <id, 1> <=> <id, 2> <+> <id, 3> <*> <5>
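
A scanner producing a token stream of this shape can be sketched with regular-expression patterns, one per token class. The token names (number, id, assign, op, semicolon), the pattern set, and the function name tokenize are assumptions for illustration; identifiers are given their 1-based position in a simple symbol table as the attribute value.

```python
import re

# One (token-name, pattern) pair per token class; names are illustrative.
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?"),
    ("id", r"[A-Za-z_]\w*"),
    ("assign", r"="),
    ("op", r"[+\-*/]"),
    ("semicolon", r";"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for a source string."""
    symbol_table, tokens = [], []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # white space produces no token
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)  # enter new identifiers
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        else:
            tokens.append((kind, lexeme))
    return tokens, symbol_table


tokens, table = tokenize("c=a+b*5;")
print(tokens)
print(table)  # ['c', 'a', 'b']
```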


Syntax Analysis
• Syntax analysis is the second phase of a compiler and is also called parsing.
• The parser converts the tokens produced by the lexical analyzer into a tree-like representation called a parse tree.
• A parse tree describes the syntactic structure of the input.
• A syntax tree is a compressed representation of the parse tree in which the operators appear as interior nodes and the operands of an operator are the children of the node for that operator.
Input: Tokens
Output: Syntax tree

Semantic Analysis
• Semantic analysis is the third phase of a compiler.
• It checks for semantic consistency.
• Type information is gathered and stored in the symbol table or in the syntax tree.
• It performs type checking.

Intermediate Code Generation


• Intermediate code generation produces intermediate representations for the source program
which are of the following forms:
o Postfix notation
o Three address code
o Syntax tree
The most commonly used form is three-address code.
t1 = inttofloat(5)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
Properties of intermediate code-
• It should be easy to produce.
• It should be easy to translate into target program.
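
Three-address code can be produced by a post-order walk of the syntax tree, creating one temporary per operator. The sketch below is an illustration only: the tree is a hypothetical nested tuple (op, left, right) with strings as leaves, the inttofloat conversion is omitted by writing the constant as 5.0, and the function names are assumptions.

```python
from itertools import count


def gen_tac(node, code, temps):
    """Append three-address instructions for node to code and
    return the name that holds the node's value."""
    if isinstance(node, str):
        return node  # a leaf: identifier or constant
    op, left, right = node
    left_name = gen_tac(left, code, temps)
    right_name = gen_tac(right, code, temps)
    temp = f"t{next(temps)}"  # fresh temporary for this operator
    code.append(f"{temp} = {left_name} {op} {right_name}")
    return temp


def tac_for_assignment(target, expr):
    code = []
    result = gen_tac(expr, code, count(1))
    code.append(f"{target} = {result}")
    return code


# id1 = id2 + id3 * 5.0
for line in tac_for_assignment("id1", ("+", "id2", ("*", "id3", "5.0"))):
    print(line)
# t1 = id3 * 5.0
# t2 = id2 + t1
# id1 = t2
```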

Code Optimization
• The code optimization phase takes the intermediate code as input and produces optimized intermediate code as output.
• It results in faster-running machine code.
• It can be done by reducing the number of lines of code for a program.
• This phase removes redundant code and attempts to improve the intermediate code so that faster-running machine code will result.
• Code optimization does not change the result of the program.
• To improve the generated code, optimization involves
o Detection and removal of dead code (unreachable code).
o Calculation of constants in expressions and terms.
o Collapsing of repeated expressions into a temporary variable.
o Loop unrolling.
o Moving code outside the loop.
o Removal of unwanted temporary variables.
t1 = id3 * 5.0
id1 = id2 + t1

Code Generation
• Code generation is the final phase of a compiler.
• It takes its input from the code optimization phase and produces the target code or object code as its result.
• Intermediate instructions are translated into a sequence of machine instructions that perform the
same task.
• The code generation involves
o Allocation of register and memory.
o Generation of correct references.
o Generation of correct data types.
o Generation of missing code.
LDF R2, id3
MULF R2, # 5.0
LDF R1, id2
ADDF R1, R2
STF id1, R1
Q3. Part-01:

Three address code for the given code is-


prod = 0
i=1
T1 = 4 x i
T2 = a[T1]
T3 = 4 x i
T4 = b[T3]
T5 = T2 x T4
T6 = T5 + prod
prod = T6
T7 = i + 1
i = T7
if (i <= 10) goto (3)

Part-02:
Step-01:
We identify the leader statements as-
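• prod = 0 is a leader because the first statement is a leader.
• T1 = 4 x i is a leader because the target of a conditional or unconditional goto is a leader.
Step-02:
The three-address code generated above can be partitioned into 2 basic blocks: the first contains prod = 0 and i = 1, and the second contains the statements from T1 = 4 x i through the conditional goto.
Step-03:
The flow graph has an edge from the first block to the second and a loop edge from the second block back to itself.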

Q4. TYPE CHECKING


A compiler must check that the source program follows both syntactic and semantic conventions
of the source language. This checking, called static checking, detects and reports programming
errors.
Some examples of static checks:
1. Type checks - A compiler should report an error if an operator is applied to an incompatible
operand. Example: If an array variable and function variable are added together.
2. Flow-of-control checks - Statements that cause flow of control to leave a construct must have some place to which to transfer the flow of control. Example: a break statement that is not enclosed by any loop or switch statement.
A type checker verifies that the type of a construct matches that expected by its context. For
example: arithmetic operator mod in Pascal requires integer operands, so a type checker verifies
that the operands of mod have type integer. Type information gathered by a type checker may be
needed when code is generated.
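
A type checker of this kind can be sketched as a recursive function over expression trees. The sketch assumes a toy language with only integer and real arithmetic: expressions are hypothetical nested tuples (op, left, right), names are looked up in a dictionary env, and integer operands are widened to real in mixed arithmetic. The function name type_of is an assumption.

```python
def type_of(expr, env):
    """Return 'integer', 'real' or 'type_error' for an expression tree."""
    if isinstance(expr, bool):
        return "type_error"  # booleans are outside this toy language
    if isinstance(expr, int):
        return "integer"
    if isinstance(expr, float):
        return "real"
    if isinstance(expr, str):
        return env.get(expr, "type_error")  # undeclared name
    op, left, right = expr
    lt, rt = type_of(left, env), type_of(right, env)
    if "type_error" in (lt, rt):
        return "type_error"  # propagate so that checking can continue
    if op == "mod":
        # mod requires integer operands, as in Pascal.
        return "integer" if lt == rt == "integer" else "type_error"
    if op in ("+", "-", "*"):
        if lt == rt == "integer":
            return "integer"
        return "real"  # at least one real operand: result is real
    return "type_error"  # unknown operator


env = {"i": "integer", "x": "real"}
print(type_of(("mod", "i", 3), env))   # integer
print(type_of(("mod", "x", 3), env))   # type_error
print(type_of(("+", "i", "x"), env))   # real
```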
TYPE SYSTEMS
A type system is a collection of rules for assigning type expressions to the various parts of a
program. A type checker implements a type system. It is specified in a syntax-directed manner.
Different type systems may be used by different compilers or processors of the same language.
Static and Dynamic Checking of Types
Checking done by a compiler is said to be static, while checking done when the target program
runs is termed dynamic. Any check can be done dynamically, if the target code carries the type
of an element along with the value of that element.
Sound type system
A sound type system eliminates the need for dynamic checking for type errors because it allows us to determine statically that these errors cannot occur when the target program runs. That is, if a sound type system assigns a type other than type_error to a program part, then type errors cannot occur when the target code for the program part is run.
Strongly typed language
A language is strongly typed if its compiler can guarantee that the programs it accepts will
execute without type errors.
Error Recovery
Since type checking has the potential for catching errors in programs, it is desirable for a type checker to recover from errors so that it can check the rest of the input. Error handling has to be designed into the type system right from the start; the type checking rules must be prepared to cope with errors.
TYPE EXPRESSIONS
The type of a language construct will be denoted by a “type expression.” A type expression is
either a basic type or is formed by applying an operator called a type constructor to other type
expressions. The sets of basic types and constructors depend on the language to be checked. The
following are the definitions of type expressions:
1. Basic types such as boolean, char, integer, real are type expressions.
A special basic type, type_error , will signal an error during type checking; void denoting “the
absence of a value” allows statements to be checked.
2. Since type expressions may be named, a type name is a type expression.
3. A type constructor applied to type expressions is a type expression.
Constructors include:
Arrays : If T is a type expression then array (I,T) is a type expression denoting the type of an
array with elements of type T and index set I.
Products : If T1 and T2 are type expressions, then their Cartesian product T1 X T2 is a type
expression.
Records : The difference between a record and a product is that the fields of a record have names. The record type constructor will be applied to a tuple formed from field names and field types.
For example:
type row = record
address: integer;
lexeme: array[1..15] of char
end;
var table: array[1..101] of row;
declares the type name row representing the type expression record((address X integer) X (lexeme X array(1..15, char))) and the variable table to be an array of records of this type.
Pointers : If T is a type expression, then pointer(T) is a type expression denoting the type
“pointer to an object of type T”.
For example, var p: ↑ row declares variable p to have type pointer(row).
Functions : A function in programming languages maps a domain type D to a range type R. The
type of such function is denoted by the type expression D → R
4. Type expressions may contain variables whose values are type expressions.

TYPE CONVERSION:
Type conversion is the process of converting a value of one type to another type when values of two different types are used together or assigned to each other: the type of one operand is converted to match the type of the other. In programming languages it generally concerns basic data types such as:
int
char
float
long int
double and so on.
Type conversion is of two kinds:
1. IMPLICIT - An implicit conversion is one in which the compiler automatically converts one predefined data type to another.
2. EXPLICIT - An explicit conversion is one that is requested explicitly by the programmer instead of being left to the compiler to perform automatically.
Q5. ACTIVATION RECORD
o The control stack is a run-time stack used to keep track of live procedure activations, i.e. the procedures whose execution has not yet completed.
o When a procedure is called (activation begins), its name is pushed onto the stack; when it returns (activation ends), it is popped.
o An activation record holds the information needed by a single execution of a procedure.
o An activation record is pushed onto the stack when a procedure is called and popped when control returns to the caller.

ACTIVATION TREE

A program is a sequence of instructions combined into a number of procedures, and the instructions in a procedure are executed sequentially. A procedure has a start and an end delimiter, and everything between them is its body. A procedure definition consists of the procedure identifier and a finite sequence of instructions that forms its body.

The process of executing a procedure is called activation. The information needed for a procedure call is kept in an activation record. Depending upon the source language used, the activation record contains some of the following units:

Temporaries: Stores temporary and intermediate values of an expression.

Local Data: Stores local data of the called procedure.

Machine Status: Stores machine status such as registers, program counter, etc., before the procedure is called.

Control Link: Stores the address of the activation record of the caller procedure.

Access Link: Stores the information of data which is outside the local scope.

Actual Parameters: Stores actual parameters, i.e., parameters which are used to send input to the called procedure.

Return Value: Stores return values.


The control stack stores the activation record while a procedure is executing. When a procedure calls another procedure, the execution of the caller is suspended until the called procedure finishes, and the activation record of the called procedure is pushed onto the stack.
It is assumed that program control flows sequentially: when a procedure is called, control is transferred to the called procedure, and when the called procedure finishes, control returns to the caller. This flow of control can be represented by an activation tree.
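
The relationship between the control stack and the activation tree can be made visible by tracing calls. The sketch below is an illustration with assumed names (traced, control_stack, tree_lines) and a hypothetical recursive procedure fib: each activation is pushed when the call begins and popped when it ends, and the indentation of each printed line is the stack depth at the time of the call.

```python
control_stack = []  # live activations, innermost last
tree_lines = []     # one line per activation, indented by depth


def traced(func):
    """Wrap func so that every call is recorded as an activation."""
    def wrapper(*args):
        label = f"{func.__name__}({', '.join(map(str, args))})"
        tree_lines.append("  " * len(control_stack) + label)
        control_stack.append(label)  # activation begins
        try:
            return func(*args)
        finally:
            control_stack.pop()      # activation ends
    return wrapper


@traced
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(3)
print("\n".join(tree_lines))
# fib(3)
#   fib(2)
#     fib(1)
#     fib(0)
#   fib(1)
```

Each indented line is a child of the nearest less-indented line above it, so the printout is the activation tree; at any moment the control stack holds the path from the root to the activation currently running.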
Q6. BASIC BLOCK

A basic block is a sequence of consecutive statements in which flow of control enters at the
beginning and leaves at the end without halt or possibility of branching except at the end.

The following sequence of three-address statements forms a basic block:


t1 := a*a
t2 := a*b
t3 := 2*t2
t4 := t1+t3
t5 := b*b
t6 := t4+t5
Some terminology used in basic blocks is given below:
• A three-address statement x := y+z is said to define x and to use y and z. A name in a basic block is said to be live at a given point if its value is used after that point in the program, perhaps in another basic block.
• The following algorithm can be used to partition a sequence of three-address statements
into basic blocks.
Algorithm: Partition into basic blocks.

Input: A sequence of three-address statements.


Output: A list of basic blocks with each three-address statement in exactly one block.

Method:

1. We first determine the set of leaders, for that we use the following rules:
I) The first statement is a leader.
II) Any statement that is the target of a conditional or unconditional goto is a leader.
III) Any statement that immediately follows a goto or conditional goto statement is a leader.
2. For each leader, its basic block consists of the leader and all statements up to but not
including the next leader or the end of the program.
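
The partitioning algorithm above can be sketched directly. The representation is an assumption: statements are plain strings numbered from 1, and a jump is recognized by the text goto (n); the function name partition_basic_blocks is hypothetical.

```python
import re


def partition_basic_blocks(statements):
    """Split a list of three-address statements into basic blocks."""
    n = len(statements)
    leaders = {1} if n else set()              # rule I
    for number, stmt in enumerate(statements, start=1):
        match = re.search(r"goto \((\d+)\)", stmt)
        if match:
            target = int(match.group(1))
            if 1 <= target <= n:
                leaders.add(target)            # rule II
            if number + 1 <= n:
                leaders.add(number + 1)        # rule III
    ordered = sorted(leaders)
    blocks = []
    for index, start in enumerate(ordered):
        # A block runs up to, but not including, the next leader.
        end = ordered[index + 1] - 1 if index + 1 < len(ordered) else n
        blocks.append(statements[start - 1:end])
    return blocks


program = [
    "prod = 0", "i = 1", "t1 = 4*i", "t2 = a[t1]", "t3 = 4*i", "t4 = b[t3]",
    "t5 = t2*t4", "t6 = prod+t5", "prod = t6", "t7 = i+1", "i = t7",
    "if i<=20 goto (3)",
]
blocks = partition_basic_blocks(program)
print(len(blocks))   # 2
print(blocks[0])     # ['prod = 0', 'i = 1']
```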

TRANSFORMATION OF BASIC BLOCK

1) STRUCTURE-PRESERVING TRANSFORMATIONS
The primary structure-preserving transformations on basic blocks are:
1. common sub-expression elimination
2. dead-code elimination
3. renaming of temporary variables
4. interchange of two independent adjacent statements
• Common sub-expression elimination
✓ Consider the basic block
a:= b+c
b:= a-d
c:= b+c
d:= a-d
✓ The second and fourth statements compute the same expression, hence this basic block may
be transformed into the equivalent block
a:= b+c
b:= a-d
c:= b+c
d:= b
✓ Although the 1st and 3rd statements in both cases appear to have the same expression on the
right, the second statement redefines b. Therefore, the value of b in the 3rd statement is
different from the value of b in the 1st, and the 1st and 3rd statements do not compute the same
expression.
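
Local common sub-expression elimination can be sketched by tracking which expressions are still available as the block is scanned. The representation is an assumption: each statement is a hypothetical tuple (target, left, op, right), and a redundant computation is replaced by a copy written as (target, source).

```python
def eliminate_common_subexpressions(block):
    """Replace recomputed expressions in a basic block by copies."""
    available = {}  # (left, op, right) -> variable currently holding it
    result = []
    for target, left, op, right in block:
        key = (left, op, right)
        if key in available:
            result.append((target, available[key]))  # reuse earlier value
        else:
            result.append((target, left, op, right))
        # Redefining target kills every expression that uses it
        # or whose value was held in it.
        available = {k: v for k, v in available.items()
                     if target not in (k[0], k[2]) and v != target}
        # The new expression is available unless target is one of its operands.
        if target not in (left, right):
            available.setdefault(key, target)
    return result


def show(stmt):
    if len(stmt) == 2:
        return f"{stmt[0]} := {stmt[1]}"
    return f"{stmt[0]} := {stmt[1]} {stmt[2]} {stmt[3]}"


block = [("a", "b", "+", "c"), ("b", "a", "-", "d"),
         ("c", "b", "+", "c"), ("d", "a", "-", "d")]
optimized_block = eliminate_common_subexpressions(block)
for stmt in optimized_block:
    print(show(stmt))
# a := b + c
# b := a - d
# c := b + c
# d := b
```

The third statement is not replaced because the second statement redefines b, which removes b + c from the set of available expressions.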

• Dead-code elimination

✓ Suppose x is dead, that is, never subsequently used, at the point where the statement x := y+z
appears in a basic block. Then this statement may be safely removed without changing the
value of the basic block.

• Renaming temporary variables

✓ Suppose we have a statement t := b+c, where t is a temporary. If we change this statement to u := b+c, where u is a new temporary variable, and change all uses of this instance of t to u, then the value of the basic block is not changed.

✓ In fact, we can always transform a basic block into an equivalent block in which each
statement that defines a temporary defines a new temporary. We call such a basic block a
normal-form block.

• Interchange of statements

✓ Suppose we have a block with the two adjacent statements


t1:= b+c
t2:= x+y
✓ Then we can interchange the two statements without affecting the value of the block if and
only if neither x nor y is t1 and neither b nor c is t2. A normal-form basic block permits all
statement interchanges that are possible.

2) ALGEBRAIC TRANSFORMATIONS
✓ Countless algebraic transformations can be used to change the set of expressions computed by the basic block into an algebraically equivalent set. The useful ones are those that simplify expressions or replace expensive operations by cheaper ones.
✓ Example: statements such as x := x+0 or x := x*1 can be eliminated.

3) FLOW GRAPH
✓ A graph representation of three-address statements, called a flow graph, is useful for
understanding code-generation algorithms
✓ Nodes in the flow graph represent computations, and the edges represent the flow of control.
✓ Example of flow graph for following three address code
(1) prod=0
(2) i=1
(3) t1 := 4*i
(4) t2 := a [ t1 ]
(5) t3 := 4*i
(6) t4 := b [ t3 ]
(7) t5 := t2*t4
(8) t6 := prod + t5
(9) prod := t6
(10) t7 := i+1
(11) i := t7
(12) if i<=20 goto (3)

Q7. Step-01:

We convert the given grammar into operator precedence grammar.


The equivalent operator precedence grammar is-
E → E + E | E x E | id
Step-02:
The terminal symbols in the grammar are { id, + , x , $ }
We construct the operator precedence table as-

      id    +     x     $
id          >     >     >
+     <     >     <     >
x     <     >     >     >
$     <     <     <


Operator Precedence Table

Parsing Given String-


Given string to be parsed is id + id x id.
We follow the following steps to parse the given string-

Step-01:
We insert $ symbol at both ends of the string as-
$ id + id x id $
We insert precedence operators between the string symbols as-
$ < id > + < id > x < id > $

Step-02:
We scan and parse the string as-
$ < id > + < id > x < id > $
$ E + < id > x < id > $
$ E + E x < id > $
$ E + E x E $
$ + x $
$ < + < x > $
$ < + > $
$ $
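
The parse above can be reproduced with a small driver for the precedence table. This is a simplified sketch, not a full parser: it keeps only terminals on the stack, reports which terminal is reduced at each step, and does not verify that each handle matches a production. The names TABLE and parse are assumptions.

```python
# Precedence relations from the table: TABLE[(a, b)] relates stack top a to input b.
TABLE = {
    ("id", "+"): ">", ("id", "x"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "x"): "<", ("+", "$"): ">",
    ("x", "id"): "<", ("x", "+"): ">", ("x", "x"): ">", ("x", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "x"): "<",
}


def parse(tokens):
    """Return the terminals in the order in which they are reduced."""
    stack = ["$"]
    tokens = tokens + ["$"]
    position = 0
    reductions = []
    while True:
        top, current = stack[-1], tokens[position]
        if top == "$" and current == "$":
            return reductions  # accept
        relation = TABLE.get((top, current))
        if relation == "<":
            stack.append(current)  # shift
            position += 1
        elif relation == ">":
            # Reduce: pop until the stack top is related by < to
            # the terminal most recently popped.
            while True:
                popped = stack.pop()
                if TABLE.get((stack[-1], popped)) == "<":
                    break
            reductions.append(popped)
        else:
            raise SyntaxError(f"no relation between {top!r} and {current!r}")


print(parse(["id", "+", "id", "x", "id"]))
# ['id', 'id', 'id', 'x', '+']
```

The multiplication is reduced before the addition, which matches the order of the last steps in the trace.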

PART-C

Q1. Augmented Grammar:


S’→S
CLR Parsing Table:
Q2.

Q3. POSTFIX

a+a*(b-c)+(b-c)*d
a+a*bc-+bc-*d
a+abc-*+bc-d*
aabc-*++bc-d*
aabc-*+bc-d*+
SYNTAX TREE

THREE ADDRESS CODE

DAG

QUADRUPLE
TRIPLE

INDIRECT TRIPLE

Q4. (a)
(b) Parsing input string “id*id+id”

     STACK              INPUT        ACTION

1    0                  id*id+id$    Shift
2    0 id 5             *id+id$      Reduce F→id
3    0 F 3              *id+id$      Reduce T→F
4    0 T 2              *id+id$      Shift
5    0 T 2 * 7          id+id$       Shift
6    0 T 2 * 7 id 5     +id$         Reduce F→id
7    0 T 2 * 7 F 10     +id$         Reduce T→T*F
8    0 T 2              +id$         Reduce E→T
9    0 E 1              +id$         Shift
10   0 E 1 + 6          id$          Shift
11   0 E 1 + 6 id 5     $            Reduce F→id
12   0 E 1 + 6 F 3      $            Reduce T→F
13   0 E 1 + 6 T 9      $            Reduce E→E+T
14   0 E 1              $            Accept

Parse tree-
Q5. (a)
(b).
