Acd Unit-2

Compiler Phases

The compilation process consists of a sequence of phases. Each phase takes the source program in one representation and produces output in another representation, taking its input from the previous phase.

The various phases of a compiler are:

Lexical Analysis:
Lexical analysis is the first phase of the compilation process. It takes the source code as input, reads the source program one character at a time, and groups the characters into meaningful lexemes. The lexical analyzer represents these lexemes in the form of tokens.

Syntax Analysis

Syntax analysis is the second phase of the compilation process. It takes tokens as input and generates a parse tree as output. In the syntax analysis phase, the parser checks whether the expression formed by the tokens is syntactically correct.
Semantic Analysis

Semantic analysis is the third phase of the compilation process. It checks whether the parse tree follows the rules of the language. The semantic analyzer keeps track of identifiers, their types, and expressions. The output of the semantic analysis phase is the annotated syntax tree.

Intermediate Code Generation

In intermediate code generation, the compiler translates the source code into intermediate code, a representation that lies between the high-level language and the machine language. The intermediate code should be generated in such a way that it can easily be translated into the target machine code.

Code Optimization

Code optimization is an optional phase. It is used to improve the intermediate code so that the output of the program runs faster and takes less space. It removes unnecessary lines of code and rearranges the sequence of statements in order to speed up program execution.

Code Generation

Code generation is the final stage of the compilation process. It takes the optimized intermediate code as input and maps it to the target machine language. The code generator translates the intermediate code into the machine code of the specified computer.

The Role of the Lexical Analyzer

The lexical analyzer (LA) is the first phase of a compiler. Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis.

Upon receiving a get-next-token command from the parser, the lexical analyzer reads input characters until it can identify the next token. The LA then returns to the parser a representation of the token it has found. The representation is an integer code if the token is a simple construct such as a parenthesis, comma, or colon.

2.1 TOKEN, LEXEME, PATTERN:

Token: A token is a sequence of characters that can be treated as a single logical entity. Typical tokens are:
1) identifiers 2) keywords 3) operators 4) special symbols 5) constants

Example 1:
int a = 10; //Input Source code
Tokens
int (keyword), a(identifier), =(operator), 10(constant) and ;(punctuation-semicolon)
Answer – Total number of tokens = 5

Example 2:
int main() {
printf("Welcome to GeeksforGeeks!");
return 0;
}
Tokens
'int', 'main', '(', ')', '{', 'printf', '(', ' "Welcome to GeeksforGeeks!" ',
')', ';', 'return', '0', ';', '}'
Answer – Total number of tokens = 14
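The token counting above can be reproduced with a small scanner sketch. The token categories and regular expressions below are illustrative assumptions (a real C lexer distinguishes many more categories and keywords), using Python's `re` module:

```python
import re

# Illustrative token categories; patterns are assumptions for this example,
# not a complete C lexer.  Keywords are listed before identifiers so that
# "int" is classified as a keyword rather than an identifier.
TOKEN_SPEC = [
    ("KEYWORD",     r"\b(?:int|return)\b"),
    ("IDENTIFIER",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("CONSTANT",    r"\d+"),
    ("OPERATOR",    r"[=+\-*/]"),
    ("PUNCTUATION", r"[;(){}]"),
    ("SKIP",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Convert a source string into a list of (token_type, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind != "SKIP":            # whitespace only separates lexemes
            tokens.append((kind, match.group()))
    return tokens

print(tokenize("int a = 10;"))
# 5 tokens: int, a, =, 10, ;
```

Running `tokenize("int a = 10;")` yields the five tokens counted in Example 1.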

Pattern: A pattern specifies the set of rules that a scanner follows to recognize a token.


Lexeme: The sequence of characters forming a token is called a lexeme.
Example:
Description of tokens

Token      Lexeme                Pattern

const      const                 const

if         if                    if

relation   <, <=, =, <>, >=, >   < or <= or = or <> or >= or >

id         pi                    letter followed by letters and digits

num        3.14                  any numeric constant

literal    "core"                any characters between " and " except "

SPECIFICATION OF TOKENS

There are 3 specifications of tokens:


1) Strings and Languages
2) Regular Expressions
3) Regular Definitions

1) Strings and Languages


 An alphabet or character class is a finite set of symbols.
 A string over an alphabet is a finite sequence of symbols drawn from that
alphabet.
 A language is any countable set of strings over some fixed alphabet.
o In language theory, the terms "sentence" and "word" are often used
as synonyms for "string." The length of a string s, usually written
|s|, is the number of occurrences of symbols in s.
o For example, banana is a string of length six. The empty string,
denoted ε, is the string of length zero.

Operations on strings
1. Prefix of String

A prefix of string s is any string obtained by removing zero or more symbols from the end of s. The prefixes of s include ε and the string s itself.

For example:

s = abcd
Prefixes of the string abcd: ε, a, ab, abc, abcd

2. Suffix of String

A suffix of string s is any string obtained by removing zero or more symbols from the beginning of s. The suffixes of s include ε and the string s itself.

For example:

s = abcd

Suffixes of the string abcd: ε, d, cd, bcd, abcd

3. Proper Prefix of String

The proper prefixes of the string include all the prefixes of the string excluding ε and the string s itself.

Proper prefixes of the string abcd: a, ab, abc

4. Proper Suffix of String

The proper suffixes of the string include all the suffixes excluding ε and the string s itself.

Proper suffixes of the string abcd: d, cd, bcd

5. Substring of String

A substring of string s is obtained by deleting any prefix and any suffix from the string.

Substrings of the string abcd: ε, abcd, bcd, abc, …

6. Proper Substring of String

The proper substrings of string s include all the substrings of s excluding ε and the string s itself.

Proper substrings of the string abcd: bcd, abc, cd, ab, …

7. Subsequence of String

A subsequence of string s is obtained by eliminating zero or more (not necessarily consecutive) symbols from the string.

Subsequences of the string abcd: abd, bcd, bd, …

8. Concatenation of String

If s and t are two strings, then st denotes their concatenation, formed by appending t to s.

s = abc, t = def

Concatenation of strings s and t: st = abcdef
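The string operations above are easy to check mechanically. The following sketch reproduces the examples for s = abcd, with the empty Python string `""` playing the role of ε:

```python
def prefixes(s):
    """All prefixes of s, including the empty string and s itself."""
    return {s[:i] for i in range(len(s) + 1)}

def suffixes(s):
    """All suffixes of s, including the empty string and s itself."""
    return {s[i:] for i in range(len(s) + 1)}

def substrings(s):
    """All substrings of s: delete some prefix and some suffix."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}

s, t = "abcd", "ef"
assert prefixes(s) == {"", "a", "ab", "abc", "abcd"}
assert suffixes(s) == {"", "d", "cd", "bcd", "abcd"}
assert prefixes(s) - {"", s} == {"a", "ab", "abc"}   # proper prefixes
assert "bc" in substrings(s)                         # delete prefix a and suffix d
assert s + t == "abcdef"                             # concatenation st
```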

Operations on languages:
The following are the operations that can be applied to languages:
1. Union
2. Concatenation
3. Kleene closure
4. Positive closure

The following example shows the operations on languages. Let L = {0, 1} and S = {a, b, c}. Then:
Union: L ∪ S = {0, 1, a, b, c}
Concatenation: LS = {0a, 0b, 0c, 1a, 1b, 1c}
Kleene closure: L* = the set of all strings of 0's and 1's, including ε
Positive closure: L+ = the set of all nonempty strings of 0's and 1's
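For finite languages these operations can be computed directly as set operations. The sketch below truncates the (infinite) closures to strings built from at most n pieces, which is enough to illustrate the definitions:

```python
def concat(L1, L2):
    """Concatenation L1L2: every string of L1 followed by every string of L2."""
    return {x + y for x in L1 for y in L2}

def power(L, n):
    """L^n: n-fold concatenation of L with itself (L^0 = {empty string})."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

def kleene_upto(L, n):
    """Strings of the Kleene closure L* built from at most n pieces."""
    return set().union(*(power(L, i) for i in range(n + 1)))

def positive_upto(L, n):
    """Strings of the positive closure L+ built from 1..n pieces."""
    return set().union(*(power(L, i) for i in range(1, n + 1)))

L = {"0", "1"}
S = {"a", "b", "c"}
assert L | S == {"0", "1", "a", "b", "c"}                 # union
assert concat(L, S) == {"0a", "0b", "0c", "1a", "1b", "1c"}  # LS
assert "010" in kleene_upto(L, 3) and "" in kleene_upto(L, 3)
assert "" not in positive_upto(L, 3)                      # L has no empty string
```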

Regular Expressions
· Each regular expression r denotes a language L(r).

· Here are the rules that define the regular expressions over some alphabet
Σ and the languages that those expressions denote:

1.ε is a regular expression, and L(ε) is { ε }, that is, the language whose sole
member is the empty string.
2. If ‘a’ is a symbol in Σ, then ‘a’ is a regular expression, and L(a) = {a}, that is,
the language with one string, of length one, with ‘a’ in its one position.
3.Suppose r and s are regular expressions denoting the languages L(r) and L(s).
Then,
a) (r)|(s) is a regular expression denoting the language L(r) U L(s).
b) (r)(s) is a regular expression denoting the language L(r)L(s).
c) (r)* is a regular expression denoting (L(r))*.
d) (r) is a regular expression denoting L(r).
4.The unary operator * has highest precedence and is left associative.
5.Concatenation has second highest precedence and is left associative.
6. | has lowest precedence and is left associative.
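An ordinary regular-expression engine follows the same precedence conventions, so the rules can be checked directly. The pattern a|bc* below is an assumption chosen just to exercise them:

```python
import re

# Because * binds tighter than concatenation, and concatenation binds
# tighter than |, the pattern a|bc* is read as a | (b(c*)).
r = re.compile(r"a|bc*")

assert r.fullmatch("a")          # the 'a' branch
assert r.fullmatch("b")          # 'b' followed by zero c's
assert r.fullmatch("bccc")       # 'b' followed by three c's
assert not r.fullmatch("ac")     # (a|b)(c*) would match this, a|(b(c*)) does not
assert not r.fullmatch("bcbc")   # a|(bc)* would match this, a|(b(c*)) does not
```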
3) Regular Definitions

Giving names to regular expressions is referred to as a regular definition. If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form

d1 → r1
d2 → r2
………
dn → rn

1. Each di is a distinct name.
2. Each ri is a regular expression over the alphabet Σ ∪ {d1, d2, …, di−1}.
Example: Identifiers are the set of strings of letters and digits beginning with a letter. A regular definition for this set:

letter → A | B | …. | Z | a | b | …. | z
digit → 0 | 1 | …. | 9
id → letter ( letter | digit ) *
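The regular definition for id translates directly into a conventional regular expression (letter = [A-Za-z], digit = [0-9]); a quick check with Python's `re` module:

```python
import re

# The regular definition  id -> letter (letter | digit)*  written as a
# Python regular expression.
ID = re.compile(r"[A-Za-z][A-Za-z0-9]*")

assert ID.fullmatch("count1")
assert ID.fullmatch("X")
assert not ID.fullmatch("1count")   # must begin with a letter
assert not ID.fullmatch("")         # at least one letter is required
```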

Recognition of Tokens
Tokens can be recognized by finite automata.
A finite automaton (FA) is a simple idealized machine used to recognize patterns within input taken from some character set (or alphabet) C. The job of an FA is to accept or reject an input depending on whether the pattern defined by the FA occurs in the input.
There are two notations for representing Finite Automata. They are
1) Transition Diagram
2) Transition Table
Transition Diagram:

A transition diagram is a directed labelled graph containing nodes and edges. Nodes represent states and edges represent transitions between states. Every transition diagram has exactly one initial state, marked by an incoming arrow (-->), and one or more final states, represented by double circles.
Example:

where state 1 is the initial state and state 3 is the final state.


Transition Table:

The transition table is basically a tabular representation of the transition function. It takes two
arguments (a state and a symbol) and returns a state (the "next state").

A transition table is represented by the following things:

o Columns correspond to input symbols.


o Rows correspond to states.
o Entries correspond to the next state.
o The start state is denoted by an arrow with no source.
o The accept state is denoted by a star.

Example:

The transition table of the given NFA is as follows:

Present State    Next State for Input 0    Next State for Input 1

→q0              q0                        q1

q1               q1, q2                    q2

q2               q1                        q3

*q3              q2                        q2

There are two types of FA

1) DFA
2) NFA
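The transition table above can be simulated directly. In the sketch below, each table entry maps a (state, symbol) pair to a set of next states; q0 is the start state and q3 (the starred row) is the accepting state:

```python
# Transition function of the NFA from the table above.
DELTA = {
    ("q0", "0"): {"q0"}, ("q0", "1"): {"q1"},
    ("q1", "0"): {"q1", "q2"}, ("q1", "1"): {"q2"},
    ("q2", "0"): {"q1"}, ("q2", "1"): {"q3"},
    ("q3", "0"): {"q2"}, ("q3", "1"): {"q2"},
}

def accepts(string, start="q0", accepting=frozenset({"q3"})):
    """Run the NFA by tracking the set of all states it could be in."""
    current = {start}
    for symbol in string:
        current = set().union(*(DELTA.get((q, symbol), set()) for q in current))
    return bool(current & accepting)

assert accepts("111")      # q0 -> q1 -> q2 -> q3
assert not accepts("0")    # stays in q0, which is not accepting
```

Tracking a *set* of current states is exactly the subset idea used when converting an NFA to a DFA.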

Lexical Analyzer Generator-Lex


LEX
o Lex is a program that generates a lexical analyzer. It is used with the YACC parser generator.
o The lexical analyzer is a program that transforms an input stream into a sequence of tokens.
o Lex reads an input specification and produces, as output, C source code that implements the lexical analyzer.

The function of Lex is as follows:

o First, the lexical-analyzer specification is written as a program lex.l in the Lex language. The Lex compiler then processes the lex.l program and produces a C program lex.yy.c.
o Next, the C compiler compiles the lex.yy.c program and produces an object program a.out.
o a.out is the lexical analyzer that transforms an input stream into a sequence of tokens.

Lex file format

A Lex program is separated into three sections by %% delimiters. The format of a Lex source file is as follows:

definitions
%%
translation rules
%%
user subroutines

Definitions include declarations of constants, variables and regular definitions.

Rules define statements of the form p1 {action1} p2 {action2} …. pn {actionn},

where pi describes a regular expression and actioni describes what action the lexical analyzer should take when pattern pi matches a lexeme.

User subroutines are auxiliary procedures needed by the actions. The subroutines can be loaded with the lexical analyzer and compiled separately.

Derivation
A derivation is a sequence of applications of production rules, used to obtain the input string from the start symbol. During parsing we have to make two decisions:

o We have to decide which non-terminal is to be replaced.

o We have to decide the production rule by which the non-terminal will be replaced.

There are two options for deciding which non-terminal to replace:
Left-most Derivation
In the leftmost derivation, at each step the leftmost non-terminal in the sentential form is replaced using a production rule, so the input string is derived from left to right.

Example

1. S = S + S
2. S = S - S
3. S = a | b |c

Input:

a-b+c

The left-most derivation is:

1. S = S + S
2. S = S - S + S
3. S = a - S + S
4. S = a - b + S
5. S = a - b + c

Right-most Derivation
In the rightmost derivation, at each step the rightmost non-terminal in the sentential form is replaced using a production rule, so the input string is derived from right to left.

Example:
1. S = S + S
2. S = S - S
3. S = a | b |c

Input:

a-b+c

The right-most derivation is:

1. S = S - S
2. S = S - S + S
3. S = S - S + c
4. S = S - b + c
5. S = a - b + c
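Both derivations above can be reproduced mechanically by rewriting one occurrence of S per step. The sketch below (spaces in the productions are omitted for simplicity) replays exactly the step sequences shown:

```python
def leftmost_step(sentential, lhs, rhs):
    """Replace the leftmost occurrence of non-terminal lhs with rhs."""
    i = sentential.index(lhs)
    return sentential[:i] + rhs + sentential[i + 1:]

def rightmost_step(sentential, lhs, rhs):
    """Replace the rightmost occurrence of non-terminal lhs with rhs."""
    i = sentential.rindex(lhs)
    return sentential[:i] + rhs + sentential[i + 1:]

# Leftmost derivation of a-b+c (same steps as above):
form = "S"
for rhs in ["S+S", "S-S", "a", "b", "c"]:
    form = leftmost_step(form, "S", rhs)
assert form == "a-b+c"

# Rightmost derivation of the same string:
form = "S"
for rhs in ["S-S", "S+S", "c", "b", "a"]:
    form = rightmost_step(form, "S", rhs)
assert form == "a-b+c"
```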

Examples for LMD


1) Consider the following grammar-
S → aB / bA

A → aS / bAA / a
B → bS / aBB / b

Let us consider the string w = aaabbabbba


2) S → bB / aA
A → b / bS / aAA

B → a / aS / bBB
For the string w = bbaababa, find-

1. Leftmost derivation
2. Rightmost derivation
3. Parse Tree
3) Consider the grammar-
S → A1B

A → 0A / ∈

B → 0B / 1B / ∈

For the string w = 00101,

Parse Tree-

 The process of deriving a string is called a derivation.
 The geometrical representation of a derivation is called a parse tree or derivation tree.

Ambiguity
A grammar is said to be ambiguous if there exists more than one leftmost derivation, more than one rightmost derivation, or more than one parse tree for a given input string. If the grammar is not ambiguous, then it is called unambiguous.

Example:

1. S = aSb | SS
2. S = ε

For the string aabb, the above grammar generates two parse trees:

Hence the grammar is ambiguous.
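That aabb has more than one leftmost derivation can be checked directly by rewriting the leftmost S at each step (a small sketch; ε is represented by the empty string):

```python
def leftmost_rewrite(form, rhs):
    """Replace the leftmost 'S' in form with rhs (rhs may be '' for ε)."""
    i = form.index("S")
    return form[:i] + rhs + form[i + 1:]

def derive(rhs_sequence):
    """Apply a sequence of right-hand sides, starting from S."""
    form = "S"
    for rhs in rhs_sequence:
        form = leftmost_rewrite(form, rhs)
    return form

# Two different leftmost derivations of the same string aabb:
d1 = ["aSb", "aSb", ""]            # S => aSb => aaSbb => aabb
d2 = ["SS", "aSb", "aSb", "", ""]  # S => SS => aSbS => aaSbbS => aabbS => aabb
assert derive(d1) == derive(d2) == "aabb"
```

Since the two derivations use different production sequences, they correspond to different parse trees, confirming the ambiguity.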
1. Left Recursion-

A grammar G (V, T, P, S) is left recursive if it has a production of the form

A → Aα | β

This grammar is left recursive because the variable on the left of the production occurs in the first position on the right side of the production. The left recursion can be eliminated by replacing the pair of productions with

A → βA′
A′ → αA′ | ε

where A′ is a new non-terminal.

 A production of a grammar is said to have left recursion if the leftmost variable of its RHS is the same as the variable of its LHS.
 A grammar containing a production having left recursion is called a Left Recursive Grammar.

Example-

S → Sa / ε
(Left Recursive Grammar)
 Left recursion is considered to be a problematic situation for Top down parsers.
 Therefore, left recursion has to be eliminated from the grammar.

Examples
1) Eliminate the left recursion from the grammar
E → E + T|T
T → T * F|F
F → (E)|id
Solution
The production after removing the left recursion will be
E → TE′

E′ → +TE′ | ε

T → FT′

T′ → *FT′ | ε

F → (E) | id
2) − Remove the left recursion from the grammar
E → E(T)|T
T → T(F)|F
F → id
Solution
Eliminating immediate left recursion among all the productions, we obtain
E → TE′

E′ → (T)E′ | ε

T → FT′

T′ → (F)T′|ε

F → id
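The transformation used in both examples can be sketched as a small function. It assumes each non-terminal is a single character, ε-alternatives are absent, and the β-alternatives do not begin with the non-terminal itself; the function name is chosen for this illustration:

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Split  A -> A a1 | ... | A am | b1 | ... | bn  into
         A  -> b1 A' | ... | bn A'
         A' -> a1 A' | ... | am A' | epsilon
    Returns a dict mapping each non-terminal to its list of alternatives.
    Simplifying assumption: every grammar symbol is one character."""
    recursive = [p[len(nonterminal):] for p in productions if p.startswith(nonterminal)]
    rest = [p for p in productions if not p.startswith(nonterminal)]
    if not recursive:                      # nothing to eliminate
        return {nonterminal: productions}
    prime = nonterminal + "'"
    return {
        nonterminal: [b + prime for b in rest],
        prime: [a + prime for a in recursive] + ["ε"],
    }

# E -> E+T | T  becomes  E -> TE',  E' -> +TE' | ε  (as in Example 1)
result = eliminate_immediate_left_recursion("E", ["E+T", "T"])
assert result == {"E": ["TE'"], "E'": ["+TE'", "ε"]}
```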

Left factoring
Left factoring is a grammar transformation technique. It consists of factoring out prefixes that are common to two or more productions.

PRACTICE PROBLEMS BASED ON LEFT FACTORING-

Problem-01:

Do left factoring in the following grammar-

S → iEtS / iEtSeS / a
E→b
Solution-

The left factored grammar is-

S → iEtSS’ / a

S’ → eS / ε
E→b

Problem-02:

Do left factoring in the following grammar-


A → aAB / aBc / aAc

Solution-

Step-01:

A → aA’
A’ → AB / Bc / Ac
Again, this is a grammar with common prefixes.

Step-02:

A → aA’

A’ → AD / Bc

D→B/c

This is a left factored grammar.

Problem-03:

Do left factoring in the following grammar-

S → bSSaaS / bSSaSb / bSb / a

Solution-

Step-01:

S → bSS’ / a

S’ → SaaS / SaSb / b
Again, this is a grammar with common prefixes.

Step-02:

S → bSS’ / a
S’ → SaA / b

A → aS / Sb

This is a left factored grammar.

Problem-04:

Do left factoring in the following grammar-


S → aSSbS / aSaSb / abb / b

Solution-

Step-01:

S → aS’ / b

S’ → SSbS / SaSb / bb
Again, this is a grammar with common prefixes.

Step-02:

S → aS’ / b

S’ → SA / bb
A → SbS / aSb
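One step of the left-factoring transformation used in the problems above can be sketched as follows. The sketch assumes every grammar symbol is a single character, no alternative is ε, and at most one group of alternatives shares a prefix per step (as in Problem-01); the function name is chosen for this illustration:

```python
from itertools import groupby
from os.path import commonprefix

def left_factor_once(nonterminal, productions):
    """One left-factoring step: among alternatives sharing a common prefix,
    rewrite  A -> xb1 | xb2 | y  as  A -> xA' | y,  A' -> b1 | b2.
    Simplifying assumptions: one-character symbols, no ε-alternatives, and
    at most one group of alternatives needs factoring per step."""
    grammar = {nonterminal: []}
    for first, group in groupby(sorted(productions), key=lambda p: p[0]):
        alts = list(group)
        if len(alts) == 1:                       # no common prefix to factor
            grammar[nonterminal].append(alts[0])
            continue
        prefix = commonprefix(alts)              # longest shared prefix
        prime = nonterminal + "'"
        grammar[nonterminal].append(prefix + prime)
        grammar[prime] = [p[len(prefix):] or "ε" for p in alts]
    return grammar

# Problem-01:  S -> iEtS / iEtSeS / a  becomes  S -> iEtSS' / a,  S' -> eS / ε
result = left_factor_once("S", ["iEtS", "iEtSeS", "a"])
assert result == {"S": ["a", "iEtSS'"], "S'": ["ε", "eS"]}
```

Repeating the step until no alternatives share a prefix gives the fully left-factored grammars of Problems 02–04.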
