Acd Unit-2
The compilation process consists of a sequence of phases. Each phase takes the
source program in one representation and produces output in another
representation. Each phase takes its input from the output of the previous phase.
Lexical Analysis:
The lexical analyzer phase is the first phase of the compilation process. It takes source
code as input, reads the source program one character at a time, and converts it into
meaningful lexemes. The lexical analyzer represents these lexemes in the form of tokens.
Syntax Analysis
Syntax analysis is the second phase of the compilation process. It takes tokens as input and
generates a parse tree as output. In the syntax analysis phase, the parser checks whether the
expression made by the tokens is syntactically correct.
Semantic Analysis
Semantic analysis is the third phase of the compilation process. It checks whether the parse tree
follows the rules of the language. The semantic analyzer keeps track of identifiers, their types,
and expressions. The output of the semantic analysis phase is the annotated syntax tree.
Intermediate Code Generation
In the intermediate code generation phase, the compiler translates the source code into
intermediate code. Intermediate code lies between the high-level language and the machine
language. It should be generated in such a way that it can easily be translated into the
target machine code.
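As an illustration, the sketch below generates three-address intermediate code for a single assignment statement. It leans on Python's standard ast module for parsing, and the temporary names t1, t2, ... are our own convention, not a fixed standard:

```python
import ast

def three_address(stmt):
    """Translate one Python assignment into three-address code.

    Each binary operation is given its own temporary t1, t2, ...
    (an illustrative naming convention).
    """
    tree = ast.parse(stmt).body[0]          # an ast.Assign node
    code = []
    counter = [0]
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def gen(node):
        if isinstance(node, ast.Name):       # variable reference
            return node.id
        if isinstance(node, ast.Constant):   # literal constant
            return str(node.value)
        # otherwise assume a binary operation: emit operands, then a temporary
        left, right = gen(node.left), gen(node.right)
        counter[0] += 1
        temp = f"t{counter[0]}"
        code.append(f"{temp} = {left} {ops[type(node.op)]} {right}")
        return temp

    target = tree.targets[0].id
    code.append(f"{target} = {gen(tree.value)}")
    return code

# a = b + c * d  becomes:  t1 = c * d ; t2 = b + t1 ; a = t2
print(three_address("a = b + c * d"))
```

Note how the multiplication, which binds tighter, is emitted first; the code generator can then map each three-address instruction to one or two machine instructions.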
Code Optimization
Code optimization is an optional phase. It is used to improve the intermediate code so that the
output of the program could run faster and take less space. It removes the unnecessary lines
of the code and arranges the sequence of statements in order to speed up the program
execution.
Code Generation
Code generation is the final stage of the compilation process. It takes the optimized
intermediate code as input and maps it to the target machine language. The code generator
translates the intermediate code into the machine code of the specified computer.
The Role of the Lexical Analyzer
The LA is the first phase of a compiler. Its main task is to read the input
characters and produce as output a sequence of tokens that the parser uses for
syntax analysis.
Upon receiving a "get next token" command from the parser, the
lexical analyzer reads the input characters until it can identify the next
token. The LA returns to the parser a representation of the token it has
found. The representation will be an integer code if the token is a
simple construct such as a parenthesis, comma, or colon.
Example 1:
int a = 10; //Input Source code
Tokens
int (keyword), a(identifier), =(operator), 10(constant) and ;(punctuation-semicolon)
Answer – Total number of tokens = 5
Example 2:
int main() {
printf("Welcome to GeeksforGeeks!");
return 0;
}
Tokens
'int', 'main', '(', ')', '{', 'printf', '(', ' "Welcome to GeeksforGeeks!" ',
')', ';', 'return', '0', ';', '}'
Answer – Total number of tokens = 14
Examples of tokens, lexemes, and patterns:

Token      Sample Lexemes            Informal Description of Pattern
if         if                        if
relation   <, <=, =, <>, >=, >       < or <= or = or <> or >= or >
id         pi                        letter followed by letters and digits
num        any numeric constant      any numeric constant
SPECIFICATION OF TOKENS
Operations on strings
1. Prefix of String
A prefix of the string is any leading sequence of symbols of the string, including
∈ and the string s itself.
For example:
s = abcd
The prefixes of the string abcd: ∈, a, ab, abc, abcd
2. Suffix of String
A suffix of the string is any trailing sequence of symbols of the string, including
∈ and the string s itself.
For example:
s = abcd
The suffixes of the string abcd: ∈, d, cd, bcd, abcd
3. Proper Prefix of String
The proper prefixes of the string include all the prefixes of the string excluding ∈
and the string s itself.
4. Proper Suffix of String
The proper suffixes of the string include all the suffixes excluding ∈ and the
string s itself.
5. Substring of String
A substring of a string s is obtained by deleting any prefix and any suffix from the
string.
6. Proper Substring of String
A proper substring of s is any substring other than ∈ and the string s itself.
7. Subsequence of String
A subsequence of s is obtained by deleting zero or more symbols of s, which need
not be consecutive.
8. Concatenation of String
The concatenation of two strings s and t, written st, is formed by appending t to
the end of s. For example, if s = ab and t = cd, then st = abcd.
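A few of these string operations can be checked with a short Python sketch (∈ is represented here by the empty string ""):

```python
def prefixes(s):
    """All prefixes of s, including the empty string and s itself."""
    return {s[:i] for i in range(len(s) + 1)}

def suffixes(s):
    """All suffixes of s, including the empty string and s itself."""
    return {s[i:] for i in range(len(s) + 1)}

def substrings(s):
    """All substrings of s: delete any prefix and any suffix."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}

s = "abcd"
print(prefixes(s))             # {'', 'a', 'ab', 'abc', 'abcd'}
print(suffixes(s))             # {'', 'd', 'cd', 'bcd', 'abcd'}
print(prefixes(s) - {"", s})   # proper prefixes: {'a', 'ab', 'abc'}
print("bc" in substrings(s))   # True: delete prefix 'a' and suffix 'd'
```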
Operations on languages:
The following are the operations that can be applied to languages:
1. Union
2. Concatenation
3. Kleene closure
4. Positive closure
The following example shows the operations on languages. Let L = {0, 1} and
S = {a, b, c}. Then:
1. Union: L ∪ S = {0, 1, a, b, c}
2. Concatenation: LS = {0a, 0b, 0c, 1a, 1b, 1c}
3. Kleene closure: L* = {∈, 0, 1, 00, 01, 10, 11, 000, ...}, i.e. all strings over L including ∈
4. Positive closure: L+ = L* − {∈}, i.e. all strings over L excluding ∈
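These language operations can be sketched in Python over finite sets. Kleene closure is an infinite language, so the sketch truncates it to strings built from at most n pieces:

```python
def union(l1, l2):
    """L1 ∪ L2: strings in either language."""
    return l1 | l2

def concat(l1, l2):
    """L1L2: every string of L1 followed by every string of L2."""
    return {x + y for x in l1 for y in l2}

def kleene(l, n):
    """L*, truncated: strings built by concatenating at most n members of L."""
    result = {""}            # the empty string is always in L*
    current = {""}
    for _ in range(n):
        current = concat(current, l)
        result |= current
    return result

L = {"0", "1"}
S = {"a", "b", "c"}
print(union(L, S))     # {'0', '1', 'a', 'b', 'c'}
print(concat(L, S))    # {'0a', '0b', '0c', '1a', '1b', '1c'}
print(kleene(L, 2))    # {'', '0', '1', '00', '01', '10', '11'}
```

The positive closure L+ is then kleene(L, n) - {""}, since ∈ is not itself a member of L here.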
Regular Expressions
· Each regular expression r denotes a language L(r).
· Here are the rules that define the regular expressions over some alphabet
Σ and the languages that those expressions denote:
1.ε is a regular expression, and L(ε) is { ε }, that is, the language whose sole
member is the empty string.
2. If ‘a’ is a symbol in Σ, then ‘a’ is a regular expression, and L(a) = {a}, that is,
the language with one string, of length one, with ‘a’ in its one position.
3.Suppose r and s are regular expressions denoting the languages L(r) and L(s).
Then,
a) (r)|(s) is a regular expression denoting the language L(r) U L(s).
b) (r)(s) is a regular expression denoting the language L(r)L(s).
c) (r)* is a regular expression denoting (L(r))*.
d) (r) is a regular expression denoting L(r).
4.The unary operator * has highest precedence and is left associative.
5.Concatenation has second highest precedence and is left associative.
6. | has lowest precedence and is left associative.
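Python's re module uses the same three operators, so the rules above can be checked directly. re.fullmatch(r, w) succeeds exactly when the whole string w is in L(r):

```python
import re

# Rule 3a: (r)|(s) denotes L(r) U L(s)
assert re.fullmatch(r"a|b", "a")
assert re.fullmatch(r"a|b", "b")

# Rule 3b: (r)(s) denotes the concatenation L(r)L(s)
assert re.fullmatch(r"(a|b)(a|b)", "ab")
assert not re.fullmatch(r"(a|b)(a|b)", "a")     # too short: not in the language

# Rule 3c: (r)* denotes (L(r))*, which includes the empty string
assert re.fullmatch(r"a*", "")
assert re.fullmatch(r"a*", "aaa")

# Precedence: * binds tighter than concatenation, which binds tighter than |,
# so ab*|c is parsed as (a(b*))|c
assert re.fullmatch(r"ab*|c", "abbb")
assert re.fullmatch(r"ab*|c", "c")
print("all regular-expression rules verified")
```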
4) Regular Definitions
A regular definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
………
dn → rn
where:
1. Each di is a distinct name.
2. Each ri is a regular expression over the alphabet Σ U {d1, d2, . . . , di-1}.
Example: Identifiers is the set of strings of letters and digits beginning with a
letter. Regular
definition for this set:
letter → A | B | …. | Z | a | b | …. | z |
digit → 0 | 1 | …. | 9
id → letter ( letter | digit ) *
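The regular definition for id maps directly onto a Python regular expression, with the character class [A-Za-z] standing for letter and [0-9] for digit:

```python
import re

# id -> letter ( letter | digit )*
ID = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_identifier(s):
    """True if the whole string matches the regular definition for id."""
    return ID.fullmatch(s) is not None

print(is_identifier("count1"))   # True: a letter followed by letters and digits
print(is_identifier("x"))        # True: a single letter
print(is_identifier("1count"))   # False: must begin with a letter
```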
Recognition of Tokens
Tokens can be recognized by Finite Automata.
A Finite Automaton (FA) is a simple idealized machine used to recognize patterns
within input taken from some character set (or alphabet) C. The job of an FA is to
accept or reject an input string depending on whether the pattern defined by the FA
occurs in the input.
There are two notations for representing Finite Automata. They are
1) Transition Diagram
2) Transition Table
Transition Diagram:
A transition diagram is a directed graph in which the states are nodes and the edges,
labeled with input symbols, show the transitions between states.
Transition Table:
The transition table is basically a tabular representation of the transition function. It takes two
arguments (a state and a symbol) and returns a state (the "next state").
Example:

State    0         1
→q0      q0        q1
q1       q1, q2    q2
q2       q1        q3
*q3      q2        q2

Here → marks the start state and * marks a final (accepting) state.
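Assuming the table above lists the moves on inputs 0 and 1, it can be simulated directly. Because state q1 has two possible moves on input 0, this automaton is an NFA, so the simulation must track the set of all states reachable so far:

```python
# Transition table from the example: delta[(state, symbol)] = set of next states
delta = {
    ("q0", "0"): {"q0"},       ("q0", "1"): {"q1"},
    ("q1", "0"): {"q1", "q2"}, ("q1", "1"): {"q2"},
    ("q2", "0"): {"q1"},       ("q2", "1"): {"q3"},
    ("q3", "0"): {"q2"},       ("q3", "1"): {"q2"},
}
start, accepting = "q0", {"q3"}

def accepts(w):
    """Simulate the NFA on string w, tracking all reachable states."""
    current = {start}
    for symbol in w:
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
    return bool(current & accepting)

print(accepts("0111"))   # True:  q0 -0-> q0 -1-> q1 -1-> q2 -1-> q3
print(accepts("01"))     # False: the run ends in q1, which is not accepting
```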
Finite Automata are of two types:
1) DFA (Deterministic Finite Automata)
2) NFA (Non-deterministic Finite Automata)
Lex
A Lex program has the following form:

definitions
%%
translation rules
%%
user subroutines

Each translation rule has the form:

p1 { action1 }
p2 { action2 }
…

where pi is a regular expression and actioni describes what action the
lexical analyzer should take when pattern pi matches a lexeme.
User subroutines are auxiliary procedures needed by the actions. These subroutines can be
compiled separately and loaded with the lexical analyzer.
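The pattern/action idea can be sketched in plain Python: each rule pairs a regular expression pi with an action (here, emitting a token). The token names and patterns below are illustrative, not from any real Lex specification:

```python
import re

# (token name, pattern) rules, tried in order -- keywords before identifiers
rules = [
    ("KEYWORD", r"(int|return)\b"),
    ("ID",      r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUM",     r"[0-9]+"),
    ("OP",      r"[=+\-*/]"),
    ("PUNCT",   r"[;(){}]"),
    ("SKIP",    r"[ \t\n]+"),        # whitespace: matched but not emitted
]

def tokenize(src):
    """Repeatedly match the first rule whose pattern fits at the current position."""
    tokens, pos = [], 0
    while pos < len(src):
        for name, pattern in rules:
            m = re.match(pattern, src[pos:])
            if m:
                if name != "SKIP":
                    tokens.append((name, m.group()))
                pos += m.end()
                break
        else:
            raise ValueError(f"illegal character {src[pos]!r} at position {pos}")
    return tokens

print(tokenize("int a = 10;"))
# [('KEYWORD', 'int'), ('ID', 'a'), ('OP', '='), ('NUM', '10'), ('PUNCT', ';')]
```

Running it on the source line from Example 1 yields the same five tokens counted there.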
Derivation
A derivation is a sequence of production rule applications. It is used to obtain the input
string through these production rules. During parsing we have to take two decisions.
These are as follows:
1. Deciding which non-terminal is to be replaced.
2. Deciding the production rule by which that non-terminal will be replaced.
Left-most Derivation
In the left most derivation, the input is scanned and replaced with the production rule from
left to right. So in left most derivatives we read the input string from left to right.
Example
1. S = S + S
2. S = S - S
3. S = a | b |c
Input:
a-b+c
1. S = S + S
2. S = S - S + S
3. S = a - S + S
4. S = a - b + S
5. S = a - b + c
Right-most Derivation
In the right most derivation, the input is scanned and replaced with the production rule from
right to left. So in right most derivatives we read the input string from right to left.
Example:
1. S = S + S
2. S = S - S
3. S = a | b |c
Input:
a-b+c
1. S = S - S
2. S = S - S + S
3. S = S - S + c
4. S = S - b + c
5. S = a - b + c
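Both derivations above can be replayed mechanically: a leftmost derivation always rewrites the leftmost S, a rightmost derivation the rightmost S. A small sketch:

```python
def apply_step(form, body, leftmost=True):
    """Replace the leftmost (or rightmost) occurrence of 'S' with body."""
    i = form.index("S") if leftmost else form.rindex("S")
    return form[:i] + body + form[i + 1:]

# Leftmost derivation of a-b+c:  S => S+S => S-S+S => a-S+S => a-b+S => a-b+c
form = "S"
for body in ["S+S", "S-S", "a", "b", "c"]:
    form = apply_step(form, body, leftmost=True)
print(form)   # a-b+c

# Rightmost derivation of a-b+c: S => S-S => S-S+S => S-S+c => S-b+c => a-b+c
form = "S"
for body in ["S-S", "S+S", "c", "b", "a"]:
    form = apply_step(form, body, leftmost=False)
print(form)   # a-b+c
```

Both sequences end in the same string a-b+c, matching the two worked derivations above.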
2) Consider the grammar-
S → aB / bA
A → a / aS / bAA
B → b / bS / aBB
For the string w = bbaababa, find-
1. Leftmost derivation
2. Rightmost derivation
3. Parse Tree
3) Consider the grammar-
S → A1B
A → 0A / ∈
B → 0B / 1B / ∈
Parse Tree- (figure omitted)
Ambiguity
A grammar is said to be ambiguous if there exists more than one leftmost derivation or more
than one rightmost derivative or more than one parse tree for the given input string. If the
grammar is not ambiguous then it is called unambiguous.
Example:
1. S = aSb | SS
2. S = ∈
For the string aabb, the above grammar generates two distinct parse trees (figures omitted).
Hence it is an ambiguous grammar.
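Ambiguity can be demonstrated mechanically: since each parse tree corresponds to exactly one leftmost derivation, finding two different leftmost derivations of aabb proves the grammar ambiguous. A bounded-search sketch (∈ is the empty string ""):

```python
def leftmost_derivations(target, max_steps=6):
    """Collect all leftmost derivations of target, up to max_steps steps."""
    productions = ["aSb", "SS", ""]          # S -> aSb | SS | epsilon
    found = []

    def expand(form, steps):
        if "S" not in form:
            if form == target:
                found.append(steps)
            return
        if len(steps) == max_steps:
            return
        i = form.index("S")                  # leftmost non-terminal
        # prune: terminals already fixed must form a prefix of the target
        if not target.startswith(form[:i]):
            return
        if len(form.replace("S", "")) > len(target):
            return
        for body in productions:
            expand(form[:i] + body + form[i + 1:], steps + [body])

    expand("S", [])
    return found

# e.g. S => aSb => aaSbb => aabb,  and also  S => SS => S => aSb => aaSbb => aabb
print(len(leftmost_derivations("aabb")) >= 2)   # True: the grammar is ambiguous
```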
1. Left Recursion-
A production of a grammar is said to have left recursion if the leftmost symbol of its RHS is
the same as the non-terminal on its LHS.
A grammar containing a production having left recursion is called as Left Recursive Grammar.
Example-
S → Sa / ∈
(Left Recursive Grammar)
Left recursion is considered to be a problematic situation for top-down parsers.
Therefore, left recursion has to be eliminated from the grammar.
Examples
1) Eliminate the left recursion from the grammar
E → E + T|T
T → T * F|F
F → (E)|id
Solution
The production after removing the left recursion will be
E → TE′
E′ → +TE′| ∈
T → FT′
T′ →∗ FT′| ∈
F → (E)|id
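The transformation A → Aα | β ⇒ A → βA′, A′ → αA′ | ∈ used above can be sketched as a function. Productions are written as plain strings and the representation (single-letter non-terminals, "ε" for ∈) is our own simplification:

```python
def eliminate_left_recursion(nt, productions):
    """Remove immediate left recursion from the productions of non-terminal nt.

    A -> A a1 | ... | A an | b1 | ... | bm
    becomes
    A  -> b1 A' | ... | bm A'
    A' -> a1 A' | ... | an A' | epsilon
    """
    alphas = [p[len(nt):] for p in productions if p.startswith(nt)]
    betas = [p for p in productions if not p.startswith(nt)]
    if not alphas:                 # no left recursion: nothing to do
        return {nt: productions}
    new = nt + "'"
    return {
        nt: [b + new for b in betas],
        new: [a + new for a in alphas] + ["ε"],
    }

# E -> E+T | T   becomes   E -> TE' ,  E' -> +TE' | ε
print(eliminate_left_recursion("E", ["E+T", "T"]))
# T -> T*F | F   becomes   T -> FT' ,  T' -> *FT' | ε
print(eliminate_left_recursion("T", ["T*F", "F"]))
```

The two calls reproduce the E and T rules of the solution above.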
2) Remove the left recursion from the grammar
E → E(T)|T
T → T(F)|F
F → id
Solution
Eliminating immediate left recursion among all A → Aα productions, we obtain
E → TE′
E′ → (T)E′|ε
T → FT′
T′ → (F)T′|ε
F → id
Left factoring
Left Factoring is a grammar transformation technique. It consists in "factoring out" prefixes which
are common to two or more productions.
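One round of the transformation A → αβ1 | αβ2 ⇒ A → αA′, A′ → β1 | β2 can be sketched as follows. Productions are plain strings grouped by first symbol, and the single new name A′ assumes at most one group needs factoring per round (as in the problems below):

```python
from os.path import commonprefix   # longest common prefix of a list of strings

def left_factor(nt, productions):
    """One round of left factoring on the productions of non-terminal nt."""
    # group productions by their first symbol
    groups = {}
    for p in productions:
        groups.setdefault(p[0], []).append(p)

    result = {nt: []}
    for group in groups.values():
        if len(group) == 1:              # no common prefix to factor out
            result[nt].append(group[0])
            continue
        alpha = commonprefix(group)      # the common prefix alpha
        new = nt + "'"
        result[nt].append(alpha + new)
        # what remains of each production after alpha; "" becomes epsilon
        result[new] = [p[len(alpha):] or "ε" for p in group]
    return result

# S -> iEtS | iEtSeS | a   becomes   S -> iEtSS' | a ,  S' -> ε | eS
print(left_factor("S", ["iEtS", "iEtSeS", "a"]))
```

Applied to Problem-01 below, this reproduces the solution S → iEtSS′ / a, S′ → eS / ∈.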
Problem-01:
S → iEtS / iEtSeS / a
E→b
Solution-
S → iEtSS’ / a
S’ → eS / ∈
E→b
Problem-02:
A → aAB / aBc / aAc
Solution-
Step-01:
A → aA’
A’ → AB / Bc / Ac
Again, this is a grammar with common prefixes.
Step-02:
A → aA’
A’ → AD / Bc
D→B/c
Problem-03:
S → bSSaaS / bSSaSb / bSb / a
Solution-
Step-01:
S → bSS’ / a
S’ → SaaS / SaSb / b
Again, this is a grammar with common prefixes.
Step-02:
S → bSS’ / a
S’ → SaA / b
A → aS / Sb
Problem-04:
S → aSSbS / aSaSb / abb / b
Solution-
Step-01:
S → aS’ / b
S’ → SSbS / SaSb / bb
Again, this is a grammar with common prefixes.
Step-02:
S → aS’ / b
S’ → SA / bb
A → SbS / aSb