
UGC NET
DAILY CLASS NOTES
Computer

Theory of Computation and Compiler Design


Lecture – 1
Theory of Automata and FA


THEORY OF COMPUTATION (TOC)-
Content:
1. Introduction to TOC
2. Introduction to FA

INTRODUCTION TO CONCEPT OF AUTOMATA-


HISTORICAL ASPECT OF AUTOMATA-
 Automata theory is study of abstract computing devices, or “machines”. Before there were computers, in
1930’s, Alen turing introduced an abstract machine that has all the capabilities of today’s computers, at
least as far as in what they could compute. In 1940’s and 1950’s simpler kinds of machines, which we
today use “finite-automata”, were studied by a number of researchers. These automata, originally
proposed to model bring function, turned out to be extremely useful for a variety of other purposes. Also
in the late 1950’s the linguist N. Chomsky began the study of formal “grammars”. While not strictly
machines, these grammars have close relation-ships to abstract automata and serve today on the basis of
some important software components, including parts of compilers.
 In 1969, S. Cook extended Turing's study of what could and what could not be computed. Cook separated those problems that can be solved efficiently by computer from those problems that can in principle be solved, but in practice take so much time that computers are unavailing for all but very small cases of the problem. The latter class of problems is called "intractable", or NP-hard. All of this theoretical development bears directly on what computer scientists do today.
 In computer science we find many examples of finite state systems, and the theory of finite automata is a useful design tool for these systems. A primary example is a switching circuit. A switching circuit is composed of a finite number of gates, each of which can be in one of two conditions, usually denoted by 0 and 1.
 Consider the situation of an electric switch: only two situations are possible, "on" and "off". Let us say 1 for on and 0 for off. We can interchange the situation by pushing the switch, as shown in the figure below (not reproduced here).

 It is also tempting to view the human brain as a finite state system. The number of brain cells, or neurons, is limited, probably 2^35 at most. Although there is evidence to the contrary, it is conjectured that the state of each neuron can be described by a small number of bits; if so, then finite state theory applies to the brain also.

THE STUDY OF AUTOMATA THEORY IS FERTILE AND FUTURISTIC-


There are many reasons why the study of automata and complexity is an important and core part of computer science. Here we will give several reasons which should motivate computer science students to study automata theory. Let us list out some important features of the automata theory.
1. It plays an important role when we are making software for designing and checking the behavior of digital circuits.
2. It underlies the "lexical analyser" of the typical compiler, that is, the compiler component that breaks the input text into logical units, such as identifiers, keywords and punctuation.
3. It is used in software for scanning large bodies of text, such as sets of web pages, to detect occurrences of words, phrases or other patterns.
4. It is key to software for verifying systems of all types that have a finite number of distinct states, such as communication protocols or protocols for the secure exchange of information.
5. It is a very useful method in software for natural language processing.

DEVELOP YOUR FEELINGS WITH THE AUTOMATA-


 Truly, to understand a subject like automata theory, it is very necessary that you develop a feeling for it.
 Moreover, you should try to relate it to your day-to-day life. Throughout these notes you will read about states, so let us develop a feeling for this term.
 Sometimes, when you are sitting alone thinking about your past, some events you certainly remember, because they played an important role in your life; others are not relevant, and it is difficult for you to remember them. The same idea applies to finite state machines: when we design a system or machine, two kinds of points come into the picture — some of them certainly affect the output of the system, and others never affect the output of the system, so we can skip these irrelevant points.

FINITE AUTOMATA-
INTRODUCTION-
 In this chapter we discuss the mathematical model of computers and algorithms. Further, we are going to define powerful models of computation — more and more sophisticated devices for accepting and generating languages — which are restricted models of real computers, called finite automata or finite state machines. These machines resemble the Central Processing Unit of a computer; the absence of memory makes them a more restricted model.
 The machine is also deterministic, by which we mean that, on reading one particular input instruction, the machine converts itself from the state it was in into some particular other state, where the resulting state is completely fixed by the prior state and the input instruction. Some sequences of instructions may lead to success and some may not; success is fixed by the sequence of inputs. Either the program will work or it will not.
 Before discussing the mathematical model, let us discuss the pictorial representation of a finite machine. Strings are fed into the device by way of an input tape, which is divided into squares, with one symbol in each square. The main part of the device is a "black box", the finite control, which is responsible for all processing. The finite control can sense what symbol is written at any position on the input tape by means of a movable head. At first the head is placed at the leftmost square of the tape and the finite control is set to a designated initial state.

TIPS-

 A finite automaton is called "finite" because the number of possible states and the number of letters in the alphabet are both finite, and "automaton" because the change of state is totally governed by the input. It is deterministic, since what state comes next is automatic, not willful, just as the motion of the hands of a clock is automatic, while the motion of the hands of a human is presumably the result of desire and thought.
 P0, P1, P2, P3, P4, P5 are states in the finite control system; x and y are input symbols (the figure is not reproduced here).
 At regular intervals the automaton reads one symbol from the input tape and then enters a new state that depends only on the current state and the symbol just read.
 After reading an input symbol, the reading head moves one square to the right on the input tape, so that on the next move it will read the symbol in the next tape square. This process is repeated again and again.
 The automaton then indicates approval or disapproval.
 If it winds up in one of a set of final states, the input string is considered to be accepted. The language accepted by the machine is the set of strings it accepts.

DEFINITION OF DETERMINISTIC FINITE AUTOMATA-


A deterministic finite automaton is a quintuple
M = (Q, Σ, δ, q0, F)
where
Q: a non-empty finite set of states present in the finite control (q0, q1, q2, …).
Σ: a non-empty finite set of input symbols which can be passed to the finite state machine (a, b, c, d, e, …).
q0: the starting state, one of the states in Q.
F: a non-empty set of final (accepting) states, a subset of Q.
δ: a function called the transition function that takes two arguments, a state and an input symbol, and returns a single state.
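As a minimal sketch in Python (the particular machine, its state names and the helper function are my own illustrative choices, not from the notes), here is the quintuple encoded directly, for a DFA that accepts strings over {a, b} containing an even number of a's:

Q = {"q0", "q1"}                         # states in the finite control
sigma = {"a", "b"}                       # input alphabet Σ
delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q0", ("q1", "b"): "q1"}   # transition function δ
q0 = "q0"                                # starting state
F = {"q0"}                               # final (accepting) states

def accepts(w):
    state = q0
    for symbol in w:
        state = delta[(state, symbol)]   # δ returns exactly one state
    return state in F                    # accept iff we end in a final state

print(accepts("abab"))   # True  — two a's (an even number)
print(accepts("a"))      # False — one a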



Lecture – 2
Regular Language Model


Symbols:- Symbols are undefined concepts or primitives. Symbols are generally letters or digits.

Alphabets:- An alphabet is a finite non-empty set of elements or symbols, denoted by Σ.


Σ = {0, 1, 2, 3, 4, 5, 6, …}
= {a, e, i, o, u}

String:- A string or word is a finite sequence of symbols chosen from some alphabet.
Σ = {a, b, c, d}
u = abc, v=aabc, w = abcd

Length of String:- the number of symbols in the string.


|u| = 3, |v| = 4, |w| =4

Power of an alphabet:- Let Σ be an alphabet; Σ^k is the set of all strings of length k, each of whose symbols is in Σ.
Example:
Σ = {0, 1}
Σ1 = {0, 1}
Σ2 = {00, 11, 01, 10}
Σ3 = {000, 001, 010, 100, 101, 110, 011, 111}
Σ4, Σ5, Σ6
Kleene star (*)

Σ* = the set of all possible strings over Σ, including the null string


Σ* = Σ0 U Σ1 U Σ2 U Σ3…..
Σ* = { ʌ , 0, 1, 00, 11, 01, 10, 000, 010…….}

a* = all possible strings of a's, including null


= {ʌ, a, aa, aaa, aaaa, …}
b* = {ʌ, b, bb, bbb, bbbb, …}
(ab)* = {ʌ, ab, abab, ababab, …}

ab* = {a, ab, abb, abbb,…….}


ab*c* = {a, ab, ac, abc, abbc, abbbc, abbcc………}

Σ+ = all possible strings over Σ, excluding the null string.


a+ = {a,aa,aaa,aaaa,……}
a+b* = {a, ab, aa, abb, aabb,………..}
ab*c+ = {ac, abc, abbcc,……..}

Formal Language (L):- A language is a subset of Σ*, the Kleene star of the alphabet:
L ⊆ Σ*

Σ = {0, 1}
Σ* = {ʌ, 0, 1, 00, 01, 10, 11………}

L = all the strings starting with 0:


L = {0, 00, 01, 000, 010, …}

(a+b)* = all possible combinations of a and b, including null (a short enumeration sketch follows):


{ʌ, a, b, aa, ab, ba, …}
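As a small runnable sketch in Python (the function name and the length cut-off are my own choices — Σ* itself is infinite, so we can only enumerate a finite prefix of it):

from itertools import product

def sigma_star(sigma, max_len):
    # Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ ... ; enumerate it up to length max_len
    for k in range(max_len + 1):
        for tup in product(sigma, repeat=k):
            yield "".join(tup)

print(list(sigma_star(["0", "1"], 2)))
# ['', '0', '1', '00', '01', '10', '11']  — '' plays the role of ʌ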



Lecture – 3
Construction of FA
Notation for a state, the initial state and a final state is given by the usual circle diagrams in the original notes (not reproduced here).

Q. Strings ending with a (a small Python sketch of this machine follows):


{a, aa, ba, aaa, aba, bba, ...}
(a + b)*a
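A quick sketch of this machine in Python (the state names p and q are my own labels; p is the start state and q the only final state):

delta = {("p", "a"): "q", ("p", "b"): "p",
         ("q", "a"): "q", ("q", "b"): "p"}

def ends_with_a(w):
    state = "p"                      # start state
    for c in w:
        state = delta[(state, c)]    # follow one transition per symbol
    return state == "q"              # q is the only final state

for w in ["a", "ba", "abb"]:
    print(w, ends_with_a(w))         # a True, ba True, abb False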

Q. Draw an FA that will accept exactly one b


Σ = {a, b}

Q: Draw an FA that will accept an even number of a's

Q. Draw an FA accepting strings of length more than four

Q. Draw an FA accepting strings of length 6 or more



Lecture – 4
DFA Minimization and FA with Output


FA/DFA Minimization:
Distinguishable states: a pair of states such that, for some input string, one run ends in a final state and the other in a non-final state.
Non-distinguishable states: a pair of states such that, for every input string, both runs end in final states or both end in non-final states.
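A sketch of minimization by partition refinement in Python (the encoding and names are mine): start with the two blocks {final states} and {non-final states}, and split a block whenever two of its states disagree, for some input symbol, on which block they move to. Unreachable states (such as q5 in the first table below) should normally be removed first; that step is skipped here.

def minimize(states, alphabet, delta, finals):
    partition = [set(finals), set(states) - set(finals)]
    while True:
        new_partition = []
        for block in partition:
            groups = {}
            for s in block:
                # signature: the block index reached on each input symbol
                sig = tuple(next(i for i, b in enumerate(partition)
                                 if delta[(s, a)] in b)
                            for a in alphabet)
                groups.setdefault(sig, set()).add(s)
            new_partition.extend(groups.values())
        if len(new_partition) == len(partition):
            return partition           # no block split: stable partition
        partition = new_partition

Applied to the first table below, this merges q3 with q5 and q2 with q4, leaving the three blocks {q1}, {q2, q4} and {q3, q5}.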

Q. Minimize the following DFA (→ marks the start state, * marks final states):

δ      0     1
→ q1   q2    *q3
  q2   *q3   *q3
 *q3   q4    *q3
  q4   *q3   *q3
 *q5   q2    *q3

Q: Minimize the following DFA:

       a     b
→ p    ---   q
  q    *r    s
 *r    *r    s
  s    *r    s

Grouping the states into blocks Z1–Z4 gives (+ marks final blocks):

→ Z1(p)   ---      Z2(q)
  Z2(q)   +Z3(r)   Z4(s)
 +Z3(r)   +Z3(r)   Z4(s)
  Z4(s)   +Z3(r)   Z4(s)

FA with Output:
So far we have considered FAs that recognize a language, i.e. they produce no output for an input string except accept or reject. It is interesting to consider FAs with output, which are called transducers. They compute some function or relation.

Two equivalent machines with output are:


1. Mealy Machine
2. Moore Machine

Moore and Mealy Machines: These machines are DFAs except that an output symbol is associated with each state or with each transition.

1) Moore machine: It is an FA in which the output is determined by the current state only. The output of a Moore machine is one character longer than the input: if the input length is n, then the output length is n + 1. The output at each step is determined by the state itself.

It has 5 components:
a. Finite set of states: q0, q1, q2, ………. qn
b. Alphabet of input string Σ
c. An alphabet of output string Γ
d. Transition function/table δ,
e. Output table

Example: for the input a a b a b a (n = 6) the machine emits the output 1 0 1 0 1 1 1 (m = 7 = n + 1).
 There is no final state, because there is no acceptance or rejection involved; we only have to observe the output.
 So, the output is always one symbol longer than the input string.
 Every state, including the last state reached, emits an output symbol.

2. Mealy Machine: It is also an FA, but here the output is associated with the input symbol of each transition; that is, the output is written on the incoming edges.

Input: aababa (length 6)
Output: 010111 (length 6)
The power of Moore and Mealy machines is the same, because we can convert a Moore machine to a Mealy machine and a Mealy machine to a Moore machine.
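As a hedged sketch of both kinds of machine in Python (the translation computed — output 1 for input a, 0 for input b — and all state names are my own toy choices, not the example above):

def mealy(w):
    # one state suffices; the output is attached to each transition
    out = ""
    for c in w:
        out += "1" if c == "a" else "0"
    return out

def moore(w):
    outputs = {"q0": "0", "qa": "1", "qb": "0"}          # output per state
    delta = {("q0", "a"): "qa", ("q0", "b"): "qb",
             ("qa", "a"): "qa", ("qa", "b"): "qb",
             ("qb", "a"): "qa", ("qb", "b"): "qb"}
    state = "q0"
    out = outputs[state]          # the initial state's output comes first
    for c in w:
        state = delta[(state, c)]
        out += outputs[state]
    return out

print(mealy("aababa"))    # 110101  — length 6, same as the input
print(moore("aababa"))    # 0110101 — length 7 = 6 + 1

The extra leading symbol of the Moore output is exactly the "output one character longer than the input" behaviour described above.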

Conversion of Mealy to Moore:-


Q: Convert the given Mealy machine into an equivalent Moore machine.
(Two worked examples are given as state diagrams in the original notes — each shows a Mealy machine and the equivalent Moore machine — and are not reproduced here.)



Lecture – 6
Understanding CFG Part-2


Context Free Language Part-2
Content:
1. Conversion of FA to CFG
2. Conversion of CFG to FA
3. Semi-word
4. Word
5. Null Production
6. Unit Production
7. Chomsky Normal Form (CNF)
8. Greibach Normal Form (GNF)

Conversion of FA to CFG:-

S → aX / bS
X → aY / bX
Y → aX / bY / ∧

Conversion of CFG to FA:


S → aX / bS
X → aY / bX
Y → aX / bY / ∧

Semi-word: A production rule whose right-hand side ends with a non-terminal and contains exactly one non-terminal, i.e. there is one and only one non-terminal, and it is at the end:

N.T. → (T)(T)(T)(T)(T)…(NT)

Word:- A string of terminals:


NT → (T)(T)(T)(T)…(T)

Null Production:- A production rule of the form


NT → Null
OR
NT → ∧

Unit Production:-
A production of the form
Non-terminal → one non-terminal
(NT) → (NT)
That is, a production of the form A → B (where A and B are both non-terminals) is called a unit production. Unit productions increase the cost of derivation in a grammar.

The following algorithm can be used to eliminate unit productions (a runnable sketch follows the pseudocode).


Algorithm: Removal of unit productions →
while (there exists a unit production A → B)
{
    select a unit production A → B such that there exists a non-unit production B → α;
    for (every non-unit production B → α)
        add the production A → α to the grammar;
    eliminate A → B from the grammar;
}
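A runnable sketch of this elimination in Python (the grammar encoding is my own; this uses the unit-closure formulation — collect every B reachable from A through unit productions, then take their non-unit right-hand sides — which gives the same result as the loop above):

def remove_unit_productions(grammar):
    nts = set(grammar)                     # non-terminal names
    def is_unit(rhs):
        return rhs in nts                  # RHS is a single non-terminal
    result = {}
    for a in nts:
        reach, stack = {a}, [a]            # unit closure of A
        while stack:
            for rhs in grammar[stack.pop()]:
                if is_unit(rhs) and rhs not in reach:
                    reach.add(rhs)
                    stack.append(rhs)
        result[a] = sorted({rhs for b in reach for rhs in grammar[b]
                            if not is_unit(rhs)})
    return result

g = {"S": ["AB"], "A": ["a"], "B": ["C", "b"],
     "C": ["D"], "D": ["E"], "E": ["a"]}
print(remove_unit_productions(g))
# {'A': ['a'], 'B': ['a', 'b'], 'C': ['a'], 'D': ['a'], 'E': ['a'], 'S': ['AB']}
# (the dict's key order may differ; the useless productions left over for
# C, D and E are removed separately, as the worked example below shows)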

Example: Consider the context free grammar G.


S → AB
A → a
B → C/b
C → D
D → E
E → a

Remove the unit productions.


Solution: Given CFG
S → AB
A→a
B → C/b
C→D
D→E
E→a
It contains three unit productions:
B→C
C→D
D→E
Now, to remove the unit production B → C, we look for a production whose left side is C and whose right side contains a terminal (i.e. C → a), but there is no such production in G. The same holds for the production C → D.
Next we try to remove the unit production D → E; here there is a production E → a, therefore we eliminate D → E and introduce D → a. The grammar becomes
S → AB
A→a
B → C/b
C→D
D→a
E→a
Now we can remove C → D by using D → a, we get
S → AB
A→a
B → C/b
C→a
D→a
E→a
Similarly, we can remove B → C by using C → a, we obtain
S → AB
A→a
B → a/b
C→a
D→a
E→a
Now it can easily be seen that the productions C → a, D → a and E → a are useless, because if we start deriving from S these productions will never be used. Hence eliminating them gives
S → AB
A → a
B → a/b
which is the completely reduced grammar.

CHOMSKY NORMAL FORM


If a CFG has only productions of the form
Non-terminal → string of exactly two non-terminals,
i.e. (NT) → (NT)(NT),
or of the form
Non-terminal → one terminal,
i.e. (NT) → (T),
then it is said to be in Chomsky normal form, or CNF.

Example (a small CNF-shape checker is sketched below):
S → XY
A → a
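A tiny sketch in Python that checks the CNF shape (the convention that non-terminals are single upper-case letters and terminals lower-case is an assumption of this sketch):

def is_cnf(grammar):
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            two_nonterminals = len(rhs) == 2 and rhs.isupper()
            one_terminal = len(rhs) == 1 and rhs.islower()
            if not (two_nonterminals or one_terminal):
                return False
    return True

print(is_cnf({"S": ["XY"], "A": ["a"]}))     # True  — the example above
print(is_cnf({"S": ["abSb", "a", "aAb"]}))   # False — needs conversion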

Q. Change the following grammar into CNF.


S → abSb/a/aAb
A → bS/aAAb.

Q. Convert the CFG given below into CNF.


S → bA/aB
A → bAA/aS/a
B → aBB/bS/b.

GREIBACH NORMAL FORM (GNF)-


Tips-
For every context free language L without ∈, there exists a grammar in which every production is of the form A → aV, where 'A' is a variable, 'a' is exactly one terminal and 'V' is a string of zero or more variables; clearly V ∈ V_N*.
"In other words, if every production of the context free grammar is of the form A → aV/a, then it is in Greibach Normal Form."

Greibach normal form will be used to construct a pushdown automaton that recognizes the language generated by a context free grammar.
To convert a grammar to GNF, we start with a production in which the left side has a higher-numbered variable than the first variable on the right side, and make replacements on the right side.

Production Rules:
1. (NT) → aα, where a is a single terminal and α is a string of non-terminals.
2. NT → one terminal.

Ex: S → aXYZ
A → b

Q. Convert the following grammar to GNF:
S → S1S2
S1 → aS1c / S2 / λ
S2 → aS2b / λ
S3 → aS3b / S4 / λ



Lecture – 7
Turing Machine and Chomsky Hierarchy


1. The language accepted by a Turing Machine is a recursively enumerable language (REL).
2. If we apply some restrictions on the Turing Machine (it must halt on every input), then the language accepted is called a recursive language (RL).
3. All recursive languages are recursively enumerable languages.

4. Non-deterministic Turing Machines and deterministic Turing Machines have equal power.

CHOMSKY HIERARCHY-
We can exhibit the relationship between grammars by the Chomsky Hierarchy. Noam Chomsky, a founder of formal language theory, provided an initial classification into four language types:
Type – 0 (Unrestricted grammar)
Type – 1 (Context sensitive grammar)
Type – 2 (Context free grammar)
Type – 3 (Regular grammar)

Type 0 languages are those generated by unrestricted grammars, that is, the recursively enumerable languages. Type 1 consists of the context-sensitive languages, Type 2 consists of the context-free languages and Type 3 consists of the regular languages. Each language family of type k is a proper subset of the family of type k – 1. The following diagram (not reproduced here) shows the original Chomsky Hierarchy.

We have also met several other language families that can be fitted into this picture, including the families of deterministic context-free languages (LDCF) and recursive languages (LREC). The modified Chomsky Hierarchy can be seen in the figure below (not reproduced here).

The relationship between linear, deterministic context-free and non-deterministic context-free languages is shown in the figure below (not reproduced here).

Type   Name of languages generated             Production restrictions (for A → B)                        Acceptor

0      Unrestricted (recursively enumerable)   A = any string with a non-terminal; B = any string         Turing machine
1      Context-sensitive                       A = any string with non-terminals; B = any string as       Linear bounded automaton
                                               long as or longer than A
2      Context-free                            A = one non-terminal; B = any string                       Pushdown automaton
3      Regular                                 A = one non-terminal; B = aX or B = a, where 'a' is a      Finite automaton
                                               terminal and X is a non-terminal



Lecture – 8
Operations and Properties of Languages


The table below shows the closure properties of formal languages:
REG = regular languages
DCFL = deterministic context-free languages
CFL = context-free languages
CSL = context-sensitive languages
RC = recursive languages
RE = recursively enumerable languages

Operation                                               REG  DCFL  CFL  CSL  RC  RE
Union                                                   Y    N     Y    Y    Y   Y
Intersection                                            Y    N     N    Y    Y   Y
Set difference                                          Y    N     N    Y    Y   N
Complement                                              Y    Y     N    Y    Y   N
Intersection with a regular language                    Y    Y     Y    Y    Y   Y
Union with a regular language                           Y    Y     Y    Y    Y   Y
Concatenation                                           Y    N     Y    Y    Y   Y
Kleene star                                             Y    N     Y    Y    Y   Y
Kleene plus                                             Y    N     Y    Y    Y   Y
Reversal                                                Y    N     Y    Y    Y   Y
Epsilon-free homomorphism                               Y    N     Y    Y    Y   Y
Homomorphism                                            Y    N     Y    N    N   Y
Inverse homomorphism                                    Y    Y     Y    Y    Y   Y
Epsilon-free substitution                               Y    N     Y    Y    Y   Y
Substitution                                            Y    N     Y    N    N   Y
Subset                                                  N    N     N    N    N   N
Left difference with a regular language (L − Regular)   Y    Y     Y    Y    Y   Y
Right difference with a regular language (Regular − R)  Y    Y     N    Y    Y   N
Left quotient with a regular language                   Y    Y     Y    N    Y   Y
Right quotient with a regular language                  Y    Y     Y    N    Y   Y



Decision properties of CFG:


1. Test for Membership: Decidable.
2. Test for Emptiness: Decidable
3. Test for finiteness: Decidable

Decision properties of Regular languages:


Almost all properties are decidable for FA:
(i) Emptiness
(ii) Non-emptiness
(iii) Finiteness
(iv) Infiniteness
(v) Membership
(vi) Equality



Lecture – 9
Pumping Lemma for Regular Languages and CFG


Theorem for Regular Languages: For any regular language L, there exists an integer P such that any w in L with |w| ≥ P can be broken into three strings, w = xyz, such that:
(1) |xy| ≤ P
(2) |y| ≥ 1
(3) for all k ≥ 0, the string x y^k z is also in L.
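As a worked illustration (a standard textbook example, not taken from these notes): to show L = {a^n b^n | n ≥ 1} is not regular, assume it were regular with pumping length P and take w = a^P b^P ∈ L. Any split w = xyz with |xy| ≤ P and |y| ≥ 1 forces y = a^j for some j ≥ 1, so pumping with k = 2 gives x y^2 z = a^(P+j) b^P, which has more a's than b's and is not in L. This contradicts condition (3), so L is not regular.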

Applications for Regular Languages:


The pumping lemma is applied to show that certain languages are not regular.
It should never be used to show that a language is regular.
● If L is regular, it satisfies the pumping lemma.
● If L does not satisfy the pumping lemma, it is not regular.
Theorem for Context Free Languages: If L is a context-free language, there is a pumping length p such that any string w ∈ L of length ≥ p can be written as w = uvxyz, where vy ≠ ε, |vxy| ≤ p, and for all i ≥ 0, u v^i x y^i z ∈ L.

Applications for CFG:


● The pumping lemma is used to show that a language is not context-free; a language that fails the lemma cannot be context-free.

Tricks for Regular Languages-


Trick (1)-
 {a^n b^m | n, m ≥ 1}
 Different powers, so it is regular.

Trick (2)-
 {a^n b^n | n ≤ 100}
 Finite within a bound, so it is regular.

Trick (3)-
 {a^n | n ≥ 1}
 A simple power, so it is regular.

Trick (4)-
 {a^n b^n | n ≥ 1}
 Same power and infinite, so it is not regular.

Q. Check whether the language is regular or not.


A. {a^p | p < 1}
B. {a^P b^Q | P, Q ≥ 5}
C. {a^P b^P | P ≥ 3}
D. {a^P b^P | P ≤ 2020}

Trick (5)-
 {a^P b^Q c^R | P, Q, R ≥ 1}
 Different powers, so it is regular.

Trick (6)-
 If the powers follow an arithmetic progression, then it is regular.
 {a^P b^(2Q) | P, Q ≥ 1}

Q. Check whether the grammar is regular or not?


A. {a^P | P is prime}
B. {a^P | P is a multiple of 5}

Trick (7)-
If the powers do not form an arithmetic progression (for example, powers of 2 or perfect squares), then the language is not regular.

Q. Check whether the language is regular or not.


A. {a^(2^Q) | Q ≥ 1}
B. {a^(X^2) | X ≥ 1}

Trick (8)-
Comparison-based languages are not regular.
A. {w ∈ {a, b}* | n_a(w) = n_b(w)}
B. {w | n_a(w) ≠ n_b(w)}
Trick (9)-
Remembering (copying) an unbounded substring is not possible for regular languages, since an FA has no memory.
A. {ww | w ∈ {0, 1}*}
B. {w w^R w | w ∈ {0, 1}*}

Q. Check whether the given language is regular or not.


A. {a^n b^n c^n | n ≥ 1}
B. {a^n b^n b^m c^m | n, m ≥ 1}



Lecture – 10
Introduction to Compiler Design and its Phases


A compiler is software which converts a program written in a high-level language into a program written in a low-level language.

The fundamental language processing model for compilation consists of two-step processing of a source program:
 Analysis of the source program.
 Synthesis of the source program.
The analysis part breaks up the source program into its constituent pieces, determines the meaning of the source string, and creates an intermediate representation of the source program. The synthesis part constructs an equivalent target program from the intermediate representation.

Lexical Analysis-
❏ The first phase works as a text scanner: it scans the source code as a stream of characters and converts it into meaningful lexemes.
Syntax Analysis-
❏ The next phase is called syntax analysis or parsing. It takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree). The parser checks whether the expression made by the tokens is syntactically correct.
Semantic Analysis-
❏ Semantic analysis checks whether the parse tree constructed follows the rules of the language, for example, that assignment of values is between compatible data types.
Intermediate Code Generation-
❏ This intermediate code should be generated in such a way that it makes it easier to be translated into the
target machine code.
Code Optimization-
❏ The next phase does code optimization of the intermediate code. Optimization can be assumed as
something that removes unnecessary code lines, and arranges the sequence of statements in order to speed
up the program execution without wasting resources (CPU, memory).

Code Generation-
❏ Here, the code generator takes the optimized representation of the intermediate code and maps it to the
target machine language.
Symbol Table-
❏ It is a data-structure maintained throughout all the phases of a compiler. All the identifier's names along
with their types are stored here. The symbol table makes it easier for the compiler to quickly search the
identifier record and retrieve it.
Error handling-
❏ The tasks of the error handling process are to detect each error, report it to the user, and then apply a recovery strategy to handle the error. Example: a run-time error is an error which takes place during the execution of a program, e.g. invalid input data. Compile-time errors arise at compile time, before execution of the program; syntax errors such as a missing file reference, a missing semicolon, misspelled keywords or operators, and infinite loops are examples.

EXAMPLE: (the worked phase-by-phase example is given as a diagram in the original notes and is not reproduced here.)
The front-end of the compiler includes those phases which depend on the source language and are independent of the target machine. It normally includes the analysis part, and may also include a certain amount of code optimization and error handling.
The back-end of the compiler includes those phases which depend on the target language and are independent of the source language.

Role of Lexical Analyzer:


Lexical analyzer is the first phase of a compiler. Its job is reading the input program and breaking it up into a
sequence of tokens. These tokens are used by the syntax analyzer. Each token is a sequence of characters that
represents a unit of information in the source program.
It also performs certain secondary tasks like removing the comments and white spaces from the source program.
It may also be given the responsibility of making a copy of the source program with the error messages marked in
it. Each error message may also be associated with a line number. The analyzer may keep track of the line number
by tracking the number of new line characters seen.

LEXICAL ANALYSIS (REGULAR EXPRESSION and FINITE AUTOMATA)-


● The lexical analyzer needs to scan and identify only a finite set of valid strings/tokens/lexemes that belong to the language. Regular expressions have the capability to express finite languages by defining a pattern for finite strings of symbols. The grammar defined by regular expressions is known as regular grammar. A finite automaton is a state machine that takes a string of symbols as input and changes its state accordingly. Finite automata are recognizers for regular expressions; a toy tokenizer sketch follows.
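A toy sketch of this idea in Python (the token categories and their regular expressions are illustrative assumptions, not a full language specification):

import re

TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:int|float|if|else)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER",     r"\d+(?:\.\d+)?"),
    ("OPERATOR",   r"[+\-*/=]"),
    ("PUNCT",      r"[;,(){}]"),
    ("SKIP",       r"\s+"),          # whitespace: discarded by the scanner
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("x = y * z + 10;")))
# [('IDENTIFIER', 'x'), ('OPERATOR', '='), ('IDENTIFIER', 'y'),
#  ('OPERATOR', '*'), ('IDENTIFIER', 'z'), ('OPERATOR', '+'),
#  ('NUMBER', '10'), ('PUNCT', ';')]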



Lecture – 11
Lexical and Syntax Analysis Part 1


LEXICAL ANALYZER: Lexical analysis determines the lexical constituents in a source string by reading the stream of characters from left to right and grouping them into tokens. A token is a sequence of characters having a collective meaning.

Example of compilation-
Consider the translation of the following statement:
x = y * z + 10;
The internal representation of the source program changes with each phase of the compiler.
The lexical analyzer builds uniform descriptors for the elementary constituents of a source string. To do this, it must identify lexical units in the source string and categorize them into identifiers, constants, reserved words etc. The uniform descriptor is called a token, and it has the format

Category – Lexical value

 A token is constructed for each identifier as well as for each operator in the string. Let the analyzer assign tokens id1, id2 and id3 to x, y and z respectively, and assign assign-op to =, mult-op to *, add-op to + and num to 10. The statement after lexical analysis may be given as:
 id1 assign-op id2 mult-op id3 add-op num

Basic Definitions-
 Lexemes: They are the smallest logical units of a program. A lexeme is a sequence of characters in the source program for which a token is produced, for example if, 10.0, + etc.
 Tokens: Classes of similar lexemes are identified by the same token, for example identifier, number, reserved word etc.

Pattern: It is a rule which describes a token. For example, an identifier is a string of at most 8 characters consisting of digits and letters, where the first character is a letter.

Example:
Count the number of tokens in the following code.
int main()
{
    x = y + z;
    int x, y, z;
    printf("sum%d%d", x);
}

PARSER AND CONTEXT FREE GRAMMARS-


● The lexical analyzer can identify tokens with the help of regular expressions. But a lexical analyzer cannot check the syntax of a given sentence, due to the limitations of regular expressions: regular expressions cannot check balanced constructs, such as matching parentheses. Therefore, this phase uses context-free grammar (CFG), which is recognized by pushdown automata. CFG is more powerful than regular languages.

Top-down Parsing-
When the parser starts constructing the parse tree from the start symbol and then tries to transform the start symbol into the input, it is called top-down parsing.
• Recursive descent parsing: It is a common form of top-down parsing. It uses recursive procedures to process the input. Recursive descent parsing suffers from backtracking.
• Backtracking: If one derivation of a production fails, the syntax analyzer restarts the process using different rules of the same production, so the parser may process the input string more than once to determine the right production.

TOP-DOWN PARSER (LL Parser)-


❏ An LL Parser accepts LL grammar. LL grammar is a subset of context-free grammar but with some
restrictions.
❏ An LL parser is denoted LL(k). The first L in LL(k) means parsing the input from left to right, the second L stands for left-most derivation, and k represents the number of lookaheads. Generally k = 1, so LL(k) may also be written as LL(1). If a given grammar is not LL(1), then usually it is not LL(k) for any given k.

FIRST AND FOLLOW:


 FIRST() − a function that gives the set of terminals that can begin the strings derived from a production rule.
 A symbol c is in FIRST(α) if and only if α ⇒* cβ for some sequence β of grammar symbols.
 FOLLOW(A) is defined as the collection of terminal symbols that can occur directly to the right of A.
 FOLLOW(A) = {a | S ⇒* αAaβ, where α and β can be any strings}
A small sketch of the FIRST computation follows.
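The sketch below is in Python; the grammar encoding — upper-case letters as non-terminals, ε written as the empty string, E' renamed R — is an assumption of the sketch, not a standard:

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                         # iterate to a fixed point
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                before = len(first[nt])
                if rhs == "":
                    first[nt].add("")      # epsilon production
                for sym in rhs:
                    if sym not in grammar: # a terminal begins the string
                        first[nt].add(sym)
                        break
                    first[nt] |= first[sym] - {""}
                    if "" not in first[sym]:
                        break              # sym cannot vanish: stop here
                else:
                    if rhs:                # every symbol was nullable
                        first[nt].add("")
                if len(first[nt]) != before:
                    changed = True
    return first

# E -> TR, R -> +TR | ε, T -> a   (R plays the role of E')
g = {"E": ["TR"], "R": ["+TR", ""], "T": ["a"]}
print(first_sets(g))   # {'E': {'a'}, 'R': {'+', ''}, 'T': {'a'}}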



Lecture – 12
Lexical and Syntax Analysis Part 2


Role of the Parser:
It plays an important role in the compiler design. It performs the following tasks.
❑ It obtains a string of tokens from the lexical analyzer.
❑ It groups the tokens appearing in the input in order to identify larger structures in the program. This
process is done to verify that the string can be generated by the grammar for the source language.
❑ It should report any syntax error in the program.
❑ It should also recover from the errors so that it can continue to process the rest of the input.
Eliminating Left Recursion:
❑ There are many syntactic features of a programming language that are expressed using recursive rules. The recursion involved may be left recursion or right recursion. Though both forms are equivalent in expressiveness, left recursive grammars are not suitable for top-down parsers.
❑ A grammar is left recursive if the first symbol on the right hand side of a rule is the same non-terminal as that on the left hand side. We can also say that a grammar is left recursive if there exists a non-terminal S such that S ⇒ Sα for some string α.
Fortunately, we can always eliminate left recursion from a grammar by applying certain transformations in such a way that the transformed grammar recognizes the same input strings.

Consider a production which is self-left recursive as follows-


S → Sα | β

We can make it non-recursive by rewriting the production as:


S → βS'
S' → αS' | ∈

Example:
We can apply this translation rule to the following production:
E → E + T | T
becomes
E → TE'
E' → + TE' | ∈
Here S = E, α = + T and β = T. (The same transformation is sketched mechanically below.)
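A small Python sketch of the transformation (productions are plain strings, and A' is spelled A + "'" — both are conventions of this sketch):

def eliminate_left_recursion(a, rhss):
    alphas = [r[len(a):] for r in rhss if r.startswith(a)]   # the α parts
    betas  = [r for r in rhss if not r.startswith(a)]        # the β parts
    if not alphas:
        return {a: rhss}               # no immediate left recursion
    a2 = a + "'"
    return {a:  [b + a2 for b in betas],
            a2: [al + a2 for al in alphas] + ["∈"]}

print(eliminate_left_recursion("E", ["E+T", "T"]))
# {'E': ["TE'"], "E'": ["+TE'", '∈']}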

Eliminating Left Factoring:


Left factoring is a process which isolates the common prefix of two or more productions into a single production. Any set of productions of the form
A → αβ1 | αβ2 | … | αβn
can be represented by
A → αA'
A' → β1 | β2 | … | βn

BOTTOM UP PARSER-
❏ Bottom-up parsing starts from the leaf nodes of a tree and works in upward direction till it reaches the
root node. Here, we start from a sentence and then apply production rules in reverse manner in order to
reach the start symbol.

Shift-Reduce Parsing-
❏ Shift-reduce parsing uses two unique steps for bottom-up parsing, known as the shift-step and the reduce-step.
❏ Shift step: The shift step refers to the advancement of the input pointer to the next input symbol, which is
called the shifted symbol. This symbol is pushed onto the stack. The shifted symbol is treated as a single
node of the parse tree.
❏ Reduce step: When the parser finds a complete grammar rule (RHS) and replaces it to (LHS), it is known
as reduce-step. This occurs when the top of the stack contains a handle. To reduce, a POP function is
performed on the stack which pops off the handle and replaces it with LHS non-terminal symbol.

Example:
Let us consider the following grammar again
D → type tlist;
type → int | float
tlist → tlist, id | id.

The table shows the shift-reduce parsing of the string int id, id; using this grammar.

Stack              Input           Action
$                  int id, id; $   shift
$ int              id, id; $       reduce by type → int
$ type             id, id; $       shift
$ type id          , id; $         reduce by tlist → id
$ type tlist       , id; $         shift
$ type tlist,      id; $           shift
$ type tlist, id   ; $             reduce by tlist → tlist, id
$ type tlist       ; $             shift
$ type tlist;      $               reduce by D → type tlist;
$ D                $               accept

Steps of shift-reduce parsing on input int id, id;

LR Parser-
❏ The LR parser is a non-recursive, shift-reduce, bottom-up parser. It uses a wide class of context-free
grammar which makes it the most efficient syntax analysis technique. LR parsers are also known as LR(k)
parsers, where L stands for left-to-right scanning of the input stream; R stands for the construction of right-
most derivation in reverse, and k denotes the number of look ahead symbols to make decisions.
There are three widely used algorithms available for constructing an LR parser:
• SLR(1) – Simple LR Parser:
o Works on smallest class of grammar
o Few number of states, hence very small table
o Simple and fast construction

• LALR(1) – Look-Ahead LR Parser:


o Works on intermediate size of grammar
o Number of states is same as in SLR(1)

• LR(1) – LR Parser:
o Works on complete set of LR(1) Grammar
o Generates large table and large number of states
o Slow construction

Some more points of differences:


1. An LALR(1) parser is always free of shift-reduce conflicts provided the corresponding LR(1) table is shift-reduce conflict free.
2. An LALR(1) parsing table may have a reduce-reduce conflict even if the corresponding LR(1) table is conflict free.
3. For a sentence of the language, both LALR and LR parsers produce the same sequence of shifts and reduces.
4. For an erroneous input, an LALR parser may continue to perform reductions even after the LR parser has caught the error, but it does not shift any symbol beyond the point where the LR parser declared the error.

Q. PYQ 2019
Shift-reduce parser consists of
(a) input buffer
(b) stack
(c) parse table
Choose the correct option from those given below:
A. (a) and (b) only
B. (a) and (c) only
C. (c) only
D. (a), (b) and (c)
Ans. D

Q. PYQ 2018
A bottom up parser generates_____
A. Right most derivation
B. Rightmost derivation in reverse
C. Leftmost derivation
D. Leftmost derivation in reverse
Ans. B

Q. PYQ 2022
Consider the following statements:
Statement (I): LALR Parser is more powerful than canonical LR parser
Statement (II): SLR parser is more powerful than LALR
Which of the following is correct?
A. Statement (I) true and (II) statement (II) false
B. Statement (I) false and statement (II) true
C. Both Statement (I) and Statement (II) false
D. Both statement (I) and statement (II) true
Ans. C

Q. PYQ 2021
Statement I: LL(1) and LR are examples of Bottom-up parsers.
Statement II: Recursive descent parser and SLR are examples of Top-down parsers.
In light of the above statements, choose the correct answer from the options given below Options:
A. Both statements I and statement II are false
B. Both statements I and statement II are true
C. Statement I is false and statement II is true
D. Statement I is true but statement II is false
Ans. A



Lecture – 13
Intermediate Code Generation, Code Optimization and Code Generation
INTERMEDIATE CODE GENERATION:
 The front end of a compiler translates a source program into a machine-independent intermediate code, and then the back end of the compiler uses this intermediate code to generate the target code.

Advantages:
1. Because of the machine-independent intermediate code, portability is enhanced. For example, if a compiler translates the source language directly to its target machine language, without the option of generating intermediate code, then for each new machine a full native compiler is required, because the compiler itself must be modified according to the machine specifications.
2. It is easier to improve the performance of the source code by optimizing the intermediate code.

Commonly used intermediate code representations are:


Postfix Notation:
1. Also known as reverse Polish notation or suffix notation.
2. In infix notation the operator is placed between operands, e.g., a + b. Postfix notation positions the operator at the right end, as in ab+.
3. Postfix notation eliminates the need for parentheses, as the operator's position and arity allow unambiguous decoding of the expression.
4. In postfix notation the operator consistently follows its operands. (A small stack-based evaluator sketch follows.)
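A minimal stack-based evaluator sketch in Python (token handling is deliberately simple — non-negative integers and the four operators only):

def eval_postfix(tokens):
    stack = []
    for t in tokens:
        if t.isdigit():
            stack.append(int(t))       # operand: push
        else:                          # operator: pop two, push result
            b, a = stack.pop(), stack.pop()
            stack.append({"+": a + b, "-": a - b,
                          "*": a * b, "/": a / b}[t])
    return stack.pop()

# a + b * c with a = 2, b = 3, c = 4 is 'a b c * +' in postfix:
print(eval_postfix("2 3 4 * +".split()))   # 14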

Three-Address Code:
 A three address statement involves a maximum of three references: two for operands and one for the result.
 The typical form of a three address statement is x = y op z, where x, y and z represent memory addresses.
 Each variable (x, y and z) in a three address statement is associated with a specific memory location.
 Example: the three address code for the expression a + b * c + d is
   T1 = b * c
   T2 = a + T1
   T3 = T2 + d
 where T1, T2 and T3 are temporary variables.
 There are 3 ways to represent a Three-Address Code in compiler design (a quadruple layout for the example above is sketched after the list):
(i) Quadruples
(ii) Triples
(iii) Indirect Triples
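As a quick illustration (the column layout is mine, and only the quadruple form is shown here), the three statements above stored as quadruples (op, arg1, arg2, result):

op   arg1   arg2   result
*    b      c      T1
+    a      T1     T2
+    T2     d      T3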

Syntax Tree:
 A syntax tree serves as a condensed representation of a parse tree.
 The operator and keyword nodes present in the parse tree undergo a relocation process to become part of
their respective parent nodes in the syntax tree. The internal nodes are operators and child nodes are
operands.
Example: x = (a + b * c) / (a – b * c)

Code optimization:
Its goals are to decrease CPU time and power consumption and to use resources efficiently.
Two techniques:
1. Platform dependent techniques: These depend on the underlying architecture — processor, registers, cache etc.
a. Peephole optimization
b. Instruction level parallelism
c. Data level parallelism
d. Cache optimization
e. Redundant resources

Peephole optimization – It is applied to a small window of code, repeatedly, on the target code. It includes the following (a small before/after illustration follows the list):
a. Eliminating redundant loads and stores
b. Strength reduction
c. Simplifying algebraic expressions
d. Replacing slower instructions with faster ones
e. Dead code elimination
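As a small before/after illustration (the instruction forms are illustrative, not taken from the notes), a peephole pass looking at a short window of three-address code might rewrite:

Before                        After
T1 = P * 32                   T1 = P << 5     (strength reduction)
T2 = Q + 0                    T2 = Q          (algebraic simplification)
T3 = R * 1   (T3 never used)  (removed)       (dead code elimination)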

2. Platform independent techniques:


These do not depend on the target machine.
a. Loop optimization
b. Constant folding
c. Constant propagation
d. Common subexpression elimination

Loop optimization- (a short loop-fusion sketch follows the list)
a. Code motion (frequency reduction)
b. Loop fusion (loop jamming)
c. Loop unrolling
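A short sketch of loop fusion (loop jamming) in Python (the arrays and sizes are toy values of mine):

n = 4
b = [1, 2, 3, 4]
a = [0] * n
c = [0] * n

# Before fusion: two separate loops over the same index range.
for i in range(n):
    a[i] = b[i] + 1
for i in range(n):
    c[i] = a[i] * 2

# After fusion: one loop with the merged bodies — the same result
# with roughly half the loop overhead.
for i in range(n):
    a[i] = b[i] + 1
    c[i] = a[i] * 2

print(a, c)   # [2, 3, 4, 5] [4, 6, 8, 10]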

Q. PYQ UGC NET-


In ____, the bodies of the two loops are merged together to form a single loop provided that they do not
make any reference to each other.
A. Loop unrolling
B. Strength reduction
C. Loop concatenation
D. Loop jamming
Ans. D

Q. PYQ UGC NET


In compiler optimization, operator strength reduction uses mathematical identities to replace slow math
operations with faster operations. Which of the following code replacements is an illustration of operator
strength reduction?
A. Replace P + P by 2 * P or Replace 3 + 4 by 7.
B. Replace P * 32 by P << 5
C. Replace P * 0 by 0
D. Replace (P << 4) - P by P * 15
Ans. B

Q. PYQ UGC NET


Replacing the expression 4*2.14 by 8.56 is known as
A. Constant folding
B. Induction variable
C. Strength reduction
D. Code reduction
Ans. A
Solution: Constant folding is the replacement of a constant expression by its computed value at compile time. A variable that increases or decreases by some fixed amount on every iteration of a loop is called an induction variable.

Code Generation:
It can be considered the final phase of compilation. After code generation, further optimization can be applied, but that can be seen as part of the code generation phase itself. The code generated by the compiler is object code in some lower-level programming language, e.g. assembly language. We have seen that source code written in a higher-level language is transformed into a lower-level language, resulting in lower-level object code, which should have the following minimum properties:
a. It should carry the exact meaning of the source code.
b. It should be efficient in terms of CPU usage and memory management.