UGC NET
DAILY
CLASS NOTES
Computer
It is also tempting to view the human brain as a finite state system. The number of brain cells, or neurons, is limited, probably 2^35 at most. If the state of each neuron can be described by a small number of bits, then finite state theory applies to the brain as well.
FINITE AUTOMATA-
INTRODUCTION-
In this chapter we discuss the mathematical model of computers and algorithms. Further, we will define increasingly powerful models of computation, more and more sophisticated devices for accepting and generating languages. The first of these is a restricted model of a real computer, called a finite automaton or finite state machine. These machines are close to the central processing unit of a computer; the absence of memory makes them a more restricted model.
A computer is also deterministic, by which we mean that, on reading one particular input instruction, the machine converts itself from the state it was in into some particular other state, where the resulting state is completely fixed by the prior state and the input instruction. Some sequences of instructions may lead to success and some may not; success is determined by the sequence of inputs: either the program will work or it will not.
Before discussing the mathematical model, let us discuss the pictorial representation of a finite machine.
Strings are fed into the device by way of an input tape, which is divided into squares, with one symbol in each square. The main part of the device is a "black box", the finite control, which is responsible for all the processing. The finite control can sense what symbol is written at any position on the input tape by means of a movable head. Initially the head is placed at the leftmost square of the tape and the finite control is set to a designated initial state.
4
TIPS-
A finite automaton is called "finite" because the number of possible states and the number of letters in the alphabet are both finite, and "automaton" because the change of state is totally governed by the input. It is deterministic, since the next state is automatic, not wilful, just as the motion of the hands of a clock is automatic, while the motion of the hands of a human is presumably the result of desire and thought.
P0, P1, P2, P3, P4, P5 are states in the finite control system; x and y are input symbols.
At regular intervals the automaton reads one symbol from the input tape and then enters a new state that depends only on the current state and the symbol just read.
After reading an input symbol, the reading head moves one square to the right on the input tape, so that on the next move it will read the symbol in the next tape square. This process is repeated again and again.
The automaton then indicates approval or disapproval: if it winds up in one of a set of final states, the input string is considered to be accepted. The language accepted by the machine is the set of strings it accepts.
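The acceptance rule just described can be sketched in a few lines of code. This is a minimal sketch: the particular machine below (a DFA over {0, 1} that accepts strings ending in 1) and its state names are illustrative assumptions, not a machine from these notes.

```python
# Minimal DFA simulator: run the transition table over the input and
# accept iff the machine ends in a final state.
def dfa_accepts(transitions, start, finals, string):
    state = start
    for symbol in string:
        if (state, symbol) not in transitions:
            return False                      # no transition: reject
        state = transitions[(state, symbol)]
    return state in finals                    # accept iff final state

# Example DFA over {0, 1} accepting strings that end in 1
delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q0", ("q1", "1"): "q1"}

print(dfa_accepts(delta, "q0", {"q1"}, "1011"))  # True
print(dfa_accepts(delta, "q0", {"q1"}, "10"))    # False
```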
PW Web/App: https://smart.link/7wwosivoicgd4
Library - https://smart.link/sdfez8ejd80if
String: A string or a word is a finite sequence of symbols chosen from some alphabet.
Σ = {a, b, c, d}
u = abc, v = aabc, w = abcd
Power of an alphabet: Let Σ be an alphabet; Σ^k is the set of strings of length k, each of whose symbols is in Σ.
Example:
Σ = {0, 1}
Σ^1 = {0, 1}
Σ^2 = {00, 01, 10, 11}
Σ^3 = {000, 001, 010, 011, 100, 101, 110, 111}
and similarly Σ^4, Σ^5, Σ^6, …
Kleene star (*): Σ* is the set of all strings of any finite length over Σ, including the empty string Λ.
Formal language (L): A language is a subset of Σ*:
L ⊆ Σ*
Σ = {0, 1}
Σ* = {Λ, 0, 1, 00, 01, 10, 11, …}
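The definitions above are easy to check mechanically. A small sketch (the helper name `power` is our own choice) that enumerates Σ^k for a given k; Σ* is the infinite union of all the Σ^k, so we can only list a finite prefix of it:

```python
from itertools import product

def power(sigma, k):
    """Sigma^k: the set of all strings of length k over alphabet sigma."""
    return {"".join(p) for p in product(sigma, repeat=k)}

sigma = {"0", "1"}
print(sorted(power(sigma, 2)))  # ['00', '01', '10', '11']

# Sigma* is the union of Sigma^0, Sigma^1, Sigma^2, ...; Sigma^0 contains
# only the empty string. We list the strings of length at most 2:
star_prefix = sorted(s for k in range(3) for s in power(sigma, k))
print(star_prefix)  # ['', '0', '00', '01', '1', '10', '11']
```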
Construction of FA
State:
Initial state:
Final state:
Q.
δ    | 0   | 1
q1   | q2  | *q3
q2   | *q3 | *q3
*q3  | q4  | *q3
q4   | *q3 | *q3
*q5  | q2  | *q3
(* marks a final state)
Q:
     | a   | b
p    | --- | q
q    | *r  | s
*r   | *r  | s
s    | *r  | s

After renaming the states (+ marks a final state):
       | a      | b
Z1(p)  | ---    | Z2(q)
Z2(q)  | +Z3(r) | Z4(s)
+Z3(r) | +Z3(r) | Z4(s)
Z4(s)  | +Z3(r) | Z4(s)
FA with Output:
So far we have considered FAs that recognize a language, i.e. they do not produce any output for an input string except accept or reject. It is interesting to consider FAs with output, which are called transducers; they compute some function or relation.
Moore and Mealy machines: These machines are DFAs except that an output symbol is associated with each state (Moore) or with each transition (Mealy).
1) Moore machine: It is an FA where the output is determined by the current state only. The output of a Moore machine is one character longer than the input: if the input has length n, the output has length n + 1. The output of each state is determined by the state itself.
It has five components:
a. A finite set of states: q0, q1, q2, …, qn
b. An alphabet of input symbols Σ
c. An alphabet of output symbols Γ
d. A transition function/table δ
e. An output table
Example: input aababa (n = 6), output 1010111 (m = 7).
There is no final state, because there is no acceptance or rejection involved; we only have to look at the output.
So the output is always one symbol longer than the input string, since every state entered, including the initial one, emits an output symbol.
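A Moore machine is easy to simulate. The machine below (states q0, qa, qb, emitting 1 in the state reached after reading an a and 0 otherwise) is an illustrative assumption, not the machine from the notes; the point is only that the output is one symbol longer than the input.

```python
# Minimal Moore machine simulator: the output depends on the state only,
# and the initial state emits an output symbol too, giving n + 1 output
# symbols for an input of length n.
def moore_output(delta, out, start, string):
    state = start
    result = [out[state]]                # output of the initial state
    for symbol in string:
        state = delta[(state, symbol)]
        result.append(out[state])        # output of each state entered
    return "".join(result)

delta = {("q0", "a"): "qa", ("q0", "b"): "qb",
         ("qa", "a"): "qa", ("qa", "b"): "qb",
         ("qb", "a"): "qa", ("qb", "b"): "qb"}
out = {"q0": "0", "qa": "1", "qb": "0"}

print(moore_output(delta, out, "q0", "aababa"))  # 0110101 (7 symbols for 6 inputs)
```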
2) Mealy machine: It is also an FA, but the output of a state is carried by the input symbol on each of its transitions; that is, the output is written on the incoming edges.
Example: input aababa, output 010111 (same length as the input).
The power of Moore and Mealy machines is the same, because we can convert a Moore machine to a Mealy machine and a Mealy machine to a Moore machine.
Q: Mealy Machine:
Moore Machine:
Conversion of FA to CFG:
S → aX / bS
X → aY / bX
Y → aX / bY / Λ
Semi-word: A production whose right side ends with a non-terminal and contains exactly one non-terminal, which is at the end:
NT → (T)(T)(T)(T)(T)…(NT)
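The conversion itself is mechanical: each DFA transition δ(P, a) = Q becomes a production P → aQ, and each final state F gets F → Λ. A sketch, where the DFA below is an assumption chosen to reproduce the grammar shown above (states S, X, Y with final state Y), and ^ stands in for the empty string Λ:

```python
# Build a right-linear grammar from a DFA: one production P -> aQ per
# transition delta(P, a) = Q, plus F -> ^ for each final state F.
def fa_to_cfg(delta, finals):
    rules = {}
    for (state, symbol), nxt in delta.items():
        rules.setdefault(state, []).append(symbol + nxt)
    for f in finals:
        rules.setdefault(f, []).append("^")   # ^ denotes the empty string
    return rules

delta = {("S", "a"): "X", ("S", "b"): "S",
         ("X", "a"): "Y", ("X", "b"): "X",
         ("Y", "a"): "X", ("Y", "b"): "Y"}

print(fa_to_cfg(delta, {"Y"}))
# {'S': ['aX', 'bS'], 'X': ['aY', 'bX'], 'Y': ['aX', 'bY', '^']}
```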
Unit Production:
A production of the form
non-terminal → one non-terminal
(NT) → (NT)
that is, a production of the form A → B (where A and B are both non-terminals), is called a unit production. Unit productions increase the cost of derivation in a grammar.
Consider the grammar S → AB, A → a, B → C/b, C → D, D → E, E → a. We first try to remove the unit production D → E: since there is a production E → a, we eliminate D → E and introduce D → a, and the grammar becomes
S → AB
A→a
B → C/b
C→D
D→a
E→a
Next we remove C → D: using D → a, we get
S → AB
A→a
B → C/b
C→a
D→a
E→a
Similarly, we remove B → C using C → a, and we obtain
S → AB
A→a
B → a/b
C→a
D→a
E→a
Now it can easily be seen that the productions C → a, D → a and E → a are useless, because if we start deriving from S, these productions will never be used. Eliminating them gives
S → AB
A→a
B → a/b
which is the completely reduced grammar.
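The chain-removal steps above can be done in one pass by taking, for each non-terminal, the closure of its unit productions and keeping only the non-unit productions found along the chain. A sketch, where grammars are encoded as dicts from a non-terminal to its production bodies (an illustrative encoding of our own):

```python
# Eliminate unit productions A -> B by following chains A -> B -> C ...
# and collecting every non-unit production reachable along the chain.
def remove_unit_productions(grammar):
    nonterms = set(grammar)

    def unit_closure(a):
        seen, stack = {a}, [a]
        while stack:
            x = stack.pop()
            for p in grammar[x]:
                if p in nonterms and p not in seen:  # p is a unit production
                    seen.add(p)
                    stack.append(p)
        return seen

    return {a: sorted({p for b in unit_closure(a)
                       for p in grammar[b] if p not in nonterms})
            for a in grammar}

# The grammar from the notes, before any unit production is removed:
grammar = {"S": ["AB"], "A": ["a"], "B": ["C", "b"],
           "C": ["D"], "D": ["E"], "E": ["a"]}

print(remove_unit_productions(grammar))
# {'S': ['AB'], 'A': ['a'], 'B': ['a', 'b'], 'C': ['a'], 'D': ['a'], 'E': ['a']}
```

Removing the now-unreachable C, D and E is a separate (reachability) step, exactly as in the notes.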
Example:
S → XY
A → a
Greibach normal form (GNF) will be used to construct a pushdown automaton that recognizes the language generated by a context-free grammar.
To convert a grammar to GNF, we start with a production in which the left side has a higher-numbered variable than the first variable on the right side, and make replacements on the right side.
Production rules in GNF:
1. NT → aα (one terminal followed by a string of non-terminals)
2. NT → one terminal
Ex: S → aXYZ
A → b
Q. S → S1S2, S1 → aS1c / S2 / λ, S2 → aS2b / λ, S3 → aS3b / S4 / λ
4. Non-deterministic Turing machines and deterministic Turing machines have equal power.
CHOMSKY HIERARCHY-
We can exhibit the relationship between grammars by the Chomsky hierarchy. Noam Chomsky, a founder of formal language theory, provided an initial classification into four language types:
Type 0 (Unrestricted grammar)
Type 1 (Context-sensitive grammar)
Type 2 (Context-free grammar)
Type 3 (Regular grammar)
Type 0 languages are those generated by unrestricted grammars, that is, the recursively enumerable languages. Type 1 consists of the context-sensitive languages, Type 2 consists of the context-free languages, and Type 3 consists of the regular languages. Each language family of type k is a proper subset of the family of type k - 1.
The following diagram shows the original Chomsky hierarchy.
We have also met several other language families that can be fitted into this picture, including the families of deterministic context-free languages (LDCF) and recursive languages (LREC). The modified Chomsky hierarchy can be seen in the figure below.
The relationship between linear, deterministic context-free and non-deterministic context-free languages is shown in the figure below.
Closure properties (Y = closed, N = not closed). The column headings were missing and are reconstructed from the standard closure results: REG = regular, DCFL = deterministic context-free, CFL = context-free, CSL = context-sensitive, REC = recursive, RE = recursively enumerable.
Operation                  | REG | DCFL | CFL | CSL | REC | RE
Union                      | Y   | N    | Y   | Y   | Y   | Y
Intersection               | Y   | N    | N   | Y   | Y   | Y
Set difference             | Y   | N    | N   | Y   | Y   | N
Complement                 | Y   | Y    | N   | Y   | Y   | N
Concatenation              | Y   | N    | Y   | Y   | Y   | Y
Kleene star                | Y   | N    | Y   | Y   | Y   | Y
Kleene plus                | Y   | N    | Y   | Y   | Y   | Y
Reversal                   | Y   | N    | Y   | Y   | Y   | Y
Epsilon-free homomorphism  | Y   | N    | Y   | Y   | Y   | Y
Homomorphism               | Y   | N    | Y   | N   | N   | Y
Inverse homomorphism       | Y   | Y    | Y   | Y   | Y   | Y
Epsilon-free substitution  | Y   | N    | Y   | Y   | Y   | Y
Substitution               | Y   | N    | Y   | N   | N   | Y
Subset                     | N   | N    | N   | N   | N   | N
Trick (2)-
{a^n b^n | n ≤ 100}
Finite set of strings, so it is regular.
Trick (3)-
{a^n | n ≥ 1}
Single symbol with a simple power, so it is regular.
Trick (4)-
{a^n b^n | n ≥ 1}
Same power on both symbols and infinitely many strings, so it is not regular.
Trick (5)-
{a^P b^Q c^R | P, Q, R ≥ 1}
Independent powers (no comparison between them), so it is regular.
Trick (6)-
If the powers follow an arithmetic progression, the language is regular.
{a^P b^2Q | P, Q ≥ 1}
Trick (7)-
If the powers do not form an arithmetic progression (for example a^(n^2) or a^(2^n)), the language is not regular.
Trick (8)-
Comparison-based languages are not regular.
A. {w ∈ {a, b}* | n_a(w) = n_b(w)}
B. {w ∈ {a, b}* | n_a(w) ≠ n_b(w)}
Trick (9)-
Languages that require remembering an unbounded string (copying) are not regular.
A. {ww | w ∈ {0, 1}*}
B. {w w^R w | w ∈ {0, 1}*}
The fundamental language processing model for compilation consists of two-step processing of a source program:
Analysis of the source program.
Synthesis of the source program.
The analysis part breaks up the source program into its constituent pieces, determines the meaning of the source string, and creates an intermediate representation of the source program. The synthesis part constructs an equivalent target program from the intermediate representation.
Lexical Analysis-
❏ The first phase of the compiler works as a text scanner. This phase scans the source code as a stream of characters and converts it into meaningful lexemes.
Syntax Analysis-
❏ The next phase is called syntax analysis or parsing. It takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree). The parser checks whether the expression made by the tokens is syntactically correct.
Semantic Analysis-
❏ Semantic analysis checks whether the parse tree constructed follows the rules of the language; for example, that assignment of values is between compatible data types.
Intermediate Code Generation-
❏ This intermediate code should be generated in such a way that it makes it easier to be translated into the
target machine code.
Code Optimization-
❏ The next phase does code optimization of the intermediate code. Optimization can be assumed as
something that removes unnecessary code lines, and arranges the sequence of statements in order to speed
up the program execution without wasting resources (CPU, memory).
Code Generation-
❏ Here, the code generator takes the optimized representation of the intermediate code and maps it to the
target machine language.
Symbol Table-
❏ It is a data-structure maintained throughout all the phases of a compiler. All the identifier's names along
with their types are stored here. The symbol table makes it easier for the compiler to quickly search the
identifier record and retrieve it.
Error handling-
❏ The tasks of the error handling process are to detect each error, report it to the user, and then apply some recovery strategy to handle it. Example: a run-time error is an error which takes place during the execution of a program, such as invalid input data. Compile-time errors arise at compile time, before execution of the program; a syntax error, a missing file reference, a missing semicolon, misspelled keywords or operators, and infinite loops are examples.
The front-end of the compiler includes those phases which are dependent on the source language and are
independent of the target machine. It normally includes the analysis part. It may also include a certain amount of
code optimization and the error handling.
The back-end of the compiler includes those phases which are dependent on the target language and are
independent of the source language.
Example of compilation-
Consider the translation of the following statement:
x = y * z + 10;
The internal representation of the source program changes with each phase of the compiler.
The lexical analyzer builds uniform descriptors for the elementary constituents of a source string. To do this, it must identify lexical units in the source string and categorize them into identifiers, constants, reserved words, etc. The uniform descriptor is called a token. A token has the format:
<category, lexical value>
A token is constructed for each identifier as well as for each operator in the string. Let it assign tokens id1, id2 and id3 to x, y and z respectively, and assign-op to =, mult-op to *, add-op to + and num to 10.
After lexical analysis the statement may be given as:
id1 assign-op id2 mult-op id3 add-op num
Basic Definitions-
Lexemes: They are the smallest logical units of a program: a sequence of characters in the source program for which a token is produced, for example if, 10.0, +, etc.
Tokens: Classes of similar lexemes are identified by the same token, for example identifier, number, reserved word, etc.
Pattern: It is a rule which describes a token. For example, an identifier is a string of at most 8 characters consisting of digits and letters, where the first character must be a letter.
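Patterns like these are typically written as regular expressions, and a lexer can be sketched directly from them. The categories and patterns below are illustrative assumptions (a toy subset, not a full C lexer); the identifier pattern encodes the 8-character rule just stated.

```python
import re

# (category, pattern) pairs; reserved words must come before ID so that
# "int" is classified as RESERVED rather than as an identifier.
TOKEN_SPEC = [
    ("RESERVED", r"\b(?:int|float|if|else|return)\b"),
    ("ID",       r"[A-Za-z][A-Za-z0-9]{0,7}"),   # letter first, at most 8 chars
    ("NUM",      r"\d+(?:\.\d+)?"),
    ("OP",       r"[=+\-*/;,(){}]"),
    ("SKIP",     r"\s+"),                        # whitespace: no token produced
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(src):
    return [(m.lastgroup, m.group())
            for m in MASTER.finditer(src)
            if m.lastgroup != "SKIP"]

print(tokenize("x = y * z + 10;"))
# [('ID', 'x'), ('OP', '='), ('ID', 'y'), ('OP', '*'), ('ID', 'z'),
#  ('OP', '+'), ('NUM', '10'), ('OP', ';')]
```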
Example:
Count the number of tokens in the following code.
int main()
{
x = y + z;
int x, y, z;
printf("sum%d%d", x);
}
Top-down Parsing-
When the parser starts constructing the parse tree from the start symbol and then tries to transform the start
symbol to the input, it is called top-down parsing.
• Recursive descent parsing: It is a common form of top-down parsing. It uses recursive procedures to
process the input. Recursive descent parsing suffers from backtracking.
• Backtracking: It means that if one derivation of a production fails, the syntax analyzer restarts the process using different rules of the same production. It may therefore process the input string more than once to determine the right production.
Example:
We can apply this translation rule (A → Aα | β becomes A → βA', A' → αA' | ∈) to the following production:
E → E + T | T
E → TE'
E' → + TE' | ∈
Here, in the rule A → Aα | β, we have A = E, α = + T and β = T.
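The rewriting rule can be sketched mechanically for immediate left recursion. Productions are plain strings and ^ stands for ∈; the function name and encoding are our own assumptions, and for simplicity the sketch matches the leading non-terminal by string prefix only.

```python
# Remove immediate left recursion: A -> A alpha | beta becomes
# A -> beta A', A' -> alpha A' | ^ (^ denotes the empty string).
def eliminate_left_recursion(nt, productions):
    alphas = [p[len(nt):] for p in productions if p.startswith(nt)]
    betas  = [p for p in productions if not p.startswith(nt)]
    if not alphas:
        return {nt: productions}          # nothing to do
    new = nt + "'"
    return {nt:  [b + new for b in betas],
            new: [a + new for a in alphas] + ["^"]}

print(eliminate_left_recursion("E", ["E+T", "T"]))
# {'E': ["TE'"], "E'": ["+TE'", '^']}
```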
BOTTOM UP PARSER-
❏ Bottom-up parsing starts from the leaf nodes of a tree and works in upward direction till it reaches the
root node. Here, we start from a sentence and then apply production rules in reverse manner in order to
reach the start symbol.
Shift-Reduce Parsing-
❏ Shift-reduce parsing use two unique steps for bottom-up parsing. These steps are known as shift-step and
reduce-step.
❏ Shift step: The shift step refers to the advancement of the input pointer to the next input symbol, which is
called the shifted symbol. This symbol is pushed onto the stack. The shifted symbol is treated as a single
node of the parse tree.
❏ Reduce step: When the parser finds a complete grammar rule (RHS) on top of the stack and replaces it with its (LHS), it is known as a reduce step. This occurs when the top of the stack contains a handle. To reduce, a POP function is performed on the stack, which pops off the handle and replaces it with the LHS non-terminal symbol.
Example:
Let us consider the following grammar again
D → type tlist;
type → int | float
tlist → tlist, id | id.
The table shows the shift-reduce parsing of the string int id, id; using this grammar.
Stack             | Input         | Action
$                 | int id, id; $ | shift
$ int             | id, id; $     | reduce by type → int
$ type            | id, id; $     | shift
$ type id         | , id; $       | reduce by tlist → id
$ type tlist      | , id; $       | shift
$ type tlist ,    | id; $         | shift
$ type tlist , id | ; $           | reduce by tlist → tlist, id
$ type tlist      | ; $           | shift
$ type tlist ;    | $             | reduce by D → type tlist;
$ D               | $             | accept
Steps of shift reduce parsing on input int id, id;
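The trace above can be reproduced with a small loop: reduce whenever a rule's right-hand side sits on top of the stack, otherwise shift. This greedy reduce-first strategy is only a sketch that happens to work for this grammar; a real shift-reduce parser consults a parse table to decide between shifting and reducing.

```python
# Grammar rules as (LHS, RHS) pairs; the 3-symbol tlist rule is listed
# before tlist -> id so the longer handle is preferred.
RULES = [
    ("type",  ("int",)),
    ("type",  ("float",)),
    ("tlist", ("tlist", ",", "id")),
    ("tlist", ("tlist", ",", "id")[:1] + (",", "id"))[:0] or ("tlist", ("tlist", ",", "id")),
]
# (the line above is a typo guard in prose form; the actual rule list is:)
RULES = [
    ("type",  ("int",)),
    ("type",  ("float",)),
    ("tlist", ("tlist", ",", "id")),
    ("tlist", ("id",)),
    ("D",     ("type", "tlist", ";")),
]

def shift_reduce(tokens):
    stack, i = [], 0
    while True:
        for lhs, rhs in RULES:               # reduce while a handle is on top
            if len(stack) >= len(rhs) and tuple(stack[-len(rhs):]) == rhs:
                del stack[-len(rhs):]
                stack.append(lhs)
                break
        else:
            if i == len(tokens):
                return stack == ["D"]        # accept iff only the start symbol remains
            stack.append(tokens[i])          # otherwise shift
            i += 1

print(shift_reduce(["int", "id", ",", "id", ";"]))  # True
```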
LR Parser-
❏ The LR parser is a non-recursive, shift-reduce, bottom-up parser. It uses a wide class of context-free
grammar which makes it the most efficient syntax analysis technique. LR parsers are also known as LR(k)
parsers, where L stands for left-to-right scanning of the input stream; R stands for the construction of right-
most derivation in reverse, and k denotes the number of look ahead symbols to make decisions.
There are three widely used algorithms available for constructing an LR parser:
• SLR(1) – Simple LR Parser:
o Works on smallest class of grammar
o Few number of states, hence very small table
o Simple and fast construction
• LALR(1) – Look-Ahead LR Parser:
o Works on an intermediate class of grammars
o Number of states (and table size) is the same as SLR(1)
• LR(1) – Canonical LR Parser:
o Works on the complete set of LR(1) grammars
o Generates a large table and a large number of states
o Slow construction
Q. PYQ 2019
Shift-reduce parser consists of
(a) input buffer
(b) stack
(c) parse table
Choose the correct option from those given below:
A. (a) and (b) only
B. (a) and (c) only
C. (c) only
D. (a), (b) and (c)
Ans. D
Q. PYQ 2018
A bottom up parser generates_____
A. Right most derivation
B. Rightmost derivation in reverse
C. Leftmost derivation
D. Leftmost derivation in reverse
Ans. B
Q. PYQ 2022
Consider the following statements:
Statement (I): LALR Parser is more powerful than canonical LR parser
Statement (II): SLR parser is more powerful than LALR
Which of the following is correct?
A. Statement (I) true and (II) statement (II) false
B. Statement (I) false and statement (II) true
C. Both Statement (I) and Statement (II) false
D. Both statement (I) and statement (II) true
Ans. C
Q. PYQ 2021
Statement I: LL(1) and LR are examples of Bottom-up parsers.
Statement II: Recursive descent parser and SLR are examples of Top-down parsers.
In light of the above statements, choose the correct answer from the options given below Options:
A. Both statements I and statement II are false
B. Both statements I and statement II are true
C. Statement I is false and statement II is true
D. Statement I is true but statement II is false
Ans. A
Advantages:
1. Because of the machine-independent intermediate code, portability will be enhanced. For example, suppose a compiler translates the source language to its target machine language without having the option of generating intermediate code; then for each new machine a full native compiler is required, because the compiler itself has to be modified according to the machine specifications.
2. It is easier to apply source code modifications to improve the performance of source code by optimizing the intermediate code.
Three-Address Code:
A three-address statement involves a maximum of three references: two for operands and one for the result.
The typical form of a three-address statement is x = y op z, where x, y and z represent memory addresses; each variable in a three-address statement is associated with a specific memory location.
Example: the three-address code for the expression a + b * c + d is
T1 = b * c
T2 = a + T1
T3 = T2 + d
where T1, T2 and T3 are temporary variables.
There are 3 ways to represent a Three-Address Code in compiler design:
(i) Quadruples
(ii) Triples
(iii) Indirect Triples
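The three representations differ only in how an instruction names its result. A sketch of the first two for the expression a + b * c + d, encoded as plain tuples (our own illustrative encoding): a quadruple stores an explicit result field, while a triple is referenced by its own index instead of a temporary.

```python
# Quadruples: (op, arg1, arg2, result)
quadruples = [
    ("*", "b",  "c",  "T1"),   # T1 = b * c
    ("+", "a",  "T1", "T2"),   # T2 = a + T1
    ("+", "T2", "d",  "T3"),   # T3 = T2 + d
]

# Triples: (op, arg1, arg2); "(0)" means "the result of triple 0"
triples = [
    ("*", "b",   "c"),         # (0)
    ("+", "a",   "(0)"),       # (1)
    ("+", "(1)", "d"),         # (2)
]

for op, a1, a2, res in quadruples:
    print(f"{res} = {a1} {op} {a2}")
```

Indirect triples add one more level: a separate list of pointers into the triple table, so statements can be reordered without renumbering the references.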
Syntax Tree:
A syntax tree serves as a condensed representation of a parse tree.
The operator and keyword nodes present in the parse tree undergo a relocation process to become part of
their respective parent nodes in the syntax tree. The internal nodes are operators and child nodes are
operands.
Example: x = (a + b * c) / (a – b * c)
Code optimization:
The goals are to decrease CPU time, lower power consumption and make efficient use of resources.
Two techniques:
1. Platform-dependent techniques: They depend on the underlying architecture (processor, registers, cache, etc.).
a. Peephole optimization
b. Instruction level parallelism
c. Data level parallelism
d. Cache optimization
e. Redundant resources
Peephole optimization: It is applied on a small piece of code, repeatedly, on the target code. It includes:
a. Eliminating redundant loads and stores
b. Strength reduction
c. Simplifying algebraic expressions
d. Replacing slower instructions with faster ones
e. Dead code elimination
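Two of these transformations can be sketched as a pass over three-address tuples (op, arg1, arg2, result); the instruction encoding and the specific rewrites (constant folding of algebraic expressions, and x * 2 → x + x as strength reduction) are illustrative assumptions.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def peephole(code):
    out = []
    for op, a1, a2, res in code:
        if a1.isdigit() and a2.isdigit() and op in OPS:
            # constant folding: compute the value at compile time
            out.append(("=", str(OPS[op](int(a1), int(a2))), None, res))
        elif op == "*" and a2 == "2":
            # strength reduction: multiplication by 2 becomes an addition
            out.append(("+", a1, a1, res))
        else:
            out.append((op, a1, a2, res))
    return out

code = [("*", "x", "2", "T1"), ("+", "3", "4", "T2")]
print(peephole(code))
# [('+', 'x', 'x', 'T1'), ('=', '7', None, 'T2')]
```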
Loop optimization-
a. Code motion (frequency reduction)
b. Loop fusion (Loop jamming)
c. Loop unrolling
Code Generation:
It can be considered the final phase of compilation. Optimization can still be applied after code generation, but that can be seen as part of the code generation phase itself. The code generated by the compiler is object code in some lower-level programming language, e.g. assembly language. We have seen that source code written in a higher-level language is transformed into a lower-level language, resulting in lower-level object code, which should have the following minimum properties:
a. It should carry the exact meaning of the source code.
b. It should be efficient in terms of CPU usage and memory management.