Module 1
AUTOMATA THEORY AND COMPILER DESIGN - 21CS51
MODULE-1
Introduction to Automata Theory: Central Concepts of Automata Theory, Deterministic Finite
Automata (DFA), Non-Deterministic Finite Automata (NFA), Epsilon-NFA, NFA to DFA Conversion,
Minimization of DFA
Alphabets: A symbol is an abstract entity. Letters and digits are examples of frequently used symbols. An
alphabet is a finite, non-empty set of symbols and is denoted by ∑.
Operations on strings:
Concatenation: The concatenation of two strings is formed by writing the first string followed by the
second string with no space between them. Ex. V='a', W='cat' then V.W='acat'
Reverse: The reverse of a string W is obtained by writing the symbols of the string in reverse order and is
denoted W^R. Ex. W='the' then W^R='eht'.
Length: The length of a string W, denoted |W|, is the number of symbols composing the string. Ex. W='the'
then |W|=3
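The three operations above map directly onto built-in string operations in most programming languages. A minimal sketch in Python, assuming strings over the alphabet are modelled as ordinary str values (the variable names are illustrative only):

```python
# Strings over an alphabet, modelled as ordinary Python str values.
v = "a"
w = "cat"

concatenation = v + w      # V.W: the first string followed by the second
reverse = "the"[::-1]      # W^R: the symbols of W in reverse order
length = len("the")        # |W|: the number of symbols in W

print(concatenation, reverse, length)   # acat eht 3
```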
Power of an alphabet: If ∑ is an alphabet, the set of all strings of a given length k over that alphabet is
written ∑^k, using exponential notation. Ex. ∑={a,b} then
∑^0 = {ε}
∑^1 = {a, b}
∑^2 = {aa, ab, ba, bb}
∑^3 = {aaa, aab, aba, abb, baa, bab, bba, bbb}
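The powers listed above can be enumerated mechanically. The following is a small sketch, assuming the alphabet is given as a Python set of one-character strings; the function name power is hypothetical and chosen only for this illustration.

```python
from itertools import product

def power(sigma, k):
    """Return the set of all strings of length k over the alphabet sigma."""
    return {"".join(symbols) for symbols in product(sorted(sigma), repeat=k)}

sigma = {"a", "b"}
for k in range(4):
    # The k = 0 case yields the set containing only the empty string.
    print(k, sorted(power(sigma, k)))
```

Note that the number of strings in the k-th power is |∑| raised to k, so the sets grow quickly with k.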
Language: A set of strings, all chosen from ∑*, where ∑ is a particular alphabet, is called a language over ∑.
If L is a language over ∑, then L ⊆ ∑*.
Finite automata are computing devices that accept/recognize regular languages and are used to model
operations of many systems. Their operations can be simulated by a very simple computer program.
Automaton:
A finite automaton (FA, also called a finite-state automaton or a finite-state machine) is a mathematical
tool used to describe processes involving inputs and outputs. An FA can be in one of several states and
can switch between states depending on symbols that it inputs. Once it settles in a state, it reads the next
input symbol, performs some computational task associated with the new input, outputs a symbol, and
switches to a new state depending on the input. Notice that the new state may be identical to the current
state.
Examples
Extended Transition Function δ*: Describes the state reached when we start in any state and follow a
sequence of inputs.
Definition:
Let M = (Q, ∑, δ, q0, F) be a DFA, where
Q is a non-empty, finite set of states.
∑ is a non-empty, finite set of input symbols (the input alphabet).
δ is the transition function, a mapping from Q × ∑ to Q.
q0 ∈ Q is the start state.
F ⊆ Q is the set of accepting or final states.
δ* is the extended transition function, a mapping from Q × ∑* to Q, defined as follows:
i. For any q ∈ Q, δ*(q, ε) = q
ii. For any q ∈ Q, y ∈ ∑*, a ∈ ∑,
δ*(q, ya) = δ(δ*(q, y), a)
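The two rules of the definition translate almost line for line into a recursive function. The sketch below assumes δ is stored as a Python dictionary keyed by (state, symbol) pairs; the example DFA (even number of 1s) and the helper name accepts are assumptions made for illustration and do not come from these notes.

```python
def delta_star(delta, q, w):
    """Extended transition function, following the two rules of the definition."""
    if w == "":                          # rule i: reading the empty string stays in q
        return q
    y, a = w[:-1], w[-1]                 # split w as ya, where a is the last symbol
    return delta[(delta_star(delta, q, y), a)]   # rule ii

def accepts(delta, q0, final_states, w):
    """A string is accepted if the state reached from q0 is a final state."""
    return delta_star(delta, q0, w) in final_states

# Hypothetical DFA over {0, 1} that accepts strings with an even number of 1s.
delta = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}

print(delta_star(delta, "even", "1011"))          # odd
print(accepts(delta, "even", {"even"}, "1001"))   # True
```

This is why the operation of a DFA can be simulated by a very simple program: the whole machine is a lookup table plus one loop or recursion over the input.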
Let M = (Q, ∑, δ, q0, F) be an NFA with ε-transitions and let S be any subset of Q. The
ε-closure of S, denoted Ɛ(S), is defined by:
1. Every element of S is an element of Ɛ(S).
2. For any q ∈ Ɛ(S), every element of δ(q, ε) is in Ɛ(S).
3. No other elements are in Ɛ(S).
[Figure: ε-NFA with start state q and further states r and s; edges labelled 0, 1 and ε]
• ε-closure(q) = { q }
• ε-closure(r) = { r, s}
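The three-rule definition of the ε-closure can be computed with a simple worklist. In the sketch below, the ε-moves are stored in a dictionary from a state to the set of states reachable by one ε-move. The single ε-move from r to s is an assumption that is consistent with the two closures listed above; the full transition diagram is not reproduced here.

```python
def epsilon_closure(states, eps_moves):
    """Return the smallest set containing `states` that is closed under ε-moves."""
    closure = set(states)                    # rule 1: every element of S is included
    stack = list(states)
    while stack:
        q = stack.pop()
        for p in eps_moves.get(q, set()):    # rule 2: follow every ε-move from q
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return closure                           # rule 3: nothing else is added

# Assumed ε-moves: only r has an ε-move, and it leads to s.
eps_moves = {"r": {"s"}}

print(sorted(epsilon_closure({"q"}, eps_moves)))   # ['q']
print(sorted(epsilon_closure({"r"}, eps_moves)))   # ['r', 's']
```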
Examples
Step 3: A state [qa, qb, …, qc] ∈ QD is a final state of the DFA if at least one of the states qa, qb, …, qc
is in FN, i.e., at least one component of [qa, qb, …, qc] is a final state of the NFA.
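[Figure: the ε-NFA with start state q and states r and s (edges labelled 0, 1 and ε) converts to a DFA
with start state q and a combined state sr (edges labelled 0,1)]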
Theorem: If MD = (QD, ∑, δD, {q0}, FD) is the DFA constructed from the NFA MN = (QN, ∑, δN, q0, FN) by the
subset construction, then L(MD) = L(MN).
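Proof: We show by induction on |w| that δD*({q0}, w) = δN*(q0, w).
Basis: Let |w| = 0, that is, w = ε. By the basis definitions of δ* for DFAs and NFAs, both δD*({q0}, ε)
and δN*(q0, ε) are {q0}.
Induction: Let w be of length n+1, and assume the statement for length n. Break w up as w = xa, where a is
the final symbol of w. By the inductive hypothesis, δD*({q0}, x) = δN*(q0, x). Let both of these sets of
N's states be {p1, p2, …, pk}.
The inductive part of the definition of δ* for NFAs tells us that:
δN*(q0, w) = δN(p1, a) ∪ δN(p2, a) ∪ … ∪ δN(pk, a)     (1)
The subset construction, on the other hand, tells us that:
δD({p1, p2, …, pk}, a) = δN(p1, a) ∪ δN(p2, a) ∪ … ∪ δN(pk, a)     (2)
Using eqn (2) and the fact that δD*({q0}, x) = {p1, p2, …, pk} in the inductive part of the definition of
δ* for DFAs:
δD*({q0}, w) = δD(δD*({q0}, x), a) = δD({p1, p2, …, pk}, a) = δN(p1, a) ∪ δN(p2, a) ∪ … ∪ δN(pk, a)     (3)
Equations (1) and (3) show that δD*({q0}, w) = δN*(q0, w). Since MD and MN both accept w exactly when this
set contains a state of FN, L(MD) = L(MN).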
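The subset construction referred to in the theorem can be sketched in a few lines. This version assumes an NFA without ε-moves whose transition function is a dictionary from (state, symbol) to a set of states, and it builds only the DFA states reachable from the start state. The example NFA (strings ending in 01) and the function names are illustrative assumptions, not taken from these notes.

```python
from collections import deque

def subset_construction(sigma, delta_n, q0, final_n):
    """Build a DFA from an NFA; DFA states are frozensets of NFA states."""
    start = frozenset({q0})
    delta_d, seen, queue = {}, {start}, deque([start])
    while queue:
        subset = queue.popleft()
        for a in sigma:
            # The DFA move on a is the union of the NFA moves of every member.
            target = frozenset(
                r for p in subset for r in delta_n.get((p, a), set())
            )
            delta_d[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A DFA state is final if it contains at least one NFA final state.
    final_d = {s for s in seen if s & final_n}
    return seen, delta_d, start, final_d

def dfa_accepts(delta_d, start, final_d, w):
    """Run the constructed DFA on the string w."""
    state = start
    for a in w:
        state = delta_d[(state, a)]
    return state in final_d

# Hypothetical NFA over {0, 1} that accepts strings ending in 01.
delta_n = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
states, delta_d, start, final_d = subset_construction({"0", "1"}, delta_n, "q0", {"q2"})

print(len(states))                                    # 3
print(dfa_accepts(delta_d, start, final_d, "1101"))   # True
```

Building only the reachable subsets is a design choice: the full construction has 2^n subsets for n NFA states, but unreachable ones never affect the accepted language.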
Comparison of DFA, NFA and ε-NFA

DFA | NFA | ε-NFA
----------------------------------------------------------------------------------------------
Zero or one transition from a state on an input symbol (exactly one when δ is total) | Zero, one or more transitions from a state on an input symbol | Zero, one or more transitions from a state, with or without an input symbol
Difficult to design | Easier to design | Easy to construct from a regular expression
More transitions | Fewer transitions | More transitions than the NFA
At any point of time it is in exactly one state | Can be in more than one state at a time | Can be in more than one state at a time, with or without reading any input

Note: All three models accept exactly the same class of languages (the regular languages), as the
subset construction shows. They differ in how convenient they are to design, not in recognizing power.
Preprocessor
A preprocessor produces input to compilers. It may perform the following functions.
1. Macro processing: A preprocessor may allow a user to define macros that are shorthands
for longer constructs.
2. File inclusion: A preprocessor may include header files into the program text.
3. Rational preprocessor: These preprocessors augment older languages with more modern
flow-of-control and data-structuring facilities.
4. Language extensions: These preprocessors attempt to add capabilities to the language by
what amounts to built-in macros.
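To make macro processing concrete, here is a toy sketch of a preprocessor pass. It assumes a simplified, C-like directive of the form '#define NAME replacement' with no parameters; the function name expand_macros is hypothetical, and a real preprocessor handles many more cases (arguments, nesting, string literals).

```python
import re

def expand_macros(source):
    """Expand simple macros defined with '#define NAME replacement'."""
    macros = {}
    output = []
    for line in source.splitlines():
        match = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if match:
            macros[match.group(1)] = match.group(2).strip()
            continue                      # directive lines are not passed on
        for name, body in macros.items():
            # Replace whole-word occurrences only, so MAX does not match MAXIMUM.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda m: body, line)
        output.append(line)
    return "\n".join(output)

program = "#define MAX 100\nint a[MAX];\nint MAXIMUM = MAX;"
print(expand_macros(program))
# int a[100];
# int MAXIMUM = 100;
```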
COMPILER
A compiler is a translator program that takes a program written in a high-level language (HLL), the
source program, and translates it into an equivalent program in a machine-level language (MLL), the
target program.
[Figure: source program → Compiler → target program, with error messages as a second output of the compiler]
Executing a program written in an HLL programming language basically involves two parts. The
source program must first be compiled, i.e. translated into an object program. Then the resulting
object program is loaded into memory and executed.
ASSEMBLER
Programmers found it difficult to write or read programs in machine language. They began to
use a mnemonic (symbol) for each machine instruction, which they would subsequently
translate into machine language. Such a mnemonic machine language is now called an
assembly language. Programs known as assemblers were written to automate the translation
of assembly language into machine language. The input to an assembler is called the
source program; the output is a machine language translation (the object program).
Languages such as BASIC, SNOBOL and LISP can be translated using interpreters. Java
also uses an interpreter. The process of interpretation can be carried out in the following phases.
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
Syntax Analysis: The second stage of translation is called syntax analysis or parsing. In this
phase expressions, statements, declarations, etc. are identified by using the results of lexical
analysis. Syntax analysis is aided by techniques based on the formal grammar of the
programming language.
A syntax analyzer creates the syntactic structure (generally a parse tree) of the given program.
A syntax analyzer is also called a parser. A parse tree describes a syntactic structure.
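As an illustration of how a parser turns the tokens from lexical analysis into a syntactic structure, the sketch below implements a small recursive-descent parser. The grammar (sums and products of names and numbers), the tokenizer and the nested-tuple tree representation are all assumptions chosen for this example; the tree it returns is a condensed syntax tree rather than a full parse tree.

```python
import re

def tokenize(text):
    """Stand-in for lexical analysis: identifiers, numbers and operators."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[+*()]", text)

def parse(tokens):
    """Parse with the grammar
         expr   -> term ('+' term)*
         term   -> factor ('*' factor)*
         factor -> NAME | NUMBER | '(' expr ')'
    and return a tree of nested tuples (operator, left, right)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():
        token = peek()
        if token == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token is None or token in {"+", "*", ")"}:
            raise SyntaxError(f"unexpected token: {token}")
        return advance()

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token: {peek()}")
    return tree

print(parse(tokenize("a + b * (c + 2)")))
# ('+', 'a', ('*', 'b', ('+', 'c', '2')))
```

Each grammar rule becomes one function, which is why the formal grammar of the language is such a direct aid to writing the parser.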
Semantic Analysis: Uses the syntax tree and the information in the symbol table to check the source
program for semantic consistency with the language definition. It gathers type information and saves it
in either the syntax tree or the symbol table for use in intermediate code generation.
Type checking: The compiler checks whether each operator has matching operands.
Coercions: The language specification may permit some type conversions.
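Type checking and coercion can be sketched on a tiny expression tree. The example assumes a tree of nested tuples (operator, left, right) with identifier names at the leaves, a symbol table mapping names to types, and a single permitted coercion from int to float. The names check and inttofloat are hypothetical and used only for illustration.

```python
def check(expr, symbol_table):
    """Return (type, annotated tree) for expr, inserting coercions where permitted."""
    if isinstance(expr, str):                      # leaf: look up the identifier
        if expr not in symbol_table:
            raise TypeError(f"undeclared identifier: {expr}")
        return symbol_table[expr], expr
    op, left, right = expr
    left_type, left_tree = check(left, symbol_table)
    right_type, right_tree = check(right, symbol_table)
    if left_type == right_type:                    # operands already match
        return left_type, (op, left_tree, right_tree)
    if {left_type, right_type} == {"int", "float"}:
        # Coercion: widen the int operand to float.
        if left_type == "int":
            left_tree = ("inttofloat", left_tree)
        else:
            right_tree = ("inttofloat", right_tree)
        return "float", (op, left_tree, right_tree)
    raise TypeError(f"operator {op} applied to {left_type} and {right_type}")

symbol_table = {"rate": "float", "count": "int", "name": "string"}
print(check(("*", "rate", "count"), symbol_table))
# ('float', ('*', 'rate', ('inttofloat', 'count')))
```

A mismatch with no permitted conversion, such as adding name and count here, raises a TypeError, which corresponds to the compiler reporting a semantic error.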