
FLAT

Module 1

Introduction

1.1 Alphabet

Definition − An alphabet is any finite set of symbols.


Example − ∑ = {a, b, c, d} is an alphabet where ‘a’, ‘b’, ‘c’,
and ‘d’ are symbols.

1.2 Languages and grammars

Definition − A language is a subset of ∑* for some alphabet ∑. It
can be finite or infinite.
Example − If the language consists of all possible strings of length 2
over ∑ = {a, b}, then L = { ab, aa, ba, bb }.

Grammar

Simplifying Context Free Grammars

A context free grammar (CFG) is in Chomsky Normal Form (CNF) if
all of its production rules satisfy one of the following conditions:

● A non-terminal generating a single terminal (e.g., X → x)

● A non-terminal generating two non-terminals (e.g., X → YZ)

● The start symbol generating ε (e.g., S → ε)

Consider the following grammars,


G1 = {S->a, S->AZ, A->a, Z->z}

G2 = {S->a, S->aZ, Z->a}

The grammar G1 is in CNF because all of its production rules satisfy the
conditions specified for CNF. However, the grammar G2 is not in CNF, as
the production rule S->aZ contains a terminal followed by a non-
terminal, which violates the conditions specified for CNF.

Note –

● CNF is a pre-processing step used in various algorithms.

● For the generation of a string x of length ‘m’, a grammar in CNF requires
exactly ‘2m-1’ production steps.

How to convert CFG to CNF?

Step 1. Remove the start symbol from the RHS. If the start symbol S appears on
the RHS of any production in the grammar, create a new production S0->S,
where S0 is the new start symbol.

Step 2. Remove null, unit and useless productions. If the CFG
contains any such production rules, eliminate them.

Step 3. Remove terminals from the RHS. For example, the production rule
X->xY can be decomposed as:
X->ZY
Z->x

Step 4. Remove RHS with more than two non-terminals. For example, the
production rule X->XYZ can be decomposed as:
X->PZ
P->XY
A small sketch of steps 3 and 4 is given below.
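The last two steps can be mechanized. The following is a minimal sketch (my own representation, not from the text), assuming a grammar is stored as a dict mapping each non-terminal to a list of right-hand sides, each RHS a list of symbols; the function name to_cnf_steps_3_and_4 and the fresh-name scheme are illustrative assumptions.

# Minimal sketch of steps 3 and 4 of the CNF conversion described above.
def to_cnf_steps_3_and_4(grammar, is_terminal):
    new_grammar = {}
    fresh = iter(f"N{i}" for i in range(1, 10**6))  # supply of fresh non-terminal names

    def add(head, rhs):
        new_grammar.setdefault(head, []).append(rhs)

    for head, rhss in grammar.items():
        for rhs in rhss:
            # Step 3: in any RHS of length >= 2, replace each terminal x
            # by a new non-terminal Z with production Z -> x.
            if len(rhs) >= 2:
                rhs = list(rhs)
                for i, sym in enumerate(rhs):
                    if is_terminal(sym):
                        z = next(fresh)
                        add(z, [sym])
                        rhs[i] = z
            # Step 4: break an RHS with more than two non-terminals into a
            # chain of binary productions, e.g. X -> A B C becomes X -> A P, P -> B C.
            while len(rhs) > 2:
                p = next(fresh)
                add(p, rhs[-2:])
                rhs = rhs[:-2] + [p]
            add(head, rhs)
    return new_grammar

grammar = {"X": [["x", "Y"]], "S": [["X", "Y", "Z"]], "Y": [["y"]], "Z": [["z"]]}
print(to_cnf_steps_3_and_4(grammar, is_terminal=str.islower))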
1.3 Productions and derivation

A production is called recursive if its left side occurs on its right side.
The production S→aS, for example, is recursive. A production A → α is
indirectly recursive if a sentential form that includes A can be derived
from A. For example, suppose we have the following grammar:

S → b/aA

A → c/bS

Due to the following derivations, the productions S → aA and A → bS
are both indirectly recursive:

S ⇒ aA ⇒ abS,

A ⇒ bS ⇒ baA

A grammar is recursive if it contains either a recursive or an indirectly
recursive production.

Derivation

A derivation is a sequence of applications of production rules. It is used
to obtain the input string via these production rules. We have to
make two decisions during parsing. They are the following:

● We must determine which non-terminal is to be replaced.

● We must determine the production rule by which that
non-terminal is substituted.

We have two options for the order in which non-terminals are replaced
by production rules:

1. Leftmost derivation

In the leftmost derivation, the leftmost non-terminal of the sentential
form is replaced first at each step, so the input string is derived from
left to right. (A small sketch of a leftmost derivation is given after this list.)

2. Rightmost derivation

In the rightmost derivation, the rightmost non-terminal of the sentential
form is replaced first at each step, so the input string is derived from
right to left.
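The sketch below (my own, not from the text) derives the string "abb" in the hypothetical grammar S → aSb | b by always expanding the leftmost non-terminal first.

GRAMMAR = {"S": [["a", "S", "b"], ["b"]]}

def leftmost_derivation(target, form=("S",), steps=None, depth=0):
    steps = [" ".join(form)] if steps is None else steps
    if all(sym not in GRAMMAR for sym in form):        # no non-terminals left
        return steps if "".join(form) == target else None
    if depth > len(target):                            # crude bound to stop runaway recursion
        return None
    i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)  # leftmost non-terminal
    for rhs in GRAMMAR[form[i]]:
        new_form = form[:i] + tuple(rhs) + form[i + 1:]
        result = leftmost_derivation(target, new_form, steps + [" ".join(new_form)], depth + 1)
        if result:
            return result
    return None

print(" => ".join(leftmost_derivation("abb")))   # S => a S b => a b b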

1.4 Chomsky hierarchy of languages

The Chomsky Hierarchy describes the classes of languages that the
various machines can accept. The language classes in Chomsky's
Hierarchy are as follows:
 Type 0 is known as Unrestricted Grammar.
 Type 1 is known as Context Sensitive Grammar.
 Type 2 is known as Context Free Grammar.
 Type 3 is known as Regular Grammar.

Fig 1: Chomsky hierarchy

The diagram above shows the Chomsky hierarchy. Every language of
type 3 is therefore also of type 2, type 1 and type 0. Similarly, every
language of type 2 is also of type 1 and type 0.

● Type 0 grammar -

Type 0 grammar is known as Unrestricted Grammar. The production
rules of these languages are unrestricted. These languages can be
modelled by Turing machines.

Example :
 bAa → aa
 P→p

● Type 1 grammar -

Type 1 grammar is known as Context Sensitive Grammar. Context-
sensitive grammar is used to describe languages that are sensitive
to context. A context-sensitive grammar follows the rules
below:

● In a context-sensitive grammar there may be more than one symbol on
the left-hand side of a production rule.

● The number of symbols on the left-hand side must not exceed the
number of symbols on the right-hand side.

● A rule of the form A → ε is not permitted unless A is the
start symbol, and in that case A must not occur on the right-hand side
of any rule.

● Every Type 1 grammar should also be a Type 0 grammar. In Type 1,
productions are of the form V → T,

Key takeaway :
● where the number of symbols in V is less than or equal to the
number of symbols in T.

Example :
 S → AT
 T → xy
 A→a

● Type 2 grammar -

Type 2 Grammar is called Context Free Grammar. Context free
languages are the languages which can be represented by a
CFG (context free grammar). Every Type 2 grammar should also be
Type 1. The production rule is:

A→α

Key takeaway :

● Where A is any single non-terminal and α is any combination
of terminals and non-terminals.

Example :
 A → aBb
 A→b
 B→a

● Type 3 grammar -

Type 3 Grammar is called Regular Grammar. Those languages that
can be represented using regular expressions are regular
languages. These languages can be modelled by an NFA or a DFA.
Type 3 productions have to be in the form of −

V → T*V / T*

Example :

A → xy

Key takeaway :

● Type 3 is the most restricted form of grammar. Every Type 3
grammar is also a Type 2 and a Type 1 grammar.

1.5 Regular expressions and languages


A Regular Expression can be recursively defined as follows −
● ε is a Regular Expression that denotes the language containing only the
empty string. (L (ε) = {ε})
● φ is a Regular Expression denoting the empty language. (L (φ) =
{ })
● x is a Regular Expression where L(x) = {x}
● If X is a Regular Expression denoting the language L(X) and Y is a
Regular Expression denoting the language L(Y), then

 X + Y is a Regular Expression corresponding to the
language L(X) ∪ L(Y), where L(X+Y) = L(X) ∪ L(Y).

 X . Y is a Regular Expression corresponding to the language L(X) .
L(Y), where L(X.Y) = L(X) . L(Y).

 R* is a Regular Expression corresponding to the
language L(R*), where L(R*) = (L(R))*.

● If we apply any of the rules from 1 to 5 several times, the result is a
Regular Expression.
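The language operations behind the operators above can be illustrated over finite sets of strings. The following is a small sketch of my own (the function names union, concat and star are assumptions; star is restricted to a length bound so the set stays finite).

def union(LX, LY):
    return LX | LY                                   # L(X + Y) = L(X) ∪ L(Y)

def concat(LX, LY):
    return {x + y for x in LX for y in LY}           # L(X.Y) = L(X) . L(Y)

def star(LR, max_len):
    result = {""}                                    # ε is always in L(R*)
    frontier = {""}
    while frontier:
        frontier = {w + r for w in frontier for r in LR if len(w + r) <= max_len} - result
        result |= frontier
    return result

LX, LY = {"a"}, {"b"}
print(union(LX, LY))            # {'a', 'b'}
print(concat(LX, LY))           # {'ab'}
print(sorted(star(LX, 3)))      # ['', 'a', 'aa', 'aaa']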

Unix Operator Extensions

Regular expressions are used frequently in Unix:


● In the command line

● Within text editors

● In the context of pattern matching programs such


as grep and egrep

Additional operators are recognized by Unix. These operators are used for
convenience only; a short example using them follows the list below.

● character classes: '[' <list of chars> ']'

● start of a line: '^'

● end of a line: '$'

● wildcard matching any character except newline: '.'

● optional instance: R? = epsilon | R

● one or more instances: R+ == RR*
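The sketch below (my own example, not from the text) uses Python's re module, whose syntax closely mirrors egrep for these operators.

import re

lines = ["error: disk full", "warning: low memory", "err", "fatal error!"]

pattern = re.compile(r"^err(or)?:.*$")   # '^' start of line, '?' optional group,
                                         # '.' any char except newline, '$' end of line
print([l for l in lines if pattern.search(l)])   # ['error: disk full']

print(bool(re.search(r"[aeiou]+", "rhythm")))    # character class + one-or-more: False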

1.6 Deterministic finite automata (DFA) and equivalence


with regular expressions
In DFA, for each input symbol, one can determine the state to which
the machine will move. Hence, it is called Deterministic Automaton. As it
has a finite number of states, the machine is called Deterministic Finite
Machine or Deterministic Finite Automaton.

An automaton can be represented by a 5-tuple (Q, ∑, δ, q0, F), where −

● Q is a finite set of states.
● ∑ is a finite set of symbols, called the alphabet of the automaton.
● δ is the transition function.
● q0 is the initial state from where any input is processed (q0 ∈ Q).
● F is a set of final states of Q (F ⊆ Q).
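The 5-tuple can be written down directly in code. The following is a minimal sketch (my own example: a DFA accepting binary strings with an even number of 0s; the names Q, SIGMA, DELTA, Q0 and F simply mirror the tuple above).

Q = {"even", "odd"}
SIGMA = {"0", "1"}
DELTA = {("even", "0"): "odd", ("even", "1"): "even",
         ("odd", "0"): "even", ("odd", "1"): "odd"}
Q0 = "even"
F = {"even"}

def dfa_accepts(w):
    state = Q0
    for symbol in w:
        state = DELTA[(state, symbol)]   # exactly one next state per (state, symbol)
    return state in F

print(dfa_accepts("1001"))   # True  (two 0s)
print(dfa_accepts("10"))     # False (one 0)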

A Review (supplemental)
Recall the example of designing a vending machine selling 20-dollar food
packs. What abstract concepts are involved in the design? ⇒ See the
5-tuple definition above.
Figure: the vending-machine example (recalled)

Advantages and disadvantages

● DFAs were invented to model real-world machines, in contrast to the
Turing machine, which was too general to study the properties of
real-world machines.

● DFAs are one of the most practical models of computation, since
there is a trivial linear-time, constant-space, online algorithm for
simulating a DFA on a stream of input. There are also efficient
algorithms to find a DFA recognizing:

● The union/intersection of the languages recognized by two given


DFAs.

● The complement of the language recognized by a given DFA.

● On the other hand, finite state automata are of strictly limited
power in the languages they can recognize; many simple languages,
including any problem that requires more than constant space to
solve, cannot be recognized by a DFA.

● The classical example of a simply described language that no DFA can
recognize is the bracket language, i.e., the language that consists of
properly paired brackets such as the word "(( )( ))".

1.7 Non deterministic finite automata (NFA) and


equivalence with DFA

In NDFA, for a particular input symbol, the machine can move to any
combination of the states in the machine. In other words, the exact state
to which the machine moves cannot be determined. Hence, it is
called Non-deterministic Automaton. As it has a finite number of states,
the machine is called Non-deterministic Finite Machine or Non-
deterministic Finite Automaton.


An NDFA can also be represented by a 5-tuple (Q, ∑, δ, q0, F), where −
Q → Finite non-empty set of states.
∑ → Finite non-empty set of input symbols.
δ → Transition function.
q0 → Beginning state.
F → Set of final states.

● Review of a previous example of a DFA (original version).

Figure: Step 1 for the NDFA example

A nondeterministic finite automaton (NFA) version of the above DFA is
shown next.

● More intuitive!

● How to design an NFA will be described later.

Figure: Step 1 for the NDFA example

Some properties of NFAs (see the figure above for the illustration) −

Some transitions may “die,” like δ(q2, 0) = ∅.

Some transitions have multiple choices, like δ(q0, 0) = {q0, q2}.

Example − Design an NFA accepting the following language: L = {w |
w ∈ {0, 1}* and w ends in 01}.
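The sketch below simulates one possible NFA for this language by tracking the set of states the machine could be in (the subset construction applied on the fly). The state names are assumptions chosen to match the properties noted above: δ(q0, 0) = {q0, q2}, δ(q2, 0) is empty, and q1 is the accepting state.

NFA_DELTA = {
    ("q0", "0"): {"q0", "q2"},   # on 0, guess whether this 0 starts the final "01"
    ("q0", "1"): {"q0"},
    ("q2", "1"): {"q1"},         # ("q2", "0") is missing: that transition "dies"
}
START, FINAL = "q0", {"q1"}

def nfa_accepts(w):
    current = {START}
    for symbol in w:
        current = set().union(*(NFA_DELTA.get((q, symbol), set()) for q in current))
    return bool(current & FINAL)

print(nfa_accepts("10101"))  # True, ends in 01
print(nfa_accepts("0110"))   # False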
DFA vs NDFA

The following table lists the differences between DFA and NDFA.

● DFA: The transition from a state is to a single particular next state for
each input symbol. Hence it is called deterministic.
NDFA: The transition from a state can be to multiple next states for each
input symbol. Hence it is called non-deterministic.

● DFA: Empty string transitions are not seen in DFA.
NDFA: NDFA permits empty string transitions.

● DFA: Backtracking is allowed in DFA.
NDFA: In NDFA, backtracking is not always possible.

● DFA: Requires more space.
NDFA: Requires less space.

● DFA: A string is accepted by a DFA if it transits to a final state.
NDFA: A string is accepted by an NDFA if at least one of all possible
transitions ends in a final state.

1.8 Regular grammars and equivalence with finite


automata

Regular expressions and finite automata have equivalent expressive


power:

● For every regular expression R, there is a corresponding FA that


accepts the set of strings generated by R.

● For every FA A there is a corresponding regular expression that


generates the set of strings accepted by A.

The proof is in two parts:


 an algorithm that, given a regular expression R, produces an FA A such
that L(A) == L(R).
 an algorithm that, given an FA A, produces a regular expression R such
that L(R) == L(A).
Our construction of FA from regular expressions will allow "epsilon
transitions" (a transition from one state to another with epsilon as the
label). Such a transition is always possible, since epsilon (or the empty
string) can be said to exist between any two input symbols. We can show
that such epsilon transitions are a notational convenience; for every FA
with epsilon transitions there is a corresponding FA without them.

Constructing an FA from an RE

We begin by showing how to construct an FA for the operands in a regular


expression.

● If the operand is a character c, then our FA has two states, s0 (the


start state) and sF (the final, accepting state), and a transition from
s0 to sF with label c.

● If the operand is epsilon, then our FA has two states, s0 (the start
state) and sF (the final, accepting state), and an epsilon transition
from s0 to sF.

● If the operand is null, then our FA has two states, s0 (the start state)
and sF (the final, accepting state), and no transitions.

Given FA for R1 and R2, we now show how to build an FA for R1R2, R1|
R2, and R1*. Let A (with start state a0 and final state aF) be the machine
accepting L(R1) and B (with start state b0 and final state bF) be the
machine accepting L(R2).

● The machine C accepting L(R1R2) includes A and B, with start state


a0, final state bF, and an epsilon transition from aF to b0.

● The machine C accepting L(R1|R2) includes A and B, with a new


start state c0, a new final state cF, and epsilon transitions from c0 to
a0 and b0, and from aF and bF to cF.

● The machine C accepting L(R1*) includes A, with a new start state


c0, a new final state cF, and epsilon transitions from c0 to a0 and
cF, and from aF to a0, and from aF to cF.
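The construction described above can be sketched compactly in code. The following is an illustrative Python version under my own representation (not from the text): an NFA fragment is a tuple (start, final, transitions), transitions maps (state, label) to a set of states, and the label None stands for an epsilon transition; the helper names char, epsilon, concat, union and star are assumptions.

from collections import defaultdict
import itertools

_counter = itertools.count()

def _fragment():
    return next(_counter), next(_counter), defaultdict(set)

def char(c):                         # operand: a single character c
    s0, sF, t = _fragment()
    t[(s0, c)].add(sF)
    return s0, sF, t

def epsilon():                       # operand: epsilon
    s0, sF, t = _fragment()
    t[(s0, None)].add(sF)
    return s0, sF, t

def _merge(t1, t2):
    t = defaultdict(set)
    for src in (t1, t2):
        for k, v in src.items():
            t[k] |= v
    return t

def concat(A, B):                    # machine for L(R1 R2)
    a0, aF, ta = A
    b0, bF, tb = B
    t = _merge(ta, tb)
    t[(aF, None)].add(b0)            # epsilon transition from A's final to B's start
    return a0, bF, t

def union(A, B):                     # machine for L(R1 | R2)
    a0, aF, ta = A
    b0, bF, tb = B
    c0, cF, t = _fragment()
    t = _merge(t, _merge(ta, tb))
    for s in (a0, b0):
        t[(c0, None)].add(s)         # new start branches to both machines
    for s in (aF, bF):
        t[(s, None)].add(cF)         # both finals flow to the new final
    return c0, cF, t

def star(A):                         # machine for L(R1*)
    a0, aF, ta = A
    c0, cF, t = _fragment()
    t = _merge(t, ta)
    t[(c0, None)] |= {a0, cF}        # skip A entirely, or enter it
    t[(aF, None)] |= {a0, cF}        # loop back, or leave
    return c0, cF, t

# (a|b)* a : start state, final state and the epsilon-NFA transition table
nfa = concat(star(union(char("a"), char("b"))), char("a"))
print(nfa[0], nfa[1], dict(nfa[2]))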

Eliminating Epsilon Transitions

If epsilon transitions can be eliminated from an FA, then


construction of an FA from a regular expression can be completed.

An epsilon transition offers a choice: it allows us to stay in a state or
move to a new state, regardless of the input symbol.
If, starting in state s1, state s2 can be reached via a series of epsilon
transitions followed by a transition on input symbol x, then those
epsilon transitions can be replaced by a single transition from s1 to s2
on symbol x.

Algorithm for Eliminating Epsilon Transitions

A finite automaton F2 with no epsilon transitions can be built from a
finite automaton F1 as follows:
 The states of F2 are all the states of F1 that have an entering transition
labeled by some symbol other than epsilon, plus the start state of F1,
which is also the start state of F2.
 For each state in F1, determine which other states are reachable via
epsilon transitions only. If a state of F1 can reach a final state in F1 via
epsilon transitions, then the corresponding state is a final state in F2.

For each pair of states i and j in F2, there is a transition from state i to
state j on input x if there exists a state k that is reachable from state i via
epsilon transitions in F1, and there is a transition in F1 from state k to
state j on input x.
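The core step of the algorithm above is computing the set of states reachable via epsilon transitions only. The sketch below is my own helper (the transition representation, with label None standing for epsilon, is an assumption).

def epsilon_closure(state, transitions):
    closure, stack = {state}, [state]
    while stack:
        q = stack.pop()
        for r in transitions.get((q, None), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

# s0 -eps-> s1 -eps-> s2, and s2 -a-> s3
T = {("s0", None): {"s1"}, ("s1", None): {"s2"}, ("s2", "a"): {"s3"}}
print(epsilon_closure("s0", T))   # {'s0', 's1', 's2'}
# In F2 there would then be a transition s0 -a-> s3, since s2 is in the closure of s0.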

1.9 Properties of regular languages

Regular languages are closed under a wide variety of operations.

Union and intersection

Pick DFAs recognizing the two languages and use the cross-
product construction to build a DFA recognizing their union or
intersection. See Sipser Theorem 1.25. Also see Sipser 1.45
for another way to do union.
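A rough sketch of the cross-product construction follows (my own representation, not Sipser's notation): the product DFA runs both machines in lockstep on pairs of states, and the choice of accepting pairs gives either intersection or union. The names product_dfa, d1, d2 and for_union are illustrative assumptions.

def product_dfa(delta1, start1, final1, delta2, start2, final2, for_union=False):
    start = (start1, start2)
    delta, final = {}, set()
    todo, seen = [start], {start}
    alphabet = {sym for (_, sym) in delta1}
    while todo:
        (p, q) = todo.pop()
        accept = (p in final1) or (q in final2) if for_union else (p in final1) and (q in final2)
        if accept:
            final.add((p, q))
        for a in alphabet:
            nxt = (delta1[(p, a)], delta2[(q, a)])
            delta[((p, q), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return delta, start, final

# DFA 1: even number of 0s; DFA 2: ends in 1 (alphabet {0, 1})
d1 = {("e", "0"): "o", ("e", "1"): "e", ("o", "0"): "e", ("o", "1"): "o"}
d2 = {("n", "0"): "n", ("n", "1"): "y", ("y", "0"): "n", ("y", "1"): "y"}
delta, start, final = product_dfa(d1, "e", {"e"}, d2, "n", {"y"})
print(start, final)   # ('e', 'n') and the accepting pairs for the intersection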

Set complement

Pick a DFA recognizing the language, then swap the


accept/non-accept markings on its states.

String reversal

Pick an NFA recognizing the language. Create a new final


state, with epsilon transitions to it from all the old final states.
Then swap the final and start states and reverse all the
transition arrows.
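The reversal recipe above can be sketched directly (an assumed representation: transitions as (src, label, dst) triples, with label None for epsilon, and "q_rev" as a hypothetical fresh state name).

def reverse_nfa(transitions, start, finals):
    new_state = "q_rev"
    # New final state, reached by epsilon from all the old final states.
    trans = list(transitions) + [(f, None, new_state) for f in finals]
    # Reverse every arrow; the new state becomes the start, the old start becomes final.
    reversed_trans = [(dst, label, src) for (src, label, dst) in trans]
    return reversed_trans, new_state, {start}

T = [("q0", "0", "q0"), ("q0", "1", "q0"), ("q0", "0", "q2"), ("q2", "1", "q1")]
rt, rstart, rfinals = reverse_nfa(T, "q0", {"q1"})
print(rstart, rfinals)   # q_rev {'q0'}: the reversed NFA accepts strings beginning with 10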
Set difference

Rewrite set difference using a combination of intersection and
set complement: A − B = A ∩ (complement of B).

Concatenation and Star

Pick an NFA recognizing the language and modify it as


described in Sipser Theorems 1.47 and 1.49

Homomorphism

A homomorphism is a function from strings to strings. What


makes it a homomorphism is that its output on a multi-
character string is just the concatenation of its outputs on
each individual character in the string. Or, equivalently, h(xy)
= h(x)h(y) for any strings x and y. If S is a set of strings, then
h(S) is {w : w = h(x) for some x in S}.

To show that regular languages are closed under
homomorphism, choose an arbitrary regular language L and a
homomorphism h. L can be represented using a regular
expression R. But then h(R), obtained by applying h to each symbol
of R, is a regular expression representing h(L). So h(L) must also
be regular.
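A small illustration (my own example): a homomorphism is determined by its value on single characters, and on longer strings it just concatenates those values, so h(xy) = h(x)h(y).

H = {"a": "01", "b": ""}                      # h(a) = 01, h(b) = epsilon

def h(word):
    return "".join(H[c] for c in word)

print(h("ab") == h("a") + h("b"))             # True: h(xy) = h(x)h(y)
print({h(w) for w in {"ab", "ba", "bb"}})     # h(S) = {'01', ''}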

Notice that regular languages are not closed under the subset/superset
relation. For example, 0*1* is regular, but its subset { 0^n 1^n : n >= 0 } is not
regular, while its subset { 01, 0011, 000111 } is regular again.

1.10 Pumping lemma for regular languages,


minimization of finite automata

Lemma: The language L = { a^n b^n c^n : n >= 0 } is not context free.

Proof (By contradiction)

Assuming that this language is context-free; hence it will have a context-
free grammar.
Let k be the constant of the Pumping Lemma.
Considering the string a^k b^k c^k, whose length is greater than k.

By the Pumping Lemma this is represented as u v x y z, such that all
u v^i x y^i z are also in L, which is not possible, as:
neither v nor y can contain letters from more than one of a, b and c; else
the letters would be in the wrong order in u v^2 x y^2 z.

If v or y consists only of a's, b's or c's, then u v^2 x y^2 z cannot maintain the
balance amongst the three letters.

Lemma: The language L = { a^i b^j c^k : i < j and i < k } is not context free.

Proof (By contradiction)

Assuming that this language is context-free; hence it will have a context-
free grammar.

Let n be the constant of the Pumping Lemma.

Considering the string a^n b^(n+1) c^(n+1), whose length is greater than n.
By the Pumping Lemma this must be represented as u v x y z, such that
all u v^m x y^m z are also in L.

- As mentioned previously, neither v nor y may contain a mixture of
symbols.

- Suppose v consists of a's.

Then there is no way y can generate enough b's and c's to keep both of
them more numerous than the a's when the string is pumped up (it can do
it for one or the other of them, not both).
Similarly, y cannot consist of just a's.

- So suppose then that v or y contains only b's or only c's.

Consider the string u x z, which must be in L. Since we have dropped
both v and y, we must have at least one b or one c less than we had
in u v x y z, which was a^n b^(n+1) c^(n+1). Consequently, this string no
longer has enough of either b's or c's to be a member of L.
Reference books

1. Harry R. Lewis and Christos H. Papadimitriou, Elements of the Theory of


Computation, Pearson Education Asia.

2. Dexter C. Kozen, Automata and Computability, Undergraduate Texts in


Computer Science, Springer.

3. Michael Sipser, Introduction to the Theory of Computation, PWS


Publishing.

4. John Martin, Introduction to Languages and the Theory of Computation,


Tata McGraw Hill.
