CS8602 Unit 1 - Eliminating Transitions
ELIMINATING ε TRANSITIONS
An NFA with ε can be converted to an NFA without ε, and this NFA without ε can be converted to a DFA. To do this, we will use a method which can remove all the ε transitions from the given NFA. The method is:
1. Find out all the ε transitions from each state in Q. The result will be called ε-closure(qi), where qi ∈ Q.
2. Then the δ' transitions can be obtained. A δ' transition is an ε-closure applied to the δ moves.
3. Repeat Step 2 for each input symbol and each state of the given NFA.
4. Using the resultant states, the transition table for the equivalent NFA without ε can be built.
Example: Consider the NFA with moves δ(q0, a) = q1, δ(q1, ε) = q2 and δ(q2, b) = q2 (transition diagram omitted). First compute the ε-closures:
1. ε-closure(q0) = {q0}
2. ε-closure(q1) = {q1, q2}
3. ε-closure(q2) = {q2}
Then the δ' transitions are computed; for the start state, for example:
δ'(q0, a) = ε-closure(δ(ε-closure(q0), a)) = ε-closure(δ(q0, a)) = ε-closure(q1) = {q1, q2}
δ'(q0, b) = ε-closure(δ(ε-closure(q0), b)) = ε-closure(δ(q0, b)) = Ф
Now we will summarize all the computed δ' transitions:
1. δ'(q0, a) = {q1, q2}
2. δ'(q0, b) = Ф
3. δ'(q1, a) = Ф
4. δ'(q1, b) = {q2}
5. δ'(q2, a) = Ф
6. δ'(q2, b) = {q2}
States      a           b
→q0         {q1, q2}    Ф
*q1         Ф           {q2}
*q2         Ф           {q2}
States q1 and q2 become final states, as the ε-closures of q1 and q2 contain the final state q2. The NFA can be shown by the following transition diagram (omitted here).
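To make Step 1 of the method concrete, here is a minimal sketch in C (not from the original notes; the state encoding and the eps table are assumptions) that computes ε-closure(q) for the example above by a depth-first search over the ε-moves:

#include <stdio.h>
#include <stdbool.h>

#define NSTATES 3               /* q0, q1, q2 from the example above */

/* eps[i][j] is true if there is an ε-move from state qi to state qj */
static const bool eps[NSTATES][NSTATES] = {
    {false, false, false},      /* q0 has no ε-moves */
    {false, false, true },      /* q1 --ε--> q2      */
    {false, false, false},      /* q2 has no ε-moves */
};

/* Mark every state reachable from q using only ε-moves. */
static void eps_closure(int q, bool closure[NSTATES]) {
    if (closure[q]) return;     /* already visited */
    closure[q] = true;          /* a state is always in its own closure */
    for (int r = 0; r < NSTATES; r++)
        if (eps[q][r])
            eps_closure(r, closure);
}

int main(void) {
    for (int q = 0; q < NSTATES; q++) {
        bool closure[NSTATES] = {false};
        eps_closure(q, closure);
        printf("eps-closure(q%d) = {", q);
        for (int r = 0; r < NSTATES; r++)
            if (closure[r]) printf(" q%d", r);
        printf(" }\n");
    }
    return 0;
}

Running it prints the three closures computed above: {q0}, {q1, q2} and {q2}.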
FINITE AUTOMATA
Finite automata can be represented by input tape and finite control.
Input tape: It is a linear tape having some number of cells. Each input symbol is placed in each cell.
Finite control: The finite control decides the next state on receiving particular input from input tape. The tape
reader reads the cells one by one from left to right, and at a time only one input symbol is read.
Types of Automata:
DFA
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the computation. In the
DFA, the machine goes to one state only for a particular input character. DFA does not accept the null move.
NFA
NFA stands for non-deterministic finite automata. It can transit to any number of states for a particular input. It can accept the null move.
Transition Diagram
A transition diagram or state transition diagram is a directed graph which can be constructed as follows:
1. In DFA, the input to the automata can be any string. Now, put a pointer to the start state q and read the
input string w from left to right and move the pointer according to the transition function, δ. We can read one
symbol at a time. If the next symbol of string w is a and the pointer is on state p, move the pointer to δ(p, a).
When the end of the input string w is encountered, then the pointer is on some state r.
2. The string w is said to be accepted by the DFA if r ∈ F, which means the input string w is processed successfully and the automaton has reached a final state. The string is said to be rejected by the DFA if r ∉ F.
Example :
Solution:
The finite automata can be represented using a transition graph. In the above diagram, the machine is initially in the start state q0; on receiving input 1 the machine changes its state to q1. From q0, on receiving 0, the machine changes its state to q2, which is the dead state. From q1, on receiving input 0 or 1, the machine stays in state q1, which is the final state. The possible input strings that can be generated are 10, 11, 110, 101, 111, ..., that is, all strings starting with 1.
Transition Table
Example :
Solution:
State    0     1
→q0      q1    q2
q1       q0    q2
*q2      q2    q2
Explanation:
o In the above table, the first column indicates all the current states. Under column 0 and 1, the next
states are shown.
o The first row of the transition table can be read as, when the current state is q0, on input 0 the next
state will be q1 and on input 1 the next state will be q2.
o In the second row, when the current state is q1, on input 0 the next state will be q0, and on input 1 the next state will be q2.
o In the third row, when the current state is q2, on input 0 the next state will be q2, and on input 1 the next state will be q2.
o The arrow pointing to q0 indicates that it is the start state, and the circle marked on q2 indicates that it is a final state.
o DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the computation. The finite automata are called deterministic finite automata if the machine reads an input string one symbol at a time.
o In DFA, there is only one path for specific input from the current state to the next state.
o DFA does not accept the null move, i.e., the DFA cannot change state without any input
character.
o DFA can contain multiple final states. It is used in Lexical Analysis in Compiler.
In the following diagram, we can see that from state q0 for input a, there is only one path which is going to q1.
Similarly, from q0, there is only one path for input b going to q2.
Formally, a DFA is specified by (Q, ∑, δ, q0, F) with the transition function δ: Q x ∑ → Q
Example :
1. Q = {q0, q1, q2}
2. ∑ = {0, 1}
3. q0 = {q0}
4. F = {q2}
Solution:
Transition Diagram:
Transition Table:
State    0     1
→q0      q0    q1
q1       q2    q1
*q2      q2    q2
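As a hedged illustration (the encoding of states as array indices and the function names are assumptions, not part of the notes), the transition table above can be run directly as a table-driven DFA simulator: keep one current state and index δ once per input symbol, exactly as described for transition diagrams earlier.

#include <stdio.h>
#include <stdbool.h>

/* States q0, q1, q2 encoded as 0, 1, 2; delta[state][symbol] is the
   transition table above (columns are input symbols 0 and 1). */
static const int delta[3][2] = {
    {0, 1},   /* q0: on 0 -> q0, on 1 -> q1 */
    {2, 1},   /* q1: on 0 -> q2, on 1 -> q1 */
    {2, 2},   /* q2: on 0 -> q2, on 1 -> q2 */
};

/* Run the DFA on w; accept iff the run ends in the final state q2. */
static bool accepts(const char *w) {
    int state = 0;                       /* start state q0 */
    for ( ; *w; w++)
        state = delta[state][*w - '0'];  /* one deterministic move per symbol */
    return state == 2;
}

int main(void) {
    printf("10 -> %d\n", accepts("10"));   /* 1: q0 -1-> q1 -0-> q2 */
    printf("11 -> %d\n", accepts("11"));   /* 0: q0 -1-> q1 -1-> q1 */
    return 0;
}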
o NFA stands for non-deterministic finite automata. It is easier to construct an NFA than a DFA for a given regular language.
o Not every NFA is a DFA, but each NFA can be translated into an equivalent DFA.
In the following image, we can see that from state q0 for input a there are two next states, q1 and q2; similarly, from q0 for input b the next states are q0 and q1. Thus it is not fixed or determined, for a particular input, which state the machine goes to next. Hence this FA is called non-deterministic finite automata.
An NFA is also specified by a five-tuple like a DFA, but with a different transition function:
δ: Q x ∑ → 2^Q
where 2^Q is the power set of Q, i.e., the set of all subsets of Q.
Example :
1. Q = {q0, q1, q2}
2. ∑ = {0, 1}
3. q0 = {q0}
4. F = {q2}
Solution:
Transition diagram:
Transition Table:
State    0         1
→q0      q0, q1    q1
q1       q2        q0
*q2      q2        q1, q2
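A corresponding sketch for the NFA (again an illustration; encoding states q0..q2 as bits 0..2 of an unsigned mask is an assumption) tracks the set of current states, which is exactly what δ: Q x ∑ → 2^Q calls for:

#include <stdio.h>
#include <stdbool.h>

/* delta[state][symbol] is a bitmask of next states, from the table above */
static const unsigned delta[3][2] = {
    { (1u<<0)|(1u<<1), (1u<<1) },           /* q0: 0 -> {q0,q1}, 1 -> {q1}    */
    { (1u<<2),         (1u<<0) },           /* q1: 0 -> {q2},    1 -> {q0}    */
    { (1u<<2),         (1u<<1)|(1u<<2) },   /* q2: 0 -> {q2},    1 -> {q1,q2} */
};

/* The NFA accepts w if some sequence of choices ends in the final state q2. */
static bool accepts(const char *w) {
    unsigned current = 1u << 0;             /* start in the set {q0} */
    for ( ; *w; w++) {
        unsigned next = 0;
        for (int q = 0; q < 3; q++)
            if (current & (1u << q))
                next |= delta[q][*w - '0']; /* union over all current states */
        current = next;
    }
    return (current & (1u << 2)) != 0;      /* is the final state q2 reachable? */
}

int main(void) {
    printf("00 -> %d\n", accepts("00"));    /* 1: {q0} -0-> {q0,q1} -0-> {q0,q1,q2} */
    printf("1  -> %d\n", accepts("1"));     /* 0: {q0} -1-> {q1} */
    return 0;
}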
INTRODUCTION TO COMPILERS
INTRODUCTION TO LANGUAGE PROCESSING:
As computers became an inevitable and indispensable part of human life, and several languages with different and more advanced features evolved to help the user communicate with the machine, the development of translator (mediator) software became essential to bridge the huge gap between human and machine understanding. This process is called Language Processing, reflecting the goal and intent of the process. To understand this process better, we have to be familiar with some key terms and concepts explained in the following lines.
LANGUAGE TRANSLATORS:
A language translator is a computer program which translates a program written in one (source) language into its equivalent program in another (target) language. The source program is in a high-level language, whereas the target language can be anything from the machine language of a target machine (from microprocessor to supercomputer) to another high-level language.
Two commonly used translators are the compiler and the interpreter.
1. Compiler: A compiler is a program that reads a program in one language, called the source language, and translates it into its equivalent program in another language, called the target language; in addition, it presents error information to the user.
2. Interpreter: An interpreter is another commonly used language processor. Instead of producing a target
program as a single translation unit, an interpreter appears to directly execute the operations specified in
the source program on inputs supplied by the user.
(Diagram: the Source Program and the Input are fed to the Interpreter, which produces the Output.)
In addition to these translators, programs like preprocessors, text formatters, etc., may be used in a language processing system. To translate a program in a high-level language into an executable one, the compiler performs by default the compile and linking functions.
Normally the steps in a language processing system include preprocessing the skeletal source program, which produces an extended or expanded source program (a ready-to-compile unit of the source program), followed by compiling the resultant, then linking/loading, and finally producing its equivalent executable code. As said earlier, not all these steps are mandatory; in some cases the compiler performs the linking and loading functions implicitly.
The steps involved in a typical language processing system can be understood with the following diagram.
(Diagram: Source Program → Preprocessor → Compiler → Assembler, followed by linking/loading to produce the executable.)
TYPES OF COMPILERS:
Based on the specific input it takes and the output it produces, compilers can be classified into the following types:
Traditional Compilers (C, C++, Pascal): These compilers convert a source program in a HLL into its equivalent in native machine code or object code.
Interpreters (LISP, SNOBOL, Java 1.0): These compilers first convert source code into intermediate code, and then interpret (emulate) it to its equivalent machine code.
Cross-Compilers: These are the compilers that run on one machine and produce code for another
machine.
Incremental Compilers: These compilers separate the source into user-defined steps, compiling/recompiling step by step and interpreting the steps in a given order.
Converters (e.g. COBOL to C++): These programs compile from one high-level language to another.
Just-In-Time (JIT) Compilers (Java, Microsoft .NET): These are runtime compilers from intermediate language (byte code, MSIL) to executable code or native machine code. They perform type-based verification, which makes the executable code more trustworthy.
Ahead-of-Time (AOT) Compilers (e.g., .NET ngen): These are the pre-compilers to the native code for Java and
.NET
Binary Compilation: These compilers compile object code of one platform into object code of another platform.
LEXICAL ANALYSIS
As the first phase of a compiler, the main task of the lexical analyzer is to read the input
characters of the source program, group them into lexemes, and produce as output tokens for each lexeme
in the source program. This stream of tokens is sent to the parser for syntax analysis. It is common for the
lexical analyzer to interact with the symbol table as well.
When the lexical analyzer discovers a lexeme constituting an identifier, it needs to enter that
lexeme into the symbol table. This process is shown in the following figure.
When the lexical analyzer identifies the first token, it sends it to the parser; the parser receives the token and calls the lexical analyzer for the next token by issuing the getNextToken() command. This process continues until the lexical analyzer has identified all the tokens. During this process the lexical analyzer neglects or discards the whitespace and comment lines.
A token is a pair consisting of a token name and an optional attribute value. The token name is an abstract
symbol representing a kind of lexical unit, e.g., a particular keyword, or a sequence of input characters
denoting an identifier. The token names are the input symbols that the parser processes. In what follows, we
shall generally write the name of a token in boldface. We will often refer to a token by its token name.
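A token as just described (a name plus an optional attribute) and the getNextToken() pull loop can be pictured with a small C sketch; all type and function names besides getNextToken are illustrative assumptions:

#include <stdio.h>

/* Abstract token names; a real scanner would have one per lexical unit. */
typedef enum { TOK_EOF, TOK_ID, TOK_IF } TokenName;

typedef struct {
    TokenName name;   /* the abstract symbol, e.g., id or a keyword */
    int attribute;    /* optional attribute, e.g., a symbol-table index */
} Token;

/* Stub lexical analyzer: emits one id token, then end of input. */
static int emitted = 0;
static Token getNextToken(void) {
    Token t = { TOK_EOF, 0 };
    if (!emitted) { t.name = TOK_ID; t.attribute = 1; emitted = 1; }
    return t;
}

/* The parser pulls tokens one at a time until the input is exhausted. */
int main(void) {
    for (Token t = getNextToken(); t.name != TOK_EOF; t = getNextToken())
        printf("token name=%d attribute=%d\n", t.name, t.attribute);
    return 0;
}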
A pattern is a description of the form that the lexemes of a token may take [ or match]. In the case of a
keyword as a token, the pattern is just the sequence of characters that form the keyword. For identifiers and
some other tokens, the pattern is a more complex structure that is matched by many strings.
A lexeme is a sequence of characters in the source program that matches the pattern for a token
and is identified by the lexical analyzer as an instance of that token.
For example, in the C statement printf("Total = %d\n", score); both printf and score are lexemes matching the pattern for token id, and "Total = %d\n" is a lexeme matching the pattern for token literal [or string].
There are a number of reasons why the analysis portion of a compiler is normally separated into lexical analysis and
parsing (syntax analysis) phases.
1. Simplicity of design is the most important consideration. The separation of Lexical and Syntactic analysis
often allows us to simplify at least one of these tasks. For example, a parser that had to deal with
comments and whitespace as syntactic units would be considerably more complex than one that can
assume comments and whitespace have already been removed by the lexical analyzer.
2. Compiler efficiency is improved. A separate lexical analyzer allows us to apply specialized techniques
that serve only the lexical task, not the job of parsing. In addition, specialized buffering
techniques for reading input characters can speed up the compiler significantly.
INPUT BUFFERING:
Before discussing the problem of recognizing lexemes in the input, let us examine some ways that the
simple but important task of reading the source program can be speeded. This task is made difficult by the
fact that we often have to look one or more characters beyond the next lexeme before we can be sure we
have the right lexeme. There are many situations where we need to look at least one additional character
ahead. For instance, we cannot be sure we've seen the end of an identifier until we see a character that is not a
letter or digit, and therefore is not part of the lexeme for id. In C, single-character operators like -, =, or < could also be the beginning of a two-character operator like ->, ==, or <=. Thus, we shall introduce a two-buffer scheme that handles large lookaheads safely. We then consider an improvement involving "sentinels" that saves time checking for the ends of buffers.
Buffer Pairs
Because of the amount of time taken to process characters and the large number of characters that must be
processed during the compilation of a large source program, specialized buffering techniques have been
developed to reduce the amount of overhead required to process a single input character. An important scheme
involves two buffers that are alternately reloaded.
Each buffer is of the same size N, and N is usually the size of a disk block, e.g., 4096 bytes. Using one
system read command we can read N characters into a buffer, rather than using one system call per
character. If fewer than N characters remain in the input file, then a special character, represented by eof,
marks the end of the source file and is different from any possible character of the source program.
1. Pointer lexemeBegin marks the beginning of the current lexeme, whose extent we are attempting to determine.
2. Pointer forward scans ahead until a pattern match is found; the exact strategy whereby this determination is made will be covered in the balance of this chapter.
Once the next lexeme is determined, forward is set to the character at its right end. Then, after the
lexeme is recorded as an attribute value of a token returned to the parser, 1exemeBegin is set to the character
immediately after the lexeme just found. In the figure, we see forward has passed the end of the next lexeme, ** (the FORTRAN exponentiation operator), and must be retracted one position to its left.
Advancing forward requires that we first test whether we have reached the end of one of the
buffers, and if so, we must reload the other buffer from the input, and move forward to the beginning of the
newly loaded buffer. As long as we never need to look so far ahead of the actual lexeme that the sum of the
lexeme's length plus the distance we look ahead is greater than N, we shall never overwrite the lexeme in its
buffer before determining it.
If we use the above scheme as described, we must check, each time we advance forward, that we have
not moved off one of the buffers; if we do, then we must also reload the other buffer. Thus, for each character
read, we make two tests: one for the end of the buffer, and one to determine what character is read (the
latter may be a multiway branch). We can combine the buffer-end test with the test for the current
character if we extend each buffer to hold a sentinel character at the end. The sentinel is a special
character that cannot be part of the source program, and a natural choice is the character eof. Figure 1.8
shows the same arrangement as Figure 1.7, but with the sentinels added. Note that eof retains its use as a
marker for the end of the entire input.
Any eof that appears other than at the end of a buffer means that the input is at an end. Figure 1.9
summarizes the algorithm for advancing forward. Notice how the first test, which can be part of
a multiway branch based on the character pointed to by forward, is the only test we make, except in the case
where we actually are at the end of a buffer or the end of the input.
switch ( *forward++ ) {
case eof:
    if (forward is at end of first buffer) {
        reload second buffer;
        forward = beginning of second buffer;
    }
    else if (forward is at end of second buffer) {
        reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
cases for the other characters
}
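For a runnable picture of the same scheme, here is a self-contained C sketch of buffer pairs with sentinels; the buffer size, the choice of '\0' as the eof sentinel, and all names are assumptions of this sketch rather than the book's code:

#include <stdio.h>

#define N 4096                  /* buffer size, e.g., one disk block */
#define SENTINEL '\0'           /* assumes '\0' never occurs in the source */

static char  buf[2][N + 1];     /* two buffers, each with one sentinel slot */
static char *forward;           /* the scanning pointer */
static FILE *src;

/* Load buffer i from the input and terminate it with the sentinel. */
static void reload(int i) {
    size_t n = fread(buf[i], 1, N, src);
    buf[i][n] = SENTINEL;       /* marks end of buffer, or end of input */
    forward = buf[i];
}

/* Return the next character; swap buffers at a buffer boundary.
   A sentinel that is not at a boundary is the real end of the input. */
static char next_char(void) {
    char c = *forward++;
    if (c != SENTINEL)
        return c;               /* common case: one test per character */
    if (forward - 1 == buf[0] + N) { reload(1); return next_char(); }
    if (forward - 1 == buf[1] + N) { reload(0); return next_char(); }
    return SENTINEL;            /* end of the entire input */
}

int main(void) {
    src = fopen("input.txt", "r");      /* hypothetical source file */
    if (!src) return 1;
    reload(0);                          /* prime the first buffer */
    for (char c; (c = next_char()) != SENTINEL; )
        putchar(c);                     /* a real scanner would tokenize here */
    fclose(src);
    return 0;
}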
PHASES OF A COMPILER
Compiler Phases are the individual modules which are chronologically executed to
perform their respective Sub-activities, and finally integrate the solutions to give target code.
It is desirable to have relatively few phases, since it takes time to read and write intermediate files. The following diagram depicts the phases of a compiler through which it goes during compilation. Therefore a typical compiler has the following phases:
In addition to these, it also has Symbol table management and Error handler phases. Not all the phases are mandatory in every compiler; e.g., the Code Optimizer phase is optional in some cases. The description is given in the next section.
The phases of a compiler are divided into two parts: the first three phases are called the Analysis part, and the remaining three are called the Synthesis part.
In some applications we can have a compiler that is organized into what are called passes, where a pass is a collection of phases that converts the input from one representation to a completely different representation. Each pass makes a complete scan of the input and produces output to be processed by the subsequent pass; for example, a two-pass assembler.
All of these phases of a general Compiler are conceptually divided into The Front-
end, and The Back-end. This division is due to their dependence on either the Source
Language or the Target machine. This model is called an Analysis & Synthesis model of a
compiler.
The front-end of the compiler consists of phases that depend primarily on the source language and are largely independent of the target machine. For example, the front-end of the compiler includes the Scanner, Parser, creation of the Symbol table, Semantic Analyzer, and the Intermediate Code Generator.
The back-end of the compiler consists of phases that depend on the target machine; those portions do not depend on the source language, just on the intermediate language. It covers different aspects of the Code Optimization phase and code generation, along with the necessary error handling and symbol table operations.
LEXICAL ANALYZER (SCANNER): The Scanner is the first phase that works as interface between
the compiler and the Source language program and performs the following functions:
o Reads the characters in the Source program and groups them into a stream of tokens
in which each token specifies a logically cohesive sequence of characters, such as an
identifier, a keyword, a punctuation mark, or a multi-character operator like :=.
o The Scanner generates a token-id, and also enters that identifier's name in the Symbol table if it doesn't already exist.
SYNTAX ANALYZER (PARSER): The Parser interacts with the Scanner, and its subsequent phase
Semantic Analyzer and performs the following functions:
o Groups the received and recorded token stream into syntactic structures, usually into a structure called a Parse Tree, whose leaves are tokens.
o The interior nodes of this tree represent streams of tokens that logically belong together.
SEMANTIC ANALYZER: This phase receives the syntax tree as input and checks the semantic correctness of the program. Though the tokens are valid and syntactically correct, it may happen that they are not correct semantically. Therefore the semantic analyzer checks the semantics (meaning) of the statements formed.
The Syntactically and Semantically correct structures are produced here in the form of
a Syntax tree or DAG or some other sequential representation like matrix.
INTERMEDIATE CODE GENERATOR (ICG): This phase takes the syntactically and semantically correct structure as input, and produces its equivalent intermediate notation of the source program. The intermediate code should have two important properties: it should be easy to produce, and easy to translate into the target machine code.
CODE OPTIMIZER: This phase is optional in some compilers, but so useful and beneficial in terms of saving development time, effort, and cost. Its specific function is to improve the intermediate code so that the ultimate target program runs faster and/or takes less space.
CODE GENERATOR: This is the final phase of the compiler and generates the target code, normally
consisting of the relocatable machine code or Assembly code or absolute machine code.
o Memory locations are selected for each variable used, and assignment of
variables to registers is done.
The Compiler also performs the Symbol table management and Error handling
throughout the compilation process. Symbol table is nothing but a data structure that
stores different source language constructs, and tokens generated during the
compilation. These two interact with all phases of the Compiler.
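Since the symbol table interacts with every phase, a tiny sketch helps fix ideas; this fixed-capacity, linear-scan table and all its names are purely illustrative assumptions:

#include <stdio.h>
#include <string.h>

#define CAP 101                     /* small fixed capacity for illustration */

static char names[CAP][32];         /* lexemes of the identifiers seen so far */
static int  used = 0;

/* Return the index of lexeme in the table, inserting it if absent;
   this mirrors how the scanner enters identifiers during compilation. */
static int install(const char *lexeme) {
    for (int i = 0; i < used; i++)
        if (strcmp(names[i], lexeme) == 0)
            return i;               /* already present: reuse its entry */
    if (used == CAP) return -1;     /* table full (no resizing in this sketch) */
    strncpy(names[used], lexeme, sizeof names[used] - 1);
    return used++;
}

int main(void) {
    printf("%d\n", install("position"));  /* 0: newly inserted */
    printf("%d\n", install("rate"));      /* 1: newly inserted */
    printf("%d\n", install("position"));  /* 0: found, not re-inserted */
    return 0;
}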
For example, if the source program is an assignment statement, the following figure shows how the phases of the compiler will process the program.
Regular Expression
o The language accepted by finite automata can be easily described by simple expressions called
Regular Expressions. It is the most effective way to represent any language.
o The languages accepted by some regular expression are referred to as Regular languages.
o A regular expression can also be described as a sequence of pattern that defines a string.
o Regular expressions are used to match character combinations in strings. String-searching algorithms use these patterns to perform operations on a string.
For instance:
In a regular expression, x* means zero or more occurrences of x. It can generate {ε, x, xx, xxx, xxxx, ...}.
In a regular expression, x+ means one or more occurrences of x. It can generate {x, xx, xxx, xxxx, ...}.
Union: If L and M are two regular languages then their union L ∪ M is also a regular language.
L ∪ M = {s | s is in L or s is in M}
Concatenation: If L and M are two regular languages then their concatenation L.M is also a regular language.
L.M = {st | s is in L and t is in M}
Kleene closure: If L is a regular language then its Kleene closure L* is also a regular language; L* denotes zero or more concatenations of strings from L.
Example :
Write the regular expression for the language accepting all combinations of a's, over the set ∑ = {a}.
Solution:
All combinations of a's means a may appear zero times, once, twice and so on. If a appears zero times, that means the null string. That is, we expect the set {ε, a, aa, aaa, ...}. So we give a regular expression for this as:
R = a*
Example :
Write the regular expression for the language accepting all combinations of a's except the null string,
over the set ∑ = {a}
Solution:
This set indicates that there is no null string. So we can denote regular expression as:
R = a+
Example :
Write the regular expression for the language accepting all the string containing any number of a's and
b's.
Solution:
r.e. = (a + b)*
This will give the set L = {ε, a, aa, b, bb, ab, ba, aba, bab, ...}, i.e., any combination of a and b.
The (a + b)* shows any combination of a and b, including the null string.
Example :
Write the regular expression for the language accepting all the string which are starting with 1 and
ending with 0, over ∑ = {0, 1}.
Solution:
In a regular expression, the first symbol should be 1, and the last symbol should be 0. The r.e. is
as follows:
R = 1 (0 + 1)* 0
Example :
Write the regular expression for the language starting and ending with a and having any combination of b's in between.
Solution:
R = a b* a
Example :
Write the regular expression for the language starting with a but not having consecutive b's.
Solution:
R = (a + ab)*
Example :
Write the regular expression for the language accepting all the string in which any number of a's is
followed by any number of b's is followed by any number of c's.
Solution: As we know, any number of a's means a*, any number of b's means b*, and any number of c's means c*. Since, as given in the problem statement, b's appear after a's and c's appear after b's, the regular expression could be:
R = a* b* c*
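These textbook regular expressions carry over almost directly to POSIX extended regular expressions; as a hedged aside (the <regex.h> API is POSIX, available on Unix-like systems, and the ^...$ anchoring is our addition), the last answer can be checked mechanically:

#include <stdio.h>
#include <regex.h>

int main(void) {
    regex_t re;
    /* ^a*b*c*$ : any number of a's, then b's, then c's (POSIX ERE syntax) */
    if (regcomp(&re, "^a*b*c*$", REG_EXTENDED | REG_NOSUB) != 0)
        return 1;
    const char *tests[] = { "aabbc", "abcabc", "", "ba" };
    for (int i = 0; i < 4; i++)
        printf("%-7s -> %s\n", tests[i],
               regexec(&re, tests[i], 0, NULL, 0) == 0 ? "match" : "no match");
    regfree(&re);
    return 0;
}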
Conversion of RE to FA
To convert the RE to FA, we are going to use a method called the subset method. This method is used to
obtain FA from the given regular expression. This method is given below:
Step 1: Design a transition diagram for the given regular expression, using NFA with ε moves.
Step 2: Convert this NFA with ε to NFA without ε.
Step 3: Convert the obtained NFA to an equivalent DFA, using the transition-table (subset) construction.
Example :
Solution: First we will construct the transition diagram for the given regular expression.
(Steps 1-5: the transition diagram is built up piece by piece from the regular expression; the intermediate diagrams are omitted here.)
Now we have got the NFA without ε. Now we will convert it into the required DFA; for that, we will first write a transition table for this NFA.
State 0 1
q1 qf ϕ
q2 ϕ q3
q3 q3 qf
*qf ϕ ϕ
The transition table for the equivalent DFA is:
State     0       1
[q1]      [qf]    ϕ
[q2]      ϕ       [q3]
[q3]      [q3]    [qf]
*[qf]     ϕ       ϕ
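To see how a DFA table like the one above can be derived mechanically from the NFA table, here is a hedged C sketch of the subset construction; encoding q1, q2, q3, qf as bits 0..3 and taking {q1} as the start set are assumptions of this sketch:

#include <stdio.h>

#define NSTATES 4                /* q1, q2, q3, qf encoded as bits 0..3 */
#define NSYMS   2                /* input alphabet {0, 1} */
#define MAXD    (1 << NSTATES)   /* at most 2^NSTATES subset states */

/* nfa[q][a] = bitmask of NFA states reachable from q on symbol a,
   taken from the NFA transition table above. */
static const unsigned nfa[NSTATES][NSYMS] = {
    { 1u << 3, 0 },        /* q1: 0 -> {qf}, 1 -> phi   */
    { 0, 1u << 2 },        /* q2: 0 -> phi,  1 -> {q3}  */
    { 1u << 2, 1u << 3 },  /* q3: 0 -> {q3}, 1 -> {qf}  */
    { 0, 0 },              /* qf: no moves              */
};

static unsigned dstates[MAXD];   /* discovered DFA states, as subsets */
static int dtrans[MAXD][NSYMS];  /* resulting DFA transition table */
static int ndfa = 0;

/* Return the index of a subset, registering it if it is new. */
static int find_or_add(unsigned set) {
    for (int i = 0; i < ndfa; i++)
        if (dstates[i] == set) return i;
    dstates[ndfa] = set;
    return ndfa++;
}

int main(void) {
    find_or_add(1u << 0);                    /* assumed start set {q1} */
    for (int i = 0; i < ndfa; i++)           /* worklist over new subsets */
        for (int a = 0; a < NSYMS; a++) {
            unsigned next = 0;
            for (int q = 0; q < NSTATES; q++)
                if (dstates[i] & (1u << q))
                    next |= nfa[q][a];       /* union of the members' moves */
            dtrans[i][a] = find_or_add(next);
        }
    for (int i = 0; i < ndfa; i++)
        printf("D%d (mask %x): 0 -> D%d, 1 -> D%d\n",
               i, dstates[i], dtrans[i][0], dtrans[i][1]);
    return 0;
}

Note that the empty subset plays the role of the dead state ϕ in the table above.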
SPECIFICATION OF TOKENS
Regular expressions are an important notation for specifying lexeme patterns. While they
cannot express all possible patterns, they are very effective in specifying those types of
patterns that we actually need for tokens.
Lex is a tool used to generate a lexical analyzer; the input notation for the Lex tool is referred to as the Lex language, and the tool itself is the Lex compiler. Behind the scenes, the Lex compiler transforms the input patterns into a transition diagram and generates code in a file called lex.yy.c; this is a C program which, when compiled by the C compiler, gives the object code of the lexical analyzer. Here we need to know how to write the Lex language. The structure of a Lex program is given below.
Structure of LEX Program : A Lex program has the following form:
Declarations
%%
Translation rules
%%
Auxiliary functions definitions
The declarations section includes declarations of variables, manifest constants (identifiers declared to stand for a constant, e.g., the name of a token), and regular definitions. C declarations appear between %{ . . . %}.
In the translation rules section, we place pattern-action pairs, where each pair has the form: Pattern { Action }.
The auxiliary function definitions section includes the definitions of functions used to install identifiers and numbers in the symbol table.
%{
/* definitions of manifest constants, e.g., IF, THEN, ELSE, ID */
%}
/* regular definitions */
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
%%
if       {return(IF);}
then     {return(THEN);}
else     {return(ELSE);}
%%
int installID() {/* function to install the lexeme, whose first character is pointed to by yytext, and whose length is yyleng, into the symbol table, and return a pointer thereto */}
int installNum() {/* similar to installID, but puts numerical constants into a separate table */}