
CS3501 COMPILER DESIGN

III YEAR / V SEMESTER – B.E – CSE

UNIT – I

INTRODUCTION TO COMPILERS & LEXICAL ANALYSIS

PREPARED BY

M.VIJI M.E AP/CSE

VERIFIED BY

HOD PRINCIPAL CEO/DIRECTOR

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


DR. NAVALAR NEDUNCHEZHIYAN COLLEGE OF ENGINEERING

ANNA UNIVERSITY CHENNAI

NON-AUTONOMOUS AFFILIATED INSTITUTIONS

BE COMPUTER SCIENCE AND ENGINEERING

CHOICE BASED CREDIT SYSTEM

REGULATION 2021
S.No.  Course Code  Course Title                     Category  Periods per Week (L T P)  Total Contact Periods  Credits

THEORY
1.     CS3591       Computer Networks                PCC       3 0 2                     5                      4
2.     CS3501       Compiler Design                  PCC       3 0 2                     5                      4
3.     CB3491       Cryptography and Cyber Security  PCC       3 0 0                     3                      3
4.     CS3551       Distributed Computing            PCC       3 0 0                     3                      3
5.     -            Professional Elective I          PEC       - - -                     -                      3
6.     -            Professional Elective II         PEC       - - -                     -                      3
7.     -            Mandatory Course-I               MC        3 0 0                     3                      0
                    TOTAL                                      - - -                     -                      20


SEMESTER V

SYLLABUS

CS3501 COMPILER DESIGN                               L T P C: 3 0 2 4


COURSE OBJECTIVES:

To learn the various phases of a compiler.

To learn the various parsing techniques.

To understand intermediate code generation and run-time environment.

To learn to implement the front-end of the compiler.

To learn to implement a code generator.

To learn to implement code optimization.

UNIT I INTRODUCTION TO COMPILERS & LEXICAL ANALYSIS 8

Introduction - Translators - Compilation and Interpretation - Language processors - The Phases of Compiler - Lexical Analysis - Role of Lexical Analyzer - Input Buffering - Specification of Tokens - Recognition of Tokens - Finite Automata - Regular Expressions to Automata (NFA, DFA) - Minimizing DFA - Language for Specifying Lexical Analyzers - Lex tool.

UNIT II SYNTAX ANALYSIS 11

Role of Parser - Grammars - Context-free grammars - Writing a grammar - Top Down Parsing - General Strategies - Recursive Descent Parser - Predictive Parser - LL(1) Parser - Shift Reduce Parser - LR Parser - LR(0) Item - Construction of SLR Parsing Table - Introduction to LALR Parser - Error Handling and Recovery in Syntax Analyzer - YACC tool - Design of a Syntax Analyzer for a Sample Language.

UNIT III SYNTAX DIRECTED TRANSLATION & INTERMEDIATE CODE


GENERATION 9

Syntax directed Definitions - Construction of Syntax Tree - Bottom-up Evaluation of S-Attribute Definitions - Design of predictive translator - Type Systems - Specification of a simple type Checker - Equivalence of Type Expressions - Type Conversions. Intermediate Languages: Syntax Tree, Three Address Code, Types and Declarations, Translation of Expressions, Type Checking, Back patching.

UNIT IV RUN-TIME ENVIRONMENT AND CODE GENERATION 9

Runtime Environments - source language issues - Storage organization - Storage Allocation Strategies: Static, Stack and Heap allocation - Parameter Passing - Symbol Tables - Dynamic Storage Allocation - Issues in the Design of a code generator - Basic Blocks and Flow graphs - Design of a simple Code Generator - Optimal Code Generation for Expressions - Dynamic Programming Code Generation.

UNIT V CODE OPTIMIZATION 8

Principal Sources of Optimization - Peep-hole optimization - DAG - Optimization of Basic Blocks - Global Data Flow Analysis - Efficient Data Flow Algorithm - Recent trends in Compiler Design.

COURSE OUTCOME

CO1: Understand the techniques used in different phases of a compiler.

CO2: Design a lexical analyser for a sample language and learn to use the LEX tool.

CO3: Apply different parsing algorithms to develop a parser and learn to use the YACC tool.

CO4: Understand semantic rules (SDT), intermediate code generation and run-time environment.

CO5: Implement code generation and apply code optimization techniques.

TEXT BOOK:

1. Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, “Compilers: Principles,
Techniques and Tools”, Second Edition, Pearson Education, 2009.

REFERENCES


1. Randy Allen, Ken Kennedy, "Optimizing Compilers for Modern Architectures: A Dependence-based Approach", Morgan Kaufmann Publishers, 2002.

2. Steven S. Muchnick, "Advanced Compiler Design and Implementation", Morgan Kaufmann Publishers - Elsevier Science, India, Indian Reprint 2003.

3. Keith D. Cooper and Linda Torczon, "Engineering a Compiler", Morgan Kaufmann Publishers, Elsevier Science, 2004.

4. V. Raghavan, "Principles of Compiler Design", Tata McGraw Hill Education Publishers, 2010.

5. Allen I. Holub, "Compiler Design in C", Prentice-Hall Software Series, 1993.


DR.NAVALAR NEDUNCHEZHIYAN COLLEGE OF ENGINEERING


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

LECTURE PLAN

Subject Code : CS3501

Subject Name : COMPILER DESIGN

Name of the faculty : Mrs.M.VIJI

Designation : ASSISTANT PROFESSOR

Course : BE-CSE

Academic Year : 2023-2024

Recommended Textbooks/Reference Books

S.No  Title of the Book                               Author                            Reference

1     Compilers: Principles, Techniques and Tools     Alfred V. Aho, Monica S. Lam,     T1
                                                      Ravi Sethi, Jeffrey D. Ullman

2     Optimizing Compilers for Modern Architectures:  Randy Allen, Ken Kennedy          R1
      A Dependence based Approach


S.No  No. of Hours  Topics                                    Periods Planned (L/T)  Teaching Aid  Remarks

UNIT I - INTRODUCTION TO COMPILERS AND LEXICAL ANALYSIS

1     1   Structure of a compiler            L   BB
2     1   Lexical Analysis                   L   BB
3     1   Role of Lexical Analyzer           L   BB
4     1   Input Buffering                    L   BB
5     1   Specification of Tokens            L   BB
6     1   Recognition of Tokens              L   BB
7     1   Regular Expressions to Automata    L   BB
8     1   Minimizing DFA                     L   BB
9     1   Lex Tool                           L   BB

SUBJECT INCHARGE HOD

S.No  No. of Hours  Topics                                    Periods Planned (L/T)  Teaching Aid  Remarks

UNIT II - SYNTAX ANALYSIS

10    1   Role of Parsers                                          L   BB
11    1   Grammars                                                 L   BB
12    1   Context-free Grammar                                     L   BB
13    1   Top Down Parsing                                         L   BB
14    1   General Strategies - Recursive Descent Parser -          L   BB
          Predictive Parser
15    1   LL(1) Parser - Shift Reduce Parser                       L   BB
16    1   LR Parser - LR(0) Item - Construction of SLR             L   BB
          Parsing Table
17    1   Canonical LR Parsing                                     L   BB
18    1   Introduction to LALR Parser                              L   BB
19    1   Error Handling and Recovery in Syntax Analyzer           L   BB
20    1   YACC                                                     L   BB
21    1   Design of a Syntax Analyzer for a Sample Language        L   BB

SUBJECT INCHARGE HOD

S.No  No. of Hours  Topics                                    Periods Planned (L/T)  Teaching Aid  Remarks

UNIT III - SYNTAX DIRECTED TRANSLATION AND INTERMEDIATE CODE GENERATION

22    1   Syntax Directed Definitions                              L   BB
23    1   Evaluation Orders for Syntax Directed Definitions        L   BB
24    1   Intermediate Languages, Three Address Code               L   BB
25    1   Syntax Tree                                              L   BB
26    1   Three Address Code                                       L   BB
27    1   Equivalence of Type Expressions - Type Conversions       L   BB
28    1   Type Checking                                            L   BB
29    1   Backpatching                                             L   BB

SUBJECT INCHARGE HOD

S.No  No. of Hours  Topics                                    Periods Planned (L/T)  Teaching Aid  Remarks

UNIT IV - RUN-TIME ENVIRONMENT AND CODE GENERATION

30    1   Storage Organization                       L   PPT
31    1   Stack Allocation Space                     L   BB
32    1   Access to Non-local Data on the Stack      L   BB
33    2   Heap Management                            L   PPT
34    2   Issues in Code Generation                  L   BB
35    1   Design of a simple Code Generator          L   BB

SUBJECT INCHARGE HOD

S.No  No. of Hours  Topics                                    Periods Planned (L/T)  Teaching Aid  Remarks

UNIT V - CODE OPTIMIZATION

36    2   Principal Sources of Optimization    L   BB
37    1   Peephole Optimization                L   PPT
38    2   DAG                                  L   BB
39    1   Optimization of Basic Blocks         L   BB
40    1   Global Data Flow Analysis            L   BB
41    1   Efficient Data Flow Algorithm        L   BB

SUBJECT INCHARGE HOD


UNIT I

Introduction - Translators - Compilation and Interpretation - Language processors - The Phases of Compiler - Lexical Analysis - Role of Lexical Analyzer - Input Buffering - Specification of Tokens - Recognition of Tokens - Finite Automata - Regular Expressions to Automata (NFA, DFA) - Minimizing DFA - Language for Specifying Lexical Analyzers - Lex tool.

IMPORTANT QUESTIONS

PART – A

1. Explain the role of lexical analyzer. APRIL/MAY 2024

2. Define lex. APRIL/MAY 2024

3. What is the difference between Complier and interpreter? APRIL/MAY 2024

4. Define interaction between lexical analyzer and parser. APRIL/MAY 2024

5. Define compiler. NOV/DEC 2023

6. What is finite automata? NOV/DEC 2023

7. Give the significance of symbol table. Draw a sample table. NOV/DEC 2023

8. Compare and contrast compiler and interpreter. NOV/DEC 2023

9. Programmer A has written a program which needs to be modified frequently. Which of the two languages, Visual Basic or C++, should he use for his programming? Justify in two sentences. APRIL/MAY 2022

10. List out the phases included in the analysis phase of the compiler. APRIL/MAY 2022

11. Write a regular expression to represent all possible numbers. NOV/DEC22

12. How and why is input buffering used? NOV/DEC 22

13. What are the cousins of compiler? APRIL/MAY 2004, APRIL/MAY 2005


14. What are the main two parts of compilation? What are they performing? MAY
16,17,18,DCE18

15. State some compiler construction tools? APRIL/MAY 2008

16. What is a Symbol table? DEC16

17. What is the need for separating the analysis phase into lexical analysis and
parsing? (Or) What are the issues of lexical analyzer? MAY13,14,

18. What is a lexeme? Define a regular set. NOV/DEC 2006

19. What is a sentinel? What is its usage? APRIL/MAY 2004

20. What are the Error-recovery actions in a lexical analyzer? MAY12,18

21. What is recognizer? Dec11,17

22. What is the role of lexical analyzer? Dec11, 17

23. Write a regular expression for identifier and number. DEC12,MAY17,19

24. Define tokens, patterns, and lexemes. MAY13,19, DEC16

25. Apply rules used to define a regular expression. Give example. MAY18, DEC18

26. Construct regular expressions for the language L = {w ∈ {a,b}* | w ends in abb} DEC18

PART – B

1. Divide the following C++ program:

float limitedSquare(x) { float x;
/* returns x-squared, but never more than 100 */
return (x <= -10.0 || x >= 10.0) ? 100 : x*x;
}

into appropriate lexemes. Which lexemes should get associated lexical values? What should those values be? APRIL/MAY 2024

2. Explain the role of lexical analyzer. APRIL/MAY 2024


27. Explain the structure of a compiler. Illustrate the output of each phase of
compilation. APRIL/MAY 2024

28. Construct minimized DFA for the regular expression (a|b)*abb APRIL/MAY 2024

29. What are the phases of compiler? Explain each phase in detail. NOV/DEC 2023

30. Briefly discuss about role of lexical analyzer. NOV/DEC 2023

31. Every statement of software written in any programming language is translated to a machine-understandable language before execution. Elaborate the translation process. Explain the process using the statement:

if(a==10) { print("welcome"); } else { print("exit"); }

(Assume the statement is written in the Python language.) NOV/DEC 2023, APRIL/MAY 2022

3. Elaborate the different phases of a compiler with a neat sketch. Show the output of each phase of a compiler when the following statement is parsed:

SI = (p*n*r)/100

where n should be an integer, and p and r should be floating-point numbers. APRIL/MAY 2022

4. List out the functions of a lexical analyzer. State the reasons for the separation of the analysis of a program into lexical, syntax and semantic analysis. (APRIL/MAY 2023)

5. Discuss the phases of a compiler, indicating the input and output of each phase, by translating the statement "amount = principle + rate * 360". (APRIL/MAY 2022, NOV/DEC 2022, NOV/DEC 2020)

6. Discuss input buffering technique in detail. (DEC 10,11,13)



7. Write a LEX program to recognize valid operators of a C program. MAY 18

8. Explain compiler construction tools.

9. Draw a transition diagram for relational operators MAY18,19

10. Draw a transition diagram for identifiers and keywords (NOV/ DEC 2020)

11. Explain about specification of tokens (NOV /DEC 22)

12. Explain recognition of tokens.

13. Explain converting a regular expression to a DFA.

11. Prove that the following two regular expressions are equivalent by showing that the minimum-state DFAs are the same. i) (a|b)* (APRIL/MAY 2022)

12. Construct a DFA without constructing an NFA for the following regular expression, and find the minimized DFA: r = (a|b)*abb#


PART – A

1. What is a Compiler?

A Compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language). As an important part of this translation process, the compiler reports to its user the presence of errors in the source program.

2. State some software tools that manipulate source program?

 Structure editors

 Pretty printers

 Static checkers

 Interpreters.

3. What are the cousins of compiler? April/May 2004, April/May 2005

The following are the cousins of the compiler:

 Preprocessors

 Assemblers

 Loaders

 Link editors.

4. What are the main two parts of compilation? What are they performing?

MAY 16,17,18,DCE18
The two main parts are


 Analysis part breaks up the source program into constituent pieces and creates
an intermediate representation of the source program.

 Synthesis part constructs the desired target program from the intermediate representation

5. State some compiler construction tools? April/May 2008

 Parser generators

 Scanner generators

 Syntax-directed translation engines

 Automatic code generator

 Data flow engines.

6. What is a Loader? What does the loading process do?

A Loader is a program that performs the two functions

 Loading

 Link editing

The process of loading consists of taking relocatable machine code, altering the
relocatable address and placing the altered instructions and data in memory at the proper
locations.

7. What does Link Editing do?

Link editing: This allows us to make a single program from several files of relocatable
machine code. These files may have been the result of several compilations, and one or more
may be library files of routines provided by the system and available to any program that needs
them.

8. What is a preprocessor?

A preprocessor is one, which produces input to compilers. A source program may be


divided into modules stored in separate files. The task of collecting the source program is
sometimes entrusted to a distinct program called a preprocessor.


The preprocessor may also expand macros into source language statements.

Skeletal source program → Preprocessor → Source program

9. State some functions of Preprocessors

 Macro processing

 File inclusion

 Rational Preprocessors

 Language extensions

10. What is a Symbol table? DEC16

A Symbol table is a data structure containing a record for each identifier, with fields for the
attributes of the identifier. The data structure allows us to find the record for each identifier
quickly and to store or retrieve data from that record quickly.

11. State the general phases of a compiler

 Lexical analysis

 Syntax analysis

 Semantic analysis

 Intermediate code generation

 Code optimization

 Code generation

12. What is an assembler?

Assembler is a program which converts an assembly language program into machine code.

13. What is the need for separating the analysis phase into lexical analysis and parsing?
(Or) What are the issues of lexical analyzer? MAY13,14,

Simpler design is perhaps the most important consideration. The separation of lexical
analysis from syntax analysis often allows us to simplify one or the other of these phases.

Compiler efficiency is improved.

Compiler portability is enhanced.

14. What is Lexical Analysis?

The first phase of compiler is Lexical Analysis. This is also known as linear analysis in
which the stream of characters making up the source program is read from left-to-right and
grouped into tokens that are sequences of characters having a collective meaning.

15. What is a lexeme? Define a regular set. Nov/Dec 2006

A Lexeme is a sequence of characters in the source program that is matched by the


pattern for a token.

A language denoted by a regular expression is said to be a regular set

16. What is a sentinel? What is its usage? April/May 2004

A Sentinel is a special character that cannot be part of the source program. Normally we use 'eof' as the sentinel. It is used to speed up the lexical analyzer.

17. What is a regular expression? State the rules, which define regular expression?

Regular expression is a method to describe a regular language.

Rules:

ε is a regular expression that denotes {ε}, the set containing the empty string.

If a is a symbol in ∑, then a is a regular expression that denotes {a}.

Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then:

(r)|(s) is a regular expression denoting L(r) U L(s).

(r)(s) is a regular expression denoting L(r)L(s).

(r)* is a regular expression denoting (L(r))*.

(r) is a regular expression denoting L(r).

18. What are the Error-recovery actions in a lexical analyzer? May12,18

Deleting an extraneous character



Inserting a missing character

Replacing an incorrect character by a correct character

Transposing two adjacent characters

19. Construct a regular expression for the language L = {w ∈ {a,b}* | w ends in abb}.

Ans: (a|b)*abb.

20. What is recognizer? Dec11,17

Recognizers are machines which accept the strings belonging to a certain language. If the valid strings of a language are accepted by the machine, then that language is said to be accepted by the machine; otherwise it is rejected.

21. What is the role of lexical analyzer? Dec11, 17

The lexical analyzer scans the source program and separates out the tokens from it.

22. Write a regular expression for identifier and number. DEC12,MAY17,19

Regular expression for identifier is

R.E=letter(letter+digit)*

Regular expression for integer number is

R.E=digit.digit*

23. define tokens, patterns, lexeme. MAY13,19, DEC16

TOKEN: It describes the class or category of the input string. E.g. identifiers, keywords and constants are called tokens.

PATTERN: The set of rules that describe the token.

LEXEME: It represents the sequence of characters in the source program that are matched with the pattern of the token.

24. Apply rules used to define a regular expression. Give example. MAY18, DEC18

ɛ is a regular expression that denotes the set containing the empty string.

If R1 and R2 are regular expressions then R=R1+R2 (same as R=R1|R2) is also a regular
expression which represents union expression.

If R1 and R2 are regular expressions then R=R1.R2 is also a regular expression which represents
concatenation operation.

If R1 is a regular expression then R=R1* is also a regular expression which represents Kleene closure.

25. Construct regular expressions for the language L={WЄ{a,b}| w ends in abb} DEC18

Regular expression =(a+b)*abb

32. Write a regular expression to represent all possible numbers. DEC22

digit  -> [0-9]
digits -> digit digit*
number -> digits(.digits)?(E[+-]?digits)?

33. How and why is input buffering used? DEC 22

In the lexical analysis phase, during the token recognition process, one or more characters beyond the lexeme must be examined to correctly identify the lexeme. Hence there is a need to manage some lookahead. For this purpose input buffering is used.

A two-buffer scheme is used to manage the lookahead. Using sentinels, the buffer ends are marked, and between the two pointers the correct lexeme is identified.

PART – B

Important questions
1. List out the functions of a lexical analyzer. State the reasons for the separation of the analysis of a program into lexical, syntax and semantic analysis. (APRIL/MAY 2023)

 It is the first phase of the compiler. It gets input from the source program and produces tokens as output. It reads
the characters one by one, starting from left to right and forms the tokens.


 Token: It represents a logically cohesive sequence of characters such as keywords, operators, identifiers, special symbols etc.

 Example: a + b = 20

 Here, a,b,+,=,20 are all separate tokens.

 Group of characters forming a token is called the Lexeme.

 The lexical analyser not only generates a token but also enters the lexeme into the symbol table if it is not
already there.

 Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for
syntax analysis.

 Upon receiving a “get next token” command from the parser, the lexical analyzer reads input characters until
it can identify the next token.

Reasons for separation of program analysis:

Modularity - each analysis phase has distinct responsibilities and can be implemented independently, making the compiler design modular and easier to understand.

Error isolation - separating the analyses helps localize errors to specific phases, making issues easier to fix.

Efficiency - each phase can focus on its specific task, optimizing the overall compilation.

Language specification - the lexical, syntactic and semantic aspects of a language are distinct and can be defined separately.

Reusability - an individual analysis phase can be reused for different programming languages, reducing redundancy and improving development efficiency.

2. Discuss the phases of a compiler, indicating the input and output of each phase, by translating the statement "amount = principle + rate * 360". (APRIL/MAY 2023)

PHASES OF COMPILER

A Compiler operates in phases, each of which transforms the source program from one representation into another.
The following are the phases of the compiler:


Main phases:

Lexical analysis

Syntax analysis

Semantic analysis

Intermediate code generation

Code optimization

Code generation

Sub-Phases:

Symbol table management

Error handling

LEXICAL ANALYSIS:

It is the first phase of the compiler. It gets input from the source program and produces tokens as output. It reads the
characters one by one, starting from left to right and forms the tokens.

Token: It represents a logically cohesive sequence of characters such as keywords, operators, identifiers, special symbols etc.

Example: a + b = 20

Here, a,b,+,=,20 are all separate tokens.

Group of characters forming a token is called the Lexeme.

The lexical analyser not only generates a token but also enters the lexeme into the symbol table if it is not
already there.

SYNTAX ANALYSIS:

It is the second phase of the compiler. It is also known as parser.

It gets the token stream as input from the lexical analyser of the compiler and generates

syntax tree as the output.

Syntax tree:

It is a tree in which interior nodes are operators and exterior nodes are operands.

Example: For a = b + c * 2, the syntax tree is:

        =
       / \
      a   +
         / \
        b   *
           / \
          c   2

SEMANTIC ANALYSIS:

It is the third phase of the compiler.

It gets input from the syntax analysis as parse tree and checks whether the given syntax is correct or not. It performs type
conversion of all the data types into real data types.

INTERMEDIATE CODE GENERATION:

It is the fourth phase of the compiler.



It gets input from the semantic analysis and converts the input into output as intermediate code such as three
address code.

The three-address code consists of a sequence of instructions, each of which has at most three operands.

Example: t1=t2+t3

CODE OPTIMIZATION:

It is the fifth phase of the compiler.

It gets the intermediate code as input and produces optimized intermediate code as output.

This phase reduces the redundant code and attempts to improve the intermediate code so that faster-running machine
code will result.

During the code optimization, the result of the program is not affected. To improve the code generation, the optimization
involves

deduction and removal of dead code (unreachable code).

calculation of constants in expressions and terms.

collapsing of repeated expression into temporary string.

loop unrolling.

moving code outside the loop.

removal of unwanted temporary variables.

CODE GENERATION:

It is the final phase of the compiler.

It gets input from code optimization phase and produces the target code or object code as result. Intermediate instructions
are translated into a sequence of machine instructions that perform the same task. The code generation involves

allocation of register and memory

generation of correct references

generation of correct data types


generation of missing code

SYMBOL TABLE MANAGEMENT:

Symbol table is used to store all the information about identifiers used in the program.

It is a data structure containing a record for each identifier, with fields for the attributes of the identifier.

It allows to find the record for each identifier quickly and to store or retrieve data from that record.

Whenever an identifier is detected in any of the phases, it is stored in the symbol table.
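A symbol-table record and lookup can be sketched in C as below; the field names and the linear-scan lookup are illustrative choices, not the only possible design.

#include <string.h>

#define MAXSYMS 256

/* One record per identifier, with fields for its attributes. */
struct symrec {
    char name[32];   /* the lexeme itself */
    char type[16];   /* e.g. "int", "float" */
    int  scope;      /* nesting depth where declared */
};

static struct symrec table[MAXSYMS];
static int nsyms = 0;

/* Return the record index for a name, inserting a new record if absent. */
int lookup_or_insert(const char *name) {
    for (int i = 0; i < nsyms; i++)
        if (strcmp(table[i].name, name) == 0)
            return i;                          /* found: reuse the record */
    strncpy(table[nsyms].name, name, sizeof table[nsyms].name - 1);
    return nsyms++;                            /* inserted: new record */
}

A production compiler would replace the linear scan with a hash table so that "find the record quickly" actually holds.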

position := initial + rate * 360


temp1 := inttoreal(360)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3

Code Optimization


temp1 := id3 * 360.0
id1 := id2 + temp1

Code Generator

MOVF id3, R2
MULF #360.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

ERROR HANDLING:
Each phase can encounter errors. After detecting an error, a phase must handle the error so that compilation can
proceed.

In lexical analysis, errors occur in separation of tokens.

In syntax analysis, errors occur during construction of syntax tree.

In semantic analysis, errors occur when the compiler detects constructs with right syntactic structure but no meaning
and during type conversion.

In code optimization, errors occur when the result is affected by the optimization. In code generation, it shows error
when code is missing etc.

3. Discuss input buffering technique in detail. (DEC 10,11,13)

INPUT BUFFERING

We often have to look one or more characters beyond the next lexeme before we can be sure we have the right lexeme.
As characters are read from left to right, each character is stored in the buffer to form a meaningful token as shown below:

              forward (lookahead) pointer
                    ↓
A   =   B   +   C
↑
beginning of the token


We introduce a two-buffer scheme that handles large look aheads safely. We then consider an improvement involving
"sentinels" that saves time checking for the ends of buffers.

BUFFER PAIRS

A buffer is divided into two N-character halves, as shown below

: E : = : M : * : C : * : * : 2 : eof :
  ↑                   ↑
  lexeme_beginning    forward

Each buffer is of the same size N, and N is usually the number of characters on one disk block. E.g., 1024 or 4096
bytes.

Using one system read command we can read N characters into a buffer.

If fewer than N characters remain in the input file, then a special character, represented by eof, marks the end of the
source file.

Two pointers to the input are maintained:

Pointer lexeme_beginning, marks the beginning of the current lexeme,

whose extent we are attempting to determine.

Pointer forward scans ahead until a pattern match is found.

Once the next lexeme is determined, forward is set to the character at its right end.

The string of characters between the two pointers is the current lexeme.

After the lexeme is recorded as an attribute value of a token returned to the parser,

lexeme_beginning is set to the character immediately after the lexeme just found.

Advancing forward pointer:

Advancing forward pointer requires that we first test whether we have reached the end of one of the buffers, and if so, we
must reload the other buffer from the input, and move forward to the beginning of the newly loaded buffer. If the end of

second buffer is reached, we must again reload the first buffer with input and the pointer wraps to the beginning of the
buffer.

Code to advance forward pointer:

if forward at end of first half then begin
    reload second half;
    forward := forward + 1
end
else if forward at end of second half then begin
    reload first half;
    move forward to beginning of first half
end
else forward := forward + 1;

SENTINELS

For each character read, we make two tests: one for the end of the buffer, and one to determine what character is
read. We can combine the buffer-end test with the test for the current character if we extend each buffer to hold a
sentinel character at the end.

The sentinel is a special character that cannot be part of the source program, and a natural choice is the character eof.

: E : = : M : * : eof : C : * : * : 2 : eof : : eof :
  ↑                       ↑
  lexeme_beginning        forward

Note that eof retains its use as a marker for the end of the entire input. Any eof that appears other than at the end of a
buffer means that the input is at an end.
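The sentinel logic can be sketched in C as below. This is a minimal illustration, assuming two N-byte halves laid out back to back with one sentinel slot after each, and using '\0' as the sentinel (so it assumes text input without NUL bytes); the reload helper is a stand-in for the real input routine.

#include <stdio.h>

#define N 4096          /* size of each buffer half */
#define SENTINEL '\0'   /* stands in for eof */

/* Halves buf[0..N-1] and buf[N+1..2N], sentinel slots at buf[N] and
   buf[2N+1]. Static storage zero-initializes both sentinel slots. */
static char buf[2 * N + 2];
static char *forward = buf + 2 * N + 1;   /* start at a sentinel: forces the first load */

/* Read up to N bytes into a half, then drop a sentinel after the last byte. */
static size_t reload(char *half) {
    size_t n = fread(half, 1, N, stdin);
    half[n] = SENTINEL;
    return n;
}

/* Advance one character; the common case costs a single test. */
static int next_char(void) {
    for (;;) {
        char c = *forward;
        if (c != SENTINEL) { forward++; return (unsigned char)c; }
        if (forward == buf + N) {                /* sentinel ending the first half */
            if (reload(buf + N + 1) == 0) return EOF;
            forward = buf + N + 1;
        } else if (forward == buf + 2 * N + 1) { /* sentinel ending the second half */
            if (reload(buf) == 0) return EOF;    /* wrap to the first half */
            forward = buf;
        } else {
            return EOF;                          /* sentinel inside a half: true end of input */
        }
    }
}

int main(void) {
    int c, count = 0;
    while ((c = next_char()) != EOF) count++;
    printf("%d characters read\n", count);
    return 0;
}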

4. Write a lex program to recognize valid operators of c program. May 18

LEX


Lex is a computer program that generates lexical analyzers. Lex is commonly used with

the yacc parser generator.

Creating a lexical analyzer

First, a specification of a lexical analyzer is prepared by creating a program lex.l in the Lex language.

Then, lex.l is run through the Lex compiler to produce a C program lex.yy.c.

Finally, lex.yy.c is run through the C compiler to produce an object program a.out, which is the lexical analyzer that
transforms an input stream into a sequence of tokens.

lex.l → Lex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
input stream → a.out → sequence of tokens

Lex Specification

A Lex program consists of three parts:

{ definitions }

%%

{ rules }

%%

{ user subroutines }

Definitions include declarations of variables, constants, and regular definitions

Rules are statements of the form:

p1 {action1}
p2 {action2}
...
pn {actionn}

where pi is a regular expression and actioni describes what action the lexical analyzer should take when pattern pi
matches a lexeme. Actions are written in C code.

User subroutines are auxiliary procedures needed by the actions. These can be compiled separately and loaded
with the lexical analyzer.
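Since question 4 asks for the program itself, a minimal Lex specification for recognizing C operators might look like the sketch below; the grouping of operators and the printed messages are illustrative, and the pattern lists are not the exhaustive C operator set.

%{
/* Sketch: classify common C operators read from stdin. */
#include <stdio.h>
%}

%%
"+"|"-"|"*"|"/"|"%"              { printf("arithmetic operator: %s\n", yytext); }
"=="|"!="|"<="|">="|"<"|">"      { printf("relational operator: %s\n", yytext); }
"&&"|"||"|"!"                    { printf("logical operator: %s\n", yytext); }
"="|"+="|"-="|"*="|"/="          { printf("assignment operator: %s\n", yytext); }
"&"|"|"|"^"|"~"|"<<"|">>"        { printf("bitwise operator: %s\n", yytext); }
.|\n                             { /* skip everything else */ }
%%

int yywrap(void) { return 1; }

int main(void) {
    yylex();   /* scan stdin and report each operator found */
    return 0;
}

Because Lex always prefers the longest match, "<=" is reported as one relational operator rather than "<" followed by "=". Compilation follows the pipeline above: lex ops.l produces lex.yy.c, and cc lex.yy.c -o a.out produces the scanner (the file name ops.l is illustrative).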

5. Explain compiler construction tool

These are specialized tools that have been developed for helping implement various phases of a compiler.

The following are the compiler construction tools:

Parser Generators:

-These produce syntax analyzers, normally from input that is based on a context-free grammar.

-It consumes a large fraction of the running time of a compiler.

-Example-YACC (Yet another Compiler-Compiler).

Scanner Generator:

-These generate lexical analyzers, normally from a specification based on regular expressions.

-The basic organization of lexical analyzers is based on finite automata.

Syntax-Directed Translation:

-These produce routines that walk the parse tree and as a result generate intermediate code.

-Each translation is defined in terms of translations at its neighbor nodes in the tree.

Automatic Code Generators:

-It takes a collection of rules to translate intermediate language into machine language. The rules must
include sufficient details to handle different possible access methods for data.

Data-Flow Engines:


-It does code optimization using data-flow analysis, that is, the gathering of information about how values
are transmitted from one part of a program to each other part.

6. Draw a transition diagram for relational operators MAY18,19

7. Draw a transition diagram for identifiers and keywords

8. Explain about specification of tokens

SPECIFICATION OF TOKENS

There are 3 specifications of tokens:

Strings

Language

Regular expression

Strings and Languages

An alphabet or character class is a finite set of symbols.

A string over an alphabet is a finite sequence of symbols drawn from that alphabet. A language is any countable set
of strings over some fixed alphabet.

In language theory, the terms "sentence" and "word" are often used as synonyms for "string." The length of a string s,
usually written |s|, is the number of occurrences of symbols in s. For example, banana is a string of length six. The
empty string, denoted ε, is the string of length zero.

Operations on strings

The following string-related terms are commonly used:

A prefix of string s is any string obtained by removing zero or more symbols from the end of string s.

A suffix of string s is any string obtained by removing zero or more symbols from the beginning of s. For
example, nana is a suffix of banana.

A substring of s is obtained by deleting any prefix and any suffix from s. For example, nan is a substring of
banana.

The proper prefixes, suffixes, and substrings of a string s are those prefixes, suffixes, and substrings,
respectively of s that are not ε or not equal to s itself.

A subsequence of s is any string formed by deleting zero or more not necessarily consecutive positions of s. For
example, baan is a subsequence of banana.

Operations on languages:

The following are the operations that can be applied to languages:

1. Union  2. Concatenation  3. Kleene closure  4. Positive closure

The following example shows the operations on languages. Let L = {0,1} and S = {a,b,c}.

Union : L U S={0,1,a,b,c}

Concatenation : L.S={0a,1a,0b,1b,0c,1c}

Kleene closure : L*={ ε,0,1,00….}

Positive closure : L+={0,1,00….}


9. Explain recognition of tokens

Consider the following grammar fragment:

stmt → if expr then stmt

| if expr then stmt else stmt

expr → term relop term

| term

term → id

| num

where the terminals if , then, else, relop, id and num generate sets of strings given by the following regular definitions:

if → if

then → then

else → else

relop → <|<=|=|<>|>|>=

id → letter(letter|digit)*

num → digit+ (.digit+)?(E(+|-)?digit+)?

For this language fragment the lexical analyzer will recognize the keywords if, then, else, as well as the lexemes
denoted by relop, id, and num. To simplify matters, we assume keywords are reserved; that is, they cannot be used as
identifiers.
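The relop definition above is normally implemented as a transition diagram. A minimal C sketch of that diagram follows; the token names and the consumed-count convention (standing in for the usual retract operation on the forward pointer) are illustrative.

#include <stdio.h>

/* Hypothetical token codes for the relational operators. */
enum relop { LT, LE, NE, GT, GE, EQ, NOT_RELOP };

/* Walk the relop transition diagram at the start of s. On success,
   *consumed tells how many characters belong to the lexeme; reading
   one character too many is undone by simply not counting it (retract). */
enum relop relop_token(const char *s, int *consumed) {
    if (s[0] == '<') {
        if (s[1] == '=') { *consumed = 2; return LE; }
        if (s[1] == '>') { *consumed = 2; return NE; }
        *consumed = 1; return LT;            /* retract the lookahead character */
    }
    if (s[0] == '=') { *consumed = 1; return EQ; }
    if (s[0] == '>') {
        if (s[1] == '=') { *consumed = 2; return GE; }
        *consumed = 1; return GT;            /* retract */
    }
    *consumed = 0; return NOT_RELOP;
}

int main(void) {
    static const char *names[] = { "LT", "LE", "NE", "GT", "GE", "EQ", "NOT_RELOP" };
    const char *tests[] = { "<=", "<>", "<x", ">=", ">", "=" };
    for (int i = 0; i < 6; i++) {
        int used;
        enum relop t = relop_token(tests[i], &used);
        printf("%-2s -> %s (consumed %d)\n", tests[i], names[t], used);
    }
    return 0;
}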

10. Explain converting an regular expression to DFA.

The task of a scanner generator, such as flex, is to generate the transition tables or to
synthesize the scanner program given a scanner specification (in the form of a set of REs). So it
needs to convert a RE into a DFA. This is accomplished in two steps: first it convert a RE to
NFA then convert the NFA to DFA

DFA.


A NFA is similar to a DFA but it also permits multiple transitions over the same character and transitions over ε. The first type indicates that, when reading the common character associated with these transitions, we have more than one choice; the NFA succeeds if at least one of these choices succeeds. The ε-transition doesn't consume any input characters, so you may jump to another state for free.

Clearly DFAs are a subset of NFAs. But it turns out that DFAs and NFAs have the same expressive power. The problem is that when converting a NFA to a DFA we may get an exponential blowup in the number of states.

We will first learn how to convert a RE into a NFA. This is the easy part. There are only
5 rules, one for each type of RE:

The algorithm constructs NFAs with only one final state. For example, the third rule
indicates that, to construct the NFA for the RE AB, we construct the NFAs for A and B which are
represented as two boxes with one start and one final state for each box. Then the NFA for AB
is constructed by connecting the final state of A to the start state of B using an empty transition.

For example, the RE (a|b)c is mapped to the following NFA:


The next step is to convert a NFA to a DFA (called subset construction). Suppose that you assign
a number to each NFA state. The DFA states generated by subset construction have sets of
numbers, instead of just one number.

First we need to handle transitions that lead to other states for free (without consuming any input). These are the ε-transitions. We define the ε-closure of an NFA node as the set of all the nodes reachable from this node using zero, one, or more ε-transitions. For example, the ε-closure of node 1 in the left figure below is the set {1,2}.

The start state of the constructed DFA is labeled by the ε-closure of the NFA start state. For every DFA state labeled by some set {s1, ..., sn} and for every character c in the language alphabet, you find all the states reachable from s1, s2, ..., or sn using c-arrows and you union together the ε-closures of these nodes. If this set is not the label of any other node in the DFA constructed so far, you create a new DFA node with this label. For example, node {1,2} in the DFA above has an arrow to {3,4,5} for the character a, since NFA node 3 can be reached from 1 on a and nodes 4 and 5 can be reached from 2. The b-arrow for node {1,2} goes to the error node, which is associated with an empty set of NFA nodes.

The following NFA, even though it wasn't constructed with the 5 RE-to-NFA rules, has the following DFA:
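A compact sketch of this subset construction in C is given below, assuming a hypothetical hand-coded NFA with at most 32 states so each DFA state is a bitmask of NFA states. For brevity the example NFA (strings over {a,b} ending in abb) has no ε-moves, so the ε-closure step collapses to the identity; with ε-moves each computed set would additionally be wrapped in its ε-closure.

#include <stdio.h>

#define NSTATES 4   /* NFA states 0..3; state 3 is accepting */
#define NSYMS   2   /* input symbols: index 0 = 'a', index 1 = 'b' */

/* trans[s][c] = bitmask of NFA states reachable from s on symbol c.
   0 -a-> {0,1}, 0 -b-> {0}, 1 -b-> {2}, 2 -b-> {3}. */
static const unsigned trans[NSTATES][NSYMS] = {
    { (1u << 0) | (1u << 1), 1u << 0 },
    { 0,                     1u << 2 },
    { 0,                     1u << 3 },
    { 0,                     0       },
};

/* move(T, c): union of transitions on c from every NFA state in set T. */
static unsigned move(unsigned T, int c) {
    unsigned out = 0;
    for (int s = 0; s < NSTATES; s++)
        if (T & (1u << s)) out |= trans[s][c];
    return out;
}

int main(void) {
    unsigned dstates[1 << NSTATES];    /* worklist of DFA states (as NFA-state sets) */
    int ndfa = 0;
    dstates[ndfa++] = 1u << 0;         /* start set = {0} */

    /* Process each discovered DFA state, discovering its successors. */
    for (int i = 0; i < ndfa; i++)
        for (int c = 0; c < NSYMS; c++) {
            unsigned U = move(dstates[i], c);
            int j;
            for (j = 0; j < ndfa; j++)
                if (dstates[j] == U) break;     /* already labeled? */
            if (j == ndfa) dstates[ndfa++] = U; /* no: new DFA state */
            printf("Dtran[%d, %c] = %d\n", i, "ab"[c], j);
        }
    printf("%d DFA states built\n", ndfa);
    return 0;
}

Running it builds the familiar four-state DFA for strings ending in abb.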

11. Prove that the following two regular expressions are equivalent by showing that the minimum-state DFAs are the same. i) (a|b)*

Transition table

NFA State        DFA State   a   b
{0,1,2,3,7}      A           B   C
{1,2,3,4,6,7}    B           B   C
{1,2,3,5,6,7}    C           B   C

ε-CLOSURE{q0} = {q0,q1,q2,q4,q7}
ε-CLOSURE{q1} = {q1,q2,q4}
ε-CLOSURE{q2} = {q2}
ε-CLOSURE{q3} = {q3,q6,q1,q2,q4,q7}
ε-CLOSURE{q4} = {q4}
ε-CLOSURE{q5} = {q1,q2,q4,q5,q6,q7}
ε-CLOSURE{q6} = {q1,q2,q4,q6,q7}
ε-CLOSURE{q7} = {q7}

Consider ε-CLOSURE{q0} = {q0,q1,q2,q4,q7} and call it state A.

δ'(A,a) = ε-CLOSURE{δ({q0,q1,q2,q4,q7}, a)} = {q1,q2,q3,q4,q6,q7}, call it state B
δ'(A,b) = ε-CLOSURE{δ({q0,q1,q2,q4,q7}, b)} = {q1,q2,q4,q5,q6,q7}, call it state C

δ'(B,a) = ε-CLOSURE{δ({q1,q2,q3,q4,q6,q7}, a)} = {q1,q2,q3,q4,q6,q7} = B
δ'(B,b) = ε-CLOSURE{δ({q1,q2,q3,q4,q6,q7}, b)} = {q1,q2,q4,q5,q6,q7} = C

δ'(C,a) = ε-CLOSURE{δ({q1,q2,q4,q5,q6,q7}, a)} = {q1,q2,q3,q4,q6,q7} = B
δ'(C,b) = ε-CLOSURE{δ({q1,q2,q4,q5,q6,q7}, b)} = {q1,q2,q4,q5,q6,q7} = C

DFA

13. Construct a DFA without constructing NFA for following regular expression. Find
minimized DFA r = (a|b)*abb#

1. Firstly, we construct the augmented regular expression for the given expression. By
concatenating a unique right-end marker ‘#’ to a regular expression r, we give the
accepting state for r a transition on ‘#’ making it an important state of the NFA for r#.
So, r' = (a|b)*abb#
2. Then we construct the syntax tree for r#.


Next we need to evaluate four functions nullable, firstpos, lastpos, and followpos.

1. nullable(n) is true for a syntax-tree node n if and only if the regular expression represented by n has ε in its language.

2. firstpos(n) gives the set of positions that can match the first symbol of a string generated by the subexpression rooted at n.

3. lastpos(n) gives the set of positions that can match the last symbol of a string generated by the subexpression rooted at n.

4. followpos(i) gives the set of positions that can follow position i in a string generated by the regular expression.

We refer to an interior node as a cat-node, or-node, or star-node if it is labeled by a concatenation, |, or * operator, respectively.

Rules for computing nullable, firstpos, and lastpos:

Node n                    nullable(n)         firstpos(n)                  lastpos(n)

Leaf labeled ε            true                ∅                            ∅

Leaf labeled with         false               {i}                          {i}
position i

Or-node with left         nullable(c1) or     firstpos(c1) ∪ firstpos(c2)  lastpos(c1) ∪ lastpos(c2)
child c1, right           nullable(c2)
child c2

Cat-node with left        nullable(c1) and    if nullable(c1) then         if nullable(c2) then
child c1, right           nullable(c2)        firstpos(c1) ∪ firstpos(c2)  lastpos(c1) ∪ lastpos(c2)
child c2                                      else firstpos(c1)            else lastpos(c2)

Star-node with            true                firstpos(c1)                 lastpos(c1)
child c1

Rules for computing followpos:

1. If n is a cat-node with left child c1 and right child c2 and i is a position in lastpos(c1), then all positions in firstpos(c2) are in followpos(i).

2. If n is a star-node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).

Now that we have seen the rules for computing firstpos and lastpos, we proceed to calculate their values for the syntax tree of the given regular expression (a|b)*abb#.


Let us now compute followpos bottom-up for each node in the syntax tree.

Position   followpos
1          {1, 2, 3}
2          {1, 2, 3}
3          {4}
4          {5}
5          {6}
6          ∅

Now we construct Dstates, the set of states of DFA D, and Dtran, the transition table for D. The start state of DFA D is firstpos(root) and the accepting states are all those containing the position associated with the endmarker symbol #.

According to our example, the firstpos of the root is {1, 2, 3}. Let this state be A and consider the input symbol a. Positions 1 and 3 are for a, so let B = followpos(1) ∪ followpos(3) = {1, 2, 3, 4}. Since this set has not yet been seen, we set Dtran[A, a] := B.


When we consider input b, we find that out of the positions in A, only 2 is associated
with b, thus we consider the set followpos(2) = {1, 2, 3}. Since this set has already been seen
before, we do not add it to Dstates but we add the transition Dtran[A, b]:= A.

Continuing like this with the rest of the states, we arrive at the below transition table.

         Input
State    a    b
→A       B    A
B        B    C
C        B    D
D        B    A

Here, A is the start state and D is the accepting state.

5. Finally we draw the DFA for the above transition table. The final DFA will be:
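The Dstates/Dtran construction just traced can also be played out in a short C sketch that starts directly from the followpos table above for (a|b)*abb# (positions 1 to 6, position 6 being the endmarker #). The bitmask encoding is an implementation convenience, not part of the algorithm.

#include <stdio.h>

#define NPOS 6

/* Symbol at each position 1..6 of (a|b)*abb#; index 0 is unused. */
static const char sym[NPOS + 1] = { 0, 'a', 'b', 'a', 'b', 'b', '#' };

/* followpos as bitmasks (bit i = position i):
   followpos(1) = followpos(2) = {1,2,3}; followpos(3) = {4};
   followpos(4) = {5}; followpos(5) = {6}; followpos(6) = empty. */
static const unsigned fpos[NPOS + 1] = { 0, 0x0E, 0x0E, 0x10, 0x20, 0x40, 0x00 };

/* Union of followpos(i) over the positions i in T whose symbol is c. */
static unsigned moveOn(unsigned T, char c) {
    unsigned out = 0;
    for (int i = 1; i <= NPOS; i++)
        if ((T & (1u << i)) && sym[i] == c) out |= fpos[i];
    return out;
}

int main(void) {
    unsigned dstates[16];
    int ndfa = 0;
    dstates[ndfa++] = 0x0E;   /* firstpos(root) = {1,2,3} = state A */
    for (int i = 0; i < ndfa; i++)
        for (int k = 0; k < 2; k++) {
            unsigned U = moveOn(dstates[i], "ab"[k]);
            int j;
            for (j = 0; j < ndfa; j++)
                if (dstates[j] == U) break;
            if (j == ndfa) dstates[ndfa++] = U;   /* new DFA state discovered */
            printf("Dtran[%c, %c] = %c%s\n", 'A' + i, "ab"[k], 'A' + j,
                   (U & (1u << 6)) ? "   (accepting: contains #)" : "");
        }
    return 0;
}

Its output reproduces the transition table above: states A, B, C, D, with D accepting.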

14. Divide the following C++ program.


float limitedSquare(x) { float x;
/* returns x-squared, but never more than 100 */
return (x <= -10.0 || x >= 10.0) ? 100 : x*x;
}

into appropriate lexemes. Which lexemes should get associated lexical values? What should those values be? APRIL/MAY 2024

Scanning: The scanning phase only eliminates the non-token elements from the source program, such as eliminating comments, compacting consecutive white spaces etc.

Lexical Analysis: Lexical analysis phase performs the tokenization on the output provided by the
scanner and thereby produces tokens.

<Float> <limited> <square><(><x>)<{>

/*returns x-squared, but never Float x; more than 100*/

<Return><(><x><<=><-10.0><||><x><>=><10.0><)><?><100><:><x><*><x>;

Keyword -> <Float>

Keyword -> <limited>

Identifier -> <square>

Special Character -> <(>

Identifier -> <x>

Special Character -> <)>

Special Character -> <{>

Literal -> /*returns x-squared, but never Float x; more than 100*/

Keyword -> <Return>

Special Character -> <(>

Identifier -> <x>

Operators-> <<=>

46
DR.NNCE III / 05 CD - QB

Constant-> <-10.0>

Operators-> <||>

Identifier -> <x>

Operators-> <>=>

Constant-> <10.0>

Special Character -> <)>

Operators-> <?>

Constant-> <100>

Operators-> <:>

Identifier -> <x>

Operators-> <*>

Identifier -> <x>

Separator -> <;>

Special Character -> <}>

15. Explain the role of lexical analyzer. APRIL/MAY 2024

Being the first phase in the analysis of the source program the lexical analyzer plays an important role
in the transformation of the source program to the target program.

This entire scenario can be realized with the help of the figure given below:


The lexical analyzer phase has the scanner or lexer program implemented in it which produces tokens
only when they are commanded by the parser to do so.

The parser generates the getNextToken command and sends it to the lexical analyzer; in response, the lexical analyzer starts reading the input stream character by character until it identifies a lexeme that can be recognized as a token.

As soon as a token is produced the lexical analyzer sends it to the syntax analyzer for parsing.
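In code, that on-demand protocol is typically just a function the parser calls in a loop. A minimal sketch follows, with hypothetical token kinds and attribute fields; a real lexer would scan the input buffer and consult the symbol table before returning.

#include <stdio.h>

/* Hypothetical token representation: a kind plus an attribute value
   (a symbol-table index for identifiers, a numeric value for numbers...). */
typedef enum { TOK_ID, TOK_NUM, TOK_RELOP, TOK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    int attribute;
} Token;

/* Stub lexer: stands in for the scanner described above. */
static Token getNextToken(void) {
    Token t = { TOK_EOF, 0 };
    return t;
}

/* The parser drives the lexer, requesting tokens only as it needs them. */
int main(void) {
    Token t;
    do {
        t = getNextToken();   /* the parser's "get next token" command */
    } while (t.kind != TOK_EOF);
    puts("input consumed");
    return 0;
}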

Along with the syntax analyzer, the lexical analyzer also communicates with the symbol table. When
a lexical analyzer identifies a lexeme as an identifier it enters that lexeme into the symbol table.

Sometimes the information about an identifier in the symbol table helps the lexical analyzer in determining the token that has to be sent to the parser.

Apart from identifying the tokens in the input stream, the lexical analyzer also eliminates whitespace and the comments of the program, along with other things such as characters that separate tokens, tabs, blank spaces, and newlines.

The lexical analyzer helps in relating the error messages produced by the compiler to the source. For example, the lexical analyzer keeps a record of each newline character it comes across while scanning the source program, so it can easily relate an error message with the line number of the source program.

If the source program uses macros, the lexical analyzer expands the macros in the source program.

Lexical Error

The lexical analyzer by itself is not able to determine the cause of every error in the source program. For example, consider the statement:

prtf(“ value of i is %d ”, i);


Now, in the above statement, when the string prtf is encountered, the lexical analyzer is unable to tell whether prtf is an incorrect spelling of the function name 'printf' or an undeclared function identifier.

But according to the predefined rules, prtf is a valid lexeme whose pattern identifies it as an identifier token. The lexical analyzer will therefore send the prtf token to the next phase, i.e. the parser, which will handle the error that occurred due to the transposition of letters.

Error Recovery

Sometimes it is even impossible for a lexical analyzer to identify a lexeme as a token, because the pattern of the lexeme does not match any of the predefined patterns for tokens. In this case, we have to apply some error recovery strategies.

In panic-mode recovery, successive characters from the remaining input are deleted until the lexical analyzer can identify a valid token.

Eliminate the first character from the remaining input.

Identify the possible missing character and insert it into the remaining input appropriately.

Replace a character in the remaining input to get a valid token.

Exchange the position of two adjacent characters in the remaining input.

While performing the above error recovery actions, check whether a prefix of the remaining input matches any pattern of tokens. Generally, a lexical error occurs due to a single character, so a single transformation can usually correct it. As far as possible, the smallest number of transformations should be used to convert the source program into a sequence of valid tokens to hand over to the parser.
