III Year-V Semester: B.Tech. Computer Science and Engineering
5CS4-02: Compiler Design
UNIT-1
A program written in a high-level language is called source code. To convert the source code
into machine code, translators are needed.
A translator takes a program written in a source language as input and converts it into a program in
a target language as output.
It also detects and reports errors during translation.
The roles of a translator are:
• Translating the high-level language program input into an equivalent machine language
program.
• Providing diagnostic messages wherever the programmer violates the specification of the high-level
language.
Different types of translators: Compiler and Interpreter
• A compiler makes debugging hard, as error messages are generated only after the entire program has been scanned; an interpreter stops translation when the first error is met, so debugging is easy.
• Programming languages like C and C++ use compilers, while languages like Python, BASIC, and Ruby use interpreters.
A compiler carries out the translation in the following phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generator
5. Code optimizer
6. Code generator
Lexical Analysis:
Lexical analysis is the first phase of the compilation process. It takes source code as input, reads
the source program one character at a time, and converts it into meaningful lexemes. The lexical
analyzer represents these lexemes in the form of tokens.
Syntax Analysis
Syntax analysis is the second phase of the compilation process. It takes tokens as input and
generates a parse tree as output. In this phase, the parser checks whether the expression
made by the tokens is syntactically correct.
Semantic Analysis
Semantic analysis is the third phase of the compilation process. It checks whether the parse tree
follows the rules of the language. The semantic analyzer keeps track of identifiers, their types and
expressions. The output of this phase is the annotated syntax tree.
Code Generation
Code generation is the final stage of the compilation process. It takes the optimized intermediate
code as input and maps it to the target machine language. Code generator translates the
intermediate code into the machine code of the specified computer.
Example:
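Since the phases are easiest to see on a concrete statement, the small C program below traces, in comments, how the assignment inside main might move through them. The statement, the temporaries t1 and t2, and the listed target instructions are illustrative assumptions, not the output of any particular compiler.

#include <stdio.h>

/* Hypothetical walk-through: the compilation phases applied to the
   assignment below. The intermediate forms are illustrative only. */
int main(void) {
    int initial = 10, rate = 2, position;

    /* Source statement handed to the compiler: */
    position = initial + rate * 60;

    /* 1. Lexical analysis  : id(position) = id(initial) + id(rate) * num(60) ;
       2. Syntax analysis   : parse tree for  position = initial + (rate * 60)
       3. Semantic analysis : operand types checked; all operands are int here
       4. Intermediate code : t1 = rate * 60 ; t2 = initial + t1 ; position = t2
       5. Code optimization : t1 = rate * 60 ; position = initial + t1
       6. Code generation   : target instructions such as MUL, ADD, MOV        */

    printf("position = %d\n", position);   /* prints 130 */
    return 0;
}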
1.3 Bootstrapping
o Bootstrapping is widely used in compiler development.
o Bootstrapping is used to produce a self-hosting compiler. A self-hosting compiler is a compiler
that can compile its own source code.
o A bootstrap compiler is used to compile the compiler; the compiled compiler can then be used to
compile everything else, including future versions of itself.
1. Create a compiler SCAA for a subset S of the desired language L, using language "A"; that
compiler runs on machine A and produces code for machine A.
2. Write a compiler LCSA for the full language L in the subset language S; it produces code for machine A.
3. Compile LCSA using the compiler SCAA to obtain LCAA. LCAA is a compiler for language L, which
runs on machine A and produces code for machine A.
A finite automaton is a state machine that takes a string of symbols as input and changes its state
accordingly. A finite automaton is a recognizer for regular expressions. When a regular-expression
string is fed into a finite automaton, it changes its state for each literal. If the input string is
processed successfully and the automaton reaches a final state, the string is accepted, i.e., the string
just fed was a valid token of the language in hand.
The mathematical model of a finite automaton consists of:
• a finite set of states Q
• a finite set of input symbols Σ (the alphabet)
• a transition function δ
• a start state q0
• a set of final (accepting) states F
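As a small worked example (the machine below is an assumption chosen for illustration, not taken from the notes), a DFA that accepts binary strings ending in 1 can be written as the 5-tuple
M = ({q0, q1}, {0, 1}, δ, q0, {q1})
with δ(q0, 0) = q0, δ(q0, 1) = q1, δ(q1, 0) = q0 and δ(q1, 1) = q1; feeding it the string 0011 leaves it in the final state q1, so the string is accepted.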
Lexical analysis is the first phase of a compiler. It takes the modified source code from language
preprocessors, written in the form of sentences. The lexical analyzer breaks this text into a series of
tokens, removing any whitespace and comments in the source code.
If the lexical analyzer finds a token invalid, it generates an error. The lexical analyzer works
closely with the syntax analyzer. It reads character streams from the source code, checks for
legal tokens, and passes the data to the syntax analyzer on demand.
1.6 Tokens
Lexemes are said to be a sequence of characters (alphanumeric) in a token. There are some
predefined rules for every lexeme to be identified as a valid token. These rules are defined by
grammar rules, by means of a pattern. A pattern explains what can be a token, and these patterns
are defined by means of regular expressions.
In a programming language, keywords, constants, identifiers, strings, numbers, operators and
punctuation symbols can be considered tokens.
For example, in C language, the variable declaration line
int value = 100;
contains the tokens:
int (keyword), value (identifier), = (operator), 100 (constant) and ; (symbol).
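As a rough sketch (the token classes, the single hard-coded keyword, and the scanning logic are assumptions made for this illustration), the following C program splits the declaration above into lexemes and prints a token class for each one.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Illustrative scanner for one declaration; only "int" is treated as a keyword. */
int main(void) {
    const char *src = "int value = 100;";
    const char *p = src;

    while (*p) {
        if (isspace((unsigned char)*p)) { p++; continue; }

        if (isalpha((unsigned char)*p)) {            /* identifier or keyword */
            char buf[32]; int n = 0;
            while (isalnum((unsigned char)*p)) buf[n++] = *p++;
            buf[n] = '\0';
            printf("%-10s %s\n", strcmp(buf, "int") == 0 ? "keyword" : "identifier", buf);
        } else if (isdigit((unsigned char)*p)) {     /* integer constant */
            char buf[32]; int n = 0;
            while (isdigit((unsigned char)*p)) buf[n++] = *p++;
            buf[n] = '\0';
            printf("%-10s %s\n", "constant", buf);
        } else if (*p == '=') {                      /* operator */
            printf("%-10s =\n", "operator"); p++;
        } else {                                     /* any other single symbol */
            printf("%-10s %c\n", "symbol", *p); p++;
        }
    }
    return 0;
}

Running it prints the same classification as listed above: int as a keyword, value as an identifier, = as an operator, 100 as a constant and ; as a symbol.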
Specifications of Tokens
Let us understand how the language theory undertakes the following terms:
Alphabets
Any finite set of symbols is an alphabet: {0,1} is the set of binary alphabets, {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the
set of hexadecimal alphabets, and {a-z, A-Z} is the set of English-language alphabets.
Strings
Any finite sequence of alphabet symbols is called a string. The length of a string is the total number of
occurrences of symbols in it; e.g., the length of the string tutorialspoint is 14 and is denoted by
|tutorialspoint| = 14. A string having no symbols, i.e., a string of zero length, is known as an empty
string and is denoted by ε (epsilon).
Special Symbols
A typical high-level language contains special symbols such as:
Assignment: =
Special assignment: +=, /=, *=, -=
Preprocessor: #
Language
A language is considered as a set of strings over some finite set of alphabet symbols. Computer
languages are considered as such sets, and the usual mathematical set operations can be performed on
them. Finite languages (and, more generally, regular languages) can be described by means of regular expressions.
When the lexical analyzer reads the source code, it scans the code letter by letter; and when it
encounters a whitespace, an operator symbol, or a special symbol, it decides that a word is
completed.
For example:
int intvalue;
While scanning the lexeme up to ‘int’, the lexical analyzer cannot determine whether it is the
keyword int or the prefix of the identifier intvalue.
The Longest Match Rule states that the lexeme scanned should be determined based on the
longest match among all the tokens available.
The lexical analyzer also follows rule priority, where a reserved word, e.g., a keyword, of a
language is given priority over user input. That is, if the lexical analyzer finds a lexeme that
matches an existing reserved word, it is recognized as that keyword rather than as an identifier,
and an attempt to use a reserved word as an identifier is reported as an error.
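A minimal sketch of how these two rules can be applied together (the two-entry keyword table and the helper name classify are assumptions for the example): the scanner first reads the longest run of letters and digits, and only then checks the lexeme against the reserved-word table.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Illustrative sketch: longest match first, then keyword priority. */
static const char *keywords[] = { "int", "return" };

static const char *classify(const char *lexeme) {
    for (size_t i = 0; i < sizeof keywords / sizeof *keywords; i++)
        if (strcmp(lexeme, keywords[i]) == 0)
            return "keyword";            /* rule priority: reserved words win */
    return "identifier";
}

int main(void) {
    const char *input = "int intvalue;";
    const char *p = input;
    char lexeme[64];

    while (*p) {
        if (isalpha((unsigned char)*p)) {
            int n = 0;
            while (isalnum((unsigned char)*p))   /* longest match: keep reading */
                lexeme[n++] = *p++;
            lexeme[n] = '\0';
            printf("%s -> %s\n", lexeme, classify(lexeme));
        } else {
            p++;                                 /* skip whitespace and ';' */
        }
    }
    return 0;
}

For the input int intvalue; it prints int -> keyword and intvalue -> identifier: the longest match keeps intvalue as one lexeme, and the keyword table then decides its class.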
Recognition of Tokens
Tokens can be recognized by Finite Automata
A finite automaton (FA) is a simple idealized machine used to recognize patterns within input taken
from some character set (or alphabet) C. The job of an FA is to accept or reject an input depending on
whether the pattern defined by the FA occurs in the input.
There are two notations for representing Finite Automata. They are
Transition Diagram
Transition Table
A transition diagram is a directed labeled graph that contains nodes and edges.
Nodes represent the states and edges represent the transitions between states.
Every transition diagram has exactly one initial state, indicated by an arrow mark (-->), and zero or
more final states, represented by double circles.
Example:
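As a sketch of the transition-table notation (the states, input classes, and the identifier pattern letter (letter | digit)* are assumptions chosen for this example), the table below drives a small recognizer written in C.

#include <ctype.h>
#include <stdio.h>

/* Illustrative DFA for identifiers: letter (letter | digit)*
   States: 0 = start, 1 = inside an identifier (final), 2 = dead (reject).
   Input classes: 0 = letter, 1 = digit, 2 = anything else.                 */
static const int next_state[3][3] = {
    /* letter  digit  other */
    {  1,      2,     2 },   /* state 0: start  */
    {  1,      1,     2 },   /* state 1: final  */
    {  2,      2,     2 }    /* state 2: dead   */
};

static int input_class(char c) {
    if (isalpha((unsigned char)c)) return 0;
    if (isdigit((unsigned char)c)) return 1;
    return 2;
}

/* Returns 1 if the whole string is accepted as an identifier. */
static int is_identifier(const char *s) {
    int state = 0;
    for (; *s; s++)
        state = next_state[state][input_class(*s)];
    return state == 1;
}

int main(void) {
    const char *samples[] = { "value", "intvalue", "2cool", "x9" };
    for (int i = 0; i < 4; i++)
        printf("%-9s %s\n", samples[i],
               is_identifier(samples[i]) ? "accepted" : "rejected");
    return 0;
}

The same machine drawn as a transition diagram would have an arrow into state 0, a double circle around state 1, and edges labeled with the input classes.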
Types or Sources of Error – There are two types of errors: run-time and compile-time errors.
1. A run-time error is an error that takes place during the execution of a program, and
usually happens because of adverse system parameters or invalid input data. The lack of
sufficient memory to run an application, a memory conflict with another program, and
logic errors are examples of this. Logic errors occur when executed code does not produce
the expected result; they are best handled by meticulous program debugging.
2. Compile-time errors arise at compile time, before execution of the program. A syntax error or
a missing file reference that prevents the program from compiling successfully is an example
of this.
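A small illustration of the difference (the statements below are assumptions for the example): the program compiles and runs, while the comments point out what a compile-time error and a logic error look like.

#include <stdio.h>

int main(void) {
    /* A compile-time (syntax) error is shown only as a comment, because it
       would stop compilation:   int x = 10    <- missing ';'               */

    /* A logic error: the loop is intended to sum 1..5, but the bound is
       wrong, so the program compiles and runs yet prints 10 instead of 15. */
    int sum = 0;
    for (int i = 1; i < 5; i++)    /* should be i <= 5 */
        sum += i;
    printf("sum = %d\n", sum);

    return 0;
}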
Finding or reporting an error – The viable-prefix property of a parser allows early
detection of syntax errors.
Error Recovery –
The minimum requirement for a compiler is to simply stop, issue a message, and cease
compilation. Some common recovery methods are as follows.
1. Panic mode recovery: This is the easiest way of error recovery, and it also prevents the
parser from developing infinite loops while recovering from an error. The parser discards input
symbols one at a time until one of a designated set of synchronizing tokens (typically statement
or expression terminators such as end or the semicolon) is found. This is adequate when the
presence of multiple errors in the same statement is rare. Example: consider the erroneous
expression (1 + + 2) + 3. Panic-mode recovery skips ahead to the next integer and then
continues. Bison uses the special terminal error to describe how much input to skip:
E -> int | E + E | ( E ) | error int | ( error )
A minimal sketch of this skipping idea appears after this list.
2. Phase level recovery: Perform local correction on the input to repair the error. But error
correction is difficult in this strategy.
3. Error productions: Compiler designers know some common errors that may occur in the
code. The grammar can be augmented with productions that generate these erroneous
constructs, so that the errors are detected and handled when they are encountered. Example:
writing 5x instead of 5*x.
4. Global correction: Its aim is to make as few changes as possible while converting an
incorrect input string to a valid string. This strategy is costly to implement.
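The sketch below (referred to from the panic-mode item above) illustrates the skip-to-synchronizing-token idea on the token stream of the erroneous expression (1 + + 2) + 3. The one-character-per-token representation and the synchronizing set {';', ')'} are simplifying assumptions.

#include <stdio.h>

/* Panic-mode recovery sketch: on an error, discard input symbols until a
   synchronizing token is found, then resume.                               */

static int is_sync(char tok) {
    return tok == ';' || tok == ')';
}

int main(void) {
    /* Tokens of the erroneous expression (1 + + 2) + 3 ;                    */
    const char *tokens = "(1++2)+3;";

    for (const char *p = tokens; *p; p++) {
        if (*p == '+' && p[1] == '+') {           /* "+ +" detected: error   */
            printf("error at '+ +': skipping to a synchronizing token\n");
            while (*p && !is_sync(*p))            /* panic mode: discard     */
                p++;
            if (!*p) break;
            printf("resuming at '%c'\n", *p);
        } else {
            printf("consume '%c'\n", *p);
        }
    }
    return 0;
}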