Unit 1
R.Rajakumari
Assistant Professor(Sr. Gr.)
Department of Computer Science and Engineering
National Engineering College- Kovilpatti
Overview
• Structure of a Compiler
• Lexical Analysis
• Role of Lexical Analysis
• Input Buffering
• Specification of Tokens
• Recognition of Tokens
• Lex
Introduction
• Programming Languages
• Compilers
• Reads a program in one language (the source language) and translates it into an equivalent
program in another language (the target language)
• Reports any error in the source program
• Interpreter
• Directly executes the operations specified in the source program on inputs supplied
by the user
• Line by line execution
• The target program produced by a compiler is usually much faster than an interpreter at mapping inputs to outputs
• Interpreter is better at error diagnostics
Contd…
• Assembler
• Assembly language to relocatable machine code
• Linker
• Links relocatable object files and library files
• Loader
• Puts all the executable object files into memory for execution
Structure of a Compiler
• Viewed as a single box, a compiler maps the source program into a semantically
equivalent target program
• Two parts - Analysis and Synthesis
• Analysis
• Breaks the source program into constituent pieces
• Imposes grammatical structure
• Intermediate representation of the source program
• If the source program is syntactically ill formed or semantically unsound, gives
informative messages for corrective action
• Collects information and stores in a data structure called Symbol Table
Contd…
• Synthesis
• Constructs the desired target program from the intermediate representation
and the information in the symbol table
• Analysis part - Front End
• Synthesis part - Back End
Phases of a Compiler
• Lexical Analysis
• Syntax Analysis
• Semantic Analysis
• Intermediate Code Generation
• Machine Independent Code Optimization
• Code Generation
• Machine Dependent Code Optimization
Lexical Analysis
• First phase
• Lexical Analysis or Scanning
• Reads the stream of characters and groups the characters into
meaningful sequences called lexemes
• Outputs <token_name, attribute_value>
• Token_name is an abstract symbol used during syntax analysis
• Attribute_value points to entry in symbol table for this token
• Symbol Table entry is needed for semantic analysis and code
generator
Example
• position = initial + rate * 60
• position is a lexeme
• Token <id,1>
• Assignment Symbol = is a lexeme
• Token <=>
• initial is a lexeme
• Token < id,2>
• Addition operator + is a lexeme
• Token <+>
Example
• rate is a lexeme
• Token <id,3>
• Multiplication operator * is a lexeme
• Token <*>
• 60 is a lexeme
• Token <60>
• <id,1> <=> <id,2> <+> <id,3> <*> <60>
• Blanks separating the lexemes are discarded by the lexical
analyzer
Syntax Analysis
• Second Phase
• Syntax Analysis or Parsing
• Tokens are used to produce a tree-like intermediate representation
• Syntax Tree
• Interior nodes represent operations
• Children of the node represent the arguments of the operation
• Tree shows the order in which the operations are executed
• Context Free Grammar is used to specify the grammatical structure of
the programming language
Semantic Analysis
• Checks for semantic consistency
• Gathers type information
• Performs type checking: the compiler checks whether each operator has
matching operands
• Example:
• Integer as array index
• Coercion – Type conversion
• Example:
• inttofloat
Intermediate Code Generation
• Intermediate representation
• Syntax trees are a form of intermediate representation, used during
syntax and semantic analysis
• After syntax analysis and semantic analysis, compilers generate an
explicit low-level or machine-like intermediate representation
• Two properties of intermediate representation
• Easy to produce
• Easy to translate into the target machine
Three-address Code
• An intermediate form
• Sequence of assembly-like instructions
• Three operands per instruction
• Each operand can act like a register
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
Contd…
• At most one operator on the right side
• Fix the order in which operations are to be done
• Temporary name is generated to hold the value computed by a three-
address instruction
• Some three-address instructions have fewer than three operands
Code Optimization
• Machine-independent code-optimization improves the intermediate code
• Results in better target code
• Better may mean faster code, shorter code, or code that consumes less power
• Optimizer can deduce that conversion of 60 from integer to floating point
can be done once
• inttofloat operation can be eliminated by replacing the integer 60 by the
floating point number 60.0
• Moreover, t3 is used only once, to transmit its value to id1, so it can be eliminated
t1 = id3 * 60.0
id1 = id2 + t1
Code Generation
• Input is the intermediate code and output is the target code
• Registers or memory locations are selected for each of the variables
• The intermediate instructions are translated into sequences of machine
instructions that perform the same task
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
Contd…
• First operand of each instruction specifies a destination
• The suffix F tells that the instruction deals with floating-point numbers
• Storage allocation decisions are made either during intermediate
code generation or during code generation
Symbol Table Management
• Compiler records the variable names used in the source program and
collects information about various attributes.
• Provides information about the storage allocated for a name, its type,
its scope
• Procedure names
• Number and type of its arguments, method of passing each argument, type
returned
• Data structure containing a record for each variable name, with fields
for the attributes of the name
Grouping of Phases into Passes
• Several activities can be grouped together into a pass
• Front end phases such as lexical analysis, syntax analysis, semantic
analysis and intermediate code generation can be grouped together into
one pass
• Code optimization may be an optional pass
• Back end pass consists of code generation for a particular target machine
• Combine different front end with back end of a particular target machine
• Combine front end with back ends for different target machines
Compiler Construction Tools
• Scanner Generator
• Parser Generator
• Syntax-directed translation engines
• Code-generator generators
• Data-flow analysis engines
• Compiler-construction toolkits
Lexical Analysis
• Diagram or description for lexemes of each token
• Code to identify each occurrence of each lexeme on the input
• Return information about the token identified
• Role of lexical analyser
• Read the input characters of the source program, group them into lexemes
and produce a sequence of tokens as output
• Stream of tokens is sent to the parser
• Identifier – Enter the lexeme into the symbol table
• Information regarding the kind of identifier may be read from the symbol
table
Contd…
• Parser calls the lexical analyser
• getNextToken
• Token is returned to the parser
• Strips out comments and whitespaces
• Correlate error messages with source
• Two processes
• Scanning
• Lexical analysis
Lexical Analysis vs. Parsing
• Simplicity of Design
• Compiler efficiency is improved
• Compiler portability is enhanced
Tokens, Patterns and Lexemes
• Token – Pair containing a token name and an optional attribute value
• Token name is an abstract symbol representing a kind of lexical unit
• Pattern is a description of the form that the lexemes of the token take
• Lexeme is a sequence of characters that match the pattern of a token
Contd…
• One token for each keyword
• Tokens for the operators
• One token representing all the identifiers
• One or more tokens representing the constants
• One token for each punctuation symbol
Example
• Find the number of tokens
1. main()
{ printf(“cd”);
// print the message
}
2. while (i > 0)
{ printf( i );
i++;
}
Contd…
int main()
{
int a = 10, b = 20;
printf(“sum is :%d”, a+b);
return 0;
}
Attributes for Tokens
• Attribute value describes the lexeme represented by the token
• Example:
• Token id – lexeme, its type, its location
• Pointer to the symbol table entry for that identifier
Lexical Errors
• fi ( a == f(x)) …
• Lexical analyser cannot tell whether fi is a misspelling of the keyword if or an undeclared
function identifier
• fi is a valid lexeme for the token id, hence the lexical analyser returns the token id to the parser
• Parser has to handle the error due to the transposition of the letters
• If none of the patterns match any prefix of the remaining input then
“panic mode” recovery
• Delete successive characters from the remaining input until a well-
formed token appears at the beginning
Other error-recovery actions
• Delete one character from the remaining input
• Insert a missing character into the remaining input
• Replace a character by another character
• Transpose two adjacent letters
• Usually a single transformation suffices
• In principle one could search for the smallest number of transformations, but this is too expensive in practice
Input Buffering
• Look at one or more characters beyond the next lexeme
• At least one additional character of lookahead is needed
• Single-character operators < , > , - , = could be the beginning of the two-
character operators <= , >= , == , ->
• The end of an identifier is recognized only on seeing a character that is not a letter or a digit
• Two buffer scheme
• Sentinels
Buffer Pairs
• Two buffers are alternately reloaded
• Each buffer is of size N
• N is the size of a disk block
• eof marks the end of the source file
• Two pointers
• lexemeBegin – beginning of the current lexeme
• forward – scans ahead until a pattern match is found
Sentinels
• Each time forward is advanced, check that it has not moved off one of
the buffers; if it has, reload the other buffer
• Without sentinels, two tests are performed for each character read
• One for the end of the buffer
• One to determine what character was read
• Instead, each buffer holds a sentinel character at its end
• The sentinel is a special character that cannot be part of the source program – eof
Specification of Tokens
• Regular expression
• Alphabet – a finite set of symbols
• String over an alphabet – finite sequence of symbols drawn from that
alphabet
• Length of a string s, denoted as |s| - number of occurrences of
symbols in s
• Empty string, ɛ
• Language is any countable set of strings over some fixed alphabet
Operations on Languages
• Union: L ∪ M = { s | s is in L or s is in M }
• Concatenation: LM = { st | s is in L and t is in M }
• Closure: L* = L⁰ ∪ L¹ ∪ L² ∪ … , the strings formed by concatenating zero or more strings from L
Regular Expressions
• Describe all the languages that can be built from these operators
applied to the symbols of some alphabet
• C identifiers are described as letter_(letter_|digit)*
• letter_ - any letter or underscore
• digit – for any digit
• | means union
• * means “zero or more occurrences of”
• Each regular expression r denotes a language L(r)
Contd…
• Basis:
• ɛ is a regular expression denoting {ɛ}; for each symbol a in the alphabet, a is a regular expression denoting {a}
• Induction:
• If r and s are regular expressions denoting languages L(r) and L(s):
• (r) | (s) is a regular expression denoting the language L(r) U L(s)
• (r) (s) is a regular expression denoting the language L(r) L(s)
• (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)
Precedence and Associativity
• Parentheses can be avoided if conventions are followed
• The unary operator * has the highest precedence and is left associative
• Concatenation has the second-highest precedence and is left associative
• | has the lowest precedence and is left associative
• (a) | ((b)* (c)) can therefore be written as a | b* c
Examples
• ∑ = {a , b}
• Regular expression a|b denotes the language {a , b}
• (a| b) (a| b) denotes {aa, ab, ba, bb}
• a* denotes {ɛ, a, aa, aaa, …}
• (a | b)* denotes {ɛ, a, b, ab, ba, aa, bb, aaa, …}
• a | a*b denotes {a, b, ab, aab, aaab, …}