Principles of Compiler Design
(PoCD)
Course Code: 19ECSC203
Credits: 3-1-0
2/22/2025 School of Computer Science and Engineering
NEED
The foundations of computer science can be partitioned into two subdisciplines:
1. Theory of Computation: It aims at understanding the nature of
computation, and specifically the inherent possibilities and
limitations of efficient computations.
2. Theory of Programming: It is concerned with the actual task
of implementing computations (i.e., writing computer programs).
Role of Compilers
High-level programming languages:
• Increase programmer productivity
• Better maintenance
• Portable
Low-level machine details:
• Instruction selection
• Addressing modes
• Pipeline
• Registers & cache
• Instruction-level parallelism
Compilers are needed to efficiently bridge the gap!
Prerequisites and Course Outcomes
Prerequisites
1. High-level programming languages (typically C language)
2. Data structures course.
Course Outcomes (COs)
1. Illustrate the components of a language processing system and automata concepts in the
phases of compiler design.
2. Construct an appropriate finite automaton for a given formal language.
3. Design a context free grammar and suitable parser for a given context-free language.
4. Perform the semantic analysis and build the intermediate code for a given programming
construct.
5. Implement algorithms to demonstrate lexical and syntactical phases of a compiler.
Course Materials
Text Books
1. Alfred V Aho, Monica S. Lam, Ravi Sethi, Jeffrey D Ullman, Compilers - Principles, Techniques and
Tools, Updated 2nd Edition, Pearson, 2023.
2. Kenneth C Louden: Compiler Construction Principles & Practice, Cengage Learning, 1997.
References
1. Andrew W. Appel, Modern Compiler Implementation in C, Cambridge University Press, 1999.
2. Charles N. Fischer, Richard J. LeBlanc, Jr., Crafting a Compiler with C, Pearson, 2011.
3. Peter Linz, An Introduction to formal languages and Automata, IV edition, Narosa, 2016.
4. Basavaraj S Anami, Karibasappa K.G, Formal Languages and Automata Theory, First, Wiley India,
2011.
LMS, PPT, NOTES
Course Evaluation
Assessment                                   Weightage in Marks
In-Semester Assessment - I (ISA-I)           15
In-Semester Assessment - II (ISA-II)         15
Tutorial: Problem solving assignment         10
Tutorial: Programming assignment             10
  (LEX and YACC programs; building a scanner and parser for a given HLL)
Total                                        50
What is a compiler?
A compiler is a program that translates a source program written in one language into an
equivalent program in another language, called the target program.
The compiler typically lowers the level of abstraction of the program.
The program produced by the compiler is expected to be better than the original in some respect, for example faster to execute on the machine.
Source program → Compiler → Target program
(the compiler also reports error messages)
Example: a=b+c*2 is translated to
MOV Id3, R2
MUL #2, R2
MOV Id2, R1
ADD R2, R1
MOV R1, Id1
Compilers and Interpreters
“Compilation”
Translation of a program written in a source language into a semantically equivalent program written in a target language.
Source Program → Compiler → Target Program (the compiler reports error messages)
Input → Target Program → Output
“Interpretation”
Performing the operations implied by the source program.
Source Program + Input → Interpreter → Output (the interpreter reports error messages)
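The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (interpret, compile_to_stack_code, run), the tuple encoding of the expression, and the tiny stack machine are all hypothetical and not part of the course material. Only + and * are handled.

```python
# Expression tree: a number, a variable name, or a tuple (op, left, right).
def interpret(expr, env):
    """Interpretation: perform the operations implied by the source directly."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    l, r = interpret(left, env), interpret(right, env)
    return l + r if op == "+" else l * r

def compile_to_stack_code(expr):
    """Compilation: translate the source into an equivalent target program."""
    if isinstance(expr, (int, float)):
        return [("PUSH", expr)]
    if isinstance(expr, str):
        return [("LOAD", expr)]
    op, left, right = expr
    code = compile_to_stack_code(left) + compile_to_stack_code(right)
    return code + [("ADD",) if op == "+" else ("MUL",)]

def run(code, env):
    """Execute the target program on an assumed tiny stack machine."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            r, l = stack.pop(), stack.pop()
            stack.append(l + r if instr[0] == "ADD" else l * r)
    return stack.pop()

source = ("+", "b", ("*", "c", 2))   # b + c*2
env = {"b": 3, "c": 4}               # the program's input
target = compile_to_stack_code(source)
print(interpret(source, env))        # 11
print(run(target, env))              # 11
```

The interpreter needs the source and the input together, while the compiled target can be produced once and then run on many different inputs.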
Language Processing System
• Any computer system is made of hardware and software.
• The hardware understands only machine language, which humans cannot easily read or write.
• So we write programs in a high-level language, which is easier for us to understand and remember.
• These programs are then fed into a series of tools (IDE) and OS components to get the desired code that can be used by the machine.
• This is known as the Language Processing System.
Language Processing System
Skeletal Source Program
  ↓ Preprocessor
Source Program
  ↓ Compiler
Target Assembly Program
  ↓ Assembler
Relocatable Object Code
  ↓ Linker (also takes Libraries and Relocatable Object Files)
Relocatable Linked Object Code
  ↓ Loader
Absolute Machine Code
Try for example: gcc -v myprog.c
The Phases of a Compiler
Source code
  ↓ Scanner
Tokens
  ↓ Parser
Syntax Tree
  ↓ Semantic Analyzer
Annotated Tree
  ↓ Source Code Optimizer
Intermediate code
  ↓ Code Generator
Target code
  ↓ Target Code Optimizer
Target code
Auxiliary components used by the phases: Literal Table, Symbol Table, Error Handler
Course Overview
UNIT-wise distribution
UNIT   Chapter Number   Chapter Name                        Number of hours
I      1                Introduction to compilers           06
I      2                Finite Automata                     06
I      3                Introduction to Syntax Analysis     04
II     4                Top Down Parsing                    08
II     5                Bottom up Parsing                   08
III    6                Semantic Analysis                   04
III    7                Intermediate Code Generation        04
Chapter 01: Introduction to compilers
1. Brief History of Compilers
2. Translation process
3. Major data structures in Compilers
4. Chomsky hierarchy
5. Lexical analysis: Scanning process
6. Regular expressions for tokens
7. Lexical errors
8. Applications of Regular expressions
Chapter 02: Finite Automata
1. Introduction: Language, automata
2. From regular expressions to Deterministic Finite Automata (DFA)
3. ε-Nondeterministic Finite Automata (ε-NFA), NFA, DFA
4. DFA optimization
5. Finite automata as recognizer
6. Implementation of finite automata
Chapter 03: Introduction to Syntax Analysis
1. Introduction to Grammars
2. Context-Free Grammars (CFGs)
3. Ambiguity in Grammars and Languages
4. Role of parsing
Chapter 04: Top Down Parsing
1. Introduction
2. Left Recursion
3. Left factoring
4. LL(1) Parsing
5. FIRST and FOLLOW sets
6. Error recovery in Top Down Parsing
Chapter 05: Bottom up Parsing
1. Introduction
2. SLR(1) Parsing
3. General LR(1) and LALR(1) Parsing
4. Error recovery in bottom up parsing
Chapter 06: Semantic Analysis
1. Attributes and attribute grammars
2. Algorithms for attribute computation
3. Symbol table
4. Data types and type checking
Chapter 07: Intermediate Code Generation
1. Intermediate code and data structures for code generation
2. Code generation of data structure references
3. Code generation of control statements
First mechanical computer: Charles Babbage's Difference Engine, proposed in 1822
Brief History of Compilers
The first compiler was developed between 1954 and 1957:
- The FORTRAN language and its compiler, by a team at IBM led by John Backus
- The structure of natural language was studied at about the same time by Noam Chomsky
Brief History of Compilers (Cont.)
Related theories and algorithms were developed in the 1960s and 1970s:
- The classification of languages: the Chomsky hierarchy
- The parsing problem was pursued: context-free languages, parsing algorithms
- Symbolic methods for expressing the structure of the words of a programming language: finite automata, regular expressions
- Methods for generating efficient object code: optimization techniques, or code improvement techniques
Brief History of Compilers (Cont.)
Programs were developed to automate compiler development for parsing and scanning:
- Parser generators, such as Yacc by Steve Johnson in 1975 for the Unix system
- Scanner generators, such as Lex by Mike Lesk for the Unix system at about the same time
Brief History of Compilers (Cont.)
Projects focused on automating the generation of other parts of a compiler:
- Automatic code generation was attempted during the late 1970s and early 1980s
- These attempts were less successful, owing to our less than perfect understanding of the operations involved
Brief History of Compilers (Cont.)
Recent advances in compiler design:
- More sophisticated algorithms for inferring and/or simplifying the information contained in a program, such as the unification algorithm of Hindley-Milner type checking
- Window-based Integrated Development Environments (IDEs) that include editors, linkers, debuggers, and project managers
However, the basics of compiler design have not changed much in the last 20 years.
The Analysis-Synthesis Model of Compilation
There are two parts to compilation:
Analysis determines the operations implied by the source program
which are recorded in a tree structure
Synthesis takes the tree structure and translates the operations therein
into the target program
The Grouping of Phases
Compiler front and back ends:
Front end: analysis (machine independent)
Back end: synthesis (machine dependent)
Compiler passes:
A collection of phases is done only once (single pass) or multiple times
(multi pass)
Single pass: usually requires everything to be defined before being used in
source program
Multi pass: compiler may have to keep entire program representation in
memory
Compiler structure
source code → Front End → IR → Back End → target code
The Phases of a Compiler – Translation Process
Source code
  ↓ Scanner
Tokens
  ↓ Parser
Syntax Tree
  ↓ Semantic Analyzer
Annotated Tree
  ↓ Source Code Optimizer
Intermediate code
  ↓ Code Generator
Target code
  ↓ Target Code Optimizer
Target code
Auxiliary components used by the phases: Literal Table, Symbol Table, Error Handler
The Scanner
• Lexical analysis: it collects sequences of characters into meaningful units called tokens
• An example: a[index]=4+2
  • a        identifier
  • [        left bracket
  • index    identifier
  • ]        right bracket
  • =        assignment
  • 4        number
  • +        plus sign
  • 2        number
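The token list above can be reproduced with a small scanner. The following is a minimal sketch in Python, not the course's LEX-based implementation; the function name scan and the regular expressions in TOKEN_SPEC are illustrative assumptions, and only the token kinds from the example are covered.

```python
import re

# Token kinds follow the example; the regular expressions are assumptions.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number", r"[0-9]+"),
    ("left bracket", r"\["),
    ("right bracket", r"\]"),
    ("assignment", r"="),
    ("plus sign", r"\+"),
    ("whitespace", r"\s+"),
]

def scan(text):
    """Collect characters into (lexeme, kind) tokens, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        for kind, pattern in TOKEN_SPEC:
            match = re.compile(pattern).match(text, pos)
            if match:
                if kind != "whitespace":
                    tokens.append((match.group(), kind))
                pos = match.end()
                break
        else:
            # No pattern matched: report a lexical error.
            raise SyntaxError(f"lexical error at position {pos}: {text[pos]!r}")
    return tokens

for lexeme, kind in scan("a[index]=4+2"):
    print(lexeme, kind)
```

A character that matches no pattern, such as $ in this sketch, is reported as a lexical error.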
The Parser
• Syntax analysis: it determines the structure of the program
• The result of syntax analysis is a parse tree or a syntax tree (abstract syntax tree)
• An example: a[index]=4+2
The Parse Tree for a[index]=4+2
expression
  assign-expression
    expression
      subscript-expression
        expression
          identifier a
        [
        expression
          identifier index
        ]
    =
    expression
      additive-expression
        expression
          number 4
        +
        expression
          number 2
Abstract Syntax Tree for a[index]=4+2
assign-expression
  subscript-expression
    identifier a
    identifier index
  additive-expression
    number 4
    number 2
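One way to obtain such an abstract syntax tree is a small hand-written recursive-descent parser. The sketch below is in Python and makes several assumptions: tokens arrive as (lexeme, kind) pairs, the tree is encoded as nested tuples, and the grammar covers only statements of the form identifier [ expr ] = expr with + as the only operator. The function name parse_assignment is hypothetical.

```python
def parse_assignment(tokens):
    """Parse  identifier [ expr ] = expr  where expr is operand {+ operand}."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, "end")

    def expect(kind):
        nonlocal pos
        lexeme, actual = peek()
        if actual != kind:
            raise SyntaxError(f"expected {kind}, found {actual}")
        pos += 1
        return lexeme

    def operand():
        if peek()[1] == "identifier":
            return ("identifier", expect("identifier"))
        return ("number", int(expect("number")))

    def additive():
        node = operand()
        while peek()[1] == "plus sign":
            expect("plus sign")
            node = ("additive-expression", node, operand())
        return node

    def subscript():
        name = ("identifier", expect("identifier"))
        expect("left bracket")
        index = additive()
        expect("right bracket")
        return ("subscript-expression", name, index)

    target = subscript()
    expect("assignment")
    value = additive()
    expect("end")  # nothing may follow the statement
    return ("assign-expression", target, value)

tokens = [("a", "identifier"), ("[", "left bracket"), ("index", "identifier"),
          ("]", "right bracket"), ("=", "assignment"), ("4", "number"),
          ("+", "plus sign"), ("2", "number")]
ast = parse_assignment(tokens)
print(ast)
```

Brackets, the = sign, and the + sign do not appear as nodes; they are implied by the node kinds, which is what distinguishes the abstract syntax tree from the parse tree.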
The Semantic Analyzer
• The semantics of a program are its “meaning”, as opposed to its syntax, or structure
• The semantic analyzer determines some of the program's run-time behavior prior to execution
• Static semantics: declarations and type checking
• Attributes: the extra pieces of information computed by the semantic analyzer
• An example: a[index]=4+2
• The syntax tree annotated with attributes
The Annotated Syntax Tree
assign-expression
  subscript-expression : integer
    identifier a : array of integer
    identifier index : integer
  additive-expression : integer
    number 4 : integer
    number 2 : integer
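The type attributes in this tree can be computed by a bottom-up walk over the syntax tree. The following Python sketch assumes a tuple-encoded tree, a symbol table given as a plain dictionary of declared types, and a hypothetical function name annotate. It also assigns a type to the assignment node itself, which the slide leaves unannotated.

```python
def annotate(node, symbols):
    """Return (kind, type, ...) with a type attribute computed for every node."""
    kind = node[0]
    if kind == "number":
        return ("number", "integer", node[1])
    if kind == "identifier":
        if node[1] not in symbols:
            raise NameError(f"undeclared identifier {node[1]}")
        return ("identifier", symbols[node[1]], node[1])
    left, right = annotate(node[1], symbols), annotate(node[2], symbols)
    if kind == "subscript-expression":
        if left[1] != "array of integer" or right[1] != "integer":
            raise TypeError("subscript needs an integer array and an integer index")
        return (kind, "integer", left, right)
    # additive-expression and assign-expression: both sides must be integers
    if left[1] != "integer" or right[1] != "integer":
        raise TypeError(f"type mismatch in {kind}")
    return (kind, "integer", left, right)

ast = ("assign-expression",
       ("subscript-expression", ("identifier", "a"), ("identifier", "index")),
       ("additive-expression", ("number", 4), ("number", 2)))
symbols = {"a": "array of integer", "index": "integer"}  # assumed declarations
annotated = annotate(ast, symbols)
print(annotated)
```

Declarations are checked through the symbol table lookup, and type checking happens as each node combines the types of its children.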
The Source Code Optimizer
• The earliest point at which most optimization steps can be performed is just after semantic analysis
• Some code improvements depend only on the source code and can be performed as a separate phase
• Individual compilers exhibit a wide variation in the kinds of optimization performed as well as in their placement
• An example: a[index]=4+2
  • Constant folding performed directly on the annotated tree
  • Using intermediate code: three-address code, p-code
Optimizations on Annotated Tree (before constant folding)
assign-expression
  subscript-expression : integer
    identifier a : array of integer
    identifier index : integer
  additive-expression : integer
    number 4 : integer
    number 2 : integer
Optimizations on Annotated Tree (after constant folding)
assign-expression
  subscript-expression : integer
    identifier a : array of integer
    identifier index : integer
  number 6 : integer
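Constant folding on the tree amounts to replacing any additive node whose two children are numbers by a single number node. A minimal Python sketch follows, assuming a tuple-encoded tree without the type attributes (for brevity) and a hypothetical function name fold.

```python
def fold(node):
    """Replace additive nodes with two constant operands by their sum."""
    kind = node[0]
    if kind in ("number", "identifier"):
        return node
    left, right = fold(node[1]), fold(node[2])
    if kind == "additive-expression" and left[0] == "number" and right[0] == "number":
        return ("number", left[1] + right[1])
    return (kind, left, right)

tree = ("assign-expression",
        ("subscript-expression", ("identifier", "a"), ("identifier", "index")),
        ("additive-expression", ("number", 4), ("number", 2)))
print(fold(tree))
```

Because children are folded before their parent, a nested constant expression such as (1+2)+3 also collapses to a single number.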
Optimization on Intermediate Code
Original three-address code:
t = 4 + 2
a[index] = t
After constant folding:
t = 6
a[index] = t
After replacing t by its value:
a[index] = 6
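The same two steps can be sketched on a list of three-address instructions. The tuple encoding used here, ("add", dest, x, y), ("copy", dest, value) and ("store", array, index, value), and the function names are assumptions made for this illustration. The second step also assumes straight-line code in which each temporary is assigned once and is not needed afterwards.

```python
def fold_constants(code):
    """Turn  t = c1 + c2  into  t = c  when both operands are constants."""
    out = []
    for instr in code:
        if instr[0] == "add" and isinstance(instr[2], int) and isinstance(instr[3], int):
            out.append(("copy", instr[1], instr[2] + instr[3]))
        else:
            out.append(instr)
    return out

def propagate_constants(code):
    """Substitute temporaries known to hold constants and drop their copies."""
    known, out = {}, []
    for instr in code:
        if instr[0] == "copy" and isinstance(instr[2], int):
            known[instr[1]] = instr[2]          # remember t = 6, emit nothing
        elif instr[0] == "store":
            _, array, index, value = instr
            out.append(("store", array, known.get(index, index), known.get(value, value)))
        else:
            out.append(instr)
    return out

code = [("add", "t", 4, 2), ("store", "a", "index", "t")]
step1 = fold_constants(code)          # t = 6 ; a[index] = t
step2 = propagate_constants(step1)    # a[index] = 6
print(step1)
print(step2)
```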
The Code Generator
• It takes the intermediate code or IR and generates code for the target machine
• The properties of the target machine become the major factor:
  • Which instructions to use and how data is represented
• An example: a[index]=4+2
  • Code sequence in a hypothetical assembly language
A possible code sequence for a[index]=6
MOV R0, index
MUL R0, 2
MOV R1, &a
ADD R1, R0
MOV *R1, 6
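A code generator for this one statement form could be sketched as below. The function name generate_store is hypothetical, and the factor 2 in the MUL instruction is taken to be the size of an array element in bytes, which is an assumption about the hypothetical target machine.

```python
def generate_store(array, index, value, element_size=2):
    """Emit code for array[index] = value in the hypothetical assembly language."""
    return [
        f"MOV R0, {index}",         # value of index -> R0
        f"MUL R0, {element_size}",  # scale the index by the element size
        f"MOV R1, &{array}",        # address of the array -> R1
        "ADD R1, R0",               # R1 now holds the address of array[index]
        f"MOV *R1, {value}",        # store the value at that address
    ]

for line in generate_store("a", "index", 6):
    print(line)
```

The choice of registers and instructions here depends entirely on the assumed target machine, which is why the back end is machine dependent.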
The Target Code Optimizer
• It improves the target code generated by the code generator:
  • Choosing addressing modes
  • Replacing slow instructions by faster ones
  • Eliminating redundant or unnecessary operations
Before optimization:
MOV R0, index
MUL R0, 2
MOV R1, &a
ADD R1, R0
MOV *R1, 6
After optimization:
MOV R0, index
SHL R0
MOV &a[R0], 6
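Improvements of this kind are often implemented as a peephole pass over short instruction sequences. The Python sketch below encodes just the two rewrites visible in the example; the function name peephole and the textual instruction format are assumptions. The second rule additionally assumes that the address register is not used again after the store.

```python
import re

def peephole(code):
    """Apply two local rewrites to a list of instruction strings."""
    out, i = [], 0
    while i < len(code):
        # Rule 1: multiplying by 2 becomes a shift left.
        m = re.fullmatch(r"MUL (R\d+), 2", code[i])
        if m:
            out.append(f"SHL {m.group(1)}")
            i += 1
            continue
        # Rule 2: address computation plus indirect store becomes an indexed store.
        if i + 2 < len(code):
            a = re.fullmatch(r"MOV (R\d+), &(\w+)", code[i])
            b = re.fullmatch(r"ADD (R\d+), (R\d+)", code[i + 1])
            c = re.fullmatch(r"MOV \*(R\d+), (\w+)", code[i + 2])
            if a and b and c and a.group(1) == b.group(1) == c.group(1):
                out.append(f"MOV &{a.group(2)}[{b.group(2)}], {c.group(2)}")
                i += 3
                continue
        out.append(code[i])
        i += 1
    return out

before = ["MOV R0, index", "MUL R0, 2", "MOV R1, &a", "ADD R1, R0", "MOV *R1, 6"]
for line in peephole(before):
    print(line)
```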
Principal Data Structures for Communication among Phases
• TOKENS
  • A scanner collects characters into a token, represented as a value of an enumerated data type for tokens
  • May also preserve the string of characters or other derived information, such as the name of an identifier or the value of a number token
  • Held in a single global variable (one token at a time) or in an array of tokens
• THE SYNTAX TREE
  • A standard pointer-based structure generated by the parser
  • Each node holds information collected by the parser or by later phases, which may be dynamically allocated or stored in the symbol table
  • A node requires different attributes depending on the kind of language structure it represents, so it may be represented as a variant record
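These two structures might be declared as follows. This is a sketch in Python with hypothetical names (TokenKind, Token, TreeNode). Python has no variant records, so a per-node attribute dictionary stands in for them here.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional

class TokenKind(Enum):
    """Enumerated data type for tokens (only the kinds used in the example)."""
    IDENTIFIER = auto()
    NUMBER = auto()
    LEFT_BRACKET = auto()
    RIGHT_BRACKET = auto()
    ASSIGNMENT = auto()
    PLUS = auto()

@dataclass
class Token:
    kind: TokenKind
    lexeme: str                   # the string of characters that was collected
    value: Optional[int] = None   # derived information, e.g. a number's value

@dataclass
class TreeNode:
    kind: str                                       # e.g. "additive-expression"
    children: list = field(default_factory=list)    # references to child nodes
    attributes: dict = field(default_factory=dict)  # varies with the node kind

four = Token(TokenKind.NUMBER, "4", value=4)
name = Token(TokenKind.IDENTIFIER, "index")
leaf = TreeNode("number", attributes={"value": four.value, "type": "integer"})
plus = TreeNode("additive-expression", children=[leaf, leaf],
                attributes={"type": "integer"})
print(four, name, plus.kind, sep="\n")
```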
Principal Data Structures for Communication among Phases
• THE SYMBOL TABLE
  • Keeps information associated with identifiers: functions, variables, constants, and data types
  • Interacts with almost every phase of the compiler
  • Analysis phases enter values and synthesis phases use the stored values
  • Operations: insertion, deletion, and access
  • Access operations need to be efficient, preferably close to constant time
  • One or several hash tables are often used
• THE LITERAL TABLE
  • Stores constants and strings, reducing the size of the program
  • Quick insertion and lookup are essential
  • No duplicates
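Both tables can be sketched on top of a hash table, for which Python's built-in dict is used here. The class and method names (SymbolTable, LiteralTable, insert, lookup, delete, intern) are illustrative assumptions, not a prescribed interface.

```python
class SymbolTable:
    """Identifier information with expected constant-time insert, lookup, delete."""
    def __init__(self):
        self._entries = {}                    # dict is a hash table

    def insert(self, name, info):
        if name in self._entries:
            raise KeyError(f"{name} is already declared")
        self._entries[name] = info

    def lookup(self, name):
        return self._entries.get(name)        # None if the name is unknown

    def delete(self, name):
        self._entries.pop(name, None)

class LiteralTable:
    """Stores each constant or string once and hands out its index."""
    def __init__(self):
        self._index = {}
        self._literals = []

    def intern(self, literal):
        key = (type(literal).__name__, literal)   # keeps 1 and 1.0 apart
        if key not in self._index:
            self._index[key] = len(self._literals)
            self._literals.append(literal)
        return self._index[key]

symbols = SymbolTable()
symbols.insert("a", {"kind": "variable", "type": "array of integer"})
symbols.insert("index", {"kind": "variable", "type": "integer"})
literals = LiteralTable()
first = literals.intern("hello")
second = literals.intern("hello")     # duplicate: same index, nothing stored
print(symbols.lookup("a"), first, second)
```

Storing a repeated literal once is what reduces the size of the program: every use refers to the same table index.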
Principal Data Structures for Communication among Phases
• INTERMEDIATE CODE
  • Kept as an array of text strings, a temporary text file, or a linked list of structures, depending on the kind of intermediate code (e.g. three-address code and p-code)
  • Optimization is performed here
  • Should be easy to reorganize
• TEMPORARY FILES
  • Hold the results of intermediate steps during compilation
  • Instead of keeping the complete program in memory, small chunks can be stored on the fly during compilation
Compiler-Construction Tools
• Software development tools are available to implement one or
more compiler phases
• Scanner generators (LEX tool)
• Parser generators (YACC tool)
• Syntax-directed translation engines
• Automatic code generators
• Data-flow engines
Chomsky Hierarchy
Lexical analysis and Lexical errors