
PoCD Chapter 01 PPT 2024-25

The document outlines the course 'Principles of Compiler Design' (PoCD), detailing its prerequisites, outcomes, and evaluation methods. It explains the role of compilers in bridging high-level programming languages with low-level machine details, and describes the phases of compilation including lexical analysis, syntax analysis, and code generation. Additionally, it provides an overview of course materials and a unit-wise distribution of topics covered in the course.

Uploaded by Akshat Dodwad
Copyright © All Rights Reserved

Principles of Compiler Design (PoCD)
Course Code: 19ECSC203
Credits: 3-1-0
2/22/2025 School of Computer Science and Engineering
NEED
The foundations of computer science can be partitioned into two sub-disciplines:
1. Theory of Computation: It aims at understanding the nature of computation, and specifically the inherent possibilities and limitations of efficient computations.
2. Theory of Programming: It is concerned with the actual task of implementing computations (i.e., writing computer programs).

Role of Compilers

High-level programming languages:
• Increases programmer productivity
• Better maintenance
• Portable

Low-level machine details:
• Instruction selection
• Addressing modes
• Pipeline
• Registers & cache
• Instruction-level parallelism

Compilers are needed to efficiently bridge the gap!

Prerequisites and Course Outcomes
Prerequisites
1. High-level programming languages (typically C language)
2. Data structures course.
Course Outcomes (COs)
1. Illustrate the components of a language processing system and automata concepts in the phases of compiler design.
2. Construct an appropriate finite automaton for a given formal language.
3. Design a context-free grammar and a suitable parser for a given context-free language.
4. Perform semantic analysis and build the intermediate code for a given programming construct.
5. Implement algorithms to demonstrate the lexical and syntactic phases of a compiler.
Course Materials
Text Books
1. Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, Compilers: Principles, Techniques and Tools, Updated 2nd Edition, Pearson, 2023.
2. Kenneth C. Louden, Compiler Construction: Principles and Practice, Cengage Learning, 1997.

References
1. Andrew W. Appel, Modern Compiler Implementation in C, Cambridge University Press, 1999.
2. Charles N. Fischer, Richard J. LeBlanc, Jr., Crafting a Compiler with C, Pearson, 2011.
3. Peter Linz, An Introduction to Formal Languages and Automata, 4th Edition, Narosa, 2016.
4. Basavaraj S. Anami, Karibasappa K. G., Formal Languages and Automata Theory, 1st Edition, Wiley India, 2011.

LMS, PPT, NOTES

Course Evaluation

Assessment | Weightage in Marks
In-Semester Assessment - I | 15
In-Semester Assessment - II | 15
Tutorial: Problem-solving assignment | 10
Tutorial: Programming assignment (LEX and YACC programs; building a scanner and parser for a given HLL) | 10
Total | 50

ISA-I and Tutorials | ISA-II | Total
30 | 20 | 50

What is a compiler?
A program that translates a source program written in one language into an equivalent program in another language, called the target program.

The compiler typically lowers the level of abstraction of the program.

Ideally, the compiler also improves the program it produces (e.g., so that it runs faster).

Source program → Compiler → Target program
(the compiler also reports error messages)

Example: a=b+c*2 is translated to
MOV Id3, R2
MUL #2, R2
MOV Id2, R1
ADD R2, R1
MOV R1, Id1
Compilers and Interpreters

“Compilation”: translation of a program written in a source language into a semantically equivalent program written in a target language.
Source program → Compiler → Target program (plus error messages)
Input → Target program → Output

“Interpretation”: performing the operations implied by the source program.
Source program + Input → Interpreter → Output (plus error messages)
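To make the contrast concrete, here is a minimal Python sketch under assumed simplifications: the source program is an already-parsed arithmetic expression (nested tuples), and the "target language" is a hypothetical stack-machine instruction set (PUSH, LOAD, ADD, MUL) invented for illustration. The compiler translates once and the resulting target program is then run on the input; the interpreter walks the source directly together with the input.

```python
# Toy source language: an AST is an int, a variable name (str),
# or a tuple (op, left, right) with op in {"+", "*"}.

def interpret(ast, env):
    """Interpretation: perform the operations implied by the source directly."""
    if isinstance(ast, int):
        return ast
    if isinstance(ast, str):
        return env[ast]
    op, left, right = ast
    l, r = interpret(left, env), interpret(right, env)
    return l + r if op == "+" else l * r


def compile_expr(ast):
    """Compilation: translate the AST into an equivalent stack-machine program."""
    if isinstance(ast, int):
        return [("PUSH", ast)]
    if isinstance(ast, str):
        return [("LOAD", ast)]
    op, left, right = ast
    return compile_expr(left) + compile_expr(right) + [("ADD",) if op == "+" else ("MUL",)]


def run(program, env):
    """Execute the target program on some input (the variable bindings)."""
    stack = []
    for instr in program:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            r, l = stack.pop(), stack.pop()
            stack.append(l + r if instr[0] == "ADD" else l * r)
    return stack.pop()


source = ("+", "b", ("*", "c", 2))      # b + c*2, already parsed
inputs = {"b": 1, "c": 5}
target = compile_expr(source)           # translated once ...
print(run(target, inputs))              # ... then run on the input: 11
print(interpret(source, inputs))        # the interpreter gives the same result: 11
```

The compiled `target` can be reused for many inputs without re-examining the source, whereas `interpret` re-walks the tree on every run.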

Language Processing System
• Any computer system is made of hardware and software.

• The hardware understands a language that humans cannot easily understand.

• So we write programs in a high-level language, which is easier for us to understand and remember.

• These programs are then fed through a series of tools (preprocessor, compiler, assembler, linker and loader, often invoked through an IDE) and OS components to obtain code that the machine can execute.

• This is known as a Language Processing System.

Language Processing System
Skeletal source program
↓ Preprocessor
Source program
↓ Compiler
Target assembly program
↓ Assembler
Relocatable object code
↓ Linker (also takes libraries and relocatable object files)
Relocatable linked object code
↓ Loader
Absolute machine code

Try for example: gcc -v myprog.c


The Phases of a Compiler
Source code
↓ Scanner
Tokens
↓ Parser
Syntax tree
↓ Semantic analyzer
Annotated tree
↓ Source code optimizer
Intermediate code
↓ Code generator
Target code
↓ Target code optimizer
Target code

Shown alongside the phases: Literal Table, Symbol Table, Error Handler

Course Overview

UNIT-wise distribution

UNIT | Chapter Number | Chapter Name | Number of hours
I | 1 | Introduction to compilers | 06
I | 2 | Finite Automata | 06
I | 3 | Introduction to Syntax Analysis | 04
II | 4 | Top Down Parsing | 08
II | 5 | Bottom up Parsing | 08
III | 6 | Semantic Analysis | 04
III | 7 | Intermediate Code Generation | 04

Chapter 01: Introduction to compilers
1. Brief History of Compilers
2. Translation process
3. Major data structures in Compilers
4. Chomsky hierarchy
5. Lexical analysis: Scanning process
6. Regular expressions for tokens
7. Lexical errors
8. Applications of Regular expressions

Chapter 02: Finite Automata

1. Introduction: Language, automata
2. From regular expressions to Deterministic Finite Automata (DFA)
3. ε-Nondeterministic Finite Automata (ε-NFA), NFA, DFA
4. DFA optimization
5. Finite automata as recognizer
6. Implementation of finite automata

Chapter 03: Introduction to Syntax Analysis

1. Introduction to Grammars
2. Context-Free Grammars (CFGs)
3. Ambiguity in Grammars and Languages
4. Role of parsing

Chapter 04: Top Down Parsing

1. Introduction
2. Left Recursion
3. Left factoring
4. LL(1) Parsing
5. FIRST and FOLLOW sets
6. Error recovery in Top Down Parsing

Chapter 05: Bottom up Parsing

1. Introduction
2. SLR(1) parsing
3. General LR(1) and LALR(1) Parsing
4. Error recovery in bottom up parsing

Chapter 04 and 05: Parsing

Chapter 06: Semantic Analysis

1. Attributes and attribute grammars
2. Algorithms for attribute computation
3. Symbol table
4. Data types and type checking

Chapter 07: Intermediate Code Generation

1. Intermediate code and data structures for code generation
2. Code generation of data structure references
3. Code generation of control statements

First computer - Charles Babbage in 1822

Brief History of Compiler
The first compiler was developed between 1954 and 1957:
- The FORTRAN language and its compiler, by a team at IBM led by John Backus
- The structure of natural language was studied at about the same time by Noam Chomsky

Brief History of Compiler (Cont..)
Related theories and algorithms were developed in the 1960s and 1970s:
The classification of languages: the Chomsky hierarchy
The parsing problem was pursued:
- Context-free languages, parsing algorithms
Symbolic methods for expressing the structure of the words of a programming language:
- Finite automata, regular expressions
Methods were developed for generating efficient object code:
- Optimization techniques, or code improvement techniques

Brief History of Compiler (Cont..)
Programs were developed to automate parts of compiler development, namely parsing and scanning:

Parser generators,
such as Yacc by Steve Johnson in 1975 for the Unix system

Scanner generators,
such as Lex by Mike Lesk for the Unix system at about the same time

Brief History of Compiler (Cont..)
Projects focused on automating the generation of other parts of a compiler:

- Automatic code generation was attempted during the late 1970s and early 1980s

- These projects met with less success, due to our less-than-perfect understanding of these parts

Brief History of Compiler (Cont..)
Recent advances in compiler design:

More sophisticated algorithms for inferring and/or simplifying the information contained in a program,
such as the unification algorithm of Hindley-Milner type checking

Window-based Integrated Development Environments (IDEs) that include editors, linkers, debuggers, and project managers

However, the basics of compiler design have not changed much in the last 20 years.

The Analysis-Synthesis Model of Compilation
There are two parts to compilation:

Analysis determines the operations implied by the source program, which are recorded in a tree structure.

Synthesis takes the tree structure and translates the operations therein into the target program.
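A minimal Python sketch of the two parts, assuming a toy language of single assignments built from identifiers, integer literals, `+` and `*`. The tree is nested tuples, and the "target program" here is three-address code rather than real machine code; the function names `analyze` and `synthesize` are illustrative, not from the course material.

```python
import re


def analyze(src):
    """Analysis: build a tree (nested tuples) for  id = expr  with + and *."""
    toks = re.findall(r"[A-Za-z_]\w*|\d+|[=+*]", src)
    pos = 0

    def advance():
        nonlocal pos
        if pos >= len(toks):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return toks[pos - 1]

    def peek():
        return toks[pos] if pos < len(toks) else None

    def factor():
        t = advance()
        if t.isdigit():
            return int(t)
        if not t.isidentifier():
            raise SyntaxError(f"unexpected {t!r}")
        return t

    def term():                       # term -> factor { * factor }
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():                       # expr -> term { + term }
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = advance()
    if advance() != "=":
        raise SyntaxError("expected '='")
    return ("=", target, expr())


def synthesize(tree):
    """Synthesis: walk the tree and emit three-address code."""
    code, count = [], 0

    def gen(node):
        nonlocal count
        if not isinstance(node, tuple):
            return str(node)          # identifier or constant
        op, left, right = node
        l, r = gen(left), gen(right)  # operands first (post-order)
        count += 1
        code.append(f"t{count} = {l} {op} {r}")
        return f"t{count}"

    _, name, rhs = tree
    code.append(f"{name} = {gen(rhs)}")
    return code


tree = analyze("a=b+c*2")
print(tree)               # ('=', 'a', ('+', 'b', ('*', 'c', 2)))
print(synthesize(tree))   # ['t1 = c * 2', 't2 = b + t1', 'a = t2']
```

Because the only thing passed between the two functions is the tree, either side could be replaced (a different source syntax, or a different target) without touching the other.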

The Grouping of Phases

Compiler front and back ends:
Front end: analysis (machine independent)
Back end: synthesis (machine dependent)

Compiler passes:
A collection of phases is performed either once (single pass) or multiple times (multi-pass).
Single pass: usually requires everything to be defined before being used in the source program.
Multi-pass: the compiler may have to keep the entire program representation in memory.
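The single-pass restriction can be illustrated with a minimal Python sketch. The program representation below, a list of hypothetical `("decl", name)` and `("use", name)` statements, is an assumption made only for this example.

```python
# Each statement is ("decl", name) or ("use", name).
program = [("use", "x"), ("decl", "x"), ("decl", "y"), ("use", "y")]


def single_pass(stmts):
    """One pass: a name must already be declared at the point where it is used."""
    declared, errors = set(), []
    for kind, name in stmts:
        if kind == "decl":
            declared.add(name)
        elif name not in declared:
            errors.append(f"'{name}' used before declaration")
    return errors


def two_pass(stmts):
    """Pass 1 collects every declaration; pass 2 checks uses against all of them.
    The whole statement list must stay in memory between the two passes."""
    declared = {name for kind, name in stmts if kind == "decl"}
    return [f"'{name}' undeclared" for kind, name in stmts
            if kind == "use" and name not in declared]


print(single_pass(program))  # ["'x' used before declaration"]
print(two_pass(program))     # []
```

The trade-off matches the slide: the single-pass checker needs no stored program but rejects the forward use of `x`, while the two-pass checker accepts it at the cost of holding the entire program.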
Compiler structure

source code → Front End → IR → Back End → target code

The Phases of a Compiler – Translation Process
Source code
↓ Scanner
Tokens
↓ Parser
Syntax tree
↓ Semantic analyzer
Annotated tree
↓ Source code optimizer
Intermediate code
↓ Code generator
Target code
↓ Target code optimizer
Target code

Shown alongside the phases: Literal Table, Symbol Table, Error Handler

The Scanner

• Lexical analysis: it collects sequences of characters into meaningful units called tokens
• An example: a[index]=4+2
• a identifier
• [ left bracket
• index identifier
• ] right bracket
• = assignment
• 4 number
• + plus sign
• 2 number
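The scanning step above can be sketched with Python's `re` module, assuming one regular expression per token class (the class names are taken from the list above). This is a minimal sketch: it relies on the order of the alternatives rather than the longest-match rule, whereas a scanner generator such as Lex compiles the patterns into a finite automaton.

```python
import re

# Token classes for the example, tried in this order.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("left bracket", r"\["),
    ("right bracket", r"\]"),
    ("assignment", r"="),
    ("plus sign", r"\+"),
    ("whitespace", r"\s+"),
]
# Group names cannot contain spaces, so each pattern is named g0, g1, ...
MASTER = re.compile("|".join(f"(?P<g{i}>{pat})" for i, (_, pat) in enumerate(TOKEN_SPEC)))


def scan(src):
    """Group characters into (token class, lexeme) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise ValueError(f"lexical error: unexpected {src[pos]!r} at position {pos}")
        kind = TOKEN_SPEC[int(m.lastgroup[1:])][0]
        if kind != "whitespace":
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens


for kind, lexeme in scan("a[index]=4+2"):
    print(lexeme, kind)
```

Running it prints the same eight lexeme/token pairs as the list above, and an unexpected character such as `$` raises a lexical error.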

The Parser

• Syntax analysis: It determines the structure of the program

• The result of syntax analysis is a parse tree or a syntax tree

• An example: a[index]=4+2
• Parse tree or syntax tree (abstract syntax tree)

The Parse Tree for a[index]=4+2

expression
└─ assign-expression
   ├─ expression
   │  └─ subscript-expression
   │     ├─ expression
   │     │  └─ identifier: a
   │     ├─ [
   │     ├─ expression
   │     │  └─ identifier: index
   │     └─ ]
   ├─ =
   └─ expression
      └─ additive-expression
         ├─ expression
         │  └─ number: 4
         ├─ +
         └─ expression
            └─ number: 2

Abstract Syntax Tree for a[index]=4+2

Assign-expression
├─ subscript-expression
│  ├─ identifier: a
│  └─ identifier: index
└─ additive-expression
   ├─ number: 4
   └─ number: 2
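One way a parser could build this abstract syntax tree is recursive descent. The following is a minimal sketch, assuming a tiny hypothetical grammar that covers only identifiers, integer literals, subscripts, `+` and a single assignment; nodes are nested tuples labelled like the slide.

```python
import re


def parse(src):
    """Recursive-descent parser for assignments built from identifiers,
    integer literals, subscripts and '+'. Returns the abstract syntax tree
    as nested tuples whose labels follow the slide."""
    toks = re.findall(r"[A-Za-z_]\w*|\d+|\S", src)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def primary():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        if tok.isdigit():
            return ("number", int(tok))
        if not tok.isidentifier():
            raise SyntaxError(f"unexpected {tok!r}")
        node = ("identifier", tok)
        if peek() == "[":                       # subscript-expression
            expect("[")
            index = additive()
            expect("]")
            node = ("subscript-expression", node, index)
        return node

    def additive():
        node = primary()
        while peek() == "+":
            expect("+")
            node = ("additive-expression", node, primary())
        return node

    target = additive()
    expect("=")
    tree = ("assign-expression", target, additive())
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree


print(parse("a[index]=4+2"))
```

Unlike the parse tree, the returned structure has no nodes for `[`, `]`, `=` or `+` and no chains of `expression` nodes; that information is implied by the node labels.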

The Semantic Analyzer

• The semantics of a program are its “meaning”, as opposed to its syntax, or structure.
• The semantic analyzer determines the part of a program's meaning that can be established before it runs, i.e., some of its runtime behavior prior to execution.

• Static semantics: declarations and type checking

• Attributes: the extra pieces of information computed by the semantic analyzer

• An example: a[index]=4+2
• The syntax tree annotated with attributes

The Annotated Syntax Tree

Assign-expression
├─ subscript-expression : integer
│  ├─ identifier a : array of integer
│  └─ identifier index : integer
└─ additive-expression : integer
   ├─ number 4 : integer
   └─ number 2 : integer
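The type attributes in this tree can be computed by a recursive bottom-up pass. Below is a minimal sketch, not necessarily the algorithm used in this course: `SYMBOLS` is a hypothetical symbol table holding the declared types, types are plain strings (an array type is the tuple `("array of", element type)`), and the tree uses the same tuple labels as the slide.

```python
# Hypothetical symbol table with the declarations assumed for a[index]=4+2.
SYMBOLS = {"a": ("array of", "integer"), "index": "integer"}


def type_of(node, annotations):
    """Compute the type attribute of `node` bottom-up, checking static
    semantics along the way; records (label, type) pairs in `annotations`."""
    kind = node[0]
    if kind == "number":
        t = "integer"
    elif kind == "identifier":
        if node[1] not in SYMBOLS:
            raise TypeError(f"undeclared identifier {node[1]!r}")
        t = SYMBOLS[node[1]]
    elif kind == "subscript-expression":
        arr_t = type_of(node[1], annotations)
        if type_of(node[2], annotations) != "integer":
            raise TypeError("array index must be an integer")
        if not isinstance(arr_t, tuple):
            raise TypeError("subscripted value is not an array")
        t = arr_t[1]                        # element type of the array
    elif kind == "additive-expression":
        left, right = type_of(node[1], annotations), type_of(node[2], annotations)
        if left != "integer" or right != "integer":
            raise TypeError("operands of + must be integers")
        t = "integer"
    elif kind == "assign-expression":
        lhs, rhs = type_of(node[1], annotations), type_of(node[2], annotations)
        if lhs != rhs:
            raise TypeError(f"cannot assign {rhs} to {lhs}")
        t = lhs
    else:
        raise ValueError(f"unknown node kind {kind!r}")
    annotations.append((kind, t))
    return t


tree = ("assign-expression",
        ("subscript-expression", ("identifier", "a"), ("identifier", "index")),
        ("additive-expression", ("number", 4), ("number", 2)))
notes = []
print(type_of(tree, notes))   # integer
for label, t in notes:
    print(label, t)
```

A real compiler would usually store the attribute in a field of each tree node instead of a separate list; the list just makes the annotations easy to print.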

The Source Code Optimizer

• The earliest point at which most optimization steps can be performed is just after semantic analysis
• Code improvements that depend only on the source code can be performed as a separate phase
• Individual compilers exhibit a wide variation in the kinds of optimization they perform as well as in where they place them

• An example: a[index]=4+2
• Constant folding performed directly on the annotated tree
• Using intermediate code: three-address code, p-code

Optimizations on Annotated Tree

Assign-expression
├─ subscript-expression : integer
│  ├─ identifier a : array of integer
│  └─ identifier index : integer
└─ additive-expression : integer
   ├─ number 4 : integer
   └─ number 2 : integer

Optimizations on Annotated Tree

Assign-expression
├─ subscript-expression : integer
│  ├─ identifier a : array of integer
│  └─ identifier index : integer
└─ number 6 : integer
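The folding step shown in these two trees can be sketched as a bottom-up rewrite. This minimal Python sketch assumes the tuple-based tree labels used on the slides and omits the type attributes for brevity; it folds only `additive-expression` nodes whose operands are both number literals.

```python
def fold(node):
    """Constant folding on the syntax tree: an additive-expression whose two
    operands are number literals is replaced by a single number node."""
    kind = node[0]
    if kind in ("number", "identifier"):
        return node                                   # leaves are unchanged
    children = [fold(child) for child in node[1:]]    # fold subtrees first
    if kind == "additive-expression" and all(c[0] == "number" for c in children):
        return ("number", children[0][1] + children[1][1])
    return (kind, *children)


tree = ("assign-expression",
        ("subscript-expression", ("identifier", "a"), ("identifier", "index")),
        ("additive-expression", ("number", 4), ("number", 2)))
print(fold(tree))
# ('assign-expression', ('subscript-expression', ('identifier', 'a'),
#  ('identifier', 'index')), ('number', 6))
```

Folding children before the parent means nested constant sums such as `(1+2)+3` collapse fully in a single traversal.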

Optimization on Intermediate Code

t = 4 + 2
a[index] = t
↓ constant folding
t = 6
a[index] = t
↓ constant propagation
a[index] = 6
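The same improvement on intermediate code can be sketched as one pass over straight-line three-address code. This is a minimal sketch under stated assumptions: instructions are hypothetical `(dest, expr)` pairs, temporaries are recognised by names starting with `t`, and each temporary is assigned once before it is used.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def optimize(code):
    """Constant folding and propagation over straight-line three-address code.
    Each instruction is (dest, expr); expr is an int, a name, or (op, x, y).
    Assumes temporaries are named t... and each is assigned once before use."""
    consts = {}                                  # temporaries known to hold a constant
    out = []
    for dest, expr in code:
        if isinstance(expr, tuple):
            op, x, y = expr
            x, y = consts.get(x, x), consts.get(y, y)     # propagate known constants
            if isinstance(x, int) and isinstance(y, int):
                expr = OPS[op](x, y)                      # fold, e.g. 4 + 2 -> 6
            else:
                expr = (op, x, y)
        else:
            expr = consts.get(expr, expr)                 # propagate, e.g. t -> 6
        if isinstance(expr, int) and dest.startswith("t"):
            consts[dest] = expr                  # remember the value; drop the assignment
        else:
            out.append((dest, expr))
    return out


ir = [("t", ("+", 4, 2)), ("a[index]", "t")]     # t = 4 + 2 ; a[index] = t
for dest, expr in optimize(ir):
    print(f"{dest} = {expr}")                     # a[index] = 6
```

Dropping the assignment to `t` is only safe under the single-assignment assumption; a general optimizer would need data-flow analysis to know that `t` is not used elsewhere.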

The Code Generator

• It takes the intermediate code (IR) and generates code for the target machine

• The properties of the target machine become the major factor:
• which instructions to use and how data is represented

• An example: a[index]=4+2
• Code sequence in a hypothetical assembly language

A possible code sequence for a[index]=6

MOV R0, index
MUL R0,2
MOV R1,&a
ADD R1,R0
MOV *R1,6
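The sequence above follows a fixed template for storing a constant into an array element. A minimal Python sketch of such a template-based generator, emitting the same hypothetical assembly, might look like this; `gen_array_store` is an illustrative name, and `elem_size=2` assumes 2-byte integers, which is what the `MUL R0,2` step suggests.

```python
def gen_array_store(array, index, value, elem_size=2):
    """Emit hypothetical assembly for  array[index] = value  (value a constant).
    elem_size=2 assumes 2-byte integers, matching MUL R0,2 above."""
    return [
        f"MOV R0, {index}",        # R0 <- value of index
        f"MUL R0,{elem_size}",     # R0 <- byte offset of the element
        f"MOV R1,&{array}",        # R1 <- base address of the array
        "ADD R1,R0",               # R1 <- address of array[index]
        f"MOV *R1,{value}",        # store the constant through R1
    ]


for instr in gen_array_store("a", "index", 6):
    print(instr)
```

Changing `elem_size` shows how a property of the target machine (the size of an integer) directly shapes the generated code.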

The Target Code Optimizer

• It improves the target code generated by the code generator by:
• Choosing addressing modes
• Replacing instructions (e.g., a multiplication by 2 with a shift)
• Eliminating redundant operations

Before:
MOV R0, index
MUL R0,2
MOV R1,&a
ADD R1,R0
MOV *R1,6

After:
MOV R0, index
SHL R0
MOV &a[R0],6
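These two improvements can be expressed as peephole rewrites over the instruction list. The following is a minimal Python sketch over the same hypothetical assembly; the patterns are assumptions chosen to match this example, and the addressing-mode rewrite is only valid if R1 is not used afterwards, which this sketch does not check.

```python
import re


def peephole(code):
    """Two target-code improvements as pattern rewrites:
    1. instruction replacement: MUL Rx,2 -> SHL Rx (shift left = multiply by 2)
    2. addressing mode: MOV R1,&a / ADD R1,R0 / MOV *R1,v -> MOV &a[R0],v
       (assumes the combined register, here R1, is dead afterwards)"""
    code = [re.sub(r"^MUL (R\d+),2$", r"SHL \1", ins) for ins in code]
    patterns = (r"MOV (R\d+),&(\w+)$", r"ADD (R\d+),(R\d+)$", r"MOV \*(R\d+),(\w+)$")
    out, i = [], 0
    while i < len(code):
        window = code[i:i + 3]
        m = [re.match(p, s) for p, s in zip(patterns, window)]
        if len(window) == 3 and all(m) and m[0][1] == m[1][1] == m[2][1]:
            out.append(f"MOV &{m[0][2]}[{m[1][2]}],{m[2][2]}")   # indexed addressing
            i += 3
        else:
            out.append(code[i])
            i += 1
    return out


before = ["MOV R0, index", "MUL R0,2", "MOV R1,&a", "ADD R1,R0", "MOV *R1,6"]
for instr in peephole(before):
    print(instr)      # MOV R0, index / SHL R0 / MOV &a[R0],6
```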

Principal Data Structures for Communication among Phases

• TOKENS
• A scanner collects characters into a token, represented as a value of an enumerated data type for tokens
• It may also preserve the string of characters or other derived information, such as the name of an identifier or the value of a number token
• A single global variable or an array of tokens

• THE SYNTAX TREE
• A standard pointer-based structure generated by the parser
• Each node represents information collected by the parser or by later phases; this information may be dynamically allocated or stored in the symbol table
• A node requires different attributes depending on the kind of language structure it represents, which may be implemented as a variant record.
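A minimal Python sketch of these two structures for the running example `a[index]=4+2`. Python has no variant records, so a dataclass whose inapplicable fields stay `None` stands in for one (in C this would typically be a struct containing a union); all type and field names here are illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class TokenType(Enum):
    """Enumerated data type for token classes (a small illustrative subset)."""
    ID = auto()
    NUM = auto()
    LBRACKET = auto()
    RBRACKET = auto()
    ASSIGN = auto()
    PLUS = auto()


@dataclass
class Token:
    kind: TokenType
    lexeme: str                      # the preserved string of characters
    value: Optional[int] = None      # derived information, e.g. a number's value


class NodeKind(Enum):
    ASSIGN = auto()
    SUBSCRIPT = auto()
    ADD = auto()
    ID = auto()
    CONST = auto()


@dataclass
class TreeNode:
    """Pointer-based syntax tree node. Fields that do not apply to a node's
    kind stay None, which approximates a variant record."""
    kind: NodeKind
    children: List["TreeNode"] = field(default_factory=list)
    name: Optional[str] = None       # used by ID nodes
    value: Optional[int] = None      # used by CONST nodes
    data_type: Optional[str] = None  # attribute filled in later by semantic analysis


# a[index]=4+2 as a token sequence and as a syntax tree
tokens = [Token(TokenType.ID, "a"), Token(TokenType.LBRACKET, "["),
          Token(TokenType.ID, "index"), Token(TokenType.RBRACKET, "]"),
          Token(TokenType.ASSIGN, "="), Token(TokenType.NUM, "4", 4),
          Token(TokenType.PLUS, "+"), Token(TokenType.NUM, "2", 2)]
tree = TreeNode(NodeKind.ASSIGN, [
    TreeNode(NodeKind.SUBSCRIPT, [TreeNode(NodeKind.ID, name="a"),
                                  TreeNode(NodeKind.ID, name="index")]),
    TreeNode(NodeKind.ADD, [TreeNode(NodeKind.CONST, value=4),
                            TreeNode(NodeKind.CONST, value=2)]),
])
print(tree.children[1].children[0].value)   # 4
```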

Principal Data Structures for Communication among Phases

• THE SYMBOL TABLE
• Keeps information associated with identifiers: functions, variables, constants and data types
• Interacts with almost every phase of the compiler
• The analysis phases enter values and the synthesis phases use the stored values
• Operations: insertion, deletion and access
• Access operations need to be (close to) constant-time
• One or several hash tables are often used

• THE LITERAL TABLE
• Stores constants and strings, reducing the size of the program
• Quick insertion and lookup are essential
• No duplicates
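A minimal Python sketch of both tables, relying on the fact that a Python `dict` is a hash table, so insertion, access and deletion take expected constant time. The class and method names, and the attribute keys used in the example, are illustrative assumptions.

```python
class SymbolTable:
    """Maps identifier names to their attributes. A Python dict is a hash
    table, so insertion, access and deletion take expected constant time."""

    def __init__(self):
        self._table = {}

    def insert(self, name, **attributes):
        self._table[name] = attributes

    def lookup(self, name):
        return self._table.get(name)      # None if the name is unknown

    def delete(self, name):
        self._table.pop(name, None)


class LiteralTable:
    """Stores each constant or string once; later phases refer to it by index."""

    def __init__(self):
        self._index = {}                  # literal -> position (hash lookup)
        self.literals = []

    def add(self, literal):
        if literal not in self._index:    # no duplicates
            self._index[literal] = len(self.literals)
            self.literals.append(literal)
        return self._index[literal]


symtab = SymbolTable()
symtab.insert("a", kind="variable", type="array of integer")   # entered during analysis
symtab.insert("index", kind="variable", type="integer")
print(symtab.lookup("a")["type"])        # read during synthesis: array of integer

literals = LiteralTable()
print(literals.add("hello"), literals.add(3.14), literals.add("hello"))  # 0 1 0
```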

Principal Data Structures for Communication among Phases

• INTERMEDIATE CODE
• Kept as an array of text strings, a temporary text file, or a linked list of structures, depending on the kind of intermediate code (e.g. three-address code and p-code)
• Optimization is performed here
• Should be easy to reorganize

• TEMPORARY FILES
• Hold the results of intermediate steps during compilation
• Instead of keeping the complete program in memory, the compiler can write small chunks to temporary files on the fly

Compiler-Construction Tools

• Software development tools are available to implement one or more compiler phases:
• Scanner generators (LEX tool)
• Parser generators (YACC tool)
• Syntax-directed translation engines
• Automatic code generators
• Data-flow engines

Chomsky Hierarchy

Lexical analysis and Lexical errors
