compilers_intro_jan2025
SP 2025
Read the basic course information : Course Objectives, Course Outcomes, CO-PO-PSO mapping; syllabus, from the home page.
Course Objectives
This course enables the students to:
1. Understand the need of compiler in Computer Engineering
2. Provide a thorough understanding of design, working, and implementation of
programming languages
3. Trace the major concept areas of language translation and compiler design
4. Create an awareness of the functioning and complexity of modern compilers
Course Outcomes
After the completion of this course, students will be able to:
1. Analyze the need of compiler for interfacing between user and machine
2. Explain the role of several phases of compilation process
3. Create an awareness of the function and complexity of modern compilers
4. Outline the major concept areas of language translation and compiler design
5. Develop a comprehensive Compiler for a given language
6. Apply this knowledge to develop tools for natural language processing
Course Conduct
4 contact hrs / week : 3 lectures and 1 tutorial
Prescribed Evaluation scheme will be followed
Prepare for fill in the blanks type of questions in quizzes
Attendance norms of the institute apply – do not approach
the instructors for relaxation in attendance
A direct positive correlation was observed in FLAT between active
presence and skill enhancement / better performance
Course and the associated laboratory course CS 334 will
be synchronized to reinforce the concepts and skills.
Distinctive Features of a Compilers Course
Unique course in CSE discipline. It bridges deep theoretical concepts to the most
recent advances in architectures and operating systems
Awareness of compiler features helps in increased programming productivity
(invest time and effort to know the capabilities of your compiler)
Two drivers for Research in compilers - design of new programming languages
and invention of new architectures.
Research has produced spectacular enhancements in technology. Designing a
working compiler 3 decades ago required several person-years.
Today, given m different languages and n different architectures, m*n compilers
can be generated in hours / days.
Research in compilers continues, especially for automatic optimization and automatic
parallelization of sequential programs.
The huge leaps in hardware technology have not been efficiently exploited to date
because software tools have not been able to catch up with the native
computational power of contemporary High Performance Architectures.
Complexity of Software Systems
The following table gives an idea about the size and complexity of large
software systems from 3 highly popular domains.
The difference in the level of abstraction used by HLLs and LLLs is huge.
HLLs focus on application domain computations while LLLs depend on
the target architecture.
The translation process used in a compiler is non-trivial. A compiler gradually
brings down the HLL computations to instructions supported by an
architecture.
Another view of a Compiler
Source program -> [ compiler: Front End -> Intermediate Code -> Back End ] -> Target Code
The front-end of a compiler performs HLL specific analyses and
produces semantically equivalent code (intermediate code)
whose abstraction level is significantly lower than HLL but still
higher than LLL.
The back-end of a compiler analyses the intermediate code
produced by the front-end and is responsible for generating
target code.
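To make the idea of intermediate code concrete, the sketch below writes one HLL statement, a[i] = a[i] + a[j], in a three-address style where each statement has at most one operator. The function names and the temporaries t1 to t5 are hypothetical, and the form shown is only an illustration of the lower abstraction level, not the intermediate representation of any particular compiler.

```c
/* High-level form, as written in the source program. */
void update_hll(int a[], int i, int j)
{
    a[i] = a[i] + a[j];
}

/* The same computation in a three-address style: each statement has
   at most one operator, and array accesses use explicit addresses.
   The temporaries t1..t5 are hypothetical names for illustration. */
void update_three_address(int a[], int i, int j)
{
    int *t1 = a + i;      /* t1 = address of a[i] */
    int  t2 = *t1;        /* t2 = a[i]            */
    int *t3 = a + j;      /* t3 = address of a[j] */
    int  t4 = *t3;        /* t4 = a[j]            */
    int  t5 = t2 + t4;    /* t5 = t2 + t4         */
    *t1 = t5;             /* a[i] = t5            */
}
```

Both functions compute the same result; the second is closer to what a back-end can map to machine instructions, yet it is still independent of any specific architecture.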
Compiler Phase and Pass
Phase and pass are two commonly used terms in compiler
design.
The process of compilation is carried out in distinct logical
steps, which are called phases.
A compiler typically has 7 phases : lexical analysis, syntax
analysis, semantic analysis, intermediate code generation,
run time environment, code optimization and code
generation
A pass denotes a processing of the entire source code or its
equivalent form. Industrial quality compilers use large
number of passes in their translation process.
Compiler and Support Software
A compiler is not an end to end software. It uses several
system software tools to accomplish its goal.
Source program -> pre-processor -> Preprocessed code -> compiler -> assembly code -> assembler
COMPILERS : EXAMPLE DRIVEN APPROACH
Simple C program
#include <stdio.h>
int main()
{
  int a[1000], i, j;
  int sum = 10000;
  for (i = 0; i < 1000; i++) a[i] = i;
  for (i = 0; i < 1000; i++)
    for (j = 0; j < i*i; j++)
      a[i] = a[i] + a[j];
  printf(" sum : %d \n", sum);
}
C program – Manual Analysis
Let us manually analyse “firstprog.c”
What role does the statement #include <stdio.h> play?
Someone has to supply the prototype of the i/o function printf()
in order to check whether the call is valid.
Let us assume there exists a s/w program that will do this task.
In the real world this is the task of a tool called the C pre-processor, named
“cpp”, which is run on the source before compilation proper and handles
“#include ...” and similar other directives.
For the manual processing we shall ignore this statement for
the present.
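As a small illustration of what the header contributes, the sketch below declares the prototype of printf() by hand instead of including <stdio.h>. The function report_sum is a hypothetical helper introduced only for this example; the hand-written declaration is a simplification of what a real <stdio.h> contains.

```c
/* Hand-written prototype: roughly what <stdio.h> supplies for printf.
   It stands in for the #include line for illustration only. */
int printf(const char *format, ...);

/* Hypothetical helper: prints the sum in the same format as firstprog.c
   and returns printf's result (the number of characters written). */
int report_sum(int sum)
{
    return printf(" sum : %d \n", sum);
}
```

With the prototype visible, the compiler can check that the call passes a format string first; without any declaration, it has nothing to check the call against.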
C program – Manual Analysis
The remaining statements in “firstprog.c”
The pretty indented display of the text in the earlier slide was due to an
editor which interpreted certain characters before displaying.
The raw text in the file is shown below, where the symbol
↲ represents a newline and ↔ denotes a single white space. This is how a
compiler sees the program at the first instance.
↲int↔main()↲{↲int↔↔a[1000],↔i,↔j;↲↔↔int↔sum↔=↔10000;
↲↔↔for↔(i↔=↔0;↔i↔<↔1000;↔i++)↔a[i]↔=↔i;↲ ...............................
The first task is then to break the stream of characters shown above into
meaningful units of language C.
Basic Elements of C program
Breaking the program string into meaningful words of the language gives the
smallest logical units of the language.
Why is “main” a word and not “main()” is decided by the programming
language (PL) specifications and not by a compiler.
The elements in the red boxes are called tokens (lexemes) and a compiler first
converts the character stream into a sequence of tokens.
The immediate next task is to determine if the consecutive tokens, when
grouped together, describe some structure of the PL.
A PL defines the linguistic structures that constitute a program in the language.
For example, a C program is a collection of functions and a function has a
hierarchical structure as shown below.
The structure of func-defn is defined using the sub-structures ret-type, func-name,
arguments and body, as shown below.
func-defn
    ret-type func-name ( arguments ) { body }
Structure of a Program
Let us intuitively apply the ideas to the C program to discover its structure.
The input “int main() {“
partially matches the program structure as displayed in the tree below.
func-defn
    ret-type   func-name   (   arguments   )   {   body   }
    int        main            ε
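The partial match of “int main() {“ against the func-defn structure can be sketched as a check over a token sequence. This is a minimal illustration under stated assumptions: the names is_identifier and matches_func_header are hypothetical, only int and void are accepted as ret-type, keywords are not excluded from identifiers, and the argument list must be empty (ε).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if s is spelled like a C identifier (keywords not excluded). */
static int is_identifier(const char *s)
{
    if (!(isalpha((unsigned char)*s) || *s == '_')) return 0;
    for (s++; *s; s++)
        if (!(isalnum((unsigned char)*s) || *s == '_')) return 0;
    return 1;
}

/* Returns 1 if the first five tokens match
   ret-type func-name ( arguments ) {   with arguments = ε (empty). */
int matches_func_header(const char *tok[], size_t n)
{
    if (n < 5) return 0;
    /* Assumption: only int and void are accepted as ret-type here. */
    if (strcmp(tok[0], "int") != 0 && strcmp(tok[0], "void") != 0) return 0;
    if (!is_identifier(tok[1])) return 0;
    return strcmp(tok[2], "(") == 0 &&
           strcmp(tok[3], ")") == 0 &&
           strcmp(tok[4], "{") == 0;
}
```

A real syntax analyser does not hard-code one structure in this way; it is driven by the grammar of the language and builds the tree as it goes.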
Structure of a Program
The structure of a function body is further elaborated into a block-statement
which in turn is specified by one or more declaration statements followed by a
list of statements.
func-defn
decl-stmts stmt-list
Structure of a Program
The structure of a single declaration statement is used to further extend the
tree under decl-stmts – it consists of a type specifier followed by a list of
variable declarations.
func-defn
decl-stmts stmt-list
decl-stmt decl-stmts
type var-list ;
The tree obtained when the input examined so far is
int main ( ) { int a [ 1000 ] , i , is shown on the next page.
Structure of a Program
func-defn
decl-stmts stmt-list
decl-stmt decl-stmts
type var-list ;
Semantic Analysis
Intermediate Code Generation
Run Time Environment
Compiler Optimizations
Intermediate code optimization is an optional phase which is enabled by
a user if so required.
Source program -> Front end -> Int code -> Optimization Phase -> Int code -> Back end -> Target program
Purpose is to improve the quality of intermediate code generated by the
front end of the compiler.
The capability of this phase is illustrated through examples.
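As a first hand-worked illustration, consider a miniature function in which an array is filled and updated but never contributes to the returned value. The two function names below are hypothetical, and the "optimized" version is written by hand to show what dead code elimination and constant propagation could produce; it is not the output of any particular compiler.

```c
/* Before: the array a is written and updated, but the returned value
   depends only on sum, which is never modified. */
int sum_unoptimized(void)
{
    int a[10], i;
    int sum = 10000;
    for (i = 0; i < 10; i++) a[i] = i;
    for (i = 1; i < 10; i++) a[i] = a[i] + a[i - 1];
    return sum;
}

/* After (hand-optimized): the loops are dead code because nothing
   observable depends on a, and sum is a known constant. */
int sum_optimized(void)
{
    return 10000;
}
```

Both functions return the same value for every call, which is the correctness condition any optimization must satisfy.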
Compiler Optimizations
Consider the following C program.
int main()
{ int a[1000], i, j;
int sum = 10000;
for (i = 0; i < 1000; i++) a[i] = i;
for (i = 0; i < 1000; i++)
for (j = 0; j < i*i; j++) a[i] = a[i] + a[j];
printf(" sum : %d \n", sum);
}
Compiler Optimizations
Compile this program with
$ gcc -S -fverbose-asm firstprog.c -o unopt.s
Now compile using the -O2 switch
Code Generation
Compiler Concepts Learned Through Example
THEME 2
STRUCTURE OF A COMPILER
Design of a Compiler
Regular expressions: Finite Automata : Lexical Analyser
Context free grammars : Pushdown automata : Parser or
Syntax Analyser
Nested Symbol Tables : Scope and lifetime analysis :
Data structure and algorithms
Memory layout and activation records : Runtime
environments : Architecture and OS
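The link between regular expressions, finite automata and a lexical analyser can be sketched with a tiny deterministic finite automaton for the regular expression [A-Za-z_][A-Za-z0-9_]*, which describes identifiers. The state names and the functions step and accepts_identifier are hypothetical, chosen for this illustration only.

```c
#include <ctype.h>

/* States of the DFA; IN_ID is the only accepting state. */
enum state { START, IN_ID, REJECT };

/* One DFA transition for the regular expression [A-Za-z_][A-Za-z0-9_]* */
static enum state step(enum state s, char c)
{
    unsigned char u = (unsigned char)c;
    switch (s) {
    case START: return (isalpha(u) || c == '_') ? IN_ID : REJECT;
    case IN_ID: return (isalnum(u) || c == '_') ? IN_ID : REJECT;
    default:    return REJECT;   /* REJECT is a trap state */
    }
}

/* Runs the DFA over the whole string and reports acceptance. */
int accepts_identifier(const char *s)
{
    enum state st = START;
    for (; *s; s++) st = step(st, *s);
    return st == IN_ID;
}
```

Here “main” and “a1000” are accepted, while “1000” and “main()” are rejected; a lexical analyser generator builds automata of this kind from the regular expressions of all token classes.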
Design of a Compiler
Semantic Analysis [Context Sensitive Issues such as Type
checking; intermediate code generation] : Syntax directed
translation scheme : Tree, graph; Data structure and algorithms.
Parameter passing mechanisms : Semantic Analyser :
Programming Languages.
Optimization of intermediate code : Optional pass : graph, lattice
theory, solution of equations; Discrete Structures.
Code generation : Compiler back-end design: Algorithms and
complexity; Architecture, Assembly language, OS.
Theory and Practice are intimately connected
THEME 3
Experimentation With GNU C Compiler
Open Source Production Quality Compiler
Simple C program
#include <stdio.h>
int main()
{
  int a[1000], i, j;
  int sum = 10000;
  for (i = 0; i < 1000; i++) a[i] = i;
  for (i = 0; i < 1000; i++)
    for (j = 0; j < i*i; j++)
      a[i] = a[i] + a[j];
  printf(" sum : %d \n", sum);
}
Compiler Features
Most compilers provide a host of features for programming ease and
enhanced productivity.
Illustrated with the GNU C compiler (gcc) & C++ compiler (g++)
A few compiler options that are generally useful are listed below. Read
the online manual, man gcc or info gcc, for more details.
-fverbose-asm -S -c -o
-v -O2 -E
-fdump-tree-all and many many others
Experiments to illustrate the working of a compiler through a simple
example.
Checking Correctness
The outputs produced after the 6 optimizations are
performed in the order discussed are as follows.
10 16 30 36 50 56 70 76 90 96    original source
10 16 30 36 50 56 70 76 90 96    after optimization 1
10 16 30 36 50 56 70 76 90 96    after optimizations 1, 2
10 16 30 36 50 56 70 76 90 96    after optimizations 1 to 3
10 16 30 36 50 56 70 76 90 96    after optimizations 1 to 4
10 16 30 36 50 56 70 76 90 96    after optimizations 1 to 5
10 16 30 36 50 56 70 76 90 96    after optimizations 1 to 6
10 16 30 36 50 56 70 76 90 96    after optimization by gcc
Compiling for Dumps