Unit 1 Compiler Design
System Programming
System programming involves designing and writing computer programs that allow
the computer hardware to interface with the programmer and the user, leading to the
effective execution of application software on the computer system.
Typical system programs include the operating system and firmware; programming
tools such as compilers, assemblers, interpreters, loaders and linkers; I/O routines
and schedulers; and the run-time libraries of programming languages.
This kind of programming requires some level of hardware knowledge and is machine
dependent; the system programmer must therefore know the intended hardware on
which the software is required to operate.
Additionally, the programmer may make some assumptions about the hardware and other
system components. The software is usually written in a low-level programming
language that can operate efficiently in a resource-constrained environment, with
little runtime overhead and a small run-time library, or none at all. A low-level
language gives direct control over memory access, and performance-critical parts of
a program can be written directly in assembly language. Most system programs are
written in languages such as C and C++, which are not assembly languages but still
allow this kind of low-level control.
What is TOC?
The Theory of Computation is a branch of computer science and mathematics that focuses on
determining which problems can be solved mechanically, using an algorithm or a set of
programming rules. It is also concerned with how efficiently an algorithm can compute
the solution.
What problems can the machine solve? What problems can’t it solve?
How fast can a machine solve a problem?
How much memory space does a machine need to solve a problem?
DR SHALOO DADHEECH. 1
The Theory of Computation rests on the fact that computers can’t solve all problems. Every
machine has limitations, and the Theory of Computation aims to discover them.
A computational model is given an input and, using the algorithm it implements, decides
whether to accept or reject that input.
For example, you can create a machine and design it so that it only accepts red objects. The
algorithm is pretty straightforward, as represented by the image below. The red square is
accepted, and the model rejects the yellow square.
The Theory of Computation is made up of three theories: automata theory, computability theory
and computational complexity theory.
1. Automata Theory: Refers to the analysis of how machines work to solve a problem.
One of the most famous inventions that embody these concepts is the Turing machine, created
by Alan Turing in the 1930s. The idea is that a Turing machine can solve any problem that an
algorithm can solve. Conversely, a problem that no algorithm can solve cannot be solved by a
Turing machine either.
In the real world, the theory has helped with several projects. For one, a group of computer
scientists used the theory to test a book vending machine design. Register machine models are
also based on the Theory of Computation, among other things.
The theory is applicable to other fields besides computer science, such as engineering and life
and social sciences. It is also a fundamental concept in artificial intelligence (AI), natural
language processing (NLP), neural networks and the like.
The Theory of Computation is a basic concept that computer scientists should understand. After
all, it reflects the reality of life—no single solution can solve all problems. As such, it is often
included in the computer science curriculum of universities.
Given a set of problems, it would be a waste of time and effort for computer scientists to create
one algorithm to solve all of them. The Theory of Computation dictates that developers first
determine which problems can be solved algorithmically and which ones can’t. After that, they
would also need to find out how efficient the algorithm would be in terms of time and memory
space spent solving the problem.
Now, let’s look at the basic terms that are important and frequently used in the
Theory of Computation.
Symbol:
A symbol (often also called a character) is the smallest building block, which
can be any letter, digit, or picture.
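Alphabets (Σ):
An alphabet is a finite, non-empty set of symbols, for example Σ = {a, b} or Σ = {0, 1}.
String:
A string is a finite sequence of symbols from an alphabet, for example abba over Σ = {a, b}.
The length |w| of a string w is the number of symbols in it; the empty string ε has length 0.
Introduction to Language
A language is a set of strings over an alphabet Σ; for example, the set of all strings over
{0, 1} that contain at least one 1.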
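A minimal sketch of these terms, assuming the hypothetical alphabet Σ = {a, b}: a string is over Σ exactly when each of its symbols belongs to Σ, so it can be checked symbol by symbol. The macro SIGMA and the function is_string_over are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical alphabet Σ = {a, b}, written as the string of its symbols. */
#define SIGMA "ab"

/* True if every symbol of w belongs to sigma, i.e. w is a string over sigma.
   The empty string ε is a string over every alphabet. */
bool is_string_over(const char *w, const char *sigma)
{
    assert(w != NULL && sigma != NULL);
    for (; *w != '\0'; w++)
        if (strchr(sigma, *w) == NULL)   /* symbol not in the alphabet */
            return false;
    return true;
}
```

Here is_string_over("abba", SIGMA) is true and is_string_over("abc", SIGMA) is false, because c is not in Σ. The length |w| of such a string is simply strlen(w).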
Grammar
Grammar in the theory of computation is a finite set of formal rules for generating
syntactically correct sentences.
Formally, a grammar is defined as a 4-tuple:
G=(V,T,P,S)
G is the grammar, which consists of a set of production rules. It is used to generate the
strings of a language.
T is the finite set of terminal symbols. Terminals are denoted by lower-case letters.
V is the finite set of non-terminal symbols. Non-terminals are denoted by capital letters.
P is the set of production rules, which are used to replace the non-terminal symbols on the
left side of a production in a string with the terminals and/or non-terminals on its right side.
S is the start symbol (S ∈ V) from which strings are derived.
Terminal Symbols - Terminal symbols are the components of the sentences that are generated
using a grammar and are denoted using lower-case letters like a, b, c, etc.
Non-Terminal Symbols - Non-terminal symbols take part in the generation of a sentence
but are not components of the final sentence. These symbols are also called auxiliary
symbols or variables. They are represented using capital letters like A, B, C, etc.
Example 1
Consider a grammar
G = (V , T , P , S)
Where,
V={S,A,B} ⇒ Non-Terminal symbols
T={a,b} ⇒ Terminal symbols
Production rules P = { S → ABa , A → BB , B → ab , AA → b }
S ⇒ Start symbol
Example 2
Consider a grammar
G=(V,T,P,S)
Where,
V = {S, A, B} ⇒ Non-terminal symbols
T = {0, 1} ⇒ Terminal symbols
Production rules P = { S → A1B,
A → 0A | ε,
B → 0B | 1B | ε }
S ⇒ Start symbol
Union
The union of two languages L1 and L2 is the set of strings that are in L1, in L2, or in both:
L1 ∪ L2 = {w | w ∈ L1 or w ∈ L2}
Example: L1 ∪ L2 = {00, 10, 01, 11}
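For finite languages the union can be computed directly. A minimal sketch, assuming each language is stored as an array of C strings (a representation chosen only for illustration): every string of L1 is copied, then each string of L2 that is not already present is appended, so a string in both languages appears once. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if string w occurs among the n strings of lang. */
static bool contains(const char *const lang[], size_t n, const char *w)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(lang[i], w) == 0)
            return true;
    return false;
}

/* Writes L1 ∪ L2 into out (capacity at least n1 + n2) and returns its
   size. Assumes L1 itself contains no duplicate strings. */
size_t language_union(const char *const l1[], size_t n1,
                      const char *const l2[], size_t n2,
                      const char *out[])
{
    assert(out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < n1; i++)
        out[n++] = l1[i];                 /* every string of L1 */
    for (size_t i = 0; i < n2; i++)
        if (!contains(out, n, l2[i]))     /* skip strings already present */
            out[n++] = l2[i];
    return n;
}
```

For example, with the hypothetical languages L1 = {00, 10} and L2 = {01, 11, 00}, the result holds the four strings 00, 10, 01 and 11.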
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generator
5. Code optimizer
6. Code generator
Together, these phases transform the source code: it is divided into tokens, parse trees are
built from the tokens, and the program is then checked, translated and optimized by the later phases.
Example:
x = y + 10
Lexeme   Token
x        Identifier
=        Assignment operator
y        Identifier
+        Addition operator
10       Number
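A minimal hand-written scanner for statements like x = y + 10, assuming identifiers are a letter followed by letters or digits, numbers are unsigned integers, and the only operators are = and +. The token names and the function next_token are hypothetical; production lexers are often generated by tools such as lex/flex instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token classes for the example statement x = y + 10. */
typedef enum { TOK_IDENTIFIER, TOK_ASSIGN, TOK_PLUS, TOK_NUMBER,
               TOK_END, TOK_ERROR } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;   /* first character of the lexeme in the input */
    size_t length;       /* number of characters in the lexeme */
} Token;

/* Scans one token starting at *pos and advances *pos past it. */
Token next_token(const char **pos)
{
    assert(pos != NULL && *pos != NULL);
    const char *p = *pos;
    while (isspace((unsigned char)*p))
        p++;                                   /* skip blanks */
    Token t = { TOK_ERROR, p, 1 };
    if (*p == '\0') {
        t.kind = TOK_END;
        t.length = 0;
    } else if (isalpha((unsigned char)*p)) {   /* letter (letter|digit)* */
        const char *q = p;
        while (isalnum((unsigned char)*q))
            q++;
        t.kind = TOK_IDENTIFIER;
        t.length = (size_t)(q - p);
    } else if (isdigit((unsigned char)*p)) {   /* digit+ */
        const char *q = p;
        while (isdigit((unsigned char)*q))
            q++;
        t.kind = TOK_NUMBER;
        t.length = (size_t)(q - p);
    } else if (*p == '=') {
        t.kind = TOK_ASSIGN;
    } else if (*p == '+') {
        t.kind = TOK_PLUS;
    }   /* any other character stays TOK_ERROR (a lexical error) */
    *pos = p + t.length;
    return t;
}
```

Calling next_token repeatedly on "x = y + 10" yields identifier, assignment operator, identifier, addition operator and number, followed by the end marker. An illegal character is returned as TOK_ERROR and skipped, so scanning can continue after the error is reported.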
Example
Any identifier/number is an expression
If x is an identifier and y+10 is an expression, then x= y+10 is a statement.
Consider the parse tree for the following example:
(a+b)*c
In the parse tree:
Interior node: a record with a field for the operator and two fields for the children
Leaf: a record with two or more fields, one for the token and the others for information about the token
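The record layout described above can be sketched in C. The struct Node, its fields and the helper functions are hypothetical names: an interior node stores an operator and two child pointers, and a leaf stores its token (here just the identifier's name). The postfix printer only shows that the tree for (a+b)*c records the grouping without needing parentheses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse-tree record. Interior nodes: op plus two children.
   Leaves: op == 0 and token holds the lexeme (e.g. an identifier). */
typedef struct Node {
    char op;
    const char *token;
    struct Node *left, *right;
} Node;

static Node *new_node(void)
{
    Node *n = calloc(1, sizeof *n);   /* all fields start as 0 / NULL */
    if (n == NULL)
        abort();
    return n;
}

Node *make_leaf(const char *token)
{
    Node *n = new_node();
    n->token = token;
    return n;
}

Node *make_interior(char op, Node *left, Node *right)
{
    Node *n = new_node();
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}

/* Appends the tree to out in postfix order; out must already hold a
   string (initially "") and have room for the result. */
void to_postfix(const Node *n, char *out)
{
    assert(n != NULL && out != NULL);
    if (n->op == 0) {                 /* leaf */
        strcat(out, n->token);
        return;
    }
    to_postfix(n->left, out);         /* interior node */
    strcat(out, " ");
    to_postfix(n->right, out);
    strcat(out, " ");
    strncat(out, &n->op, 1);
}

void free_tree(Node *n)
{
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

Building the tree as make_interior('*', make_interior('+', make_leaf("a"), make_leaf("b")), make_leaf("c")) and printing it gives "a b + c *": the + node sits below the * node, which is exactly what the parentheses expressed.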
Semantic Analysis
• Ensures that the components of the program fit together meaningfully
• Gathers type information and checks for type compatibility
• Checks whether the operands are permitted by the source language
• Stores the gathered type information in the symbol table or syntax tree
• Performs type checking
• In the case of a type mismatch that no type-conversion rule of the language can resolve, reports a semantic error
Example
float x = 20.2;
float y = x*30;
In the above code, the semantic analyser will convert (typecast) the integer 30 to the float 30.0
before the multiplication.
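A minimal sketch of the type check in this example, assuming a language with only int and float in which mixing them in arithmetic converts the int operand to float (as C's usual arithmetic conversions do). The enum, the struct Check and the function check_binary are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TYPE_INT, TYPE_FLOAT } Type;

/* Result of checking one arithmetic operation "left op right". */
typedef struct {
    Type result;          /* type of the whole expression */
    bool convert_left;    /* insert int_to_float on the left operand */
    bool convert_right;   /* insert int_to_float on the right operand */
} Check;

/* If one operand is int and the other float, the int operand is
   converted to float before the operation; otherwise nothing changes. */
Check check_binary(Type left, Type right)
{
    Check c = { left, false, false };
    if (left == TYPE_FLOAT && right == TYPE_INT) {
        c.result = TYPE_FLOAT;
        c.convert_right = true;
    } else if (left == TYPE_INT && right == TYPE_FLOAT) {
        c.result = TYPE_FLOAT;
        c.convert_left = true;
    }
    assert(!(c.convert_left && c.convert_right));
    return c;
}
```

For x*30, check_binary(TYPE_FLOAT, TYPE_INT) reports a float result and marks the right operand 30 for conversion, which is the point where the conversion to 30.0 is inserted.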
Example (intermediate code in three-address form):
t1 := int_to_float(5)
t2 := rate * t1
t3 := count + t2
total := t3
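The three-address code above computes total = count + rate * 5, where 5 is first converted to float (presumably because rate is a float). A minimal sketch of how such code can be generated from a syntax tree: a post-order walk creates a new temporary for every conversion and every operator. The node layout and the functions gen and gen_assign are hypothetical, and the tree is assumed to already contain the int_to_float node added during semantic analysis.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Syntax-tree node. LEAF: identifier or constant in text.
   CONVERT: int_to_float applied to left. BINARY: left op right. */
typedef struct Expr {
    enum { LEAF, CONVERT, BINARY } kind;
    const char *text;
    char op;
    const struct Expr *left, *right;
} Expr;

static int temp_count;    /* temporaries created so far: t1, t2, ... */

/* Appends the instructions for e (one per line) to out and stores in
   result the name holding e's value. Names are assumed to be short
   (at most 15 characters). */
static void gen(const Expr *e, char *out, char result[16])
{
    char a[16], b[16], line[64];
    assert(e != NULL);
    if (e->kind == LEAF) {
        snprintf(result, 16, "%s", e->text);   /* no code needed */
        return;
    }
    if (e->kind == CONVERT) {
        gen(e->left, out, a);
        snprintf(result, 16, "t%d", ++temp_count);
        snprintf(line, sizeof line, "%s := int_to_float(%s)\n", result, a);
    } else {                                   /* BINARY */
        gen(e->left, out, a);
        gen(e->right, out, b);
        snprintf(result, 16, "t%d", ++temp_count);
        snprintf(line, sizeof line, "%s := %s %c %s\n", result, a, e->op, b);
    }
    strcat(out, line);
}

/* Generates three-address code for "target = e" into out, which must
   be large enough for all the instructions. */
void gen_assign(const char *target, const Expr *e, char *out)
{
    char r[16], line[64];
    temp_count = 0;
    out[0] = '\0';
    gen(e, out, r);
    snprintf(line, sizeof line, "%s := %s\n", target, r);
    strcat(out, line);
}
```

For the tree of count + rate * int_to_float(5), gen_assign("total", ...) produces the same four instructions as the example, t1 to t3 followed by total := t3, because the operands of each operator are generated before the operator itself.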
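Example:
Consider the following code
a = inttofloat(10)
b = c * a
d = e + b
f = d
It can become
b = c * 10.0
f = e + b
Here the conversion of the constant 10 is done at compile time, so a is no longer needed, and
the copy f = d is removed by computing e + b directly into f (assuming d is not used elsewhere).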
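The two transformations in this example (folding the integer-to-float conversion of the constant 10, and removing the copy f = d) can be sketched over a list of three-address instructions. This is a hypothetical single-pass sketch: the Quad layout and the functions optimize and print_code are illustrative names. Real optimizers use data-flow analysis to check that d has no other uses before removing the copy; here that is simply assumed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* One instruction "dst = arg1 op arg2". op is "inttofloat" (one operand),
   "copy" (dst = arg1), or a one-character arithmetic operator. */
typedef struct {
    char dst[16], op[16], arg1[16], arg2[16];
    int removed;                      /* 1 once optimized away */
} Quad;

static int is_int_literal(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* Replaces every use of name in instructions from..n-1 by value. */
static void substitute(Quad *q, int from, int n, const char *name, const char *value)
{
    for (int j = from; j < n; j++) {
        if (strcmp(q[j].arg1, name) == 0)
            snprintf(q[j].arg1, sizeof q[j].arg1, "%s", value);
        if (strcmp(q[j].arg2, name) == 0)
            snprintf(q[j].arg2, sizeof q[j].arg2, "%s", value);
    }
}

void optimize(Quad *q, int n)
{
    assert(q != NULL);
    for (int i = 0; i < n; i++) {
        if (strcmp(q[i].op, "inttofloat") == 0 && is_int_literal(q[i].arg1)) {
            /* Constant folding: x = inttofloat(10) becomes the constant 10.0. */
            char folded[20];
            snprintf(folded, sizeof folded, "%s.0", q[i].arg1);
            substitute(q, i + 1, n, q[i].dst, folded);
            q[i].removed = 1;
        } else if (strcmp(q[i].op, "copy") == 0 && i > 0 && !q[i - 1].removed &&
                   strcmp(q[i - 1].dst, q[i].arg1) == 0) {
            /* Copy elimination: "d = ..." then "f = d" becomes "f = ...".
               Assumes d is used nowhere else. */
            snprintf(q[i - 1].dst, sizeof q[i - 1].dst, "%s", q[i].dst);
            q[i].removed = 1;
        }
    }
}

/* Writes the remaining instructions, one per line, into out. */
void print_code(const Quad *q, int n, char *out, size_t size)
{
    size_t used = 0;
    out[0] = '\0';
    for (int i = 0; i < n && used < size; i++) {
        if (q[i].removed)
            continue;
        int w;
        if (strcmp(q[i].op, "copy") == 0)
            w = snprintf(out + used, size - used, "%s = %s\n", q[i].dst, q[i].arg1);
        else if (strcmp(q[i].op, "inttofloat") == 0)
            w = snprintf(out + used, size - used, "%s = inttofloat(%s)\n", q[i].dst, q[i].arg1);
        else
            w = snprintf(out + used, size - used, "%s = %s %s %s\n",
                         q[i].dst, q[i].arg1, q[i].op, q[i].arg2);
        used += (size_t)w;
    }
}
```

Applied to the four instructions of the example, optimize leaves only b = c * 10.0 and f = e + b.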
Example:
a = b + 60.0
might be translated into the following register code (assuming the instruction format OP source, destination):
MOVF b, R1
ADDF #60.0, R1
MOVF R1, a
Symbol Table Management
• Symbol table is used to store all the information about identifiers used in the program.
• It is a data structure containing a record for each identifier, with fields for the attributes of
the identifier.
• It allows the compiler to find the record for each identifier quickly and to store or retrieve
data from that record.
• Whenever an identifier is detected in any of the phases, it is stored in the symbol table.
Example
int a, b; float c; char z;
Name   Type    Address
a      int     1000
b      int     1002
c      float   1004
z      char    1008
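A minimal sketch of a symbol table for the declarations int a, b; float c; char z;. The type sizes are an assumption inferred from the addresses in the table (2-byte int, 4-byte float, 1-byte char, storage starting at address 1000); a real compiler takes them from the target machine and usually uses a hash table instead of the linear search used here for brevity. The functions insert and lookup are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One record per identifier: its name and attributes. */
typedef struct {
    char name[16];
    const char *type;
    unsigned address;
} Symbol;

#define MAX_SYMBOLS 64

static Symbol table[MAX_SYMBOLS];
static size_t n_symbols;
static unsigned next_address = 1000;   /* assumed start of storage */

/* Assumed sizes, matching the addresses in the example table. */
static unsigned size_of(const char *type)
{
    if (strcmp(type, "int") == 0)
        return 2;
    if (strcmp(type, "float") == 0)
        return 4;
    return 1;                          /* char */
}

/* Returns the record for name, or NULL if it was never declared. */
const Symbol *lookup(const char *name)
{
    for (size_t i = 0; i < n_symbols; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Adds a declaration and gives it the next free address. Returns 0 on
   success, -1 if name is already declared or the table is full. */
int insert(const char *name, const char *type)
{
    assert(name != NULL && type != NULL);
    if (lookup(name) != NULL || n_symbols == MAX_SYMBOLS)
        return -1;
    Symbol *s = &table[n_symbols++];
    strncpy(s->name, name, sizeof s->name - 1);
    s->name[sizeof s->name - 1] = '\0';
    s->type = type;
    s->address = next_address;
    next_address += size_of(type);
    return 0;
}
```

Inserting a, b, c and z in declaration order reproduces the addresses 1000, 1002, 1004 and 1008. A second insert("a", ...) fails, which is how a compiler could detect a variable declared twice in the same scope.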
Example
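extern double test(double x);
double sample(int count) {
    double sum = 0.0;
    for (int i = 1; i <= count; i++)
        sum += test((double) i);
    return sum;
}
Symbol name   Type               Scope
test          function, double   extern
x             double             function parameter
sample        function, double   global
count         int                function parameter
sum           double             block local
i             int                for-loop statement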
Error Handling
• Each phase can encounter errors. After detecting an error, a phase must handle the error so
that compilation can proceed.
• In lexical analysis, errors occur while separating the input into tokens.
• In syntax analysis, errors occur during construction of the syntax tree.
• In semantic analysis, errors may occur in the following cases:
(i) when the compiler detects constructs that have the right syntactic structure but no meaning;
(ii) during type conversion.
• In code optimization, errors occur when the result is affected by the optimization. In code
generation, an error is reported when, for example, required code is missing.
The figure illustrates the translation of source code through each phase for the statement
c = a + b * 5.
Error Encountered in Different Phases
Each phase can encounter errors. After detecting an error, a phase must somehow deal with
the error, so that compilation can proceed.
A program may have the following kinds of errors at various stages:
Lexical Errors
These include incorrect or misspelled identifier names, i.e., identifiers typed incorrectly.
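Syntactical Errors
These occur when the sequence of tokens does not follow the grammar of the language, for
example a missing semicolon or unbalanced parentheses.
Semantic Errors
These errors are a result of incompatible value assignment. The semantic errors that the
semantic analyzer is expected to recognize are: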
• Type mismatch.
• Undeclared variable.
• Reserved identifier misuse.
• Multiple declaration of variable in a scope.
• Accessing an out of scope variable.
• Actual and formal parameter mismatch.
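Logical errors
The program is syntactically and semantically valid but does not behave as intended, for
example because of unreachable code or an infinite loop.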