
Unit -1

System Programming

 System programming involves designing and writing computer programs that allow
the computer hardware to interface with the programmer and the user, leading to the
effective execution of application software on the computer system.

 Typical system programs include the operating system and firmware, programming
tools such as compilers, assemblers, I/O routines, interpreters, scheduler, loaders
and linkers as well as the run time libraries of the computer programming languages.

 System programming is an essential and important foundation of any computer’s
application development, and it is always evolving to accommodate changes in the
computer hardware.

 This kind of programming requires some level of hardware knowledge and is machine
dependent; the system programmer must therefore know the intended hardware on
which the software is required to operate.

 Additionally, the programmer may make some assumptions about the hardware and other
system components. The software is usually written in a low-level programming
language that can operate efficiently in a resource-constrained environment,
with little runtime overhead, using a small runtime library or none at all. Such a
language enables direct control over memory access, and parts of the program may be
written directly in assembly language. The majority of system programs are written
in languages such as C and C++, with performance-critical sections sometimes
written in assembly.

What is TOC?

The Theory of Computation is a branch of computer science and mathematics that focuses on
determining which problems can be solved mechanically, using an algorithm or a set of
programming rules. It is also concerned with how efficiently an algorithm can compute
the solution.

In simple terms, the Theory of Computation answers these questions:

 What problems can the machine solve? What problems can’t it solve?
 How fast can a machine solve a problem?
 How much memory space does a machine need to solve a problem?

DR SHALOO DADHEECH. 1

To answer these questions, computer scientists use a model of computation, which is a
formal model that simulates the algorithm being developed. The Turing machine is among
the most widely used models of computation.

The Theory of Computation rests on the fact that computers can’t solve all problems. A given
machine would have limitations, and the Theory of Computation aims to discover these.

The computational model is given inputs and decides whether or not it can process the
information using the developed algorithm.

For example, you can create a machine and design it so that it only accepts red objects. The
algorithm is pretty straightforward, as represented by the image below. The red square is
accepted, and the model rejects the yellow square.

The Theory of Computation can also help determine if a model needs improvement. In the
example above, the developer may want to introduce other inputs to see how the model
treats them. What happens when a red square with a yellow border is introduced to the
machine? How about when the border is red, but the inside of the object is yellow?


Three Main Branches of the Theory of Computation

Three theories make up the Theory of Computation. These are:

1. Automata Theory: Refers to the analysis of how machines work to solve a problem.

2. Computability Theory: Pertains to determining which problems a machine can solve
and which ones it can’t.

3. Computational Complexity Theory: Addresses the efficiency of the machine when
solving a problem.

One of the most famous inventions embodying these concepts is the Turing machine,
created by Alan Turing in the 1930s. The idea is that a Turing machine can carry out any
computation that an algorithm can describe. Conversely, anything that no algorithm can
do cannot be done by a Turing machine.

What Is the Theory of Computation For?

In the real world, the theory has helped with several projects. For one, a group of computer
scientists used the theory to test a book vending machine design. Register machine models are
also based on the Theory of Computation, among other things.

The theory is applicable to other fields besides computer science, such as engineering and life
and social sciences. It is also a fundamental concept in artificial intelligence (AI), natural
language processing (NLP), neural networks and the like.

The Theory of Computation is a basic concept that computer scientists should understand. After
all, it reflects the reality of life—no single solution can solve all problems. As such, it is often
included in the computer science curriculum of universities.


Given a set of problems, it would be a waste of time and effort for computer scientists to create
one algorithm to solve all of them. The Theory of Computation dictates that developers first
determine which problems can be solved algorithmically and which ones can’t. After that, they
would also need to find out how efficient the algorithm would be in terms of time and memory
space spent solving the problem.

Compiler Design Objectives:


 Provide an understanding of the fundamental principles in compiler design
 Provide the skills needed for building compilers for various situations that one may
encounter in a career in Computer Science.
 Learn the process of translating a modern high-level language to executable code
required for compiler construction.
Compiler Design outcome and Scope
 The goal of Compiler Design is to provide an introduction to system software
such as assemblers, compilers, and macros. It provides a complete description of the
inner workings of a compiler.
 The subject focuses mainly on the design of compilers and optimization techniques. It
also includes the design of compiler-writing tools, and aims to convey the language
specifications, the use of regular expressions, and the context-free grammars behind
the design of a compiler.
Compiler Design outcome
 Understand fundamentals of compiler and identify the relationships among different
phases of the compiler.
 Understand the application of finite state machines, recursive descent, production rules,
parsing, and language semantics.
 Analyse and implement the required modules, which may include a front-end, a back-end,
and a small set of middle-end optimizations.
 Use modern tools and technologies for designing new compilers.

Basic Terminologies of Theory of Computation:

Now, let’s understand the basic terminologies, which are important and
frequently used in the Theory of Computation.

Symbol:

A symbol (often also called a character) is the smallest building block; it can be any
letter, digit, or picture.


Alphabets (Σ):

An alphabet is a finite, non-empty set of symbols.

String:

A string is a finite sequence of symbols from some alphabet. A string is generally
denoted as w, and the length of a string is denoted as |w|. The empty string is the
string with zero occurrences of symbols, represented as ε.

Strings of length 2 that can be generated over the alphabet {a, b}:

aa, ab, ba, bb

Length of each string: |w| = 2

Number of strings = 4
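The enumeration above can be reproduced with a short Python sketch (the variable names are illustrative):

```python
# Enumerate all strings of length 2 over the alphabet {a, b}.
from itertools import product

sigma = ["a", "b"]
strings = ["".join(p) for p in product(sigma, repeat=2)]
print(strings)       # ['aa', 'ab', 'ba', 'bb']
print(len(strings))  # 4
```

In general, the number of strings of length n over an alphabet of size k is k^n.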

Introduction to Language

A language is a set of strings chosen from some Σ*, or we can say: a language is a
subset of Σ*. A language formed over Σ can be finite or infinite.

Example of Finite Language:

L1 = { set of strings of length 2 over {x, y} }
L1 = { xx, xy, yx, yy }

Example of Infinite Language:

L1 = { set of all strings that start with 'b' }
L1 = { ba, baa, babb, bbb, baab, ....... }

Grammar
 Grammar in the theory of computation is a finite set of formal rules that generate
syntactically correct sentences.
The formal definition of a grammar is that it is a four-tuple:
G = (V, T, P, S)
 G is a grammar, which consists of a set of production rules. It is used to generate the
strings of a language.
 T is the finite set of terminal symbols, denoted by lower-case letters.
 V is the finite set of non-terminal symbols, denoted by capital letters.
 P is a set of production rules, used for replacing the non-terminal symbol on the left
side of a production in a string with the string of terminals and non-terminals on its
right side.
 S is the start symbol, used to derive the strings of the language.

 Grammar is composed of two basic elements

Terminal Symbols - Terminal symbols are the components of the sentences generated by the
grammar, denoted using lower-case letters like a, b, c, etc.
Non-Terminal Symbols - Non-terminal symbols take part in the generation of the sentence
but are not components of the sentence. These symbols are also called auxiliary symbols
or variables. They are represented using capital letters like A, B, C, etc.


Example 1
Consider a grammar
G = (V , T , P , S)
Where,
V={S,A,B} ⇒ Non-Terminal symbols
T={a,b} ⇒ Terminal symbols
Production rules P = { S → ABa , A → BB , B → ab , AA → b }
S={S} ⇒ Start symbol
Example 2
Consider a grammar
G=(V,T,P,S)
Where,
V= {S, A, B} ⇒ non terminal symbols
T = { 0,1} ⇒ terminal symbols
Production rules P = { S→A1B
A→0A| ε
B→0B| 1B| ε }
S= {S} ⇒ start symbol.
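The grammar in Example 2 derives exactly the strings of the form 0*1(0|1)*: A produces 0*, then the literal 1, then B produces (0|1)*. A hypothetical membership check using Python's re module (not part of the notes) makes this concrete:

```python
# Strings derivable from S -> A1B with A -> 0A | e and B -> 0B | 1B | e
# are exactly those matching the regular expression 0*1(0|1)*.
import re

pattern = re.compile(r"0*1[01]*")

for w in ["1", "0011", "1101", "000"]:
    print(w, bool(pattern.fullmatch(w)))
# '000' is rejected: the grammar forces at least one '1' in every string.
```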

What are Regular Expressions?

 Regular expressions are an important notation for defining patterns. Each pattern
matches a set of strings, so regular expressions serve as names for sets of
strings.
 They provide a convenient and useful notation for describing tokens. Regular
expressions define the languages accepted by finite automata (transition diagrams).
 Regular expressions are defined over an alphabet Σ.
 If R is a regular expression, then L(R) represents the language denoted by the regular
expression.
Language − It is a collection of strings over some fixed alphabet. The empty string can be
indicated by ε.
Example − If L (Language) = set of strings of 0’s & 1’s of length two,
then L = {00, 01, 10, 11}
Example − If L = {1},
then L* = L^0 ∪ L^1 ∪ L^2 ∪ ….. (the exponent ranges over 0, 1, 2, 3, …)
∴ L* = {ε} ∪ {1} ∪ {11} ∪ …..
∴ L* = {ε, 1, 11, 111, …..}
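The construction of L* above can be sketched in Python by unioning the powers L^0, L^1, L^2, … up to a cutoff (L* itself is infinite; the helper name `power` is illustrative):

```python
# Build L^i (all concatenations of i strings from L), then union the
# first few powers to approximate the Kleene closure L*.
def power(L, i):
    result = {""}  # L^0 contains only the empty string
    for _ in range(i):
        result = {u + v for u in result for v in L}
    return result

L = {"1"}
star_prefix = set()
for i in range(4):
    star_prefix |= power(L, i)
print(sorted(star_prefix, key=len))  # ['', '1', '11', '111']
```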


Operation on Regular Languages

The various operations on regular languages are as follows.
If L1 = {00, 10} and L2 = {01, 11}:

Union: The union of two languages L1 and L2 produces the set of strings that are in
L1, in L2, or in both. L1 ∪ L2 = {set of strings in L1 or in L2}.
Example: L1 ∪ L2 = {00, 10, 01, 11}

Concatenation: The concatenation of two languages L1 and L2 creates the set of strings
formed by a string in L1 followed by a string in L2.
L1L2 = {set of strings in L1 followed by strings in L2}.
Example: L1L2 = {0001, 0011, 1001, 1011}

Kleene closure (L1*): The Kleene closure denotes zero or more occurrences of strings
from L1; it includes the empty string ε. L1* = L1^0 ∪ L1^1 ∪ L1^2 ∪ …..
Example: L1* = {ε, 00, 10, 1010, 0010, 1000, 0000, 000000, 001000, ….}

Positive closure (L1+): The positive closure denotes one or more occurrences of strings
from L1; it excludes the empty string ε. L1+ = L1^1 ∪ L1^2 ∪ …..
Example: L1+ = {00, 10, 1010, 0010, 1000, 0000, 000000, 001000, ….}
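The union and concatenation rows above can be checked directly on Python sets (a sketch on the same L1 and L2):

```python
# L1 and L2 from the table above.
L1 = {"00", "10"}
L2 = {"01", "11"}

union = L1 | L2                           # strings in L1, L2, or both
concat = {u + v for u in L1 for v in L2}  # L1 string followed by L2 string

print(sorted(union))   # ['00', '01', '10', '11']
print(sorted(concat))  # ['0001', '0011', '1001', '1011']
```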

Extensions of Regular Expressions

Kleene introduced regular expressions in the 1950s with the primary operations of union,
concatenation, and Kleene closure. A few notational extensions are also in common use:

 One or more instances − The unary postfix operator + denotes the positive closure of a
regular expression and its language. If r is a regular expression, then (r)+ denotes
the language (L(r))+. Two algebraic laws, r* = r+ | ε and r+ = r r* = r* r, relate the
positive closure and the Kleene closure.
 Zero or one instance − The unary postfix operator ? denotes zero or one occurrence.
r? is equivalent to r | ε, that is, L(r?) = L(r) ∪ {ε}. This operator has the same
precedence and associativity as * and +.

What are the Phases of Compiler Design?


A compiler operates in various phases; each phase transforms the source program from one
representation to another. Every phase takes input from its previous stage and feeds its
output to the next phase of the compiler.

There are 6 phases in a compiler. Each of these phases helps in converting the
high-level language into machine code. The phases of a compiler are:

1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generator
5. Code optimizer
6. Code generator


Together these phases convert the source code: dividing it into tokens, building parse
trees, and optimizing the code stage by stage.

Phase 1: Lexical Analysis


Lexical analysis is the first phase, in which the compiler scans the source code. The
scan proceeds left to right, character by character, grouping the characters into tokens.
Here, the character stream from the source program is grouped into meaningful sequences
by identifying the tokens. The lexer makes an entry for each token in the symbol table
and passes the token to the next phase.
The primary functions of this phase are:

 Identify the lexical units in the source code

 Classify lexical units into classes like constants and reserved words, and enter
them in different tables; ignore comments in the source program
 Identify tokens that are not part of the language

Example:
x = y + 10

Tokens

x     identifier

=     assignment operator

y     identifier

+     addition operator

10    number
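A minimal lexer in this spirit can be sketched with Python's re module; the token class names are illustrative, not part of any particular compiler:

```python
import re

# One named group per token class; whitespace is skipped, not emitted.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN",     r"="),
    ("PLUS",       r"\+"),
    ("SKIP",       r"\s+"),
]
pattern = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))

def tokenize(source):
    for m in pattern.finditer(source):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("x = y + 10")))
# [('IDENTIFIER', 'x'), ('ASSIGN', '='), ('IDENTIFIER', 'y'),
#  ('PLUS', '+'), ('NUMBER', '10')]
```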

Phase 2: Syntax Analysis


Syntax analysis is all about discovering structure in code. It determines whether or not
the text follows the expected format. The main aim of this phase is to make sure that the
source code written by the programmer is correct.
Syntax analysis applies the rules of the specific programming language by constructing
a parse tree with the help of the tokens. It also determines the structure of the source
language and the grammar or syntax of the language.
Here, is a list of tasks performed in this phase:

 Obtain tokens from the lexical analyzer


 Checks if the expression is syntactically correct or not


 Report all syntax errors


 Construct a hierarchical structure which is known as a parse tree

Example
Any identifier/number is an expression.
If x is an identifier and y+10 is an expression, then x = y+10 is a statement.
Consider the parse tree for the following example:

(a+b)*c

In the parse tree:

 Interior node: a record with an operator field and two fields for its children
 Leaf: a record with two or more fields; one for the token and others for information
about the token

The later phases use this structure to:

 Ensure that the components of the program fit together meaningfully
 Gather type information and check for type compatibility
 Check that operands are permitted by the source language
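One hypothetical way to hold such a parse tree in memory is as nested tuples, with each interior node carrying its operator and two children (a sketch, not a production data structure):

```python
# Parse tree for (a+b)*c: interior nodes are (operator, left, right),
# leaves are token strings.
tree = ("*", ("+", "a", "b"), "c")

def to_infix(node):
    """Walk the tree and print it back as a fully parenthesized expression."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"({to_infix(left)} {op} {to_infix(right)})"

print(to_infix(tree))  # ((a + b) * c)
```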

Phase 3: Semantic Analysis


Semantic analysis checks the semantic consistency of the code. It uses the syntax tree of the
previous phase along with the symbol table to verify that the given source code is semantically
consistent. It also checks whether the code is conveying an appropriate meaning.
Semantic Analyser will check for Type mismatches, incompatible operands, a function called
with improper arguments, an undeclared variable, etc.
Functions of the semantic analysis phase are:

 Stores the type information gathered in the symbol table or syntax tree
 Performs type checking
 Reports a semantic error in the case of a type mismatch for which no type-coercion
rule satisfies the desired operation
 Collects type information and checks for type compatibility
 Checks whether the source language permits the operands or not


Example
float x = 20.2;
float y = x*30;
In the above code, the semantic analyser will typecast the integer 30 to float 30.0 before
multiplication

Phase 4: Intermediate Code Generation


Once the semantic analysis phase is over, the compiler generates intermediate code for
the target machine. It represents a program for some abstract machine.
Intermediate code lies between the high-level language and the machine-level language.
This intermediate code should be generated in such a manner that it is easy to translate
into the target machine code.
Functions of intermediate code generation:

 It should be generated from the semantic representation of the source program


 Holds the values computed during the process of translation
 Helps you to translate the intermediate code into target language
 Allows you to maintain precedence ordering of the source language
 It holds the correct number of operands of the instruction

Example:

total = count + rate * 5


Intermediate code, using the three-address code method, is:

t1 := int_to_float(5)
t2 := rate * t1
t3 := count + t2
total := t3
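A toy generator producing a sequence in the same spirit (a slightly compressed variant of the listing above; the emit helper and temporary names are illustrative):

```python
# Emit three-address code for total = count + rate * int_to_float(5),
# allocating a fresh temporary t1, t2, ... per operation.
temps = iter(f"t{i}" for i in range(1, 100))

def emit(code, expr):
    """Return the name holding expr's value, appending instructions to code."""
    if isinstance(expr, str):  # variable or literal: already has a name
        return expr
    op, left, right = expr
    l, r = emit(code, left), emit(code, right)
    t = next(temps)
    code.append(f"{t} := {l} {op} {r}")
    return t

code = []
result = emit(code, ("+", "count", ("*", "rate", "int_to_float(5)")))
code.append(f"total := {result}")
print("\n".join(code))
# t1 := rate * int_to_float(5)
# t2 := count + t1
# total := t2
```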

Phase 5: Code Optimization


The next phase is code optimization of the intermediate code. This phase removes
unnecessary code lines and arranges the sequence of statements to speed up the execution
of the program without wasting resources. The main goal of this phase is to improve the
intermediate code so as to generate code that runs faster and occupies less space.
The primary functions of this phase are:

 It helps you to establish a trade-off between execution and compilation speed


 Improves the running time of the target program
 Generates streamlined code still in intermediate representation
 Removing unreachable code and getting rid of unused variables
 Moving loop-invariant statements (those whose operands are not altered inside the loop) out of the loop


Example:
Consider the following code:

a = int_to_float(10)
b = c * a
d = e + b
f = d

can become:

b = c * 10.0
f = e + b
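The transformation above combines constant folding (the conversion of 10 is known at compile time) with copy propagation (f is just a copy of d). A toy textual pass, purely for illustration:

```python
# Each name maps to the expression assigned to it, in source order.
code = {
    "a": "int_to_float(10)",
    "b": "c * a",
    "d": "e + b",
    "f": "d",
}

code["a"] = "10.0"                             # constant folding
code["b"] = code["b"].replace("a", code["a"])  # propagate a into its use
del code["a"]                                  # a is now dead
code["f"] = code.pop("d")                      # f = d, so compute f directly

for name, rhs in code.items():
    print(f"{name} = {rhs}")
# b = c * 10.0
# f = e + b
```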

Phase 6: Code Generation


Code generation is the last and final phase of a compiler. It takes its input from the
code optimization phase and produces the target code or object code as a result. The
objective of this phase is to allocate storage and generate relocatable machine code.
It also allocates memory locations for the variables. The instructions in the
intermediate code are converted into machine instructions. This phase converts the
optimized intermediate code into the target language.
The target language is the machine code. Therefore, all the memory locations and
registers are also selected and allotted during this phase. The code generated by this
phase is executed to take inputs and generate the expected outputs.

Example:
a = b + 60.0
might be translated into register-based code such as:

MOVF b, R1
ADDF #60.0, R1
MOVF R1, a
Symbol Table Management

• Symbol table is used to store all the information about identifiers used in the program.
• It is a data structure containing a record for each identifier, with fields for the attributes of
the identifier.
• It allows finding the record for each identifier quickly and to store or retrieve data from that
record.
• Whenever an identifier is detected in any of the phases, it is stored in the symbol table.
Example
int a, b; float c; char z;

Symbol name    Type     Address

a              int      1000
b              int      1002
c              float    1004
z              char     1008

Example

extern double test (double x);
double sample (int count) {
    double sum = 0.0;
    for (int i = 1; i <= count; i++)
        sum += test((double) i);
    return sum;
}

Symbol name    Type                Scope

test           function, double    extern
x              double              function parameter
sample         function, double    global
count          int                 function parameter
sum            double              block local
i              int                 for-loop statement
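A minimal symbol table can be sketched as a dictionary keyed by identifier name, each record holding the attribute fields shown above (the helper names are illustrative):

```python
symbol_table = {}

def insert(name, sym_type, **attrs):
    """Add a record for an identifier with its type and other attributes."""
    symbol_table[name] = {"type": sym_type, **attrs}

def lookup(name):
    """Return the identifier's record, or None if it was never declared."""
    return symbol_table.get(name)

insert("a", "int", address=1000)
insert("c", "float", address=1004)
print(lookup("a"))  # {'type': 'int', 'address': 1000}
print(lookup("x"))  # None: an undeclared variable (a semantic error)
```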


Error Handling

• Each phase can encounter errors. After detecting an error, a phase must handle the
error so that compilation can proceed.
• In lexical analysis, errors occur in the separation of tokens.
• In syntax analysis, errors occur during the construction of the syntax tree.
• In semantic analysis, errors may occur in the following cases:


(i) When the compiler detects constructs that have the right syntactic structure but no
meaning
(ii) During type conversion.
• In code optimization, errors occur when the result is affected by the optimization. In
code generation, an error is reported when code is missing, etc.
Figure illustrates the translation of source code through each phase, considering the
statement c = a + b * 5.
Error Encountered in Different Phases

Each phase can encounter errors. After detecting an error, a phase must somehow deal with
the error, so that compilation can proceed.
A program may have the following kinds of errors at various stages:
Lexical Errors

These include incorrect or misspelled identifier names, i.e., identifiers typed
incorrectly.
Syntactic Errors

These include a missing semicolon or unbalanced parentheses. Syntactic errors are
handled by the syntax analyzer (parser).
When an error is detected, it must be handled by the parser to enable the parsing of the
rest of the input. In general, errors may be expected at various stages of compilation,
but most errors are syntactic, and hence the parser should be able to detect and report
those errors in the program.
The goals of the error handler in the parser are:
• Report the presence of errors clearly and accurately.
• Recover from each error quickly enough to detect subsequent errors.
• Add minimal overhead to the processing of correct programs.
There are four common error-recovery strategies that can be implemented in the parser to
deal with errors in the code:
o Panic mode.
o Phrase level.
o Error productions.
o Global correction.
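Panic mode, the simplest of these strategies, discards input until a synchronizing token such as ';' appears, then resumes parsing. A hypothetical sketch:

```python
def panic_mode_recover(tokens, i, sync=(";",)):
    """Skip tokens from index i until one in sync; return the index just past it."""
    while i < len(tokens) and tokens[i] not in sync:
        i += 1
    return min(i + 1, len(tokens))

tokens = ["x", "=", "@", "bad", ";", "y", "=", "1", ";"]
# Suppose the parser detects an error at index 2 (the stray '@'):
resume = panic_mode_recover(tokens, 2)
print(tokens[resume])  # 'y': parsing resumes at the next statement
```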
Semantic Errors

These errors are a result of incompatible value assignments. The semantic errors that the
semantic analyzer is expected to recognize are:
• Type mismatch.
• Undeclared variable.
• Reserved identifier misuse.
• Multiple declaration of variable in a scope.
• Accessing an out of scope variable.
• Actual and formal parameter mismatch.


Logical errors

These errors occur due to constructs such as unreachable code or infinite loops.

