ACD Unit-2 Part-1

The document provides a comprehensive overview of compiler design, detailing the phases of compilation, the differences between compilers and interpreters, and the roles of various components such as lexical analyzers, syntax analyzers, and semantic analyzers. It explains the processes involved in translating high-level programming languages into machine code, including error handling and the use of symbol tables. Additionally, it discusses tools like LEX for generating lexical analyzers and the concept of bootstrapping in compiler development.

Uploaded by

vanitha.thandur

Compiler Design

1. Introduction
COMPILERS

Contents :

1. Compiler Introduction

2. Language Processing System

3. Compiler vs Interpreter

4. Phases of Compilation

5. Example showing output of each phase
COMPILERS
A compiler is a program that takes a program written in a source
language and translates it into an equivalent program in a
target language.

Source program --> COMPILER --> Target program
                      |
                      v
                Error messages

Source program: normally a program written in a high-level programming language.
Target program: normally the equivalent program in machine code (a relocatable object file).
Language processing system
High-Level Language –
A program in a high-level language (HLL) may contain #define or #include
directives. High-level languages are close to humans but far from machines.
These (#) tags are called preprocessor directives; they tell the
pre-processor what to do.

Pre-Processor –
The pre-processor resolves every #include directive by including the named
file (file inclusion) and every #define directive by macro expansion
(macro processing).
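The macro expansion step described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how a real C pre-processor works: it handles only object-like macros of the form "#define NAME text", and the function name expand_macros is hypothetical.

```python
import re

def expand_macros(source):
    """Toy macro processing: expand object-like #define macros only."""
    macros = {}
    output = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#define"):
            # "#define NAME replacement text"
            parts = stripped.split(None, 2)
            name = parts[1]
            body = parts[2] if len(parts) > 2 else ""
            macros[name] = body
            continue  # the directive itself does not reach the compiler
        # Replace whole-word occurrences of every macro seen so far.
        for name, body in macros.items():
            line = re.sub(r"\b" + re.escape(name) + r"\b",
                          lambda match, text=body: text, line)
        output.append(line)
    return "\n".join(output)
```

For example, expand_macros("#define MAX 100\nint a[MAX];") yields the single line "int a[100];".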
Compiler –
A compiler is software that converts a program written in a high-level
language (source language) into a low-level language (object/target/machine
language, or an assembly language program).

Assembly Language program –
It is neither in binary form nor high level. It is a combination of
machine instructions and some other useful data needed for
execution.
Assembler – Every platform (hardware + OS) has its own assembler, so
assemblers are not universal. An assembler translates assembly language
into machine code; its output is called an object file.

Relocatable Machine Code – It can be loaded at any point in memory and run.
The addresses within the program are arranged so that they remain valid
when the program is moved.

Linker :- Linking is a process where a linker takes several object files and
libraries as input and produces one executable object file.

Loader :- Loading is a process where a loader loads an executable file into
memory, initializes the registers, heap, data, etc., and starts the execution
of the program.
Compiler vs Interpreter
Interpreter – An interpreter processes a high-level language program, just
like a compiler, but the two differ in the way they read the input. A
compiler reads the whole input in one go and translates it, whereas an
interpreter reads, translates, and executes the source code line by line.
A compiler scans the entire program and translates it as a whole into
machine code, whereas an interpreter translates the program one statement
at a time.
Interpreted programs are usually slower than compiled ones.
COMPILER vs INTERPRETER

1. Compiler: A compiler is a program which converts the entire source code of a programming language into executable machine code for a CPU.
   Interpreter: An interpreter takes a source program and runs it line by line, translating each line as it comes to it.

2. Compiler: Takes a large amount of time to analyze the entire source code, but the overall execution time of the program is comparatively faster.
   Interpreter: Takes less time to analyze the source code, but the overall execution time of the program is slower.

3. Compiler: Generates error messages only after scanning the whole program, so debugging is comparatively hard because the error can be present anywhere in the program.
   Interpreter: Debugging is easier because it continues translating the program until an error is met.

4. Compiler: Generates intermediate object code.
   Interpreter: No intermediate object code is generated.

5. Compiler: Examples: C, C++, Java
   Interpreter: Examples: Python, Perl


Major Parts of Compilers
⮚ A compiler operates in phases.
⮚ A phase is a logically interrelated operation that takes the source
program in one representation and produces output in another
representation.

The phases are grouped into two major parts of compilation.

✓ Analysis (Machine Independent/Language Dependent)
✓ Synthesis (Machine Dependent/Language Independent)
Analysis and Synthesis phases of compiler

In the analysis phase, an intermediate representation is created from
the given source program.
The Lexical Analyzer, Syntax Analyzer and Semantic Analyzer are the
parts of this phase.

In the synthesis phase, the equivalent target program is created from this
intermediate representation.
The Intermediate Code Generator, Code Optimizer, and Code Generator are
the parts of this phase.
Structure of Compiler
[Figure: structure of a compiler, from HLL program (source code) to target program]
Phases of compiler

The compilation process is partitioned into a number of sub-processes
called ‘phases’.
⮚ Lexical analysis (Scanning)

⮚ Syntax Analysis (Parsing)

⮚ Semantic analysis

⮚ Intermediate Code Generation

⮚ Code optimization

⮚ Code Generation
Phases of A Compiler
Source Program --> Lexical Analyzer --> Syntax Analyzer --> Semantic Analyzer --> Intermediate Code Generator --> Code Optimizer --> Code Generator --> Target Program

• Each phase transforms the source program from one representation
into another representation.

• They communicate with error handlers.

• They communicate with the symbol table.


Analysis phase : Lexical analyzer
The Lexical Analyzer reads the source program character by character and
returns the tokens of the source program.
A token describes a pattern of characters having some meaning in the
source language (such as identifiers, operators, keywords, numbers,
delimiters and so on).

Ex: Tokens:
newval := oldval + 12

newval    identifier
:=        assignment operator
oldval    identifier
+         add operator
12        constant
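The token example above can be reproduced with a minimal scanner sketch in Python. It assumes only the four token classes listed in the example; the names TOKEN_SPEC and tokenize are illustrative, and a real lexical analyzer would also handle keywords, comments, and more operators.

```python
import re

# (token class, regular expression) pairs, tried in this order.
TOKEN_SPEC = [
    ("constant", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("assignment operator", r":="),
    ("add operator", r"\+"),
]

def tokenize(text):
    """Read the input left to right and return (lexeme, token class) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace separates tokens but is not a token
            continue
        for kind, pattern in TOKEN_SPEC:
            match = re.compile(pattern).match(text, pos)
            if match:
                tokens.append((match.group(), kind))
                pos = match.end()
                break
        else:
            # No pattern matched: this is a lexical error.
            raise SyntaxError(f"invalid character {text[pos]!r} at position {pos}")
    return tokens
```

Calling tokenize("newval:= oldval + 12") returns the five pairs shown in the example, in the same order.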
The lexical analyzer places information about identifiers, constants,
and labels into the symbol table.

Regular expressions are used to describe tokens (lexical constructs).

A DFA can be used in the implementation of a lexical analyzer.
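To make the DFA remark concrete, here is a small table-driven sketch in Python that recognizes two token classes, identifiers and integer constants. The state names and the helper functions char_class and classify are assumptions made for illustration; a generated scanner would use a much larger table.

```python
def char_class(ch):
    """Map a character to the input symbol the DFA sees."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# Transition table: (state, input symbol) -> next state.
# A missing entry means the DFA rejects the input.
TRANSITIONS = {
    ("start", "letter"): "in_identifier",
    ("start", "digit"): "in_number",
    ("in_identifier", "letter"): "in_identifier",
    ("in_identifier", "digit"): "in_identifier",
    ("in_number", "digit"): "in_number",
}

# Accepting states and the token class each one recognizes.
ACCEPTING = {"in_identifier": "identifier", "in_number": "constant"}

def classify(lexeme):
    """Run the DFA over the lexeme; return its token class or None."""
    state = "start"
    for ch in lexeme:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None
    return ACCEPTING.get(state)
```

Here classify("newval") gives "identifier" and classify("12") gives "constant", while a string such as "12ab" is rejected because the table has no transition from in_number on a letter.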
Analysis phase : Syntax Analyzer
A Syntax Analyzer creates the syntactic structure (generally
a parse tree) of the given program. If there are any
syntax errors they are reported; if there are no
errors, a parse-tree-like structure is generated.

A syntax analyzer is also called a parser.

A parse tree describes a syntactic structure.

In a parse tree, all the internal nodes represent variables
(nonterminals) of the CFG, and all the leaf nodes represent
terminals of the CFG.
Ex: newval := oldval + 12
[Figure: parse tree for this statement]
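A parse tree for this statement can be built with a small recursive-descent parser. The sketch below is in Python and assumes a tiny grammar chosen for illustration (assignment -> identifier := expression; expression -> term { + term }; term -> identifier | number); the function name parse_assignment and the tuple layout of the tree are hypothetical.

```python
import re

def parse_assignment(text):
    """Parse 'identifier := term { + term }' into a nested-tuple parse tree."""
    # Simplified scanning: characters outside these patterns are ignored.
    tokens = re.findall(r":=|[A-Za-z_]\w*|\d+|\+", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(predicate, what):
        nonlocal pos
        tok = peek()
        if tok is None or not predicate(tok):
            raise SyntaxError(f"expected {what}, found {tok!r}")
        pos += 1
        return tok

    def is_identifier(tok):
        return re.fullmatch(r"[A-Za-z_]\w*", tok) is not None

    def is_number(tok):
        return tok.isdigit()

    def term():
        tok = peek()
        if tok is not None and is_number(tok):
            return ("number", expect(is_number, "number"))
        return ("identifier", expect(is_identifier, "identifier"))

    def expression():
        node = term()
        while peek() == "+":
            expect(lambda t: t == "+", "'+'")
            node = ("expression", node, "+", term())
        return node

    target = ("identifier", expect(is_identifier, "identifier"))
    expect(lambda t: t == ":=", "':='")
    tree = ("assignment", target, ":=", expression())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

In the returned tree, the tuple labels ("assignment", "expression", "identifier", "number") play the role of grammar variables at internal nodes, and the lexemes themselves are the leaves.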
Analysis phase : Semantic Analyzer
Semantic analysis checks the semantic consistency of the code.

It uses the syntax tree of the previous phase along with the symbol
table to verify that the given source code is semantically consistent.

It also checks whether the code is conveying an appropriate meaning.

The Semantic Analyzer will check for type mismatches,
incompatible operands, a function called with improper
arguments, an undeclared variable, etc.
Analysis phase : Semantic Analyzer
Functions of the semantic analysis phase are:
⮚ Stores the type information gathered and saves it in the symbol table
or syntax tree
⮚ Performs type checking
⮚ In the case of a type mismatch, where no type correction rule
satisfies the desired operation, a semantic error is shown
⮚ Collects type information and checks for type compatibility
⮚ Checks whether the source language permits the operands or not
Example
float x = 20.2; float y = x*30; In the above code, the semantic
analyzer will typecast the integer 30 to float 30.0 before multiplication.
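The type-checking rule used in this example can be sketched as a small Python function. It is a simplification under stated assumptions: only the types "int" and "float" are treated as numeric, and the function name check_multiply and the coercion label int_to_float are hypothetical.

```python
def check_multiply(left_type, right_type):
    """Type-check 'left * right'.

    Returns (result type, list of coercions to insert), or raises
    TypeError when no correction rule makes the operation valid.
    """
    numeric = ("int", "float")
    if left_type not in numeric or right_type not in numeric:
        raise TypeError(f"incompatible operands: {left_type} * {right_type}")
    if left_type == right_type:
        return left_type, []
    # Mixed int/float: widen the int operand to float before multiplying.
    side = "left" if left_type == "int" else "right"
    return "float", [f"int_to_float({side})"]
```

For x*30 with x of type float, check_multiply("float", "int") reports a float result and one coercion on the right operand, which corresponds to converting 30 to 30.0.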
Synthesis phase

Intermediate Code Generation:-
An intermediate representation of the final machine language code is
produced. This phase bridges the analysis and synthesis phases of
translation.

Code Optimization:-
This optional phase is designed to improve the intermediate code so
that the output runs faster and takes less space.

Code Generation:-
The last phase of translation is code generation. A number of
optimizations to reduce the length of the machine language program are
carried out during this phase. The output of the code generator is
the target program.
Symbol table

Table Management (or) Book-keeping:-

This portion of the compiler stores the names used by the program and records
essential information about each. The data structure used to record this
information is called a ‘Symbol Table’.

The information about data objects is collected by the early phases of
the compiler: the lexical and syntactic analyzers.

A symbol table contains a record for each identifier, constant, and label, with
fields for its attributes.

This component makes it easier for the compiler to search for an identifier
record and retrieve it quickly.

The symbol table also helps with scope management.
The symbol table and error handler interact with all the phases, and the symbol
table is updated correspondingly.
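A symbol table with scope management can be sketched as a stack of dictionaries. This Python class is a minimal illustration, not a production design; the class and method names (SymbolTable, declare, lookup, enter_scope, exit_scope) are assumptions.

```python
class SymbolTable:
    """One dictionary per scope; the innermost scope is last in the list."""

    def __init__(self):
        self.scopes = [{}]  # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):
        """Record a name and its attributes in the current scope."""
        current = self.scopes[-1]
        if name in current:
            raise ValueError(f"multiple declaration of {name!r}")
        current[name] = attributes

    def lookup(self, name):
        """Search from the innermost scope outward; None if undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None
```

Dictionary lookup gives the quick retrieval mentioned above, and searching the scopes from innermost to outermost lets an inner declaration hide an outer one with the same name.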
Error handler

Error Handlers:-
The error handler is invoked when a flaw (error) in the source program is
detected.
One of the most important functions of a compiler is the detection and
reporting of errors in the source program. The error message should
allow the programmer to determine exactly where the errors have
occurred. Errors may occur in any or all of the phases of a compiler.

Whenever a phase of the compiler discovers an error, it must
report the error to the error handler, which issues an
appropriate diagnostic message.

Both the table-management and error-handling routines
interact with all phases of the compiler.
In the compiler design process, errors may occur in all the phases, as given below:

Lexical analyzer: Wrongly spelled tokens
Syntax analyzer: Missing parenthesis
Intermediate code generator: Mismatched operands for an operator
Code Optimizer: When the statement is not reachable
Code Generator: Unreachable statements
Symbol tables: Error of multiple declared identifiers

The most common errors are invalid character sequences in scanning (lexical
analysis), invalid token sequences in parsing (syntax analysis), and scope
errors and type mismatches in semantic analysis.
After finding errors, the phase needs to deal with them so that the
compilation process can continue.

These errors need to be reported to the error handler, which handles them so
that compilation can proceed.

Generally, the errors are reported in the form of a message.
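The report-and-continue behaviour described above can be sketched in Python as follows. The ErrorHandler class, the scan function, and the message format are all hypothetical; scan is a toy stand-in for a lexical phase that accepts only letters, digits, underscores, whitespace, and a few operator characters.

```python
import string

class ErrorHandler:
    """Collects diagnostics so that a phase can report an error and go on."""

    def __init__(self):
        self.diagnostics = []

    def report(self, phase, line, message):
        self.diagnostics.append(f"line {line}: {phase} error: {message}")

    def has_errors(self):
        return bool(self.diagnostics)


def scan(source, handler):
    """Toy lexical phase: report every character outside the allowed set."""
    allowed = set(string.ascii_letters + string.digits + "_ \t:=+*")
    for line_number, line in enumerate(source.splitlines(), start=1):
        for ch in line:
            if ch not in allowed:
                # Report and keep scanning instead of stopping at the first error.
                handler.report("lexical", line_number, f"invalid character {ch!r}")


handler = ErrorHandler()
scan("a := b + 1\nc := a $ 2", handler)
```

After this run, handler.diagnostics holds one message that names line 2 and the character '$', which is the kind of location information the programmer needs.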


Example showing the output of each phase
position := initial + rate * 60
[Figure: output of each phase for this statement]
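As one illustration for this statement, the Python sketch below turns an expression tree into three-address code, the intermediate form mentioned later in these notes. It covers the intermediate code generation step only and is not a reproduction of the original figure; the tree is written by hand as nested tuples, the int-to-float conversion of 60 is omitted, and the name generate_tac and the temporaries t1, t2 are assumptions.

```python
def generate_tac(target, expression):
    """Emit three-address code for 'target := expression'.

    An expression is either a string (identifier or constant) or a
    tuple (operator, left, right).
    """
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, str):
            return node  # leaf: use the name or constant directly
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        counter += 1
        temp = f"t{counter}"  # fresh temporary for this sub-expression
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    result = emit(expression)
    code.append(f"{target} := {result}")
    return code


tac = generate_tac("position", ("+", "initial", ("*", "rate", "60")))
```

The resulting list is "t1 := rate * 60", "t2 := initial + t1", "position := t2": the multiplication is emitted first because it is the deeper node in the tree.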
Pass and Phases of Translation

A compiler can have many phases and passes.

Pass : A pass refers to the traversal of a compiler through the entire program.
Phase : A phase of a compiler is a distinguishable stage, which takes input from
the previous stage, processes it, and yields output that can be used as input for
the next stage.
By number of passes, compilers are of two types:
1. Single Pass Compiler
2. Two Pass Compiler or Multi Pass Compiler.
Single Pass Compiler (Narrow Compilers):
If all the phases of the compiler are grouped in a single module, it is known
as a single pass compiler.
A one pass/single pass compiler is a type of compiler that passes through
each part of each compilation unit exactly once.
A single pass compiler is faster and smaller than a multi pass compiler.
A disadvantage of a single pass compiler is that it is less efficient in
comparison with a multi pass compiler.
Multipass Compiler (Wide Compilers):

⮚ A two pass/multi-pass compiler is a type of compiler that processes the
source code of a program multiple times. In a multipass compiler the phases
are divided into two passes:
⮚ The first pass includes the lexical analyzer, syntax analyzer, semantic
analyzer, and intermediate code generator, which work as the front end.
⮚ The first pass is platform independent because its output is three address
code, which is useful for every system.
⮚ The second pass includes code optimization and code generation, which work
as the back end (the synthesis part). It takes three address code as input and
converts it into low level language/assembly language. The second pass is
platform dependent because the final stage of a typical compiler converts the
intermediate representation of the program into an executable set of
instructions, which is dependent on the system.
Bootstrapping

⮚ Bootstrapping is widely used in compiler development.
⮚ It is a process in which a simple language is used to translate a more
complicated program, which in turn can handle an even more complicated
program, and so on.
⮚ It is used to produce a self-hosting compiler.
⮚ A self-hosting compiler is a type of compiler that can compile its own
source code, i.e. a compiler written in the source programming language that
it intends to compile.
⮚ A compiler can be characterized by three languages:
1) Source Language
2) Target Language
3) Implementation Language
⮚ The T-diagram shows a compiler for source language S and target language T,
implemented in language I.
⮚ A cross compiler is a compiler which runs on one machine and produces
output for another machine.
Compiler Construction tools:

   Tool                                    Input               Output              Example
1) Scanner tool                            Program             Tokens              Lex
2) Parser tool                             Tokens              Parse Tree          YACC
3) Syntax Directed Translation Engine      Parse Tree          Intermediate Code
4) Data Flow Analysis Engine               Intermediate Code   Optimized Code
5) Code Generator                          Optimized Code      Machine Code
6) Compiler Construction Tool Kit          Combination of one or more tools
LEX:
Lex is a program that generates a lexical analyzer.
It is a Unix utility.
The lexical analyzer is a program that transforms an input stream
into a sequence of tokens.
Lex specifies tokens using regular expressions.
The function of Lex is as follows:
1. First, the lexical analyzer is described in a program called the lex
specification file, lex.l, written in the Lex language. The Lex compiler then
processes lex.l and produces a C program, lex.yy.c.
2. Next, the C compiler compiles lex.yy.c and produces an object
program, a.out.
3. a.out is the lexical analyzer that transforms an input stream into a
sequence of tokens.
Prepared by D HIMAGIRI

The structure of LEX programs:

%{
Declarations
%}
%%
Rules
%%
Auxiliary Functions

Declaration Section:
The declarations section consists of two parts, auxiliary declarations and regular
definitions.
The auxiliary declarations are copied as such by LEX to the output lex.yy.c file. This C
code consists of instructions to the C compiler and is not processed by the LEX
tool.
The auxiliary declarations (which are optional) are written in the C language and are
enclosed within ' %{ ' and ' %} ' .
They are generally used to declare functions, include header files, or define global
variables and constants.
LEX allows the use of short-hands and extensions to regular expressions for the
regular definitions. A regular definition in LEX is of the form : D R
where D is the symbol representing the regular expression R.
Rules:
Rules in a LEX program consist of two parts :
1. The pattern to be matched
2. The corresponding action to be executed
⮚ Patterns are defined using regular expressions and actions are
specified using C code.
The rules can be given as

R1 {Action1}
R2 {Action2}
.
.
.
Rn {Action n}

where Ri is a regular expression and Action i is the action to be taken for the corresponding regular expression.
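The pattern/action idea can be imitated in Python to show how such rules drive a scanner. This is a sketch of the matching discipline only, not of Lex itself: the function run_rules and the counting actions are hypothetical, the rules are Python regular expressions rather than Lex patterns, and unmatched characters are skipped here, whereas Lex by default copies them to the output.

```python
import re

def run_rules(rules, text):
    """Apply (pattern, action) rules Lex-style: at each position the longest
    match wins, and a tie goes to the rule listed first."""
    compiled = [(re.compile(pattern), action) for pattern, action in rules]
    pos = 0
    while pos < len(text):
        best_match, best_action = None, None
        for regex, action in compiled:
            match = regex.match(text, pos)
            if match and match.end() > pos:
                if best_match is None or match.end() > best_match.end():
                    best_match, best_action = match, action
        if best_match is None:
            pos += 1  # simplification: skip a character no rule matches
            continue
        best_action(best_match.group())
        pos = best_match.end()


counts = {"identifier": 0, "constant": 0, "operator": 0}

def count(kind):
    """Build an action that increments the counter for one token class."""
    def action(lexeme):
        counts[kind] += 1
    return action

rules = [
    (r"[A-Za-z_]\w*", count("identifier")),
    (r"\d+", count("constant")),
    (r":=|[+*]", count("operator")),
    (r"\s+", lambda lexeme: None),  # ignore whitespace
]
run_rules(rules, "position := initial + rate * 60")
```

After the call, counts records three identifiers, one constant, and three operators for the sample statement.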


Auxiliary Functions:
All the required procedures are defined in this section.

Note: The function yywrap is called by lex when the input is exhausted. When the
end of the file is reached, the return value of yywrap() is checked. If it is
non-zero, scanning terminates; if it is 0, scanning continues with the next input file.
Lex program to count the tokens in a source program:
[Figure: Lex program listing]

Note: yylex() matches the input characters against the regular expressions.
