
Unit 1 - Introduction

1. Explain overview of translation process.


 A translator is a kind of program that takes one form of program as input and converts
it into another form.
 The input is called the source program and the output is called the target program.
 The source language can be an assembly language or a higher level language like C, C++,
FORTRAN, etc.
 There are three types of translators,
1. Compiler
2. Interpreter
3. Assembler

2. What is compiler? List major functions done by compiler.


 A compiler is a program that reads a program written in one language and translates it
into an equivalent program in another language.

Source Program → Compiler → Target Program
                     ↓
                Error report

Fig.1.1. A Compiler
Major functions done by compiler:
 Compiler is used to convert one form of program to another.
 A compiler should convert the source program to a target machine code in such a way
that the generated target code should be easy to understand.
 Compiler should preserve the meaning of source code.
 Compiler should report errors that occur during compilation process.
 The compilation must be done efficiently.

3. Write the difference between compiler, interpreter and


assembler.
1. Compiler v/s Interpreter

No. Compiler                                   Interpreter
1   Compiler takes the entire program as       Interpreter takes a single instruction as
    input.                                     input.
2   Intermediate code is generated.            No intermediate code is generated.
3   Memory requirement is more.                Memory requirement is less.
4   Errors are displayed after the entire      Errors are displayed for every instruction
    program is checked.                        interpreted.
5   Example: C compiler                        Example: BASIC

Table 1.1 Difference between Compiler & Interpreter

Dixita Kagathara, CE Department | 170701 – Compiler Design 1



2. Compiler v/s Assembler


No. Compiler                                   Assembler
1   It translates a higher level language to   It translates mnemonic operation codes to
    machine code.                              machine code.
2   Types of compiler:                         Types of assembler:
     Single pass compiler                      Single pass assembler
     Multi pass compiler                       Two pass assembler
3   Example: C compiler                        Example: 8085, 8086 instruction set
Table 1.2 Difference between Compiler & Assembler

4. Analysis synthesis model of compilation. OR
Explain structure of compiler. OR
Explain phases of compiler. OR
Write output of phases of a compiler for a = a + b * c * 2; type of
a, b, c is float.
There are mainly two parts of the compilation process.
1. Analysis phase: The main objective of the analysis phase is to break the source code
into parts and then arrange these pieces into a meaningful structure.
2. Synthesis phase: Synthesis phase is concerned with generation of target language
statements which have the same meaning as the source statements.
Analysis Phase: Analysis part is divided into three sub parts,
I. Lexical analysis
II. Syntax analysis
III. Semantic analysis
Lexical analysis:
 Lexical analysis is also called linear analysis or scanning.
 Lexical analyzer reads the source program and breaks it into a stream of units. Such
units are called tokens.
 Then it classifies the units into different lexical classes, e.g. identifiers, constants,
keywords etc., and enters them into different tables.
 For example, in lexical analysis the assignment statement a = a + b * c * 2 would be
grouped into the following tokens:
a    Identifier 1
=    Assignment sign
a    Identifier 1
+    The plus sign
b    Identifier 2
*    Multiplication sign
c    Identifier 3
*    Multiplication sign
2    Number 2
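The grouping above can be sketched in code. The following is a minimal, illustrative scanner for this one statement, not the book's algorithm; the token names (`TOK_ID`, `next_token`, etc.) are assumptions made for this sketch.

```c
#include <ctype.h>
#include <string.h>

/* Illustrative token kinds; real scanners carry richer attributes. */
typedef enum { TOK_ID, TOK_NUM, TOK_ASSIGN, TOK_PLUS, TOK_MUL, TOK_EOF } TokKind;

typedef struct { TokKind kind; char text[16]; } Token;

/* Scan the next token starting at *p; advances *p past the lexeme. */
Token next_token(const char **p) {
    Token t = { TOK_EOF, "" };
    while (**p == ' ') (*p)++;                 /* skip whitespace */
    if (**p == '\0') return t;
    if (isalpha((unsigned char)**p)) {         /* identifier: letter(letter|digit)* */
        int n = 0;
        while (isalnum((unsigned char)**p)) t.text[n++] = *(*p)++;
        t.text[n] = '\0';
        t.kind = TOK_ID;
    } else if (isdigit((unsigned char)**p)) {  /* number: digit+ */
        int n = 0;
        while (isdigit((unsigned char)**p)) t.text[n++] = *(*p)++;
        t.text[n] = '\0';
        t.kind = TOK_NUM;
    } else {                                   /* single-character operators */
        t.text[0] = **p; t.text[1] = '\0';
        t.kind = (**p == '=') ? TOK_ASSIGN : (**p == '+') ? TOK_PLUS : TOK_MUL;
        (*p)++;
    }
    return t;
}

/* Count the tokens in a source string, as the table above does by hand. */
int count_tokens(const char *src) {
    int n = 0;
    Token t;
    while ((t = next_token(&src)).kind != TOK_EOF) n++;
    return n;
}
```

Running `count_tokens` on `a = a + b * c * 2` yields the nine tokens listed in the table above.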
Syntax Analysis:
 Syntax analysis is also called hierarchical analysis or parsing.
 The syntax analyzer checks each line of the code and spots every tiny mistake that the
programmer has committed while typing the code.
 If the code is error free then the syntax analyzer generates a syntax tree.
            =
           / \
          a   +
             / \
            a   *
               / \
              b   *
                 / \
                c   2
Semantic analysis:
 Semantic analyzer determines the meaning of a source string.
 For example, matching of parentheses in an expression, matching of if..else
statements, performing arithmetic operations that are type compatible, or checking the
scope of operations.
            =
           / \
          a   +
             / \
            a   *
               / \
              b   *
                 / \
                c   inttofloat
                        |
                        2
Synthesis phase: Synthesis part is divided into three sub parts,
I. Intermediate code generation
II. Code optimization
III. Code generation
Intermediate code generation:
 The intermediate representation should have two important properties: it should be
easy to produce and easy to translate into the target program.
 We consider an intermediate form called "three address code".
 Three address code consists of a sequence of instructions, each of which has at most
three operands.
 The source program might appear in three address code as,
t1 = inttoreal(2)
t2 = id3 * t1
t3 = t2 * id2
t4 = t3 + id1
id1 = t4
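Three-address instructions are often stored as "quadruples": an operator, up to two operand fields, and a result field. A minimal sketch of the sequence above in that representation follows; the struct layout and field names are illustrative, not a standard format.

```c
#include <string.h>

/* A three-address instruction as a quadruple: op, two operands, result. */
typedef struct {
    char op[12], arg1[8], arg2[8], result[8];
} Quad;

/* Emit the five-instruction sequence for a = a + b * c * 2
   (id1 = a, id2 = b, id3 = c, as in the text). */
int build_tac(Quad code[]) {
    int n = 0;
    code[n++] = (Quad){"inttoreal", "2",   "",    "t1"};
    code[n++] = (Quad){"*",         "id3", "t1",  "t2"};
    code[n++] = (Quad){"*",         "t2",  "id2", "t3"};
    code[n++] = (Quad){"+",         "t3",  "id1", "t4"};
    code[n++] = (Quad){"=",         "t4",  "",    "id1"};
    return n;  /* number of instructions emitted */
}
```

Each quadruple has at most three operand/result names, which is exactly the "at most three operands" property stated above.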
Code optimization:
 The code optimization phase attempts to improve the intermediate code.
 This is necessary to have faster executing code or less consumption of memory.
 Thus, by optimizing the code, the overall running time of a target program can be
improved.
t1= id3 * 2.0
t2= id2 * t1
id1 = id1 + t2
Code generation:
 In the code generation phase the target code gets generated. The intermediate code
instructions are translated into a sequence of machine instructions.
MOV id3, R1
MUL #2.0, R1
MOV id2, R2
MUL R2, R1
MOV id1, R2
ADD R2, R1
MOV R1, id1
Symbol Table
 A symbol table is a data structure used by a language translator such as a compiler or
interpreter.
 It is used to store names encountered in the source program, along with the relevant
attributes for those names.
 Information about the following entities is stored in the symbol table:
 Variable/Identifier
 Procedure/function
 Keyword
 Constant
 Class name
 Label name
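A deliberately tiny symbol table can make the idea concrete: a fixed-size array with linear search, storing a name and one attribute. This is a sketch only; production compilers typically use hash tables, and the names (`insert`, `lookup`, field names) are assumptions for illustration.

```c
#include <string.h>

#define MAX_SYMS 64

typedef struct {
    char name[32];   /* identifier as written in the source */
    char type[16];   /* one attribute, e.g. "float" */
} Symbol;

static Symbol table[MAX_SYMS];
static int sym_count = 0;

/* Return the index of name in the table, or -1 if absent. */
int lookup(const char *name) {
    for (int i = 0; i < sym_count; i++)
        if (strcmp(table[i].name, name) == 0) return i;
    return -1;
}

/* Insert name with its type attribute; return its index (existing or new). */
int insert(const char *name, const char *type) {
    int i = lookup(name);
    if (i >= 0) return i;            /* already present: reuse the entry */
    strcpy(table[sym_count].name, name);
    strcpy(table[sym_count].type, type);
    return sym_count++;
}
```

The lexical analyzer would call `insert` when it first sees an identifier, and later phases would call `lookup` to retrieve its attributes.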


Source program
      ↓
Lexical Analysis
      ↓
Syntax Analysis
      ↓
Semantic Analysis
      ↓
Intermediate Code
      ↓
Code Optimization
      ↓
Code Generation
      ↓
Target Program

(The Symbol Table and the error detection and recovery routines interact with all phases.)

Fig.1.2. Phases of Compiler

5. The context of a compiler. OR
Cousins of compiler. OR
What does the linker do? What does the loader do? What does
the preprocessor do? Explain their role(s) in the compilation
process.
 In addition to a compiler, several other programs may be required to create an
executable target program.
Preprocessor
Preprocessor produces input to the compiler. It may perform the following functions:
1. Macro processing: A preprocessor may allow a user to define macros that are shorthand
for longer constructs.
2. File inclusion: A preprocessor may include header files into the program text.
3. Rational preprocessor: Such a preprocessor provides the user with built-in macros for
constructs like the while statement or if statement.
4. Language extensions: These preprocessors attempt to add capabilities to the language by
what amounts to built-in macros. Ex: the language Equel is a database query language
embedded in C. Statements beginning with ## are taken by the preprocessor to be database
access statements unrelated to C and translated into procedure calls on routines that
perform the database access.
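The macro-processing and file-inclusion functions described above can be seen in any ordinary C source file; a minimal sketch (the macro names here are illustrative, not from the text):

```c
#include <stdio.h>   /* file inclusion: the preprocessor splices stdio.h into the text */

/* Macro processing: SQUARE and PI are shorthand that the preprocessor
   expands textually before the compiler proper ever sees the code. */
#define SQUARE(x) ((x) * (x))
#define PI 3.14159   /* a manifest constant */

double circle_area(double r) {
    return PI * SQUARE(r);   /* expands to 3.14159 * ((r) * (r)) */
}
```

After preprocessing, `circle_area` contains no macros at all; the compiler receives only the expanded source.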

Skeletal source
      ↓
Preprocessor
      ↓
Source program
      ↓
Compiler
      ↓
Target assembly
      ↓
Assembler
      ↓
Relocatable M/C code
      ↓
Linker / Loader
      ↓
Absolute M/C code

Fig.1.3. Context of Compiler
Assembler
Assembler is a translator which takes an assembly program as input and generates the
machine code as output. An assembly language is a mnemonic version of machine code, in
which names are used instead of binary codes for operations.
Linker
Linker allows us to make a single program from several files of relocatable machine code.
These files may have been the result of several different compilations, and one or more may
be library files of routines provided by the system.
Loader
The process of loading consists of taking relocatable machine code, altering the relocatable
address and placing the altered instructions and data in memory at the proper location.

6. Explain front end and back end in brief. (Grouping of phases)


 The phases are collected into a front end and back end.
Front end
 The front end consists of those phases that depend primarily on the source language and
are largely independent of the target machine.
 Front end includes lexical analysis, syntax analysis, semantic analysis, intermediate code
generation and creation of the symbol table.
 A certain amount of code optimization can be done by the front end.
Back end
 The back end consists of those phases that depend on the target machine and do not
depend on the source program.



TYPES OF COMPILERS:
Based on the specific input it takes and the output it produces, compilers can be classified into
the following types.
1) Traditional Compilers (C, C++, Pascal):
These compilers convert a source program in a HLL into its equivalent in native machine code or
object code.
2) Interpreters (LISP, SNOBOL, Java 1.0):
These compilers first convert source code into intermediate code, and then interpret (emulate) it
to its equivalent machine code.
3) Cross-Compilers:
These are compilers that run on one machine and produce code for another machine.
4) Incremental Compilers:
These compilers separate the source into user defined steps, compiling/recompiling step by step
and interpreting steps in a given order.
5) Converters (e.g. COBOL to C++):
These programs compile from one high level language to another.
6) Just-In-Time (JIT) Compilers (Java, Microsoft .NET):
These are runtime compilers from intermediate language (byte code, MSIL) to executable code
or native machine code. These perform type-based verification which makes the executable code
more trustworthy.
7) Ahead-of-Time (AOT) Compilers (e.g., .NET ngen):
These are pre-compilers to the native code for Java and .NET.
8) Binary Compilation:
These compilers compile object code of one platform into object code of another platform.

LEXICAL ANALYSIS:
→ As the first phase of a compiler, the main task of the lexical analyzer is to read the input
characters of the source program, group them into lexemes, and produce as output tokens for each
lexeme in the source program. This stream of tokens is sent to the parser for syntax analysis. It is
common for the lexical analyzer to interact with the symbol table as well.
→When the lexical analyzer discovers a lexeme constituting an identifier, it needs to enter that
lexeme into the symbol table. This process is shown in the following figure.

→When lexical analyzer identifies the first token it will send it to the parser, the parser receives the
token and calls the lexical analyzer to send next token by issuing the getNextToken() command.
This Process continues until the lexical analyzer identifies all the tokens. During this process the
lexical analyzer will neglect or discard the white spaces and comment lines.
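The pull-style interaction described above can be sketched in a few lines. Here `getNextToken` is a stand-in that replays a fixed token stream rather than scanning real input, and the token codes are illustrative assumptions; the point is only the control flow: the parser repeatedly calls the lexer until the stream is exhausted.

```c
/* Illustrative token codes; EOF_TOK = 0 ends the stream. */
enum { EOF_TOK = 0, ID = 1, NUM = 2, OP = 3 };

/* Stand-in lexer: replays a fixed token stream for "a + 2".
   A real getNextToken() would scan characters, skipping
   whitespace and comments as described above. */
static int stream[] = { ID, OP, NUM, EOF_TOK };
static int pos = 0;

int getNextToken(void) { return stream[pos++]; }

/* The parser's side of the interaction: pull tokens until EOF. */
int parse_count_tokens(void) {
    int tok, n = 0;
    while ((tok = getNextToken()) != EOF_TOK)
        n++;     /* a real parser would match tok against the grammar here */
    return n;
}
```

The parser never touches characters directly; everything it sees has already been tokenized, which is exactly the separation the next section argues for.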
LEXICAL ANALYSIS Vs PARSING:
There are a number of reasons why the analysis portion of a compiler is normally separated into
lexical analysis and parsing (syntax analysis) phases.
 1. Simplicity of design is the most important consideration. The separation of lexical
and syntactic analysis often allows us to simplify at least one of these tasks. For
example, a parser that had to deal with comments and whitespace as syntactic units
would be considerably more complex than one that can assume comments and
whitespace have already been removed by the lexical analyzer.
 2. Compiler efficiency is improved. A separate lexical analyzer allows us to apply
specialized techniques that serve only the lexical task, not the job of parsing. In addition,
specialized buffering techniques for reading input characters can speed up the compiler
significantly.
 3. Compiler portability is enhanced: Input-device-specific peculiarities can be
restricted to the lexical analyzer.
INPUT BUFFERING:

We now consider some ways that the simple but important task of reading the source program can be speeded up.
This task is made difficult by the fact that we often have to look one or more characters beyond
the next lexeme before we can be sure we have the right lexeme. There are many situations
where we need to look at least one additional character ahead. For instance, we cannot be sure
we've seen the end of an identifier until we see a character that is not a letter or digit, and
therefore is not part of the lexeme for id. In C, single-character operators like -, =, or <
could also be the beginning of a two-character operator like ->, ==, or <=. Thus, we shall
introduce a two-buffer scheme that handles large look aheads safely. We then consider an
improvement involving "sentinels" that saves time checking for the ends of buffers.
Buffer Pairs

Each buffer is of the same size N, and N is usually the size of a disk block, e.g., 4096
bytes. Using one system read command we can read N characters in to a buffer, rather than
using one system call per character. If fewer than N characters remain in the input file, then a
special character, represented by eof, marks the end of the source file and is different from any
possible character of the source program.
 Two pointers to the input are maintained:
1. The Pointer lexemeBegin, marks the beginning of the current lexeme, whose extent
we are attempting to determine.
2. Pointer forward scans ahead until a pattern match is found; the exact strategy
whereby this determination is made will be covered in the balance of this chapter.

→Once the next lexeme is determined, forward is set to the character at its right end. Then,
after the lexeme is recorded as an attribute value of a token returned to the parser, lexemeBegin
is set to the character immediately after the lexeme just found. In the figure, we see forward has
passed the end of the next lexeme, ** (the FORTRAN exponentiation operator), and must be
retracted one position to its left.
Advancing forward requires that we first test whether we have reached the end of one
of the buffers, and if so, we must reload the other buffer from the input, and move forward to
the beginning of the newly loaded buffer. As long as we never need to look so far ahead of the
actual lexeme that the sum of the lexeme's length plus the distance we look ahead is greater
than N, we shall never overwrite the lexeme in its buffer before determining it.
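A minimal sketch of the buffer-pair scheme follows, shrunk to N = 4 so the reload logic is visible. The function names (`reload`, `advance`) are illustrative, the "input file" is simulated by a string, and the sentinel is `'\0'` standing in for the special eof character the text describes.

```c
#define N 4                    /* tiny for illustration; really a disk block, e.g. 4096 */
#define EOF_CH '\0'            /* stand-in for the special eof character */

static char buf[2][N + 1];     /* each buffer ends with a sentinel slot */
static const char *input;      /* simulated source file */
static int cur;                /* which buffer 'forward' is currently in */
static char *forward;

/* Refill buffer b from the input and terminate it with the sentinel. */
static void reload(int b) {
    int n = 0;
    while (n < N && *input) buf[b][n++] = *input++;
    buf[b][n] = EOF_CH;
}

void scanner_init(const char *src) {
    input = src;
    cur = 0;
    reload(0);
    forward = buf[0];
}

/* Advance forward one character. Hitting the sentinel exactly at a buffer's
   end means "reload the other buffer and jump to its start"; a sentinel in
   the interior of a buffer means real end of input. */
char advance(void) {
    char c = *forward++;
    if (*forward == EOF_CH && forward == buf[cur] + N) {
        cur = 1 - cur;
        reload(cur);
        forward = buf[cur];
    }
    return c;
}
```

With the sentinel, the common case of `advance` performs a single character test instead of separately checking both buffer ends, which is the time saving the text refers to.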

LEX the Lexical Analyzer generator


Lex is a tool used to generate a lexical analyzer; the input notation for the Lex tool is
referred to as the Lex language and the tool itself is the Lex compiler. Behind the scenes, the
Lex compiler transforms the input patterns into a transition diagram and generates code in a
file called lex.yy.c; this C program is given to the C compiler, which produces the object code.
Here we need to know how to write the Lex language. The structure of a Lex program is given below.
Structure of LEX Program : A Lex program has the following form:

→The declarations section includes declarations of variables, manifest constants (identifiers
declared to stand for a constant, e.g., the name of a token), and regular definitions. C code in
this section appears between %{ ... %}.
→In the translation rules section, we place pattern-action pairs where each pair has the form
Pattern {Action}
→The auxiliary function definitions section includes the definitions of functions used to install
identifiers and numbers in the symbol table.
LEX Program Example:
%{
/* definitions of manifest constants LT, LE, EQ, NE, GT, GE,
   IF, THEN, ELSE, ID, NUMBER, RELOP */
%}
/* regular definitions */
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(E[+-]?{digit}+)?
%%
{ws}     {/* no action and no return */}
if       {return(IF);}
then     {return(THEN);}
else     {return(ELSE);}
{id}     {yylval = (int) installID(); return(ID);}
{number} {yylval = (int) installNum(); return(NUMBER);}
"<"      {yylval = LT; return(RELOP);}
"<="     {yylval = LE; return(RELOP);}
"="      {yylval = EQ; return(RELOP);}
"<>"     {yylval = NE; return(RELOP);}
">"      {yylval = GT; return(RELOP);}
">="     {yylval = GE; return(RELOP);}
%%
int installID() {/* function to install the lexeme, whose first character is pointed to by
                    yytext, and whose length is yyleng, into the symbol table and return a
                    pointer thereto */}
int installNum() {/* similar to installID, but puts numerical constants into a separate table */}

SYNTAX ANALYSIS (PARSER)


THE ROLE OF THE PARSER:
In our compiler model, the parser obtains a string of tokens from the lexical analyzer,
as shown in the below Figure, and verifies that the string of token names can be generated
by the grammar for the source language. We expect the parser to report any syntax errors in
an intelligible fashion and to recover from commonly occurring errors to continue processing the
remainder of the program. Conceptually, for well-formed programs, the parser constructs a parse

tree and passes it to the rest of the compiler for further processing.
→During the process of parsing it may encounter some error and present the error information back
to the user
→Syntactic errors include misplaced semicolons or extra or missing braces; that is,
"{" or "}". As another example, in C or Java, the appearance of a case statement without
an enclosing switch is a syntactic error (however, this situation is usually allowed by the
parser and caught later in the processing, as the compiler attempts to generate code).
→Based on the way/order the Parse Tree is constructed, Parsing is basically classified in to
following two types:
1. Top Down Parsing : Parse tree construction start at the root node and moves to the
children nodes (i.e., top down order).
2. Bottom up Parsing: Parse tree construction begins from the leaf nodes and proceeds
towards the root node (called the bottom up order).
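The top-down order can be made concrete with a tiny recursive-descent parser. This is a minimal sketch, not a full parser: the grammar is `E -> T { '+' T }`, `T -> F { '*' F }`, `F -> digit`, it evaluates as it parses instead of building a tree, and all names here are illustrative assumptions.

```c
/* Recursive-descent (top-down) parsing sketch. Each nonterminal of the
   grammar becomes one function; parsing starts at the root (E) and
   descends toward the leaves, the top-down order described above. */
static const char *p;   /* current position in the input string */

static int parse_F(void) {            /* F -> digit */
    return *p ? *p++ - '0' : 0;
}

static int parse_T(void) {            /* T -> F { '*' F } */
    int v = parse_F();
    while (*p == '*') { p++; v *= parse_F(); }
    return v;
}

static int parse_E(void) {            /* E -> T { '+' T } */
    int v = parse_T();
    while (*p == '+') { p++; v += parse_T(); }
    return v;
}

int eval(const char *src) { p = src; return parse_E(); }
```

Note how the call tree of `parse_E -> parse_T -> parse_F` mirrors the parse tree built root-first; a bottom-up parser would instead recognize the `F` leaves first and reduce upward.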

LANGUAGE PROCESSING SYSTEM:


Based on the input the translator takes and the output it produces, a language translator can be
called any one of the following.
Preprocessor: A preprocessor takes the skeletal source program as input and produces an extended
version of it, which is the resultant of expanding the Macros, manifest constants if any, and
including header files etc in the source file.
→For example, the C preprocessor is a macro processor
that is used automatically by the C compiler to transform our source before actual compilation.
Over and above a preprocessor performs the following activities:
 Collects all the modules, files in case if the source program is divided into different modules
stored at different files.
 Expands short hands / macros into source language statements.
Compiler: Is a translator that takes as input a source program written in high level language and
converts it into its equivalent target program in machine language. In addition to above the compiler
also
 Reports to its user the presence of errors in the source program.
 Facilitates the user in rectifying the errors, and execute the code.
Assembler: Is a program that takes as input an assembly language program and converts it into its
equivalent machine language code.
Loader / Linker: This is a program that takes as input a relocatable code and collects the library
functions, relocatable object files, and produces its equivalent absolute machine code.
Specifically,
 Loading consists of taking the relocatable machine code, altering the relocatable addresses,
and placing the altered instructions and data in memory at the proper locations.
 Linking allows us to make a single program from several files of relocatable machine
code. These files may have been result of several different compilations, one or more
may be library routines provided by the system available to any program that needs them.

→In addition to these translators, programs like interpreters, text formatters etc., may be used in
a language processing system. To translate a program in a high level language to an
executable one, the compiler performs by default the compile and linking functions.
→Normally the steps in a language processing system include preprocessing the skeletal source
program, which produces an extended or expanded source program (a ready-to-compile unit of
the source program), followed by compiling the resultant, then linking/loading, and finally its
equivalent executable code is produced.
→As noted earlier, not all these steps are mandatory. In
some cases, the compiler performs the linking and loading functions implicitly.
