Vidarbha Youth Welfare Society’s

Prof. Ram Meghe Institute of Technology and Research, Badnera - Amravati
Compiler Design
Language Processor
• Computer programs are generally written in high-level languages (like C++, Python, and Java).
• A language processor is a computer program that converts source code from one programming language to another language or to machine code.
• Language processors also find errors during translation.
• Compilers and interpreters translate programs written in high-level languages into machine code that a computer understands, while assemblers translate programs written in assembly language into machine code.
1. Compiler
• The compiler translates the whole source program, written in a high-level language, into machine code before the program is run.
2. Assembler
• The assembler is used to translate a program written in assembly language into machine code.
• Assembly language uses mnemonics (symbolic instruction names) like ADD, MUL, SUB, DIV, MOV.
3. Interpreter
• An interpreter is a language processor that translates a single statement of the source program into machine code and executes it immediately, before moving on to the next line.
• If there is an error in a statement, the interpreter stops translating at that statement and displays an error message.
Language Processing System
• Preprocessor: It includes the contents of header files and expands macros (a macro is a piece of code that is given a name).
• Assembler: The assembler takes the assembly code produced by the compiler as input and produces relocatable machine code as output.
• Linker: The linker, or link editor, is a program that takes a collection of object files (created by assemblers and compilers) and combines them into an executable program.
• Loader: The loader places the linked program into main memory.
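As a small illustration of the preprocessor stage (an illustrative C example, not from the slides), the #include directive pulls in a header and #define introduces a macro:

#include <stdio.h>
#define SQUARE(x) ((x) * (x))   /* a macro: a named piece of code */

int main(void) {
    printf("%d\n", SQUARE(5));  /* the preprocessor rewrites this call */
    return 0;
}

/* After preprocessing, the macro call above becomes:          */
/*     printf("%d\n", ((5) * (5)));                            */
/* and the full text of stdio.h replaces the #include line.    */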
Phases of Compiler / Structure of Compiler

The compilation process is a sequence of various phases. Each phase takes input from the previous phase, has its own representation of the source program, and feeds its output to the next phase of the compiler.
Structure of Compiler
• Basically divided into 2 parts:
• Analysis Part (Front-end)
• Synthesis Part (Back-end)
Compilation- Front End
It is also known as the Analysis Part.
• Lexical Analysis
• Syntax Analysis
• Semantic Analysis
• Intermediate Code Generation
=> The front end is machine independent.

Compilation- Back End
It is also known as the Synthesis Part.
• Code Optimization
• Code Generation
=> The back end is machine dependent.

• The analysis part creates an intermediate representation from the given source code. The synthesis part creates an equivalent target program from the intermediate representation.
Lexical Analysis
• First phase of the compiler.
• It converts the sequence of input characters into tokens.
• The character sequences matched for each token are called lexemes.
• The lexical analyzer is also called a scanner.

Example:
position := initial + rate * 60
id1 := id2 + id3 * 60 (tokens)
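A minimal sketch of such a scanner in C, assuming only identifiers, integers, and the operators :=, + and * (names such as next_token are illustrative, not from the slides):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum { TOK_ID, TOK_NUM, TOK_ASSIGN, TOK_PLUS, TOK_STAR, TOK_END };

/* Scan one token starting at *p, copy its lexeme into lex, return its kind. */
static int next_token(const char **p, char *lex) {
    const char *s = *p;
    while (isspace((unsigned char)*s)) s++;          /* skip white space */
    const char *start = s;
    int kind;
    if (*s == '\0')                      kind = TOK_END;
    else if (isalpha((unsigned char)*s)) {           /* letter (letter|digit)* */
        while (isalnum((unsigned char)*s)) s++;
        kind = TOK_ID;
    } else if (isdigit((unsigned char)*s)) {         /* digit+ */
        while (isdigit((unsigned char)*s)) s++;
        kind = TOK_NUM;
    } else if (s[0] == ':' && s[1] == '=') { s += 2; kind = TOK_ASSIGN; }
    else if (*s == '+') { s++; kind = TOK_PLUS; }
    else if (*s == '*') { s++; kind = TOK_STAR; }
    else { s++; kind = -1; }                         /* unknown character */
    memcpy(lex, start, (size_t)(s - start));
    lex[s - start] = '\0';
    *p = s;
    return kind;
}

int main(void) {
    const char *src = "position := initial + rate * 60";
    const char *names[] = { "id", "num", "assign", "plus", "star", "end" };
    char lex[64];
    int k;
    while ((k = next_token(&src, lex)) != TOK_END)
        printf("<%s, \"%s\">\n", k >= 0 ? names[k] : "error", lex);
    return 0;
}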
Syntax Analysis
• Hierarchical analysis, or parsing.
• The tokens are grouped hierarchically into nested collections.

• Rules-
• Any identifier is an expression.
• Any number is an expression.
• If E1 & E2 are expressions, then E1*E2 & E1+E2 are expressions.
• If id is an identifier & E is an expression, then id := E is a statement.
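Applying these rules to the running example: 60 is a number (an expression), id3 and id2 are identifiers (expressions), so id3 * 60 is an expression and id2 + (id3 * 60) is an expression; id1 := id2 + (id3 * 60) is then a statement. The resulting grouping can be sketched as:

        :=
       /  \
    id1    +
          / \
       id2   *
            / \
         id3   60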
Semantic Analysis
• This phase checks whether the components of the source program are meaningful or not.
• It validates the syntax tree by applying the rules of the source language.
• It does type checking, scope resolution, checking of variable declarations, etc.
• It decorates the syntax tree with data types, values, etc.
Example:
position := initial + rate * 60
id1 := id2 + id3 * 60 (tokens)
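A small C fragment suggesting the kinds of checks this phase performs (an illustrative sketch, not from the slides; the rejected lines are shown as comments so the fragment still compiles):

#include <stdio.h>

int main(void) {
    int count = 10;
    double rate;
    rate = count * 1.5;   /* OK: the int operand is implicitly converted to double */
    printf("%f\n", rate);
    /* count = "ten";        rejected: char * assigned to int (type check fails)   */
    /* undeclared = 5;       rejected: identifier was never declared (scope check) */
    return 0;
}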
Intermediate Code Generation
• After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate representation.
• This intermediate representation should have two important properties: it should be easy to produce, and it should be easy to translate into the target machine's code.
• The intermediate representation has a variety of forms; one such representation is three-address code.
• Each instruction has at most 3 operands.
• Consider the three-address code for-
position := initial + rate * 60
id1 := id2 + id3 * 60 (tokens)

Intermediate code is-
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
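Three-address instructions are often stored as quadruples: an operator, up to two operand addresses, and a result address. A minimal C sketch with illustrative field names (one possible representation, not the only one):

#include <stdio.h>

/* A quadruple: operator, up to two operands, and a result address. */
struct Quad {
    const char *op;
    const char *arg1;
    const char *arg2;    /* empty for unary operations and copies */
    const char *result;
};

int main(void) {
    /* The four instructions generated for position := initial + rate * 60 */
    struct Quad code[] = {
        { "inttoreal", "60",    "",      "temp1" },
        { "*",         "id3",   "temp1", "temp2" },
        { "+",         "id2",   "temp2", "temp3" },
        { ":=",        "temp3", "",      "id1"   },
    };
    for (int i = 0; i < 4; i++)
        printf("(%s, %s, %s, %s)\n", code[i].op, code[i].arg1,
               code[i].arg2, code[i].result);
    return 0;
}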
Code Optimization
• It aims to reduce the running time of the program.
• It produces more efficient code.
• It is an optional phase.
• The result is faster-running machine code.

• Example
position := initial + rate * 60
id1 := id2 + id3 * 60 (tokens)
The optimized code is-
temp1 := id3 * 60.0
id1 := id2 + temp1
(Here the conversion inttoreal(60) has been folded into the constant 60.0 at compile time, and the copy through temp3 has been eliminated.)
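Optimizers apply many such transformations. Another classic one, shown on C source for readability (an illustrative sketch, not from the slides), is loop-invariant code motion:

#include <stdio.h>

int main(void) {
    int a[8], x = 3, y = 4, n = 8;

    /* Before: x * y is recomputed on every iteration. */
    for (int i = 0; i < n; i++)
        a[i] = x * y + i;

    /* After loop-invariant code motion, the optimizer effectively runs: */
    int t = x * y;                 /* computed once, outside the loop */
    for (int i = 0; i < n; i++)
        a[i] = t + i;

    printf("%d\n", a[7]);          /* 19 in both versions */
    return 0;
}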
Code Generation
• The output consists of relocatable assembly code or machine code.
• Example
position := initial + rate * 60
id1 := id2 + id3 * 60 (tokens)

• Target Code-
MOVF id3, R2      ; R2 <- rate
MULF #60.0, R2    ; R2 <- R2 * 60.0
MOVF id2, R1      ; R1 <- initial
ADDF R2, R1       ; R1 <- R1 + R2
MOVF R1, id1      ; position <- R1
Symbol Table
• It stores the identifiers found during lexical analysis.
• A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier.
• The data structure allows us to find the record for each identifier quickly, and to store or retrieve data from that record quickly.
• Type and scope information is added during syntax and semantic analysis.
• This information is used during code generation to decide which instructions to use.
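A minimal sketch of such a data structure in C, using linear search for simplicity (real compilers typically use a hash table; all names and sizes here are illustrative):

#include <stdio.h>
#include <string.h>

/* One record per identifier, with fields for its attributes. */
struct Symbol {
    char name[32];
    char type[16];     /* e.g. "int", "real"; filled in by semantic analysis */
    int  scope_level;  /* likewise added during later phases */
};

static struct Symbol table[256];   /* fixed capacity for the sketch */
static int nsyms;

/* Return the record for a name, inserting it on first sight. */
static struct Symbol *lookup(const char *name) {
    for (int i = 0; i < nsyms; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    strcpy(table[nsyms].name, name);   /* new identifier */
    return &table[nsyms++];
}

int main(void) {
    struct Symbol *s = lookup("position");   /* inserted by the scanner */
    strcpy(s->type, "real");                 /* attribute added later   */
    printf("%s : %s\n", lookup("position")->name, lookup("position")->type);
    return 0;
}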
Error Handler
• Errors are reported in the form of messages.
• It should report the presence of errors clearly and accurately.
• It should recover from each error quickly enough to be able to detect subsequent errors.
The Role of Lexical Analyzer
• Lexical analysis is the process of taking an input string of characters and producing a sequence of symbols called lexical tokens, or just tokens.
• Because the lexical analyzer reads the source text, it may also perform certain secondary tasks:
1. Eliminate comments and white space in the form of blanks, tabs, and newline characters.
2. Correlate error messages from the compiler with the source program (e.g., keep track of the number of lines).
• Secondary tasks of the lexical analyzer-

1. Removal of comments
eg. int main()
{
// a=10;    <- this comment is discarded
int a=10;
}

2. Removal of white spaces
eg. int a=   10;
becomes
int a=10;

3. Correlates error messages with the source program.
Token, Pattern, Lexeme
• Token: A sequence of characters that has a collective meaning. Typical tokens are:
1) Identifiers
2) Keywords
3) Operators
4) Special symbols
5) Constants
Examples of tokens-
• Keywords- for, while, if, etc.
• Identifiers- variable names, function names, etc.
• Operators- '+', '++', '-', etc.
Examples of non-tokens:
• Comments, blanks, tabs, newlines, etc.
• Pattern: A set of rules used for identifying the various tokens present in the source program is called a pattern.
• Lexeme: A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.
OR
A lexeme is the actual sequence of characters in the source that forms an instance of a token.
Example of tokens-
Consider this expression in the programming language C:
sum = 3 + 2;

Lexeme    Token category
sum       Identifier
=         Assignment operator
3         Integer literal
+         Addition operator
2         Integer literal
;         End of statement
Token      Lexeme            Pattern
ID         x, y, n0          letter followed by letters and digits
NUM        -123, 1.456e-5    any numeric constant
IF         if                if
LPAREN     (                 (
LITERAL    "Hello"           any string of characters (except ") between " and "
Attributes of Token
• When more than one lexeme can match a pattern, the lexical analyzer must provide the compiler additional information about the particular lexeme that matched.
• Information about an identifier (its lexeme, type, and the location at which it was first found) is kept in the symbol table.
• The appropriate attribute value for an identifier is a pointer to the symbol-table entry for that identifier.
Example-
position := initial + rate * 60

<id, pointer to symbol table entry for position>
<assign_op>
<id, pointer to symbol table entry for initial>
<addition_op>
<id, pointer to symbol table entry for rate>
<multiplication_op>
<num, integer value 60>
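One way this pairing of a token kind with its attribute value might look in C (a compilable sketch with illustrative names, not a complete scanner):

struct Symbol;                  /* symbol-table record, as defined elsewhere */

enum TokenKind { TK_ID, TK_NUM, TK_ASSIGN, TK_PLUS, TK_MUL };

struct Token {
    enum TokenKind kind;
    union {
        struct Symbol *entry;   /* TK_ID: pointer to the symbol-table entry */
        int            value;   /* TK_NUM: the literal's value, e.g. 60     */
    } attr;                     /* operators like TK_PLUS need no attribute */
};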
Specifications of Token

• Symbol
• String
• Length of a string
• Prefix and suffix of a string
• Concatenation of two strings
• Reverse of a string
• Operations on strings
• Alphabet
• Language
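These terms come from formal language theory. For example, taking the string s = "banana" over the alphabet {a, b, n}:
• Length: |s| = 6
• "ban" is a prefix of s; "ana" is a suffix of s
• Concatenation: "ba" concatenated with "nana" gives "banana"
• Reverse: "ananab"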
Recognition of Tokens

– Recognition of identifier.
– Recognition of delimiter.
– Recognition of keywords.
– Recognition of operators.
– Recognition of numbers.
Input Buffer
• Input buffering is done to increase the efficiency of the compiler.

• Because of the time taken to process characters and the large number of characters that must be processed during the compilation of a source program, specialized buffering techniques have been developed to reduce the amount of overhead required to process a single input character.

• Input buffering is used to speed up the lexical analysis process.
• Two buffer input schemes are useful when lookahead is necessary-
• Buffer pairs
• Sentinels

• The lexical analyzer scans the input from left to right one character at a time. It uses two pointers, the begin pointer (bp) and the forward pointer (fp), to keep track of the portion of the input scanned.
• Initially both pointers point to the first character of the input string.
• The forward pointer scans ahead until a match for a pattern is found. Once the next lexeme is determined, it is processed, and both pointers are set to the character immediately past the lexeme.
Buffer Pair
• A buffer (an array) is divided into two N-character halves of, say, 1024 characters each, where N is the number of characters on one disk block, and an 'eof' marker indicates the end of the source file.
Sentinels
• A sentinel is an extra character inserted at the end of each buffer half. It is a special, dummy character that cannot be part of the source program, so the scanner needs only a single test per character advance instead of separate end-of-buffer and end-of-input checks.
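A sketch, in C, of how buffer pairs and sentinels might fit together, assuming '\0' never occurs in the source text and omitting the begin (bp) pointer for brevity (all names are illustrative):

#include <stdio.h>

#define N 1024          /* assumed block size: characters per buffer half */
#define SENTINEL '\0'   /* assumed never to appear inside the source text */

static char buf[2 * N + 2];   /* two halves, plus one sentinel slot each */
static char *forward;         /* the forward (fp) pointer                */
static FILE *src;

/* Refill one half from the source file and plant the sentinel after it. */
static void load_half(char *half) {
    size_t n = fread(half, 1, N, src);
    half[n] = SENTINEL;       /* end of the half, or true end of input */
}

/* Advance fp one character; reload the other half when the sentinel
   sitting exactly at a half boundary is reached. */
static int next_char(void) {
    char c = *forward;
    if (c == SENTINEL) {
        if (forward == buf + N) {                 /* end of first half  */
            load_half(buf + N + 1);
            forward = buf + N + 1;
        } else if (forward == buf + 2 * N + 1) {  /* end of second half */
            load_half(buf);
            forward = buf;
        } else {
            return EOF;       /* sentinel inside a half: real end of input */
        }
        c = *forward;
        if (c == SENTINEL) return EOF;            /* reloaded half is empty */
    }
    forward++;
    return (unsigned char)c;
}

int main(void) {
    src = fopen("input.txt", "r");                /* hypothetical test file */
    if (!src) return 1;
    load_half(buf);
    forward = buf;
    int c;
    while ((c = next_char()) != EOF)
        putchar(c);                               /* stand-in for the scanner */
    fclose(src);
    return 0;
}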
