Introduction To Compiler Design - Unit I

This document discusses the design of compilers. It describes the key stages in compiling a program from source code into an executable form: 1) lexical analysis breaks the source code into tokens by identifying lexemes such as keywords, identifiers, and punctuation; 2) syntax analysis parses the tokens according to the language's grammar rules to create an intermediate representation such as a syntax tree; 3) semantic analysis checks that the program follows the language's semantic rules by type-checking the syntax tree; 4) code generation translates the intermediate representation into the target machine language. Phases such as intermediate code generation and code optimization occur between semantic analysis and code generation.

COMPILER DESIGN

INTRODUCTION
Introduction to Compiler:
⚫Programming languages are notations for
describing computations to people and to
machines. The world as we know it depends on
programming languages, because all the
software running on all the computers was
written in some programming language. But,
before a program can be run, it first must be
translated into a form in which it can be
executed by a computer.
⚫The software systems that do this translation
are called compilers.
Language Processor
⚫A compiler is a program that can read a program in one
language (the source language) and translate it into an
equivalent program in another language (the target
language).
⚫If the target program is an executable machine-language
program, it can then be called by the user to process inputs
and produce outputs

⚫An interpreter is another common kind of language
processor. Instead of producing a target program as a
translation, an interpreter appears to directly execute the
operations specified in the source program on inputs
supplied by the user.
Example 1.1
⚫Java language processors combine compilation and
interpretation, as shown in Fig. 1.4. A Java source program
may first be compiled into an intermediate form called
bytecodes, which are then interpreted by a virtual machine.
⚫The linker resolves external memory addresses, where the
code in one file may refer to a location in another file.
The loader then puts together all of the executable object
files into memory for execution.
The Structure of a Compiler
⚫The analysis part breaks up the source program into
constituent pieces and imposes a grammatical structure on
them. It then uses this structure to create an intermediate
representation of the source program.
⚫The analysis part also collects information about the source
program and stores it in a data structure called a symbol
table, which is passed along with the intermediate
representation to the synthesis part.
⚫The synthesis part constructs the desired target program
from the intermediate representation and the information in
the symbol table. The analysis part is often called the front
end of the compiler; the synthesis part is the back end.
1. Lexical Analysis
⚫The first phase of a compiler is called lexical analysis
or scanning. The lexical analyzer reads the stream of
characters making up the source program and groups
the characters into meaningful sequences called
lexemes. For each lexeme, the lexical analyzer
produces as output a token of the form
⟨token-name, attribute-value⟩
▪ For example,
position = initial + rate * 60
⚫ The following tokens passed on to the syntax analyzer:
1. position is a lexeme that would be mapped into a token ⟨id, 1⟩, where id is an
abstract symbol standing for identifier and 1 points to the symbol-table entry for
position. The symbol-table entry for an identifier holds information about the
identifier, such as its name and type.
2. The assignment symbol = is a lexeme that is mapped into the token ⟨=⟩. Since
this token needs no attribute value, we have omitted the second component.
We could have used any abstract symbol such as assign for the token name,
but for notational convenience we have chosen to use the lexeme itself as the
name of the abstract symbol.
3. initial is a lexeme that is mapped into the token ⟨id, 2⟩, where 2 points to the
symbol-table entry for initial.
4. + is a lexeme that is mapped into the token ⟨+⟩.
5. rate is a lexeme that is mapped into the token ⟨id, 3⟩, where 3 points to the
symbol-table entry for rate.
6. * is a lexeme that is mapped into the token ⟨*⟩.
7. 60 is a lexeme that is mapped into the token ⟨60⟩.
⟨id, 1⟩ ⟨=⟩ ⟨id, 2⟩ ⟨+⟩ ⟨id, 3⟩ ⟨*⟩ ⟨60⟩
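⚫The grouping of characters into lexemes and tokens described above can be sketched in code. The following is a minimal, illustrative scanner written in C for just this one statement; the token codes, the tiny symbol table, and the function names are assumptions made for this sketch, not the design of a real compiler's lexical analyzer.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Token kinds for this tiny example (the names are assumptions). */
enum { TOK_ID, TOK_NUM, TOK_ASSIGN, TOK_PLUS, TOK_STAR, TOK_EOF };

/* A token is a pair: token name plus an optional attribute value. */
struct Token { int name; int attr; };

/* A very small symbol table: the attribute of an id token indexes it. */
static char symtab[16][32];
static int nsyms = 0;

static int lookup(const char *lexeme) {
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtab[i], lexeme) == 0) return i + 1;
    strcpy(symtab[nsyms++], lexeme);
    return nsyms;                          /* entries are numbered from 1 */
}

/* Group characters from *p into the next lexeme and return its token. */
static struct Token get_token(const char **p) {
    while (**p == ' ') (*p)++;             /* skip blanks between lexemes */
    if (isalpha((unsigned char)**p)) {     /* identifier lexeme           */
        char buf[32]; int n = 0;
        while (isalnum((unsigned char)**p)) buf[n++] = *(*p)++;
        buf[n] = '\0';
        return (struct Token){ TOK_ID, lookup(buf) };
    }
    if (isdigit((unsigned char)**p)) {     /* number lexeme               */
        int v = 0;
        while (isdigit((unsigned char)**p)) v = v * 10 + (*(*p)++ - '0');
        return (struct Token){ TOK_NUM, v };
    }
    switch (*(*p)++) {
    case '=': return (struct Token){ TOK_ASSIGN, 0 };
    case '+': return (struct Token){ TOK_PLUS, 0 };
    case '*': return (struct Token){ TOK_STAR, 0 };
    default:  return (struct Token){ TOK_EOF, 0 };
    }
}

int main(void) {
    const char *src = "position = initial + rate * 60";
    struct Token t;
    while ((t = get_token(&src)).name != TOK_EOF)
        printf("<%d, %d> ", t.name, t.attr);   /* e.g. <0, 1> stands for <id, 1> */
    printf("\n");
    return 0;
}

⚫Running this sketch prints one numeric pair per token, mirroring the sequence ⟨id, 1⟩ ⟨=⟩ ⟨id, 2⟩ ⟨+⟩ ⟨id, 3⟩ ⟨*⟩ ⟨60⟩ shown above.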
2. Syntax Analysis
⚫The second phase of the compiler is syntax analysis or
parsing. The parser uses the first components of the
tokens produced by the lexical analyzer to create a
tree-like intermediate representation that depicts the
grammatical structure of the token stream.
⚫A typical representation is a syntax tree in which each
interior node represents an operation and the children
of the node represent the arguments of the operation.

position = initial + rate * 60
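⚫For illustration, such a syntax-tree node could be represented as in the C sketch below; the struct layout and the helper functions are assumptions made for this example, not a prescribed implementation.

#include <stdlib.h>

/* One node of the syntax tree (layout is an assumption for this sketch). */
struct Node {
    char op;                  /* '=', '+', '*' for interior nodes, 0 for a leaf */
    int  value;               /* leaf: symbol-table entry (id) or constant      */
    struct Node *left, *right;
};

static struct Node *leaf(int value) {
    struct Node *n = calloc(1, sizeof *n);
    n->value = value;
    return n;
}

static struct Node *interior(char op, struct Node *l, struct Node *r) {
    struct Node *n = calloc(1, sizeof *n);
    n->op = op; n->left = l; n->right = r;
    return n;
}

/* Syntax tree for: position = initial + rate * 60
 * The * node is a child of the + node, reflecting the usual precedence. */
struct Node *build_example_tree(void) {
    return interior('=', leaf(1),                       /* id1: position */
                    interior('+', leaf(2),              /* id2: initial  */
                             interior('*', leaf(3),     /* id3: rate     */
                                      leaf(60))));      /* constant 60   */
}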


3. Semantic Analysis
⚫The semantic analyzer uses the syntax tree and the
information in the symbol table to check the source
program for semantic consistency with the language
definition.
⚫An important part of semantic analysis is type
checking, where the compiler checks that each
operator has matching operands.
⚫The language specification may permit some type
conversions called coercions.
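⚫In the running example, rate holds a floating-point value while 60 is an integer, so the type checker can insert a coercion that converts 60 to floating point. A minimal sketch of such a check, reusing struct Node and interior()/leaf() from the syntax-tree sketch above; the type names and the coercion marker are assumptions.

/* Wrap a node in a unary conversion node; 'c' marks an inttofloat coercion. */
enum Type { TY_INT, TY_FLOAT };

static struct Node *coerce_to_float(struct Node *child) {
    return interior('c', child, NULL);
}

/* Type-check a '*' node: if one operand is an integer and the other a
 * floating-point value, insert a coercion so both operands are float. */
static struct Node *check_mul(struct Node *l, enum Type lt,
                              struct Node *r, enum Type rt) {
    if (lt == TY_FLOAT && rt == TY_INT) r = coerce_to_float(r);  /* 60 -> inttofloat(60) */
    if (lt == TY_INT && rt == TY_FLOAT) l = coerce_to_float(l);
    return interior('*', l, r);
}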
4. Intermediate Code Generator
⚫In the process of translating a source program into target code,
a compiler may construct one or more intermediate
representations, which can have a variety of forms. Syntax
trees are a form of intermediate representation; they are
commonly used during syntax and semantic analysis.
⚫One common intermediate form is called three-address code, which
consists of a sequence of assembly-like instructions with three
operands per instruction.
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
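⚫Internally, each three-address instruction can be stored as a record holding the operator and its operands, often called a quadruple. A hedged sketch in C follows; the field names are assumptions made for illustration.

/* One three-address instruction stored as a quadruple (illustrative). */
struct Quad {
    const char *op;       /* "inttofloat", "*", "+", or "=" (copy)   */
    const char *arg1;     /* first source operand                    */
    const char *arg2;     /* second source operand, or NULL          */
    const char *result;   /* destination temporary or identifier     */
};

/* The three-address code above, written as an array of quadruples. */
static struct Quad code[] = {
    { "inttofloat", "60",  NULL, "t1"  },
    { "*",          "id3", "t1", "t2"  },
    { "+",          "id2", "t2", "t3"  },
    { "=",          "t3",  NULL, "id1" },
};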
5. Code Optimization
⚫The machine-independent code-optimization phase attempts
to improve the intermediate code so that better target code
will result.
⚫Usually better means faster, but other objectives may be
desired, such as shorter code, or target code that consumes
less power.
⚫A simple intermediate code generation algorithm followed
by code optimization is a reasonable way to generate good
target code.

Before optimization:          After optimization:
t1 = inttofloat(60)           t1 = id3 * 60.0
t2 = id3 * t1                 id1 = id2 + t1
t3 = id2 + t2
id1 = t3
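⚫A minimal sketch of one of the transformations used here, operating on the Quad array from the earlier sketch: it folds inttofloat of a constant into a floating-point constant and substitutes it into later uses. The function name and strategy are assumptions, not a production optimizer; a second pass would then eliminate the remaining copy through t3 to reach the two-instruction sequence shown above.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* If an instruction is t = inttofloat(c) with a constant c, fold it into a
 * floating-point constant, substitute it into later uses of t, and delete
 * the instruction.  Returns the new number of instructions. */
static int fold_inttofloat(struct Quad *code, int n) {
    for (int i = 0; i < n; i++) {
        if (strcmp(code[i].op, "inttofloat") == 0 &&
            isdigit((unsigned char)code[i].arg1[0])) {
            static char folded[32];
            snprintf(folded, sizeof folded, "%s.0", code[i].arg1);  /* 60 -> 60.0 */
            for (int j = i + 1; j < n; j++) {       /* substitute into later uses */
                if (code[j].arg1 && strcmp(code[j].arg1, code[i].result) == 0)
                    code[j].arg1 = folded;
                if (code[j].arg2 && strcmp(code[j].arg2, code[i].result) == 0)
                    code[j].arg2 = folded;
            }
            memmove(&code[i], &code[i + 1], (size_t)(n - i - 1) * sizeof *code);
            return n - 1;                           /* one instruction removed    */
        }
    }
    return n;
}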
6. Code Generation
⚫The code generator takes as input an intermediate representation
of the source program and maps it into the target language.
⚫If the target language is machine code, registers or memory
locations are selected for each of the variables used by the
program.
⚫A crucial aspect of code generation is the judicious assignment
of registers to hold variables.
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
⚫The F in each instruction tells us that it deals with floating-
point numbers.
⚫LD-Loads
⚫MUL-Multiply
⚫ADD-Addition
⚫ST-Store

t1 = id3 * 60.0          LDF  R2, id3
id1 = id2 + t1           MULF R2, R2, #60.0
                         LDF  R1, id2
                         ADDF R1, R1, R2
                         STF  id1, R1
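⚫A hedged sketch of how a code generator might emit these instructions from the two optimized three-address instructions; the register choice (R1, R2) is a fixed, naive mapping made up for this example, whereas a real code generator assigns registers far more carefully.

#include <stdio.h>

/* Emit toy target code for t1 = id3 * 60.0 and id1 = id2 + t1. */
static void gen_mul_const(const char *reg, const char *id, const char *konst) {
    printf("LDF  %s, %s\n", reg, id);               /* load the identifier      */
    printf("MULF %s, %s, #%s\n", reg, reg, konst);  /* multiply by the constant */
}

static void gen_add_store(const char *reg, const char *id,
                          const char *src_reg, const char *store_to) {
    printf("LDF  %s, %s\n", reg, id);
    printf("ADDF %s, %s, %s\n", reg, reg, src_reg);
    printf("STF  %s, %s\n", store_to, reg);         /* store the final result   */
}

int main(void) {
    gen_mul_const("R2", "id3", "60.0");             /* t1 = id3 * 60.0 */
    gen_add_store("R1", "id2", "R2", "id1");        /* id1 = id2 + t1  */
    return 0;
}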
Finite automata and lexical Analysis:
⚫A lexical analyzer can be constructed automatically by specifying the
lexeme patterns to a lexical-analyzer generator and
compiling those patterns into code that functions as a
lexical analyzer.
⚫This also speeds up the process of implementing the
lexical analyzer, since the programmer specifies the
software at the very high level of patterns and relies on
the generator to produce the detailed code.
⚫One widely used lexical-analyzer generator is called Lex.
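⚫For illustration, a fragment of a Lex specification might look like the following, with regular-expression patterns on the left and C actions on the right; the token codes and the tokens.h header are assumptions made for this sketch, not a complete specification.

%{
#include <stdlib.h>
#include "tokens.h"     /* assumed header defining ID, NUMBER, ASSIGN, ... */
%}

delim   [ \t\n]
letter  [A-Za-z_]
digit   [0-9]

%%
{delim}+                     { /* strip whitespace; no token returned */ }
"="                          { return ASSIGN; }
"+"                          { return PLUS; }
"*"                          { return STAR; }
{digit}+                     { yylval = atoi(yytext); return NUMBER; }
{letter}({letter}|{digit})*  { /* install lexeme in symbol table */ return ID; }
%%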
The Role of the Lexical Analyzer:
⚫As the first phase of a compiler, the main task of the
lexical analyzer is to read the input characters of the
source program, group them into lexemes, and
produce as output a sequence of tokens for each
lexeme in the source program. The stream of tokens is
sent to the parser for syntax analysis.
⚫Commonly, the interaction is implemented by having
the parser call the lexical analyzer.
⚫The call, suggested by the getNextToken command,
causes the lexical analyzer to read characters from its
input until it can identify the next lexeme and produce
for it the next token, which it returns to the parser.
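⚫A minimal sketch of this interaction in C, assuming a Token type and a getNextToken() function matching the description above.

/* Illustrative parser-side loop: the parser repeatedly calls the lexical
 * analyzer to obtain the next token on demand. */
struct Token { int name; int attr; };

struct Token getNextToken(void);        /* provided by the lexical analyzer */

void parse(void) {
    struct Token tok = getNextToken();  /* parser requests the first token  */
    while (tok.name != 0 /* end of input */) {
        /* ... apply grammar rules that consume tok here ... */
        tok = getNextToken();           /* request the following token      */
    }
}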
The Role of the Lexical Analyzer:

⚫ Since the lexical analyzer is the part of the compiler that reads the source
text, it may perform certain other tasks besides identification of lexemes.
⚫ One such task is stripping out comments and whitespace (blank, newline,
tab, and perhaps other characters that are used to separate tokens in the
input).
⚫Sometimes, lexical analyzers are divided into a cascade of two
processes:
⚫ Scanning consists of the simple processes that do not require
tokenization of the input, such as deletion of comments and
compaction of consecutive whitespace characters into one; see the
sketch after this list.
⚫ Lexical analysis proper is the more complex portion, which
produces tokens from the output of the scanner.
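⚫A minimal sketch of the scanning stage alone, before any tokens are produced: it deletes //-style comments and compacts consecutive whitespace into a single blank. The function name and the choice of comment style are assumptions for this example.

#include <ctype.h>
#include <stdio.h>

/* Copy the input to the output, deleting // comments and compacting
 * runs of whitespace into one blank. */
static void scan(FILE *in, FILE *out) {
    int c, prev_blank = 0;
    while ((c = fgetc(in)) != EOF) {
        if (c == '/') {
            int d = fgetc(in);
            if (d == '/') {                        /* delete the comment      */
                while ((c = fgetc(in)) != EOF && c != '\n')
                    ;
                if (c == '\n') ungetc(c, in);      /* keep the line break     */
                continue;
            }
            if (d != EOF) ungetc(d, in);           /* lone '/', not a comment */
        }
        if (isspace(c)) {                          /* compact whitespace      */
            if (!prev_blank) fputc(' ', out);
            prev_blank = 1;
        } else {
            fputc(c, out);
            prev_blank = 0;
        }
    }
}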
A) Lexical Analysis Versus Parsing
⚫ There are a number of reasons why the analysis portion of a
compiler is normally separated into lexical analysis and parsing
(syntax analysis) phases.
I. Simplicity of design is the most important consideration.
The separation of lexical and syntactic analysis often
allows us to simplify at least one of these tasks.
II. Compiler efficiency is improved.
III.Compiler portability is enhanced. Input-device-specific
peculiarities can be restricted to the lexical analyzer.
B) Tokens, Patterns, and Lexemes
⚫ A token is a pair consisting of a token name and an optional
attribute value. The token name is an abstract symbol
representing a kind of lexical unit. The token names are the
input symbols that the parser processes.
⚫ A pattern is a description of the form that the lexemes of a
token may take. In the case of a keyword as a token, the pattern
is just the sequence of characters that form the keyword. For
identifiers and some other tokens, the pattern is a more complex
structure that is matched by many strings.
⚫ A lexeme is a sequence of characters in the source program that
matches the pattern for a token and is identified by the lexical
analyzer as an instance of that token.
Example
⚫ To see how these concepts are used in practice, in the C
statement
printf("Total = %d\n", score);
both printf and score are lexemes matching the pattern for token
id, and "Total = %d\n" is a lexeme matching literal.
C) Attributes for Tokens
⚫ When more than one lexeme can match a pattern, the lexical analyzer must
provide the subsequent compiler phases additional information
about the particular lexeme that matched.
⚫ Example: the token names and associated attribute values for the statement

E = M * C ** 2

are written below as a sequence of pairs:

⟨id, pointer to symbol-table entry for E⟩
⟨assign_op⟩
⟨id, pointer to symbol-table entry for M⟩
⟨mult_op⟩
⟨id, pointer to symbol-table entry for C⟩
⟨exp_op⟩
⟨number, integer value 2⟩
D) Lexical Errors
⚫ It is hard for a lexical analyzer to tell, without the aid of other
components, that there is a source-code error.
⚫ For instance, if the string fi is encountered for the first time in a C
program in the context:
fi ( a == f(x)) ...
⚫ A lexical analyzer cannot tell whether fi is a misspelling of the
keyword if or an undeclared function identifier.
⚫ The simplest recovery strategy is "panic mode": delete successive characters
from the remaining input until the lexical analyzer can find a well-formed token.
⚫ Other possible error-recovery actions are:
a) Delete one character from the remaining input.
b) Insert a missing character into the remaining input.
c) Replace a character by another character.
d) Transpose two adjacent characters.
THANK YOU
