Introduction To Compiler Design - Unit I

This document discusses the design of compilers. It describes the key stages in compiling a program from source code into an executable form: 1) lexical analysis breaks the source code into tokens by identifying lexemes such as keywords, identifiers, and punctuation; 2) syntax analysis parses the tokens according to the language's grammar rules to create an intermediate representation such as a syntax tree; 3) semantic analysis checks that the program follows the language's semantic rules by type-checking the syntax tree; 4) code generation translates the intermediate representation into the target machine language. Phases such as intermediate code generation and code optimization occur between semantic analysis and code generation.

COMPILER DESIGN

INTRODUCTION
Introduction to Compiler:
⚫Programming languages are notations for
describing computations to people and to
machines. The world as we know it depends on
programming languages, because all the
software running on all the computers was
written in some programming language. But,
before a program can be run, it first must be
translated into a form in which it can be
executed by a computer.
⚫The software systems that do this translation
are called compilers.
Language Processor
⚫A compiler is a program that can read a program in one
language (the source language) and translate it into an
equivalent program in another language (the target
language).
⚫If the target program is an executable machine-language
program, it can then be called by the user to process inputs
and produce outputs

⚫An interpreter is another common kind of language
processor. Instead of producing a target program as a
translation, an interpreter appears to directly execute the
operations specified in the source program on inputs
supplied by the user.
Example 1.1
⚫Java language processors combine compilation and
interpretation, as shown in Fig. 1.4. A Java source program
may first be compiled into an intermediate form called
bytecodes, which are then interpreted by a virtual machine.
⚫The linker resolves external memory addresses, where the
code in one file may refer to a location in another file.
The loader then puts together all of the executable object
files into memory for execution.
The Structure of a Compiler
⚫The analysis part breaks up the source program into
constituent pieces and imposes a grammatical structure on
them. It then uses this structure to create an intermediate
representation of the source program.
⚫The analysis part also collects information about the source
program and stores it in a data structure called a symbol
table, which is passed along with the intermediate
representation to the synthesis part.
⚫The synthesis part constructs the desired target program
from the intermediate representation and the information in
the symbol table. The analysis part is often called the front
end of the compiler; the synthesis part is the back end.
1. Lexical Analysis
⚫The first phase of a compiler is called lexical analysis
or scanning. The lexical analyzer reads the stream of
characters making up the source program and groups
the characters into meaningful sequences called
lexemes. For each lexeme, the lexical analyzer
produces as output a token of the form
⟨token-name, attribute-value⟩
▪ For example,
position = initial + rate * 60
⚫ The following tokens passed on to the syntax analyzer:
1. position is a lexeme that would be mapped into a token ⟨id, 1⟩, where id is an
abstract symbol standing for identifier and 1 points to the symbol-table entry for
position. The symbol-table entry for an identifier holds information about the
identifier, such as its name and type.
2. The assignment symbol = is a lexeme that is mapped into the token ⟨=⟩. Since
this token needs no attribute value, we have omitted the second component.
We could have used any abstract symbol such as assign for the token name,
but for notational convenience we have chosen to use the lexeme itself as the
name of the abstract symbol.
3. initial is a lexeme that is mapped into the token ⟨id, 2⟩, where 2 points to the
symbol-table entry for initial.
4. + is a lexeme that is mapped into the token ⟨+⟩.
5. rate is a lexeme that is mapped into the token ⟨id, 3⟩, where 3 points to the
symbol-table entry for rate.
6. * is a lexeme that is mapped into the token ⟨*⟩.
7. 60 is a lexeme that is mapped into the token ⟨60⟩.
⟨id, 1⟩ ⟨=⟩ ⟨id, 2⟩ ⟨+⟩ ⟨id, 3⟩ ⟨*⟩ ⟨60⟩
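⚫The grouping of characters into lexemes and tokens described above can be sketched in code. The following is a minimal, illustrative scanner written in C for just this one statement; the token codes, the tiny symbol table, and the function names are assumptions made for this sketch, not the design of a real compiler's lexical analyzer.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Token kinds for this tiny example (the names are assumptions). */
enum { TOK_ID, TOK_NUM, TOK_ASSIGN, TOK_PLUS, TOK_STAR, TOK_EOF };

/* A token is a pair: token name plus an optional attribute value. */
struct Token { int name; int attr; };

/* A very small symbol table: the attribute of an id token indexes it. */
static char symtab[16][32];
static int nsyms = 0;

static int lookup(const char *lexeme) {
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtab[i], lexeme) == 0) return i + 1;
    strcpy(symtab[nsyms++], lexeme);
    return nsyms;                          /* entries are numbered from 1 */
}

/* Group characters from *p into the next lexeme and return its token. */
static struct Token get_token(const char **p) {
    while (**p == ' ') (*p)++;             /* skip blanks between lexemes */
    if (isalpha((unsigned char)**p)) {     /* identifier lexeme           */
        char buf[32]; int n = 0;
        while (isalnum((unsigned char)**p)) buf[n++] = *(*p)++;
        buf[n] = '\0';
        return (struct Token){ TOK_ID, lookup(buf) };
    }
    if (isdigit((unsigned char)**p)) {     /* number lexeme               */
        int v = 0;
        while (isdigit((unsigned char)**p)) v = v * 10 + (*(*p)++ - '0');
        return (struct Token){ TOK_NUM, v };
    }
    switch (*(*p)++) {
    case '=': return (struct Token){ TOK_ASSIGN, 0 };
    case '+': return (struct Token){ TOK_PLUS, 0 };
    case '*': return (struct Token){ TOK_STAR, 0 };
    default:  return (struct Token){ TOK_EOF, 0 };
    }
}

int main(void) {
    const char *src = "position = initial + rate * 60";
    struct Token t;
    while ((t = get_token(&src)).name != TOK_EOF)
        printf("<%d, %d> ", t.name, t.attr);   /* e.g. <0, 1> stands for <id, 1> */
    printf("\n");
    return 0;
}

⚫Running this sketch prints one numeric pair per token, mirroring the sequence ⟨id, 1⟩ ⟨=⟩ ⟨id, 2⟩ ⟨+⟩ ⟨id, 3⟩ ⟨*⟩ ⟨60⟩ shown above.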
2. Syntax Analysis
⚫The second phase of the compiler is syntax analysis or
parsing. The parser uses the first components of the
tokens produced by the lexical analyzer to create a
tree-like intermediate representation that depicts the
grammatical structure of the token stream.
⚫A typical representation is a syntax tree in which each
interior node represents an operation and the children
of the node represent the arguments of the operation.

position = initial + rate * 60
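⚫For illustration, such a syntax-tree node could be represented as in the C sketch below; the struct layout and the helper functions are assumptions made for this example, not a prescribed implementation.

#include <stdlib.h>

/* One node of the syntax tree (layout is an assumption for this sketch). */
struct Node {
    char op;                  /* '=', '+', '*' for interior nodes, 0 for a leaf */
    int  value;               /* leaf: symbol-table entry (id) or constant      */
    struct Node *left, *right;
};

static struct Node *leaf(int value) {
    struct Node *n = calloc(1, sizeof *n);
    n->value = value;
    return n;
}

static struct Node *interior(char op, struct Node *l, struct Node *r) {
    struct Node *n = calloc(1, sizeof *n);
    n->op = op; n->left = l; n->right = r;
    return n;
}

/* Syntax tree for: position = initial + rate * 60
 * The * node is a child of the + node, reflecting the usual precedence. */
struct Node *build_example_tree(void) {
    return interior('=', leaf(1),                       /* id1: position */
                    interior('+', leaf(2),              /* id2: initial  */
                             interior('*', leaf(3),     /* id3: rate     */
                                      leaf(60))));      /* constant 60   */
}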


3. Semantic Analysis
⚫The semantic analyzer uses the syntax tree and the
information in the symbol table to check the source
program for semantic consistency with the language
definition.
⚫An important part of semantic analysis is type
checking, where the compiler checks that each
operator has matching operands.
⚫The language specification may permit some type
conversions called coercions.
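⚫In the running example, rate holds a floating-point value while 60 is an integer, so the type checker can insert a coercion that converts 60 to floating point. A minimal sketch of such a check, reusing struct Node and interior()/leaf() from the syntax-tree sketch above; the type names and the coercion marker are assumptions.

/* Wrap a node in a unary conversion node; 'c' marks an inttofloat coercion. */
enum Type { TY_INT, TY_FLOAT };

static struct Node *coerce_to_float(struct Node *child) {
    return interior('c', child, NULL);
}

/* Type-check a '*' node: if one operand is an integer and the other a
 * floating-point value, insert a coercion so both operands are float. */
static struct Node *check_mul(struct Node *l, enum Type lt,
                              struct Node *r, enum Type rt) {
    if (lt == TY_FLOAT && rt == TY_INT) r = coerce_to_float(r);  /* 60 -> inttofloat(60) */
    if (lt == TY_INT && rt == TY_FLOAT) l = coerce_to_float(l);
    return interior('*', l, r);
}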
4. Intermediate Code Generator
⚫In the process of translating a source program into target code,
a compiler may construct one or more intermediate
representations, which can have a variety of forms. Syntax
trees are a form of intermediate representation; they are
commonly used during syntax and semantic analysis.
⚫One common intermediate form is called three-address code, which
consists of a sequence of assembly-like instructions with three
operands per instruction.
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
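⚫Internally, each three-address instruction can be stored as a record holding the operator and its operands, often called a quadruple. A hedged sketch in C follows; the field names are assumptions made for illustration.

/* One three-address instruction stored as a quadruple (illustrative). */
struct Quad {
    const char *op;       /* "inttofloat", "*", "+", or "=" (copy)   */
    const char *arg1;     /* first source operand                    */
    const char *arg2;     /* second source operand, or NULL          */
    const char *result;   /* destination temporary or identifier     */
};

/* The three-address code above, written as an array of quadruples. */
static struct Quad code[] = {
    { "inttofloat", "60",  NULL, "t1"  },
    { "*",          "id3", "t1", "t2"  },
    { "+",          "id2", "t2", "t3"  },
    { "=",          "t3",  NULL, "id1" },
};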
5. Code Optimization
⚫The machine-independent code-optimization phase attempts
to improve the intermediate code so that better target code
will result.
⚫Usually better means faster, but other objectives may be
desired, such as shorter code, or target code that consumes
less power.
⚫A simple intermediate code generation algorithm followed
by code optimization is a reasonable way to generate good
target code.

Before optimization:          After optimization:
t1 = inttofloat(60)           t1 = id3 * 60.0
t2 = id3 * t1                 id1 = id2 + t1
t3 = id2 + t2
id1 = t3
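⚫A minimal sketch of one of the transformations used here, operating on the Quad array from the earlier sketch: it folds inttofloat of a constant into a floating-point constant and substitutes it into later uses. The function name and strategy are assumptions, not a production optimizer; a second pass would then eliminate the remaining copy through t3 to reach the two-instruction sequence shown above.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* If an instruction is t = inttofloat(c) with a constant c, fold it into a
 * floating-point constant, substitute it into later uses of t, and delete
 * the instruction.  Returns the new number of instructions. */
static int fold_inttofloat(struct Quad *code, int n) {
    for (int i = 0; i < n; i++) {
        if (strcmp(code[i].op, "inttofloat") == 0 &&
            isdigit((unsigned char)code[i].arg1[0])) {
            static char folded[32];
            snprintf(folded, sizeof folded, "%s.0", code[i].arg1);  /* 60 -> 60.0 */
            for (int j = i + 1; j < n; j++) {       /* substitute into later uses */
                if (code[j].arg1 && strcmp(code[j].arg1, code[i].result) == 0)
                    code[j].arg1 = folded;
                if (code[j].arg2 && strcmp(code[j].arg2, code[i].result) == 0)
                    code[j].arg2 = folded;
            }
            memmove(&code[i], &code[i + 1], (size_t)(n - i - 1) * sizeof *code);
            return n - 1;                           /* one instruction removed    */
        }
    }
    return n;
}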
6. Code Generation
⚫The code generator takes as input an intermediate representation
of the source program and maps it into the target language.
⚫If the target language is machine code, registers or memory
locations are selected for each of the variables used by the
program.
⚫A crucial aspect of code generation is the judicious assignment
of registers to hold variables.
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
⚫The F in each instruction tells us that it deals with floating-
point numbers.
⚫LD-Loads
⚫MUL-Multiply
⚫ADD-Addition
⚫ST-Store

t1 = id3 * 60.0          LDF  R2, id3
id1 = id2 + t1           MULF R2, R2, #60.0
                         LDF  R1, id2
                         ADDF R1, R1, R2
                         STF  id1, R1
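⚫A hedged sketch of how a code generator might emit these instructions from the two optimized three-address instructions; the register choice (R1, R2) is a fixed, naive mapping made up for this example, whereas a real code generator assigns registers far more carefully.

#include <stdio.h>

/* Emit toy target code for t1 = id3 * 60.0 and id1 = id2 + t1. */
static void gen_mul_const(const char *reg, const char *id, const char *konst) {
    printf("LDF  %s, %s\n", reg, id);               /* load the identifier      */
    printf("MULF %s, %s, #%s\n", reg, reg, konst);  /* multiply by the constant */
}

static void gen_add_store(const char *reg, const char *id,
                          const char *src_reg, const char *store_to) {
    printf("LDF  %s, %s\n", reg, id);
    printf("ADDF %s, %s, %s\n", reg, reg, src_reg);
    printf("STF  %s, %s\n", store_to, reg);         /* store the final result   */
}

int main(void) {
    gen_mul_const("R2", "id3", "60.0");             /* t1 = id3 * 60.0 */
    gen_add_store("R1", "id2", "R2", "id1");        /* id1 = id2 + t1  */
    return 0;
}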
Finite automata and lexical Analysis:
⚫A lexical analyzer can be constructed automatically by specifying the
lexeme patterns to a lexical-analyzer generator and
compiling those patterns into code that functions as a
lexical analyzer.
⚫This also speeds up the process of implementing the
lexical analyzer, since the programmer specifies the
software at the very high level of patterns and relies on
the generator to produce the detailed code.
⚫One widely used lexical-analyzer generator is called Lex.
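⚫For illustration, a fragment of a Lex specification might look like the following, with regular-expression patterns on the left and C actions on the right; the token codes and the tokens.h header are assumptions made for this sketch, not a complete specification.

%{
#include <stdlib.h>
#include "tokens.h"     /* assumed header defining ID, NUMBER, ASSIGN, ... */
%}

delim   [ \t\n]
letter  [A-Za-z_]
digit   [0-9]

%%
{delim}+                     { /* strip whitespace; no token returned */ }
"="                          { return ASSIGN; }
"+"                          { return PLUS; }
"*"                          { return STAR; }
{digit}+                     { yylval = atoi(yytext); return NUMBER; }
{letter}({letter}|{digit})*  { /* install lexeme in symbol table */ return ID; }
%%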
The Role of the Lexical Analyzer:
⚫As the first phase of a compiler, the main task of the
lexical analyzer is to read the input characters of the
source program, group them into lexemes, and
produce as output a sequence of tokens for each
lexeme in the source program. The stream of tokens is
sent to the parser for syntax analysis.
⚫Commonly, the interaction is implemented by having
the parser call the lexical analyzer.
⚫The call, suggested by the getNextToken command,
causes the lexical analyzer to read characters from its
input until it can identify the next lexeme and produce
for it the next token, which it returns to the parser.
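⚫A minimal sketch of this interaction in C, assuming a Token type and a getNextToken() function matching the description above.

/* Illustrative parser-side loop: the parser repeatedly calls the lexical
 * analyzer to obtain the next token on demand. */
struct Token { int name; int attr; };

struct Token getNextToken(void);        /* provided by the lexical analyzer */

void parse(void) {
    struct Token tok = getNextToken();  /* parser requests the first token  */
    while (tok.name != 0 /* end of input */) {
        /* ... apply grammar rules that consume tok here ... */
        tok = getNextToken();           /* request the following token      */
    }
}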
The Role of the Lexical Analyzer:

⚫ Since the lexical analyzer is the part of the compiler that reads the source
text, it may perform certain other tasks besides identification of lexemes.
⚫ One such task is stripping out comments and whitespace (blank, newline,
tab, and perhaps other characters that are used to separate tokens in the
input).
⚫Sometimes, lexical analyzers are divided into a cascade of two
processes:
⚫ Scanning consists of the simple processes that do not require
tokenization of the input, such as deletion of comments and
compaction of consecutive whitespace characters into one; see the
sketch after this list.
⚫ Lexical analysis proper is the more complex portion, which
produces tokens from the output of the scanner.
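⚫A minimal sketch of the scanning stage alone, before any tokens are produced: it deletes //-style comments and compacts consecutive whitespace into a single blank. The function name and the choice of comment style are assumptions for this example.

#include <ctype.h>
#include <stdio.h>

/* Copy the input to the output, deleting // comments and compacting
 * runs of whitespace into one blank. */
static void scan(FILE *in, FILE *out) {
    int c, prev_blank = 0;
    while ((c = fgetc(in)) != EOF) {
        if (c == '/') {
            int d = fgetc(in);
            if (d == '/') {                        /* delete the comment      */
                while ((c = fgetc(in)) != EOF && c != '\n')
                    ;
                if (c == '\n') ungetc(c, in);      /* keep the line break     */
                continue;
            }
            if (d != EOF) ungetc(d, in);           /* lone '/', not a comment */
        }
        if (isspace(c)) {                          /* compact whitespace      */
            if (!prev_blank) fputc(' ', out);
            prev_blank = 1;
        } else {
            fputc(c, out);
            prev_blank = 0;
        }
    }
}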
A) Lexical Analysis Versus Parsing
⚫ There are a number of reasons why the analysis portion of a
compiler is normally separated into lexical analysis and parsing
(syntax analysis) phases.
I. Simplicity of design is the most important consideration.
The separation of lexical and syntactic analysis often
allows us to simplify at least one of these tasks.
II. Compiler efficiency is improved.
III.Compiler portability is enhanced. Input-device-specific
peculiarities can be restricted to the lexical analyzer.
B) Tokens, Patterns, and Lexemes
⚫ A token is a pair consisting of a token name and an optional
attribute value. The token name is an abstract symbol
representing a kind of lexical unit. The token names are the
input symbols that the parser processes.
⚫ A pattern is a description of the form that the lexemes of a
token may take. In the case of a keyword as a token, the pattern
is just the sequence of characters that form the keyword. For
identifiers and some other tokens, the pattern is a more complex
structure that is matched by many strings.
⚫ A lexeme is a sequence of characters in the source program that
matches the pattern for a token and is identified by the lexical
analyzer as an instance of that token.
Example
⚫ To see how these concepts are used in practice, in the C
statement
printf("Total = %d\n", score);
both printf and score are lexemes matching the pattern for token
id, and "Total = %d\n" is a lexeme matching literal.
C) Attributes for Tokens
⚫ When more than one lexeme can match a pattern, the lexical analyzer must
provide the subsequent compiler phases additional information
about the particular lexeme that matched.
⚫ Example: the token names and associated attribute values for the statement

E = M * C ** 2

are written below as a sequence of pairs:

⟨id, pointer to symbol-table entry for E⟩
⟨assign_op⟩
⟨id, pointer to symbol-table entry for M⟩
⟨mult_op⟩
⟨id, pointer to symbol-table entry for C⟩
⟨exp_op⟩
⟨number, integer value 2⟩
D) Lexical Errors
⚫ It is hard for a lexical analyzer to tell, without the aid of other
components, that there is a source-code error.
⚫ For instance, if the string fi is encountered for the first time in a C
program in the context:
fi ( a == f(x)) ...
⚫ A lexical analyzer cannot tell whether fi is a misspelling of the
keyword if or an undeclared function identifier.
⚫ The simplest recovery strategy is "panic mode": delete successive characters
from the remaining input until the lexical analyzer can find a well-formed token.
⚫ Other possible error-recovery actions are:
a) Delete one character from the remaining input.
b) Insert a missing character into the remaining input.
c) Replace a character by another character.
d) Transpose two adjacent characters.
THANK YOU
