Compiler Design Unit 1
Language Processors
The Structure of a Compiler
Lexical Analysis: The Role of the Lexical
Analyzer
Specification of Tokens
Recognition of Tokens
The Lexical-Analyzer Generator Lex
Compiler:
A compiler is a program that can read a program in one language (the source language) and translate it into an equivalent program in another language (the target language).
If the target program is an executable machine-language program, it can then be called by the user to process inputs and produce outputs.
Interpreter:
An interpreter is another common kind of language processor. Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user.
For example, Java language processors combine compilation and interpretation. A Java source program may first be compiled into an intermediate form called bytecodes. The bytecodes are then interpreted by a virtual machine.
Language Processors:
In addition to a compiler, several other programs may be required to
create an executable target program as shown in Fig
Preprocessor:
The source program may be divided into modules stored in separate files; the task of collecting these pieces is entrusted to a program called a preprocessor. The preprocessor may also expand shorthands, called macros, into source-language statements. The modified source program is then fed to a compiler.
Compiler:
The compiler may produce an assembly-language program as its
output, because assembly language is easier to produce as output and
is easier to debug.
Assembler:
The assembly language is then processed by a program called an assembler that produces relocatable machine code as its output.
Phases of a compiler:
Lexical Analysis:
•For each lexeme, the lexical analyzer produces a token of the form
  <token-name, attribute-value>
that it passes on to the subsequent phase, syntax analysis.
Syntax analysis:
•The parser uses the tokens produced by the lexical analyzer to create a tree-like intermediate representation (a syntax tree) that depicts the grammatical structure of the token stream.
Semantic analysis:
•The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition; an important part of this is type checking.
Intermediate Code Generation:
•The intermediate code should be generated in such a way that it can be easily translated into the target machine code; for example, a = b + c * 60 might be translated into the three-address instructions t1 = c * 60, t2 = b + t1, a = t2.
Code Optimization:
•The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code results, for example code that is faster or shorter.
Code Generation:
•The code generator takes the intermediate representation as input and maps it into the target language, selecting registers or memory locations for the variables and translating each intermediate instruction into a sequence of machine instructions.
Lexical Analysis:
•The first phase of a compiler.
•The main task of the lexical analyzer is to read the input characters of the source program, group them into lexemes, and produce as output a sequence of tokens for each lexeme in the source program.
The role of the lexical analyzer:
Figure: the lexical analyzer reads characters from the source program and returns a token to the parser each time the parser asks for one via getNextToken; the parser builds the parse tree, and both components read and update the symbol table.
Lexical Analysis Versus Parsing
There are several reasons why the analysis portion of a compiler is separated into lexical analysis and parsing:
•Simplicity of design is the most important consideration.
•Compiler efficiency is improved.
•Compiler portability is enhanced.
Tokens, Patterns and Lexemes
•A token is a pair consisting of a token name and an optional attribute value.
•A pattern is a description of the form that the lexemes of a token may take.
•A lexeme is a sequence of characters in the source program that matches the pattern for a token.
Example:
In many programming languages, the following classes cover most or all of the tokens:

TOKEN        INFORMAL DESCRIPTION (PATTERN)         SAMPLE LEXEMES
if           characters i, f                        if
else         characters e, l, s, e                  else
comparison   < or > or <= or >= or == or !=         <=, !=
Attributes for tokens
For the statement
E = M * C ** 2
the token names and associated attribute values are written below as a sequence of pairs:
<id, pointer to symbol table entry for E>
<assign-op>
<id, pointer to symbol table entry for M>
<mult-op>
<id, pointer to symbol table entry for C>
<exp-op>
<number, integer value 2>
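To make the pair notation concrete, here is a minimal C sketch (the enum and field names are illustrative assumptions, not a standard interface) of how such <token-name, attribute-value> pairs could be represented and how the token stream for E = M * C ** 2 would look:

    #include <stdio.h>

    /* Token names for the example E = M * C ** 2 (names are illustrative). */
    enum TokenName { ID, ASSIGN_OP, MULT_OP, EXP_OP, NUMBER };

    /* A token is a pair: a token name plus an optional attribute value.
       For an id the attribute is a pointer into the symbol table;
       for a number it is the integer value itself. */
    struct Token {
        enum TokenName name;
        union {
            void *symtab_entry;   /* attribute for ID: symbol-table pointer */
            int   int_value;      /* attribute for NUMBER: its value        */
        } attr;
    };

    int main(void) {
        /* The token stream for E = M * C ** 2, with NULL standing in for
           real symbol-table entries in this sketch. */
        struct Token stream[] = {
            { ID,        { .symtab_entry = NULL } },  /* E */
            { ASSIGN_OP, { 0 } },
            { ID,        { .symtab_entry = NULL } },  /* M */
            { MULT_OP,   { 0 } },
            { ID,        { .symtab_entry = NULL } },  /* C */
            { EXP_OP,    { 0 } },
            { NUMBER,    { .int_value = 2 } },
        };
        printf("%d tokens\n", (int)(sizeof stream / sizeof stream[0]));
        return 0;
    }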
Lexical errors
Some errors are beyond the power of the lexical analyzer to recognize, for example:
fi (a == b) …
The lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared identifier, so it simply returns it as an id and leaves the error to a later phase.
However, it may be able to recognize errors like:
d = 2r
Such errors are recognized when no pattern for tokens matches a character sequence.
Error recovery
•Panic mode: successive characters are ignored until we reach a well-formed token.
Other possible recovery actions:
•Delete one character from the remaining input.
•Insert a missing character into the remaining input.
•Replace a character by another character.
•Transpose two adjacent characters.
Specification of tokens
In the theory of compilation, regular expressions are used to formalize the specification of tokens.
Regular expressions are a means for specifying regular languages.
Example:
letter (letter | digit)*
Regular expressions
•Ɛ is a regular expression, L(Ɛ) = {Ɛ}
•If a is a symbol in Σ, then a is a regular expression, L(a) = {a}
•(r) | (s) is a regular expression denoting the language L(r) ∪ L(s)
•(r)(s) is a regular expression denoting the language L(r)L(s)
•(r)* is a regular expression denoting (L(r))*
•(r) is a regular expression denoting L(r)
Regular definitions
A regular definition is a sequence of definitions of the form:
d1 -> r1
d2 -> r2
…
dn -> rn
Example:
letter -> A | B | … | Z | a | b | … | z
digit  -> 0 | 1 | … | 9
id     -> letter (letter | digit)*
Extensions
One or more instances: (r)+
Zero or one instance: r?
Character classes: [abc]
Example:
letter_ -> [A-Za-z_]
digit -> [0-9]
id -> letter_ (letter_ | digit)*
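As a small operational check of what this definition describes, the following hand-written C sketch (the function name is made up for illustration; a generated lexer would not look like this) tests whether a whole string matches letter_ (letter_ | digit)*:

    #include <ctype.h>
    #include <stdio.h>

    /* Returns 1 if s is a non-empty string matching letter_ (letter_ | digit)*. */
    int is_identifier(const char *s) {
        if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
            return 0;                              /* must start with a letter or '_' */
        for (int i = 1; s[i] != '\0'; i++)         /* rest: letters, digits, or '_' */
            if (!(isalnum((unsigned char)s[i]) || s[i] == '_'))
                return 0;
        return 1;
    }

    int main(void) {
        /* prints: 1 1 0  (rate and _tmp1 are identifiers, 2r is not) */
        printf("%d %d %d\n", is_identifier("rate"), is_identifier("_tmp1"), is_identifier("2r"));
        return 0;
    }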
Recognition of tokens
The starting point is the language grammar, which tells us what the tokens are:
stmt -> if expr then stmt
      | if expr then stmt else stmt
      | Ɛ
expr -> term relop term
      | term
term -> id
      | number
Recognition of tokens (cont.)
The next step is to formalize the patterns:
digit  -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id     -> letter (letter | digit)*
If     -> if
Then   -> then
Else   -> else
Relop  -> < | > | <= | >= | = | <>
We also need to handle whitespace:
ws -> (blank | tab | newline)+
Transition diagrams
Transition diagram for relop
Relop -> < | > | <= | >= | = | <>
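In code, a transition diagram is usually simulated by reading characters and branching on them. The C sketch below (the state handling is collapsed into nested tests, and the RelopToken names are assumptions made for this example) mimics the relop diagram, including the retraction of the extra lookahead character on the < and > branches:

    #include <stdio.h>

    enum RelopToken { LT, LE, NE, EQ, GT, GE, NOT_RELOP };

    /* Simulate the relop transition diagram on s starting at *pos.
       On success, *pos is advanced just past the relop lexeme. */
    enum RelopToken relop(const char *s, int *pos) {
        int i = *pos;
        switch (s[i]) {
        case '<':
            i++;
            if (s[i] == '=')      { *pos = i + 1; return LE; }
            else if (s[i] == '>') { *pos = i + 1; return NE; }
            else                  { *pos = i;     return LT; }  /* retract lookahead */
        case '=':
            *pos = i + 1; return EQ;
        case '>':
            i++;
            if (s[i] == '=')      { *pos = i + 1; return GE; }
            else                  { *pos = i;     return GT; }  /* retract lookahead */
        default:
            return NOT_RELOP;
        }
    }

    int main(void) {
        int pos = 0;
        /* "<=" is recognized as LE (code 1 with this enum ordering) */
        printf("%d\n", relop("<=", &pos));
        return 0;
    }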
Transition diagrams (cont.)
Transition diagram for reserved words and
identifiers
id -> letter (letter|digit)*
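Reserved words like if, then, and else match the same pattern as identifiers, so one common implementation strategy (sketched below in C; the table contents and function name are assumptions) is to pre-install the reserved words in a table and, after a lexeme of identifier shape has been read, look it up to decide whether to return a keyword token or id:

    #include <stdio.h>
    #include <string.h>

    /* A tiny reserved-word table; a real compiler would keep this in the symbol table. */
    static const char *reserved[] = { "if", "then", "else" };

    /* Returns the keyword name if the lexeme is reserved, otherwise "id". */
    const char *token_for(const char *lexeme) {
        for (size_t k = 0; k < sizeof reserved / sizeof reserved[0]; k++)
            if (strcmp(lexeme, reserved[k]) == 0)
                return reserved[k];   /* reserved word: return its own token */
        return "id";                  /* ordinary identifier */
    }

    int main(void) {
        printf("%s %s\n", token_for("else"), token_for("rate"));  /* prints: else id */
        return 0;
    }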
Transition diagrams (cont.)
Transition diagram for unsigned numbers
number -> digits (. digits)? (E [+-]? digits)?
Transition diagrams (cont.)
Transition diagram for whitespace
ws -> (blank | tab | newline)+
Lexical Analyzer Generator - Lex
Lexical analyzer with Lex:
Figure: the Lex compiler translates the Lex source program into a C program lex.yy.c, which the C compiler turns into an executable a.out, the lexical analyzer.
Structure of Lex programs:
A Lex program has the following form:

    declarations
    %%
    translation rules
    %%
    auxiliary functions
• The declarations section includes declarations of variables, manifest constants (identifiers declared to stand for a constant, e.g., the name of a token), and regular definitions.
• The translation rules each have the form
  Pattern { Action }
• The pattern is a regular expression.
• The action is a fragment of code written in C.
• The third section holds whatever additional functions are used in the actions.
• Alternatively, these functions can be compiled separately and loaded with the lexical analyzer.
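Putting the three sections together, a minimal Lex specification might look like the following sketch (the token codes and the choice of patterns are assumptions made for illustration; they are not dictated by Lex):

    %{
    /* declarations section: C code copied verbatim into lex.yy.c */
    #include <stdio.h>
    #define IF      256   /* manifest constants naming the tokens */
    #define ID      257
    #define NUMBER  258
    #define RELOP   259
    %}

    delim   [ \t\n]
    ws      {delim}+
    letter  [A-Za-z_]
    digit   [0-9]
    id      {letter}({letter}|{digit})*
    number  {digit}+(\.{digit}+)?(E[+-]?{digit}+)?

    %%
    {ws}      { /* skip whitespace: no token returned */ }
    if        { return IF; }
    {id}      { return ID; }
    {number}  { return NUMBER; }
    "<"|"<="|"="|"<>"|">"|">="   { return RELOP; }
    %%
    /* auxiliary functions used by the generated analyzer */
    int yywrap(void) { return 1; }

    int main(void) {
        int tok;
        while ((tok = yylex()) != 0)          /* yylex() returns 0 at end of input */
            printf("token %d, lexeme %s\n", tok, yytext);
        return 0;
    }

Running lex (or flex) on this file produces lex.yy.c, which is then compiled with a C compiler as described above.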
Conflict Resolution in Lex
There are two rules that Lex uses to decide on the proper lexeme to select, when several prefixes of the input match one or more patterns:
1. Always prefer a longer prefix to a shorter prefix.
2. If the longest possible prefix matches two or more patterns, prefer the pattern listed first in the Lex program.
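The tiny, self-contained Lex sketch below (the printed messages are arbitrary, chosen only for this illustration) shows both rules in action: on the input if ifx <= it reports IF, then ID ifx (the longest prefix wins over the keyword pattern), then LE (the two-character prefix wins over <):

    %{
    #include <stdio.h>
    %}
    %%
    if                      { printf("IF\n"); /* listed first: wins for "if" */ }
    [A-Za-z_][A-Za-z_0-9]*  { printf("ID %s\n", yytext); /* "ifx" -> one ID */ }
    "<="                    { printf("LE\n"); /* longest prefix beats "<" */ }
    "<"                     { printf("LT\n"); }
    .|\n                    { /* ignore anything else */ }
    %%
    int yywrap(void) { return 1; }
    int main(void) { yylex(); return 0; }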
The Lookahead Operator
Lex automatically reads one character ahead of the last character that forms the selected lexeme, and then retracts the input so only the lexeme itself is consumed from the input.
Sometimes, however, we want a pattern to match only when it is followed by certain other text; for this Lex provides the lookahead operator /.
What follows / is an additional pattern that must be matched before we can decide that the token in question was seen, but what matches this second pattern is not part of the lexeme.
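A classic sketch of the lookahead operator (assuming a Fortran-like language in which IF may also begin an ordinary identifier; the patterns and messages are illustrative only):

    %{
    #include <stdio.h>
    %}
    letter  [A-Za-z]
    %%
    IF/\(.*\){letter}     { printf("keyword IF\n");
                            /* the "(...)" plus letter must follow,
                               but is not part of the IF lexeme */ }
    [A-Za-z][A-Za-z0-9]*  { printf("ID %s\n", yytext); }
    .|\n                  { /* skip everything else */ }
    %%
    int yywrap(void) { return 1; }
    int main(void) { yylex(); return 0; }

On a line like IF(A)THEN the first rule fires and only IF is consumed as the lexeme; on IF(I,J) = 3 the lookahead pattern fails (no letter follows the right parenthesis), so IF is returned as an ordinary identifier.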