COMPILER DESIGN

UNIT-1
Language Processors
PHASES OF COMPILER DESIGN
All these phases transform the source code step by step, dividing it
into tokens, building parse trees, and optimizing the code as it
passes from phase to phase.
The expression (a+b)*c is used as a running example.
Working of Compiler Phases with Example
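A sketch of how (a+b)*c moves through the phases (the temporaries t1
and t2 are illustrative names):

Lexical analysis     →  token stream: id(a) + id(b) * id(c)
Syntax analysis      →  parse tree with * at the root and the
                        subtree for a+b as its left child
Semantic analysis    →  checks that a, b and c have types valid
                        for the + and * operators
Intermediate code    →  t1 = a + b
                        t2 = t1 * c
Code optimization    →  removes redundant computations (none are
                        needed here)
Code generation      →  target machine instructions computing t1
                        and t2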
Front End and Back end of Compiler
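• The front end consists of the machine-independent phases: lexical
analysis, syntax analysis, semantic analysis, and intermediate code
generation.
• The back end consists of the machine-dependent phases: code
optimization and target code generation.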
Lexical Analyzer
Introduction
• Lexical analysis is the first phase of the compiler. It receives
the modified source code, written in the form of sentences, from
the language preprocessor.
• The lexical analyzer is responsible for breaking this text into a
series of tokens, removing whitespace in the source code.
• If the lexical analyzer encounters an invalid token, it generates
an error.
• It reads the stream of characters, identifies the legal tokens,
and passes the data to the syntax analyzer when asked for it.
Roles and Responsibility of Lexical
Analyzer
The lexical analyzer performs the following tasks −
• Removing the white spaces and comments from the source program.
• Correlating error messages with the source program (for example,
by keeping track of line numbers).
• Helping to identify the tokens.
• Reading the input characters from the source code.
Example
Count the number of tokens in:
int max(int i);
• The lexical analyzer first reads int, finds it valid, and accepts
it as a token.
• It then reads max, which is recognized as a valid function name
after ( is read.
• int is also a token, then i is another token, and finally ;.
• Answer:
Total number of tokens: 7
int, max, (, int, i, ), ;
• We can represent this in the form of lexemes and tokens as shown
below.
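A possible classification (token class names vary by textbook):

Lexeme    Token
int       keyword
max       identifier
(         operator
int       keyword
i         identifier
)         operator
;         separator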
Input Buffering

• The lexical analyzer would have to access secondary memory each
time it identifies a token, which is time-consuming and costly. So
the input string is stored in a buffer and then scanned by the
lexical analyzer.
• The lexical analyzer scans the input string from left to right,
one character at a time, to identify tokens.
• It uses two pointers to scan tokens −
• Begin Pointer (bptr) − It points to the beginning of the token
being read.
• Look Ahead Pointer (lptr) − It moves ahead to search for the end
of the token.
Example − For the statement int a, b;
• Both pointers start at the beginning of the string, which is
stored in the buffer.
• The look-ahead pointer scans the buffer until the token is found.
• The character beyond the token (the blank space after "int") has
to be examined before the token ("int") can be determined.
• After processing the token ("int"), both pointers are set to the
start of the next token ('a'), and this process is repeated for the
whole program.
• A buffer can be divided into two halves. When the look-ahead
pointer crosses the halfway mark of the first half, the second half
is filled with new characters to be read. When the look-ahead
pointer moves past the right end of the second half, the first half
is refilled with new characters, and so on.
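A minimal C sketch of the two-pointer scan described above (the
buffer contents and token rules are illustrative assumptions, not a
full lexer):

#include <stdio.h>
#include <ctype.h>

int main(void) {
    char buf[] = "int a, b;";  /* input already loaded into the buffer */
    char *bptr = buf;          /* begin pointer: start of the current token */
    char *lptr = buf;          /* look-ahead pointer: finds the token's end */

    while (*bptr != '\0') {
        if (isspace((unsigned char)*bptr)) {  /* skip whitespace between tokens */
            bptr++;
            lptr = bptr;
            continue;
        }
        if (isalnum((unsigned char)*lptr)) {  /* extend over identifiers/keywords */
            while (isalnum((unsigned char)*lptr))
                lptr++;
        } else {
            lptr++;                           /* single-character token: , or ; */
        }
        printf("token: %.*s\n", (int)(lptr - bptr), bptr);
        bptr = lptr;           /* both pointers move on to the next token */
    }
    return 0;
}

This prints the tokens int, a, ",", b and ";" in order, mirroring how
bptr and lptr move through the buffer.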
Specification of Tokens
• Specification of tokens depends on the pattern of the lexeme. Here
we use regular expressions to specify the different types of
patterns that can form tokens.
• Although regular expressions cannot specify all of the patterns
that form tokens, they can express almost every type of pattern
that occurs in practice.
• There are 3 specifications of tokens:
1. String
2. Language
3. Regular Expression
1. String
• An alphabet or character class is a finite set of symbols.
• A string over an alphabet is a finite sequence of symbols drawn
from that alphabet.
• In language theory, the term "word" is often used as a synonym
for "string".
• The length of a string s, usually written |s|, is the number of
occurrences of symbols in s. For example, "banana" is a string of
length six.
• The empty string, denoted ε, is the string of length zero.
1. Prefix of String
• A prefix of a string is obtained by removing zero or more symbols
from the end of the string; the prefixes include ε and the string
itself.
For example: s = abcd
Prefixes of the string abcd: ε, a, ab, abc, abcd
2. Suffix of String
• A suffix of a string is obtained by removing zero or more symbols
from the beginning of the string; the suffixes include ε and the
string itself.
For example: s = abcd
Suffixes of the string abcd: ε, d, cd, bcd, abcd
3. Proper Prefix of String
• The proper prefixes of a string are all of its prefixes excluding
ε and the string itself.
Proper prefixes of the string abcd: a, ab, abc
4. Proper Suffix of String
• The proper suffixes of a string are all of its suffixes excluding
ε and the string itself.
Proper suffixes of the string abcd: d, cd, bcd
5. Substring of String
• A substring of a string s is obtained by deleting any prefix and
any suffix from s.
Substrings of the string abcd: ε, abcd, bcd, abc, …
6. Subsequence of String
• A subsequence of a string is obtained by eliminating zero or more
(not necessarily consecutive) symbols from the string.
Subsequences of the string abcd: abd, bcd, bd, …
7. Concatenation of String
• If s and t are two strings, then st denotes their concatenation.
For example: s = abc, t = def
Concatenation of strings s and t, i.e. st = abcdef

3. Regular Expression
• A regular expression is a sequence of symbols used to specify
lexeme patterns.
• A regular expression describes the languages that can be built by
applying operators such as union, concatenation, and closure to the
symbols of an alphabet.
• The grammar defined by regular expressions is known as regular
grammar. The language defined by a regular grammar is known as a
regular language.
Notations
If r and s are regular expressions denoting the
languages L(r) and L(s), then
• Union : (r)|(s) is a regular expression denoting
L(r) U L(s)
• Concatenation : (r)(s) is a regular expression
denoting L(r)L(s)
• Kleene closure : (r)* is a regular expression
denoting (L(r))*
• (r) is a regular expression denoting L(r)
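For example, over the alphabet {a, b}:
• a|b denotes the language {a, b}
• (a|b)(a|b) denotes {aa, ab, ba, bb}
• a* denotes {ε, a, aa, aaa, …}
• (a|b)* denotes the set of all strings of a's and b's, including ε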
Precedence and Associativity
• *, concatenation (.), and | (pipe sign) are all left associative.
• * has the highest precedence.
• Concatenation (.) has the second highest precedence.
• | (pipe sign) has the lowest precedence of all.
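For example, under these conventions the regular expression
(a)|((b)*(c)) can be written as a|b*c: it denotes either a single a,
or zero or more b's followed by a c.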
Representing valid tokens of a language in regular expression
If x is a regular expression, then:
• x* means zero or more occurrences of x,
i.e., it can generate { ε, x, xx, xxx, xxxx, … }
• x+ means one or more occurrences of x,
i.e., it can generate { x, xx, xxx, xxxx, … }; equivalently, x+ = x.x*
• x? means at most one occurrence of x,
i.e., it can generate either {x} or {ε}.
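For example, an identifier that begins with a letter followed by
letters or digits can be written as letter (letter | digit)*, and an
unsigned integer as digit+.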
Recognition of Tokens
• Tokens obtained during lexical analysis are recognized by finite
automata.
• A Finite Automaton (FA) is a simple idealized machine that can be
used to recognize patterns within input taken from a character set
or alphabet (usually denoted Σ). The primary task of an FA is to
accept or reject an input based on whether the defined pattern
occurs within the input.
• EXAMPLE
Assume the following grammar fragment generates a specific language,
where the terminals if, then, else, relop, id and num generate sets
of strings given by the regular definitions below.
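The grammar fragment and regular definitions are not reproduced in
this text; a sketch following the standard textbook treatment is:

stmt  → if expr then stmt
      | if expr then stmt else stmt
      | ε
expr  → term relop term | term
term  → id | num

digit  → [0-9]
digits → digit+
num    → digits (. digits)? (E (+|-)? digits)?
letter → [A-Za-z]
id     → letter (letter | digit)*
if     → if
then   → then
else   → else
relop  → < | > | <= | >= | = | <>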
Lexical-Analyzer Generator
• LEX is a tool that generates a lexical analyzer program from a
given specification. The generated analyzer processes an input
string/file and transforms it into tokens. It is used together with
YACC to generate complete parsers.
• There are many versions of LEX available, but the most popular is
flex, which is readily available on Linux systems as part of the
compiler package.
The function of Lex is as follows:
• First, a lexical analyzer specification is written as a program
lex.l in the Lex language. The Lex compiler then runs on lex.l and
produces a C program lex.yy.c.
• Next, the C compiler compiles lex.yy.c and produces an object
program a.out.
• a.out is the lexical analyzer that transforms an input stream into
a sequence of tokens.
Parts of the LEX program
The layout of a LEX source program is:
• Definitions
• Rules
• Auxiliary routines
A double percent sign (%%) separates each section.
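A minimal sketch of a LEX source file showing the three sections
(the token names and actions are illustrative assumptions):

%{
/* Definitions section: C declarations used by the rules */
#include <stdio.h>
%}

%%
[0-9]+                  { printf("NUMBER: %s\n", yytext); }
[A-Za-z_][A-Za-z0-9_]*  { printf("IDENTIFIER: %s\n", yytext); }
[ \t\n]+                { /* Rules section: skip whitespace */ }
.                       { printf("SYMBOL: %s\n", yytext); }
%%

/* Auxiliary routines section */
int yywrap(void) { return 1; }

int main(void) {
    yylex();   /* run the generated scanner on standard input */
    return 0;
}

Running lex scanner.l (or flex scanner.l) produces lex.yy.c, and
compiling it with cc lex.yy.c -o a.out yields the lexical analyzer
described above.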
