Week 5-6

Slides of Compiler Construction chapter 5-6

Uploaded by

Malik Zohaib

MODULE # 3: COMPILER CONSTRUCTION

INSTRUCTOR: DR. SAKEENA JAVAID

1
OUTLINE OF THE TOPICS TO BE COVERED TODAY

 Chomsky's classification of grammars


 Lexical analysis
 Tokens and types of tokens
 Regular Expressions and DFA’s

2
CHOMSKY'S CLASSIFICATION OF GRAMMARS

 Type-0 grammars include all formal grammars. The languages they generate are also known as the recursively enumerable languages.
 Type-1 grammars generate the context-sensitive languages. Every Type-1 grammar is also a Type-0 grammar.
 Type-2 grammars generate the context-free languages. Every Type-2 grammar is also a Type-1 grammar.
 Type-3 grammars generate the regular languages. Type 3 is
the most restricted form of grammar.
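To make the most restricted class concrete, the sketch below enumerates the strings derived by a small Type-3 (right-linear) grammar and checks them against an equivalent regular expression. The grammar itself is a hypothetical example, not one from the slides.

```python
import re

# A Type-3 (regular, right-linear) grammar: every production has the
# form A -> aB or A -> a. This example grammar generates the binary
# strings ending in 1, i.e. the regular language denoted by (0|1)*1.
grammar = {
    "S": ["0S", "1S", "1"],   # S -> 0S | 1S | 1
}

def generate(max_len):
    """Enumerate every string the grammar derives, up to max_len symbols."""
    results = set()
    frontier = ["S"]
    while frontier:
        form = frontier.pop()
        if form.isdigit():              # no nonterminal left: a sentence
            if len(form) <= max_len:
                results.add(form)
            continue
        # Right-linear: the single nonterminal is always the last symbol.
        head, nt = form[:-1], form[-1]
        if len(head) >= max_len:        # any expansion would be too long
            continue
        for rhs in grammar[nt]:
            frontier.append(head + rhs)
    return results

lang = generate(3)
# Every generated string matches the equivalent regular expression.
assert all(re.fullmatch(r"[01]*1", s) for s in lang)
```

This is exactly the sense in which Type-3 grammars and regular expressions describe the same languages.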

3
LEXICAL ANALYSIS

 First phase of a compiler is called lexical analysis or scanning
 The lexical analyzer reads the stream of characters making up the source program
 It groups the characters into meaningful sequences called lexemes
 For each lexeme, the lexical analyzer produces as output a token of the form
<token-name, attribute-value>
 In the token, the first component token-name is an abstract symbol that is used during syntax analysis, and the second component attribute-value points to an entry in the symbol table for this token
 The token is passed on to the subsequent phase, syntax analysis
 Information from the symbol-table entry is needed for semantic analysis and code generation.
4
LEXICAL ANALYSIS

 For example:
 Suppose a source program contains the assignment statement
position = initial + rate * 60 … (1)
 Sequence (2) shows the representation of statement (1) after lexical analysis as a sequence of tokens
<id, 1> <=> <id, 2> <+> <id, 3> <*> <60> … (2)
 A visual representation of this token sequence is shown next
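The token sequence in (2) can be reproduced with a minimal scanner sketch. The token names and the attribute encoding below (symbol-table index for id, numeric value for number, no attribute for operators) are illustrative assumptions, not a fixed standard.

```python
import re

# Minimal scanner for the slide's example statement. Identifiers are
# entered into a symbol table; the attribute of an id token is its
# (1-based) symbol-table index, matching sequence (2) on the slide.
TOKEN_SPEC = [
    ("id",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number", r"\d+"),
    ("op",     r"[=+\-*/]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    symtab, tokens = [], []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue
        if kind == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)
            tokens.append(("id", symtab.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:                          # operators carry no attribute
            tokens.append((lexeme, None))
    return tokens, symtab

tokens, symtab = tokenize("position = initial + rate * 60")
# → [('id', 1), ('=', None), ('id', 2), ('+', None),
#    ('id', 3), ('*', None), ('number', 60)]
```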

5
LEXICAL ANALYSIS

 Visual representation
 All stages during compilation of the code

6
LEXICAL ANALYSIS

 Interactions between the Lexical Analyzer and the Parser
 Lexical analysers are divided into a cascade of two processes:
a) Scanning consists of the simple processes that
do not require tokenization of the input, such
as deletion of comments and compaction of
consecutive whitespace characters into one.
b) Lexical analysis is the more complex portion,
which produces tokens from the output of the
scanner.
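The scanning half of this cascade can be sketched as a simple pre-pass over the source text. The C-style comment syntax used here is an assumption for illustration; the slide does not fix a particular language.

```python
import re

# Sketch of the "scanning" process: delete comments and compact runs
# of consecutive whitespace into one blank, before tokenization proper.
def scan(source):
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.S)  # block comments
    source = re.sub(r"//[^\n]*", " ", source)               # line comments
    return re.sub(r"\s+", " ", source).strip()              # compact whitespace

print(scan("x = 1;   /* init */\n  y = x + 2; // add"))
# → "x = 1; y = x + 2;"
```

The lexical-analysis process proper then produces tokens from this cleaned-up stream.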

7
LEXICAL ANALYSIS

 Lexical Analysis Versus Parsing


 There are several reasons why the analysis portion of a compiler is normally separated into lexical analysis and parsing (syntax analysis) phases
 Simplicity of design is the most important consideration.
 While designing a new language, separating lexical and syntactic concerns can lead to a
cleaner overall language design
 Compiler efficiency is improved (using lexical analysis and buffering techniques)
 Compiler portability is enhanced (Input-device-specific peculiarities can be restricted to
the lexical analyser)

8
LEXICAL ANALYSIS

 Tokens, Patterns, and Lexemes


 When discussing lexical analysis, we use three related but distinct terms
 A token is a pair consisting of a token name and an optional attribute value
 e.g., a particular keyword, or a sequence of input characters denoting an identifier. The token names are the input
symbols that the parser processes. In what follows, we shall generally write the name of a token in boldface. We will
often refer to a token by its token name.
 A pattern is a description of the form that the lexemes of a token may take.
 The pattern is just the sequence of characters that form the keyword. For identifiers and some other tokens, the pattern
is a more complex structure that is matched by many strings
 A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the
lexical analyzer as an instance of that token
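The three terms can be shown side by side in code: a token name, the pattern describing its lexemes (a regular expression here), and concrete lexemes matched against the patterns. The particular patterns and the classify helper are illustrative, not from the slides.

```python
import re

# Token name -> pattern describing the form its lexemes may take.
# For a keyword, the pattern is just the keyword itself; for identifiers
# and numbers the pattern is matched by many strings.
patterns = {
    "if":     r"if",
    "id":     r"[A-Za-z][A-Za-z0-9]*",
    "number": r"\d+(\.\d+)?",
}

def classify(lexeme):
    """Return the name of the first token whose pattern the lexeme matches."""
    for token, pattern in patterns.items():
        if re.fullmatch(pattern, lexeme):
            return token
    return None

assert classify("if") == "if"        # keyword pattern checked first
assert classify("score") == "id"     # a lexeme for token id
assert classify("3.14") == "number"  # a lexeme for token number
```

Note that the keyword pattern is listed before id, so the lexeme if is classified as the keyword rather than as an identifier; real scanners use the same priority rule.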

9
LEXICAL ANALYSIS

 Example: Some typical tokens, their informally described patterns,
and some sample lexemes. To see how these concepts are used in
practice, consider the C statement:
printf("Total = %d\n", score);
 both printf and score are lexemes matching the pattern for token id, and

 "Total = %d\n" is a lexeme matching the pattern for token literal

10
LEXICAL ANALYSIS
Examples of Tokens and Lexemes

11
LEXICAL ANALYSIS

 Categorization of Tokens:
 In many programming languages, the following classes cover most or all of the
tokens:
 One token for each keyword. The pattern for a keyword is the same as the keyword itself
 Tokens for the operators, either individually or in classes such as the token comparison
 One token representing all identifiers
 One or more tokens representing constants, such as numbers and literal strings
 Tokens for each punctuation symbol, such as left and right parentheses, comma, and semicolon

12
LEXICAL ANALYSIS

 Attributes for Tokens:


 When more than one lexeme can match a pattern,
 the lexical analyser must provide the subsequent compiler phases additional information about the particular lexeme that matched. For example, the pattern for token number matches both 0 and 1.
 In many cases the lexical analyser returns to the parser not only a token name, but an attribute value that describes the lexeme represented by the token;
 e.g., information about an identifier: its lexeme, its type, and the location at which it is first found

13
LEXICAL ANALYSIS

 Example 2:
 The token names and associated attribute values for the Fortran statement
E = M * C ** 2
 Please solve it with respect to tokens…

14
LEXICAL ANALYSIS

15
LEXICAL ANALYSIS

 Lexical Errors:
 It is hard for a lexical analyser to tell, without the aid of other components, that there is a source-code error.
 For instance, if the string fi is encountered for the first time in a C program in the context:
 fi ( a == f(x)) ...
 a lexical analyser cannot tell whether fi is a misspelling of the keyword if
 or an undeclared function identifier
 Since fi is a valid lexeme for the token id, the lexical analyser must return the token id to the parser
and let some other phase of the compiler
 (probably the parser in this case) handle the error due to transposition of the letters

16
LEXICAL ANALYSIS

 A situation arises in which the lexical analyser is unable to proceed


 because none of the patterns for tokens matches any prefix of the remaining input
 The simplest recovery strategy is “panic mode" recovery.
 We delete successive characters from the remaining input, until the lexical analyser can find a
well-formed token at the beginning of what input is left.
 This recovery technique may confuse the parser, but in an interactive computing environment it
may be quite adequate.
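Panic-mode recovery can be sketched directly: delete characters until some token pattern matches at the front of the remaining input. The token pattern TOKEN below is a hypothetical stand-in for the scanner's real patterns.

```python
import re

# Hypothetical union of the scanner's token patterns.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[=+\-*/();]")

def next_token(s, i):
    """Return (lexeme, next position, chars deleted by panic mode)."""
    deleted = 0
    while i < len(s):
        if s[i].isspace():
            i += 1
            continue
        m = TOKEN.match(s, i)
        if m:                        # a well-formed token begins here
            return m.group(), m.end(), deleted
        i += 1                       # panic mode: discard one character
        deleted += 1
    return None, i, deleted

tok, pos, deleted = next_token("@#x = 1", 0)
assert (tok, deleted) == ("x", 2)    # two junk characters discarded
```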

17
LEXICAL ANALYSIS

 Other possible error-recovery actions are:


 Delete one character from the remaining input.
 Insert a missing character into the remaining input.
 Replace a character by another character.
 Transpose two adjacent characters.

18
REGULAR EXPRESSIONS AND DFA

 Specifications of Tokens:
 Regular expressions are an important notation for specifying lexeme patterns.
 They cannot express all possible patterns; however, they are very effective in specifying the types of
patterns that we actually need for tokens.
 We shall study the formal notation for regular expressions
 Strings and languages
 An alphabet is any finite set of symbols. Typical examples of symbols are letters, digits, and
punctuation.
 The set {0, 1} is the binary alphabet. ASCII is an important example of an alphabet; it is used in many
software systems. Unicode is another example.

19
REGULAR EXPRESSIONS AND DFA’S

 A string over an alphabet is a finite sequence of symbols drawn from that alphabet .
 In language theory, the terms “sentence" and “word" are often used as synonyms for
“string."
 The length of a string s, written |s|, is the number of occurrences of symbols in s.
 For example, banana is a string of length six. The empty string, denoted ε, is the string of
length zero

20
REGULAR EXPRESSIONS AND DFA’S

 A language is any countable set of strings over some fixed alphabet. This definition is
very broad.
 Abstract languages like ∅, the empty set, and {ε}, the set containing only the empty string,
are languages under this definition.
 Note that the definition of “language" does not require that any meaning be ascribed
to the strings in the language

21
REGULAR EXPRESSIONS AND DFA’S

 Operations on languages
 In lexical analysis, the most important operations on languages are union, concatenation, and closure
 Union is the familiar operation on sets. The concatenation of languages is all strings formed by taking a string
from the first language and a string from the second language, in all possible ways, and concatenating them.
 The (Kleene) closure of a language L, denoted L*, is the set of strings you get by concatenating L zero or more
times.
 Note that L⁰, "the concatenation of L zero times," is defined to be {ε}
 Finally, the positive closure, denoted L⁺, is the same as the Kleene closure but without the term L⁰. That is, ε will
not be in L⁺ unless it is in L itself.
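These three operations can be demonstrated on small finite languages. Since the Kleene closure is infinite, closure_upto below truncates it at n concatenations; the helper is illustrative.

```python
# Two small languages over disjoint alphabets.
L = {"a", "b"}
M = {"0", "1"}

union = L | M                                  # L ∪ M
concat = {x + y for x in L for y in M}         # LM

def closure_upto(lang, n):
    """L* truncated: the empty string plus all 1..n-fold concatenations."""
    result, layer = {""}, {""}
    for _ in range(n):
        layer = {x + y for x in layer for y in lang}
        result |= layer
    return result

positive = closure_upto(L, 2) - {""}           # L⁺ truncated (ε ∉ L here)

assert concat == {"a0", "a1", "b0", "b1"}
assert "" in closure_upto(L, 2)                # L⁰ = {ε}
assert "ab" in closure_upto(L, 2)
assert "" not in positive and "a" in positive
```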

22
REGULAR EXPRESSIONS AND DFA’S

23
REGULAR EXPRESSIONS AND DFA’S

 Regular Expressions
 Here are the rules that define the regular expressions over some alphabet and the languages that those expressions
denote.
 BASIS: There are two rules that form the basis:

 1. ε is a regular expression, and L(ε) is {ε}, that is, the language whose sole member is the empty string.

 2. If a is a symbol in Σ, then a is a regular expression, and L(a) = {a}, that is, the language with one string, of length one, with a in
its one position.
 Note that by convention, we use italics for symbols, and boldface for their corresponding regular expression

24
REGULAR EXPRESSIONS AND DFA’S

 INDUCTION:

 There are four parts to the induction whereby larger regular expressions are built from smaller ones.

 Suppose r and s are regular expressions denoting languages L(r) and L(s), respectively.

 1. (r)|(s) is a regular expression denoting the language L(r) U L(s).

 2. (r)(s) is a regular expression denoting the language L(r)L(s).

 3. (r)* is a regular expression denoting (L(r))*.

 4. (r) is a regular expression denoting L(r).

 This last rule says that we can add additional pairs of parentheses around expressions without changing the
language they denote

25
REGULAR EXPRESSIONS AND DFA’S

 Conventions for dropping the parentheses


 As defined, regular expressions often contain unnecessary pairs of parentheses.
 We may drop certain pairs of parentheses if we adopt the conventions that:
 a) The unary operator * has highest precedence and is left associative.
 b) Concatenation has second highest precedence and is left associative
 c) | has lowest precedence and is left associative
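Under these conventions, for example, a|b*c may be written without parentheses and parses as a | ((b*)c). Python's re module follows the same precedence rules, so a quick check:

```python
import re

# a|b*c under the standard precedence: either the single symbol a,
# or zero or more b's followed by exactly one c.
pattern = re.compile(r"a|b*c")

assert pattern.fullmatch("a")
assert pattern.fullmatch("bbc")
assert pattern.fullmatch("c")        # zero b's is allowed
assert not pattern.fullmatch("ab")   # NOT parsed as (a|b)*c
```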

26
27
REGULAR EXPRESSIONS AND DFA’S

28
REGULAR EXPRESSIONS AND DFA’S

 Transition Diagrams
 As an intermediate step in the construction of a lexical analyser, we first convert patterns into stylized
flowcharts, called "transition diagrams."
 We perform the conversion from regular-expression patterns to transition diagrams
 Transition diagrams have a collection of nodes or circles, called states. Each state represents a
condition that could occur during the process of scanning the input looking for a lexeme that
matches one of several patterns.
 We may think of a state as summarizing all we need to know about what characters we have seen
between the lexemeBegin pointer and the forward pointer

29
REGULAR EXPRESSIONS AND DFA’S

 DFA
 We shall assume that all our transition diagrams are deterministic, meaning that there is never more than
one edge out of a given state with a given symbol among its labels
 Conventions for the transition diagrams
 Some important conventions about transition diagrams are:
 Certain states are said to be accepting, or final. These states indicate that a lexeme has been found, although the
actual lexeme may not consist of all positions between the lexemeBegin and forward pointers.
 In addition, if it is necessary to retract the forward pointer one position (i.e., the lexeme does not include the
symbol that got us to the accepting state), then we shall additionally place a * near that accepting state
 One state is designated the start state, or initial state; it is indicated by an edge, labeled “start,"

30
REGULAR EXPRESSIONS AND DFA’S

Figure: Patterns for tokens


Figure: Tokens, their patterns, and attribute values
31
REGULAR EXPRESSIONS AND DFA’S

 Example:
 A transition diagram that
recognizes the lexemes matching
the token relop.
 We begin in state 0, the start
state
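The relop transition diagram can be encoded directly as code: from state 0, each input symbol selects at most one next state, and states reached on a lookahead that is not part of the lexeme retract by one position. The attribute names (LT, LE, NE, EQ, GT, GE) are illustrative.

```python
# Direct encoding of the relop transition diagram: deterministic,
# with retraction when the lookahead symbol is not part of the lexeme.
def relop(s, i=0):
    """Return (token, attribute, position after the lexeme) or None."""
    if i >= len(s):
        return None
    c = s[i]
    if c == "<":
        if i + 1 < len(s) and s[i + 1] == "=":
            return ("relop", "LE", i + 2)
        if i + 1 < len(s) and s[i + 1] == ">":
            return ("relop", "NE", i + 2)
        return ("relop", "LT", i + 1)   # *-state: forward pointer retracted
    if c == "=":
        return ("relop", "EQ", i + 1)
    if c == ">":
        if i + 1 < len(s) and s[i + 1] == "=":
            return ("relop", "GE", i + 2)
        return ("relop", "GT", i + 1)   # *-state: forward pointer retracted
    return None                         # no relop pattern matches here

assert relop("<=") == ("relop", "LE", 2)
assert relop("<x") == ("relop", "LT", 1)   # lexeme is just "<"
assert relop("<>") == ("relop", "NE", 2)
```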

32
Thanks!

33
