CS-352 - Spring 2024 - Lec2

Faculty of Information Technology

Spring 2024

Compiler Design
CS-352
Lecture (2)

Outline

• Structure of a Compiler
• Basic Terminologies
• Role of Lexical Analyzer
• Interactions between Scanner and Parser
• Main Components of Lexical Analyzer
• Operations on Languages
• Regular Expressions
• Regular Definitions
• Extensions of Regular Definitions
Structure of a Compiler

[Figure: structure of a compiler, with today's topic marked.]

Basic Terminologies

Token

It is a pair consisting of a token name and an optional attribute value.

Pattern

It is a description of the form that the lexemes of a token may take.

Lexeme
It is a sequence of characters in the source program that matches the pattern for a
token and is identified by the lexical analyzer as an instance of that token.

Basic Terminologies (Cont.)

• Example: a table of some typical tokens, their informally described
patterns, and some sample lexemes.
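The relationship between the three terms can be sketched in a few lines of Python. This is only an illustration: the token names, the patterns, and the helper match_token are hypothetical choices made for this sketch, not taken from the slides.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Token(NamedTuple):
    name: str                 # token name, e.g. "id" or "number"
    attribute: Optional[str]  # optional attribute value (here: the lexeme)


# Hypothetical (token name, pattern) pairs; order decides ties.
PATTERNS = [
    ("if", r"if\b"),                      # the keyword itself
    ("number", r"[0-9]+"),                # one or more digits
    ("id", r"[A-Za-z][A-Za-z0-9]*"),      # letter followed by letters/digits
    ("comparison", r"<=|>=|==|!=|<|>"),   # a comparison operator
]


def match_token(text: str) -> Optional[Tuple[Token, str]]:
    """Return (token, lexeme) for the first pattern matching at the start of text."""
    for name, pattern in PATTERNS:
        m = re.match(pattern, text)
        if m:
            lexeme = m.group(0)  # the characters that matched the pattern
            # Only tokens with many possible lexemes carry an attribute here.
            attribute = lexeme if name in ("id", "number") else None
            return Token(name, attribute), lexeme
    return None
```

For instance, on the input "iffy" the keyword pattern does not match, so the lexeme "iffy" is reported as an instance of the token id.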

Role of Lexical Analyzer

• As the first phase of a compiler, the main task of
the lexical analyzer is to read the input characters
of the source program, group them into lexemes,
and produce as output a sequence of tokens, one for
each lexeme in the source program.

Source Program → Lexical Analyzer → Token Stream

Role of Lexical Analyzer (Cont.)

• Sometimes, lexical analyzers are divided into a
cascade of two phases:-

Scanning

Consists of the simple processes that do not require
tokenization of the input, such as deletion of comments and
compaction of consecutive whitespace characters into one.

Lexical Analysis

Lexical analysis proper is the more complex portion, which
produces tokens from the output of the scanner.
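The scanning phase can be sketched as a small text filter. The sketch below assumes C-style comments (/* ... */ and // ...) and, for brevity, ignores string literals that might contain comment markers; the function name scan is hypothetical.

```python
import re


def scan(source: str) -> str:
    """Delete comments and compact runs of whitespace into a single space."""
    # Replace each comment with a space so neighboring lexemes do not fuse.
    no_block = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    no_line = re.sub(r"//[^\n]*", " ", no_block)
    # Newlines count as whitespace in this sketch.
    return re.sub(r"\s+", " ", no_line).strip()
```

The result is still a plain character string; turning it into tokens is left to the second phase.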

Interactions between Scanner and Parser

• Commonly, the interaction is implemented by having the
parser call the lexical analyzer. The call, suggested by the
getNextToken command, causes the lexical analyzer to read
characters from its input until it can identify the next lexeme
and produce for it the next token, which it returns to the
parser.
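A minimal sketch of this call pattern follows. The class Lexer, the token names in TOKEN_SPEC, and the stand-in function parse_all are assumptions made for illustration; a real parser would build a syntax tree rather than collect tokens in a list.

```python
import re

# Hypothetical token names and patterns.
TOKEN_SPEC = [
    ("number", r"[0-9]+"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("op", r"[+\-*/=]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


class Lexer:
    def __init__(self, text: str):
        self.text = text
        self.pos = 0  # index of the next unread input character

    def get_next_token(self):
        """Read characters until the next lexeme is identified; return its token."""
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1  # whitespace separates lexemes but yields no token
        if self.pos >= len(self.text):
            return ("eof", "")
        m = MASTER.match(self.text, self.pos)
        if m is None:
            raise SyntaxError(f"unexpected character {self.text[self.pos]!r}")
        self.pos = m.end()
        return (m.lastgroup, m.group())  # (token name, lexeme)


def parse_all(lexer: Lexer):
    """Stand-in for a parser: it pulls tokens on demand, one call at a time."""
    tokens = []
    while True:
        token = lexer.get_next_token()
        if token[0] == "eof":
            return tokens
        tokens.append(token)
```

The point of the design is that the lexer never runs ahead on its own: it reads only as much input as the parser asks for.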

Main Components of Lexical Analyzer

[Figure: scanner generator pipeline]
RE → Thompson Construction → NFA → Subset Construction → DFA → Minimization → Minimized DFA → DFA Simulation → Program
Main Components of Lexical Analyzer (Cont.)

• Main components of scanner generation (e.g., Lex)

▪ Convert a regular expression to a non-deterministic finite
automaton (NFA).

▪ Convert the NFA to a deterministic finite automaton (DFA).

▪ Improve the DFA to minimize the number of states.

▪ Generate a program in C or some other language to “simulate”
the DFA.
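The last step, simulating a DFA, can be sketched with a transition table. The two-state DFA below was written by hand for the pattern letter ( letter | digit )*; it is not the output of Lex, and the names char_class, TRANSITIONS, and accepts are hypothetical.

```python
def char_class(ch: str) -> str:
    """Map an input character to the input symbol used by the table."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


START, IN_ID = 0, 1

# Transition table: (state, input symbol) -> next state.
# A missing entry means the DFA has no move and rejects.
TRANSITIONS = {
    (START, "letter"): IN_ID,
    (IN_ID, "letter"): IN_ID,
    (IN_ID, "digit"): IN_ID,
}
ACCEPTING = {IN_ID}


def accepts(s: str) -> bool:
    """Run the DFA over s and report whether it ends in an accepting state."""
    state = START
    for ch in s:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING
```

A generated scanner would use the same loop; only the table would come from the minimized DFA instead of being typed in.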

Operations on Languages

[Table: Definitions of Operations on Languages]

• Example: Let L be the set of letters {A, B, ... , Z, a, b, ... , z} and let D
be the set of digits {0, 1, ... , 9}.
– We may think of L and D in two, essentially equivalent, ways. One way is that L and
D are, respectively, the alphabets of uppercase and lowercase letters and of digits.
The second way is that L and D are languages, all of whose strings happen to be of
length one.
– Here are some other languages that can be constructed from languages L and D,
using the operators on languages:-
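The operators can be tried out on L and D with ordinary Python sets. Two assumptions are made in this sketch: the standard definitions are used (union collects the strings of both languages, concatenation joins every string of the first with every string of the second), and the closure, which is infinite, is cut off after k concatenations. The function names are hypothetical.

```python
import string

L = set(string.ascii_letters)  # {A, ..., Z, a, ..., z}
D = set(string.digits)         # {0, 1, ..., 9}


def union(a, b):
    """Strings that are in a or in b."""
    return a | b


def concat(a, b):
    """Every string of a followed by every string of b."""
    return {s + t for s in a for t in b}


def power(a, n):
    """a concatenated with itself n times; power(a, 0) is {""}."""
    result = {""}
    for _ in range(n):
        result = concat(result, a)
    return result


def closure_up_to(a, k):
    """Finite slice of the Kleene closure: the union of a^0, a^1, ..., a^k."""
    result = set()
    for i in range(k + 1):
        result |= power(a, i)
    return result


# Strings of length 1 or 2 that start with a letter and continue with
# letters or digits: a finite slice of L(L ∪ D)*.
short_identifiers = concat(L, closure_up_to(union(L, D), 1))
```

Here union(L, D) has 62 strings of length one, and concat(L, D) has 520 strings, each a letter followed by a digit.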

Regular Expressions

• In the preceding example, we were able to describe identifiers
by giving names to sets of letters and digits and using the
language operators: union, concatenation, and closure.
• This process is so useful that a notation called regular
expressions has come into common use. Regular expressions
describe the languages that can be constructed from the symbols
of some alphabet using these operators.
• In this notation, if letter is established to stand for any letter,
and digit is established to stand for any digit, then we could
describe the language of C++ identifiers by:-
letter ( letter | digit )*
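The expression above can be checked with Python's re module. The sketch assumes that letter means an ASCII letter and digit an ASCII digit; note that it follows the slide's simplified definition, whereas real C++ identifiers may also contain the underscore. The name is_identifier is hypothetical.

```python
import re

# letter ( letter | digit )* written in Python's regex syntax.
IDENTIFIER = re.compile(r"[A-Za-z]([A-Za-z]|[0-9])*")


def is_identifier(s: str) -> bool:
    """True if the whole string s matches letter ( letter | digit )*."""
    return IDENTIFIER.fullmatch(s) is not None
```

fullmatch is used rather than match so that the entire string, not just a prefix, has to belong to the language.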

Regular Expressions (Cont.)

• BASIS: There are two rules that form the basis:-

1. ε is a regular expression, and L(ε) is {ε}, that is, the language
whose sole member is the empty string.

2. If a is a symbol in Σ, then a is a regular expression, and L(a) = {a},
that is, the language with one string, of length one, with a in its
one position.

Regular Expressions (Cont.)

• As defined, regular expressions often contain unnecessary
pairs of parentheses. We may drop certain pairs of
parentheses if we adopt the conventions that:-
a) The unary operator * has highest precedence and is left associative.

b) Concatenation has second highest precedence and is left associative.

c) | has lowest precedence and is left associative.

• Under these conventions, for example, we may replace the
regular expression (a)|((b)*(c)) by a|b*c. Both expressions
denote the set of strings that are either a single a or are zero
or more b's followed by one c.
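Python's re module follows the same three precedence conventions, so the claim can be checked by brute force over all short strings. This is a sketch, not a proof: it only compares the two expressions on strings up to a chosen length, and the helper names are hypothetical.

```python
import re
from itertools import product


def strings_up_to(alphabet: str, max_len: int):
    """Yield every string over alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def same_language(p1: str, p2: str, alphabet: str = "abc", max_len: int = 4) -> bool:
    """True if p1 and p2 accept exactly the same strings up to max_len."""
    return all(
        bool(re.fullmatch(p1, s)) == bool(re.fullmatch(p2, s))
        for s in strings_up_to(alphabet, max_len)
    )
```

Comparing a|b*c with (a|b)*c instead shows why the conventions matter: the second expression rejects the single string a, so the parentheses there are not redundant.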

Regular Definitions

• For notational convenience, we may wish to give names to
certain regular expressions and use those names in
subsequent expressions, as if the names were themselves
symbols.
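The idea can be imitated in Python by storing each named expression in a dictionary and substituting earlier names into later definitions. The names letter, digit, id, and digits and the use of str.format for substitution are choices made for this sketch only.

```python
import re

# Each definition may use only the names defined before it.
defs = {}
defs["letter"] = "[A-Za-z]"
defs["digit"] = "[0-9]"
defs["id"] = "{letter}({letter}|{digit})*".format(**defs)
defs["digits"] = "{digit}{digit}*".format(**defs)


def matches(name: str, s: str) -> bool:
    """True if the whole string s belongs to the language of the named definition."""
    return re.fullmatch(defs[name], s) is not None
```

Because every name is replaced by an expression built from earlier names, the final definitions are ordinary regular expressions over the alphabet; the names add convenience, not expressive power.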

Extensions of Regular Definitions

• Since Kleene introduced regular expressions with the basic
operators for union, concatenation, and Kleene closure in the
1950s, many extensions have been added to regular
expressions to enhance their ability to specify string
patterns.
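The specific extensions shown on the slides are not reproduced in this text. As an assumption, the sketch below uses three extensions found in most regex notations, including Python's re module: one or more instances (+), zero or one instance (?), and character classes ([...]). It checks, on short strings only, that each is shorthand for an expression built from the basic operators. The helper language is hypothetical.

```python
import re
from itertools import product


def language(pattern: str, alphabet: str, max_len: int) -> set:
    """All strings over alphabet, up to max_len long, that match pattern."""
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if re.fullmatch(pattern, "".join(chars))
    }


# r+ abbreviates rr*
one_or_more = language("a+", "ab", 3) == language("aa*", "ab", 3)
# r? abbreviates r | ε, so a?b is the same as ab|b
zero_or_one = language("a?b", "ab", 3) == language("ab|b", "ab", 3)
# [abc] abbreviates a|b|c
char_class = language("[abc]", "abcd", 2) == language("a|b|c", "abcd", 2)
```

All three comparisons come out equal, which is the sense in which such extensions add convenience without changing the class of languages that can be described.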
