
Syntax Specification: Regular Expressions

The document discusses the phases of compilation for programming languages, including syntax analysis. It defines syntax as how linguistic elements like words are combined to form phrases and clauses. Syntax of programming languages can be formally defined using context-free grammars. Regular expressions are used to define the tokens or basic elements of a language and can be represented as deterministic finite automata. Examples are provided of regular expressions for numeric literals and identifiers.

Uploaded by

seerat fatima

Syntax Specification

Regular Expressions

Phases of Compilation

Syntax Analysis

• Syntax:
– Webster’s definition: 1 a : the way in which linguistic
elements (as words) are put together to form constituents
(as phrases or clauses)
• The syntax of a programming language
– Describes its form
» i.e. Organization of tokens (elements)
– Formal notation
» Context Free Grammars (CFGs)

Review: Formal definition of tokens

• A set of tokens is a set of strings over an alphabet


– {read, write, +, -, *, /, :=, 1, 2, …, 10, …, 3.45e-3, …}
• A set of tokens is a regular set that can be defined by
comprehension using a regular expression
• For every regular set, there is a deterministic finite
automaton (DFA) that can recognize it
– i.e. determine whether a string belongs to the set or not
– Scanners extract tokens from source code in the same way
DFAs determine membership
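
To make the membership test concrete, here is a minimal sketch of a DFA in Python. It assumes a hypothetical token class, digit+ (one or more decimal digits), and the state names are illustrative; neither comes from the slides.

```python
# Hypothetical DFA for digit+ (one or more decimal digits).
START, IN_INT, DEAD = "start", "in_int", "dead"
ACCEPTING = {IN_INT}
DIGITS = "0123456789"


def step(state, ch):
    """Total transition function: every (state, character) pair has a successor."""
    if state in (START, IN_INT) and ch in DIGITS:
        return IN_INT
    return DEAD  # trap state: once entered, the string can never be accepted


def accepts(text):
    """Run the DFA over the whole string and report whether it ends in an accepting state."""
    state = START
    for ch in text:
        state = step(state, ch)
    return state in ACCEPTING


print(accepts("10"))   # True
print(accepts("3a"))   # False
```

A real scanner runs the same kind of automaton but stops at the longest prefix that reaches an accepting state instead of requiring the whole input to match.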

Regular Expressions

• A regular expression (RE) is:


– A single character
– The empty string, ε
– The concatenation of two regular expressions
» Notation: RE1 RE2 (i.e. RE1 followed by RE2)
– The union of two regular expressions
» Notation: RE1 | RE2
– The closure of a regular expression
» Notation: RE*
» * is known as the Kleene star
» * represents the concatenation of 0 or more strings
– Non-null enumeration
» Notation: RE+
» represents all non-null concatenations of RE (1 or more times)
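
The operators above map almost directly onto Python's `re` module. The sketch below is only an illustration: the sample patterns and the helper name `matches` are assumptions, and `re.fullmatch` is used so that the whole string must belong to the language.

```python
import re

# One sample pattern per operator; (?:...) groups without capturing.
EXAMPLES = {
    "concatenation": r"ab",       # RE1 RE2
    "union": r"a|b",              # RE1 | RE2
    "closure": r"(?:ab)*",        # RE*  (0 or more repetitions)
    "positive closure": r"(?:ab)+",  # RE+  (1 or more repetitions)
}


def matches(pattern, text):
    """True if the entire string is in the language described by pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches(EXAMPLES["closure"], ""))           # True: zero repetitions
print(matches(EXAMPLES["positive closure"], ""))  # False: at least one needed
```

The empty string ε has no dedicated symbol in `re`; it appears as an empty alternative or through `?` and `*`.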

Regular Expressions Basics

Let alphabet Σ = {a, b} (means a and b are its only letters)

a* = {ε, a, aa, aaa, ...}
(ab)* = {ε, ab, abab, ababab, ...}
a ∪ b* = {a, ε, b, bb, bbb, ...}
(a ∪ b)* = all strings of a’s and b’s, including ε
(a*b*)* = (a ∪ b)* = all strings of a’s and b’s, including ε
a*b* = {a^i b^j | i ≥ 0, j ≥ 0}
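
Sets like these can be listed mechanically for short strings. The following sketch enumerates every string over {a, b} up to a length bound and keeps those that fully match a pattern; the helper `language` and the bound of 4 are assumptions made for illustration.

```python
import re
from itertools import product


def language(pattern, alphabet="ab", max_len=4):
    """All strings over alphabet of length <= max_len that fully match pattern."""
    compiled = re.compile(pattern)
    found = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if compiled.fullmatch(s):
                found.append(s)
    return found


print(language(r"a*"))       # ['', 'a', 'aa', 'aaa', 'aaaa']
print(language(r"(?:ab)*"))  # ['', 'ab', 'abab']

# Two different expressions can describe the same language:
print(language(r"(?:a*b*)*") == language(r"(?:a|b)*"))  # True
```

Agreement up to a length bound is evidence, not a proof, that two expressions are equivalent.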

Building Regular Expressions

Regular Expressions as Language


• * (closure): while loop
– iterates 0 or more times
• uv (concatenation)
– sequential; first u, then v
• u ∪ v (union): OR
– select from one or the other or both
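
The analogy with control flow can be taken literally. As a sketch, the hypothetical expression b*(a ∪ ε)b* (strings with at most one a, an example not taken from the slides) translates into two loops, one branch, and sequencing:

```python
def at_most_one_a(text):
    """Hand-written recognizer for b*(a ∪ ε)b* over the alphabet {a, b}."""
    i = 0
    # b*  -> while loop, runs 0 or more times
    while i < len(text) and text[i] == "b":
        i += 1
    # a ∪ ε -> a choice: either consume one a, or take the ε branch and consume nothing
    if i < len(text) and text[i] == "a":
        i += 1
    # concatenation -> the next piece simply follows in sequence
    while i < len(text) and text[i] == "b":
        i += 1
    # accept only if the whole input was consumed
    return i == len(text)


print(at_most_one_a("bbabb"))  # True
print(at_most_one_a("abab"))   # False
```

This direct translation works here because each choice can be decided by looking at the next character. In general a union may need backtracking or a conversion to a DFA.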

Description → Regular Expression

Let Σ = {a, b}; all expressions are over this alphabet


Strings with
• exactly one a: b*ab*
• exactly two a’s: b*ab*ab*
• one or more a’s: (b*ab*)+ or (a ∪ b)*a(a ∪ b)*
• even number of a’s: (b*ab*ab*)*b*
• even number of a’s and exactly one b: (aa)*b(aa)* ∪ (aa)*ab(aa)*a
• odd number of a’s: (b*ab*ab*)*b*ab*
• no occurrence of aa: (b ∪ ab)*(ε ∪ a)
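
A candidate expression can be checked against its description by brute force over all short strings. The sketch below does this for four of the descriptions; the function name `counterexamples`, the length bound of 8, and the Python predicates are assumptions for illustration.

```python
import re
from itertools import product

# (description, Python pattern, predicate stating the description directly)
CASES = [
    ("exactly one a", r"b*ab*", lambda s: s.count("a") == 1),
    ("exactly two a's", r"b*ab*ab*", lambda s: s.count("a") == 2),
    ("odd number of a's", r"(?:b*ab*ab*)*b*ab*", lambda s: s.count("a") % 2 == 1),
    ("no occurrence of aa", r"(?:b|ab)*a?", lambda s: "aa" not in s),  # a? plays the role of (ε ∪ a)
]


def counterexamples(pattern, predicate, max_len=8):
    """Strings over {a, b} where the pattern and the predicate disagree."""
    compiled = re.compile(pattern)
    bad = []
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if (compiled.fullmatch(s) is not None) != predicate(s):
                bad.append(s)
    return bad


for description, pattern, predicate in CASES:
    print(description, counterexamples(pattern, predicate))  # each list is empty
```

The check is strict about ε. For example, `counterexamples(r"(?:b*ab*)*", lambda s: "a" in s)` returns `['']`, because a starred expression also accepts the empty string, which contains no a.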

Regular Expression → Description

Same alphabet
• (aa)*
strings consisting only of a’s, of even length
• (a ∪ b) (a ∪ b) (a ∪ b) (a ∪ b)
all strings of length 4
• ((a ∪ b) (a ∪ b) (a ∪ b) (a ∪ b))*
strings of length divisible by 4
• (aa)* ∩ ((a ∪ b) (a ∪ b) (a ∪ b) (a ∪ b))*
strings of a’s of length divisible by 4
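
These readings can be spot-checked in Python. The sketch assumes that the last expression combines its two parts by intersection (a string must belong to both languages), which is the reading that fits its description; `[ab]{4}` is Python shorthand for four copies of (a ∪ b), and all names are illustrative.

```python
import re

LENGTH_4 = re.compile(r"[ab]{4}")            # (a ∪ b) four times
MULTIPLE_OF_4 = re.compile(r"(?:[ab]{4})*")  # the same, starred
EVEN_AS = re.compile(r"(?:aa)*")             # (aa)*


def in_both(s):
    """Intersection: the string must fully match both expressions."""
    return EVEN_AS.fullmatch(s) is not None and MULTIPLE_OF_4.fullmatch(s) is not None


print(LENGTH_4.fullmatch("abba") is not None)  # True
print(in_both("aaaa"))                         # True
print(in_both("aa"))                           # False: length 2 is not divisible by 4
print(in_both("abba"))                         # False: contains b
```

Intersection is not one of the regular expression operators listed earlier, so it is computed here by matching twice. Regular sets are closed under intersection, so an equivalent single expression exists: (aaaa)*.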

Token Definition Example

• Numeric literals in Pascal


– Definition of the token unsigned_number

digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

unsigned_integer → digit digit*   (equivalently: digit+)

unsigned_number → unsigned_integer ( ( . unsigned_integer ) | ε )
                  ( ( e ( + | - | ε ) unsigned_integer ) | ε )

• Recursion is not allowed in Regular Expressions!
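
Because no definition refers to itself, each name can be replaced by its right-hand side until one flat expression remains. The sketch below performs that substitution with Python string formatting. The variable names mirror the definitions, while the helper `is_unsigned_number` and the use of the ASCII hyphen for the minus sign are assumptions.

```python
import re

# Each definition only uses names defined before it, so plain substitution terminates.
DIGIT = r"[0-9]"
UNSIGNED_INTEGER = rf"{DIGIT}{DIGIT}*"
UNSIGNED_NUMBER = (
    rf"{UNSIGNED_INTEGER}"
    rf"(?:\.{UNSIGNED_INTEGER})?"      # optional fraction: ( . unsigned_integer ) | ε
    rf"(?:e[+-]?{UNSIGNED_INTEGER})?"  # optional exponent: ( e ( + | - | ε ) unsigned_integer ) | ε
)


def is_unsigned_number(text):
    """True if the whole string is an unsigned_number."""
    return re.fullmatch(UNSIGNED_NUMBER, text) is not None


print(is_unsigned_number("3.45e-3"))  # True
print(is_unsigned_number("3."))       # False: a fraction needs at least one digit
```

A recursive definition, for example one that mentions unsigned_number on its own right-hand side, could not be expanded this way, which is why nesting constructs are left to the context-free grammar.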

Exercise

digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

unsigned_integer → digit digit*

unsigned_number → unsigned_integer ( ( . unsigned_integer ) | ε )
                  ( ( e ( + | - | ε ) unsigned_integer ) | ε )
• Regular expression for
– Decimal numbers

number → …

Exercise

digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

unsigned_integer → digit digit*

unsigned_number → unsigned_integer ( ( . unsigned_integer ) | ε )
                  ( ( e ( + | - | ε ) unsigned_integer ) | ε )
• Regular expression for
– Decimal numbers

number → ( + | - | ε ) unsigned_integer ( ( . unsigned_integer ) | ε )
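
A scanner usually needs the parts of a token as well as a yes/no answer. As a sketch, the number definition can be written with named groups so that the sign, integer part, and fraction are returned separately. It assumes the decimal separator is a period and the minus sign is the ASCII hyphen; `parse_number` and the group names are hypothetical.

```python
import re

NUMBER = re.compile(
    r"(?P<sign>[+-]?)"           # ( + | - | ε )
    r"(?P<integer>[0-9]+)"       # unsigned_integer
    r"(?:\.(?P<fraction>[0-9]+))?"  # ( . unsigned_integer ) | ε
)


def parse_number(text):
    """Return the parts of a decimal number, or None if text is not a number."""
    m = NUMBER.fullmatch(text)
    if m is None:
        return None
    return m.groupdict()  # groups that did not take part in the match are None


print(parse_number("-3.14"))  # {'sign': '-', 'integer': '3', 'fraction': '14'}
print(parse_number("42"))     # {'sign': '', 'integer': '42', 'fraction': None}
print(parse_number("3."))     # None
```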

Exercise

digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

unsigned_integer → digit digit*

unsigned_number → unsigned_integer ( ( . unsigned_integer ) | ε )
                  ( ( e ( + | - | ε ) unsigned_integer ) | ε )
• Regular expression for
– Identifiers

identifier → …

Exercise

digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

unsigned_integer → digit digit*

unsigned_number → unsigned_integer ( ( . unsigned_integer ) | ε )
                  ( ( e ( + | - | ε ) unsigned_integer ) | ε )
• Regular expression for
– Identifiers

identifier → letter ( letter | digit | ε )*

letter → a | b | c | … | z
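
Put together, token definitions like these are enough to drive a small scanner. The following is a minimal sketch, not a full Pascal lexer: the token names, the operator set, the whitespace rule, and the restriction of letters to lowercase a through z are all assumptions made for illustration.

```python
import re

# Order matters: earlier alternatives win when several could match at a position.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+(?:\.[0-9]+)?(?:e[+-]?[0-9]+)?"),  # unsigned_number
    ("IDENTIFIER", r"[a-z][a-z0-9]*"),                     # letter ( letter | digit )*
    ("OPERATOR", r":=|[+\-*/]"),
    ("SKIP", r"[ \t]+"),                                   # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Split source into (token_name, lexeme) pairs, left to right."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(scan("x1 := y + 3.45e-3"))
# [('IDENTIFIER', 'x1'), ('OPERATOR', ':='), ('IDENTIFIER', 'y'),
#  ('OPERATOR', '+'), ('NUMBER', '3.45e-3')]
```

Keywords such as read and write would come out as IDENTIFIER tokens here. A real scanner typically looks them up in a keyword table after matching.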
