
The Role of the Lexical Analyzer

The document discusses the role of the lexical analyzer in source code compilation. It defines key terms like tokens, lexemes, and patterns used to break source code into lexical units. Regular expressions are used to define token structures and recognize regular language patterns in the source code. Transition diagrams and finite state automata are implemented to depict the actions of the lexical analyzer when called by the parser. The document provides examples of regular expressions for recognizing identifiers, numbers, and other lexical units in programming languages.

Uploaded by

mdhuq1
Copyright
© Attribution Non-Commercial (BY-NC)

The Role of The Lexical Analyzer

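[Figure: the lexical analyzer reads the source program and returns a token to the parser each time the parser issues "get next token"; the symbol table is shown alongside both.]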

Token: a lexical unit of the language, for example a reserved word.

Pattern: the rule that describes the set of strings that can form a given token.


Lexeme: a sequence of characters in the source program that is matched by the pattern for a token.
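
The three terms can be seen side by side in a small sketch. The token names (IF, ID, NUM, RELOP) and their patterns below are hypothetical and chosen only for illustration; they are not taken from the slides. Each pattern is a rule, each match in the input is a lexeme, and the name attached to the match is the token.

```python
import re

# Hypothetical token names and patterns, for illustration only.
TOKEN_PATTERNS = [
    ("IF", r"if\b"),                  # a reserved word
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),  # identifiers
    ("NUM", r"[0-9]+"),               # unsigned integers
    ("RELOP", r"<=|>=|<>|<|>|="),     # relational operators
    ("SKIP", r"\s+"),                 # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tokenize(text):
    """Return a list of (token, lexeme) pairs; whitespace is dropped."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            pairs.append((m.lastgroup, m.group()))
        pos = m.end()
    return pairs
```

Listing the reserved word before the identifier pattern is a design choice of this sketch: both patterns match "if", and the first alternative wins.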

Input Buffering

Sentinels

Regular Expression:

Tokens are built from symbols of a finite vocabulary. We use regular expressions to define structures of tokens.

Regular Expressions
The sets of strings defined by regular expressions are termed regular sets.

Definition of regular expressions:
∅ is a regular expression denoting the empty set.
A string s is a regular expression denoting a set containing only s.
If A and B are regular expressions, so are:

A | B (alternation)
AB (concatenation)
A* (Kleene closure)
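
As a rough illustration of what the three operators mean on sets of strings, the sketch below models regular sets as Python sets. The function names are hypothetical. Because A* is infinite in general, closure here is only a finite approximation bounded by an assumed max_repeats parameter.

```python
from itertools import product


def alternation(a, b):
    """A | B: every string that is in A or in B."""
    return a | b


def concatenation(a, b):
    """AB: every string formed by a string of A followed by a string of B."""
    return {x + y for x, y in product(a, b)}


def closure(a, max_repeats):
    """Finite approximation of A*: zero up to max_repeats strings of A joined together."""
    result = {""}   # zero repetitions gives the empty string
    layer = {""}
    for _ in range(max_repeats):
        layer = concatenation(layer, a)
        result |= layer
    return result
```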

Regular Expressions (Contd)


Some examples
Let D = (0 | 1 | 2 | 3 | 4 | ... | 9)
L = (A | B | ... | Z)
decimal = D+ . D+
ident = L (L | D)* (_ (L | D)+)*
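
These definitions translate almost directly into Python's re syntax. The sketch below assumes that decimal means two runs of digits separated by a literal decimal point and that L covers upper-case letters only, as written above. The helper names is_decimal and is_ident are hypothetical.

```python
import re

D = r"[0-9]"   # D = (0 | 1 | ... | 9)
L = r"[A-Z]"   # L = (A | B | ... | Z)

# decimal: digits, a literal point (assumed), digits
DECIMAL = re.compile(rf"{D}+\.{D}+")
# ident = L (L | D)* (_ (L | D)+)*
IDENT = re.compile(rf"{L}(?:{L}|{D})*(?:_(?:{L}|{D})+)*")


def is_decimal(s):
    return DECIMAL.fullmatch(s) is not None


def is_ident(s):
    return IDENT.fullmatch(s) is not None
```

Under this reading, an identifier may contain underscores, but never two in a row and never at the end, because each underscore must be followed by at least one letter or digit.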

Some more examples


Identifiers:

Real Numbers:

Recognition of Tokens

A transition diagram

This machine accepts abccabc but rejects abcab. In general, it accepts exactly the strings described by (abc+)+.
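
A machine with this behavior can be sketched as a table-driven deterministic automaton. The state numbering below is an assumption, since the diagram itself is not reproduced here: state 0 is the start state and state 3, reached after at least one c, is the only accepting state.

```python
# Assumed state numbering for a machine accepting (abc+)+.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 2,
    (2, "c"): 3,
    (3, "c"): 3,   # further c's stay in the accepting state
    (3, "a"): 1,   # an a starts the next "abc+" group
}
ACCEPTING = {3}


def accepts(s):
    """Run the automaton over s; a missing transition means rejection."""
    state = 0
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING
```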

Transition Diagrams:
A transition diagram depicts the actions that take place when the lexical analyzer is called by the parser.

* marks a state at which input retraction must take place.
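
Retraction can be sketched with a hand-coded diagram for identifiers, assuming a diagram with three states: a start state, a state that loops on letters and digits, and a starred accepting state entered on any other character. The function name scan_identifier and the state numbers are hypothetical.

```python
def scan_identifier(text, start):
    """Return (lexeme, next_position) for an identifier at start, or None."""
    pos = start
    state = 0
    while True:
        ch = text[pos] if pos < len(text) else ""  # "" stands for end of input
        pos += 1                                   # consume one character
        if state == 0:
            if ch.isalpha():
                state = 1
            else:
                return None                        # no identifier starts here
        elif state == 1:
            if ch.isalnum():
                continue                           # loop on letters and digits
            # "other" edge into the starred state: the character just read
            # is not part of the lexeme, so retract the input by one.
            pos -= 1
            return text[start:pos], pos
```

The scanner only learns that the identifier has ended by reading one character too many, which is why the accepting state has to give that character back.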

Implementing a Transition Diagram

Lex Specification
A lex program consists of three parts:

declarations
%%
translation rules
%%
auxiliary procedures

Finite Automata
A finite automaton is a generalized transition diagram used to recognize regular sets. It can be:
Deterministic [only one transition out of a state on a given input symbol]
Nondeterministic [more than one transition out of a state is possible on the same input symbol]

Nondeterministic Finite Automata


Recognizes a regular expression: (a|b)*abb

Transition table for the finite automaton of the previous figure
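
Neither the figure nor the table is reproduced here, so the sketch below assumes the common four-state NFA for (a|b)*abb: state 0 loops on a and b, and the path 0 to 1 to 2 to 3 spells abb. The table is stored as a dictionary from state and input symbol to a set of next states, and the simulation tracks every state the NFA could be in.

```python
# Assumed transition table: state -> input symbol -> set of next states.
NFA = {
    0: {"a": {0, 1}, "b": {0}},   # two choices on "a" make this nondeterministic
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START = 0
ACCEPTING = {3}


def nfa_accepts(s):
    """Simulate the NFA by following all possible transitions at once."""
    current = {START}
    for ch in s:
        current = {nxt for st in current for nxt in NFA[st].get(ch, set())}
    return bool(current & ACCEPTING)
```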
