Lexical Analysis: Programming Languages Translators

The document discusses lexical analysis in programming language translators. Lexical analysis involves breaking the source code text into tokens through processes like recognizing keywords, identifiers, numbers, operators, and punctuation. It creates a stream of tokens from the character input by grouping characters into meaningful units like identifiers, numbers, strings, and punctuation symbols. The lexical analyzer represents each unique token with a numeric code to simplify later parsing.

Uploaded by

Anwar Mohamed


Programming Languages Translators

Lexical Analysis
Tasks of a Scanner

recognizes the keywords of the language
    (these are the reserved words that have a special meaning in the language, such as the word class in Java)
recognizes special characters, such as ( and ), or groups of special characters, such as := and ==
recognizes identifiers, integers, reals, decimals, strings, etc.
ignores whitespace (tabs, blanks, etc.) and comments
recognizes and processes special directives (such as the #include "file" directive in C) and macros
Lexical Analysis
A scanner groups input characters into tokens.
input: x = x * (acc+123)
token        value
identifier   x
equal        =
identifier   x
star         *
left-paren   (
identifier   acc
plus         +
integer      123
right-paren  )
Tokens are typically represented by numbers.
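
The grouping above can be sketched in C as a hand-written scanner. This is only an illustration under stated assumptions: the token codes (T_ID, T_INT, and so on), the Token struct, and the function name next_token are hypothetical and do not come from the slides.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical numeric token codes; the actual values are arbitrary. */
enum { T_EOF = 0, T_ID = 1, T_INT = 2, T_EQUAL = 3, T_STAR = 4,
       T_PLUS = 5, T_LPAREN = 6, T_RPAREN = 7, T_ERROR = 8 };

typedef struct {
    int kind;            /* one of the T_* codes */
    const char *start;   /* first character of the lexeme */
    size_t len;          /* lexeme length in characters */
} Token;

/* Scans one token starting at *p and advances *p past it. */
static Token next_token(const char **p) {
    const char *s = *p;
    while (isspace((unsigned char)*s)) s++;      /* whitespace is discarded */

    Token t = { T_ERROR, s, 1 };                 /* default: one unknown character */
    if (*s == '\0') {
        t.kind = T_EOF;
        t.len = 0;
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        const char *e = s + 1;                   /* identifier: letter, then letters/digits */
        while (isalnum((unsigned char)*e) || *e == '_') e++;
        t.kind = T_ID;
        t.len = (size_t)(e - s);
    } else if (isdigit((unsigned char)*s)) {
        const char *e = s + 1;                   /* integer: one or more digits */
        while (isdigit((unsigned char)*e)) e++;
        t.kind = T_INT;
        t.len = (size_t)(e - s);
    } else {
        switch (*s) {                            /* single-character tokens */
        case '=': t.kind = T_EQUAL;  break;
        case '*': t.kind = T_STAR;   break;
        case '+': t.kind = T_PLUS;   break;
        case '(': t.kind = T_LPAREN; break;
        case ')': t.kind = T_RPAREN; break;
        default:  break;                         /* stays T_ERROR */
        }
    }
    *p = s + t.len;
    return t;
}
```

Calling next_token repeatedly on the input x = x * (acc+123) yields the nine tokens listed in the table and then T_EOF. The parser only ever sees the integer codes, plus the lexeme text for identifiers and integers.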
Lexical Analysis

The lexical analyzer splits the input into tokens

Token = sequence of characters (symbolic name) representing a single terminal symbol
Identifiers: myVariable
Literals: 123 5.67 true
Keywords: char sizeof
Operators: + - * /
Punctuation: ; , } {
Discards whitespace and comments
Examples of Tokens in C
Token              Lexemes
identifier         Age, grade, Temp, zone, q1
number             3.1416, -498127, 987.76412097
string             "A cat sat on a mat.", "90183654"
open parenthesis   (
close parenthesis  )
semicolon          ;
reserved word if   if (C keywords are case-sensitive; IF, If, and iF match the keyword only in a case-insensitive language)
Examples of Tokens in C

The lexical analyzer usually represents each token by a unique integer code.
The operator characters are quoted because +, * and / are otherwise regular-expression metacharacters in lex:
"+"                      { return(PLUS); }       // PLUS = 401
"-"                      { return(MINUS); }      // MINUS = 402
"*"                      { return(MULT); }       // MULT = 403
"/"                      { return(DIV); }        // DIV = 404
Some tokens require regular expressions:
[a-zA-Z_][a-zA-Z0-9_]*   { return(ID); }         // identifier
[1-9][0-9]*              { return(DECIMALINT); }
0[0-7]*                  { return(OCTALINT); }
(0x|0X)[0-9a-fA-F]+      { return(HEXINT); }
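
To see what the three integer patterns accept, here is a minimal hand-written C check that applies them to one complete lexeme. It is a sketch, not what lex generates: the function name classify_int and the code values are assumptions, and a generated scanner would match the longest prefix of the input rather than test a whole string.

```c
#include <ctype.h>

/* Hypothetical codes; only the names DECIMALINT, OCTALINT, HEXINT come from the rules. */
enum { NOT_INT = 0, DECIMALINT = 1, OCTALINT = 2, HEXINT = 3 };

/* Classifies a complete, NUL-terminated lexeme against the three integer patterns. */
static int classify_int(const char *s) {
    const char *p = s;
    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {     /* (0x|0X)[0-9a-fA-F]+ */
        p += 2;
        if (!isxdigit((unsigned char)*p)) return NOT_INT;  /* '+' needs at least one digit */
        while (isxdigit((unsigned char)*p)) p++;
        return *p == '\0' ? HEXINT : NOT_INT;
    }
    if (p[0] == '0') {                                     /* 0[0-7]* */
        p++;
        while (*p >= '0' && *p <= '7') p++;
        return *p == '\0' ? OCTALINT : NOT_INT;
    }
    if (p[0] >= '1' && p[0] <= '9') {                      /* [1-9][0-9]* */
        p++;
        while (isdigit((unsigned char)*p)) p++;
        return *p == '\0' ? DECIMALINT : NOT_INT;
    }
    return NOT_INT;
}
```

Note that a lone 0 matches the octal pattern, not the decimal one, because the decimal pattern requires a leading digit from 1 to 9.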
Example

Token       Informal description                    Sample lexemes
if          characters i, f                         if
else        characters e, l, s, e                   else
comparison  < or > or <= or >= or == or !=          <=, !=
id          letter followed by letters and digits   pi, score, D2
number      any numeric constant                    3.14159, 0, 6.02e23
literal     anything but ", surrounded by "s        "core dumped"

printf("total = %d\n", score);

Redefining Identifiers can be dangerous

program confusing;
const true = false;
begin
  if (a<b) = true then
    f(a)
  else
Whitespace

Whitespace is any space, tab, end-of-line character (or characters), or character sequence inside a comment.
No token may contain embedded whitespace (unless it is a character or string literal).
Example:
>=    one token
> =   two tokens
Reserved Keywords in C

auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while
C++ added more: bool, catch, class, dynamic_cast, inline, private, protected, public, static_cast, template, this, virtual, wchar_t (a typedef rather than a keyword in C), and others
Each keyword is mapped to its own token
Lexical Analysis

The process of converting a character stream into a corresponding sequence of meaningful symbols (called tokens or lexemes) is called tokenizing, lexing, or lexical analysis. A program that performs this process is called a tokenizer, lexer, or scanner.
In Scheme, we tokenize (set! x (+ x 1)) as
( set! x ( + x 1 ) )
Similarly, in Java, we tokenize
System.out.println("Hello World!"); as
System . out . println ( "Hello World!" ) ;
Parsing Process
Call the scanner to get tokens
Build a parse tree from the stream of tokens
    (a parse tree shows the syntactic structure of the source program)
Add information about identifiers to the symbol table
Report errors when they are found, and recover from them
Parsing
Parsing is a process that constructs a syntactic structure (i.e., a parse tree) from the stream of tokens.
We have already learned how to describe the syntactic structure of a language using a (context-free) grammar.
So, is this all a parser needs to do?

Stream of tokens + context-free grammar -> Parser -> Parse tree
Sentinels

Buffer pair holding E = M * C ** 2, with an eof sentinel at the end of each half and after the last input character:
E = M * eof C * * 2 eof eof

switch (*forward++) {
case eof:
    if (forward is at end of first buffer) {
        reload second buffer;
        forward = beginning of second buffer;
    }
    else if (forward is at end of second buffer) {
        reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
cases for the other characters;
}
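
The pseudocode above can be made concrete roughly as follows. This is a sketch under several assumptions that are not in the slides: the input comes from an in-memory string instead of a file, the NUL character stands in for eof and never occurs in the source text, and the names Scanner, reload, scanner_init, and next_char are hypothetical. The buffer halves are kept tiny so that reloads actually happen.

```c
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 4        /* tiny on purpose, so both halves get reloaded */
#define SENTINEL '\0'     /* assumption: NUL never occurs in the source text */

typedef struct {
    const char *src;              /* unread input (stands in for a file) */
    char buf[2][BUF_SIZE + 1];    /* two halves, each with one extra sentinel slot */
    int half;                     /* which half forward currently points into */
    char *forward;                /* next character to read */
} Scanner;

/* Copies up to BUF_SIZE characters into one half and ends it with a sentinel. */
static void reload(Scanner *s, int half) {
    size_t n = strlen(s->src);
    if (n > BUF_SIZE) n = BUF_SIZE;
    memcpy(s->buf[half], s->src, n);
    s->buf[half][n] = SENTINEL;   /* at the end of the half, or earlier if input ran out */
    s->src += n;
}

static void scanner_init(Scanner *s, const char *text) {
    s->src = text;
    s->half = 0;
    reload(s, 0);
    s->forward = s->buf[0];
}

/* Returns the next input character, or EOF at the end of the input. */
static int next_char(Scanner *s) {
    for (;;) {
        char c = *s->forward++;
        if (c != SENTINEL)
            return (unsigned char)c;          /* common case: a single test */
        if (s->forward == s->buf[s->half] + BUF_SIZE + 1) {
            /* sentinel at the end of this half: switch to the other half */
            s->half = 1 - s->half;
            reload(s, s->half);
            s->forward = s->buf[s->half];
        } else {
            /* sentinel inside a half marks the end of input */
            s->forward--;                     /* stay put so later calls also return EOF */
            return EOF;
        }
    }
}
```

The point of the sentinel is visible in next_char: for an ordinary character only one comparison is made, and the end-of-buffer check runs only after a sentinel has been read.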
Transition diagrams

Transition diagram for relop
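
The diagram for this slide is not reproduced in the text, so the following C sketch shows one way such a diagram could be coded, assuming the operator set from the comparison token in the Example table (<, >, <=, >=, ==, !=). The function name scan_relop and the R_* codes are hypothetical.

```c
/* Hypothetical codes for the six comparison operators. */
enum { R_NONE = 0, R_LT, R_LE, R_GT, R_GE, R_EQ, R_NE };

/* Recognizes a comparison operator at the start of s.
   Stores the lexeme length in *len and returns its code, or R_NONE with *len == 0. */
static int scan_relop(const char *s, int *len) {
    *len = 0;
    switch (s[0]) {                         /* start state: branch on the first character */
    case '<':
        if (s[1] == '=') { *len = 2; return R_LE; }
        *len = 1; return R_LT;              /* "other" edge: the lookahead is not consumed */
    case '>':
        if (s[1] == '=') { *len = 2; return R_GE; }
        *len = 1; return R_GT;
    case '=':
        if (s[1] == '=') { *len = 2; return R_EQ; }
        return R_NONE;                      /* a lone '=' is not a comparison */
    case '!':
        if (s[1] == '=') { *len = 2; return R_NE; }
        return R_NONE;                      /* a lone '!' is not a comparison */
    default:
        return R_NONE;
    }
}
```

Each case corresponds to a state reached after the first character, and returning without consuming s[1] plays the role of the retract step on an "other" edge. Applied to >= it returns one token of length 2, while on > = it returns only the single-character >, which matches the whitespace rule stated earlier.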

Transition diagrams (cont.)

Transition diagram for reserved words and identifiers
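
A common way to handle reserved words and identifiers with a single diagram is to scan letter (letter | digit)* and then look the lexeme up in a keyword table. The C sketch below assumes that approach; the table contents, the token codes, and the name scan_word are illustrative and not taken from the slides.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes. */
enum { T_ID = 1, T_IF = 2, T_ELSE = 3, T_WHILE = 4 };

/* A small keyword table; each keyword maps to its own token. */
static const struct { const char *word; int token; } keywords[] = {
    { "if", T_IF }, { "else", T_ELSE }, { "while", T_WHILE },
};

/* Scans letter (letter | digit)* at the start of s and stores its length in *len.
   Returns the keyword's token, T_ID for any other word, or 0 if s does not
   start with a letter. */
static int scan_word(const char *s, size_t *len) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) { *len = 0; return 0; }
    while (isalnum((unsigned char)s[n])) n++;   /* stop at the first other character */
    *len = n;
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strlen(keywords[i].word) == n && strncmp(keywords[i].word, s, n) == 0)
            return keywords[i].token;           /* exact, case-sensitive match */
    }
    return T_ID;
}
```

Because the whole word is scanned before the lookup, iffy is reported as an identifier rather than as the keyword if followed by fy.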
Transition diagrams (cont.)

Transition diagram for unsigned numbers

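Recognition

state = 0;                                  /* start state */
while ( (c = next_char()) != EOF ) {
    switch (state) {
    case 0: if ( c == 'a' ) state = 1;      /* any other character leaves the state unchanged */
            break;
    case 1: if ( c == 'b' ) state = 2;
            break;
    case 2: if ( c == 'c' ) state = 3;
            break;
    case 3: if ( c == 'a' ) state = 1;      /* accepting state; another a starts a new round */
            else { ungetchar(); return (TRUE); }   /* push back the lookahead and accept */
            break;
    default:
            error();
    }
}
return (state == 3);                        /* at end of input, accept only in state 3 */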
Finite Automata for the Lexical Tokens

[Figure: finite automata for the tokens IF, ID, NUM, REAL, white space (and comments starting with --), and error. (Appel, p. 21)]

LEXICAL ANALYSIS

Lexical Errors
Possible recovery actions:
Deleting an extraneous character
Inserting a missing character
Replacing an incorrect character by a correct character
Transposing two adjacent characters (such as fi => if)
Pre-scanning
Tokens / Patterns / Regular Expressions
Lexical analysis searches for matches of a lexeme to a pattern.
The lexical analyzer returns: <actual lexeme, symbolic identifier of token>

For example:
Token          Symbolic ID
if             1
then           2
else           3
>, >=, <, ...  4
:=             5
id             6
int            7
real           8

The set of all regular expressions plus symbolic IDs plus the analyzer defines the required functionality.

REs --(algs)--> NFA --(algs)--> DFA (program for simulation)

