
Lexical Analysis (Scanner)

Uploaded by

Hadeer Anwar


Lexical Analysis (scanner)

Source
Program Tokens
Lexical
(Character analyzer
Stream)

▪ Lexical analysis is also known as scanning, and the lexical analyzer is also called a lexical scanner (or simply a scanner).
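Figure: the lexical analyzer reads the source program and passes each token and its attributes to the syntax analyzer, which asks for the next token ("get next token") whenever it needs one; both analyzers use the symbol table.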
The purpose of lexical analyzers:

▪ Lexical analysis takes a stream of input characters and decodes them into higher-level tokens that a syntax analyzer (parser) can understand.
▪ The lexical analyzer breaks the source text into a series of tokens, removing any whitespace and comments in the source code.
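As a rough illustration of this grouping step, here is a minimal Java sketch. It assumes a toy language whose only lexemes are identifiers (a letter followed by letters or digits), unsigned integers, and single-character operators or delimiters; comments are left out of the sketch, and the names `LexemeSplitter` and `lexemes` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LexemeSplitter {
    // Groups a character stream into lexemes, discarding whitespace.
    // Assumed toy language: identifiers, unsigned integers, and
    // single-character operators/delimiters only (no comments).
    static List<String> lexemes(String source) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {          // whitespace only separates lexemes
                i++;
            } else if (Character.isLetter(c)) {       // identifier: letter (letter | digit)*
                int start = i;
                while (i < source.length() && Character.isLetterOrDigit(source.charAt(i))) i++;
                out.add(source.substring(start, i));
            } else if (Character.isDigit(c)) {        // integer: digit+
                int start = i;
                while (i < source.length() && Character.isDigit(source.charAt(i))) i++;
                out.add(source.substring(start, i));
            } else {                                  // any other character is a one-character lexeme
                out.add(String.valueOf(c));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lexemes("position = initial + rate * 60"));
        // [position, =, initial, +, rate, *, 60]
    }
}
```

Each lexeme is produced by reading characters until the current class of character ends, which is why no separator is needed between `count`, `=` and `1` in `count=1`.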
Figure: a fragment of a source program containing the statements count = 1 and position = initial + rate * 60
• The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes.
• For each lexeme, the lexical analyzer produces as output a token of the form (token-attribute pair):
(Token-type, attribute-value)

A token is treated as a pair made up of two parts: (Token-type, attribute-value).
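The token-attribute pair can be modelled as a small class. This is only a sketch: the class name `TokenPair` and the use of plain strings for both components are assumptions made for illustration, not a fixed representation.

```java
class TokenPair {
    final String type;       // token type, e.g. "id", "assign", "int"
    final String attribute;  // attribute value, e.g. a symbol-table index or a literal value

    TokenPair(String type, String attribute) {
        this.type = type;
        this.attribute = attribute;
    }

    @Override
    public String toString() {
        return "(" + type + "," + attribute + ")";
    }

    public static void main(String[] args) {
        // The token-attribute pairs for the statement  count = 1
        TokenPair[] tokens = {
            new TokenPair("id", "1"),      // "count" is entry 1 in the symbol table
            new TokenPair("assign", "="),
            new TokenPair("int", "1")
        };
        StringBuilder sb = new StringBuilder();
        for (TokenPair t : tokens) sb.append(t).append(' ');
        System.out.println(sb.toString().trim());   // (id,1) (assign,=) (int,1)
    }
}
```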
TOKENS, PATTERNS, AND LEXEMES:
In the English language, the words of a sentence such as "I learn compiler design" fall into categories:
noun, verb, adjective, …
In a programming language, lexemes fall into categories such as:
Identifier, Integer, Keyword, Whitespace, …
Example: The program statement

count = 1

is passed through the lexical analyzer, which produces the sequence of token-attribute pairs:

(id,1) (assign,=) (int,1)
Example: The program statement
position = initial + rate * 60

o position is a lexeme that would be mapped into a token (id, 1), where id is an abstract symbol standing for identifier and 1 points to the symbol-table entry for position.
o The assignment symbol = is a lexeme that is mapped into the token (=). Since this token needs no attribute value, the second component is omitted.
o initial is a lexeme that is mapped into the token (id, 2)
o + is a lexeme that is mapped into the token (+)
o rate is a lexeme that is mapped into the token (id, 3)
o * is a lexeme that is mapped into the token (*)
o 60 is a lexeme that is mapped into the token (60)
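The mapping above can be sketched in Java as follows. This is a hypothetical sketch (`PositionExample`, `install` and `tokens` are illustrative names): identifiers are entered into a symbol table in order of first appearance and replaced by their entry number, while operators and numbers are emitted in the same notation as the list above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PositionExample {
    // Symbol table: identifier -> 1-based entry number, in order of first appearance.
    static final Map<String, Integer> symbols = new LinkedHashMap<>();

    // Returns the entry number of a name, adding it to the table if it is new.
    static int install(String name) {
        Integer index = symbols.get(name);
        if (index == null) {
            index = symbols.size() + 1;
            symbols.put(name, index);
        }
        return index;
    }

    // Emits (id, n) for identifiers, (op) for operators, (value) for numbers.
    static List<String> tokens(String source) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < source.length() && Character.isLetterOrDigit(source.charAt(i))) i++;
                out.add("(id, " + install(source.substring(start, i)) + ")");
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < source.length() && Character.isDigit(source.charAt(i))) i++;
                out.add("(" + source.substring(start, i) + ")");
            } else {
                out.add("(" + c + ")");
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", tokens("position = initial + rate * 60")));
        // (id, 1) (=) (id, 2) (+) (id, 3) (*) (60)
        System.out.println(symbols);   // {position=1, initial=2, rate=3}
    }
}
```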
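Example: The program statement

y := 31 + 28 * x

is passed through the lexical analyzer, which hands the parser the token stream

<id, “y”> <assign, > <num, 31> <+, > <num, 28> <*, > <id, “x”>

In each pair, the first component is the token and the second is the token value (the token attribute); operators such as := and + carry no attribute value.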
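The scanner must recognise the two characters `:=` as one assignment token rather than a colon followed by an equals sign. One common way is one character of lookahead, sketched below under stated assumptions: identifiers are letters only, numbers are unsigned integers, `:=` is the only two-character operator, and the name `AssignLookahead` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class AssignLookahead {
    // Produces tokens in the <name, attribute> notation used above.
    static List<String> scan(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetter(c)) {                 // identifier (letters only, by assumption)
                int start = i;
                while (i < s.length() && Character.isLetter(s.charAt(i))) i++;
                out.add("<id, \"" + s.substring(start, i) + "\">");
            } else if (Character.isDigit(c)) {                  // unsigned integer
                int start = i;
                while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                out.add("<num, " + s.substring(start, i) + ">");
            } else if (c == ':' && i + 1 < s.length() && s.charAt(i + 1) == '=') {
                out.add("<assign, >");                          // lookahead joins ':' and '=' into one token
                i += 2;
            } else {                                            // single-character operator, no attribute
                out.add("<" + c + ", >");
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", scan("y := 31 + 28 * x")));
        // <id, "y"> <assign, > <num, 31> <+, > <num, 28> <*, > <id, "x">
    }
}
```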

In this phase, the compiler also builds a table of all the identifiers in the source program, called the symbol table.
Symbol Table Manager:

The symbol table is an important data structure created and maintained by the compiler to store every identifier used in the source program, together with information about the occurrence of entities such as:
variable names,
function names,
objects,
classes,
interfaces, etc.
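A symbol table is often built as a list of entries plus a hash map from names to entry numbers. The minimal sketch below assumes each entry records only a name, a kind (variable, function, class, ...) and a type; the class and method names (`SymbolTable`, `insert`, `lookup`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SymbolTable {
    // One row of the table. The fields chosen here are an illustrative assumption;
    // real compilers record much more (scope, storage address, line numbers, ...).
    static final class Entry {
        final String name, kind, type;
        Entry(String name, String kind, String type) {
            this.name = name; this.kind = kind; this.type = type;
        }
    }

    private final List<Entry> entries = new ArrayList<>();        // rows, numbered from 1
    private final Map<String, Integer> index = new HashMap<>();   // name -> row number

    // Inserts a name if it is new and returns its row number either way.
    int insert(String name, String kind, String type) {
        Integer row = index.get(name);
        if (row != null) return row;
        entries.add(new Entry(name, kind, type));
        index.put(name, entries.size());
        return entries.size();
    }

    // Returns the entry for a name, or null if the name was never inserted.
    Entry lookup(String name) {
        Integer row = index.get(name);
        return row == null ? null : entries.get(row - 1);
    }

    public static void main(String[] args) {
        SymbolTable table = new SymbolTable();
        table.insert("position", "variable", "float");
        table.insert("main", "function", "int");
        System.out.println(table.insert("position", "variable", "float")); // 1 (already present)
        System.out.println(table.lookup("main").kind);                      // function
    }
}
```

The hash map gives fast lookup by name, while the list keeps the stable entry numbers that tokens such as (id, 1) refer to.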
Example: The symbol table for C++ code (figure not reproduced).
Lexical Tokens

Functions of the lexical analyzer:

▪ Read the source code (a stream of characters)
▪ Return a stream of tokens, e.g.:

1) Keywords: while, if, for, …

2) Identifiers: names declared by the programmer

3) Operators: +, -, *, /, ==, =, …

4) Numeric constants: numbers such as 12, 35.5, .9E-23, etc.

5) Character constants: single characters or strings of characters enclosed in quotes.
6) Special characters: characters used as delimiters, such as . ( ) , ; :
7) Comments: recognized but discarded; they are not passed on to subsequent phases.
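Classifying a lexeme that has already been isolated into one of these classes can be sketched with a keyword set and a few regular expressions. Assumptions: the keyword and operator sets are small illustrative subsets of C, identifiers follow the C rule (a letter or underscore, then letters, digits or underscores), and the names `TokenClassifier` and `classify` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class TokenClassifier {
    enum TokenClass { KEYWORD, IDENTIFIER, OPERATOR, NUMERIC_CONSTANT,
                      CHARACTER_CONSTANT, SPECIAL_CHARACTER, UNKNOWN }

    // Assumed subsets; a real scanner would list the language's full sets.
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "while", "if", "else", "for", "int", "return"));
    static final Set<String> OPERATORS = new HashSet<>(Arrays.asList(
            "+", "-", "*", "/", "=", "==", "!="));
    static final String SPECIALS = ".(),;:{}";

    // Classifies one already-isolated lexeme (comments never reach this point).
    static TokenClass classify(String lexeme) {
        if (KEYWORDS.contains(lexeme)) return TokenClass.KEYWORD;      // checked before identifiers
        if (lexeme.matches("[A-Za-z_][A-Za-z0-9_]*")) return TokenClass.IDENTIFIER;
        if (OPERATORS.contains(lexeme)) return TokenClass.OPERATOR;
        // digits with optional fraction, or .digits; then an optional exponent
        if (lexeme.matches("(\\d+(\\.\\d*)?|\\.\\d+)([eE][+-]?\\d+)?")) return TokenClass.NUMERIC_CONSTANT;
        // 'x' or "..." with backslash escapes
        if (lexeme.matches("'(\\\\.|[^'\\\\])*'|\"(\\\\.|[^\"\\\\])*\"")) return TokenClass.CHARACTER_CONSTANT;
        if (lexeme.length() == 1 && SPECIALS.indexOf(lexeme.charAt(0)) >= 0) return TokenClass.SPECIAL_CHARACTER;
        return TokenClass.UNKNOWN;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"while", "count", "==", ".9E-23", "'a'", ";"})
            System.out.println(s + " -> " + classify(s));
    }
}
```

Checking the keyword set before the identifier pattern is what makes `while` a keyword rather than an identifier, even though it also fits the identifier pattern.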
Examples of words are:

(1) keywords: while, if, else, for, ...

• These are words that have a particular predefined meaning to the compiler.

• Reserved words are keywords that are not available to the programmer for use as identifiers.

• In most programming languages, such as Java and C, all keywords are reserved.
(2) identifiers
• words that the programmer constructs to attach a name to a construct. Identifiers may be used to name variables, classes, constants, functions, etc.

(3) operators
• symbols used for arithmetic, character, or logical operations, such as +, -, =, !=, etc.
(4) numeric constants
• numbers such as 124, 12.35, 0.09E-23, etc.
• Numeric constants may be stored in a table.

(5) character constants

• single characters or strings of characters enclosed in quotes.

(6) special characters

• characters used as delimiters, such as . ( ) { } and ;. These are generally single-character words.
(7) comments
• Though comments must be detected in the lexical analysis phase, they are not put out as tokens to the next phase of compilation.

(8) white space

• Spaces and tabs are generally ignored by the compiler, except to serve as delimiters in most languages, and are not put out as tokens.
(9) new line
• In languages with free format, newline characters should also be ignored.
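Rules (7) to (9) can be sketched as a pre-pass that discards comments and reduces whitespace to single separators. A real scanner usually skips these inline while it scans; the separate pass, the C-style comment syntax (`//` to end of line and non-nested `/* */`), and the name `CommentSkipper` are assumptions made for this example. Each comment is replaced by a separator, so `a/**/b` still yields two lexemes.

```java
class CommentSkipper {
    // Removes C-style comments and collapses every run of whitespace
    // (spaces, tabs, newlines) and comments into a single space.
    // Assumption: string literals do not contain comment-like text.
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        boolean pendingSpace = false;                      // a separator was seen since the last kept char
        while (i < src.length()) {
            char c = src.charAt(i);
            if (src.startsWith("//", i)) {                 // line comment: skip to end of line
                int end = src.indexOf('\n', i);
                i = (end < 0) ? src.length() : end;
                pendingSpace = true;
            } else if (src.startsWith("/*", i)) {          // block comment: skip past the closing */
                int end = src.indexOf("*/", i + 2);
                if (end < 0) throw new IllegalArgumentException("unterminated comment");
                i = end + 2;
                pendingSpace = true;
            } else if (Character.isWhitespace(c)) {        // blank, tab, newline: only a separator
                i++;
                pendingSpace = true;
            } else {
                if (pendingSpace && out.length() > 0) out.append(' ');
                pendingSpace = false;
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "int x = 1; // counter\n/* block\n comment */ x = x + 1;";
        System.out.println(strip(src));   // int x = 1; x = x + 1;
    }
}
```

A single `/`, as in `sum/count`, is left alone because it is not followed by `/` or `*`.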
An example of source input:

• In the C language, the variable declaration line

int value = 100;

contains the tokens:

int (keyword)
value (identifier)
= (assignment operator)
100 (integer constant)
; (special symbol: end of statement)
An example of source input:

average = (sum/count)

average     identifier
=           assignment operator
(           open parenthesis
sum         identifier
/           division operator
count       identifier
)           close parenthesis
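Both of the last two examples can be reproduced with a small regular-expression-driven lexer. This is one possible sketch, not the only way to build a scanner: it uses `java.util.regex` named groups, tries the alternatives in order (so keywords are checked before identifiers), and assumes a small C-like keyword subset; `RegexLexer` and `lex` are hypothetical names, and the category labels are shortened versions of those above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexLexer {
    // One named alternative per token class; the first alternative that matches wins.
    // \b stops the keyword "int" from matching the start of an identifier like "integer".
    static final Pattern TOKEN = Pattern.compile(
          "(?<ws>\\s+)"
        + "|(?<keyword>\\b(?:int|float|if|else|while|for|return)\\b)"
        + "|(?<identifier>[A-Za-z_][A-Za-z0-9_]*)"
        + "|(?<integer>\\d+)"
        + "|(?<operator>==|!=|[=+\\-*/])"
        + "|(?<special>[();{},])");

    static final String[] CLASSES = {"keyword", "identifier", "integer", "operator", "special"};

    // Returns "lexeme (class)" strings; throws on a character no pattern accepts.
    static List<String> lex(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            m.region(pos, src.length());
            if (!m.lookingAt())                              // must match exactly at pos
                throw new IllegalArgumentException("unexpected character '" + src.charAt(pos) + "' at " + pos);
            pos = m.end();
            if (m.group("ws") != null) continue;             // whitespace separates tokens but is not one
            for (String cls : CLASSES) {
                if (m.group(cls) != null) {
                    out.add(m.group() + " (" + cls + ")");
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("int value = 100;"));
        System.out.println(lex("average = (sum/count)"));
    }
}
```

For `int value = 100;` this yields `int (keyword)`, `value (identifier)`, `= (operator)`, `100 (integer)`, `; (special)`, matching the classification given for that line.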
An example of source input: count the number of tokens.

How many tokens are there in this C code?
printf ("i = %d, &I = %x", i, &i) ;
Output: 10 tokens
printf
(
"i = %d, &I = %x"
,
i
,
&
i
)
;
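The count of 10 relies on treating the whole string literal as a single token, even though it contains spaces, commas and `&`. A minimal sketch of that rule, assuming backslash escapes inside the literal and treating every other punctuation character as a one-character token (enough for this example, not for two-character operators such as `==`); `TokenCounter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class TokenCounter {
    // Splits C-like source into tokens; a double-quoted string literal is ONE token.
    static List<String> tokens(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) { i++; continue; }   // separators only
            if (c == '"') {                                     // string literal
                i++;
                while (i < src.length() && src.charAt(i) != '"') {
                    if (src.charAt(i) == '\\') i++;             // skip the escaped character
                    i++;
                }
                if (i >= src.length()) throw new IllegalArgumentException("unterminated string");
                i++;                                            // include the closing quote
            } else if (Character.isLetter(c) || c == '_') {    // identifier or keyword
                while (i < src.length() && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) i++;
            } else if (Character.isDigit(c)) {                 // integer constant
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
            } else {                                            // one-character operator or delimiter
                i++;
            }
            out.add(src.substring(start, i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> t = tokens("printf (\"i = %d, &I = %x\", i, &i) ;");
        System.out.println(t.size());   // 10
        for (String tok : t) System.out.println(tok);
    }
}
```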
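Counting the number of tokens:

package tokencount;
import java.util.*;
public class tokeniz {
    public static void main(String[] args) {
        // By default StringTokenizer splits on space, tab, newline, carriage return and form feed
        StringTokenizer st = new StringTokenizer("Aslamu alikum all");
        // countTokens() reports how many tokens remain: prints "Total tokens : 3"
        System.out.println("Total tokens : " + st.countTokens());
    }
}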
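Extracting tokens:

import java.util.StringTokenizer;
public class ExtractTokens {
    public static void main(String[] args) {
        StringTokenizer st = new StringTokenizer("this is a test");
        while (st.hasMoreTokens()) {   // loop until every token has been returned
            System.out.println(st.nextToken());
        }
    }
}

output:
this
is
a
test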
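StringTokenizer is a legacy class whose use is discouraged in new code; the Java documentation recommends the split method of String or the java.util.regex package instead. The following example illustrates how the String.split method can be used to break up a string into its basic tokens: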

String[] result = "this is a test".split("\\s");

for (int x=0; x<result.length; x++)
System.out.println(result[x]);

output:
this
is
a
test
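A detail worth knowing about split("\\s"): the pattern matches exactly one whitespace character, so two adjacent spaces produce an empty token, whereas "\\s+" and StringTokenizer treat a whole run of whitespace as one separator. The short program below shows the difference; the class and method names (`SplitComparison`, `splitEachSpace`, `splitSpaceRuns`) are hypothetical.

```java
import java.util.Arrays;
import java.util.StringTokenizer;

class SplitComparison {
    // "\\s" matches exactly one whitespace character.
    static String[] splitEachSpace(String text) {
        return text.split("\\s");
    }

    // "\\s+" treats a whole run of whitespace as one separator.
    static String[] splitSpaceRuns(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        String text = "this  is a test";   // two spaces after "this"
        System.out.println(Arrays.toString(splitEachSpace(text)));    // [this, , is, a, test]
        System.out.println(Arrays.toString(splitSpaceRuns(text)));    // [this, is, a, test]
        // StringTokenizer never returns empty tokens.
        System.out.println(new StringTokenizer(text).countTokens());  // 4
    }
}
```

For scanner-style input, where spacing is irregular, "\\s+" is usually the safer choice.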
