2 Lexical Analyzer

Uploaded by

Salam Abdulla

University of Sulaimani
College of Science
Department of Computer Science

Compiler
First Phase of Compiler
Lexical Analyzer

2023-2024
Mzhda Hiwa Hama
Lexical Analyzer
• The lexical analyzer takes a source program as input and
produces a stream of tokens as output. It recognizes particular
instances of tokens. Typically:

- Each keyword is a token.
- Each identifier is a token.
- Each constant is a token.
- Each sign or operator is a token.
Terms

Token: A token is a group of characters with a collective meaning,
typically a word or punctuation mark, that the lexical analyzer
separates out and passes to the parser.

Lexeme: The actual character sequence that forms a token. The
token is the general class that the lexeme belongs to.

Pattern: A rule that describes the set of strings associated with a
token. It is expressed as a regular expression and describes how a
particular token can be formed.

For example, a pattern for identifiers: [A-Za-z][A-Za-z_0-9]*

Example
int sum = 3 + 2 ;

Lexeme    Token
int       Keyword
sum       Identifier
=         Assignment operator
3         Number
+         Addition operator
2         Number
;         Punctuation symbol
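The table above can be reproduced with a small pattern-driven scanner. This is only a sketch: the names TOKEN_SPEC and classify are made up for illustration, and the pattern list covers just this one statement, not a full language.

```python
import re

# Hypothetical token classes for this one statement; a real lexer needs many more.
# Order matters: the keyword pattern is tried before the identifier pattern.
TOKEN_SPEC = [
    ("Keyword", r"\bint\b"),
    ("Identifier", r"[A-Za-z][A-Za-z_0-9]*"),
    ("Number", r"[0-9]+"),
    ("Assignment operator", r"="),
    ("Addition operator", r"\+"),
    ("Punctuation symbol", r";"),
    ("Whitespace", r"\s+"),
]

def classify(source):
    """Return (lexeme, token class) pairs, skipping whitespace."""
    pairs = []
    pos = 0
    while pos < len(source):
        for name, pattern in TOKEN_SPEC:
            match = re.compile(pattern).match(source, pos)
            if match:
                if name != "Whitespace":
                    pairs.append((match.group(), name))
                pos = match.end()
                break
        else:
            # No pattern matched at this position.
            raise ValueError(f"illegal character {source[pos]!r} at position {pos}")
    return pairs

pairs = classify("int sum = 3 + 2 ;")
for lexeme, token in pairs:
    print(f"{lexeme:<5} {token}")
```

Because the keyword pattern ends with a word boundary, a name such as integer is still classified as an identifier rather than as the keyword int followed by more letters.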

Token Classes

1. Identifier: names of variables, constants, methods, classes,
parameters, and other user-defined entities. Identifiers in Java are
composed of letters, digits, the underscore _ and the dollar
sign $. An identifier may begin only with a letter, an underscore, or
a dollar sign.

2. Keyword: any word that has a predefined meaning in the
language. Programmers cannot use keywords as names
for variables, methods, classes, or any other identifiers.

3. Operator: symbols for mathematical and logical operations, e.g. +, -, =, >, *,
&&, ||.

4. Separator: punctuation that groups or delimits code, e.g. ( ) { } [ ] ; , .

5. Literal: Boolean, integer, floating-point, string, character, and
null values written directly in the source, e.g. true, false, 2, 6.02e23, "music", null.
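The identifier rule stated above can be written as a single pattern. The sketch below is an ASCII-only approximation (real Java also accepts Unicode letters), and the names JAVA_IDENTIFIER and looks_like_identifier are illustrative, not taken from any library. It does not check for keywords.

```python
import re

# ASCII-only approximation of the rule above: first character is a letter,
# underscore or dollar sign; the rest may also contain digits.
JAVA_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

def looks_like_identifier(name):
    """True if the whole string fits the identifier pattern."""
    return JAVA_IDENTIFIER.fullmatch(name) is not None

for candidate in ["total", "_count", "$price", "2fast", "my-var"]:
    print(candidate, looks_like_identifier(candidate))
```

The last two candidates are rejected: 2fast starts with a digit, and my-var contains a character outside the allowed set.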
Functions of Lexical Analyzer
1. It produces a stream of tokens.

2. It eliminates comments (single-line and multi-line) and
whitespace (blank, newline, tab, etc.).

3. It builds the symbol table, which stores information about the
identifiers and constants seen in the input.

4. It keeps track of line numbers. The lexical analyzer may count
the newline characters seen so that it can
associate a line number with each error message.

5. It reports the errors encountered while generating the tokens.
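Functions 2 and 4 can be sketched together: skip comments and whitespace while counting newlines, so that every surviving character still knows its line number. The function name strip_comments is hypothetical, and the sketch assumes C/Java-style // and /* */ comments and ignores string literals.

```python
def strip_comments(source):
    """Yield (line_number, char) for every non-blank character outside comments.

    Assumes // and /* */ comments only. String literals are not handled,
    so a "//" inside a string would be misread as a comment.
    """
    line = 1
    i = 0
    while i < len(source):
        if source.startswith("//", i):
            # Single-line comment: skip up to the newline, which is counted below.
            while i < len(source) and source[i] != "\n":
                i += 1
        elif source.startswith("/*", i):
            end = source.find("*/", i + 2)
            if end == -1:
                raise ValueError(f"line {line}: unterminated comment")
            # Newlines inside the comment still advance the line counter.
            line += source.count("\n", i, end)
            i = end + 2
        else:
            if source[i] == "\n":
                line += 1
            elif not source[i].isspace():
                yield line, source[i]
            i += 1

sample = "a // first\n/* two\nlines */ b\nc"
kept = list(strip_comments(sample))
print(kept)
```

Counting the newlines inside a multi-line comment is what keeps later error messages pointing at the right line.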

Error handling in Lexical Analyzer

• The scanner is tasked with determining whether the input stream
can be divided into valid symbols of the source language, but it
has no knowledge of which token should come where.

• Few errors can be detected at the lexical level alone because
the scanner has a very localized view of the source program,
without any context.

• The scanner can report characters that do not belong to any valid
token (e.g., an illegal or unrecognized symbol) and a few other
malformed entities (illegal characters within a string constant,
unterminated comments, etc.).
Error handling
Example 1: printf("Compiler");$
This is a lexical error, since the illegal
character $ appears at the end of the statement.

Example 2: This is a comment */
This is a lexical error, since the end of the comment is
present but its beginning is not.

The lexical analyzer does not look for or detect garbled sequences, tokens out of
place, undeclared identifiers, misspelled keywords, etc.
Later phases (syntax and semantic analysis) catch these errors.
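The two examples can be checked mechanically. The sketch below flags a character outside an assumed legal alphabet and a comment terminator with no opener. The set LEGAL and the function lexical_errors are assumptions made for this illustration; escape sequences inside strings are not handled.

```python
import string

# Assumed alphabet for the demo: letters, digits, whitespace and common C punctuation.
# "$" is left out on purpose so that it counts as illegal, as in Example 1.
LEGAL = set(string.ascii_letters + string.digits + string.whitespace
            + "\"'()[]{};,.+-*/=<>!&|%#_\\:")

def lexical_errors(source):
    """Return (line, message) pairs for illegal characters and stray '*/'."""
    errors = []
    line = 1
    in_comment = False
    in_string = False
    i = 0
    while i < len(source):
        ch = source[i]
        two = source[i:i + 2]
        if ch == "\n":
            line += 1
        elif in_comment:
            if two == "*/":
                in_comment = False
                i += 1
        elif in_string:
            if ch == '"':
                in_string = False
        elif two == "/*":
            in_comment = True
            i += 1
        elif two == "*/":
            errors.append((line, "end of comment without a beginning"))
            i += 1
        elif ch == '"':
            in_string = True
        elif ch not in LEGAL:
            errors.append((line, f"illegal character {ch!r}"))
        i += 1
    return errors

print(lexical_errors('printf("Compiler");$'))
print(lexical_errors("This is a comment */"))
```

Note that the scanner only looks at characters. It has no way to tell that printf is misspelled or that a variable is undeclared.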
How symbol table is used by Lexical Analyzer

When the lexical analyzer detects an identifier in the source program,
it inserts the identifier into the symbol table, which holds
information about the identifier, such as its name and type.

A symbol table in the context of a lexical analyzer might contain:

1. Identifiers (variable names, function names, etc.).
2. Associated information such as data type, scope, and memory
location.
3. Additional attributes required for the semantic analysis or code
generation phases of compilation.
Symbol table
The remaining phases also insert information about
identifiers into the symbol table and then use this
information in various ways.

• For example, when doing semantic analysis we
need to know what the types of identifiers are.
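A minimal symbol table might look like the sketch below: the lexical analyzer inserts names, and a later phase fills in attributes. The class SymbolTable, its method names, and the type "float" are all hypothetical choices for illustration.

```python
class SymbolTable:
    """Minimal symbol table: one entry per distinct identifier, numbered from 1."""

    def __init__(self):
        self._index = {}    # name -> entry number
        self.entries = []   # position k - 1 holds the attributes of entry k

    def insert(self, name):
        """Return the entry number for name, creating the entry on first sight."""
        if name not in self._index:
            self.entries.append({"name": name, "type": None})
            self._index[name] = len(self.entries)
        return self._index[name]

    def set_type(self, name, type_name):
        """A later phase (e.g. semantic analysis) records the identifier's type."""
        self.entries[self._index[name] - 1]["type"] = type_name

    def lookup(self, name):
        """Return the attribute dictionary stored for name."""
        return self.entries[self._index[name] - 1]

table = SymbolTable()
first = table.insert("position")
second = table.insert("initial")
again = table.insert("position")     # already present: same entry number
table.set_type("position", "float")  # assumed type, for the demo only
print(first, second, again, table.lookup("position"))
```

Inserting the same name twice returns the same entry number, so every occurrence of an identifier refers to one shared entry.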
Reading Ahead

A lexical analyzer may need to read ahead some characters
before it can decide on the token to be returned to the
parser. For example, a lexical analyzer for C or Java must
read ahead after it sees the character >. If the next
character is =, then > is part of the character sequence >=,
the lexeme for the token for the "greater than or equal to"
operator. Otherwise > itself forms the "greater than"
operator, and the lexical analyzer has read one character
too many.
Reading Ahead

A general approach to reading ahead on the input is to
maintain an input buffer from which the lexical analyzer
can read and push back characters. Input buffers can be
justified on efficiency grounds alone, since fetching a block
of characters is usually more efficient than fetching one
character at a time. A pointer keeps track of the portion of
the input that has been analyzed; pushing back a character
is implemented by moving back the pointer.
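The pointer-and-push-back idea can be sketched as follows for the > and >= case. For simplicity this sketch holds the whole input in one string instead of fetching it block by block; InputBuffer, scan_relop, and the token names GT and GE are illustrative assumptions.

```python
class InputBuffer:
    """Holds the input text and a pointer to the next unread character."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_char(self):
        """Return the next character ('' at end of input) and advance the pointer."""
        ch = self.text[self.pos:self.pos + 1]
        self.pos += 1
        return ch

    def push_back(self):
        """Undo the last next_char() by moving the pointer back one place."""
        self.pos -= 1

def scan_relop(buf):
    """Scan '>' or '>=' starting at the buffer pointer; return a token name or None."""
    if buf.next_char() != ">":
        buf.push_back()
        return None
    if buf.next_char() == "=":
        return "GE"      # lexeme ">="
    buf.push_back()      # one character too many was read
    return "GT"          # lexeme ">"

ge_buf = InputBuffer(">=1")
gt_buf = InputBuffer(">1")
print(scan_relop(ge_buf), ge_buf.pos)
print(scan_relop(gt_buf), gt_buf.pos)
```

After scanning >, the pointer sits on the 1 again, so the next token starts from the character that was read ahead.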
Recognizing Keywords and Identifiers

Recognizing keywords and identifiers presents a problem.
Usually, keywords like if or then are reserved, so they are not
identifiers even though they look like identifiers.

There are two ways to handle reserved words that
look like identifiers:

1. Install the reserved words in the symbol table initially. A field
of the symbol-table entry indicates that these strings are
never ordinary identifiers and tells which token they
represent. Any identifier not already in the symbol table
during lexical analysis cannot be a reserved word, so its
token is identifier.
2. Create a separate transition diagram for each keyword. Such a
transition diagram consists of states representing the situation
after each successive letter of the keyword is seen, followed by a
test for a "nonletter-or-digit".
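The first approach can be sketched with a dictionary standing in for the symbol table. The helper names install and token_for, the token names, and the three sample reserved words are assumptions made for this illustration.

```python
# Approach 1: pre-load the reserved words, then treat every other name as an identifier.
symbol_table = {}   # lexeme -> token name

def install(lexeme, token):
    """Add lexeme with its token name unless it is already present; return the stored token."""
    return symbol_table.setdefault(lexeme, token)

# A small sample of reserved words, installed before any input is scanned.
for word in ("if", "then", "else"):
    install(word, word.upper())

def token_for(lexeme):
    """Reserved words keep their own token; any other name is installed as ID."""
    return install(lexeme, "ID")

print(token_for("if"), token_for("iffy"), token_for("then"))
```

The scanner itself needs only one identifier pattern. Whether a matched lexeme is a keyword is decided by the table lookup.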
Output of Lexical Analyzer

• The lexical analyzer produces as output tokens of the
form (token-name, attribute-value).

• The first component, token-name, is an abstract
symbol that is used during syntax analysis.

• The second component, attribute-value, points to
an entry in the symbol table for this token.
Example
• Suppose a source program contains the assignment statement
position = initial + rate * 60
• position is a lexeme that is mapped into the token (id, 1),
where id is an abstract symbol standing for identifier and 1
points to the symbol-table entry for position.

• The assignment symbol = is a lexeme that is mapped into the
token (=). This token needs no attribute value, so the second
component is omitted.

• initial is a lexeme that is mapped into the token (id, 2), where
2 points to the symbol-table entry for initial.

• In the same way, + is mapped into (+), rate into (id, 3), and * into (*).

• 60 is a lexeme that is mapped into the token (60).

Token names and values

• So, this is the representation of the whole statement
after lexical analysis:

position = initial + rate * 60
  → Lexical analyzer →
(id, 1) (=) (id, 2) (+) (id, 3) (*) (60)
  → Parser
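The token stream for this statement can be generated with a few lines of code. This is a sketch under simplifying assumptions: the function name tokenize is made up, the symbol table is a plain list of names, and numbers are emitted without an attribute, as in the slide.

```python
import re

def tokenize(statement):
    """Return (tokens, symbol_table) using the (token-name, attribute-value) form."""
    symbol_table = []   # entry k is stored at index k - 1
    tokens = []
    # Identifiers, then unsigned integers, then any other single non-blank character.
    for lexeme in re.findall(r"[A-Za-z][A-Za-z_0-9]*|[0-9]+|\S", statement):
        if lexeme[0].isalpha():
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        else:
            # Operators and, following the slide, numbers carry no attribute here.
            tokens.append((lexeme,))
    return tokens, symbol_table

tokens, table = tokenize("position = initial + rate * 60")
print(" ".join("(" + ", ".join(map(str, t)) + ")" for t in tokens))
# prints: (id, 1) (=) (id, 2) (+) (id, 3) (*) (60)
```

The attribute of each id token is simply the position of the name in the table, which is why position, initial, and rate receive 1, 2, and 3.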
Simple Example of Symbol table

Lexeme: variable_name
Token Type: IDENTIFIER
Attribute: <symbol_table_entry_ptr>

• Lexeme: "variable_name" is the actual string encountered in the
source code.
• Token Type: indicates the type of token associated with the
lexeme. In this case, it is an identifier.
• Attribute: points to the symbol-table entry associated with the
lexeme. This pointer allows the lexical analyzer to access
additional information about the identifier, such as its type, scope,
memory address, etc.
Implementing Lexical Analyzer

Since the lexical structure of most programming languages can
be specified by a regular language, a common way to
implement a lexical analyzer is to:

• Specify regular expressions for all of the kinds of tokens in
the language. Then use the alternation operator to create
a single regular expression that recognizes the language of
all valid tokens.
• Convert the overall regular expression specifying all
possible tokens into a deterministic finite automaton
(DFA).
• Translate the DFA into a program that simulates the DFA.
This program is the lexical analyzer.
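The last step, a program that simulates a DFA, can be sketched with a transition table. The DFA below is written by hand for two token kinds only (identifiers and unsigned integers) rather than derived from a regular expression, and all names in it are illustrative.

```python
def char_class(ch):
    """Map a character to the input symbol used by the transition table."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch == "_":
        return "underscore"
    return "other"

# Hand-built DFA: missing (state, symbol) pairs mean "no transition".
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("start", "digit"): "in_num",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_id", "underscore"): "in_id",
    ("in_num", "digit"): "in_num",
}
ACCEPTING = {"in_id": "IDENTIFIER", "in_num": "NUMBER"}

def longest_match(text, start=0):
    """Run the DFA from start; return (token, lexeme) for the longest accepted prefix."""
    state = "start"
    last_accept = None
    pos = start
    while pos < len(text):
        state = TRANSITIONS.get((state, char_class(text[pos])))
        if state is None:
            break   # the DFA is stuck; fall back to the last accepting position
        pos += 1
        if state in ACCEPTING:
            last_accept = (ACCEPTING[state], text[start:pos])
    return last_accept

print(longest_match("rate*60"))
print(longest_match("rate*60", 5))
```

Remembering the last accepting position gives the usual longest-match behavior: the scanner consumes as many characters as still form a valid token.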
Homework

1. List the elements of each token type:

a. Keyword (all keywords in Java)
b. Identifier
c. Separator
d. Operator

2. Write regular expressions for keywords and identifiers.

3. What are Lex and JFlex?

4. In how many ways can we implement a lexical analyzer?
