Concepts - Assignment (Technical Report Template)
Build Scanner
Prepared By
Student Name
Ghada Mohamed Mostafa
Student ID
200047779
Under Supervision
Name of Doctor
Nehal Abdelsalam
Name of T. A.
Faris Emadeldin
1. Introduction
In this report, I discuss the process of building a lexical analyzer, which
performs the first phase of compilation. A lexical analyzer, also known as a
scanner, plays an essential role in converting a stream of characters from
source code into tokens, each representing a meaningful element of the
program. This conversion is crucial for later stages of the compiler, such as
syntax and semantic analysis.
For this project, I converted a simple C-based lexical analyzer into C++ to
make the code more modular and easier to understand. The implementation
follows the traditional approach to lexical analysis, using a finite state
machine to match patterns such as keywords, identifiers, operators, and
literals. This report outlines the methodology used, the tools applied, and
the challenges faced during development.
1.1. Phases of a Compiler
A compiler performs several phases to convert high-level source code
into machine code. The major phases are as follows (a short worked
example follows the list):
1. Lexical Analysis:
This phase breaks the input source code into tokens (basic units of
code such as keywords, operators, and identifiers). The Lexical
Analyzer (also called the Scanner) reads characters and groups them
into lexemes.
2. Syntax Analysis:
The Syntax Analyzer (Parser) examines the syntax of the program to
ensure the correct arrangement of tokens. It checks if the token
sequence adheres to the language’s grammar.
3. Semantic Analysis:
In this phase, the compiler ensures that the program’s logic makes
sense. It checks for semantic errors like type mismatches, undeclared
variables, and other logical errors.
4. Intermediate Code Generation:
The compiler translates the program into an intermediate form that
is easier to manipulate. This code is independent of the target machine.
5. Code Optimization:
The intermediate code is optimized for better performance, which
may include improving memory usage or execution time.
6. Code Generation:
The final phase generates the target machine code (or assembly
language), which can be executed by the computer.
7. Code Linking and Assembly:
The generated machine code is linked with libraries or other modules to
create the final executable.
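To make these phases concrete, here is an illustrative sketch (not part of the
project code; the temporaries t1 and t2 are hypothetical) of how the statement
position = initial + rate * 60 might pass through the first four phases:
// 1. Lexical analysis produces a token stream:
//      IDENT(position) ASSIGN_OP IDENT(initial) ADD_OP IDENT(rate) MULT_OP INT_LIT(60)
// 2. Syntax analysis builds a parse tree that groups rate * 60 before the addition.
// 3. Semantic analysis checks that position, initial, and rate are declared and numeric.
// 4. Intermediate code generation might emit three-address code:
//      t1 = rate * 60
//      t2 = initial + t1
//      position = t2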
2. Lexical Analyzer
The lexical analyzer is the first phase of the compiler, and its job is to read
the source code character-by-character, identifying meaningful sequences
and grouping them into tokens. These tokens represent the smallest units of
the programming language, such as keywords, operators, identifiers, and
literals.
The lexical analyzer recognizes these tokens using a state machine
approach, where each state corresponds to a stage in recognizing a particular
kind of token (e.g., the digits of a number or the characters of an identifier).
It works in conjunction with regular expressions, which define the pattern for
each token type.
The main goal of the lexical analyzer is to simplify the parsing process by
providing a stream of tokens to the next phase of the compiler. The analyzer
reads the input stream and identifies the tokens, helping the compiler build
a structured representation of the program.
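For example, given the source fragment total = sum + 47, a scanner with the
token set used in this project (Section 4) would emit a token stream along
these lines:
IDENT      "total"
ASSIGN_OP  "="
IDENT      "sum"
ADD_OP     "+"
INT_LIT    "47"
The parser then works with these five tokens rather than the raw character
stream.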
3. Software Tools
3.1. Computer Program
The lexical analyzer was written in C++. C++ was chosen for its
efficient memory handling and for the std::string class, which makes
building lexemes straightforward. The program reads input from a file,
processes the characters one by one, and generates tokens based on
predefined patterns.
3.2. Programming Language
The lexer was designed for simple arithmetic expressions. It supports
basic arithmetic operations such as addition, subtraction, multiplication,
and division, as well as parentheses, the assignment operator, and integer
literals. The lexer also identifies identifiers, which can represent
variable or function names.
4. Implementation of a Lexical Analyzer
1. Including Libraries:
#include <iostream>
#include <fstream>
#include <cctype>
#include <string>
using namespace std;
In this section, we include several standard libraries:
• iostream: allows us to handle input and output, which we use to print
data with cout.
• fstream: is used for file operations, enabling us to read from files.
• cctype: provides functions like isalpha() and isdigit(), which check
whether a character is a letter or a digit.
• string: includes the string class to work with text data, which we use
to store the lexemes we encounter.
The using namespace std; directive lets us refer to names such as cout,
string, and ifstream without the std:: prefix.
2. Defining Constants:
#define LETTER 0
#define DIGIT 1
#define UNKNOWN 99
#define END_OF_FILE -1
Here, we define constants to categorize the characters:
• LETTER: indicates that the character is a letter of the alphabet.
• DIGIT: used when the character is a digit.
• UNKNOWN: used when the character is something else that does not fit
into the known categories.
• END_OF_FILE: marks the end of the file.
3. Token Codes:
#define INT_LIT 10
#define IDENT 11
#define ASSIGN_OP 20
#define ADD_OP 21
#define SUB_OP 22
#define MULT_OP 23
#define DIV_OP 24
#define LEFT_PAREN 25
#define RIGHT_PAREN 26
4. Global Variables:
int charClass;
string lexeme = "";
char nextChar;
int lexLen;
int token;
int nextToken;
ifstream inFile;
Here we declare several variables:
• charClass: stores the class of the current character (letter, digit,
etc.).
• lexeme: holds the current lexeme (a sequence of characters treated as
a unit).
• nextChar: stores the next character to be processed.
• lexLen: tracked the lexeme length in the original C version; std::string
makes it unnecessary here, but it is kept for fidelity to that version.
• token: stores the current token code.
• nextToken: stores the token code produced by the most recent call to
lex().
• inFile: the file stream used to open and read from the input file.
5. Function Declarations:
void addChar();
void getChar();
void getNonBlank();
int lex();
int lookup(char ch);
Here we declare the functions that handle the various tasks:
• addChar(): adds the current character to the lexeme.
• getChar(): retrieves the next character from the input file and
classifies it.
• getNonBlank(): skips over whitespace such as spaces and tabs.
• lex(): the main lexical analysis function that processes the input
and identifies tokens.
• lookup(): maps single-character symbols (operators, parentheses, and
the assignment operator) to their token codes.
6. addChar() Function:
void addChar() {
    lexeme += nextChar;
}
This function adds the current character (nextChar) to the lexeme
being built. For example, if we encounter a series of letters or
digits, they are appended one by one to form the full lexeme.
7. getChar() Function:
void getChar() {
    if (inFile.get(nextChar)) {
        if (isalpha(nextChar))
            charClass = LETTER;
        else if (isdigit(nextChar))
            charClass = DIGIT;
        else
            charClass = UNKNOWN;
    } else {
        charClass = END_OF_FILE;
        nextChar = '\0'; // clear the stale character so getNonBlank() cannot loop forever at EOF
    }
}
Here, we read the next character from the file using inFile.get(nextChar)
and classify it:
• If it is a letter (isalpha(nextChar)), we set charClass to LETTER.
• If it is a digit (isdigit(nextChar)), we set charClass to DIGIT.
• If it is neither, we classify it as UNKNOWN.
• If we have reached the end of the file, we set charClass to
END_OF_FILE and clear nextChar. The clearing step matters: inFile.get()
leaves nextChar unchanged on failure, so a trailing space or newline
would otherwise make getNonBlank() loop forever.
8. getNonBlank() Function:
void getNonBlank() {
    while (isspace(nextChar))
        getChar();
}
This function is used to skip over any whitespace (spaces, tabs, etc.)
in the input. It calls getChar() repeatedly until a non-whitespace
character is found.
9. lookup() Function:
int lookup(char ch) {
    switch (ch) {
        case '(': addChar(); return LEFT_PAREN;
        case ')': addChar(); return RIGHT_PAREN;
        case '+': addChar(); return ADD_OP;
        case '-': addChar(); return SUB_OP;
        case '*': addChar(); return MULT_OP;
        case '/': addChar(); return DIV_OP;
        case '=': addChar(); return ASSIGN_OP; // so the ASSIGN_OP code defined above is actually produced
        default:  addChar(); return END_OF_FILE;
    }
}
In this function, we check for single-character symbols such as parentheses
and operators. Based on the character, we:
• Add it to the lexeme using addChar().
• Return the corresponding token code (e.g., LEFT_PAREN for (, ADD_OP
for +, and ASSIGN_OP for =).
Any character that matches none of the cases falls through to the default
branch and is reported as END_OF_FILE, which stops the scan; a production
scanner would report a lexical error here instead.
10. lex() Function:
int lex() {
    lexeme = "";
    getNonBlank();
    switch (charClass) {
        case LETTER:                // identifiers: a letter followed by letters or digits
            addChar();
            getChar();
            while (charClass == LETTER || charClass == DIGIT) {
                addChar();
                getChar();
            }
            nextToken = IDENT;
            break;
        case DIGIT:                 // integer literals: one or more digits
            addChar();
            getChar();
            while (charClass == DIGIT) {
                addChar();
                getChar();
            }
            nextToken = INT_LIT;
            break;
        case UNKNOWN:               // operators, parentheses, and other symbols
            nextToken = lookup(nextChar);
            getChar();
            break;
        case END_OF_FILE:
            nextToken = END_OF_FILE;
            lexeme = "EOF";
            break;
    }
    cout << "Next token is: " << nextToken << ", Next lexeme is " << lexeme << endl;
    return nextToken;
}
In this function, we process the current character and determine what
token it represents (a short trace follows this list):
• If the character is a letter, we start building an identifier
(IDENT).
• If it’s a digit, we build an integer literal (INT_LIT).
• If it’s a special character (like an operator or parenthesis),
we use the lookup() function.
• If we reach the end of the file, we set the token to
END_OF_FILE.
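To see how these cases interact, here is a hedged trace (assuming the
getChar() fix above and that main() has already loaded the first character)
of four successive lex() calls on the input a1 * 25:
// call 1: charClass == LETTER for 'a'; the loop also consumes '1' and stops
//         at the blank (UNKNOWN)   -> Next token is: 11, Next lexeme is a1
// call 2: getNonBlank() skips the blank; charClass == UNKNOWN and
//         lookup('*') returns MULT_OP -> Next token is: 23, Next lexeme is *
// call 3: charClass == DIGIT for '2'; the loop consumes '5', then getChar()
//         reports END_OF_FILE      -> Next token is: 10, Next lexeme is 25
// call 4: charClass == END_OF_FILE -> Next token is: -1, Next lexeme is EOF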
11. main() Function:
int main() {
    inFile.open("front.in");
    if (!inFile.is_open()) {
        cout << "ERROR - cannot open front.in" << endl;
        return 1;
    }
    getChar();
    do {
        lex();
    } while (nextToken != END_OF_FILE);
    inFile.close();
    return 0;
}
In the main function (a sample run follows this list):
• We attempt to open the input file front.in; if this fails, we print an
error message and exit.
• We read the first character from the file with getChar().
• We repeatedly call lex() until we reach the end of the file.
• Finally, we close the file.
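As a quick check, suppose front.in contains the single line
(sum + 47) / total. With the token codes defined in this report, the
program should print:
Next token is: 25, Next lexeme is (
Next token is: 11, Next lexeme is sum
Next token is: 21, Next lexeme is +
Next token is: 10, Next lexeme is 47
Next token is: 26, Next lexeme is )
Next token is: 24, Next lexeme is /
Next token is: 11, Next lexeme is total
Next token is: -1, Next lexeme is EOF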
5. References
Sebesta, R. W. (2019). Concepts of Programming Languages (12th ed.).
Pearson.