Lab - 03 Lexical Analysis Complete
Introduction
Because the lexical analyzer is the part of the compiler that reads the source program, it also
performs certain secondary tasks at the user interface. One such task is stripping comments and
white space (blanks, tabs and new line characters) from the source program. Another is
correlating error messages from the compiler with the source program, i.e. keeping a
correspondence between errors and source line numbers.
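The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the function name skip_white_space and its (pos, line) return convention are assumptions made for this example, not part of the lab specification. The function skips blanks, tabs and new line characters and counts the new lines it passes, so that a later error message can cite the correct source line.

```python
def skip_white_space(source, pos, line):
    """Advance past blanks, tabs and new lines starting at index pos.

    Returns the new (pos, line) pair. line is incremented once per
    new line character so errors can be reported against source lines.
    """
    while pos < len(source) and source[pos] in " \t\n":
        if source[pos] == "\n":
            line += 1
        pos += 1
    return pos, line


# Hypothetical usage: two new lines are skipped before reaching 'x'.
pos, line = skip_white_space("  \n\t\n x := 1", 0, 1)
print(pos, line)  # 6 3
```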
Objectives
Tools/Software Requirement
Description
Lexical analysis is the process of converting a sequence of characters into a sequence of tokens.
A program or function which performs lexical analysis is called a lexical analyzer, lexer or a
scanner. A lexer often exists as a single function which is called by a parser or another function.
For LAB experiments we use a very simple high-level programming language, TINY, whose
specifications are given below. Although it is a very simple language (it lacks floating point
values, arrays, functions and much more), it is still large enough to explain the essential
features of compiler construction.
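To make the idea of a lexer as a function called by a parser more concrete, here is a minimal Python sketch. It uses a regular expression as a shortcut rather than a hand-written scanner, and the names Token, tokens and the token kinds NUM, ID and SYMBOL are assumptions made for this example. The symbols < and = are included only because they appear in the factorial program used in the lab task.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str    # "NUM", "ID" or "SYMBOL" (names assumed for this sketch)
    lexeme: str  # the exact characters matched in the source


# One alternative per token class; white space is matched and discarded.
_PATTERN = re.compile(
    r"\s+|(?P<NUM>\d+)|(?P<ID>[A-Za-z]+)|(?P<SYMBOL>:=|[-+*/()=<;])"
)


def tokens(source: str) -> Iterator[Token]:
    """Yield tokens one at a time, as a parser would request them."""
    pos = 0
    while pos < len(source):
        match = _PATTERN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at index {pos}")
        if match.lastgroup is not None:  # None means white space was matched
            yield Token(match.lastgroup, match.group())
        pos = match.end()


# A parser pulls the next token only when it needs one.
stream = tokens("x := x - 1;")
print(next(stream))  # Token(kind='ID', lexeme='x')
print(next(stream))  # Token(kind='SYMBOL', lexeme=':=')
```

Because tokens is a generator, no token is produced until the caller asks for it, which mirrors the way a parser typically drives the scanner.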
Program Statements
2. There are no procedures or functions, not even a main function, and hence no function
calls, return statements or function prototypes.
3. Declaration of variables before their use is not required. Variables are automatically
declared when values are assigned to them.
4. All variables are integer variables. Variables of other data types such as float or char
are not allowed.
5. There are only two control statements: an if-statement and a repeat-statement. Both
control statements may themselves contain statement sequences. An if-statement has an
optional else-part and must be terminated by the keyword end. The repeat-until
statement is the only loop available in TINY; it must be terminated with a semicolon.
6. Input is taken by the read statement, which consists of the reserved word read followed
by the name of a variable.
7. Output is produced by the write statement, which consists of the reserved word write
followed by a variable name or an integer constant.
8. Comments are enclosed in curly brackets {...}. Comments cannot be nested, but
multi-line comments are allowed.
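The comment rule in item 8 can be sketched as follows. This is a minimal Python illustration: the function name skip_comment, its (pos, line) return value and the choice to raise SyntaxError for an unclosed comment are all assumptions of this example. Because comments cannot be nested, the first closing bracket always ends the comment, and new lines inside the comment are still counted.

```python
def skip_comment(source, pos, line):
    """Skip one {...} comment whose opening bracket is at source[pos].

    Returns (pos, line) just past the closing bracket. Comments do not
    nest, so the first '}' ends the comment. New lines inside the
    comment are counted to keep line numbers accurate.
    """
    if source[pos] != "{":
        raise ValueError("skip_comment must be called at an opening '{'")
    start_line = line
    pos += 1
    while pos < len(source) and source[pos] != "}":
        if source[pos] == "\n":
            line += 1
        pos += 1
    if pos == len(source):
        raise SyntaxError(f"comment opened on line {start_line} is never closed")
    return pos + 1, line


# Hypothetical usage: a two-line comment followed by the identifier x.
print(skip_comment("{ first\n second } x", 0, 1))  # (17, 2)
```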
a) Arithmetic Expressions:
b) Boolean Expressions:
a) Reserved Words
There are eight reserved words: if, then, else, end, repeat, until, read and write.
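One common way to handle reserved words is to scan every word as an identifier first and then look it up in a table. The Python sketch below illustrates this. The function name classify_word, the upper-case token type names and the case-sensitive lookup are assumptions made for this example.

```python
RESERVED_WORDS = {"if", "then", "else", "end", "repeat", "until", "read", "write"}


def classify_word(lexeme):
    """Return the token type for a lexeme made only of letters.

    Reserved words get their own upper-case type (e.g. "IF"); any
    other word is an ordinary identifier, reported as "ID".
    The lookup is case-sensitive (an assumption of this sketch).
    """
    return lexeme.upper() if lexeme in RESERVED_WORDS else "ID"


print(classify_word("repeat"))  # REPEAT
print(classify_word("fact"))    # ID
```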
b) Special Symbols
Arithmetic operators: +, -, *, /
Parentheses: (, )
Semicolon: ;
Only one symbol, the assignment operator :=, consists of two characters; all the others are one character long.
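Since := is the only two-character symbol, one character of lookahead after a colon is enough to recognize it. The Python sketch below covers only the symbols listed in this subsection. The token type names and the function scan_symbol are assumptions made for this example.

```python
# Token type names are assumptions made for this sketch.
SINGLE_CHAR_SYMBOLS = {
    "+": "PLUS", "-": "MINUS", "*": "TIMES", "/": "OVER",
    "(": "LPAREN", ")": "RPAREN", ";": "SEMI",
}


def scan_symbol(source, pos):
    """Recognize the special symbol starting at source[pos].

    Returns (token_type, lexeme, next_pos). ':' needs one character
    of lookahead, because only ':=' is a valid symbol.
    """
    ch = source[pos]
    if ch == ":":
        # Slicing avoids an IndexError when ':' is the last character.
        if source[pos + 1 : pos + 2] == "=":
            return "ASSIGN", ":=", pos + 2
        return "ERROR", ch, pos + 1
    if ch in SINGLE_CHAR_SYMBOLS:
        return SINGLE_CHAR_SYMBOLS[ch], ch, pos + 1
    return "ERROR", ch, pos + 1


print(scan_symbol("x:=1", 1))  # ('ASSIGN', ':=', 3)
```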
c) Others
The two other tokens are number (1 or more digits) and identifier (1 or more letters).
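Figure: DFA of the TINY scanner. States: START, INNUM, INID, INASSIGN, INCOMMENT, DONE.
Transitions:
START, white space: stay in START
START, digit: go to INNUM. INNUM, digit: stay in INNUM. INNUM, [other]: go to DONE
START, letter: go to INID. INID, letter: stay in INID. INID, [other]: go to DONE
START, : go to INASSIGN. INASSIGN, =: go to DONE. INASSIGN, [other]: go to DONE
START, {: go to INCOMMENT. INCOMMENT, }: go to START. INCOMMENT, other: stay in INCOMMENT
START, other: go to DONE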
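The DFA can be turned into code fairly directly, with one branch per state. The Python sketch below is one possible hand-coded version, not a reference solution. It assumes that a bracketed [other] label marks a lookahead character that is not consumed, that = and < are single-character symbols (they appear in the factorial program), and that an unterminated comment is reported as an ERROR token. The function name scan and all token type names are likewise assumptions of this example.

```python
RESERVED = {"if", "then", "else", "end", "repeat", "until", "read", "write"}
SYMBOLS = {"+": "PLUS", "-": "MINUS", "*": "TIMES", "/": "OVER",
           "(": "LPAREN", ")": "RPAREN", ";": "SEMI",
           "=": "EQ", "<": "LT"}  # '=' and '<' assumed from the factorial program


def is_digit(ch):
    return "0" <= ch <= "9"   # False for "" (end of input)


def is_letter(ch):
    return "a" <= ch <= "z" or "A" <= ch <= "Z"


def scan(source):
    """Return a list of (token_type, lexeme) pairs for a TINY source string."""
    tokens, pos = [], 0
    while True:
        state, lexeme, kind = "START", "", None
        while state != "DONE":
            ch = source[pos] if pos < len(source) else ""  # "" marks end of input
            consume = save = True  # [other] transitions set consume to False
            if state == "START":
                if ch == "":
                    return tokens
                if is_digit(ch):
                    state = "INNUM"
                elif is_letter(ch):
                    state = "INID"
                elif ch == ":":
                    state = "INASSIGN"
                elif ch == "{":
                    state, save = "INCOMMENT", False
                elif ch.isspace():
                    save = False  # stay in START and discard white space
                else:
                    state, kind = "DONE", SYMBOLS.get(ch, "ERROR")
            elif state == "INNUM":
                if not is_digit(ch):
                    state, kind, consume = "DONE", "NUM", False
            elif state == "INID":
                if not is_letter(ch):
                    state, kind, consume = "DONE", "ID", False
            elif state == "INASSIGN":
                if ch == "=":
                    state, kind = "DONE", "ASSIGN"
                else:
                    state, kind, consume = "DONE", "ERROR", False
            else:  # INCOMMENT: discard everything up to the closing bracket
                if ch == "":
                    tokens.append(("ERROR", "unterminated comment"))
                    return tokens
                save = False
                if ch == "}":
                    state = "START"
            if consume:
                pos += 1
                if save:
                    lexeme += ch
        if kind == "ID" and lexeme in RESERVED:
            kind = lexeme.upper()  # reserved words are looked up after scanning
        tokens.append((kind, lexeme))


print(scan("x := x - 1;"))
# [('ID', 'x'), ('ASSIGN', ':='), ('ID', 'x'), ('MINUS', '-'), ('NUM', '1'), ('SEMI', ';')]
```

The inner loop runs the DFA once per token. Leaving the lookahead character unconsumed on the [other] transitions is what allows input such as x:=1 to be split correctly without any white space between tokens.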
Lab Tasks
We will use TINY (its specifications are given in a separate document) as our source
language. In particular, your scanner implementation should return tokens for the following
token types:
Test your implementation using the “factorial” program, written in the TINY language, given below.
Factorial Program:
if 0 < x then
  fact := 1;
  repeat
    fact := fact * x;
    x := x - 1;
  until x = 0;
  write fact;
end