Lab 4
1 Objectives
1. To familiarize students with the basics of the Flex tool, its structure, and its application
in generating lexical analyzers.
2. To understand the three main sections of a Flex program: definitions, rules, and user code,
and their roles in building a lexer.
3. To understand and apply regular expressions for token definitions and pattern matching.
4. To learn how to compile and execute Flex programs, analyze the generated C files, and
observe the behavior of the lexer.
5. To demonstrate the use of Flex-specific variables such as yytext, yyleng, yyin, and yyout
and understand their roles in input and output handling.
2 Introduction
Lex/Flex is a widely used tool that automates the creation of lexical analyzers, which
are crucial components of compilers and interpreters. Flex is available on most Linux
distributions, either preinstalled or through the standard package manager. Its primary
purpose is to generate a lexer, also known as a scanner, which reads an input stream of
text and breaks it into a sequence of tokens based on predefined rules. These tokens serve
as the building blocks for further stages of language processing, such as parsing.
Essentially, a lexer transforms the raw input text into meaningful units called tokens. Tokens
are sequences of characters that represent identifiable elements of a language, such as keywords,
identifiers, operators, or numbers. Flex facilitates this by allowing developers to define a set
of rules, expressed as regular expressions (RE), that describe valid tokens. Flex then processes
these rules and produces a C program, known as the lexer or scanner, which performs the
tokenization task.
Regular expressions (RE) form the backbone of the token definitions in Flex. They are powerful
constructs for pattern matching, capable of representing complex tokenization rules in a concise
manner. Flex compiles these regular expressions into efficient code that scans the input text
and identifies tokens according to the specified patterns.
A typical workflow with Flex involves writing a Flex specification file, compiling it to generate
the C source code for the lexer, and then further compiling this C code to create an executable
lexer. This lexer can process input streams, identify tokens, and execute corresponding actions,
making it a vital tool for compiler construction, text processing, and many other applications
in programming.
The figure below illustrates the input and output of a lexer:
By understanding the workings of Flex and its specification structure, developers can create
robust and efficient lexical analyzers tailored to their specific requirements. This lab aims to
provide a comprehensive introduction to the Flex tool, guiding students through its key features,
usage, and practical applications in the context of tokenization and lexical analysis.
Definitions Section
%%
Rules Section
%%
User Code Section
• Flex definitions: A definition is very much like a #define directive in C. For example:
letter [a-zA-Z]
digit [0-9]
punct [!?]
nonblank [^ \t]
These definitions can be used in the rules section by writing the name in braces; for example, a rule could start with {letter}+.
• State definitions: If a rule depends on context, it is possible to introduce states and
incorporate those into the rules.
A state definition looks like %s STATE_NAME, and a state INITIAL is already given by
default.
We will get into the details of this in the next lab sheet.
The rules section is the core of a Flex program, where patterns and their corresponding actions
are defined. Each rule in this section consists of a pattern, written as a regular expression, and
an associated action written in C. When the (longest) prefix of the input matches a pattern,
the corresponding action is executed. The rules section is enclosed between two lines that
each contain a double percentage sign (%% ... %%).
• Action: A block of C code that is executed when the pattern is matched. If the action is
more than a single command, it should be enclosed in braces.
In this example:
• The first rule matches words made up of letters and prints them.
• The third rule matches newlines and indicates when they occur.
• The last rule catches any character not matched by previous rules and labels it as unknown.
3.2.2 Important Notes on Patterns and Actions
• Patterns can include Flex definitions from the definitions section to simplify and
modularize rules.
• Actions can modify variables, call functions, or perform any task that the C language supports.
• When no action is provided for a rule, the default action is to discard the matched text
and continue scanning.
Flex resolves conflicts using the following principles when multiple rules match the same input.
• Longest Match: If two or more rules match prefixes of the input, the rule that matches
the longest sequence of characters is chosen, and the corresponding characters are taken
as a valid lexeme.
For example, the pattern [a-z]+ will match "hello" rather than stopping at "h" (see
Example 3 for more understanding).
• Rule Order: If two matches are of the same length, the first rule in the list is selected (see
Example 4). Therefore, rules should be ordered carefully to ensure the desired behavior.
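The two principles above can be sketched in plain C, without Flex. This is a hand-written illustration and not the code that Flex generates: the matcher functions, the rules array, and pick_rule are hypothetical names, and the two rules mirror Example 4 (the literal begin, then [a-z]+). Each matcher reports how many leading characters it accepts; the longest match wins, and an earlier rule keeps a tie.

```c
#include <stddef.h>
#include <string.h>

/* Rule 0: the literal keyword "begin". Returns the match length, or 0. */
static size_t match_begin(const char *s) {
    return strncmp(s, "begin", 5) == 0 ? 5 : 0;
}

/* Rule 1: [a-z]+ . Returns the length of the leading run of lowercase letters. */
static size_t match_lower_run(const char *s) {
    size_t n = 0;
    while (s[n] >= 'a' && s[n] <= 'z')
        n++;
    return n;
}

typedef size_t (*matcher_fn)(const char *);

/* Rules in the order they would appear in the rules section. */
static const matcher_fn rules[] = { match_begin, match_lower_run };

/* Returns the index of the winning rule, or -1 if no rule matches.
 * The length of the chosen lexeme is stored in *len. */
int pick_rule(const char *input, size_t *len) {
    int best = -1;
    *len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i](input);
        if (n > *len) {   /* strictly longer only, so an earlier rule keeps a tie */
            *len = n;
            best = (int)i;
        }
    }
    return best;
}
```

Under these assumptions, pick_rule("begin", &n) selects rule 0 with n equal to 5 (a tie, settled by rule order), while pick_rule("beginning", &n) selects rule 1 with n equal to 9 (longest match).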
The User Code section is where additional logic, such as the main function or helper functions,
is implemented. It can contain the program's main entry point and any auxiliary functions
that support the lexer. If this section is left empty and the program is linked with the Flex
library (-lfl), a default main() function is provided, which simply calls yylex().
If a main() function is included, it should call yylex(), the scanner function that Flex builds
from the rules.
int main()
{
yylex();
return 0;
}
In addition, this section may contain user-defined functions, as in the example below.
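%{
#include <stdio.h>
void printMessage(const char *message);
%}
%%
[a-zA-Z]+ { printMessage("Word detected!"); }
[0-9]+    { printMessage("Number detected!"); }
\n        { return 0; }
%%
/* User code section. The body below is a minimal assumed definition. */
void printMessage(const char *message) {
    printf("%s\n", message);
}
int main(void) {
    yylex();
    return 0;
}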
Running flex on the specification file (for example, flex p1.l) creates a C file named lex.yy.c, which contains the C code for the lexer.
Example 1 The following program counts the number of vowels and consonants in a given
input string.
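%{
#include <stdio.h>
int vowels = 0, cons = 0;   /* counters (assumed declarations) updated by the rules */
%}
/* Flex definitions */
vowels    [aeiouAEIOU]
alphabets [a-zA-Z]
newline   \n
%%
 /* Rules section: {vowels} comes first so it wins ties with {alphabets} */
{vowels}    { vowels++; }
{alphabets} { cons++; }
{newline}   { return 0; }
%%
int main(void) {   /* assumed minimal driver: scan one line, then report */
    yylex();
    printf("Output: vowels = %d, consonants = %d\n", vowels, cons);
    return 0;
}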
Try to compile the program given in Example 1 (save it as p1.l) and observe the output.
Figure 2 shows the compilation process and the resulting changes in the current directory,
and Figure 3 shows the output of p1.l for different inputs.
Figure 3: Output of p1.l for two different inputs.
Task 1 (a) Remove “{newline} {return 0;}” from p1.l and observe the change in the
program's behavior.
Metacharacter   Matches
.               Any character except newline
\n              Newline
*               Zero or more copies of the preceding expression
+               One or more copies of the preceding expression
?               Zero or one copy of the preceding expression
^               Beginning of line
$               End of line
a|b             a or b
(ab)+           One or more copies of ab (grouping)
"a+b"           Literal "a+b" (C escapes still work)
[ ]             Character class
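Several of the operators in the table (*, +, ?, |, grouping, and character classes) can be tried out from plain C before writing a Flex file. The sketch below uses the POSIX regcomp/regexec functions as a stand-in; it is not how Flex matches input, and POSIX extended syntax differs from Flex in places (for instance, it has no "..." quoting and no {name} definitions, and . also matches a newline unless REG_NEWLINE is set). The helper name prefix_match_len is hypothetical. It returns the length of the match that starts at the first character of the input, or -1 if there is none.

```c
#include <regex.h>
#include <stddef.h>

/* Length of the match of `pattern` (POSIX extended syntax) that begins at the
 * first character of `input`; -1 if the pattern does not compile or does not
 * match there. */
long prefix_match_len(const char *pattern, const char *input) {
    regex_t re;
    regmatch_t m;
    long len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    /* regexec reports the leftmost match; accept it only if it starts at 0. */
    if (regexec(&re, input, 1, &m, 0) == 0 && m.rm_so == 0)
        len = (long)m.rm_eo;
    regfree(&re);
    return len;
}
```

For example, prefix_match_len("(ab)+", "ababaabbbaaab") gives 4 (the lexeme abab), and prefix_match_len("a+b", "bbaab") gives -1 because no match starts at the first character.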
Example 2 Run the following program and check the output for the following inputs:
1. abbbaababaaab
2. ababaabbbaaab
%{
#include<stdio.h>
%}
%%
(ab)+ printf("1");
ab+ printf("2");
a+b printf("3");
\n {printf("\n"); return 0;}
%%
int main(){
printf("Enter the string:");
yylex();
return 0;
}
Task 2 In the above program, change the operator + to * and observe the changes in the output
for both inputs.
Example 3 Run the following program and check the output for the following inputs:
1. begin
2. beginning
%{
#include <stdio.h>
%}
%%
begin printf("Compiler");
beginning printf("Compiler Design");
\n return 0;
%%
int main(){
printf("Enter the string:");
yylex();
return 0;
}
Example 4 Run the following program and check the output for the following inputs:
1. begin
2. beginning
%{
#include <stdio.h>
%}
%%
begin printf("Compiler");
[a-z]+ printf("Compiler Design");
\n return 0;
%%
int main(){
printf("Enter the string:");
yylex();
return 0;
}
Task 3 In examples 3 and 4, change the order of the rules and observe the changes in the
output for both inputs.
Some useful Flex variables, functions, and macros are given in the following table.
Name               Function
int yylex(void)    Call to invoke the lexer; returns a token
char *yytext       Pointer to the matched string
yyleng             Length of the matched string
yylval             Value associated with the token
int yywrap(void)   Wrap-up; returns 1 if done, 0 if not done
FILE *yyout        Output file
FILE *yyin         Input file
INITIAL            Initial start condition
BEGIN condition    Switch start condition
ECHO               Write the matched string
Example 5 The following program illustrates the use of yytext and yyleng.
• yyleng stores the length of the lexeme that matches the rule (in other words, it denotes
the length of the lexeme stored in yytext).
%{
#include<stdio.h>
%}
%%
(ab)+ {printf("rule 1 lexeme: %s ",yytext); printf("length: %d\n",yyleng);}
ab+ {printf("rule 2 lexeme: %s ",yytext); printf("length: %d\n",yyleng);}
a+b {printf("rule 3 lexeme: %s ",yytext); printf("length: %d\n",yyleng);}
\n {printf("\n"); return 0;}
%%
int main(){
printf("Enter the string:");
yylex();
return 0;
}
Example 6 The following program illustrates the use of ECHO, which writes the matched lexeme (yytext) to yyout.
%{
#include<stdio.h>
%}
%%
(ab)+ {ECHO; printf(" length: %d\n",yyleng);}
ab+ {ECHO; printf(" length: %d\n",yyleng);}
a+b {ECHO; printf(" length: %d\n",yyleng);}
\n {printf("\n"); return 0;}
%%
int main(){
printf("Enter the string:");
yylex();
return 0;
}
Task 4 Run the code for the same input strings in Figure 4 and observe the output.
Note 1 Flex supplies a default rule that behaves as if “.|\n ECHO;” were added after all
user rules. As a result, any character that does not match a rule is echoed to the
output. You can observe this in Figure 3 (for the first input, several spaces are echoed before
“Output:.....”) and in Figure 4 (two b’s are echoed in the last line “bbrule 3 lexeme:
aaab length: 4”).
Example 7 The following program illustrates how to read input from a text file.
• To read input from a text file (say, sample.txt), set the Flex variable yyin to
the file pointer of sample.txt, either directly:
yyin=fopen( "sample.txt", "r");
or
FILE *fp;
fp=fopen( "sample.txt", "r");
yyin=fp;
%{
# include <stdio.h>
%}
%%
(.|\n)* printf("%s", yytext);
%%
int main()
{
yyin=fopen( "sample.txt", "r");
yylex();
return 0;
}
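Example 7 assigns the result of fopen to yyin without checking it. If sample.txt cannot be opened, fopen returns NULL, and a scanner whose yyin is NULL reads from standard input instead, which can look as if the program has hung. A minimal sketch of a checked open is shown below; the helper name open_input_or is hypothetical, and falling back to a caller-supplied stream is only one possible design (exiting with an error is another).

```c
#include <stdio.h>

/* Hypothetical helper: open `path` for reading. If that fails, report the
 * reason on stderr and return `fallback` (for example, stdin) instead. */
FILE *open_input_or(const char *path, FILE *fallback) {
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        perror(path);      /* prints the path followed by the system error text */
        return fallback;
    }
    return fp;
}
```

In the user code section of a Flex program, this could be used as yyin = open_input_or("sample.txt", stdin); before the call to yylex().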
Task 5 (a) Run the code in Example 7 and observe the output.
(b) Modify the code in Example 7 to count the number of words (here, a word is a sequence of
English letters) and integers in the sample.txt file.
Example 8 The following program illustrates how to write the output into a text file.
• To write the output into a text file (say, output.txt), we need to set the Flex variable
yyout to the file pointer of output.txt, like yyout=fopen( "output.txt", "w");
%{
# include <stdio.h>
%}
%%
[a-zA-Z]+ fprintf(yyout, "%s", yytext);
[ \t] {ECHO;}
. { }
%%
int main()
{
yyin=fopen( "sample.txt", "r");
yyout=fopen( "output.txt", "w");
yylex();
return 0;
}
Task 6 Run the code in Example 8 and observe the output. Use the same sample.txt file
given above.
Example 9 Run the following program and observe the outputs for different inputs.
%{
#include<stdio.h>
%}
%%
aa { printf("1"); }
b?a+b? { printf("2"); }
b?a*b? { printf("3"); }
. { printf("4"); }
\n {return 0; }
%%
int main(){
printf("Enter the string:");
yylex();
return 0;
}
Task 7 Write a Flex program to count the number of lines, words, and characters in a C
file.