Lab 2

The document discusses using Flex to perform lexical analysis. It describes what Flex is and how it works, including how it uses regular expressions to identify tokens in input and generates C code. It also explains the structure of Flex specification files and some important Flex concepts and functions.

Compiler Design LAB

Introduction To Flex
Lesson objective
At the end of the lesson, students will be able to:
 be familiar with lexical analysis using Flex
and the process of creating tokens
Flex and lexical analysis
 From the area of compilers, we get a host of tools
for converting text files into programs. The first part of
that process is usually called lexical analysis,
particularly for languages such as C/C++.

 A good tool for creating a lexical analyzer is flex (or lex).

 It takes a specification file (file.l) and creates an
analyzer, usually called lex.yy.c; compiling that with
gcc/g++, etc., produces an application that can emit
tokens.
Lexical analysis terms
 A token is a group of characters having collective
meaning.
 A lexeme is an actual character sequence forming a
specific instance of a token, such as num.
 A pattern is a rule, expressed as a regular
expression, describing how a particular token
can be formed. For example,
[A-Za-z][A-Za-z_0-9]* is a pattern for identifiers.
 Note: characters between tokens are called
whitespace; these include spaces, tabs, and newlines.
Tools for lexical analysis
Use a lexical analyzer generator tool, such as lex /
flex.
lex = lexical analyzer generator (originally on Unix)
flex = fast lexical analyzer generator (a free
reimplementation, also available on Windows)
flex takes your specification code (regular expressions),
generates a combined NFA to recognize all your
patterns, converts it to an equivalent DFA,
minimizes the automaton as much as possible, and
generates C code that implements it.
How is it processed? With a few command lines:
flex name.l                  produces lex.yy.c
gcc/g++ lex.yy.c             produces a.exe
gcc/g++ lex.yy.c -o token    produces token.exe
flex source program (.l)  -->  flex          -->  lex.yy.c
lex.yy.c                  -->  C++ compiler  -->  a.exe
input                     -->  a.exe         -->  tokens
Flex file format
definition section
%%
rule section
%%
auxiliary procedures

flex input files (flex specifications) are structured as follows:
 The flex input file consists of three sections,
separated by a line containing just %%:
%{
declarations
%}
regular definitions
%%
translation rules
%%
auxiliary procedures (user subroutines)
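Assembled, a minimal complete specification might look like this (a sketch; the `words` counter and the word/whitespace rules are illustrative, not from the original):

```lex
%{
#include <stdio.h>
int words = 0;   /* global declared in the definitions section */
%}
%option noyywrap
WORD    [A-Za-z]+
%%
{WORD}     { words++; printf("word: %s\n", yytext); }
[ \t\n]    { /* skip whitespace */ }
.          { /* ignore anything else */ }
%%
int main(void) {
    yylex();
    printf("total words: %d\n", words);
    return 0;
}
```

Saved as, say, `wc.l`, this would be built with `flex wc.l` followed by `gcc lex.yy.c -o wc`.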
The definitions section is structured as follows:
%{
declarations    /* ordinary variables, constants, #include of header files, global variables */
%}
%option directives
regular definitions

Example:
%{
/* This is a comment inside the definitions section */
#include <math.h>     // may need headers
#include <iostream>   // for cout
%}

 Regular definitions that can be used in the rules section have the
syntax: name definition
Example: DIGIT [0-9]
         ID    [a-z][a-z0-9]*
flex Rules (Translation Rules Section)
The form of the rules is:
P1 action1
P2 action2
...
Pn actionn

where the Pi are regular-expression patterns and the
actioni are C/C++ program segments (actions).
The actions are C/C++ code; if an action takes more
than one line, enclose it in braces: { action }.
In specifying patterns, flex supports a fairly rich set
of conveniences (REs): character classes, repetition, etc.
Example rules:

[a-z]+       cout << "found word: " << yytext << "\n";

[A-Z][a-z]*  { cout << "found capitalized word: ";
               cout << yytext << "\n";
             }
Rules: Most modern lexical-analyzer
generators follow 3 rules
• Look for the longest token
  The longest initial substring that can match any
  regular expression is taken as the next token.
• Rule priority: look for the first-listed pattern
  that matches the longest token
  – For keywords and identifiers, the keyword patterns must be
    written first, then the identifier pattern
• List frequently occurring patterns first
  – e.g., whitespace
User Subroutines Section
• You can use your flex routines in the same way you
use routines in other programming languages.

int main()
{
    yylex();
    return 0;
}

Example
[ \t\n]                 { /* no action and no return */ }
if                      { cout << "keyword found"; }
else                    { cout << "keyword found"; }
[A-Za-z_][A-Za-z0-9_]*  { cout << "ID found"; }
[0-9]+                  { cout << "integer found"; }
"<="                    { cout << "relop found"; }
"=="                    { cout << "relop found"; }
...
%%

option directive
1. Maintaining the line number:
flex can maintain the number of the current line in
the global variable yylineno using the following option
mechanism:
%option yylineno
2. Removing the call to the routine yywrap():
%option noyywrap
- yywrap() is called whenever flex reaches an end-of-file (EOF)
- this option tells flex not to call yywrap()
- flex then treats the first EOF as the end of all input (no more files)

Note: we write these options in the first (definitions) section.


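A definitions section using both options might look like this (a sketch; the "error"-reporting rule is illustrative):

```lex
%{
#include <stdio.h>
%}
%option yylineno    /* maintain the current line number in yylineno */
%option noyywrap    /* treat the first EOF as the end of all input */
%%
"error"    { printf("'error' found on line %d\n", yylineno); }
.|\n       { /* ignore everything else */ }
```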
Some flex Predefined Variables
• yytext -- a string containing the lexeme
• yyleng -- the length of the lexeme
• etc.
• E.g.
[a-z]+      cout << yytext;
[a-zA-Z]+   { words++; chars += yyleng; }

flex Library Routines
• yylex()
– The default main() contains a call to yylex()
• yywrap()
– is called whenever flex reaches an end-of-file (EOF)
– The default yywrap() always returns 1
• yymore()
– appends the next matched token to the current contents of yytext
• yyless(n)
– retains the first n characters of yytext and returns the rest to the input
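For example, yymore() can glue two consecutive matches into one lexeme. This fragment (a sketch adapted from the behavior described above; the patterns are illustrative) would print "mega-kludge" as a single token:

```lex
%%
mega-     { yymore(); /* keep "mega-" and append the next match to it */ }
kludge    { printf("%s\n", yytext); /* yytext is "mega-kludge" here */ }
```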

yylex()
• Most programs with flex scanners use the
scanner to return a stream of tokens that are
handled by a parser
• Each time the program needs a token, it calls
yylex(), which reads a little input and returns the
token
yywrap()
• Used to continue reading from another file
• It is called at EOF
• You can then open another file and return 0, or
• You can return 1, indicating this is the end, or
• You can use %option noyywrap to not include it
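A sketch of the first case: overriding yywrap() in the user subroutines section to switch to a second file once, then stop (next_input.txt is a hypothetical file name):

```lex
%%
.|\n    { /* tokens as usual */ }
%%
int yywrap(void) {          /* called by the scanner at EOF */
    static int done = 0;
    if (!done) {
        done = 1;
        yyin = fopen("next_input.txt", "r");  /* hypothetical second file */
        if (yyin) return 0;  /* 0: continue scanning from the new file */
    }
    return 1;                /* 1: no more input */
}
```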
Reading from a file
• flex reads its input from a global pointer to a C
FILE variable called yyin
• yyin is set to stdin by default
• So all we have to do is set that pointer to our file
handle:

FILE* myfile = fopen("filename", "r");
if (!myfile) {
    cout << "Error opening file" << endl;
    return -1;
}
yyin = myfile;

yylex();
How the input is matched

• When the generated scanner is run, it analyzes


its input looking for strings which match any of
its patterns
• The text corresponding to the match is made
available in the global character pointer yytext
and its length in the global integer yyleng
• The action corresponding to the matched
pattern is then executed and then the
remaining input is scanned for another match
How the input is matched…
• yytext can be defined in two different ways
• You can control which definition flex uses by including
the ‘%pointer’ or ‘%array’ directive in the definitions section of
the flex program
– As a character pointer (%pointer, the default):
• Faster scanning and no buffer overflow when matching very large tokens
• Calls to the unput() function destroy the present contents of yytext
– As a character array (%array):
• Size is YYLMAX, but you can modify it with #define YYLMAX
• Calls to unput() do not destroy yytext
