Lecture 07 PDF
Compiler Construction
CS-4207
Lecture – 07
First we write an input file, lex.l, in the Lex language. The Lex compiler transforms lex.l
into a C file that is always named lex.yy.c. This file is later compiled by the C compiler
into an executable, by default called a.out. That executable is a working lexical analyzer that
receives a stream of input characters and produces a stream of tokens.
Atif Ishaq - Lecturer GC University, Lahore
"<=" {return(TOK_LE);}
">=" {return(TOK_GE);}
"==" {return(TOK_EQ);}
"!=" {return(TOK_NE);}
{D}+ {return(TOK_INTLIT);}
{id} {return(TOK_ID);}
[\n]|[\t]|[ ] ;
%%
The file lex.l includes another file named “tokdefs.h”. The contents of tokdefs.h are:
#define TOK_VOID 1
#define TOK_INT 2
#define TOK_IF 3
#define TOK_ELSE 4
#define TOK_WHILE 5
#define TOK_LE 6
#define TOK_GE 7
#define TOK_EQ 8
#define TOK_NE 9
#define TOK_INTLIT 10 /* integer literal; must not reuse the name TOK_INT, which is already 2 */
#define TOK_ID 111
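To make the effect of these rules concrete, here is a minimal hand-written sketch in C++ of what the generated scanner does for the rules shown above. It is not the code flex produces. The numeric codes are copied from tokdefs.h, but the enumerator names and the function scan() are hypothetical, and the shape of an identifier (a letter or underscore followed by letters, digits or underscores) is an assumption, since the definition of {id} is not shown here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Numeric codes follow tokdefs.h; the enumerator names are illustrative only.
enum Tok { T_LE = 6, T_GE = 7, T_EQ = 8, T_NE = 9, T_NUM = 10, T_ID = 111 };

// Returns (token code, lexeme) pairs for the subset of rules shown above.
// Characters that match no rule are silently skipped in this sketch.
std::vector<std::pair<int, std::string>> scan(const std::string& in) {
    std::vector<std::pair<int, std::string>> out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c == '\n' || c == '\t' || c == ' ') { ++i; continue; }  // whitespace: no token
        if (i + 1 < in.size() && in[i + 1] == '=') {                // "<=" ">=" "==" "!="
            int code = 0;
            switch (c) {
                case '<': code = T_LE; break;
                case '>': code = T_GE; break;
                case '=': code = T_EQ; break;
                case '!': code = T_NE; break;
                default: break;
            }
            if (code != 0) { out.push_back({code, in.substr(i, 2)}); i += 2; continue; }
        }
        if (std::isdigit(c)) {                                      // {D}+
            std::size_t j = i;
            while (j < in.size() && std::isdigit(static_cast<unsigned char>(in[j]))) ++j;
            out.push_back({T_NUM, in.substr(i, j - i)});
            i = j;
            continue;
        }
        if (std::isalpha(c) || c == '_') {                          // {id} (assumed shape)
            std::size_t j = i;
            while (j < in.size() &&
                   (std::isalnum(static_cast<unsigned char>(in[j])) || in[j] == '_')) ++j;
            out.push_back({T_ID, in.substr(i, j - i)});
            i = j;
            continue;
        }
        ++i;  // no rule matched this character
    }
    return out;
}
```

Calling scan("a1 <= 42") yields three pairs: an identifier, the less-or-equal operator and an integer. The blanks produce no token at all, which mirrors the empty action on the whitespace rule.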
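Flex creates C++ classes that implement the lexical analyzer. The code for these classes is placed
in Flex's output file. Below is the code needed to invoke the scanner; it is placed in main.cpp.
Note that in stock flex FlexLexer is an abstract base class and the concrete scanner class is
yyFlexLexer (both declared in FlexLexer.h), so a compilable main.cpp would instantiate yyFlexLexer,
include FlexLexer.h and iostream, and declare main as returning int.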
void main()
{
    FlexLexer lex;
    int tc = lex.yylex();
    while (tc != 0) {
        cout << tc << "," << lex.YYText() << endl;
        tc = lex.yylex();
    }
}
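The following commands can be used to build the scanner executable on Windows. They assume flex
has been set up to write its C++ output to lex.cpp (for example with flex's -+ and -o options, or
the equivalent %option lines in lex.l).
flex lex.l
g++ -c lex.cpp
g++ -c main.cpp
g++ -o lex.exe lex.o main.o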
When the scanner is executed with the file main.cpp as its input, it prints the tokens found in
that file, one per line, as a token code followed by the lexeme:
259,void
258,main
283,(
284,)
285,{
258,FlexLexer
258,lex
290,;
260,int
258,tc
266,=
258,lex
291,.
258,yylex
283,(
284,)
290,;
263,while
283,(
258,tc
276,!=
257,0
284,)
285,{
258,cout
279,<<
258,tc
279,<<
292,","
279,<<
258,lex
291,.
258,YYText
283,(
284,)
279,<<
258,endl
290,;
258,tc
266,=
258,lex
291,.
258,yylex
283,(
284,)
290,;
286,}
286,}
Here is another example for reference. The declaration section includes a pair of special brackets,
%{ and %}. Anything within these brackets is copied directly to the file lex.yy.c and is not
treated as a regular definition. It is the usual place for the definitions of manifest constants,
written with #define. In our second example the manifest constants (LE, GT and so on) are only
listed in a comment, without proper definitions. A sequence of regular definitions follows in the
declaration section, after the closing %} of the manifest constant section.
Regular definitions (already discussed in previous lectures) that are used in later definitions
or in the patterns of the translation rules are surrounded by curly braces. Here ‘delim’ is defined
as shorthand for the character class consisting of newline, tab and space.
In the definitions of id and number, parentheses are used for grouping and do not stand for
themselves. If one of the symbols + , * , . or ? or a parenthesis is to stand for itself, it must be
preceded by a backslash. We can see \. in the definition of number.
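The difference an escaped metacharacter makes can be checked with a small sketch using std::regex from the C++ standard library. Its default ECMAScript syntax is not Lex syntax, but it treats . and \. the same way. The helper matchesWhole() and the two pattern constants are hypothetical names, and the patterns are a simplified stand-in for the definition of number, not the definition itself.

```cpp
#include <regex>
#include <string>

// True when the whole of `text` matches `pattern` (ECMAScript regex syntax).
bool matchesWhole(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

// Raw string literals keep the backslash exactly as it would appear in a pattern.
const std::string kEscapedDot = R"([0-9]+\.[0-9]+)";  // \. matches only a literal '.'
const std::string kBareDot    = R"([0-9]+.[0-9]+)";   // .  matches any single character
```

With these definitions, matchesWhole(kEscapedDot, "3.14") is true and matchesWhole(kEscapedDot, "3x14") is false, whereas matchesWhole(kBareDot, "3x14") is true because the unescaped dot accepts the x.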
White space does not return a token. When white space is encountered in the input, nothing is
returned to the parser and the lexical analyzer goes on to look for the next lexeme. For the
keywords, the regular expression is the keyword itself. If a keyword also matches the pattern for
identifiers, so that two patterns match a lexeme of the same length, the lexical analyzer decides
in favour of whichever pattern is listed first.
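The conflict rule for keywords and identifiers can be sketched as follows: try every pattern at the current position, keep the longest match, and on a tie keep the pattern listed first. This is only an illustration built on std::regex, not how Lex is implemented (Lex compiles all patterns into one automaton). The rule list, the type Rule and the function nextToken() are hypothetical, and the identifier pattern is an assumed one.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

struct Rule {
    std::regex pattern;
    std::string name;
};

// Keywords are listed before the identifier pattern, as in a Lex rules section.
const std::vector<Rule>& rules() {
    static const std::vector<Rule> r = {
        {std::regex("if"), "IF"},
        {std::regex("then"), "THEN"},
        {std::regex("else"), "ELSE"},
        {std::regex("[A-Za-z][A-Za-z0-9]*"), "ID"},  // assumed shape of an identifier
    };
    return r;
}

// Returns (rule name, lexeme length) for the lexeme starting at `pos`,
// or ("", 0) when no rule matches there.
std::pair<std::string, std::size_t> nextToken(const std::string& in, std::size_t pos) {
    std::string best;
    std::size_t bestLen = 0;
    for (const Rule& r : rules()) {
        std::smatch m;
        // match_continuous forces the match to begin exactly at `pos`.
        if (std::regex_search(in.begin() + static_cast<std::ptrdiff_t>(pos), in.end(), m,
                              r.pattern, std::regex_constants::match_continuous)) {
            std::size_t len = static_cast<std::size_t>(m.length(0));
            if (len > bestLen) {  // strictly longer only, so the earlier rule wins a tie
                best = r.name;
                bestLen = len;
            }
        }
    }
    return {best, bestLen};
}
```

For the input if, both IF and ID match two characters and IF is reported because it is listed first. For iffy, the identifier pattern matches four characters and wins as the longer match.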
%{
/*definition of manifest constants
LT , LE , EQ , NE , GT , GE , IF, THEN , ELSE , ID , NUMBER , RELOP */
%}
%%
"<="
%%
yyin :- the input stream pointer, i.e. it points to the input file that is to be scanned (tokenised).
By default, and with the default main(), it is stdin.
yylex() :- the main entry point of the scanner. It reads the input stream, generates tokens, and
returns zero at the end of the input stream. Each time yylex() is called, the scanner continues
processing the input from where it last left off.
yytext :- a buffer that holds the input characters that actually matched the pattern (i.e. the
lexeme); equivalently, a pointer to the matched string.
yyleng :- the length of the lexeme.
Lex automatically reads one character ahead of the last character that forms the selected lexeme,
and then retracts the input so that only the lexeme itself is consumed from the input.