Lex Material 1
• During the first phase the compiler reads the input and converts strings in
the source to tokens.
• Typically an action returns a token that represents the matched string for
subsequent use by the parser.
How Lex Works
• The RE to recognize identifiers is:
• Lex will read this pattern and produce C code for a lexical analyzer that
scans for identifiers.
• Using the next input character and current state the next state is easily
determined by indexing into a computer-generated state table.
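The table lookup described above can be sketched in plain C. This is a hand-written illustration, not the code Lex actually generates. It assumes an identifier is a letter or underscore followed by letters, digits, or underscores, and the names next_state, char_class, and scan_identifier are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Character classes used as column indices into the state table. */
enum { CC_LETTER, CC_DIGIT, CC_OTHER, CC_COUNT };

/* States: start, inside an identifier, dead (no match possible). */
enum { ST_START, ST_IDENT, ST_DEAD, ST_COUNT };

/* next_state[state][class]: a hand-built stand-in for a generated table. */
static const int next_state[ST_COUNT][CC_COUNT] = {
    /* ST_START */ { ST_IDENT, ST_DEAD,  ST_DEAD },
    /* ST_IDENT */ { ST_IDENT, ST_IDENT, ST_DEAD },
    /* ST_DEAD  */ { ST_DEAD,  ST_DEAD,  ST_DEAD },
};

/* Maps an input character to its column in the table. */
static int char_class(int ch)
{
    if (isalpha(ch) || ch == '_') return CC_LETTER;
    if (isdigit(ch)) return CC_DIGIT;
    return CC_OTHER;
}

/* Returns the length of the longest identifier prefix of s (0 if none). */
size_t scan_identifier(const char *s)
{
    int state = ST_START;
    size_t len = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        /* Next state = table[current state][class of next character]. */
        state = next_state[state][char_class((unsigned char)s[i])];
        if (state == ST_DEAD) break;
        len = i + 1;  /* ST_IDENT is the only accepting state */
    }
    return len;
}
```

Calling scan_identifier on a line of source text gives the number of leading characters that form an identifier, which is the part a scanner would hand to its action.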
Limitations of Lex
• What could be the biggest limitation?
Character Class
• Two operators allowed in a character class are the hyphen (“-”) and circumflex
(“^”).
• The circumflex, when used as the first character, negates the expression.
Pattern Matching
• If two patterns match the same string, the longest match wins.
• In case both matches are the same length, then the first
pattern listed is used.
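The two rules above (longest match first, then first pattern listed) can be illustrated with a small C sketch. It models each pattern as a hypothetical matcher function that reports how many leading characters it matches. The two sample patterns (the keyword int and a simple identifier) and the name pick_rule are assumptions made for the example, not part of Lex.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each hypothetical "pattern" reports how many leading characters it matches. */
typedef size_t (*matcher_fn)(const char *s);

/* Matches the keyword "int" exactly. */
static size_t match_keyword_int(const char *s)
{
    return strncmp(s, "int", 3) == 0 ? 3 : 0;
}

/* Matches a letter followed by letters or digits. */
static size_t match_identifier(const char *s)
{
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

/* Rules in listing order: earlier entries have priority on ties. */
static const matcher_fn rules[] = { match_keyword_int, match_identifier };

/* Returns the index of the winning rule, or -1 if none matches;
   the length of the match is stored in *len. */
int pick_rule(const char *s, size_t *len)
{
    int best = -1;
    *len = 0;
    for (int i = 0; i < (int)(sizeof rules / sizeof rules[0]); i++) {
        size_t n = rules[i](s);
        if (n > *len) {  /* strictly longer only, so ties keep the earlier rule */
            *len = n;
            best = i;
        }
    }
    return best;
}
```

With this ordering, the input "int x" selects the keyword rule (both rules match three characters, and the keyword is listed first), while "integer" selects the identifier rule because its match is longer.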
How to write Lex Code?
• Input to Lex is divided into three sections with %% dividing the
sections.
First Lex Program
• ECHO is a macro that writes the text matched by the pattern. This is the default
action for any unmatched strings. Typically, ECHO is defined as:
#define ECHO fwrite(yytext, yyleng, 1, yyout)
int main(void)
{
    yyin = fopen("abc.c", "r");     /* read source from abc.c */
    yyout = fopen("def.txt", "w");  /* write results to def.txt */
    yylex();
    fprintf(yyout, "%d \t %d \t %d\n", nchar, nword, nline);
    fclose(yyin);
    fclose(yyout);
    return 0;
}
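The driver above prints nchar, nword, and nline, but the rules that update them are not shown. As a rough sketch of the counting such rules would perform, here is the same bookkeeping in plain C. It assumes every character counts toward nchar, a word is a run of non-whitespace characters, and a line ends at a newline. The function count_text is hypothetical.

```c
#include <ctype.h>

/* Counters named after the variables printed by the driver. */
int nchar = 0, nword = 0, nline = 0;

/* Updates the counters for one NUL-terminated chunk of input. */
void count_text(const char *s)
{
    int in_word = 0;  /* 1 while scanning inside a word */
    for (; *s != '\0'; s++) {
        nchar++;
        if (*s == '\n') nline++;
        if (isspace((unsigned char)*s)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;  /* first character of a new word */
            nword++;
        }
    }
}
```

In a Lex specification the same effect is usually obtained with one rule per case (newline, word, any other character), each action incrementing the relevant counters.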
Programming Exercise
Ex5: Count the number of each type of function you identified in Ex4.
Ex6: Differentiate user-defined functions from the special functions provided with C.
(Your scanner should report whether a function is user defined or not.)
Ex7: Write a Lex program to check whether a given number is prime or not.
Program to recognize datatype, ID, num, header file, pre-processor directives
%{
#include <stdio.h>
%}
DIGIT    [0-9]+
ID       [a-zA-Z_][a-zA-Z0-9_]*
DATATYPE (int|float|double|char)
%%
"//".*              printf("Comment: %s\n", yytext);
"#"[ \t]*{ID}       printf("Preprocessor Directive %s\n", yytext);
"<"[^>\n]*".h>"     printf("Header file %s\n", yytext);
{DIGIT}             { printf("Integer: %s\n", yytext); }
{DATATYPE}          { printf("Datatype is %s\n", yytext); }
{ID}                { printf("Identifier %s\n", yytext); }
%%
int main(void)
{
    yyin = fopen("abc.c", "r");
    yyout = fopen("def.txt", "w");
    yylex();
    fclose(yyin);
    fclose(yyout);
    return 0;
}
/* Lex code to determine whether input is an identifier or not */
%{
#include <stdio.h>
%}
/* rule section */
%%
    /* regex for valid identifiers */
^[a-zA-Z_][a-zA-Z0-9_]*    printf("Valid Identifier");
    /* regex for invalid identifiers */
^[^a-zA-Z_]                printf("Invalid Identifier");
.                          ;
%%
int main(void)
{
    yylex();
    return 0;
}
/* Lex Program to check valid Mobile Number */
%{
/* Definition section */
%}
/* Rule Section */
%%
%%
int main()
{
printf("\nEnter Mobile Number : ");
yylex();
printf("\n");
return 0;
}
/* Lex Program to accept string starting with vowel */
%{
#include <stdio.h>
int flag = 0;
%}
%%
^[aeiouAEIOU][a-zA-Z0-9.]*  flag=1;
[a-zA-Z0-9]+                ;
%%
int main(void)
{
    yylex();
    if (flag == 1)
        printf("Accepted");
    else
        printf("Not Accepted");
    return 0;
}
/* Lex program to Identify and Count Positive and Negative Numbers */
%{
int positiveno = 0, negativeno = 0;
%}
/* Rules for identifying and counting positive and negative numbers*/
%%
^[-][0-9]+ {negativeno++; printf("negative number = %s\n",yytext);} // negative no
[0-9]+ {positiveno++; printf("positive number = %s\n", yytext);} // positive no
%%
int main(void)
{
    yylex();
    printf("no of positive numbers = %d, no of negative numbers = %d\n", positiveno, negativeno);
    return 0;
}
Note: yytext holds the matched text as a string; if the text is a number, it must be
converted to a numeric value using the atoi function.
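The conversion mentioned in the note can be sketched in C as follows. The parameter text stands in for yytext, and classify_number is a hypothetical helper that mirrors the two rules of the program: a leading minus sign counts as negative, anything else as positive.

```c
#include <stdlib.h>

/* Counters named after those in the Lex program. */
int positiveno = 0, negativeno = 0;

/* text stands in for yytext: the matched lexeme as a C string.
   Returns its numeric value and updates the matching counter. */
int classify_number(const char *text)
{
    int value = atoi(text);  /* converts the digit string to an int */
    if (text[0] == '-')
        negativeno++;
    else
        positiveno++;
    return value;
}
```

atoi performs no error checking, so this only makes sense when the pattern has already guaranteed that the text is a well-formed integer.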
/* Lex Program to check whether a number is Prime or Not */
%{
/* Definition section */
#include <stdio.h>
#include <stdlib.h>
int flag, c, j;
%}
/* Rule Section */
%%
[0-9]+ { c = atoi(yytext);
         if (c == 2)
         { printf("\n Prime number"); }
         else if (c == 0 || c == 1)
         { printf("\n Not a Prime number"); }
         else
         { flag = 0;  /* reset for each number */
           for (j = 2; j < c; j++)
           { if (c % j == 0) flag = 1; }
           if (flag == 1)
             printf("\n Not a prime number");
           else
             printf("\n Prime number");
         }}
%%
/* driver code */
int main()
{
    yylex();
    return 0;
}
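The primality test inside the action can also be written as a standalone C function, which makes it easier to check on its own. This is a sketch using trial division up to the square root, a slightly faster variant of the loop in the program. The names is_prime and is_prime_text are hypothetical.

```c
#include <stdlib.h>

/* Returns 1 if n is prime, 0 otherwise. */
int is_prime(int n)
{
    if (n < 2) return 0;  /* 0 and 1 are not prime */
    /* j <= n / j is the same test as j * j <= n, without overflow. */
    for (int j = 2; j <= n / j; j++) {
        if (n % j == 0) return 0;
    }
    return 1;
}

/* text stands in for yytext: convert the matched digits, then test. */
int is_prime_text(const char *text)
{
    return is_prime(atoi(text));
}
```

Because the function returns as soon as a divisor is found, it needs no flag variable and therefore nothing to reset between numbers.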
Tips
• Press Ctrl+D to end the input and see the output
• Use [] to define a character class (a set of symbols)
• Use ^ for the beginning of the line
• Use [^a-z] to take the complement of a set of symbols or a range
• Use a proper regular expression and print the results for valid input
• Also print a message for invalid input
• List the regular expressions in priority order: when two matches have the same length, the first pattern listed is used
• The longest match is considered
• Use flags to record and print a valid match
• Use the . operator to accept any character other than newline
• Define yywrap(), which returns 1 when the input is done and 0 otherwise
• Use fixed ranges such as {1,3} or {3} wherever there is a limit on the closure value
• Don’t put unnecessary spaces in the regular expression