Lex Material 1

Uploaded by

sampath reddy
Lex

• During the first phase the compiler reads the input and converts strings in
the source to tokens.

• With regular expressions we can specify patterns to lex so it can generate
code that will allow it to scan and match strings in the input.

• Each pattern specified in the input to lex has an associated action.

• Typically an action returns a token that represents the matched string for
subsequent use by the parser.
How Lex Works
• The RE to recognize identifiers is:

• Lex will read this pattern and produce C code for a lexical analyzer that
scans for identifiers.

• The corresponding FSM is:

• Any FSM can be converted into a computer program.


FSM to Program Conversion

• The corresponding Program can be:
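The slide's own program is not reproduced here, so the following is only a sketch of what such a conversion can look like in C. It assumes the identifier pattern is letter(letter|digit)*; the state names and the function scan_identifier are illustrative, not taken from the slides.

```c
#include <ctype.h>
#include <stddef.h>

/* States of the hand-written machine (names are illustrative). */
enum state { START, IN_ID, DONE };

/* Returns the length of the identifier at the start of s, or 0 if s does
   not begin with a letter. Assumes the pattern letter(letter|digit)*. */
size_t scan_identifier(const char *s)
{
    enum state st = START;
    size_t i = 0;

    while (st != DONE) {
        unsigned char ch = (unsigned char)s[i];
        switch (st) {
        case START:
            if (isalpha(ch)) { st = IN_ID; i++; }   /* first letter seen */
            else             { st = DONE; }         /* no identifier here */
            break;
        case IN_ID:
            if (isalnum(ch)) { i++; }               /* stay in IN_ID */
            else             { st = DONE; }         /* token ends here */
            break;
        case DONE:
            break;
        }
    }
    return i;
}
```

For example, scan_identifier("count1 = 0") consumes the six characters of count1 and stops at the space.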


Lex Technique
• Regular expressions are translated by lex to a computer program that
mimics an FSA.

• Using the next input character and current state the next state is easily
determined by indexing into a computer-generated state table.
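A minimal sketch of the table lookup, written by hand rather than generated by lex. It assumes the identifier pattern letter(letter|digit)*; the table layout, the state and class names, and match_identifier are all illustrative assumptions.

```c
#include <stddef.h>

enum { S_START, S_IN_ID, S_REJECT, N_STATES };    /* states */
enum { C_LETTER, C_DIGIT, C_OTHER, N_CLASSES };   /* input character classes */

/* next_state[current state][class of next character] */
static const int next_state[N_STATES][N_CLASSES] = {
    /*             letter    digit     other    */
    /* START  */ { S_IN_ID,  S_REJECT, S_REJECT },
    /* IN_ID  */ { S_IN_ID,  S_IN_ID,  S_REJECT },
    /* REJECT */ { S_REJECT, S_REJECT, S_REJECT },
};

/* Map a character to its column in the table. */
static int classify(char ch)
{
    if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')) return C_LETTER;
    if (ch >= '0' && ch <= '9') return C_DIGIT;
    return C_OTHER;
}

/* Returns the length of the identifier at the start of s, or 0 if none. */
size_t match_identifier(const char *s)
{
    int state = S_START;
    size_t i = 0, last_accept = 0;

    while (s[i] != '\0') {
        state = next_state[state][classify(s[i])];   /* one table lookup per character */
        if (state == S_REJECT) break;
        i++;
        if (state == S_IN_ID) last_accept = i;       /* IN_ID is the accepting state */
    }
    return last_accept;
}
```

The loop itself never changes; a different pattern only needs a different table, which is why a generator can produce it mechanically.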
Limitations of Lex
• What could be the biggest limitation?

• What are the limitations of FAs?

• lex cannot be used to recognize nested structures such as
parentheses.

• Nested structures are handled by incorporating a stack. Whenever
we encounter a “(” we push it on the stack. When a “)” is
encountered we match it with the top of the stack and pop the
stack.
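The push and pop steps can be sketched in C as follows. The function name parens_balanced and the fixed MAX_DEPTH limit are assumptions made for this example only.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 256   /* arbitrary nesting limit chosen for this sketch */

/* Returns true if every "(" in s is closed by a later ")". */
bool parens_balanced(const char *s)
{
    char stack[MAX_DEPTH];
    size_t top = 0;

    for (; *s != '\0'; s++) {
        if (*s == '(') {
            if (top == MAX_DEPTH) return false;   /* too deep for this sketch */
            stack[top++] = *s;                    /* push */
        } else if (*s == ')') {
            if (top == 0) return false;           /* nothing on the stack to match */
            top--;                                /* pop the matching "(" */
        }
    }
    return top == 0;                              /* leftovers mean an unclosed "(" */
}
```

The stack depth is not bounded by the pattern, which is what a finite set of states cannot track.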
Limitations of Lex
• Yacc augments an FSA with a stack and can process constructs
such as parentheses with ease.

• Lex is good at pattern matching.

• Yacc is appropriate for more challenging tasks.


Special Characters
Operators
Character Class
• A character class matches a single character, and normal operators lose their
meaning inside it.

• Two operators allowed in a character class are the hyphen (“-”) and circumflex
(“^”).

• When used between two characters the hyphen represents a range of
characters.

• The circumflex, when used as the first character, negates the expression.
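To make the hyphen and circumflex rules concrete, here is a small sketch that tests one character against the body of a character class (the text between the brackets). The function in_class is hypothetical and handles only ranges and a leading circumflex; escape sequences are not supported.

```c
#include <stdbool.h>

/* cls is the text between the brackets, e.g. "a-z0-9_" or "^0-9". */
bool in_class(const char *cls, char ch)
{
    bool negate = false, found = false;

    if (*cls == '^') { negate = true; cls++; }     /* leading ^ negates the class */

    while (*cls != '\0') {
        if (cls[1] == '-' && cls[2] != '\0') {     /* lo-hi is a range */
            if (ch >= cls[0] && ch <= cls[2]) found = true;
            cls += 3;
        } else {                                   /* any other character is literal */
            if (ch == cls[0]) found = true;
            cls++;
        }
    }
    return negate ? !found : found;
}
```

With this sketch, in_class("+*", '*') is true because + and * are ordinary characters inside a class.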
Character Class
Pattern Matching
• If two patterns match the same string, the longest match wins.

• In case both matches are the same length, then the first
pattern listed is used.
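A sketch of the two rules in C, using hand-written matchers in place of lex patterns. The two example rules (the keyword if, then an identifier) and the names matcher, rules and pick_rule are assumptions for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A matcher returns how many characters it matches at the start of s. */
typedef size_t (*matcher)(const char *s);

/* Rule 0: the keyword "if". */
static size_t match_kw_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

/* Rule 1: an identifier, letter(letter|digit)*. */
static size_t match_ident(const char *s)
{
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

/* Rules in the order they would be listed in the lex file. */
static const matcher rules[] = { match_kw_if, match_ident };

/* Returns the index of the winning rule (or -1) and stores the match length. */
int pick_rule(const char *s, size_t *len)
{
    int best = -1;
    *len = 0;
    for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++) {
        size_t n = rules[r](s);
        if (n > *len) {          /* strictly longer wins; a tie keeps the earlier rule */
            *len = n;
            best = (int)r;
        }
    }
    return best;
}
```

On "if (x)" both rules match two characters, so rule 0 (the keyword) is chosen; on "iffy" the identifier rule matches four characters and wins.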
How to write Lex Code?
• Input to Lex is divided into three sections with %% dividing the
sections.
First Lex Program
• ECHO is a macro that writes the text matched by the pattern. This is the default
action for any unmatched strings. Typically, ECHO is defined as:

• Variable yytext is a pointer to the matched string (NUL-terminated) and yyleng
is the length of the matched string.

• Variable yyout is the output file and defaults to stdout.

• Function yylex is the main entry point for lex.

• Function yywrap is called by lex when input is exhausted. Return 1 if you
are done or 0 if more processing is required.
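The definition itself is missing from the slide. A commonly cited form is a single fwrite call, and the sketch below assumes that form. The variables here are plain C stand-ins for the ones a lex-generated scanner provides (their real types can differ), and echo_match is a hypothetical helper used only to exercise the macro.

```c
#include <stdio.h>
#include <string.h>

/* Stand-ins for the variables supplied by a lex-generated scanner. */
static const char *yytext = "";   /* matched text */
static size_t yyleng = 0;         /* length of the matched text */
static FILE *yyout = NULL;        /* output stream */

/* Assumed definition: copy the matched text to yyout unchanged. */
#define ECHO fwrite(yytext, yyleng, 1, yyout)

/* Pretend the scanner has just matched `match`, then run ECHO. */
void echo_match(const char *match, FILE *out)
{
    yytext = match;
    yyleng = strlen(match);
    yyout = out;
    ECHO;
}
```

Calling echo_match("abc", stdout) writes abc, which is what the default action does for text no rule matches.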
Lex Predefined Variables
Lex Definition Section
• The definitions section is composed of substitutions, code, and
start states. Code in the definitions section is simply copied as-is to
the top of the generated C file and must be bracketed with “%{”
and “%}” markers.

• Substitutions simplify pattern-matching rules. For example, we may
define digits and letters:
digit [0-9]
letter [A-Za-z]
Commands
• Save the file with the extension .l or .lex
– abc.l or abc.lex
• Run lex on the file with lex abc.l or flex abc.lex
– It will create lex.yy.c

• Now we have a C file in hand, so compile it:
– cc lex.yy.c -ll (or -lfl)
– -ll links the lex library (-lfl links the flex library)
• Execution: ./a.out
Example2: Lex Program
(Count the number of identifiers)
Example3: Lex Program
(Count the number of characters, words and lines)
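How to Pass a File to the Generated Lex Scanner

/* nchar, nword and nline are the counters declared in Example3 */
int main(void)
{
    yyin = fopen("abc.c", "r");      /* scanner input */
    yyout = fopen("def.txt", "w");   /* scanner output */
    if (yyin == NULL || yyout == NULL)
        return 1;                    /* a file could not be opened */
    yylex();
    fprintf(yyout, "%d \t %d \t %d\n", nchar, nword, nline);
    fclose(yyin);
    fclose(yyout);
    return 0;
}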
Programming Exercise

Ex1. Count the number of words in a given text file.

Ex2. Count the number of consonants and vowels in a text file.


Count of number of lines, words, characters in a file
%{
#include<stdio.h>
int w=0,c=0,l=0;
%}
%%
\n { l++; }
[^ \t\n]+ { w++; c=c+yyleng; }
. { c++; }
%%
int main()
{
    yyin=fopen("abc.txt","r");
    yylex();
    printf("c=%d w=%d l=%d",c,w,l);
    fclose(yyin);
    return 0;
}
Sample example to print into file

// Open file for writing
FILE *fp = fopen("out_file.txt", "w");
fprintf(fp, "x = %f, y = %f, vx = %f, vy = %f", x, y, vx, vy);
fclose(fp);
Lex program for C Comment statements
%{
#include<stdio.h>
int c=0,m=0;
%}
%%
"//"[^\n]* {c++; printf("Single line comment\n");}
"/*"([^*]|\*+[^*/])*\*+"/" {m++; printf("Multiline comment\n");}
%%
int main()
{
    yylex();
    printf("c=%d m=%d",c,m);
    return 0;
}
Multiline Comment

"/*"([^*]|\*+[^*/])*\*+"/"


Fixed size variables for lex
%{
#include<stdio.h>
%}
key (while|int)
%%
{key} {printf("Keywords\n");}
[a-zA-Z_][a-zA-Z_0-9]{1,3} {printf("variables\n");}
%%
int main()
{
    yylex();
    return 0;
}
Programming Exercise

Ex3. Write a Lex program to identify all possible tokens in a given
program input. For example:
Header file
Keywords
Relational Operators
Single Line Comment
Multi-line comments
Identifiers
Preprocessor Directive
Data Types
Digits
Programming Exercise
Ex4. Write a Lex program to identify :
Functions without argument.
Functions with one argument.
Functions with n arguments.
Note: Function declarations and function calls should both be treated as functions.
Kinds of function declaration:
int add(int a, int b, int c)
add(int a, int b, int c)
add(int, int, int)
int add(int, int, int)
Programming Exercise
Ex5: Count the number of each type of function you have identified in Ex4.

Ex6: Differentiate user-defined functions from the library functions provided with C.
(Your scanner should mention whether a function is user defined or not)

Ex7: Write a lex program to check whether the given number is prime or not.
Program to recognize datatype, ID, num, header file, pre-processor directives
%{
#include<stdio.h>
%}
DIGIT [0-9]+
ID [a-zA-Z_][a-zA-Z0-9_]*
DATATYPE (int|float|double|char)
%%
"//".* printf("Comment:%s\n", yytext);
"#"{ID} printf("Preprocessor Directive %s \n", yytext);
"<"[^>\n]*".h>" printf("Header file %s\n", yytext);
{DIGIT} {printf("Integer:%s \n", yytext);}
{DATATYPE} {printf("Datatype is %s\n", yytext);}
{ID} {printf("Identifier %s\n",yytext);}
%%
int main()
{
    yyin = fopen("abc.c", "r");
    yyout = fopen("def.txt", "w");
    yylex();
    fclose(yyin);
    fclose(yyout);
    return 0;
}
/* lex code to determine whether input is an identifier or not */
%{
#include <stdio.h>
%}
/* rule section */
%%
 /* regex for valid identifiers */
^[a-zA-Z_][a-zA-Z0-9_]* printf("Valid Identifier\n");
 /* regex for invalid identifiers */
^[^a-zA-Z_\n] printf("Invalid Identifier\n");
. ;
%%

int main(void)
{
    yylex();
    return 0;
}
/* Lex Program to check valid Mobile Number */
%{
/* Definition section */
%}

/* Rule Section */
%%

[1-9][0-9]{9} {printf("\nMobile Number Valid\n");}

.+ {printf("\nMobile Number Invalid\n");}

%%

int main()
{
printf("\nEnter Mobile Number : ");
yylex();
printf("\n");
return 0;
}
/* Lex Program to accept string starting with vowel */
%{
#include <stdio.h>
int flag = 0;
%}
%%
[aeiouAEIOU][a-zA-Z0-9]* flag=1;
[a-zA-Z0-9]+ ;
%%
int main(void)
{
    yylex();
    if (flag == 1)
        printf("Accepted");
    else
        printf("Not Accepted");
    return 0;
}
/* Lex program to Identify and Count Positive and Negative Numbers */
%{
#include <stdio.h>
int positiveno = 0, negativeno = 0;
%}
/* Rules for identifying and counting positive and negative numbers */
%%
[-][0-9]+ {negativeno++; printf("negative number = %s\n", yytext);} /* negative no */
[0-9]+ {positiveno++; printf("positive number = %s\n", yytext);} /* positive no */
%%
int main(void)
{
    yylex();
    printf("no of positive numbers = %d, no of negative numbers = %d\n", positiveno, negativeno);
    return 0;
}
Note: yytext holds the matched text as a string. If it is a number, it needs to be
converted into a number using the atoi function.
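A sketch of that conversion as a standalone C function. It uses strtol instead of atoi, because atoi cannot report that the text was not a number; token_to_int is a hypothetical helper name, not part of lex.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Converts matched text such as "42" or "-17" to an int.
   Returns false if the text is not a complete decimal integer that fits in int. */
bool token_to_int(const char *text, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || *end != '\0') return false;              /* no digits, or trailing junk */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX) return false;
    *out = (int)v;
    return true;
}
```

Inside a rule action this would be called as token_to_int(yytext, &value), with value declared in the definitions section.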
/* Lex Program to check whether a number is Prime or Not */
%{
/* Definition section */
#include<stdio.h>
#include<stdlib.h>
int flag,c,j;
%}
/* Rule Section */
%%
[0-9]+ {c=atoi(yytext);
    flag=0; /* reset for every number read */
    if(c==2)
    { printf("\n Prime number"); }
    else if(c==0 || c==1)
    { printf("\n Not a Prime number"); }
    else
    { for(j=2;j<c;j++)
      { if(c%j==0) flag=1; }
      if(flag==1)
        printf("\n Not a prime number");
      else if(flag==0)
        printf("\n Prime number");
    }}
%%
/* driver code */
int main()
{
    yylex();
    return 0;
}
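One way to keep the rule's action short is to move the primality test into a C helper placed in the subroutines section. The sketch below is an assumption about how that could look; is_prime is a hypothetical name, and it stops dividing once the divisor exceeds the square root of n.

```c
#include <stdbool.h>

/* Returns true if n is a prime number. */
bool is_prime(int n)
{
    if (n < 2) return false;                 /* 0 and 1 are not prime */
    for (int j = 2; j <= n / j; j++)         /* j <= n / j means j * j <= n, without overflow */
        if (n % j == 0) return false;        /* found a divisor */
    return true;
}
```

The action then reduces to a single test such as is_prime(atoi(yytext)), with no counter or flag shared between matches.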
Tips
• Press Ctrl+D to end the input and see the output
• Use [] to select a set of symbols
• Use ^ for the beginning of the line
• Use [^a-z] to take the complement of symbols or a range
• Use proper regular expressions and print the results for valid input
• Also print a message for invalid input
• List the regular expressions in priority order: when two matches have the same length, the first rule listed wins
• The longest match is considered
• Use flags to print a valid match
• Use the . operator to accept any character other than newline
• Use yywrap(), which returns 1 if done, otherwise 0
• Use fixed ranges such as {1,3} or {3} wherever there is a limit on the closure value
• Don’t put unnecessary spaces in the regular expression
