Lex Tool:

Lex is a tool that reads a specification file (typically with the .l extension), containing
regular expressions, and generates C code that implements a lexical analyzer. The
lexical analyzer scans the input text, recognizing patterns and converting it into a
sequence of tokens.
A Lex program consists of three parts, separated by %% delimiters:
Declarations
%%
Translation rules
%%
Auxiliary procedures
Definitions Section: This section contains user-defined macros, global variables,
constants, and any necessary header files, along with named regular-expression
definitions that will be reused in the rules section.
Rules Section: The rules section, placed between the two %% delimiters, is where
you write the regular expressions and their associated actions.
Each rule consists of a regular-expression pattern and an action: the pattern
specifies the strings to be matched, and the action, usually written in C,
specifies the operation to perform when the pattern is matched.
User Code Section: The user code section, placed after the second %% delimiter,
contains C code that is copied verbatim into the generated lexical analyzer.
It typically includes the main() function and any necessary initialization or
cleanup routines; yylex() (the function generated by Lex) is called in this
section to start the lexical analysis.
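The three sections can be put together in a small specification. The following sketch reconstructs the kind of example the explanation that follows refers to; the exact patterns and messages are illustrative:

```lex
%{
#include <stdio.h>   /* copied verbatim into the generated C file */
%}

%%
[0-9]+      { printf("Number: %s\n", yytext); }
[a-zA-Z]+   { printf("Identifier: %s\n", yytext); }
"+"         { printf("Operator: +\n"); }
"="         { printf("Assignment: =\n"); }
[ \t\n]+    { /* ignore whitespace */ }
%%

int main(void) {
    yylex();         /* start scanning standard input */
    return 0;
}

int yywrap(void) { return 1; }   /* signal end of input */
```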
Explanation of the Example:
● %{ ... %}: Code enclosed in these delimiters is copied directly into the generated C
file. Here, we include the stdio.h header.
● %%: Delimiters that separate the different sections of the Lex file.
● Pattern-Action Pairs:
○ [0-9]+: Matches one or more digits. When matched, it prints the number.
○ [ \t\n]+: Matches whitespace and ignores it.
○ [a-zA-Z]+: Matches identifiers (alphabetic strings).
○ "+" and "=": Match the + operator and = assignment, respectively,
printing the corresponding output.
● int main(): The main function calls yylex(), which is the generated function to
start scanning the input.
Processing of the Lex File
Once you write the Lex specification file, you can feed it into the Lex tool. The Lex tool
then processes the file and performs the following steps:
1. Lexical Analysis:
○ Lex reads the specification file and generates a C source file (lex.yy.c) that
implements a lexical analyzer.
○ The Lex tool internally creates a finite automaton for the regular
expressions and embeds this automaton in the generated C code.
2. Finite Automaton (DFA/NFA):
○ NFA Construction: Internally, Lex converts each regular expression into a
Non-deterministic Finite Automaton (NFA).
○ DFA Construction: Then, it converts the NFA into a Deterministic Finite
Automaton (DFA), which is used for efficient pattern matching. Lex
optimizes the DFA to improve scanning performance.
3. Generating C Code:
○ Lex generates a C source file (lex.yy.c) that contains the code for the
lexical analyzer.
○ This file includes:
■ yylex(): A function that performs the scanning of the input stream
and matches patterns against the defined regular expressions.
■ yytext: A global variable that holds the current text matched by a
pattern.
■ yyin and yyout: Input and output files, respectively, which are used
by Lex for reading and writing data.
■ State Machine: The core of the generated code, which implements
the DFA/NFA.
4. Compiling the Generated Code:
○ After generating the lex.yy.c file, the next step is to compile the C file into
an object code. This can be done using a C compiler.
Example Command:
lex example.l # Generate lex.yy.c
gcc lex.yy.c -o lexer -lfl # Compile and link with the Flex library
5. Running the Lexer:
○ Once compiled, the generated lexical analyzer can be executed, which
reads the input, matches the patterns, and executes the associated
actions (e.g., printing tokens, recognizing keywords, etc.).
Example Command:
./lexer < input.txt # Runs the lexer on input.txt
Key Predefined Variables in Lex
1. yytext
● Purpose: Contains the text of the current token matched by the regular
expression.
● Type: char* (a pointer to a string).
● Usage:
○ You can access yytext in the action part of a rule to refer to the
string that was matched.
○ It is automatically updated after each successful pattern match.
Example: [a-zA-Z]+ { printf("Identifier: %s\n", yytext); }
2. yyleng
● Purpose: Stores the length of the string in yytext.
● Type: int.
● Usage:
○ Indicates the number of characters matched by the regular
expression.
○ Useful for validating the length of tokens.
Example: [a-zA-Z]+ { printf("Identifier (%d characters): %s\n", yyleng, yytext); }
3. yylineno
● Purpose: Tracks the current line number in the input file being scanned.
● Type: int.
● Usage:
○ Helps in error reporting and debugging by providing the line number
where a token is found.
○ In Flex, line tracking must be enabled with %option yylineno in the
definitions section.
Example: . { printf("Unrecognized character '%s' at line %d\n", yytext, yylineno); }
4. yyin
● Purpose: Points to the input file being scanned.
● Type: FILE*.
● Default Value: stdin.
● Usage:
○ By default, Lex reads from the standard input. You can assign yyin
to another file pointer to scan from a file.
Example: yyin = fopen("input.txt", "r"); // Redirect input to "input.txt"
5. yyout
● Purpose: Points to the output file for token processing.
● Type: FILE*.
● Default Value: stdout.
● Usage:
○ You can redirect output to a specific file by changing the value of
yyout.
Example: yyout = fopen("output.txt", "w"); // Redirect output to "output.txt"
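Putting yyin and yyout together, the user code section can open both files before scanning begins. A minimal sketch (the file names are illustrative):

```lex
%%
[a-zA-Z]+   { fprintf(yyout, "word: %s\n", yytext); }
.|\n        { /* ignore everything else */ }
%%

int main(void) {
    yyin  = fopen("input.txt", "r");   /* read from a file instead of stdin */
    yyout = fopen("output.txt", "w");  /* write matches to a file instead of stdout */
    yylex();
    fclose(yyin);
    fclose(yyout);
    return 0;
}

int yywrap(void) { return 1; }
```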
6. yywrap()
● Purpose: Determines what happens when the end of the input is reached.
● Type: Function.
● Default Behavior: Returns 1, signaling the end of input.
● Usage:
○ If you want to provide additional input after reaching the end of a
file, you can override yywrap() to return 0 and continue scanning.
Example: int yywrap() {
    return 1; // Indicate end of input
}
7. YYSTATE
● Purpose: Represents the current state of the scanner.
● Type: int.
● Usage:
○ Used in conjunction with start conditions to manage lexical analysis
in different contexts.
○ You can explicitly set or check the scanner's state.
Example:
%x COMMENT
%%
"/*"            { BEGIN(COMMENT); }
<COMMENT>"*/"   { BEGIN(INITIAL); }
<COMMENT>.|\n   { /* skip comment contents */ }
8. YY_START
● Purpose: Expands to the current start condition of the scanner.
● Type: Macro (evaluates to an int).
● Usage:
○ Can be used in actions to check which start condition the scanner
is currently in, and to switch contexts accordingly.
Example:
%x STRING
%%
"\"" { printf("Entering STRING mode\n"); BEGIN(STRING); }
<STRING>. { printf("STRING content: %s\n", yytext); }
<STRING>"\"" { printf("Exiting STRING mode\n"); BEGIN(INITIAL); }
9. YY_USER_ACTION
● Purpose: Allows you to insert user-defined code that will execute before
any action for a matched token.
● Type: Macro.
● Usage:
○ Can be used for debugging or tracking purposes.
○ Typically defined in the definitions section.
Example:
%{
#define YY_USER_ACTION printf("Matched token: %s\n", yytext);
%}
%%
[a-zA-Z]+ { /* Your action here */ }
10. YY_FATAL_ERROR()
● Purpose: Handles fatal errors during lexical analysis.
● Type: Function/Macro.
● Usage:
○ You can override this function to customize error handling in case of
scanner failures.
Example:
%{
#include <stdlib.h>
#define YY_FATAL_ERROR(msg) \
    do { fprintf(stderr, "Fatal Error: %s\n", msg); exit(1); } while (0)
%}
Because YY_FATAL_ERROR is a macro, it must be redefined (in the definitions
section) before the generated scanner code uses it.

Summary of Predefined Variables

Variable          Purpose
yytext            Holds the matched token text.
yyleng            Length of the matched token.
yylineno          Current line number (useful for debugging).
yyin              Input file pointer (default is stdin).
yyout             Output file pointer (default is stdout).
yywrap()          Determines what happens at the end of input.
YYSTATE           Current scanner state (for start conditions).
YY_START          Starting condition (useful for context switching).
YY_USER_ACTION    Executes user-defined code before each action.
YY_FATAL_ERROR    Handles errors in the scanner.


Library Routines:

Routine                  Purpose
yylex()                  Main scanning function.
yywrap()                 Handles end-of-input conditions.
yyrestart()              Resets the scanner for a new input file.
yy_flush_buffer()        Clears the internal buffer.
yy_create_buffer()       Creates a buffer for a new input source.
yy_switch_to_buffer()    Switches to a specified buffer.
yytext                   Holds the current matched token.
yyleng                   Length of the matched token.
yyin                     Input file pointer.
yyout                    Output file pointer.
YY_FATAL_ERROR()         Handles fatal errors.
yymore()                 Appends the next match to the current token.
yyless()                 Pushes part of the token back to the input stream.
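yymore() and yyless() are easiest to see together. A sketch based on the classic Flex idiom for scanning string literals that may contain escaped quotes: if the closing quote turns out to be escaped, yyless() returns it to the input and yymore() appends the next match to the token accumulated so far.

```lex
%%
\"[^"]*\"   {
    if (yyleng >= 2 && yytext[yyleng - 2] == '\\') {
        yyless(yyleng - 1);   /* push the closing quote back to the input */
        yymore();             /* append the next match to the current yytext */
    } else {
        printf("String literal: %s\n", yytext);
    }
}
.|\n        { /* ignore everything else */ }
%%
```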
