Compiler Design Assignment Write Specification of LEX/FLEX Program.

The LEX/FLEX program generates a lexical analyzer that reads input text to identify tokens using regular expressions and perform actions based on those tokens. It includes features for tokenization, error handling, and customizable output formats. The program structure consists of definitions, rules, and user code sections, and it is designed for efficient processing of input data.

Uploaded by

sbackups999

Name: Suyash Sunil Dongre

Roll no: 87

Class: TY CS-B

PRN: 12320236

Assignment no: 1

1. Write the Specification of a LEX/FLEX Program.

1. Purpose:

The purpose of this LEX/FLEX program is to generate a lexical analyzer (scanner) that reads
input text and identifies tokens using regular expressions, then processes those tokens based on
defined actions. This can be used for applications such as text processing, building compilers,
or analysing structured input data.

2. Input:

• The program reads a stream of characters from standard input or a file.

• The input consists of text that may contain words, numbers, punctuation, whitespace,
or other symbols.

3. Output:

• The program outputs the recognized tokens and associated actions.

• If unrecognized characters are found, the program outputs an error message.

• The output format for recognized tokens is customizable (e.g., printing the token,
storing it, or processing it further).

4. Functional Requirements:

1. Tokenization:

o Recognizes various tokens (e.g., numbers, identifiers, operators) based on
predefined regular expressions.

o Tokens are categorized as types such as keywords, identifiers, numbers,
operators, etc.

2. Actions:

o For each token recognized, execute an action. Common actions include:

▪ Printing the matched token.

▪ Storing the token for later use.

▪ Counting occurrences of specific tokens.

3. Error Handling:

o The program must handle invalid input by printing an error message for
unrecognized characters.

4. End of Input:

o The program should recognize the end of input and stop processing.

5. Whitespace Handling:

o Whitespace characters (spaces, tabs, newlines) can either be ignored or
recognized depending on the requirements.
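
The requirements above can be sketched as one small scanner. This is only an illustration under stated assumptions: the counters number_count and word_count are hypothetical names that do not come from the assignment, and the choice to ignore whitespace is one of the two options mentioned above.

```lex
%option noyywrap

%{
#include <stdio.h>

/* Hypothetical counters, introduced only for this illustration. */
static int number_count = 0;
static int word_count = 0;
%}

%%

[0-9]+      { number_count++;   /* action: count numbers */ }
[A-Za-z]+   { word_count++;     /* action: count words */ }
[ \t\n]+    { /* whitespace handling: ignore */ }
.           { fprintf(stderr, "Error: unrecognized character '%s'\n", yytext); }
<<EOF>>     { printf("numbers=%d words=%d\n", number_count, word_count);
              return 0;         /* end of input: stop scanning */ }

%%

int main(void) {
    yylex();
    return 0;
}
```

For the input line "a 1 b" this sketch prints numbers=1 words=2. The <<EOF>> rule is optional: without it, yylex() simply returns when the input ends and the totals could be printed from main() instead.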

5. Regular Expressions and Tokens:

The regular expressions define the patterns for recognizing various tokens in the input. Some
common token types include:

• Identifiers (e.g., variable names):

o Regular Expression: [A-Za-z_][A-Za-z0-9_]*

• Integer Numbers:

o Regular Expression: [0-9]+


• Floating-point Numbers:

o Regular Expression: [0-9]+\.[0-9]+

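• Operators (e.g., +, -, *, /):

o Regular Expression: \+|\-|\*|\/ (equivalently, the character class [-+*/])

• Whitespace (spaces, tabs, newlines):

o Regular Expression: [ \t\n]+ (can be ignored or processed)

• Comments (for languages with comments):

o Regular Expression (multi-line comment): "/*"[^*]*\*+([^/*][^*]*\*+)*"/"
(the slashes are quoted because an unquoted / is the trailing-context operator in
LEX/FLEX patterns)

• Errors (any unrecognized character):

o Regular Expression: . (matches any single character except newline; it is placed
last so that it only catches what no other rule matches)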
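
As a rough sketch, the token patterns listed above could be combined in one specification using named definitions. The names DIGIT and ID and the output labels are assumptions made for this example. When several patterns match at the same position, LEX/FLEX picks the longest match, and on a tie the rule listed first, which is why 3.14 is reported as one floating-point number and not as an integer followed by an error.

```lex
%option noyywrap

%{
#include <stdio.h>
%}

DIGIT   [0-9]
ID      [A-Za-z_][A-Za-z0-9_]*

%%

{DIGIT}+\.{DIGIT}+               { printf("FLOAT: %s\n", yytext); }
{DIGIT}+                         { printf("INT: %s\n", yytext); }
{ID}                             { printf("ID: %s\n", yytext); }
[-+*/]                           { printf("OP: %s\n", yytext); }
"/*"[^*]*\*+([^/*][^*]*\*+)*"/"  { /* skip a multi-line comment */ }
[ \t\n]+                         { /* skip whitespace */ }
.                                { printf("Error: unknown character %s\n", yytext); }

%%

int main(void) {
    yylex();
    return 0;
}
```

The comment rule and the operator rule both start with a slash; the longest-match rule decides between them, so a complete comment is skipped as a whole while a lone / is reported as an operator.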


6. Main Program Structure:

A typical LEX/FLEX program consists of the following sections:

1. Definitions Section:

o Holds C declarations and header includes inside %{ ... %}, together with scanner
options (%option) and named pattern definitions.

Example:

%{
#include <stdio.h>
%}

2. Rules Section (%% ... %%):

o Defines the regular expressions and the associated actions.


Example:

[0-9]+ { printf("Number: %s\n", yytext); }
[A-Za-z_][A-Za-z0-9_]* { printf("Identifier: %s\n", yytext); }
[ \t\n]+ { /* skip whitespace */ }
. { printf("Error: Unknown character %s\n", yytext); }

3. User Code Section:

o Contains the main function and any necessary supporting functions.

o The main() function typically calls the yylex() function to start the lexical analysis
process.

Example:
int main(void) {
    yylex(); // Begin lexical analysis
    return 0;
}
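
Since the input may come from standard input or from a file, the user code section is also the natural place to choose the source. The following is a minimal sketch, assuming the file name is passed as the first command-line argument; the local variable input is a hypothetical name, while yyin is the input stream that the generated scanner reads from.

```lex
%option noyywrap

%{
#include <stdio.h>
%}

%%

[0-9]+   { printf("Number: %s\n", yytext); }
.|\n     { /* ignore everything else in this sketch */ }

%%

int main(int argc, char **argv) {
    FILE *input = NULL;              /* hypothetical local name */

    if (argc > 1) {
        input = fopen(argv[1], "r");
        if (input == NULL) {
            perror(argv[1]);
            return 1;
        }
        yyin = input;                /* scan the file instead of stdin */
    }

    yylex();                         /* yyin defaults to stdin if not set */

    if (input != NULL) {
        fclose(input);
    }
    return 0;
}
```

With no argument the scanner still reads standard input, so both ways of supplying input keep working.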

7. Error Handling:

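If no rule matches a character, LEX/FLEX by default copies that character unchanged to the
output. To report an error instead, add a catch-all rule as the last rule, so that it fires
only when no other pattern matches:

. { printf("Error: Unrecognized character %s\n", yytext); }
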
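
Error messages are more useful when they say where the problem is. The sketch below assumes that line numbers are wanted and that a non-zero exit status should signal bad input; error_count is a hypothetical name, while %option yylineno asks flex to keep the current line number in yylineno.

```lex
%option noyywrap yylineno

%{
#include <stdio.h>

static int error_count = 0;   /* hypothetical counter for this example */
%}

%%

[A-Za-z0-9]+   { /* accepted token: nothing to do in this sketch */ }
[ \t\n]+       { /* skip whitespace */ }
.              { fprintf(stderr, "line %d: unrecognized character '%s'\n",
                         yylineno, yytext);
                 error_count++; }

%%

int main(void) {
    yylex();
    return error_count > 0 ? 1 : 0;   /* non-zero status if any error was seen */
}
```

Writing errors to stderr keeps them separate from the normal token output, so the two can be redirected independently.
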
8. Example of a Simple LEX/FLEX Program:

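%option noyywrap

%{
/* noyywrap: stop at end of input without needing yywrap() or -lfl */
#include <stdio.h>
%}

%%

[0-9]+     { printf("Number: %s\n", yytext); }
[A-Za-z]+  { printf("Word: %s\n", yytext); }
[ \t\n]+   { /* skip whitespace */ }
.          { printf("Error: Unknown character %s\n", yytext); }

%%

int main(void) {
    yylex(); // Start lexical analysis
    return 0;
}
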
9. Compilation and Execution:

1. Generate C Code: Use flex to generate the C source code (lex.yy.c).

o Command: flex program.l

2. Compile C Code: Use a C compiler to compile the generated C file.


o Command: gcc lex.yy.c -o program (add -lfl if the scanner neither defines yywrap() nor
sets %option noyywrap)

3. Run the Program: Execute the compiled program to perform lexical analysis on the
input.

o Command: ./program < input.txt

10. Performance Considerations:

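• Efficiency: Flex compiles all the regular expressions into a single deterministic finite
automaton (DFA), so scanning time grows roughly linearly with the length of the input and
not with the number of rules. Rules that match long runs of text in one step, and avoiding
costly features such as REJECT, keep the scanner fast on large inputs.

• Memory Management: Flex reads its input in blocks through an internal buffer, so a large
file does not have to fit in memory. A single very long token forces that buffer to grow,
so patterns should keep individual tokens reasonably short.
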
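
A small sketch of the efficiency point, assuming a simple word and line counter as the task (the counters words and lines are hypothetical names): each rule consumes a whole run of characters, so the action code runs once per word and not once per character. The never-interactive option tells flex that the input is not typed by a user, which lets the generated scanner read in large blocks.

```lex
%option noyywrap never-interactive

%{
#include <stdio.h>

static long words = 0;   /* hypothetical counters for this example */
static long lines = 0;
%}

%%

[^ \t\n]+   { words++;   /* one action per whole word */ }
[ \t]+      { /* skip a whole run of blanks at once */ }
\n          { lines++; }

%%

int main(void) {
    yylex();
    printf("%ld words, %ld lines\n", words, lines);
    return 0;
}
```

A version that matched one character per rule would give the same counts but would execute an action for every input character.
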
Conclusion:

This LEX/FLEX program tokenizes input text using regular expressions, performs the specified
actions (such as printing or counting tokens), and reports unrecognized characters with an
error message. Scanners generated this way are a standard building block for text processing,
compiler front ends, and search engines.
