Compiler Project Abstract
OBJECTIVES
1. To read C source code, break it into individual components (tokens), and categorize them
into predefined token types (keywords, operators, numbers, etc.).
2. To ignore unnecessary elements like comments and whitespace.
3. To detect and report characters that do not fit into any valid token category.
4. To support the user by reporting basic token information in a human-readable format, such
as token type and value.
FEATURES
- Keyword Recognition: The analyzer identifies common C keywords such as `int`, `if`, `else`,
`while`, `for`, etc.
- Identifier Matching: It detects valid identifiers based on C language rules, including variable
names and function names.
- Number Detection: It processes integer, floating-point, octal, and hexadecimal literals in
C programs.
- String and Character Literals: The tool correctly identifies and classifies string and
character literals.
- Operators and Punctuation: It recognizes a variety of operators such as `+`, `-`, `*`, `/`, relational
operators (`==`, `!=`, `<`, `>`), and punctuation marks such as `{`, `}`, `;`, etc.
- Comment Ignoring: Single-line (`//`) and multi-line comments (`/* */`) are ignored during
tokenization.
- Error Handling: The analyzer prints a message when encountering any unrecognized or
invalid characters.
- File Handling: Users can provide a source file as input, allowing the analyzer to process
entire C programs.
METHODOLOGY
The project uses Flex, a powerful tool for lexical analysis, to define rules for recognizing
different types of tokens. The Flex rules for token categories (keywords, operators, identifiers,
etc.) are implemented using regular expressions. The lexical analyzer is built to read a file
containing C code, break it down into tokens, and print the token type and value to the console.
CONCLUSION
This project successfully implements a basic Lexical Analyzer for C programs, allowing users to
visualize the structure of a C source file by identifying key tokens. By recognizing keywords,
identifiers, operators, literals, and more, the analyzer provides a foundational step toward further
compiler development. It offers essential features like error detection and report generation for
basic C programs.
FUTURE ENHANCEMENTS
In future work, the lexical analyzer could be extended to handle more complex C language
features, such as preprocessor directives and macros. Additionally, it could be integrated with a
parser to continue into the syntactic and semantic analysis phases, bringing it closer to a
full-fledged compiler.
SAMPLE CODE
Below is a corrected, minimal version of the Flex specification. A `print_token` helper and a `main` function (both implied by the original listing) have been restored so that the file compiles:

%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Prints a token's type and value to the console. */
void print_token(const char *type, const char *value) {
    printf("%-12s %s\n", type, value);
}
%}
%option noyywrap
%%
"if"        { print_token("KEYWORD", "if"); }
"else"      { print_token("KEYWORD", "else"); }
"while"     { print_token("KEYWORD", "while"); }
"for"       { print_token("KEYWORD", "for"); }
"return"    { print_token("KEYWORD", "return"); }
"break"     { print_token("KEYWORD", "break"); }
"continue"  { print_token("KEYWORD", "continue"); }
"int"       { print_token("KEYWORD", "int"); }
"float"     { print_token("KEYWORD", "float"); }
"char"      { print_token("KEYWORD", "char"); }
"void"      { print_token("KEYWORD", "void"); }
[ \t\r\n]+  { /* ignore whitespace */ }
.           { print_token("UNRECOGNIZED", yytext); }
%%
int main(int argc, char *argv[]) {
    FILE *file = stdin;
    if (argc > 1) {
        file = fopen(argv[1], "r");
        if (!file) {
            perror(argv[1]);
            return EXIT_FAILURE;
        }
    }
    yyin = file;
    yylex();
    return EXIT_SUCCESS;
}

The analyzer can then be built with `flex lexer.l` followed by `gcc lex.yy.c -o lexer`.