Compiler Project Abstract

Uploaded by

RITHIK JOSHUA

PROJECT ABSTRACT

LEXICAL ANALYZER FOR C LANGUAGE


INTRODUCTION
The goal of this project is to implement a lexical analyzer for the C programming language.
Lexical analysis is the first phase of compilation, in which the input source code is
converted into a sequence of tokens. Each token represents a basic unit of meaning, such as a
keyword, identifier, literal, operator, or punctuation mark. The tool is built using Flex,
a widely used lexical analyzer generator, and prints each token's type and value while
processing a C program file.

OBJECTIVES

The primary objectives of this project are:

1. To read C source code, break it into individual components (tokens), and categorize them
into predefined token types (keywords, operators, numbers, etc.).

2. To ignore elements that carry no meaning for later phases, such as comments and whitespace.

3. To detect and report characters that do not fit into any valid token category.

4. To report basic token information, such as token type and value, in a human-readable
format.

FEATURES

- Keyword Recognition: The analyzer identifies common C keywords such as `int`, `if`, `else`,
`while`, and `for`.
- Identifier Matching: It detects valid identifiers based on C language rules, including variable
names and function names.
- Number Detection: It processes integer, floating-point, octal, and hexadecimal literals in
C programs.
- String and Character Literals: The tool correctly identifies and classifies string and
character literals.
- Operators and Punctuation: It recognizes arithmetic operators such as `+`, `-`, `*`, and `/`,
relational operators (`==`, `!=`, `<`, `>`), and punctuation marks such as `{`, `}`, and `;`.
- Comment Ignoring: Single-line (`//`) and multi-line comments (`/* */`) are ignored during
tokenization.
- Error Handling: The analyzer prints a message when encountering any unrecognized or
invalid characters.
- File Handling: Users can provide a source file as input, allowing the analyzer to process
entire C programs.
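
The number-detection feature can be illustrated outside Flex with a small C sketch. It assumes the four literal forms used in the source code of this project (hexadecimal, octal, decimal, and simple floating-point with digits on both sides of the point) and checks a whole string rather than scanning a stream. The names `num_class`, `span`, and `classify_number` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical categories for this sketch only. */
typedef enum { NUM_INVALID, NUM_DECIMAL, NUM_OCTAL, NUM_HEX, NUM_FLOAT } num_class;

/* Number of leading characters of s for which pred holds. */
static size_t span(const char *s, int (*pred)(int))
{
    size_t n = 0;
    while (s[n] != '\0' && pred((unsigned char)s[n]))
        n++;
    return n;
}

static int is_octal_digit(int c) { return c >= '0' && c <= '7'; }

/* Classify a complete string as one of the literal forms, or NUM_INVALID. */
num_class classify_number(const char *s)
{
    assert(s != NULL);

    /* Hexadecimal: 0x or 0X followed by at least one hex digit. */
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        size_t digits = span(s + 2, isxdigit);
        return (digits > 0 && s[2 + digits] == '\0') ? NUM_HEX : NUM_INVALID;
    }

    size_t int_part = span(s, isdigit);
    if (int_part == 0)
        return NUM_INVALID;

    /* Floating-point: digits, a point, then at least one more digit. */
    if (s[int_part] == '.') {
        size_t frac = span(s + int_part + 1, isdigit);
        return (frac > 0 && s[int_part + 1 + frac] == '\0') ? NUM_FLOAT : NUM_INVALID;
    }

    if (s[int_part] != '\0')
        return NUM_INVALID;      /* trailing non-digit characters */
    if (s[0] != '0')
        return NUM_DECIMAL;      /* starts with 1-9 */

    /* Leading zero: every digit must be an octal digit. */
    return span(s, is_octal_digit) == int_part ? NUM_OCTAL : NUM_INVALID;
}
```

Under these assumptions `classify_number("0x1F")` yields `NUM_HEX` and `classify_number("089")` yields `NUM_INVALID`. A Flex scanner would instead split `089` into two tokens, because it matches prefixes of the input rather than whole strings.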

METHODOLOGY

The project uses Flex, a powerful tool for lexical analysis, to define rules for recognizing
different types of tokens. The Flex rules for token categories (keywords, operators, identifiers,
etc.) are implemented using regular expressions. The lexical analyzer is built to read a file
containing C code, break it down into tokens, and print the token type and value to the console.
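
When several rules can match at the same position, Flex picks the longest match and, for matches of equal length, the rule listed first. The C sketch below imitates only the longest-match part for the operator set used in this project, which is why `==` is reported as one token rather than two `=` tokens. The table `OPERATORS` and the function `longest_operator` are hypothetical names for this illustration and are not part of Flex.

```c
#include <assert.h>
#include <string.h>

/* Operators recognized by the analyzer in this project. */
static const char *const OPERATORS[] = {
    "+", "-", "*", "/", "=", "==", "<", ">", "<=", ">=", "!=", "++", "--"
};

/* Length of the longest operator that is a prefix of input, or 0 if none. */
size_t longest_operator(const char *input)
{
    assert(input != NULL);

    size_t best = 0;
    for (size_t i = 0; i < sizeof OPERATORS / sizeof OPERATORS[0]; i++) {
        size_t len = strlen(OPERATORS[i]);
        /* Keep a candidate only if it matches and beats the best so far. */
        if (len > best && strncmp(input, OPERATORS[i], len) == 0)
            best = len;
    }
    return best;
}
```

With this table, `longest_operator("== b")` returns 2 and `longest_operator("=b")` returns 1. A lone `!` returns 0 because only `!=` is listed, which matches the behavior of the rules in the source code.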

CONCLUSION

This project successfully implements a basic Lexical Analyzer for C programs, allowing users to
visualize the structure of a C source file by identifying key tokens. By recognizing keywords,
identifiers, operators, literals, and more, the analyzer provides a foundational step toward further
compiler development. It reports each token's type and value and flags unrecognized characters
in basic C programs.

In future work, the lexical analyzer could be extended to handle more complex C language
features, such as preprocessor directives and macros. It could also be integrated with a
parser to carry the input through the syntactic and semantic analysis phases, bringing it closer
to a full-fledged compiler.

FUTURE ENHANCEMENTS

- Extend the analyzer to handle preprocessor directives and macros.

- Implement scope handling for variables and function declarations.

- Integrate the lexical analyzer with a parser for syntactic analysis.
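
Integrating with a parser usually means the scanner hands back one token per call instead of printing it. The hand-written C sketch below shows that pull-style interface on a deliberately tiny token set. It is not the Flex-generated scanner, and the names `tk_kind`, `tk_token`, `next_token`, and `count_tokens` are hypothetical. The function `count_tokens` stands in for a parser that keeps requesting tokens until the end of input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Simplified token kinds for this sketch only. */
typedef enum { TK_EOF, TK_IDENT, TK_NUMBER, TK_OTHER } tk_kind;

typedef struct {
    tk_kind kind;
    const char *start;   /* first character of the lexeme */
    size_t length;       /* number of characters in the lexeme */
} tk_token;

/* Scan one token starting at *cursor and advance *cursor past it. */
tk_token next_token(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);

    const char *p = *cursor;
    while (isspace((unsigned char)*p))
        p++;                                  /* skip whitespace */

    tk_token t = { TK_EOF, p, 0 };
    if (*p == '\0') {
        *cursor = p;
        return t;
    }

    if (isalpha((unsigned char)*p) || *p == '_') {
        t.kind = TK_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
    } else if (isdigit((unsigned char)*p)) {
        t.kind = TK_NUMBER;
        while (isdigit((unsigned char)*p))
            p++;
    } else {
        t.kind = TK_OTHER;                    /* any other single character */
        p++;
    }

    t.length = (size_t)(p - t.start);
    *cursor = p;
    return t;
}

/* Stand-in for a parser loop: pull tokens until the end of input. */
size_t count_tokens(const char *src)
{
    size_t n = 0;
    while (next_token(&src).kind != TK_EOF)
        n++;
    return n;
}
```

For the input `x = 42;` the loop pulls four tokens: `x`, `=`, `42`, and `;`. In a Flex-based design the same role is played by `yylex()` returning a token code from each action.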


This project demonstrates the importance of lexical analysis in compiler construction and
provides a hands-on understanding of how a compiler breaks down source code into
meaningful units.

SOURCE CODE

%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print one token as "Token Type: <type>, Token Value: <value>". */
void print_token(const char *token_type, const char *token_value)
{
    printf("Token Type: %s, Token Value: %s\n", token_type, token_value);
}
%}

%option noyywrap

%%

if          { print_token("KEYWORD", "if"); }
else        { print_token("KEYWORD", "else"); }
while       { print_token("KEYWORD", "while"); }
for         { print_token("KEYWORD", "for"); }
return      { print_token("KEYWORD", "return"); }
break       { print_token("KEYWORD", "break"); }
continue    { print_token("KEYWORD", "continue"); }
int         { print_token("KEYWORD", "int"); }
float       { print_token("KEYWORD", "float"); }
char        { print_token("KEYWORD", "char"); }
void        { print_token("KEYWORD", "void"); }

[a-zA-Z_][a-zA-Z0-9_]*  { print_token("IDENTIFIER", yytext); }

0[xX][0-9a-fA-F]+       { /* hexadecimal */ print_token("NUMBER", yytext); }
0[0-7]*                 { /* octal, including plain 0 */ print_token("NUMBER", yytext); }
[1-9][0-9]*             { /* decimal */ print_token("NUMBER", yytext); }
[0-9]+\.[0-9]+          { /* floating-point */ print_token("NUMBER", yytext); }

\"([^\\\"\n]|\\.)*\"    { /* string literal */ print_token("STRING", yytext); }
\'([^\\\'\n]|\\.)\'     { /* character literal */ print_token("CHAR_LITERAL", yytext); }

"+"     { print_token("OPERATOR", "+"); }
"-"     { print_token("OPERATOR", "-"); }
"*"     { print_token("OPERATOR", "*"); }
"/"     { print_token("OPERATOR", "/"); }
"="     { print_token("OPERATOR", "="); }
"=="    { print_token("OPERATOR", "=="); }
"<"     { print_token("OPERATOR", "<"); }
">"     { print_token("OPERATOR", ">"); }
"<="    { print_token("OPERATOR", "<="); }
">="    { print_token("OPERATOR", ">="); }
"!="    { print_token("OPERATOR", "!="); }
"++"    { print_token("OPERATOR", "++"); }
"--"    { print_token("OPERATOR", "--"); }

"{"     { print_token("PUNCTUATION", "{"); }
"}"     { print_token("PUNCTUATION", "}"); }
"("     { print_token("PUNCTUATION", "("); }
")"     { print_token("PUNCTUATION", ")"); }
";"     { print_token("PUNCTUATION", ";"); }
","     { print_token("PUNCTUATION", ","); }

"//".*                      { /* Ignore single-line comments */ }
"/*"([^*]|\*+[^*/])*\*+"/"  { /* Ignore multi-line comments */ }

[ \t\r\n]+  { /* Ignore whitespace */ }

.           { printf("Unrecognized character: '%s'\n", yytext); }

%%

int main(int argc, char **argv)
{
    /* Read from the file named on the command line, else from stdin. */
    if (argc > 1) {
        FILE *file = fopen(argv[1], "r");
        if (!file) {
            perror("Error opening file");
            return EXIT_FAILURE;
        }
        yyin = file;
    }

    yylex();

    return EXIT_SUCCESS;
}
