
COMPILER CONSTRUCTION

(ASSIGNMENT # 01, SEMESTER FALL 2023)


Submission Date (16 Nov, 2023)
Submitted By:
Mazhar Abbas
20021519-054
Submitted To:
Dr. Saliha Zahoor
Course Code:
CS-462
Degree Program Title and Section:
BS-VII Computer Science (A)

Department of Computer Science

Question # 01: Explain Lexical Analyzer Generator, Structure of Lex program,
Sample Lex programs. Also explain other Compiler Construction Tools.

1.1. Lexical Analyzer Generator


o Lexical analyzers process character streams and produce streams of tokens, which are
fundamental units in programming languages.
o Lexical analyzers are typically the initial stage in a compiler, breaking down source
code into tokens for further processing.
o The parser, as the subsequent stage, analyzes tokens to determine program structure
and generates an intermediate representation.
o Lex can generate lexical analyzers for various programming languages, making it a
versatile tool in compiler construction.

1.1.1. Components and Workflow


Input Description:
o Lexical Analyzer Generators need an input description of the lexical structure of the language being
processed.
o This description is usually specified using regular expressions or patterns linked to different
token types.

Lex Specification File:


o Regular expressions in the file outline patterns for recognizing tokens in the source code.
o Actions detail the response when a pattern is matched, including tasks like returning a token
or executing specific code.
o Programmers create a Lex specification file with rules, pairing regular expressions with
actions.

Lexical Analyzer Generation:


o Lexical Analyzer Generator reads the Lex specification file.
o It generates code, often in C, based on the specified lexical rules in the file.
o The produced code functions as a lexical analyzer, capable of identifying tokens as per the
defined rules in the Lex specification.

Integration with Compiler:


o The generated lexical analyzer is usually incorporated into a compiler or interpreter.
o It works in conjunction with other components like parsers and semantic analyzers within the
overall language processing system.

1.1.2. Advantages
o Automation
o Efficiency
o Consistency

1.2. Structure of Lex Program


A Lex program is divided into three parts:
o Definitions Section
o Rules Section
o User Code Section

➢ Definitions Section
The Definitions Section in a Lex program includes:
Header Declarations:
o Enclosed within %{ and %} symbols in Lex, the "code section" holds C or target language code.
o This code is included at the start of the generated lexical analyzer.

Macro Definitions:
o The definitions section also associates names (macros) with regular expression patterns, such as DIGIT
or ID.
o These name definitions are written outside the %{ ... %} block and are referenced in the rules section
of the Lex specification by enclosing the name in braces, e.g. {DIGIT}.

Example:

%{
#include <stdio.h>
%}

DIGIT [0-9]
ID [a-zA-Z][a-zA-Z0-9]*

%%

➢ Rules Section
The Rules Section contains the lexical rules and associated actions:
Pattern-Action Pairs:
o The "lexical rules section" in Lex comprises rules, each with a pattern and an associated action.
o Patterns are usually regular expressions, specifying what the lexer should identify.
o Actions determine the response when a pattern is matched, defining what actions to take.

Regular Expressions:
o Regular expressions match input text, identifying tokens or patterns to be recognized.

Action Code:
o Action code in Lex can include C or target language code.
o When a pattern is matched, the linked action code is executed.

Example:

{DIGIT}+    { printf("Number: %s\n", yytext); }
{ID}        { printf("Identifier: %s\n", yytext); }
"+"         { printf("Addition Operator\n"); }

➢ User Code Section


The User Code Section includes any additional code required for the lexer:
Code Outside Rules:
o The "user code section" in Lex contains extra C or target language code directly copied to the
generated lexer.
o Commonly includes main() function or other necessary helper functions.

Example:

%%
int main() {
    yylex();    // Function that starts the lexical analysis
    return 0;
}

➢ Sample Lex Programs
1. Identifying Keywords and Identifiers
This Lex program identifies keywords (if, else, while, etc.) and identifiers (variable names).

%{
#include <stdio.h>
%}
/* Definitions section */
letter      [a-zA-Z]
digit       [0-9]
identifier  {letter}({letter}|{digit})*
%%
    /* Rules section */
"if"            { printf("<keyword , if>\n"); }
"else"          { printf("<keyword , else>\n"); }
"while"         { printf("<keyword , while>\n"); }
{identifier}    { printf("<id , %s>\n", yytext); }
.|\n            ; /* Ignore other characters */
%%
int main() {
    yylex();
    return 0;
}

2. Handling Arithmetic Operations and Numbers
This Lex program identifies arithmetic operators (+, -, *, /) and numeric constants.

%{
#include <stdio.h>
%}

%%
    /* Rules section */
"+"         { printf("<operator , +>\n"); }
"-"         { printf("<operator , ->\n"); }
"*"         { printf("<operator , *>\n"); }
"/"         { printf("<operator , />\n"); }
[0-9]+      { printf("<number , %s>\n", yytext); }
.|\n        ; /* Ignore other characters */

%%
int main() {
    yylex();
    return 0;
}

➢ Other Compiler Construction Tools
o Yacc (Yet Another Compiler-Compiler)
o ANTLR (ANother Tool for Language Recognition)
o LLVM (Low-Level Virtual Machine)
o JavaCC (Java Compiler Compiler)

1. Yacc
Yacc is a parser generator that takes a context-free grammar and produces a C parser (yyparse), typically
used together with a Lex-generated scanner.

Example:

// Example Yacc grammar for a simple arithmetic expression

%{
#include <stdio.h>
%}

%token NUMBER

%left '+' '-'
%left '*' '/'

%%

expression : expression '+' expression
           | expression '-' expression
           | expression '*' expression
           | expression '/' expression
           | NUMBER
           ;

%%

int main() {
    yyparse();
    return 0;
}
2. ANTLR
ANTLR is a powerful parser generator used for building parsers, lexers, and tree walkers. It's capable of
generating code in various languages like Java, C#, Python, and JavaScript.

Example:

grammar Expr;
expression : term (('+' | '-') term)* ;
term : factor (('*' | '/') factor)* ;
factor : NUMBER | '(' expression ')' ;
NUMBER : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;

3. LLVM
LLVM is an infrastructure for building compilers that provides reusable libraries and tools for various
compilation tasks, including optimization and code generation.

Example:

define i32 @add(i32 %a, i32 %b) {
  %sum = add i32 %a, %b
  ret i32 %sum
}

4. JavaCC
JavaCC is a parser generator specifically designed for the Java programming language, creating parsers,
lexers, and syntax trees.

Example:

options {
STATIC = false;
}
PARSER_BEGIN(ExpressionParser)
public class ExpressionParser {}
PARSER_END(ExpressionParser)
TOKEN: {
<NUMBER: (["0"-"9"])+>
}
void expression() :
{}
{
( <NUMBER> )+
}

Question # 02: Write a program for identification of tokens. Your program will
take a .txt file that contains C language source code as input and give the
sequence of tokens as output.

➢ Python Code
import re

def tokenize_c_code(file_path):
    patterns = [
        ('keyword', r'\b(?:int|if|else|while|for|return)\b'),
        ('id', r'\b[a-zA-Z_]\w*\b'),
        ('relational_operator', r'==|<=|>=|!=|[<>]=?'),
        ('operator', r'[+\-*/%<>&|^!]=?'),
        ('assignment', r'='),
        ('number', r'\b\d+(?:\.\d+)?\b'),
        ('delimiter', r'[;,(){}\[\]]'),
        ('whitespace', r'\s+')
    ]
    pattern = '|'.join('(?P<%s>%s)' % pair for pair in patterns)
    regex = re.compile(pattern)
    with open(file_path, 'r') as file:
        source_code = file.read()
    tokens = []
    printed_tokens = set()  # Track tokens that have already been collected
    for match in regex.finditer(source_code):
        token_type = match.lastgroup
        token_value = match.group()
        if token_type != 'whitespace':
            token = (token_type, token_value)
            if token not in printed_tokens:  # Collect each distinct token only once
                tokens.append(token)
                printed_tokens.add(token)
    return tokens

input_file_path = 'sample_c_code.txt'  # Replace with the path to your C code file
tokens = tokenize_c_code(input_file_path)

# Print the sequence of tokens
for token in tokens:
    print(f'<{token[0]} , {token[1]}>')

o Input
int b;
b = 5;
if ( b== 5 )
{
b=b+1;
}

o Output
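Given the input above, the script collects each distinct token once, in order of first appearance
(because of the printed_tokens set), so the expected output is roughly:

<keyword , int>
<id , b>
<delimiter , ;>
<assignment , =>
<number , 5>
<keyword , if>
<delimiter , (>
<relational_operator , ==>
<delimiter , )>
<delimiter , {>
<operator , +>
<number , 1>
<delimiter , }>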

Question # 03: Explain Recursive Descent parser and Predictive Parser with the
help of examples.

➢ Recursive Descent Parser


Recursive Descent Parser is a top-down parsing technique used in compiler construction to parse the
syntax of a programming language based on its grammar rules. It operates by recursively calling
procedures that correspond to the production rules of the grammar, backtracking when an alternative fails.

➢ Key Features
o Top-Down Parsing:
• Parsing process begins at the root of the syntax tree.
• Progresses from root to leaves of the tree.
• Starts from the start symbol of the grammar.
• Seeks to match input with grammar rules.
o LL(k) Parsing:
• Recursive Descent Parsers that avoid backtracking are LL(k) parsers.
• LL stands for scanning the input Left-to-right and producing a Leftmost derivation.
• "k" indicates the number of lookahead tokens used for parsing decisions.
• These parsers predict which production to use based on a fixed number of lookahead
tokens.
o Procedure-based:
• Each non-terminal in the grammar corresponds to a parsing procedure.
• This approach makes it easy to conceptualize and implement parsing logic.
o Readability:
• Easy to understand and write, particularly for simpler grammars.
• Parsing logic closely mirrors the grammar rules.
o Backtracking:
• Traditional Recursive Descent parsers may use backtracking.
• Backtracking involves exploring different production rules if the current path fails.
• Backtracking reduces efficiency for complex grammars because failed paths may have to be re-evaluated.
o Error Handling:
• Error recovery and reporting can be challenging, especially when dealing with
ambiguous or incorrect input.

➢ Steps
o Grammar Representation: The grammar is represented explicitly, usually in BNF (Backus-Naur
Form) or EBNF (Extended Backus-Naur Form).
o Parsing Procedures: Recursive procedures are created for each non-terminal symbol in the
grammar. These procedures are called recursively to parse the input.

o Tokenization: The input is tokenized, breaking it down into tokens (like identifiers, keywords,
operators, etc.).
o Parsing Logic: The parsing procedures handle each production rule, recursively calling
themselves to parse sub-components of the input according to the grammar rules.
o Error Handling: Error handling is crucial in Recursive Descent parsers. It can involve identifying
syntax errors and possibly recovering from them to continue parsing.

➢ Example
Grammar:

S → mXn | mZn
X → pq | sq
Z → qr

Steps 1-4 (parse-tree diagrams, omitted here): the parser expands S with its first alternative, tries to
match the inner non-terminal against the input, backtracks when that alternative fails (Step 3), and then
completes the parse with the remaining alternative. The sketch below traces the same idea in code.
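A minimal Python sketch of such a backtracking recursive-descent parser for this grammar (the procedure
names parse_S, parse_X, parse_Z and the test strings are illustrative assumptions, not part of the
original example):

# Backtracking recursive-descent parser for the grammar:
#   S -> m X n | m Z n
#   X -> p q | s q
#   Z -> q r
# Each non-terminal becomes one procedure; it returns the input position
# reached on success, or None so the caller can try the next alternative
# (backtracking).

def parse_S(s, i):
    for middle in (parse_X, parse_Z):           # S -> mXn, then S -> mZn
        if i < len(s) and s[i] == 'm':          # match terminal 'm'
            j = middle(s, i + 1)                # parse X (or Z)
            if j is not None and j < len(s) and s[j] == 'n':
                return j + 1                    # match terminal 'n'
    return None                                 # both alternatives failed

def parse_X(s, i):
    for first in ('p', 's'):                    # X -> pq | sq
        if s[i:i + 2] == first + 'q':
            return i + 2
    return None

def parse_Z(s, i):
    return i + 2 if s[i:i + 2] == 'qr' else None    # Z -> qr

def accepts(s):
    return parse_S(s, 0) == len(s)              # whole input must be consumed

print(accepts("mpqn"))   # True: S -> mXn with X -> pq
print(accepts("mqrn"))   # True: X fails, parser backtracks, succeeds with S -> mZn
print(accepts("mqn"))    # False: no alternative matches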
➢ Predictive Parser
A Predictive Parser is a specialized form of Recursive Descent parser that can predict which production
rule to use without backtracking. It achieves this by using a parsing table derived from the grammar.

➢ Key Features
o Deterministic Parsing: Based on a parsing table constructed from the grammar, allowing the
parser to make deterministic choices without backtracking.
o Parsing Table: Utilizes a table (often called a parsing or LL(1) table) to predict the production
rule based on the current input symbol and the symbol at the top of the stack.
o Efficiency: Avoids backtracking, resulting in potentially faster parsing for unambiguous
grammars.
o Handling Ambiguity: Works well for grammars that are unambiguous and don't have left
recursion. Ambiguous grammars might require modifications to be parsed predictively.
o Error Handling: Because of its deterministic nature, error handling is generally more
straightforward compared to traditional Recursive Descent parsers.

➢ Steps
o Constructing Parsing Table: The parsing table is created based on the grammar. Rows represent
non-terminal symbols, columns represent terminal symbols, and cells in the table contain the
production rule to use.
o Input Parsing: The parsing algorithm reads the input stream and the parsing table to predict the
correct production rule for each step.
o Stack-based Parsing: The parser uses a stack to simulate the parsing process. It matches the input
symbols with the stack symbols and decides which production rule to apply based on the current
input symbol and the symbol at the top of the stack.
o Handling Errors: Similar to Recursive Descent parsers, error handling is essential in Predictive
parsers. Invalid inputs or syntax errors need to be identified and handled gracefully.

Grammar:

S → aABb
A → c | є
B → d | є

First Sets:

FIRST(S) = { a }
FIRST(A) = { c, є }
FIRST(B) = { d, є }

Follow Sets:

FOLLOW(S) = { $ }
FOLLOW(A) = { d, b}
FOLLOW(B) = { b }
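These sets can also be computed mechanically with the usual fixed-point algorithm. The following is a
small Python sketch, assuming a dictionary-based encoding of the grammar (the names GRAMMAR, first_of,
compute_first and compute_follow are introduced here only for illustration):

# FIRST/FOLLOW computation for the grammar S -> aABb, A -> c | є, B -> d | є.
# Each production is a list of symbols; [] stands for an epsilon production.

EPS = 'є'
GRAMMAR = {
    'S': [['a', 'A', 'B', 'b']],
    'A': [['c'], []],        # A -> c | є
    'B': [['d'], []],        # B -> d | є
}
START = 'S'

def is_terminal(sym):
    return sym not in GRAMMAR

def first_of(seq, first):
    """FIRST set of a sequence of grammar symbols."""
    result = set()
    for sym in seq:
        syms = {sym} if is_terminal(sym) else first[sym]
        result |= syms - {EPS}
        if EPS not in syms:
            return result                 # this symbol cannot derive epsilon
    result.add(EPS)                       # the whole sequence can vanish
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                        # iterate until nothing changes
        changed = False
        for nt, prods in GRAMMAR.items():
            for prod in prods:
                new = first_of(prod, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add('$')
    changed = True
    while changed:
        changed = False
        for nt, prods in GRAMMAR.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if is_terminal(sym):
                        continue
                    rest = first_of(prod[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[nt]   # what follows nt can follow sym
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

first = compute_first()
print(first)                   # e.g. {'S': {'a'}, 'A': {'c', 'є'}, 'B': {'d', 'є'}}
print(compute_follow(first))   # e.g. {'S': {'$'}, 'A': {'d', 'b'}, 'B': {'b'}}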

Input:

acdb

Using Stack:

Stack        Input        Action
$            acdb$        Push S
$S           acdb$        S → aABb
$bBAa        acdb$        Pop a
$bBA         cdb$         A → c
$bBc         cdb$         Pop c
$bB          db$          B → d
$bd          db$          Pop d
$b           b$           Pop b
$            $            Accept

M-Table:

           a            b           c           d           $
S      S → aABb
A                    A → є       A → c       A → є
B                    B → є                   B → d
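Putting the M-Table and the stack algorithm together, the following is a minimal Python sketch of a
table-driven predictive parser for this grammar (the dictionary encoding of the table and the parse
function are assumptions of the sketch, not part of the assignment):

# Table-driven (LL(1)) predictive parser for S -> aABb, A -> c | є, B -> d | є.
# TABLE mirrors the M-Table above; an empty right-hand side stands for є.

TABLE = {
    ('S', 'a'): ['a', 'A', 'B', 'b'],    # S -> aABb
    ('A', 'c'): ['c'],                   # A -> c
    ('A', 'b'): [],                      # A -> є   (b is in FOLLOW(A))
    ('A', 'd'): [],                      # A -> є   (d is in FOLLOW(A))
    ('B', 'd'): ['d'],                   # B -> d
    ('B', 'b'): [],                      # B -> є   (b is in FOLLOW(B))
}
NON_TERMINALS = {'S', 'A', 'B'}

def parse(inp):
    tokens = list(inp) + ['$']
    stack = ['$', 'S']                   # top of the stack is stack[-1]
    i = 0
    while stack:
        top = stack.pop()
        if top in NON_TERMINALS:
            rule = TABLE.get((top, tokens[i]))
            if rule is None:
                return False             # blank table cell: syntax error
            stack.extend(reversed(rule)) # push the right-hand side
        elif top == tokens[i]:
            if top == '$':
                return True              # stack and input exhausted: accept
            i += 1                       # terminal matched, advance the input
        else:
            return False                 # terminal mismatch
    return False

print(parse("acdb"))   # True, following the stack trace above
print(parse("ab"))     # True  (A -> є and B -> є)
print(parse("ac"))     # False (syntax error: missing 'b')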

THE END

