Compiler Construction: Department of Computer Science
Question # 01: Explain Lexical Analyzer Generator, Structure of Lex program,
Sample Lex programs. Also explain other Compiler Construction Tools.
➢ Lexical Analyzer Generator (Lex)
Lex is a lexical analyzer generator: it takes a specification of token patterns, written as regular expressions, and produces a C program (the scanner) that recognizes those tokens.
1.1.2. Advantages
o Automation: the scanner is generated from declarative patterns instead of being hand-coded.
o Efficiency: the generated scanner is table-driven and fast.
o Consistency: all tokens are recognized uniformly from a single specification.
➢ Definitions Section
The Definitions Section in a Lex program includes:
Header Declarations:
o Code enclosed between %{ and %} is copied verbatim into the generated lexical analyzer; it typically holds C #include directives and global declarations.
o This code is placed at the start of the generated scanner.
Macro Definitions:
o The definitions section also binds names to regular-expression patterns (for example, DIGIT [0-9]).
o These named patterns can then be referenced, as {DIGIT}, in the rules section of the Lex specification.
Example:
%{
#include <stdio.h>
%}
DIGIT [0-9]
ID [a-zA-Z][a-zA-Z0-9]*
%%
➢ Rules Section
The Rules Section contains the lexical rules and associated actions:
Pattern-Action Pairs:
o The rules section of a Lex program comprises rules, each consisting of a pattern and an associated action.
o Patterns are regular expressions specifying the tokens the lexer should identify.
o Actions define what the lexer does when the corresponding pattern is matched.
Regular Expressions:
o Regular expressions match input text, identifying tokens or patterns to be recognized.
Action Code:
o Action code in Lex can include C or target language code.
o When a pattern is matched, the linked action code is executed.
➢ User Subroutines Section
The final section, after the second %%, contains user-supplied code that is copied verbatim into the generated scanner; it typically defines main().
Example:
%%
int main() {
yylex(); // Function that starts the lexical analysis
return 0;
}
➢ Sample Lex Programs
1. Identifying Keywords and Identifiers
This Lex program identifies keywords (if, else, while, etc.) and identifiers (variable names).
%{
#include <stdio.h>
%}
  /* Definitions section */
letter [a-zA-Z]
digit [0-9]
identifier {letter}({letter}|{digit})*
%%
  /* Rules section */
"if" { printf("<keyword , if>\n"); }
"else" { printf("<keyword , else>\n"); }
"while" { printf("<keyword , while>\n"); }
{identifier} { printf("<id , %s>\n", yytext); }
. { /* Ignore other characters */ }
%%
int main() {
yylex();
return 0;
}
2. Handling Arithmetic Operations and Numbers
This Lex program identifies arithmetic operators (+, -, *, /) and numeric constants.
%{
#include <stdio.h>
%}
%%
  /* Rules section */
"+" { printf("<operator , +>\n"); }
"-" { printf("<operator , ->\n"); }
"*" { printf("<operator , *>\n"); }
"/" { printf("<operator , />\n"); }
[0-9]+ { printf("<number , %s>\n", yytext); }
. { /* Ignore other characters */ }
%%
int main() {
yylex();
return 0;
}
➢ Other Compiler Construction Tools
o Yacc (Yet Another Compiler-Compiler)
o ANTLR (ANother Tool for Language Recognition)
o LLVM (Low-Level Virtual Machine)
o JavaCC (Java Compiler Compiler)
1. Yacc
Yacc generates LALR(1) parsers in C from a context-free grammar specification and is typically used together with a Lex-generated scanner that supplies the tokens.
Example:
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token NUMBER
%%
expr : NUMBER
     | expr '+' NUMBER
     ;
%%
int main() {
    yyparse();
    return 0;
}
2. ANTLR
ANTLR is a powerful parser generator used for building parsers, lexers, and tree walkers. It's capable of
generating code in various languages like Java, C#, Python, and JavaScript.
Example:
grammar Expr;
expression : term (('+' | '-') term)* ;
term : factor (('*' | '/') factor)* ;
factor : NUMBER | '(' expression ')' ;
NUMBER : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
3. LLVM
LLVM is an infrastructure for building compilers that provides reusable libraries and tools for various
compilation tasks, including optimization and code generation.
4. JavaCC
JavaCC is a parser generator specifically designed for the Java programming language, creating parsers,
lexers, and syntax trees.
Example:
options {
STATIC = false;
}
PARSER_BEGIN(ExpressionParser)
public class ExpressionParser {}
PARSER_END(ExpressionParser)
TOKEN: {
<NUMBER: (["0"-"9"])+>
}
void expression() :
{}
{
( <NUMBER> )+
}
Question # 02: Write a program for identification of tokens. Your program will
take a .txt file that contains C-language source code as input and gives the
sequence of tokens as output.
➢ Python Code
import re

def tokenize_c_code(file_path):
    patterns = [
        ('keyword', r'\b(?:int|if|else|while|for|return)\b'),
        ('id', r'\b[a-zA-Z_]\w*\b'),
        ('relational_operator', r'==|<=|>=|!=|[<>]=?'),
        ('operator', r'[+\-*/%<>&|^!]=?'),
        ('assignment', r'='),
        ('number', r'\b\d+(?:\.\d+)?\b'),
        ('delimiter', r'[;,(){}\[\]]'),
        ('whitespace', r'\s+')
    ]
    pattern = '|'.join('(?P<%s>%s)' % pair for pair in patterns)
    regex = re.compile(pattern)
    with open(file_path, 'r') as file:
        source_code = file.read()
    tokens = []
    printed_tokens = set()  # Track tokens already emitted
    for match in regex.finditer(source_code):
        token_type = match.lastgroup
        token_value = match.group()
        if token_type != 'whitespace':
            token = (token_type, token_value)
            if token not in printed_tokens:  # Report each distinct token once
                tokens.append(token)
                printed_tokens.add(token)
    return tokens

input_file_path = 'sample_c_code.txt'  # Replace with the path to your C code file
for token in tokenize_c_code(input_file_path):
    print(token)
o Input
int b;
b = 5;
if ( b== 5 )
{
b=b+1;
}
o Output
('keyword', 'int')
('id', 'b')
('delimiter', ';')
('assignment', '=')
('number', '5')
('keyword', 'if')
('delimiter', '(')
('relational_operator', '==')
('delimiter', ')')
('delimiter', '{')
('operator', '+')
('number', '1')
('delimiter', '}')
Each distinct token is reported once because duplicates are filtered.
Question # 03: Explain Recursive Descent parser and Predictive Parser with the
help of examples.
➢ Recursive Descent Parser
A Recursive Descent parser is a top-down parser implemented as a set of mutually recursive procedures, one for each non-terminal of the grammar.
➢ Key Features
o Top-Down Parsing:
• Parsing process begins at the root of the syntax tree.
• Progresses from root to leaves of the tree.
• Starts from the start symbol of the grammar.
• Seeks to match input with grammar rules.
o LL(k) Parsing:
• Recursive Descent parsers that avoid backtracking correspond to LL(k) parsers.
• LL stands for Left-to-right scan of the input, producing a Leftmost derivation.
• "k" indicates the number of lookahead tokens used for parsing decisions.
• The parser predicts which production to apply from those lookahead tokens.
o Procedure-based:
• Each non-terminal in the grammar corresponds to a parsing procedure.
• This approach makes it easy to conceptualize and implement parsing logic.
o Readability:
• Easy to understand and write, particularly for simpler grammars.
• Parsing logic closely mirrors the grammar rules.
o Backtracking:
• Traditional Recursive Descent parsers may use backtracking.
• Backtracking involves exploring different production rules if the current path fails.
• Drawback in efficiency for complex grammars due to potential reevaluation of paths.
o Error Handling:
• Error recovery and reporting can be challenging, especially when dealing with
ambiguous or incorrect input.
➢ Steps
o Grammar Representation: The grammar is represented explicitly, usually in BNF (Backus-Naur
Form) or EBNF (Extended Backus-Naur Form).
o Parsing Procedures: Recursive procedures are created for each non-terminal symbol in the
grammar. These procedures are called recursively to parse the input.
o Tokenization: The input is tokenized, breaking it down into tokens (like identifiers, keywords,
operators, etc.).
o Parsing Logic: The parsing procedures handle each production rule, recursively calling
themselves to parse sub-components of the input according to the grammar rules.
o Error Handling: Error handling is crucial in Recursive Descent parsers. It can involve identifying
syntax errors and possibly recovering from them to continue parsing.
➢ Example
Grammar:
S → mXn | mZn
X → pq | sq
Z → qr
The parse-tree diagrams for the individual steps are omitted here; in outline, for the illustrative input string msqn:
Step 1: Start with S and try the first alternative, S → mXn; the leading m matches the input.
Step 2: Expand X with its first alternative, X → pq; the next input symbol s does not match p.
Step 3: Backtracking: the parser resets its input position and tries the next alternative, X → sq, which matches sq.
Step 4: The remaining input symbol n matches the trailing n of S → mXn, so the parse succeeds.
➢ Predictive Parser
A Predictive Parser is a specialized form of Recursive Descent parser that can predict which production
rule to use without backtracking. It achieves this by using a parsing table derived from the grammar.
➢ Key Features
o Deterministic Parsing: Based on a parsing table constructed from the grammar, allowing the
parser to make deterministic choices without backtracking.
o Parsing Table: Utilizes a table (often called a parsing or LL(1) table) to predict the production
rule based on the current input symbol and the symbol at the top of the stack.
o Efficiency: Avoids backtracking, resulting in potentially faster parsing for unambiguous
grammars.
o Handling Ambiguity: Works well for grammars that are unambiguous and don't have left
recursion. Ambiguous grammars might require modifications to be parsed predictively.
o Error Handling: Because of its deterministic nature, error handling is generally more
straightforward compared to traditional Recursive Descent parsers.
➢ Steps
o Constructing Parsing Table: The parsing table is created based on the grammar. Rows represent
non-terminal symbols, columns represent terminal symbols, and cells in the table contain the
production rule to use.
o Input Parsing: The parsing algorithm reads the input stream and the parsing table to predict the
correct production rule for each step.
o Stack-based Parsing: The parser uses a stack to simulate the parsing process. It matches the input
symbols with the stack symbols and decides which production rule to apply based on the current
input symbol and the symbol at the top of the stack.
o Handling Errors: Similar to Recursive Descent parsers, error handling is essential in Predictive
parsers. Invalid inputs or syntax errors need to be identified and handled gracefully.
Grammar:
S → aABb
A → c | є
B → d | є
First Sets:
FIRST(S) = { a }
FIRST(A) = { c, є }
FIRST(B) = { d, є }
Follow Sets:
FOLLOW(S) = { $ }
FOLLOW(A) = { d, b }
FOLLOW(B) = { b }
Input:
acdb
Using Stack:
Stack      Input     Action
$S         acdb$     S → aABb
$bBAa      acdb$     match a
$bBA       cdb$      A → c
$bBc       cdb$      match c
$bB        db$       B → d
$bd        db$       match d
$b         b$        match b
$          $         accept
M-Table:
       |    a     |   b    |   c    |   d    |  $
  S    | S → aABb |        |        |        |
  A    |          | A → є  | A → c  | A → є  |
  B    |          | B → є  |        | B → d  |
THE END