Delhi Technological University Department of Computer Science and Engineering
CO302
(2K21/CO/493)
INDEX
EXPERIMENT - 1
AIM: Write a program to design a lexical analyzer that identifies and counts the tokens in a program.
THEORY: Lexical analysis is the first phase of a compiler, and the component that performs it is also known as a scanner. It converts the input program into a sequence of tokens. A C program consists of various tokens, and a token is either a keyword, an identifier, a constant, a string literal, or a symbol.
1) Keywords: Examples - 'int', 'if', 'while', etc.
2) Identifiers: Examples - variable names such as 'sum', 'count', etc.
3) Operators: Examples - '+', '++', '-', etc.
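For example, the statement int sum = a + 5; yields seven tokens: the keyword int, the identifiers sum and a, the operators = and +, the constant 5, and the symbol ;.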
ALGORITHM:
1. Read the input program from standard input.
2. Split the program into tokens on whitespace.
3. Classify each token as a keyword, an operator, or an identifier.
4. Print the total token count and the count of each category.
PROGRAM:
#include <iostream>
#include <string>
#include <sstream>
#include <unordered_set>
using namespace std;

// Small sets of keywords and operators used to classify tokens.
const unordered_set<string> keywords = {
    "int", "float", "char", "if", "else", "while", "for", "return", "void"
};
const unordered_set<string> operators = {
    "+", "-", "*", "/", "=", "++", "--", "==", "<", ">"
};

bool isKeyword(const string &token) { return keywords.count(token) > 0; }
bool isOperator(const string &token) { return operators.count(token) > 0; }

int main() {
    cout << "Enter the C++ program (end with Ctrl+D or Ctrl+Z):" << endl;

    // Read the entire input program.
    stringstream buffer;
    buffer << cin.rdbuf();
    string program = buffer.str();

    int keywordsCount = 0, operatorsCount = 0, identifiersCount = 0;
    int totalTokens = 0;

    // Split on whitespace and classify each token.
    stringstream ss(program);
    string token;
    while (ss >> token) {
        totalTokens++;
        if (isKeyword(token)) {
            keywordsCount++;
        } else if (isOperator(token)) {
            operatorsCount++;
        } else {
            identifiersCount++;   // everything else is treated as an identifier
        }
    }

    cout << "Total number of tokens: " << totalTokens << endl;
    cout << "Keywords: " << keywordsCount << endl;
    cout << "Operators: " << operatorsCount << endl;
    cout << "Identifiers: " << identifiersCount << endl;
    return 0;
}
OUTPUT:
EXPERIMENT - 2
AIM: Write a program to convert NFA to DFA.
THEORY:
An NFA can have zero, one or more than one move from a given state on a given input symbol.
An NFA can also have NULL moves (moves without input symbol). On the other hand, DFA has
one and only one move from a given state on a given input symbol.
Steps for converting NFA to DFA:
Step 1: Convert the given NFA to its equivalent transition table
To convert the NFA to its equivalent transition table, we need to list all the states, input symbols,
and the transition rules. The transition rules are represented in the form of a matrix, where the
rows represent the current state, the columns represent the input symbol, and the cells represent
the next state.
Step 2: Create the DFA’s start state
The DFA’s start state is the set of all possible starting states in the NFA. This set is called the
“epsilon closure” of the NFA’s start state. The epsilon closure is the set of all states that can be
reached from the start state by following epsilon (ε) transitions.
Step 3: Create the DFA’s transition table
The DFA’s transition table is similar to the NFA’s transition table, but instead of individual states,
the rows and columns represent sets of states. For each input symbol, the corresponding cell in
the transition table contains the epsilon closure of the set of states obtained by following the
transition rules in the NFA’s transition table.
Step 4: Create the DFA’s final states
The DFA’s final states are the sets of states that contain at least one final state from the NFA.
Step 5: Simplify the DFA
The DFA obtained in the previous steps may contain unnecessary states and transitions. To
simplify the DFA, we can use the following techniques:
● Remove unreachable states: States that cannot be reached from the start state can be
removed from the DFA.
● Remove dead states: States that cannot lead to a final state can be removed from the
DFA.
● Merge equivalent states: States that have the same transition rules for all input
symbols can be merged into a single state.
Step 6: Repeat steps 3-5 until no further simplification is possible
The final DFA obtained is the minimized DFA equivalent to the given NFA.
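As a small worked example, consider an NFA with states {q0, q1}, an ε-move from q0 to q1, and a move from q1 to q1 on input a. The DFA's start state is ε-closure(q0) = {q0, q1}; from {q0, q1}, input a leads to ε-closure({q1}) = {q1}. Each such set of NFA states becomes a single state of the DFA.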
PROGRAM:
def epsilon_closure(nfa, states):
    # All states reachable from `states` using only ε-moves.
    closure = set(states)
    stack = list(states)
    while stack:
        current_state = stack.pop()
        if 'ε' in nfa[current_state]:
            for state in nfa[current_state]['ε']:
                if state not in closure:
                    closure.add(state)
                    stack.append(state)
    return closure

def move(nfa, states, symbol):
    # States reachable from `states` on `symbol` (without ε-moves).
    result = set()
    for state in states:
        if symbol in nfa[state]:
            result |= set(nfa[state][symbol])
    return result

def nfa_to_dfa(nfa, initial):
    dfa = {}
    alphabet = set()
    for state in nfa:
        alphabet |= set(nfa[state].keys())
    alphabet.discard('ε')
    initial_state = frozenset(epsilon_closure(nfa, {initial}))
    dfa[initial_state] = {}
    stack = [initial_state]
    while stack:
        current_states = stack.pop()
        for symbol in alphabet:
            next_states = frozenset(
                epsilon_closure(nfa, move(nfa, current_states, symbol)))
            if not next_states:
                continue
            dfa[current_states][symbol] = next_states
            if next_states not in dfa:
                dfa[next_states] = {}
                stack.append(next_states)
    return dfa

def print_dfa(dfa):
    for state, transitions in dfa.items():
        for symbol, next_state in transitions.items():
            print(f"{set(state)} --{symbol}--> {set(next_state)}")
        print()

# Transitions of each NFA state, keyed by input symbol ('ε' for ε-moves).
nfa = {
    'q0': {'ε': ['q1'], 'a': ['q0', 'q1']},
    'q1': {'b': ['q2']},
    'q2': {},
}
dfa = nfa_to_dfa(nfa, 'q0')
print_dfa(dfa)
OUTPUT:
EXPERIMENT - 3
AIM: Write a program to remove left recursion in a grammar.
THEORY:
Left recursion is a phenomenon that occurs in a grammar when a non-terminal can derive a string
that begins with itself. For example, consider a production rule like this:
A → Aα | β
In this production rule, A is left-recursive because it can derive a string starting with A. Left
recursion can cause issues in parsing algorithms because it can lead to infinite recursion or
ambiguity.
To remove left recursion from a grammar, we use a technique called left recursion elimination. The basic idea is to rewrite the production rules so that the left recursion is converted into right recursion.
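For example, the left-recursive rule
A → Aα | β
is rewritten as
A → βA'
A' → αA' | ε
Concretely, E → E + T | T becomes E → T E' and E' → + T E' | ε.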
PROGRAM:
def remove_left_recursion(grammar):
    new_grammar = {}
    for non_terminal, productions in grammar.items():
        recursive_productions = []
        non_recursive_productions = []
        for production in productions:
            if production.startswith(non_terminal):
                recursive_productions.append(production)
            else:
                non_recursive_productions.append(production)
        if len(recursive_productions) > 0:
            # A -> Aα | β  becomes  A -> βA'  and  A' -> αA' | ε.
            new_non_terminal = non_terminal + "'"
            new_grammar[non_terminal] = [
                production + new_non_terminal
                for production in non_recursive_productions]
            new_grammar[new_non_terminal] = [
                production[len(non_terminal):] + new_non_terminal
                for production in recursive_productions]
            new_grammar[new_non_terminal].append('ε')
        else:
            new_grammar[non_terminal] = productions
    return new_grammar

def print_grammar(grammar):
    for non_terminal, productions in grammar.items():
        print(f"{non_terminal} -> {' | '.join(productions)}")

grammar = {
    'E': ['E+T', 'T'],
    'T': ['T*F', 'F'],
    'F': ['id'],
}
print("Original Grammar:")
print_grammar(grammar)
new_grammar = remove_left_recursion(grammar)
print("\nGrammar after removing left recursion:")
print_grammar(new_grammar)
OUTPUT:
EXPERIMENT - 4
AIM: Write a program to left factor the given grammar.
THEORY:
In compiler design, left factoring is a technique that simplifies a grammar and improves parsing. To reduce ambiguity and duplication, it detects common prefixes in the productions of the grammar and factors them out into separate productions. This lowers the parser's complexity and increases parsing efficiency. The fundamentals of left factoring are:
• When multiple production rules share the same prefix, it can lead to ambiguity or redundancy
during parsing.
• Left factorization aims to factor out these common prefixes into separate production rules to
simplify the grammar and improve parsing efficiency.
• The resulting grammar should be unambiguous and capable of generating the same language as
the original grammar.
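For example, productions with a common prefix
A → αβ1 | αβ2
are rewritten as
A → αA'
A' → β1 | β2
Concretely, S → iEtS | iEtSeS | a becomes S → iEtSS' | a and S' → eS | ε.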
PROGRAM:
def common_prefix(a, b):
    # Longest common prefix of two productions.
    prefix = ''
    for x, y in zip(a, b):
        if x != y:
            break
        prefix += x
    return prefix

def left_factor(grammar):
    new_grammar = {}
    changed = False
    for non_terminal, productions in grammar.items():
        # Longest prefix shared by at least two alternatives.
        best = ''
        for i in range(len(productions)):
            for j in range(i + 1, len(productions)):
                p = common_prefix(productions[i], productions[j])
                if len(p) > len(best):
                    best = p
        if best:
            changed = True
            new_non_terminal = non_terminal + "'"
            new_grammar[non_terminal] = []
            suffixes = []
            for production in productions:
                if production.startswith(best):
                    suffixes.append(production[len(best):] or 'ε')
                else:
                    new_grammar[non_terminal].append(production)
            new_grammar[non_terminal].append(best + new_non_terminal)
            new_grammar[new_non_terminal] = suffixes
        else:
            new_grammar[non_terminal] = productions
    # Keep factoring until no common prefixes remain.
    return left_factor(new_grammar) if changed else new_grammar

def print_grammar(grammar):
    for non_terminal, productions in grammar.items():
        print(f"{non_terminal} -> {' | '.join(productions)}")

grammar = {
    'S': ['iEtS', 'iEtSeS', 'a'],
}
print("Original Grammar:")
print_grammar(grammar)
new_grammar = left_factor(grammar)
print("\nLeft-factored Grammar:")
print_grammar(new_grammar)
OUTPUT:
EXPERIMENT - 5
AIM: Write a program to compute First and Follow.
THEORY:
First(y) is the set of terminals that begin the strings derived from y. Follow(A) is the set of
terminals that can appear to the right of A. First and Follow are used in the construction of the
parsing table.
For example, for A → abc | def | ghi, First(A) = { a, d, g }.
To compute First:
1) If X is a terminal, then First(X) = { X }.
2) If X → ε is a production, then add ε to First(X).
3) If X → Y1Y2…Yk is a production, then add First(Y1) except ε to First(X). If Y1 derives ε, also add First(Y2), and so on; if all of Y1…Yk derive ε, add ε to First(X).
To compute Follow:
1) Place $ in Follow(S), where S is the start symbol and $ is the end-of-input marker.
2) If there is a production A → αBβ, then everything in First(β) except ε is placed in Follow(B).
3) If there is a production A → αB, or a production A → αBβ where First(β) contains ε, then everything in Follow(A) is in Follow(B).
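As a worked example, consider the grammar S → AB, A → a, B → b. Then First(A) = { a }, First(B) = { b }, and First(S) = { a }. For the Follow sets: Follow(S) = { $ }, Follow(A) = First(B) = { b }, and Follow(B) = Follow(S) = { $ }.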
PROGRAM:
def compute_first(grammar):
    # first[X]: terminals that can begin strings derived from X ('' is ε).
    first = {non_terminal: set() for non_terminal in grammar}
    while True:
        updated = False
        for non_terminal, productions in grammar.items():
            for production in productions:
                old_size = len(first[non_terminal])
                if production == '':
                    first[non_terminal].add('')
                else:
                    for symbol in production:
                        if symbol in grammar:
                            first[non_terminal] |= first[symbol] - {''}
                            if '' not in first[symbol]:
                                break
                        else:
                            first[non_terminal].add(symbol)
                            break
                    else:
                        # Every symbol of the body can derive ε.
                        first[non_terminal].add('')
                if len(first[non_terminal]) != old_size:
                    updated = True
        if not updated:
            break
    return first

def compute_follow(grammar, first, start_symbol):
    follow = {non_terminal: set() for non_terminal in grammar}
    follow[start_symbol].add('$')
    while True:
        updated = False
        for non_terminal, productions in grammar.items():
            for production in productions:
                for i in range(len(production)):
                    symbol = production[i]
                    if symbol not in grammar:
                        continue
                    old_size = len(follow[symbol])
                    if i < len(production) - 1:
                        next_symbol = production[i + 1]
                        if next_symbol in grammar:
                            follow_set = set(first[next_symbol])
                            if '' in follow_set:
                                follow_set -= {''}
                                follow_set |= follow[non_terminal]
                        else:
                            follow_set = {next_symbol}
                        follow[symbol] |= follow_set
                    else:
                        follow[symbol] |= follow[non_terminal]
                    if len(follow[symbol]) != old_size:
                        updated = True
        if not updated:
            break
    return follow

def print_sets(first, follow):
    print("First Set:")
    for non_terminal in first:
        symbols = sorted(s if s else 'ε' for s in first[non_terminal])
        print(f"  First({non_terminal}) = {{ {', '.join(symbols)} }}")
    print("\nFollow Set:")
    for non_terminal in follow:
        print(f"  Follow({non_terminal}) = {{ {', '.join(sorted(follow[non_terminal]))} }}")

grammar = {
    'S': ['AB'],
    'A': ['a', ''],
    'B': ['b'],
}
first_set = compute_first(grammar)
follow_set = compute_follow(grammar, first_set, 'S')
print_sets(first_set, follow_set)
OUTPUT:
EXPERIMENT - 6
AIM: Write a program to implement Shift Reduce parser.
THEORY:
A shift-reduce parser constructs the parse in the same manner as bottom-up parsing, i.e., the parse tree is built from the leaves (bottom) to the root (up). A more general form of the shift-reduce parser is the LR parser.
This parser requires some data structures, i.e.,
● An input buffer for storing the input string.
● A stack for storing the grammar symbols that have been shifted or produced by reductions.
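As shown below, for the grammar E → E + E | id, the parser processes the input id + id by alternating shift and reduce moves:
Stack          Input        Action
$              id + id $    shift id
$ id           + id $       reduce by E → id
$ E            + id $       shift +
$ E +          id $         shift id
$ E + id       $            reduce by E → id
$ E + E        $            reduce by E → E + E
$ E            $            accept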
PROGRAM:
class Production:
    def __init__(self, left, right):
        self.left = left
        self.right = right

class Grammar:
    def __init__(self, productions):
        self.productions = productions

class ShiftReduceParser:
    # Simplified shift-reduce parser: instead of a precomputed ACTION table,
    # it greedily reduces whenever the top of the stack matches a handle.
    def __init__(self, grammar, start_symbol):
        self.grammar = grammar
        self.start_symbol = start_symbol
        self.stack = []

    def shift(self, token):
        self.stack.append(token)
        print("Shifted:", token)
        print("Stack:", self.stack)

    def reduce(self, production):
        # Pop the handle (the production's right side), push its left side.
        for _ in range(len(production.right)):
            self.stack.pop()
        self.stack.append(production.left)
        print("Reduced by production:", production.left, "->", production.right)
        print("Stack:", self.stack)

    def try_reduce(self):
        for production in self.grammar.productions:
            n = len(production.right)
            if n > 0 and self.stack[-n:] == production.right:
                self.reduce(production)
                return True
        return False

    def parse(self, input_tokens):
        input_index = 0
        while True:
            if self.try_reduce():
                continue
            if input_index < len(input_tokens):
                self.shift(input_tokens[input_index])
                input_index += 1
            elif self.stack == [self.start_symbol]:
                print("Parsing Successful!")
                return True
            else:
                print("Parsing Failed!")
                return False

productions = [
    Production('E', ['E', '+', 'E']),
    Production('E', ['id'])
]
grammar = Grammar(productions)
parser = ShiftReduceParser(grammar, 'E')
input_tokens = ['id', '+', 'id']
parser.parse(input_tokens)
OUTPUT:
EXPERIMENT - 7
AIM: Write a program to construct LL(1) parsing table.
THEORY:
LL(1) Parsing: The first L means that the input is scanned from left to right, the second L means that the parser produces a leftmost derivation, and the 1 is the number of lookahead symbols, i.e., how many input symbols the parser examines to make each parsing decision.
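For example, for the grammar S → aSb | ε, the parsing table has M[S, a] = S → aSb, and M[S, b] = M[S, $] = S → ε (since b and $ are in Follow(S)); a single lookahead symbol is enough to choose the production.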
PROGRAM:
class Production:
    def __init__(self, left, right):
        self.left = left
        self.right = right

class Grammar:
    def __init__(self, start_symbol, terminals, non_terminals, productions):
        self.start_symbol = start_symbol
        self.terminals = terminals
        self.non_terminals = non_terminals
        self.productions = productions

class LL1Parser:
    def __init__(self, grammar, parsing_table):
        self.grammar = grammar
        self.parsing_table = parsing_table

    def parse(self, input_tokens):
        stack = ['$']
        input_tokens = input_tokens + ['$']
        index = 0
        stack.append(self.grammar.start_symbol)
        while stack:
            top = stack[-1]
            current_token = input_tokens[index]
            if top == '$' and current_token == '$':
                print("Parsing Successful!")
                return True
            if top in self.grammar.terminals:
                # Terminal on top of the stack: it must match the input.
                if top == current_token:
                    stack.pop()
                    index += 1
                else:
                    print("Parsing Failed!")
                    return False
            else:
                # Non-terminal: consult the LL(1) parsing table.
                production = self.parsing_table.get((top, current_token))
                if production is None:
                    print("Parsing Failed!")
                    return False
                stack.pop()
                if production != 'ε':
                    # Push the production body in reverse order.
                    for symbol in reversed(production):
                        stack.append(symbol)
        return False

start_symbol = 'E'
productions = [
    Production('E', ['T']),
    Production('T', ['F']),
    Production('F', ['id'])
]
grammar = Grammar(start_symbol, {'id'}, {'E', 'T', 'F'}, productions)
# LL(1) table: (non-terminal, lookahead) -> production body.
parsing_table = {
    ('E', 'id'): ['T'],
    ('T', 'id'): ['F'],
    ('F', 'id'): ['id'],
}
parser = LL1Parser(grammar, parsing_table)
input_tokens = ['id']
parser.parse(input_tokens)
OUTPUT:
EXPERIMENT - 8
AIM: Write a program to construct LR(0) parsing table.
THEORY:
LR(0) parsing is an efficient bottom-up syntax-analysis technique that can be used to parse large classes of context-free grammars.
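The parser works with LR(0) items: productions with a dot marking how much of the body has already been seen. For example, the production A → XY gives rise to three items: A → •XY, A → X•Y, and A → XY•.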
PROGRAM:
class Production:
    def __init__(self, head, body):
        self.head = head
        self.body = body

    def __repr__(self):
        return f"{self.head} -> {' '.join(self.body)}"

class Item:
    def __init__(self, production, dot_position):
        self.production = production
        self.dot_position = dot_position

    def __eq__(self, other):
        return (self.production is other.production
                and self.dot_position == other.dot_position)

    def __repr__(self):
        body = self.production.body[:]
        body.insert(self.dot_position, '•')
        return f"{self.production.head} -> {' '.join(body)}"

class Grammar:
    def __init__(self, productions, non_terminals, terminals):
        self.productions = productions
        self.non_terminals = non_terminals
        self.terminals = terminals

class LR0Parser:
    def __init__(self, grammar, start_symbol):
        self.grammar = grammar
        self.start_symbol = start_symbol
        self.canonical_collection = []
        self.action_table = {}
        self.goto_table = {}

    def compute_closure(self, items):
        # Add an item B -> •γ for every non-terminal B after a dot.
        closure = items[:]
        added = True
        while added:
            added = False
            for item in closure:
                if item.dot_position >= len(item.production.body):
                    continue
                symbol_after_dot = item.production.body[item.dot_position]
                if symbol_after_dot in self.grammar.non_terminals:
                    for production in self.grammar.productions:
                        if production.head == symbol_after_dot:
                            new_item = Item(production, 0)
                            if new_item not in closure:
                                closure.append(new_item)
                                added = True
        return closure

    def compute_goto(self, items, symbol):
        # Move the dot over `symbol` and take the closure of the result.
        new_items = [Item(item.production, item.dot_position + 1)
                     for item in items
                     if item.dot_position < len(item.production.body)
                     and item.production.body[item.dot_position] == symbol]
        return self.compute_closure(new_items)

    def construct_canonical_collection(self):
        # Augment the grammar with E' -> E and start from its closure.
        start_production = Production(self.start_symbol + "'",
                                      [self.start_symbol])
        self.grammar.productions.insert(0, start_production)
        self.canonical_collection = [
            self.compute_closure([Item(start_production, 0)])]
        added = True
        while added:
            added = False
            for items in self.canonical_collection:
                for symbol in (list(self.grammar.non_terminals)
                               + list(self.grammar.terminals)):
                    new_items = self.compute_goto(items, symbol)
                    if new_items and new_items not in self.canonical_collection:
                        self.canonical_collection.append(new_items)
                        added = True

    def construct_parsing_table(self):
        for i, items in enumerate(self.canonical_collection):
            for item in items:
                if item.dot_position < len(item.production.body):
                    symbol = item.production.body[item.dot_position]
                    j = self.canonical_collection.index(
                        self.compute_goto(items, symbol))
                    if symbol in self.grammar.non_terminals:
                        self.goto_table[(i, symbol)] = j
                    else:
                        self.action_table[(i, symbol)] = f"S{j}"
                elif item.production.head == self.start_symbol + "'":
                    self.action_table[(i, '$')] = 'accept'
                else:
                    # In LR(0) the reduce goes in every terminal column.
                    rule = self.grammar.productions.index(item.production)
                    for symbol in list(self.grammar.terminals) + ['$']:
                        self.action_table[(i, symbol)] = f"R{rule}"

    def print_parsing_table(self):
        print("LR(0) Parsing Table:")
        print("Action Table:", self.action_table)
        print("Goto Table:", self.goto_table)

productions = [
    Production('E', ['E', '+', 'T']),
    Production('E', ['T']),
    Production('T', ['F']),
    Production('F', ['id'])
]
grammar = Grammar(productions, {'E', 'T', 'F'}, {'+', 'id'})
start_symbol = 'E'
parser = LR0Parser(grammar, start_symbol)
parser.construct_canonical_collection()
parser.construct_parsing_table()
parser.print_parsing_table()
OUTPUT:
EXPERIMENT - 9
AIM: Write a program to construct SLR(1) parsing table.
THEORY:
SLR Parser:
SLR stands for Simple LR. It is the smallest class of LR grammars, with the fewest states. An SLR parser is easy to construct and is similar to an LR(0) parser. The only difference is that the LR(0) parsing table risks shift-reduce conflicts, because 'reduce' is entered under every terminal; we solve this problem by entering 'reduce' only under the terminals in the FOLLOW of the left-hand side of the production being reduced. The item sets used are called the SLR(1) collection of items.
Steps for constructing the SLR parsing table :
1. Writing augmented grammar
2. LR(0) collection of items to be found
3. Find FOLLOW of LHS of production
4. Defining 2 functions: action (indexed by the terminals) and goto (indexed by the non-terminals) in the parsing table.
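For example, if FOLLOW(E) = { +, $ }, then in the state containing the completed item E → T•, the entry 'reduce by E → T' is placed only in the columns for + and $, rather than in every terminal column as in LR(0).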
PROGRAM:
class Production:
    def __init__(self, head, body):
        self.head = head
        self.body = body

    def __repr__(self):
        return f"{self.head} -> {' '.join(self.body)}"

class Item:
    def __init__(self, production, dot_position):
        self.production = production
        self.dot_position = dot_position

    def __eq__(self, other):
        return (self.production is other.production
                and self.dot_position == other.dot_position)

    def __repr__(self):
        body = self.production.body[:]
        body.insert(self.dot_position, '•')
        return f"{self.production.head} -> {' '.join(body)}"

class SLRParser:
    def __init__(self, productions, non_terminals, terminals, follow,
                 start_symbol):
        self.productions = productions
        self.non_terminals = non_terminals
        self.terminals = terminals
        self.follow = follow
        self.start_symbol = start_symbol
        self.canonical_collection = []
        self.action_table = {}
        self.goto_table = {}

    def compute_closure(self, items):
        closure = items[:]
        added = True
        while added:
            added = False
            for item in closure:
                if item.dot_position >= len(item.production.body):
                    continue
                symbol_after_dot = item.production.body[item.dot_position]
                if symbol_after_dot in self.non_terminals:
                    for production in self.productions:
                        if production.head == symbol_after_dot:
                            new_item = Item(production, 0)
                            if new_item not in closure:
                                closure.append(new_item)
                                added = True
        return closure

    def compute_goto(self, items, symbol):
        new_items = [Item(item.production, item.dot_position + 1)
                     for item in items
                     if item.dot_position < len(item.production.body)
                     and item.production.body[item.dot_position] == symbol]
        return self.compute_closure(new_items)

    def construct_canonical_collection(self):
        start_production = Production(self.start_symbol + "'",
                                      [self.start_symbol])
        self.productions.insert(0, start_production)
        self.canonical_collection = [
            self.compute_closure([Item(start_production, 0)])]
        added = True
        while added:
            added = False
            for items in self.canonical_collection:
                for symbol in list(self.non_terminals) + list(self.terminals):
                    new_items = self.compute_goto(items, symbol)
                    if new_items and new_items not in self.canonical_collection:
                        self.canonical_collection.append(new_items)
                        added = True

    def construct_parsing_table(self):
        for i, items in enumerate(self.canonical_collection):
            for item in items:
                if item.dot_position < len(item.production.body):
                    symbol = item.production.body[item.dot_position]
                    j = self.canonical_collection.index(
                        self.compute_goto(items, symbol))
                    if symbol in self.non_terminals:
                        self.goto_table[(i, symbol)] = j
                    else:
                        self.action_table[(i, symbol)] = f"S{j}"
                elif item.production.head == self.start_symbol + "'":
                    self.action_table[(i, '$')] = 'accept'
                else:
                    # SLR(1): reduce only on FOLLOW of the production's head.
                    rule = self.productions.index(item.production)
                    for symbol in self.follow[item.production.head]:
                        self.action_table[(i, symbol)] = f"R{rule}"

    def print_parsing_table(self):
        print("Action Table:", self.action_table)
        print("Goto Table:", self.goto_table)

productions = [
    Production('E', ['E', '+', 'T']),
    Production('E', ['T']),
    Production('T', ['F']),
    Production('F', ['id'])
]
# FOLLOW sets computed by hand for this grammar.
follow = {'E': {'+', '$'}, 'T': {'+', '$'}, 'F': {'+', '$'}}
parser = SLRParser(productions, {'E', 'T', 'F'}, {'+', 'id'}, follow, 'E')
parser.construct_canonical_collection()
parser.construct_parsing_table()
parser.print_parsing_table()
OUTPUT:
EXPERIMENT - 10
AIM: Write a program to construct LALR(1) parsing table.
THEORY:
The LALR parser is the lookahead LR parser. It is a powerful parser that can handle large classes of grammars. The CLR parsing table is quite large compared with the other parsing tables, and LALR reduces its size. LALR works similarly to CLR; the only difference is that it combines the similar states of the CLR parsing table into one single state.
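For example, two CLR states containing the items [A → c•, d] and [A → c•, e] have the same core A → c•, so LALR merges them into a single state with the item [A → c•, d/e], uniting the lookaheads.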
PROGRAM:
class Production:
    def __init__(self, head, body):
        self.head = head
        self.body = body

    def __repr__(self):
        return f"{self.head} -> {' '.join(self.body)}"

class Item:
    def __init__(self, production, dot_position, lookahead):
        self.production = production
        self.dot_position = dot_position
        self.lookahead = set(lookahead)

    def __eq__(self, other):
        return (self.production is other.production
                and self.dot_position == other.dot_position
                and self.lookahead == other.lookahead)

class Grammar:
    def __init__(self, productions, non_terminals, terminals):
        self.productions = productions
        self.non_terminals = non_terminals
        self.terminals = terminals

    def compute_first(self, symbol):
        # FIRST of one symbol (the example grammar has no ε-productions).
        if symbol not in self.non_terminals:
            return {symbol}
        first = set()
        for p in self.productions:
            if p.head == symbol and p.body[0] != symbol:
                first |= self.compute_first(p.body[0])
        return first

class LALRParser:
    def __init__(self, grammar, start_symbol):
        self.grammar = grammar
        self.start_symbol = start_symbol
        self.canonical_collection = []
        self.action_table = {}
        self.goto_table = {}

    def compute_lookahead(self, item):
        # FIRST of the symbol after the dotted non-terminal, or the item's
        # own lookahead when the non-terminal ends the body.
        if item.dot_position == len(item.production.body) - 1:
            return set(item.lookahead)
        first_set = self.grammar.compute_first(
            item.production.body[item.dot_position + 1])
        lookahead = {s for s in first_set if s != 'ε'}
        if 'ε' in first_set:
            lookahead |= item.lookahead
        return lookahead

    def compute_closure(self, items):
        closure = items[:]
        added = True
        while added:
            added = False
            for item in closure:
                if item.dot_position >= len(item.production.body):
                    continue
                symbol_after_dot = item.production.body[item.dot_position]
                if symbol_after_dot in self.grammar.non_terminals:
                    lookahead = self.compute_lookahead(item)
                    for production in self.grammar.productions:
                        if production.head == symbol_after_dot:
                            new_item = Item(production, 0, lookahead)
                            if new_item not in closure:
                                closure.append(new_item)
                                added = True
        return closure

    def compute_goto(self, items, symbol):
        new_items = [Item(it.production, it.dot_position + 1, it.lookahead)
                     for it in items
                     if it.dot_position < len(it.production.body)
                     and it.production.body[it.dot_position] == symbol]
        return self.compute_closure(new_items)

    def construct_canonical_collection(self):
        # Build the LR(1) (CLR) collection first; states are merged later.
        start_production = Production(self.start_symbol + "'",
                                      [self.start_symbol])
        self.grammar.productions.insert(0, start_production)
        self.canonical_collection = [
            self.compute_closure([Item(start_production, 0, {'$'})])]
        added = True
        while added:
            added = False
            for items in self.canonical_collection:
                for symbol in (list(self.grammar.non_terminals)
                               + list(self.grammar.terminals)):
                    new_items = self.compute_goto(items, symbol)
                    if new_items and new_items not in self.canonical_collection:
                        self.canonical_collection.append(new_items)
                        added = True

    def core(self, items):
        # The LR(0) core of a state: its items without their lookaheads.
        return sorted({(repr(it.production), it.dot_position) for it in items})

    def merge_states(self):
        # LALR step: merge states with identical cores, uniting lookaheads.
        merged = []
        for items in self.canonical_collection:
            target = next((s for s in merged
                           if self.core(s) == self.core(items)), None)
            if target is None:
                merged.append([Item(it.production, it.dot_position,
                                    it.lookahead) for it in items])
            else:
                for item in items:
                    for it in target:
                        if (it.production is item.production
                                and it.dot_position == item.dot_position):
                            it.lookahead |= item.lookahead
        self.canonical_collection = merged

    def find_state(self, items):
        # Goto targets are located by core, since lookaheads were merged.
        for i, existing in enumerate(self.canonical_collection):
            if self.core(existing) == self.core(items):
                return i
        return -1

    def construct_parsing_table(self):
        for i, items in enumerate(self.canonical_collection):
            for item in items:
                if item.dot_position < len(item.production.body):
                    symbol = item.production.body[item.dot_position]
                    j = self.find_state(self.compute_goto(items, symbol))
                    if symbol in self.grammar.non_terminals:
                        self.goto_table[(i, symbol)] = j
                    else:
                        self.action_table[(i, symbol)] = f"S{j}"
                elif item.production.head == self.start_symbol + "'":
                    self.action_table[(i, '$')] = 'accept'
                else:  # reduce only on the item's lookahead symbols
                    rule = self.grammar.productions.index(item.production)
                    for symbol in item.lookahead:
                        self.action_table[(i, symbol)] = f"R{rule}"

    def print_parsing_table(self):
        print("Action Table:", self.action_table)
        print("Goto Table:", self.goto_table)

productions = [Production('E', ['E', '+', 'T']), Production('E', ['T']),
               Production('T', ['F']), Production('F', ['id'])]
grammar = Grammar(productions, {'E', 'T', 'F'}, {'+', 'id'})
start_symbol = 'E'
parser = LALRParser(grammar, start_symbol)
parser.construct_canonical_collection()
parser.merge_states()
parser.construct_parsing_table()
parser.print_parsing_table()
OUTPUT:
EXPERIMENT - 11
AIM: Write a program to construct CLR(1) parsing table.
THEORY:
The CLR parser stands for canonical LR parser. It is a more powerful LR parser that makes use of lookahead symbols. This method uses a large set of items, called LR(1) items. The main difference between LR(0) and LR(1) items is that LR(1) items carry more information in a state, which rules out useless reduction states. This extra information is incorporated into the state by the lookahead symbol. The general form is [A → α.B, a], where A → α.B is the production and a is a terminal or the right-end marker $.
LR(1) items = LR(0) items + lookahead
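For example, [E → T•, $] is an LR(1) item: the LR(0) item E → T• together with the lookahead $, meaning the reduction E → T is performed only when the next input symbol is $.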
PROGRAM:
class Production:
    def __init__(self, head, body):
        self.head = head
        self.body = body

    def __repr__(self):
        return f"{self.head} -> {' '.join(self.body)}"

class Item:
    def __init__(self, production, dot_position, lookahead):
        self.production = production
        self.dot_position = dot_position
        self.lookahead = set(lookahead)

    def __eq__(self, other):
        return (self.production is other.production
                and self.dot_position == other.dot_position
                and self.lookahead == other.lookahead)

class Grammar:
    def __init__(self, productions, non_terminals, terminals):
        self.productions = productions
        self.non_terminals = non_terminals
        self.terminals = terminals

    def compute_first(self, symbol):
        # FIRST of one symbol (the example grammar has no ε-productions).
        if symbol not in self.non_terminals:
            return {symbol}
        first = set()
        for p in self.productions:
            if p.head == symbol and p.body[0] != symbol:
                first |= self.compute_first(p.body[0])
        return first

class CLRParser:
    def __init__(self, grammar, start_symbol):
        self.grammar = grammar
        self.start_symbol = start_symbol
        self.canonical_collection = []
        self.action_table = {}
        self.goto_table = {}

    def compute_lookahead(self, item):
        # FIRST of the symbol after the dotted non-terminal, or the item's
        # own lookahead when the non-terminal ends the body.
        if item.dot_position == len(item.production.body) - 1:
            return set(item.lookahead)
        first_set = self.grammar.compute_first(
            item.production.body[item.dot_position + 1])
        lookahead = {s for s in first_set if s != 'ε'}
        if 'ε' in first_set:
            lookahead |= item.lookahead
        return lookahead

    def compute_closure(self, items):
        closure = items[:]
        added = True
        while added:
            added = False
            for item in closure:
                if item.dot_position >= len(item.production.body):
                    continue
                symbol_after_dot = item.production.body[item.dot_position]
                if symbol_after_dot in self.grammar.non_terminals:
                    lookahead = self.compute_lookahead(item)
                    for production in self.grammar.productions:
                        if production.head == symbol_after_dot:
                            new_item = Item(production, 0, lookahead)
                            if new_item not in closure:
                                closure.append(new_item)
                                added = True
        return closure

    def compute_goto(self, items, symbol):
        new_items = [Item(it.production, it.dot_position + 1, it.lookahead)
                     for it in items
                     if it.dot_position < len(it.production.body)
                     and it.production.body[it.dot_position] == symbol]
        return self.compute_closure(new_items)

    def find_state(self, items):
        # Locate a state regardless of the order of its items.
        for i, existing in enumerate(self.canonical_collection):
            if (len(existing) == len(items)
                    and all(it in existing for it in items)):
                return i
        return -1

    def construct_canonical_collection(self):
        start_production = Production(self.start_symbol + "'",
                                      [self.start_symbol])
        self.grammar.productions.insert(0, start_production)
        self.canonical_collection = [
            self.compute_closure([Item(start_production, 0, {'$'})])]
        added = True
        while added:
            added = False
            for items in self.canonical_collection:
                for symbol in (list(self.grammar.non_terminals)
                               + list(self.grammar.terminals)):
                    new_items = self.compute_goto(items, symbol)
                    if new_items and self.find_state(new_items) == -1:
                        self.canonical_collection.append(new_items)
                        added = True

    def construct_parsing_table(self):
        for i, items in enumerate(self.canonical_collection):
            for item in items:
                if item.dot_position < len(item.production.body):
                    symbol = item.production.body[item.dot_position]
                    j = self.find_state(self.compute_goto(items, symbol))
                    if symbol in self.grammar.non_terminals:
                        self.goto_table[(i, symbol)] = j
                    else:
                        self.action_table[(i, symbol)] = f"S{j}"
                elif item.production.head == self.start_symbol + "'":
                    self.action_table[(i, '$')] = 'accept'
                else:  # reduce only on the item's lookahead symbols
                    rule = self.grammar.productions.index(item.production)
                    for symbol in item.lookahead:
                        self.action_table[(i, symbol)] = f"R{rule}"

    def print_parsing_table(self):
        print("Action Table:", self.action_table)
        print("Goto Table:", self.goto_table)

productions = [
    Production('E', ['E', '+', 'T']),
    Production('E', ['T']),
    Production('T', ['T', '*', 'F']),
    Production('T', ['F']),
    Production('F', ['id'])
]
grammar = Grammar(productions, {'E', 'T', 'F'}, {'+', '*', 'id'})
start_symbol = 'E'
parser = CLRParser(grammar, start_symbol)
parser.construct_canonical_collection()
parser.construct_parsing_table()
parser.print_parsing_table()
OUTPUT:
EXPERIMENT - 12
AIM: Write a program to construct operator precedence parser.
THEORY:
An operator precedence parser is a bottom-up parser for operator grammars. Ambiguous grammars are not allowed in any parser except the operator precedence parser. There are two methods for determining what precedence relations should hold between a pair of terminals:
1. Use the conventional associativity and precedence of the operators.
2. The second method of selecting operator-precedence relations is first to construct an
unambiguous grammar for the language, a grammar that reflects the correct
associativity and precedence in its parse trees.
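For example, with the usual conventions, + has lower precedence than *, so the relation + ⋖ * holds and the parser keeps shifting past *; between two occurrences of +, the relation + ⋗ + holds because + is left-associative, so the parser reduces first.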
PROGRAM:
class OperatorPrecedenceParser:
    def __init__(self):
        # Precedence levels for the supported operators.
        self.precedence = {
            '+': 1,
            '-': 1,
            '*': 2,
            '/': 2,
            '^': 3
        }

    def parse_expression(self, expression):
        # Convert an infix token list to postfix using operator precedence.
        stack = []
        output = []
        for token in expression:
            if token.isalnum():
                output.append(token)
            elif token == '(':
                stack.append(token)
            elif token == ')':
                while stack and stack[-1] != '(':
                    output.append(stack.pop())
                stack.pop()  # discard the '('
            else:
                # Pop operators of higher or equal precedence before pushing.
                while (stack and stack[-1] != '('
                       and self.precedence.get(stack[-1], 0)
                       >= self.precedence[token]):
                    output.append(stack.pop())
                stack.append(token)
        while stack:
            output.append(stack.pop())
        return output

    def evaluate_postfix(self, postfix):
        stack = []
        for token in postfix:
            if token.isalnum():
                stack.append(int(token))
            else:
                operand2 = stack.pop()
                operand1 = stack.pop()
                if token == '+':
                    stack.append(operand1 + operand2)
                elif token == '-':
                    stack.append(operand1 - operand2)
                elif token == '*':
                    stack.append(operand1 * operand2)
                elif token == '/':
                    stack.append(operand1 / operand2)
                elif token == '^':
                    stack.append(operand1 ** operand2)
        return stack[0]

parser = OperatorPrecedenceParser()
expression = ['3', '+', '4', '*', '2']
postfix = parser.parse_expression(expression)
print("Postfix:", postfix)
result = parser.evaluate_postfix(postfix)
print("Result:", result)
OUTPUT: