Compiler Design File
Introduction-
Some of the most time-consuming and tedious parts of writing a compiler are the
lexical scanning and syntax analysis. Luckily, there is freely available software to assist
with these tasks. While these tools will not do everything for you, they enable faster
implementation of the basic functions. Lex and Yacc are the most commonly used
packages, with Lex managing token recognition and Yacc handling the syntax. They
work well together, but each can also be used individually.
Both operate in a similar manner: instructions for token recognition or grammar rules
are written in a special file format. These text files are then read by Lex and/or Yacc to
produce C code, and the resulting source code is compiled to make the final application.
By convention the lexical instruction file has a “.l” suffix and the grammar file has a “.y”
suffix.
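For example, a typical build pipeline (the file names scanner.l and parser.y here are
hypothetical) looks like:

lex scanner.l        # generates lex.yy.c
yacc -d parser.y     # generates y.tab.c and y.tab.h
cc y.tab.c lex.yy.c -o parser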
LEX-
The file format for a lex file consists of four basic sections:
• The first is an area for C code that will be placed verbatim at the beginning of the
generated source code. Typically it is used for things like #include directives, #defines,
and variable declarations.
• The next section is for definitions of token types to be recognized. These are not
mandatory, but in general they make the rules section easier to read and shorter.
• The third section sets the pattern for each token that is to be recognized, and can also
include C code to be called when that token is identified.
• The last section is for more C code (generally subroutines) that will be appended to the
end of the generated C code. This would typically include a main function if Lex is to be
used by itself.
• The format is applied as follows (the use and placement of the % symbols are
necessary):
%{
//header c code
%}
//definitions
%%
//rules
%%
//subroutines
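As a minimal sketch of how these sections fit together (the patterns and messages below
are illustrative, not from the original document), a standalone lex file might look like:

%{
#include <stdio.h>
%}
DIGIT   [0-9]
%%
{DIGIT}+    { printf("INTEGER: %s\n", yytext); }
"=="        { printf("EQUALS\n"); }
[ \t\n]     { /* skip whitespace */ }
.           { printf("UNKNOWN: %s\n", yytext); }
%%
int main(void) {
    yylex();
    return 0;
}
int yywrap(void) { return 1; }

Running lex on this file and compiling the generated lex.yy.c yields a scanner that
echoes each recognized token from standard input.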
YACC-
The format for a Yacc file is similar, but includes a few extras.
• The first area (preceded by %token) is a list of terminal symbols. You do not need to
list single-character ASCII symbols, but anything else, including multi-character
symbols (e.g. “==”), needs to be in this list.
• The next is an area for C code that will be placed verbatim at the beginning of the
generated source code. Typically it is used for things like #include directives, #defines,
and variable declarations.
• The next section is for definitions; none of the following examples utilize this area.
• The fourth section contains the grammar rules, and each rule can also include C code
to be called when that rule is matched.
• The last section is for more C code (generally subroutines) that will be appended to the
end of the generated C code. This would typically include a main function if Yacc is to be
used by itself.
• The format is applied as follows (the use and placement of the % symbols are
necessary):
%token //terminal symbols
%{
//header c code
%}
//definitions
%%
//rules
%%
//subroutines
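As a minimal sketch (the grammar is illustrative and assumes a companion lex file that
supplies yylex and sets yylval for NUMBER), a Yacc file that sums numbers might look
like:

%token NUMBER
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%%
input:  /* empty */
      | input line
      ;
line:   expr '\n'       { printf("= %d\n", $1); }
      ;
expr:   expr '+' term   { $$ = $1 + $3; }
      | term            { $$ = $1; }
      ;
term:   NUMBER          { $$ = $1; }
      ;
%%
int main(void) {
    return yyparse();
}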
These formats and their general usage will be covered in greater detail in the following
four sections. In general it is best not to modify the resulting C code, as it is overwritten
each time Lex or Yacc is run. Most desired functionality can be handled within the lexical
and grammar files, but there are some things that are difficult to achieve that way and
may require editing of the C file.
As a side note, the functionality of these programs has been duplicated by the GNU
open-source projects Flex and Bison. These can be used interchangeably with Lex and
Yacc for everything this document covers, and for most other uses as well.
# Check whether a string over {a, b} is balanced: every 'b' must be matched
# by an earlier unmatched 'a', and the counts of 'a' and 'b' must be equal.
def is_valid_string(input_string):
    stack = []
    for char in input_string:
        if char == 'a':
            stack.append('a')
        elif char == 'b':
            if not stack:
                return False  # 'b' without a corresponding 'a'
            stack.pop()
        else:
            return False  # invalid character
    return not stack  # True if stack is empty (equal 'a' and 'b' counts)
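The original file does not show a test driver, so here is a hypothetical usage example
(the test strings are illustrative):

# Example usage:
for s in ["ab", "aabb", "abab", "ba", "aab"]:
    print(f"{s}: {is_valid_string(s)}")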
Output-
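With the hypothetical driver above, the program prints:

ab: True
aabb: True
abab: True
ba: False
aab: False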
class Node:
    def __init__(self, value):
        self.value = value
        self.children = []

def generate_parse_tree(expression):
    # Reconstructed body (the original was incomplete): '(' opens a nested
    # sub-expression, ')' closes it, and every other character becomes a leaf.
    tokens = expression.replace(" ", "")  # remove spaces
    root = Node("Expression")
    stack = [root]
    for char in tokens:
        if char == '(':
            child = Node("Expression")
            stack[-1].children.append(child)
            stack.append(child)  # descend into the sub-expression
        elif char == ')':
            stack.pop()  # close the sub-expression
        else:
            stack[-1].children.append(Node(char))  # operand/operator leaf
    return root

def print_parse_tree(node, depth=0):
    print("  " * depth + str(node.value))  # indented pre-order dump
    for child in node.children:
        print_parse_tree(child, depth + 1)

# Example usage:
expression = "3 + 4 * (5 + 2)"
parse_tree = generate_parse_tree(expression)
print_parse_tree(parse_tree)
OUTPUT-
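For the expression above, the reconstructed program prints:

Expression
  3
  +
  4
  *
  Expression
    5
    +
    2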
def find_leading_terminals(grammar):
    # Reconstructed sketch: LEADING(A) = { t | A =>+ t... or A =>+ B t... },
    # as used for operator-precedence parsing. Assumes uppercase letters are
    # non-terminals and everything else is a terminal.
    leading_terminals = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                if prod[0].isupper():  # production starts with a non-terminal
                    new = set(leading_terminals[prod[0]])
                    if len(prod) > 1 and not prod[1].isupper():
                        new.add(prod[1])  # terminal right after it also leads
                else:
                    new = {prod[0]}  # production starts with a terminal
                if not new <= leading_terminals[nt]:
                    leading_terminals[nt] |= new
                    changed = True
    return leading_terminals
# Example usage:
# Grammar: S -> Ab | Bc | d, A -> a, B -> b
grammar = {
'S': ['Ab', 'Bc', 'd'],
'A': ['a'],
'B': ['b']
}
leading_terminals = find_leading_terminals(grammar)
print("Leading Terminals:")
for non_terminal, terminals in leading_terminals.items():
print(f"{non_terminal}: {terminals}")
OUTPUT-
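With the reconstruction above, the program prints (set ordering may vary):

Leading Terminals:
S: {'a', 'b', 'c', 'd'}
A: {'a'}
B: {'b'}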
def find_trailing_terminals(grammar):
    # Reconstructed sketch: TRAILING(A) = { t | A =>+ ...t or A =>+ ...t B },
    # the mirror image of LEADING. Same symbol conventions as above.
    trailing_terminals = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                if prod[-1].isupper():  # production ends with a non-terminal
                    new = set(trailing_terminals[prod[-1]])
                    if len(prod) > 1 and not prod[-2].isupper():
                        new.add(prod[-2])  # terminal just before it also trails
                else:
                    new = {prod[-1]}  # production ends with a terminal
                if not new <= trailing_terminals[nt]:
                    trailing_terminals[nt] |= new
                    changed = True
    return trailing_terminals
# Example usage:
# Grammar: S -> aA | bB | c, A -> d, B -> e
grammar = {
'S': ['aA', 'bB', 'c'],
'A': ['d'],
'B': ['e']
}
trailing_terminals = find_trailing_terminals(grammar)
print("Trailing Terminals:")
for non_terminal, terminals in trailing_terminals.items():
print(f"{non_terminal}: {terminals}")
OUTPUT-
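With the reconstruction above, the program prints (set ordering may vary):

Trailing Terminals:
S: {'a', 'b', 'c', 'd', 'e'}
A: {'d'}
B: {'e'}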
def compute_first_sets(grammar):
    # Reconstructed sketch: FIRST(A) = terminals that can begin a string
    # derived from A. Assumes uppercase letters are non-terminals and the
    # grammar has no epsilon productions.
    first_sets = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                head = prod[0]
                new = first_sets[head] if head.isupper() else {head}
                if not new <= first_sets[nt]:
                    first_sets[nt] |= new
                    changed = True
    return first_sets
# Example usage:
# Grammar: S -> Ab | Bc | d, A -> a, B -> b
grammar = {
'S': ['Ab', 'Bc', 'd'],
'A': ['a'],
'B': ['b']
}
first_sets = compute_first_sets(grammar)
print("FIRST Sets:")
for non_terminal, first_set in first_sets.items():
print(f"{non_terminal}: {first_set}")
OUTPUT-
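With the reconstruction above, the program prints (set ordering may vary):

FIRST Sets:
S: {'a', 'b', 'd'}
A: {'a'}
B: {'b'}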
def compute_follow_sets(grammar, start_symbol, first_sets):
    # Reconstructed sketch (the original listing was garbled): FOLLOW(A) is
    # the set of terminals that can appear immediately to the right of A in
    # some sentential form. Assumes uppercase letters are non-terminals and
    # no epsilon productions, so the epsilon-removal step is not needed here.
    follow_sets = {nt: set() for nt in grammar}
    follow_sets[start_symbol].add('$')  # end-of-input marker
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                for i, symbol in enumerate(prod):
                    if not symbol.isupper():
                        continue  # FOLLOW is only defined for non-terminals
                    if i + 1 < len(prod):  # something follows the symbol
                        nxt = prod[i + 1]
                        new = first_sets[nxt] if nxt.isupper() else {nxt}
                    else:  # symbol ends the production: inherit FOLLOW(nt)
                        new = follow_sets[nt]
                    if not new <= follow_sets[symbol]:
                        follow_sets[symbol] |= new
                        changed = True
    return follow_sets
# Example usage:
# Grammar: S -> Ab | Bc | d, A -> a, B -> b
grammar = {
'S': ['Ab', 'Bc', 'd'],
'A': ['a'],
'B': ['b']
}
start_symbol = 'S'
first_sets = compute_first_sets(grammar)  # FIRST sets from the previous program
follow_sets = compute_follow_sets(grammar, start_symbol, first_sets)
print("FOLLOW Sets:")
for non_terminal, follow_set in follow_sets.items():
print(f"{non_terminal}: {follow_set}")
OUTPUT-
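With the reconstruction above, the program prints (set ordering may vary):

FOLLOW Sets:
S: {'$'}
A: {'b'}
B: {'c'}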