
For your project, the grammar defines the syntax of your calculator language.

Below is a detailed
breakdown of the key components (keywords, operators, identifiers, numbers, strings, constants,
reserved words, etc.), including each token type and its regular expression (RE).

1. Keywords
Keywords are reserved words that have a specific meaning in the language. They cannot be used as
identifiers.
Keywords:
• func (used to define a function)
• return (used to return a value from a function)

• print (used to print output)

Token Type: KEYWORD


Regular Expression (RE):
\b(func|return|print)\b
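
A quick check (a minimal sketch using Python's re module; KEYWORD_RE is just an illustrative name) shows the effect of the word boundaries: without \b, return would also match inside a longer identifier such as returns.

import re

KEYWORD_RE = re.compile(r'\b(func|return|print)\b')

print(KEYWORD_RE.fullmatch("return") is not None)   # True: exact keyword
print(KEYWORD_RE.search("returns") is not None)     # False: \b blocks a match inside 'returns'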

2. Operators
Operators are symbols used for arithmetic and assignment operations.
Operators:
• Arithmetic: +, -, *, /
• Assignment: =
• Parentheses: (, )

Token Type: OPERATOR


Regular Expressions (RE):
• Addition: \+
• Subtraction: \-
• Multiplication: \*
• Division: /
• Assignment: =
• Left Parenthesis: \(
• Right Parenthesis: \)

3. Identifiers
Identifiers are names for variables, functions, or parameters. They must start with a letter or _ and
can be followed by letters, digits, or _.

Examples:
• x, myVar, _total, addNumbers
Token Type: IDENTIFIER
Regular Expression (RE):
[a-zA-Z_][a-zA-Z0-9_]*
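
For example, a short sketch with re.fullmatch (IDENT_RE is an illustrative name, not part of the project code) separates valid identifiers from invalid ones:

import re

IDENT_RE = re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*')

for name in ["x", "myVar", "_total", "3abc"]:
    ok = IDENT_RE.fullmatch(name) is not None
    print(f"{name!r}: {'valid' if ok else 'invalid'} identifier")
# '3abc' fails because identifiers may not start with a digit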

4. Numbers
Numbers are numeric literals. They can be integers or floating-point numbers.
Examples:
• Integer: 42, 0
• Floating-point: 3.14, 0.001

Token Type: NUMBER


Regular Expression (RE):
\d+(\.\d+)? # Matches integers and floating-point numbers
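
A similar sketch (NUMBER_RE is illustrative) confirms that the pattern accepts integers and floats but rejects a trailing dot:

import re

NUMBER_RE = re.compile(r'\d+(\.\d+)?')

for text in ["42", "0", "3.14", "0.001", "3."]:
    ok = NUMBER_RE.fullmatch(text) is not None
    print(f"{text!r}: {'NUMBER' if ok else 'not a valid number'}")
# '3.' is rejected: the fractional part requires at least one digit after the dot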

5. Strings
Strings are sequences of characters enclosed in quotes. The core calculator language might not need strings, but they can be added to support functions like print().

Examples:
• "Hello"
• 'World'
Token Type: STRING
Regular Expression (RE):
"[^"]*"|'[^']*'

6. Constants
Constants are fixed values in the language, such as PI or E. These can be treated as predefined
identifiers.
Examples:
• PI = 3.14159
• E = 2.71828
Token Type: CONSTANT
Regular Expression (RE):
[a-zA-Z_][a-zA-Z0-9_]* # Same as identifiers, with a predefined list
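
Because constants share the identifier syntax, one option (sketched below with a hypothetical CONSTANTS table) is to match with the identifier RE and then reclassify against the predefined list; the tokenizer later in this document instead gives constants their own pattern.

import re

IDENT_RE = re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*')
CONSTANTS = {"PI": 3.14159, "E": 2.71828}   # hypothetical predefined list

def classify(name):
    if IDENT_RE.fullmatch(name) is None:
        return "INVALID"
    return "CONSTANT" if name in CONSTANTS else "IDENTIFIER"

print(classify("PI"))      # CONSTANT
print(classify("myVar"))   # IDENTIFIER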

7. Reserved Words
Reserved words are special terms in the language that have specific roles. Reserved words in your
project may include keywords or predefined constants.
Reserved Words:
• func, return, print

Token Type: RESERVED_WORD


Regular Expression (RE):
\b(func|return|print)\b

8. Delimiters
Delimiters are symbols used to separate components of the language.
Examples:
• , (comma to separate parameters in function definitions)
• {, } (braces to define blocks of code)

Token Type: DELIMITER


Regular Expressions (RE):
• Comma: ,
• Left Brace: {
• Right Brace: }

Summary of Tokens and Their Regular Expressions


Token Type      Description
KEYWORD         func, return, print
OPERATOR        +, -, *, /, =, (, )
IDENTIFIER      Variable/function names
NUMBER          Integer or floating-point numbers
STRING          Quoted string literals
CONSTANT        Predefined values like PI, E
RESERVED_WORD   Reserved words (same as keywords)
DELIMITER       Commas and braces

Examples of Tokens in Code


Input Code:
func add(a, b) {
    result = a + b
    return result
}

Tokenized Output:

Token    Type
func     KEYWORD
add      IDENTIFIER
(        OPERATOR
a        IDENTIFIER
,        DELIMITER
b        IDENTIFIER
)        OPERATOR
{        DELIMITER
result   IDENTIFIER
=        OPERATOR
a        IDENTIFIER
+        OPERATOR
b        IDENTIFIER
return   KEYWORD
result   IDENTIFIER
}        DELIMITER

Here’s the comprehensive output, BNF forms, and related documentation for your project:

BNF (Backus-Naur Form) Grammar for the Calculator Language


<program> ::= <function>+

<function> ::= "func" <identifier> "(" <parameters> ")" "{" <statements> "}"

<parameters> ::= <parameter> ("," <parameter>)*

<parameter> ::= <identifier>

<statements> ::= <statement>*

<statement> ::= <assignment> | <return_statement> | <print_statement>

<assignment> ::= <identifier> "=" <expression>

<return_statement> ::= "return" <expression>

<print_statement> ::= "print" "(" <expression> ")"

<expression> ::= <term> (("+" | "-") <term>)*

<term> ::= <factor> (("*" | "/") <factor>)*

<factor> ::= <identifier> | <number> | "(" <expression> ")"

<identifier> ::= [a-zA-Z_][a-zA-Z0-9_]*

<number> ::= [0-9]+(\.[0-9]+)?
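
To see how the expression rules work, here is a minimal recursive-descent sketch covering only <expression>, <term>, and <factor> (an illustration with hypothetical parse_* helpers, separate from the shift-reduce parser shown later); it assumes a flat list of token strings and returns the position just past the parsed expression.

def parse_expression(tokens, pos=0):
    # <expression> ::= <term> (("+" | "-") <term>)*
    pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        pos = parse_term(tokens, pos + 1)
    return pos

def parse_term(tokens, pos):
    # <term> ::= <factor> (("*" | "/") <factor>)*
    pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        pos = parse_factor(tokens, pos + 1)
    return pos

def parse_factor(tokens, pos):
    # <factor> ::= <identifier> | <number> | "(" <expression> ")"
    if tokens[pos] == "(":
        pos = parse_expression(tokens, pos + 1)
        assert tokens[pos] == ")", "expected closing parenthesis"
        return pos + 1
    return pos + 1   # identifier or number: consume one token

# 'a + b * (c - 2)' consumes all 9 tokens when the expression is well-formed
print(parse_expression(["a", "+", "b", "*", "(", "c", "-", "2", ")"]))   # 9
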
Tokenized Output with Error Detection
Input Code with Error:
func add(a, b) {
    result = a + b
    returns result
}

Output:
Token     Type         Error
func      KEYWORD
add       IDENTIFIER
(         OPERATOR
a         IDENTIFIER
,         DELIMITER
b         IDENTIFIER
)         OPERATOR
{         DELIMITER
result    IDENTIFIER
=         OPERATOR
a         IDENTIFIER
+         OPERATOR
b         IDENTIFIER
returns   IDENTIFIER   Invalid identifier containing a reserved keyword: returns
result    IDENTIFIER
}         DELIMITER
Extracted Regular Expressions (REs)
Token Type   Regular Expression           Purpose
KEYWORD      \b(func|return|print)\b      Matches the keywords func, return, and print.
OPERATOR     [\+\-\*/=()]                 Matches arithmetic operators (+, -, *, /), assignment (=), and parentheses.
IDENTIFIER   \b[a-zA-Z_][a-zA-Z0-9_]*\b   Matches variable, function, and parameter names.
NUMBER       \d+(\.\d+)?                  Matches integers and floating-point numbers.
STRING       "[^"]*"|'[^']*'              Matches double- or single-quoted string literals.
CONSTANT     \b(PI|E)\b                   Matches the predefined constants PI and E.
DELIMITER    [{},]                        Matches delimiters {, }, and ,.
SKIP         [ \t]+                       Matches and skips spaces and tabs.
NEWLINE      \n                           Matches newline characters to track line numbers.
MISMATCH     .                            Matches any single character not matched by other patterns (error handling).

Error Handling Rules


1. Invalid Identifiers:
• Identifiers containing reserved keywords are flagged as errors.
• Example: returnValue flagged if return is a reserved keyword.
2. Unknown Tokens:
• Any character or sequence not matching defined token types is flagged as an error.
• Example: @ is not valid and is flagged.
3. Unmatched Delimiters:
• Parentheses or braces without a matching counterpart are flagged (see the stack-based sketch after this list).
• Example: func add(a, b { flagged for missing ).
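
The tokenizer below does not implement rule 3 by itself; a classic way to add it (a minimal sketch with a hypothetical check_delimiters helper that operates on token values) is a stack-based balance check:

def check_delimiters(token_values):
    """Stack-based matching of ( ) and { }; returns a list of error strings."""
    pairs = {")": "(", "}": "{"}
    stack, errors = [], []
    for value in token_values:
        if value in ("(", "{"):
            stack.append(value)
        elif value in pairs:
            if not stack or stack.pop() != pairs[value]:
                errors.append(f"Unmatched closing delimiter: {value}")
    for leftover in stack:
        errors.append(f"Unclosed delimiter: {leftover}")
    return errors

# 'func add(a, b {' is missing ')': both '(' and '{' are reported as unclosed
print(check_delimiters(["func", "add", "(", "a", ",", "b", "{"]))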

Examples of Errors
Code              Error Description
returns result    returns is not a valid keyword.
func add(a, b {   Missing closing parenthesis ).
result = a ++ b   Invalid operator ++ in the current grammar.
3abc = 5          Invalid identifier: 3abc starts with a number.

This documentation covers the grammar, tokens, BNF forms, and error handling for the CC project.

import re

# Define token types and their corresponding regular expressions.
# Note: CONSTANT must come before IDENTIFIER, otherwise PI and E would be
# consumed by the IDENTIFIER pattern and never recognized as constants.
TOKEN_SPECIFICATIONS = [
    ('KEYWORD',    r'\b(func|return|print)\b'),      # Keywords
    ('OPERATOR',   r'[\+\-\*/=()]'),                 # Operators
    ('CONSTANT',   r'\b(PI|E)\b'),                   # Constants (predefined identifiers)
    ('IDENTIFIER', r'\b[a-zA-Z_][a-zA-Z0-9_]*\b'),   # Identifiers
    ('NUMBER',     r'\d+(\.\d+)?'),                  # Numbers (integers and floats)
    ('STRING',     r'"[^"]*"|\'[^\']*\''),           # Strings
    ('DELIMITER',  r'[{},]'),                        # Delimiters
    ('SKIP',       r'[ \t]+'),                       # Skip spaces and tabs
    ('NEWLINE',    r'\n'),                           # Newlines (to track line numbers)
    ('MISMATCH',   r'.')                             # Any other character (error handling)
]

# Compile the specifications into a single alternation of named groups
TOKEN_REGEX = '|'.join(f'(?P<{name}>{pattern})' for name, pattern in TOKEN_SPECIFICATIONS)

# Define the lexical analyzer function with enhanced error handling
def tokenize_with_keyword_check(code):
    tokens = []
    errors = []
    line_number = 1  # Start at the first line

    for match in re.finditer(TOKEN_REGEX, code):
        token_type = match.lastgroup
        value = match.group(token_type)
        if token_type == 'NEWLINE':      # Increment line number for newlines
            line_number += 1
        elif token_type == 'SKIP':       # Ignore spaces and tabs
            continue
        elif token_type == 'MISMATCH':   # Handle invalid characters
            errors.append(f"Error: Unexpected character '{value}' at line {line_number}")
        elif token_type == 'IDENTIFIER' and any(
                keyword in value for keyword in ['func', 'return', 'print']):
            errors.append(f"Error: Invalid identifier '{value}' containing a "
                          f"reserved keyword at line {line_number}")
        else:
            tokens.append((token_type, value, line_number))

    return tokens, errors

# Example input code
input_code = """
func add(a, b) {
    result = a + b
    returns result
}
"""

# Tokenize the input code
tokenized_output, error_list = tokenize_with_keyword_check(input_code)

# Print tokenized output
print("Tokenized Output:")
for token_type, value, line_number in tokenized_output:
    print(f"{value} -> {token_type} (Line {line_number})")

# Print errors if any
if error_list:
    print("\nErrors:")
    for error in error_list:
        print(error)
class ShiftReduceParser:
    def __init__(self, grammar, start_symbol="<program>"):
        self.grammar = grammar            # Grammar rules
        self.start_symbol = start_symbol  # Start symbol of the grammar
        self.stack = []                   # Stack to hold symbols
        self.input = []                   # Tokens from input code
        self.parse_tree = []              # List to hold the final parse tree
        self.table = []                   # Table of (Stack, Input, Action) for each step

    def parse(self, input_tokens):
        """
        Parses the input tokens using the Shift-Reduce technique and records
        each step in the parsing table. Note: this is a naive parser with no
        lookahead; it always applies the first matching reduction, so it can
        mis-parse grammars that have shift/reduce or reduce/reduce conflicts.
        """
        self.input = input_tokens + ["$"]  # Add the end-of-input symbol ($)
        self.stack = []                    # Reset the stack
        self.parse_tree = []               # Reset the parse tree
        self.table = []                    # Reset the table
        while self.input:
            # Try to reduce
            reduced = False
            for rule in self.grammar:
                rhs = rule[1]
                rhs_len = len(rhs)
                if len(self.stack) >= rhs_len and self.stack[-rhs_len:] == rhs:
                    # If the RHS matches the top of the stack, reduce
                    self.stack = self.stack[:-rhs_len]  # Pop RHS from stack
                    self.stack.append(rule[0])          # Push LHS to stack
                    self.parse_tree.append(f"Reduced {rhs} to {rule[0]}")
                    self.table.append((list(self.stack), list(self.input),
                                       f"Reduced {rhs} to {rule[0]}"))
                    reduced = True
                    break
            # If no reduction is possible, shift (or accept)
            if not reduced:
                if self.input[0] == "$" and len(self.stack) == 1 and self.stack[0] == self.start_symbol:
                    self.table.append((list(self.stack), list(self.input), "Accept"))
                    print("Parsing completed successfully.")
                    return True
                else:
                    # Shift the first input symbol onto the stack
                    self.stack.append(self.input.pop(0))
                    self.table.append((list(self.stack), list(self.input),
                                       f"Shifted {self.stack[-1]}"))
        print("Error: Unable to parse input.")
        return False

    def print_parsing_table(self):
        """
        Prints the parsing table.
        """
        print(f"{'Step':<5}{'Stack':<40}{'Input':<40}{'Action':<40}")
        for step, (stack, input_buffer, action) in enumerate(self.table):
            print(f"{step+1:<5}{' '.join(stack):<40}{' '.join(input_buffer):<40}{action:<40}")

# Define the CFG rules as a list of tuples (LHS, RHS)
grammar = [
    ("<program>", ["<function>"]),
    ("<function>", ["func", "<identifier>", "(", "<parameters>", ")",
                    "{", "<statements>", "}"]),
    ("<parameters>", ["<parameter>"]),
    ("<parameters>", ["<parameter>", ",", "<parameters>"]),
    ("<parameter>", ["<identifier>"]),
    ("<statements>", ["<statement>"]),
    ("<statements>", ["<statement>", "<statements>"]),
    ("<statement>", ["<assignment>"]),
    ("<statement>", ["<return_statement>"]),
    ("<statement>", ["<print_statement>"]),
    ("<assignment>", ["<identifier>", "=", "<expression>"]),
    ("<return_statement>", ["return", "<expression>"]),
    ("<print_statement>", ["print", "(", "<expression>", ")"]),
    ("<expression>", ["<term>"]),
    ("<expression>", ["<term>", "+", "<expression>"]),
    ("<expression>", ["<term>", "-", "<expression>"]),
    ("<term>", ["<factor>"]),
    ("<term>", ["<factor>", "*", "<term>"]),
    ("<term>", ["<factor>", "/", "<term>"]),
    ("<factor>", ["<identifier>"]),
    ("<factor>", ["<number>"]),
    ("<factor>", ["(", "<expression>", ")"]),
    ("<identifier>", ["ID"]),  # Token representation of an identifier
    ("<number>", ["NUM"]),     # Token representation of a number
]

# Sample input for testing the parser (tokens for a function definition);
# every identifier is represented by the token "ID"
input_tokens = [
    "func", "ID", "(", "ID", ",", "ID", ")", "{",
    "ID", "=", "ID", "+", "ID", "return", "ID", "}"
]

# Create a ShiftReduceParser object
parser = ShiftReduceParser(grammar)

# Run the parser on the sample input
parser.parse(input_tokens)

# Print the parsing table
parser.print_parsing_table()
