Project CC
Below is a detailed breakdown of the language's lexical components (keywords, operators, identifiers,
numbers, strings, constants, reserved words, and delimiters), including each token type and its
regular expression (RE).
1. Keywords
Keywords are reserved words that have a specific meaning in the language. They cannot be used as
identifiers.
Keywords:
• func (used to define a function)
• return (used to return a value from a function)
2. Operators
Operators are symbols used for arithmetic and assignment operations.
Operators:
• Arithmetic: +, -, *, /
• Assignment: =
• Parentheses: (, ) (used for grouping and parameter lists; tokenized as DELIMITER)
3. Identifiers
Identifiers are names for variables, functions, or parameters. They must start with a letter or _ and
can be followed by letters, digits, or _.
Examples:
• x, myVar, _total, addNumbers
Token Type: IDENTIFIER
Regular Expression (RE):
[a-zA-Z_][a-zA-Z0-9_]*
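As a quick check of this pattern, the following Python sketch tests a few candidate names against it. The helper name is_identifier is hypothetical; re.fullmatch is used so that the whole string has to match, not just a prefix.

```python
import re

# IDENTIFIER pattern from the section above.
IDENTIFIER_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_identifier(text):
    """Return True if the whole string matches the IDENTIFIER pattern (hypothetical helper)."""
    return IDENTIFIER_RE.fullmatch(text) is not None


for name in ["x", "myVar", "_total", "addNumbers", "3abc"]:
    print(name, is_identifier(name))
```

Note that func and return also match this pattern, so a lexer has to test for keywords before it falls back to IDENTIFIER.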
4. Numbers
Numbers are numeric literals. They can be integers or floating-point numbers.
Examples:
• Integer: 42, 0
• Floating-point: 3.14, 0.001
Token Type: NUMBER
Regular Expression (RE):
\d+(\.\d+)?
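The NUMBER pattern \d+(\.\d+)? listed in the regular expression table can also tell the two kinds of literal apart, because the fractional part is an optional group. A minimal sketch, assuming a hypothetical helper named parse_number:

```python
import re

# NUMBER pattern from the regular expression table: digits, then an optional fraction.
NUMBER_RE = re.compile(r"\d+(\.\d+)?")


def parse_number(text):
    """Convert a numeric literal to int or float (hypothetical helper)."""
    match = NUMBER_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a numeric literal: {text!r}")
    # Group 1 holds the fractional part (for example ".14") when it is present.
    return float(text) if match.group(1) else int(text)


for literal in ["42", "0", "3.14", "0.001"]:
    print(literal, type(parse_number(literal)).__name__)
```

The pattern requires digits on both sides of the decimal point and has no sign, so literals such as .5, 5. and -3 are not matched as a single NUMBER.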
5. Strings
Strings are sequences of characters enclosed in quotes. The calculator language may not need
strings, but they can be added to support functions such as print().
Examples:
• "Hello"
• 'World'
Token Type: STRING
Regular Expression (RE):
"[^"]*"|'[^']*'
6. Constants
Constants are fixed values in the language, such as PI or E. These can be treated as predefined
identifiers.
Examples:
• PI = 3.14159
• E = 2.71828
Token Type: CONSTANT
Regular Expression (RE):
[a-zA-Z_][a-zA-Z0-9_]* # Same as identifiers, with a predefined list
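One way to read "same as identifiers, with a predefined list" is to match a name with the identifier pattern first and then look it up in a table of constants. The sketch below assumes this approach; the names CONSTANTS and classify_name are hypothetical.

```python
import re

IDENTIFIER_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

# Predefined constants and their values, taken from the examples above.
CONSTANTS = {"PI": 3.14159, "E": 2.71828}


def classify_name(text):
    """Return "CONSTANT" for predefined names and "IDENTIFIER" for any other valid name."""
    if IDENTIFIER_RE.fullmatch(text) is None:
        raise ValueError(f"not a valid name: {text!r}")
    return "CONSTANT" if text in CONSTANTS else "IDENTIFIER"


for name in ["PI", "E", "pi", "radius"]:
    print(name, classify_name(name))
```

The lookup is case-sensitive in this sketch, so pi is an ordinary identifier while PI is a constant.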
7. Reserved Words
Reserved words are terms that have specific roles in the language. In this project they may include
keywords or predefined constants.
Reserved Words:
• func, return, print
8. Delimiters
Delimiters are symbols used to separate components of the language.
Examples:
• , (comma to separate parameters in function definitions)
• {, } (braces to define blocks of code)
• (, ) (parentheses to enclose parameter lists)
Input Code:
func add(a, b) {
result = a + b
return result
}
Tokenized Output:
Token Type
func KEYWORD
add IDENTIFIER
( DELIMITER
a IDENTIFIER
, DELIMITER
b IDENTIFIER
) DELIMITER
{ DELIMITER
result IDENTIFIER
= OPERATOR
a IDENTIFIER
+ OPERATOR
b IDENTIFIER
return KEYWORD
result IDENTIFIER
} DELIMITER
BNF Form:
<function> ::= "func" <identifier> "(" <parameters> ")" "{" <statements> "}"
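The rule above leaves <parameters> and <statements> undefined. The following recursive-descent recognizer is a sketch of how the rule could be checked; every rule other than <function> is an assumption made for illustration, as are the names ParseError, FunctionParser and accepts. The input is a list of symbols in which identifiers are abstracted to "ID" and numbers to "NUM" (also an assumption).

```python
class ParseError(Exception):
    pass


class FunctionParser:
    """Recognizer for <function>; all other grammar rules are assumed."""

    def __init__(self, symbols):
        self.symbols = list(symbols)
        self.pos = 0

    def peek(self):
        return self.symbols[self.pos] if self.pos < len(self.symbols) else None

    def expect(self, symbol):
        if self.peek() != symbol:
            raise ParseError(f"expected {symbol!r}, found {self.peek()!r}")
        self.pos += 1

    # <function> ::= "func" <identifier> "(" <parameters> ")" "{" <statements> "}"
    def function(self):
        self.expect("func")
        self.expect("ID")
        self.expect("(")
        self.parameters()
        self.expect(")")
        self.expect("{")
        self.statements()
        self.expect("}")
        if self.peek() is not None:
            raise ParseError(f"unexpected trailing symbol {self.peek()!r}")

    # Assumed: <parameters> ::= <identifier> { "," <identifier> } | empty
    def parameters(self):
        if self.peek() == "ID":
            self.expect("ID")
            while self.peek() == ",":
                self.expect(",")
                self.expect("ID")

    # Assumed: <statements> ::= { <identifier> "=" <expression> | "return" <expression> }
    def statements(self):
        while self.peek() in ("ID", "return"):
            if self.peek() == "ID":
                self.expect("ID")
                self.expect("=")
            else:
                self.expect("return")
            self.expression()

    # Assumed: <expression> ::= <operand> { ("+" | "-" | "*" | "/") <operand> }
    def expression(self):
        self.operand()
        while self.peek() in ("+", "-", "*", "/"):
            self.pos += 1
            self.operand()

    # Assumed: <operand> ::= <identifier> | <number> | "(" <expression> ")"
    def operand(self):
        if self.peek() in ("ID", "NUM"):
            self.pos += 1
        elif self.peek() == "(":
            self.expect("(")
            self.expression()
            self.expect(")")
        else:
            raise ParseError(f"expected an operand, found {self.peek()!r}")


def accepts(symbols):
    """Return True if the symbol list is a well-formed function definition."""
    try:
        FunctionParser(symbols).function()
        return True
    except ParseError:
        return False


print(accepts("func ID ( ID , ID ) { ID = ID + ID return ID }".split()))
print(accepts("func ID ( ID , ID { ID = ID + ID return ID }".split()))
```

The sketch only accepts or rejects its input. It builds no syntax tree and ignores operator precedence, which a full parser would need to handle.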
Tokenized Output with Error Detection
Input Code with Error:
func add(a, b) {
result = a + b
returns result
}
Output:
Token Type Error
func KEYWORD
add IDENTIFIER
( DELIMITER
a IDENTIFIER
, DELIMITER
b IDENTIFIER
) DELIMITER
{ DELIMITER
result IDENTIFIER
= OPERATOR
a IDENTIFIER
+ OPERATOR
b IDENTIFIER
returns IDENTIFIER Invalid keyword detected: returns
result IDENTIFIER
} DELIMITER
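The output above shows returns tokenized as an IDENTIFIER and then flagged, but the document does not say how the flag is produced. One possible approach, offered here purely as an assumption, is to compare each identifier with the keyword list and report those that look like a misspelling. The sketch uses difflib.get_close_matches from the Python standard library; the helper name keyword_warning and the cutoff of 0.8 are illustrative choices.

```python
import difflib

KEYWORDS = ["func", "return"]


def keyword_warning(lexeme, cutoff=0.8):
    """Return an error message if the lexeme resembles a keyword, otherwise None."""
    if lexeme in KEYWORDS:
        return None  # an exact keyword is not an error
    close = difflib.get_close_matches(lexeme, KEYWORDS, n=1, cutoff=cutoff)
    if close:
        return f"Invalid keyword detected: {lexeme}"
    return None


for lexeme in ["add", "result", "returns"]:
    print(lexeme, keyword_warning(lexeme))
```

A similarity heuristic like this can also flag legitimate names that happen to resemble a keyword (for example fun), so a parser-level check on where the name appears would be more precise.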
Extracted Regular Expressions (REs)
Token Type   Regular Expression            Purpose
KEYWORD      \b(func|return)\b             Matches the keywords func and return.
OPERATOR     [+\-*/=]                      Matches arithmetic operators (+, -, *, /) and assignment (=).
IDENTIFIER   \b[a-zA-Z_][a-zA-Z0-9_]*\b    Matches variable, function, and parameter names.
NUMBER       \d+(\.\d+)?                   Matches integers and floating-point numbers.
STRING       "[^"]*"|'[^']*'               Matches double-quoted or single-quoted strings.
CONSTANT     \b(PI|E)\b                    Matches the predefined constants PI and E.
DELIMITER    [(){},]                       Matches delimiters (, ), {, }, and ,.
SKIP         [ \t]+                        Matches and skips spaces and tabs.
NEWLINE      \n                            Matches newline characters to track line numbers.
MISMATCH     .                             Matches any single character not matching other patterns (error handling).
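A common way to turn a table like this into a working lexer in Python is to join the patterns into one regular expression with named groups and scan the input with finditer. The sketch below follows that approach under a few assumptions: the function name tokenize is hypothetical, the KEYWORD pattern covers only func and return, KEYWORD and CONSTANT are listed before IDENTIFIER so that they win when both match, and parentheses are placed under DELIMITER to agree with the tokenized output shown earlier.

```python
import re

# (token type, pattern) pairs. Order matters: the first alternative that matches wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r'"[^"]*"' + "|" + r"'[^']*'"),
    ("KEYWORD", r"\b(?:func|return)\b"),
    ("CONSTANT", r"\b(?:PI|E)\b"),
    ("IDENTIFIER", r"\b[a-zA-Z_][a-zA-Z0-9_]*\b"),
    ("OPERATOR", r"[+\-*/=]"),
    ("DELIMITER", r"[(){},]"),
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t]+"),
    ("MISMATCH", r"."),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(code):
    """Yield (lexeme, token type) pairs; raise SyntaxError on an unexpected character."""
    line = 1
    for match in MASTER_RE.finditer(code):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "NEWLINE":
            line += 1  # tracked only for error messages
        elif kind == "SKIP":
            continue
        elif kind == "MISMATCH":
            raise SyntaxError(f"Unexpected character {lexeme!r} on line {line}")
        else:
            yield lexeme, kind


source = "func add(a, b) {\nresult = a + b\nreturn result\n}"
for lexeme, kind in tokenize(source):
    print(lexeme, kind)
```

Because of the \b anchors, returns is not split into return plus s; it falls through to IDENTIFIER, which is the behavior the error detection example relies on.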
Examples of Errors
Code               Error Description
returns result     returns is not a valid keyword.
func add(a, b {    Missing closing parenthesis ).
result = a ++ b    Invalid operator ++ in the current grammar.
3abc = 5           Invalid identifier 3abc starts with a number.
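Three of the errors in this table can be caught with simple line-level checks. The sketch below is one illustrative way to do so and is not taken from the project's implementation; the function name find_errors and the exact checks are assumptions. It does not look inside string literals, so quoted text could trigger false reports.

```python
import re


def find_errors(line):
    """Return a list of error descriptions for one line of source (illustrative checks only)."""
    errors = []

    # A run of digits immediately followed by identifier characters, such as 3abc.
    bad_name = re.search(r"\b\d+[a-zA-Z_][a-zA-Z0-9_]*", line)
    if bad_name:
        errors.append(f"Invalid identifier {bad_name.group()} starts with a number.")

    # Two or more operator characters in a row, such as ++.
    bad_operator = re.search(r"[+\-*/=]{2,}", line)
    if bad_operator:
        errors.append(f"Invalid operator {bad_operator.group()} in the current grammar.")

    # Parenthesis balance within the line.
    depth = 0
    for char in line:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                errors.append("Unmatched closing parenthesis ).")
                depth = 0
    if depth > 0:
        errors.append("Missing closing parenthesis ).")
    return errors


for line in ["func add(a, b {", "result = a ++ b", "3abc = 5", "result = a + b"]:
    print(repr(line), find_errors(line))
```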
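import re


class Parser:
    # Class name and constructor are assumed; the original excerpt shows only the method below.
    def __init__(self):
        # One (stack, input_buffer, action) tuple per parsing step.
        self.table = []

    def print_parsing_table(self):
        """
        Prints the parsing table.
        """
        print(f"{'Step':<5}{'Stack':<40}{'Input':<40}{'Action':<40}")
        for step, (stack, input_buffer, action) in enumerate(self.table):
            print(f"{step+1:<5}{' '.join(stack):<40}{' '.join(input_buffer):<40}{action:<40}")


# Sample input for testing the parser (tokens for a function definition).
# Identifiers, including the function name, are abstracted to "ID".
input_tokens = [
    "func", "ID", "(", "ID", ",", "ID", ")", "{", "ID", "=", "ID", "+", "ID", "return", "ID", "}"
]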
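The sample input uses the symbol "ID" in place of concrete identifier names. A small conversion step can produce a list in that style from the lexer's (lexeme, type) pairs. The helper below is hypothetical and handles only identifiers; how numbers, strings, and constants should be represented for the parser is not specified in this document.

```python
def to_parser_symbols(tokens):
    """Map (lexeme, token type) pairs to parser input symbols (hypothetical helper)."""
    symbols = []
    for lexeme, kind in tokens:
        if kind == "IDENTIFIER":
            symbols.append("ID")  # the parser only needs to know an identifier is present
        else:
            symbols.append(lexeme)  # keywords, operators, and delimiters stand for themselves
    return symbols


tokens = [
    ("func", "KEYWORD"), ("add", "IDENTIFIER"), ("(", "DELIMITER"),
    ("a", "IDENTIFIER"), (",", "DELIMITER"), ("b", "IDENTIFIER"), (")", "DELIMITER"),
]
print(to_parser_symbols(tokens))
```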