
HW02 CSC-333 Code Report

1. Skip Initial White Spaces:


• Pseudocode:

skip any initial white space (spaces, tabs, and newlines)


• Implementation: The lexer uses a token specification for white spaces and
skips over them:

('SKIP', r'[ \t]+'),   # Skip spaces and tabs
('NEWLINE', r'\n'),    # Skip newlines

These tokens are checked using regular expressions, and no tokens are generated for
spaces, tabs, or newlines:

elif kind == 'SKIP':
    continue  # Ignore white space
elif kind == 'NEWLINE':
    continue  # Ignore newlines
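The skipping step above can be sketched as a small self-contained example. The token specification here is a hypothetical reduced one for illustration; the full report uses more patterns:

```python
import re

# Reduced token specification for illustration; the report's full spec
# has more patterns. SKIP and NEWLINE are matched but produce no tokens.
TOKEN_SPEC = [
    ('INTEGER',  r'\d+'),     # Integer numbers
    ('SKIP',     r'[ \t]+'),  # Spaces and tabs
    ('NEWLINE',  r'\n'),      # Newlines
    ('MISMATCH', r'.'),       # Any other character
]
TOKEN_RE = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC))

def tokenize(text):
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind in ('SKIP', 'NEWLINE'):
            continue  # white space generates no token
        tokens.append((kind, value))
    return tokens

print(tokenize("  12\t34\n5"))
# → [('INTEGER', '12'), ('INTEGER', '34'), ('INTEGER', '5')]
```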

2. Single Character Tokens:


• Pseudocode:

if cur.char ∈ ('(', ')', '+', '-', '*')


return the corresponding single-character token
• Implementation: Single-character tokens (such as parentheses and arithmetic
operators) are matched with specific regex patterns and returned as tokens:

('OP', r'\*\*|[+\-*/%]'),  # Numeric operators; '\*\*' listed first so '**' is one token
('LPAR', r'\('),           # Left parenthesis
('RPAR', r'\)'),           # Right parenthesis
('LBRAC', r'{'),           # Left brace
('RBRAC', r'}'),           # Right brace
('SEMICOLON', r';'),       # Semicolon
('COMMA', r','),           # Comma

Example return for single characters:

elif kind == 'LPAR':
    token = f"Token: LPAR({value})"
elif kind == 'RPAR':
    token = f"Token: RPAR({value})"
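Pattern order matters for the operators: Python's re tries alternatives left to right, so the '\*\*' branch must precede the single-character class or '**' would split into two '*' tokens. A minimal sketch with a reduced, hypothetical spec:

```python
import re

# Reduced spec for illustration. The '\*\*' branch comes first: regex
# alternation tries branches left to right, so '[+\-*/%]|\*\*' would
# split '**' into two separate '*' tokens.
SPEC = [
    ('OP',      r'\*\*|[+\-*/%]'),
    ('LPAR',    r'\('),
    ('RPAR',    r'\)'),
    ('INTEGER', r'\d+'),
    ('SKIP',    r'[ \t]+'),
]
PAT = re.compile('|'.join(f'(?P<{n}>{p})' for n, p in SPEC))

def tokens(text):
    return [(m.lastgroup, m.group())
            for m in PAT.finditer(text) if m.lastgroup != 'SKIP']

print(tokens("(2 ** 3)"))
# → [('LPAR', '('), ('INTEGER', '2'), ('OP', '**'), ('INTEGER', '3'), ('RPAR', ')')]
```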

3. Handling Assignment (=):


• Pseudocode:

if cur.char = '='
read the next character
if it is '=' return the relational operator token
else return assign
• Implementation: The code uses regular expressions to detect both the
assignment operator (=) and the relational operator (==). REL_OP is listed
before ASSIGN in the specification so that '==' is matched as a single
relational operator rather than as two assignments:

('REL_OP', r'[<>]=?|!=|=='),  # Relational operators
('ASSIGN', r'='),             # Assignment operator

For =, the lexer therefore returns either a relational-operator token (==) or a
simple assignment:

elif kind == 'ASSIGN':
    token = f"Token: ASSIGN({value})"
elif kind == 'REL_OP':
    token = f"Token: REL_OP({value})"
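The pseudocode's one-character lookahead falls out of pattern ordering in the regex approach. A small sketch (reduced, hypothetical spec) showing '=' versus '==':

```python
import re

# REL_OP is listed before ASSIGN; with ASSIGN first, '==' would match
# as two separate ASSIGN tokens instead of one relational operator.
SPEC = [
    ('REL_OP',  r'[<>]=?|!=|=='),
    ('ASSIGN',  r'='),
    ('ID',      r'[A-Za-z_][A-Za-z_0-9]*'),
    ('INTEGER', r'\d+'),
    ('SKIP',    r'[ \t]+'),
]
PAT = re.compile('|'.join(f'(?P<{n}>{p})' for n, p in SPEC))

def tokens(text):
    return [(m.lastgroup, m.group())
            for m in PAT.finditer(text) if m.lastgroup != 'SKIP']

print(tokens("x = 1"))   # → [('ID', 'x'), ('ASSIGN', '='), ('INTEGER', '1')]
print(tokens("x == 1"))  # → [('ID', 'x'), ('REL_OP', '=='), ('INTEGER', '1')]
```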

4. Handling Division and Comments (/, //):


• Pseudocode:

if cur.char = '/'
peek at the next character
if it is '*' or '/'
read additional characters until "*/" or newline is seen, respectively
• Implementation: The lexer recognizes the division operator (/) as part of the
OP pattern. Unlike the pseudocode's C-style /* ... */ and // comments, the
implemented language uses comments that start with # and run to the end of the
line:

('COMMENT', r'#.*'),       # Comments starting with '#', up to end of line
('OP', r'\*\*|[+\-*/%]'),  # Numeric operators, including division '/'

Comments are ignored:

elif kind == 'COMMENT':
    continue  # Ignore comments
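Because '.' in a Python regex does not match a newline by default, r'#.*' consumes exactly the rest of the line and nothing more. A reduced sketch of the comment handling:

```python
import re

# '#.*' stops at the newline because '.' does not match '\n' by
# default, so a comment consumes only the remainder of its own line.
SPEC = [
    ('COMMENT', r'#.*'),
    ('OP',      r'\*\*|[+\-*/%]'),
    ('ID',      r'[A-Za-z_][A-Za-z_0-9]*'),
    ('SKIP',    r'[ \t]+'),
    ('NEWLINE', r'\n'),
]
PAT = re.compile('|'.join(f'(?P<{n}>{p})' for n, p in SPEC))

def tokens(text):
    return [(m.lastgroup, m.group()) for m in PAT.finditer(text)
            if m.lastgroup not in ('COMMENT', 'SKIP', 'NEWLINE')]

print(tokens("a / b  # divide\nc"))
# → [('ID', 'a'), ('OP', '/'), ('ID', 'b'), ('ID', 'c')]
```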

5. Numbers (Integer and Floating-point):


• Pseudocode:

if cur.char is a digit
read any additional digits and at most one decimal point
return number
• Implementation: The lexer checks for integers and floating-point numbers
using two separate regular expressions. FLOAT is listed before INTEGER so that
a number such as 3.14 is not split into an integer, a stray '.', and another
integer:

('FLOAT', r'\d+\.\d{1,3}'),  # Floating-point numbers
('INTEGER', r'\d+'),         # Integer numbers

The floating-point regex requires one to three digits after the decimal point.
Tokens are returned as either FLOAT or INTEGER:

elif kind == 'FLOAT':
    token = f"Token: FLOAT({value})"
elif kind == 'INTEGER':
    token = f"Token: INTEGER({value})"
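Ordering matters here too: since every FLOAT begins with digits, INTEGER would otherwise claim the integer part first. A reduced sketch:

```python
import re

# FLOAT is listed before INTEGER; with INTEGER first, '3.14' would
# tokenize as INTEGER '3', a MISMATCH '.', and INTEGER '14'.
SPEC = [
    ('FLOAT',    r'\d+\.\d{1,3}'),
    ('INTEGER',  r'\d+'),
    ('SKIP',     r'[ \t]+'),
    ('MISMATCH', r'.'),
]
PAT = re.compile('|'.join(f'(?P<{n}>{p})' for n, p in SPEC))

def tokens(text):
    return [(m.lastgroup, m.group())
            for m in PAT.finditer(text) if m.lastgroup != 'SKIP']

print(tokens("3.14 42"))
# → [('FLOAT', '3.14'), ('INTEGER', '42')]
```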
6. Identifiers and Keywords:

• Pseudocode:

if cur.char is a letter
read any additional letters and digits
check to see whether the resulting string is a keyword
if so, return the corresponding token
else return id
• Implementation: Identifiers and keywords are handled by a regular expression
that matches a letter or underscore followed by letters, digits, or
underscores. KEYWORD is listed before ID, since otherwise every keyword would
match the more general ID pattern first (the inner group is non-capturing so it
does not interfere with named-group dispatch):

('KEYWORD', r'\b(?:if|else|while|break|read|write|function|return)\b'),  # Keywords
('ID', r'[A-Za-z_][A-Za-z_0-9]*'),  # Identifiers

If the token is a keyword, a KEYWORD token is returned; otherwise an ID:

elif kind == 'ID':
    token = f"Token: ID({value})"
elif kind == 'KEYWORD':
    token = f"Token: KEYWORD({value})"
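An alternative that reads more like the pseudocode's "check whether the resulting string is a keyword" is to match every name as an ID and then reclassify with a set lookup. A hypothetical sketch of that variant:

```python
import re

# Match every name as ID first, then reclassify reserved words with a
# set lookup; this mirrors the pseudocode's post-scan keyword check.
KEYWORDS = {'if', 'else', 'while', 'break', 'read', 'write', 'function', 'return'}
SPEC = [
    ('ID',   r'[A-Za-z_][A-Za-z_0-9]*'),
    ('SKIP', r'[ \t]+'),
]
PAT = re.compile('|'.join(f'(?P<{n}>{p})' for n, p in SPEC))

def tokens(text):
    result = []
    for m in PAT.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind == 'SKIP':
            continue
        if kind == 'ID' and value in KEYWORDS:
            kind = 'KEYWORD'  # reclassify reserved words
        result.append((kind, value))
    return result

print(tokens("while done"))
# → [('KEYWORD', 'while'), ('ID', 'done')]
```

With this variant the ordering of KEYWORD and ID in the spec no longer matters, at the cost of one extra set lookup per identifier.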

7. Error Handling (Unknown Characters):


• Pseudocode:
else announce an error
• Implementation: The lexer handles any character that does not match a valid
token by returning an error message for unknown tokens. MISMATCH is listed
last in the specification, so it only matches when no other pattern does:

('MISMATCH', r'.'),  # Any other character

If an unknown character is found, the lexer generates an error token:

elif kind == 'MISMATCH':
    token = f"@ unknown({value})"
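The catch-all behaviour depends entirely on MISMATCH being the final alternative, since earlier patterns are tried first at every position. A reduced sketch of the error path:

```python
import re

# MISMATCH is the last alternative, so it only matches characters that
# no earlier pattern claimed.
SPEC = [
    ('ID',       r'[A-Za-z_][A-Za-z_0-9]*'),
    ('SKIP',     r'[ \t]+'),
    ('MISMATCH', r'.'),
]
PAT = re.compile('|'.join(f'(?P<{n}>{p})' for n, p in SPEC))

def tokens(text):
    out = []
    for m in PAT.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind == 'SKIP':
            continue
        if kind == 'MISMATCH':
            out.append(f"@ unknown({value})")  # error token, as in the report
        else:
            out.append((kind, value))
    return out

print(tokens("a $ b"))
# → [('ID', 'a'), '@ unknown($)', ('ID', 'b')]
```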

The lexical analyzer follows the logic described in the pseudocode. It scans
the input from left to right, using regular expressions to detect tokens and
handle errors. Each token is processed according to its type, and the results
are printed to the console and saved to a file.
The lexer can easily be extended with additional language features by adding
more token patterns to the specification.
