ASSIGNMENT – 2
Submitted by
MAHALAKSHMI M (Reg.no.921722102075)
NOORJAHAN A (Reg.no.921722102113)
BACHELOR OF ENGINEERING
in
SOLUTION:
A syntax analyzer (parser) checks whether the input token stream from the lexical
analyzer follows the syntax rules defined by a context-free grammar (CFG).
Steps to Implement a Syntax Analyzer
1. Define the Grammar: Choose a CFG that represents the structure of the
programming language.
2. Build a Parser: Implement a parsing algorithm such as:
o Recursive Descent Parsing (Top-Down)
o LL(1) Parsing (Top-Down)
o LR Parsing (Bottom-Up)
3. Implement Parsing Logic: Use token sequences from the lexical analyzer to
check if the input follows the grammar.
4. Handle Errors: Detect and report syntax errors.
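Steps 1 and 2 can be made concrete with a small sketch. It assumes the expression grammar suggested by the parser docstrings in this solution (Expr, Term, Factor, plus the helper nonterminals Expr' and Term'). The dictionary layout and the names GRAMMAR, EPSILON and first_sets are illustrative choices, not part of the original assignment. The function computes FIRST sets, which an LL(1) parser consults to decide which production to apply.

```python
# Assumed grammar, written as: nonterminal -> list of productions.
# An empty production [] stands for epsilon (the empty string).
EPSILON = "ε"
GRAMMAR = {
    "Expr": [["Term", "Expr'"]],
    "Expr'": [["PLUS", "Term", "Expr'"], ["MINUS", "Term", "Expr'"], []],
    "Term": [["Factor", "Term'"]],
    "Term'": [["MUL", "Factor", "Term'"], ["DIV", "Factor", "Term'"], []],
    "Factor": [["LPAREN", "Expr", "RPAREN"], ["NUMBER"]],
}


def first_sets(grammar):
    """Compute FIRST sets by iterating until no set changes."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for nonterminal, productions in grammar.items():
            for production in productions:
                size_before = len(first[nonterminal])
                all_nullable = True
                for symbol in production:
                    if symbol in grammar:  # nonterminal
                        first[nonterminal] |= first[symbol] - {EPSILON}
                        if EPSILON not in first[symbol]:
                            all_nullable = False
                            break
                    else:  # terminal (a token type)
                        first[nonterminal].add(symbol)
                        all_nullable = False
                        break
                if all_nullable:
                    first[nonterminal].add(EPSILON)
                if len(first[nonterminal]) != size_before:
                    changed = True
    return first


first = first_sets(GRAMMAR)
for nonterminal in GRAMMAR:
    print(nonterminal, sorted(first[nonterminal]))
```

Because FIRST(Expr), FIRST(Term) and FIRST(Factor) all contain only NUMBER and LPAREN, a parser can tell from a single lookahead token whether an expression can start at the current position.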
# Token types
NUMBER, PLUS, MINUS, MUL, DIV, LPAREN, RPAREN, EOF = (
    'NUMBER', 'PLUS', 'MINUS', 'MUL', 'DIV', 'LPAREN', 'RPAREN', 'EOF'
)

# Tokenizer class
class Lexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        # Empty input has no first character, so start at end of input.
        self.current_char = self.text[0] if self.text else None

    def advance(self):
        """Move to the next character."""
        self.pos += 1
        self.current_char = self.text[self.pos] if self.pos < len(self.text) else None

    def skip_whitespace(self):
        """Ignore spaces."""
        while self.current_char is not None and self.current_char.isspace():
            self.advance()

    def integer(self):
        """Extract a multi-digit number."""
        result = ''
        while self.current_char is not None and self.current_char.isdigit():
            result += self.current_char
            self.advance()
        return int(result)

    def get_next_token(self):
        """Return the next token as a (type, value) pair."""
        while self.current_char is not None:
            if self.current_char.isspace():
                self.skip_whitespace()
                continue
            if self.current_char.isdigit():
                return (NUMBER, self.integer())
            if self.current_char == '+':
                self.advance()
                return (PLUS, '+')
            if self.current_char == '-':
                self.advance()
                return (MINUS, '-')
            if self.current_char == '*':
                self.advance()
                return (MUL, '*')
            if self.current_char == '/':
                self.advance()
                return (DIV, '/')
            if self.current_char == '(':
                self.advance()
                return (LPAREN, '(')
            if self.current_char == ')':
                self.advance()
                return (RPAREN, ')')
            raise Exception(f'Invalid character: {self.current_char}')
        # End of input reached.
        return (EOF, None)
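The same token stream can also be produced with Python's re module. The sketch below is an alternative to the character-by-character Lexer, not a replacement for it. The names TOKEN_SPEC, MASTER_PATTERN and tokenize are hypothetical and chosen only for this illustration. It assumes the same token types and the same (type, value) pairs as above.

```python
import re

# One named group per token type; order matters because the first match wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("MUL", r"\*"),
    ("DIV", r"/"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),      # whitespace, discarded
    ("MISMATCH", r"."),    # any other single character is an error
]
MASTER_PATTERN = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of (type, value) pairs ending with ('EOF', None)."""
    tokens = []
    for match in MASTER_PATTERN.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise Exception(f"Invalid character: {match.group()}")
        value = int(match.group()) if kind == "NUMBER" else match.group()
        tokens.append((kind, value))
    tokens.append(("EOF", None))
    return tokens


for token in tokenize("3 + 5 * (2 - 8)"):
    print(token)
```

For the input "3 + 5 * (2 - 8)" this yields nine tokens followed by the EOF marker, starting with ('NUMBER', 3) and ('PLUS', '+').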
Syntax Analyzer (Parser)
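The parser uses recursive descent parsing: each nonterminal of the grammar (Expr, Term, Factor) becomes one method, and the methods call each other the same way the CFG rules refer to each other.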
class Parser:
    def __init__(self, lexer):
        self.lexer = lexer
        self.current_token = self.lexer.get_next_token()

    def eat(self, token_type):
        """Consume the current token if it has the expected type."""
        if self.current_token[0] != token_type:
            raise Exception(
                f'Syntax error: expected {token_type}, got {self.current_token[0]}')
        self.current_token = self.lexer.get_next_token()

    def factor(self):
        """Factor → '(' Expr ')' | NUMBER"""
        if self.current_token[0] == NUMBER:
            self.eat(NUMBER)
        elif self.current_token[0] == LPAREN:
            self.eat(LPAREN)
            self.expr()
            self.eat(RPAREN)
        else:
            raise Exception(
                f'Syntax error: expected NUMBER or "(", got {self.current_token[0]}')

    def term(self):
        """Term → Factor Term'"""
        self.factor()
        # Term' is handled as a loop over '*' and '/' operators.
        while self.current_token[0] in (MUL, DIV):
            self.eat(self.current_token[0])
            self.factor()

    def expr(self):
        """Expr → Term Expr'"""
        self.term()
        # Expr' is handled as a loop over '+' and '-' operators.
        while self.current_token[0] in (PLUS, MINUS):
            self.eat(self.current_token[0])
            self.term()

    def parse(self):
        """Start parsing from Expr."""
        self.expr()
        if self.current_token[0] != EOF:
            raise Exception('Syntax error: unexpected token at the end')
        print("Parsing successful! The expression is syntactically correct.")
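# Read one expression and report whether it is syntactically valid.
text = input("Enter an expression: ")
try:
    lexer = Lexer(text)
    parser = Parser(lexer)
    parser.parse()
except Exception as e:
    print(f"Error: {e}")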
How It Works
1. Lexical Analysis: The Lexer converts an input string (e.g., "3 + 5 * (2 - 8)")
into a sequence of tokens.
2. Parsing: The Parser processes these tokens using recursive descent parsing to
ensure the input follows the given grammar.
3. Error Handling: If there is a syntax error (e.g., 3 + * 5), the parser raises an
exception.
Example Runs
Valid Expressions
Enter an expression: 3 + 5 * (2 - 8)
Parsing successful! The expression is syntactically correct.
Enter an expression: (4 / 2) + 6
Parsing successful! The expression is syntactically correct.
Invalid Expression
Enter an expression: 3 + * 5
Error: Syntax error: expected NUMBER or "(", got MUL
Conclusion
This syntax analyzer checks whether arithmetic expressions are
syntactically correct according to a defined context-free grammar (CFG)
using recursive descent parsing. It can be extended toward a full programming
language syntax by adding grammar rules and a corresponding parsing method for
each new nonterminal.