Part I
Introduction
• Every implementation of a programming language (e.g., a compiler) uses a lexical analyzer and a syntax analyzer in its initial stages.
• The syntax analyzer, referred to as a parser, checks the syntax of the input program and generates a parse tree.
• Parsers almost always rely on a context-free grammar (CFG) that specifies the syntax of the programs.
• In this section, we study the inner workings of lexical analyzers and parsers.
• The algorithms that go into building lexical analyzers and parsers rely on automata and formal language theory, which form the foundation of these systems.
Lexemes and Tokens
• A lexical analyzer collects characters into groups (lexemes) and assigns an internal code (a token) to each group.
• Lexemes are recognized by matching the input against patterns.
• Tokens are usually coded as integer values, but for the sake of readability, they are often referenced through named constants.

Example assignment statement (tokens/lexemes shown below):

result = oldsum - value / 100;

Token     Lexeme
IDENT     result
ASSIGN    =
IDENT     oldsum
SUB       -
IDENT     value
DIV       /
INT_LIT   100
SEMI      ;
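Purely as an illustration of pattern-based matching (this is not the hand-coded lexer developed later in this section), the statement above could be tokenized with a sketch like the following; the pattern names mirror the token column of the table.

import re

# A minimal sketch: one regular expression per kind of lexeme.
TOKEN_PATTERNS = [
    ("IDENT",   r"[A-Za-z][A-Za-z0-9]*"),
    ("INT_LIT", r"[0-9]+"),
    ("ASSIGN",  r"="),
    ("SUB",     r"-"),
    ("DIV",     r"/"),
    ("SEMI",    r";"),
    ("SKIP",    r"\s+"),          # whitespace is matched but discarded
]
MASTER = re.compile("|".join("(?P<%s>%s)" % (name, pat) for name, pat in TOKEN_PATTERNS))

def tokenize(text):
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("result = oldsum - value / 100;")))
# [('IDENT', 'result'), ('ASSIGN', '='), ('IDENT', 'oldsum'), ('SUB', '-'),
#  ('IDENT', 'value'), ('DIV', '/'), ('INT_LIT', '100'), ('SEMI', ';')]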
There are three general approaches to building a lexical analyzer:
• Write a formal description of the token patterns of the language and use a software tool such as PLY to automatically generate a lexical analyzer. We have seen this earlier! (A minimal PLY sketch appears below.)
• Design a state transition diagram that describes the token patterns of the
language and write a program that implements the diagram. We will develop this in
this section.
• Design a state transition diagram that describes the token patterns of the language
and hand-construct a table-driven implementation of the state diagram.
A state transition diagram, or state diagram, is a directed graph. The nodes are
labeled with state names. The edges are labeled with input characters. An edge may
also include actions to be done when the transition is taken.
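For the first approach, a minimal PLY sketch might look roughly like the following; the token names and patterns here are chosen only for illustration.

import ply.lex as lex

# Token names PLY should generate (illustrative subset).
tokens = ('ID', 'INT', 'ADD', 'LPAREN', 'RPAREN')

# Simple tokens are given as regular-expression strings.
t_ADD    = r'\+'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_ID     = r'[A-Za-z][A-Za-z0-9]*'
t_INT    = r'[0-9]+'
t_ignore = ' \t'

def t_error(t):
    print("Illegal character:", t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input("(sum + 47)")
for tok in lexer:
    print(tok.type, tok.value)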
Lexical Analyzer: An implementation
• Consider the problem of building a Lexical Analyzer that recognizes lexemes that appear in
arithmetic expressions, including variable names and integers.
• Names consist of uppercase letters, lowercase letters, and digits, but must begin with a letter.
Names have no length limitations.
• To simplify the state transition diagram, we will treat all letters the same way; so instead of 52
transitions or edges, we will have just one edge labeled Letter. Similarly for digits, we will use the
label Digit.
• The following “action” is useful to keep in mind when thinking about the lexical analyzer:
  addChar: add the character to the end of the lexeme being recognized
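As a rough sketch (assuming a simple index-advancing loop, which is not shown on this slide), the addChar action can be pictured driving the Letter/Digit transitions that recognize a name:

# Rough sketch of the "recognize a Name" path in the state diagram.
# addChar appends the current character to the lexeme being recognized;
# the index i plays the role of advancing through the input.
def recognize_name(s, i):
    lexeme = []
    def addChar(c):
        lexeme.append(c)
    if i < len(s) and s[i].isalpha():              # start state: must begin with a Letter
        addChar(s[i]); i += 1
        while i < len(s) and s[i].isalnum():       # Letter/Digit self-loop on the Name state
            addChar(s[i]); i += 1
        return "".join(lexeme), i                  # accept: return the lexeme and next position
    return None, i

print(recognize_name("sum1 + 47", 0))   # ('sum1', 4)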
Lexical Analyzer: An implementation (continued)
A state diagram that recognizes names, integer literals, parentheses, and arithmetic operators.
Lexical Analyzer: An implementation (in Python; TokenTypes.py)
TokenTypes.py
import enum

class TokenTypes(enum.Enum):
    LPAREN = 1
    RPAREN = 2
    ADD = 3
    SUB = 4
    MUL = 5
    DIV = 6
    ID = 7
    INT = 8
    EOF = 0
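A quick check of how the enum behaves; the later Lexer and parser code compares tokens through their integer .value codes:

from TokenTypes import TokenTypes

print(TokenTypes.ID)                                # TokenTypes.ID
print(TokenTypes.ID.value)                          # 7
print(TokenTypes.ID.value == TokenTypes.INT.value)  # False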
Lexical Analyzer: An implementation (Token.py)
Token.py
from TokenTypes import *

class Token:

    def __init__(self, tok, value):
        self._t = tok
        self._c = value

    def __str__(self):
        if self._t.value == TokenTypes.ID.value:
            return "<" + str(self._t) + ":" + self._c + ">"
        elif self._t.value == TokenTypes.INT.value:
            return "<" + self._c + ">"
        else:
            return str(self._t)

    def get_token(self):
        return self._t

    def get_value(self):
        return self._c
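A small usage example of the class above; the printed forms follow from __str__:

from TokenTypes import TokenTypes
from Token import Token

print(Token(TokenTypes.ID, "sum"))    # <TokenTypes.ID:sum>
print(Token(TokenTypes.INT, "47"))    # <47>
print(Token(TokenTypes.ADD, "+"))     # TokenTypes.ADD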
Lexical Analyzer: An implementation (Lexer.py)
import sys
from TokenTypes import *
from Token import *

# Lexical analyzer for arithmetic expressions which
# include variable names and positive integer literals
# e.g. (sum + 47) / total

class Lexer:

    def __init__(self, s):
        self._index = 0
        self._tokens = self.tokenize(s)

    def tokenize(self, s):
        result = []
        i = 0
        while i < len(s):
            c = s[i]
            if c == '(':
                result.append(Token(TokenTypes.LPAREN, "("))
                i = i + 1
            elif c == ')':
                result.append(Token(TokenTypes.RPAREN, ")"))
                i = i + 1
            elif c == '+':
                result.append(Token(TokenTypes.ADD, "+"))
                i = i + 1
            elif c == '-':
                result.append(Token(TokenTypes.SUB, "-"))
                i = i + 1
            elif c == '*':
                result.append(Token(TokenTypes.MUL, "*"))
                i = i + 1
            elif c == '/':
                result.append(Token(TokenTypes.DIV, "/"))
                i = i + 1
            elif c in ' \r\n\t':
                i = i + 1
                continue
            elif c.isdigit():
                j = i
                while j < len(s) and s[j].isdigit():
                    j = j + 1
                result.append(Token(TokenTypes.INT, s[i:j]))
                i = j
Lexical Analyzer: An implementation (Lexer.py)
            elif c.isalpha():
                j = i
                while j < len(s) and s[j].isalnum():
                    j = j + 1
                result.append(Token(TokenTypes.ID, s[i:j]))
                i = j
            else:
                print("UNEXPECTED CHARACTER ENCOUNTERED: " + c)
                sys.exit(-1)
        result.append(Token(TokenTypes.EOF, "-1"))
        return result

    def lex(self):
        t = None
        if self._index < len(self._tokens):
            t = self._tokens[self._index]
            self._index = self._index + 1
            print("Next Token is: " + str(t.get_token()) + ", Next lexeme is " + t.get_value())
        return t
Lexical Analyzer: An implementation (LexerTest.py)
LexerTest.py
from Lexer import *
from TokenTypes import *

def main():
    input = "(sum + 47) / total"
    lexer = Lexer(input)
    print("Tokenizing ", end="")
    print(input)
    while True:
        t = lexer.lex()
        if t.get_token().value == TokenTypes.EOF.value:
            break

main()
Go to live demo.
Lexical Analyzer: An implementation (Sample Run)
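The original slide shows a screenshot of the run, which is not reproduced here. Based on the code above, running LexerTest.py should print output along these lines:

$ python3 LexerTest.py
Tokenizing (sum + 47) / total
Next Token is: TokenTypes.LPAREN, Next lexeme is (
Next Token is: TokenTypes.ID, Next lexeme is sum
Next Token is: TokenTypes.ADD, Next lexeme is +
Next Token is: TokenTypes.INT, Next lexeme is 47
Next Token is: TokenTypes.RPAREN, Next lexeme is )
Next Token is: TokenTypes.DIV, Next lexeme is /
Next Token is: TokenTypes.ID, Next lexeme is total
Next Token is: TokenTypes.EOF, Next lexeme is -1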
Introduction to Parsing
• A parser checks to see if the input program is syntactically correct and constructs a
parse tree.
• When an error is found, a parser must produce a diagnostic message and recover.
Recovery is required so that the compiler finds as many errors as possible.
• Parsers are categorized according to the direction in which they build the parse tree:
• Top-down parsers build the parse tree from the root downwards to the leaves.
• Bottom-up parsers build the parse tree from the leaves upwards to the root.
Notational Conventions
Top-Down Parser
• A top-down parser traces or builds the parse tree in preorder: each node is visited before its branches
are followed.
• Given a sentential form xAα that is part of a leftmost derivation, a top-down parser’s task is to find the
next sentential form in that leftmost derivation.
• Determining the next sentential form is a matter of choosing the correct grammar rule that has A as its
left-hand side (LHS).
• If the A-rules are A → bB, A → cBb, and A → a, the next sentential form could be xbBα, xcBbα, or xaα.
• The most commonly used top-down parsing algorithms choose an A-rule based on the token that
would be the first generated by A.
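A sketch of that choice for the A-rules above (purely illustrative):

# Choose among A -> bB | cBb | a by looking at the next input token.
def parse_A(next_token):
    if next_token == 'b':
        return "use A -> bB"
    elif next_token == 'c':
        return "use A -> cBb"
    elif next_token == 'a':
        return "use A -> a"
    else:
        raise SyntaxError("no A-rule starts with " + repr(next_token))

print(parse_A('c'))   # use A -> cBb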
Top-Down Parser (continued)
• A recursive-descent parser is coded directly from the CFG description of the syntax of a language.
• The other common top-down approach is a table-driven LL parser. Both are LL algorithms, and both are equally powerful. The first L in LL specifies a left-to-right scan of the input; the second L specifies that a leftmost derivation is generated.
• We will look at a hand-written recursive-descent parser later in this section (in Python).
Bottom-Up Parser
• A bottom-up parser constructs a parse tree by beginning at the leaves and progressing
toward the root. This parse order corresponds to the reverse of a rightmost derivation.
• Given a right sentential form α, a bottom-up parser must determine what substring of α is
the right-hand side (RHS) of the rule that must be reduced to its LHS to produce the
previous right sentential form.
• A given right sentential form may include more than one RHS from the grammar. The correct RHS to reduce is called the handle. As an example, consider the following grammar and derivation:

  S : aAc
  A : aA
  A : b

  S => aAc => aaAc => aabc

• A bottom-up parser can easily find the first handle, b, since it is the only RHS that appears in aabc. After replacing b by the corresponding LHS, we get aaAc, the previous right sentential form. Finding the next handle is more difficult because both aAc and aA are potential handles.
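Reading the derivation in reverse, the reductions are:

  aabc   (handle b;   reduce by A : b)    =>  aaAc
  aaAc   (handle aA;  reduce by A : aA)   =>  aAc
  aAc    (handle aAc; reduce by S : aAc)  =>  S

So the handle in aaAc is aA; reducing the substring aAc there instead would not yield the previous right sentential form.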
Bottom-Up Parser (continued)
• A bottom-up parser finds the handle of a given right sentential form by examining the
symbols on one or both sides of a possible handle.
• The most common bottom-up parsing algorithms are in the LR family. The L specifies a
left-to-right scan and the R specifies that a rightmost derivation is generated.
• Time Complexity
• Parsing algorithms that work for any grammar are inefficient: their worst-case complexity is O(n³), which makes them impractical for use in compilers.
• Faster algorithms work for only a subset of all possible grammars. These algorithms are acceptable as long as they can parse the grammars that describe programming languages.
Recursive-Descent Parsing
• A recursive-descent parser consists of a collection of functions, many of which are recursive; it produces
a parse tree in top-down order.
• A recursive-descent parser has one function for each nonterminal in the grammar.
• Consider the expression grammar below (written in EBNF - extended BNF notation):

  <expr> : <term> {(+|-) <term>}
  <term> : <factor> {(*|/) <factor>}
  <factor> : ID | INT_CONSTANT | ( <expr> )

These rules can be used to construct recursive-descent functions named expr, term, and factor that parse arithmetic expressions.
Recursive-Descent Parsing (continued)
• Writing the recursive-descent functions is quite simple.
• We assume two global variables: a lexer object l and next_token. Initially the first token is retrieved into next_token, and then the function for the start symbol is called:
import sys
from Lexer import *

next_token = None
l = None

def main():
    global next_token
    global l
    l = Lexer(sys.argv[1])
    next_token = l.lex()
    expr()
    if next_token.get_token().value == TokenTypes.EOF.value:
        print("PARSE SUCCEEDED")
    else:
        print("PARSE FAILED")
Recursive-Descent Parsing (continued)
• The function for <expr> is shown below. For each terminal symbol on the RHS of the rule, the current value of next_token is matched against that terminal, and for each nonterminal the corresponding function is called. When the function exits, next_token contains the first token beyond the part of the input that matched <expr>.
# expr
# Parses strings in the language generated by the rule:
# <expr> : <term> {(+|-) <term>}
def expr():
    global next_token
    global l
    print("Enter <expr>")
    term()
    while next_token.get_token().value == TokenTypes.ADD.value or \
          next_token.get_token().value == TokenTypes.SUB.value:
        next_token = l.lex()
        term()
    print("Exit <expr>")
Recursive-Descent Parsing (continued)
# term
# Parses strings in the language generated by the rule:
# <term> : <factor> {(*|/) <factor>}
def term():
    global next_token
    global l
    print("Enter <term>")
    factor()
    while next_token.get_token().value == TokenTypes.MUL.value or \
          next_token.get_token().value == TokenTypes.DIV.value:
        next_token = l.lex()
        factor()
    print("Exit <term>")
Recursive-Descent Parsing (continued)
The function for <factor> checks whether next_token matches ID or INT_CONSTANT; if it does, the next token is retrieved and the function exits. Otherwise it matches a left parenthesis, calls the function for <expr>, and then matches the right parenthesis. In every case, the function makes sure next_token contains the next token beyond the match for <factor>.

def error(s):
    print("SYNTAX ERROR: " + s)

# factor
# Parses strings in the language generated by the rules:
# <factor> -> ID
# <factor> -> INT_CONSTANT
# <factor> -> ( <expr> )
def factor():
    global next_token
    global l
    print("Enter <factor>")
    if next_token.get_token().value == TokenTypes.ID.value or \
       next_token.get_token().value == TokenTypes.INT.value:
        next_token = l.lex()
    else:
        # if the RHS is ( <expr> ), pass over (, call expr, check for )
        if next_token.get_token().value == TokenTypes.LPAREN.value:
            next_token = l.lex()
            expr()
            if next_token.get_token().value == TokenTypes.RPAREN.value:
                next_token = l.lex()
            else:
                error("Expecting RPAREN")
                sys.exit(-1)
        else:
            error("Expecting LPAREN")
            sys.exit(-1)
    print("Exit <factor>")
Recursive-Descent Parsing: Sample Run

Show Demo

$ python3 Parser.py "(sum + 20)/30"
Next Token is: TokenTypes.LPAREN, Next lexeme is (
Enter <expr>
Enter <term>
Enter <factor>
Next Token is: TokenTypes.ID, Next lexeme is sum
Enter <expr>
Enter <term>
Enter <factor>
Next Token is: TokenTypes.ADD, Next lexeme is +
Exit <factor>
Exit <term>
Next Token is: TokenTypes.INT, Next lexeme is 20
Enter <term>
Enter <factor>
Next Token is: TokenTypes.RPAREN, Next lexeme is )
Exit <factor>
Exit <term>
Exit <expr>
Next Token is: TokenTypes.DIV, Next lexeme is /
Exit <factor>
Next Token is: TokenTypes.INT, Next lexeme is 30
Enter <factor>
Next Token is: TokenTypes.EOF, Next lexeme is -1
Exit <factor>
Exit <term>
Exit <expr>
PARSE SUCCEEDED
Recursive-Descent Parsing: if-then-else stmt
<ifstmt> -> if ( <boolexpr> ) <statement> [else <statement>]

def ifstmt():
    global next_token
    global l
    if next_token.get_token().value != TokenTypes.IF.value:
        error("Expecting IF")
    else:
        next_token = l.lex()
        if next_token.get_token().value != TokenTypes.LPAREN.value:
            error("Expecting LPAREN")
        else:
            next_token = l.lex()
            boolexpr()
            if next_token.get_token().value != TokenTypes.RPAREN.value:
                error("Expecting RPAREN")
            else:
                next_token = l.lex()
                statement()
                if next_token.get_token().value == TokenTypes.ELSE.value:
                    next_token = l.lex()
                    statement()
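Note that ifstmt() relies on token types (IF, ELSE) and functions (boolexpr, statement) that are not part of the TokenTypes.py and parser code shown earlier in this section. A hypothetical sketch of those assumed extensions might look like:

# Hypothetical extensions assumed by ifstmt() above; NOT part of the code shown earlier.
# TokenTypes would need additional members, for example:
#     IF = 9
#     ELSE = 10

def boolexpr():
    # parse a boolean expression, leaving next_token at the first token past it
    pass

def statement():
    # parse a single statement (an assignment, an if-statement, etc.)
    pass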