Python Lex-Yacc: Language Tool For Python CS 550 Programming Languages
Alexander Gutierrez
May 12, 2016
Python Lex-Yacc
● Python Lex-Yacc (PLY) is an implementation of lex and yacc written in Python
● Its lexer splits the input into tokens and feeds them to its parser; together
they form the front end of a compiler or interpreter
Where to use?
● Since it is a tool that uses Python, you will need to install Python if you don’t
have it in your environment
● If you would rather not install Python, tux already provides both Python and PLY!
Python on tux.cs.drexel.edu
Using PLY on tux.cs.drexel.edu
$ ls
calc.py ply-3.8
The Bigger Picture
● Just like Flex/Bison, we can use PLY to (relatively) easily implement our own
programming language
● To do this, we write a Python file containing the definitions that PLY reads
● For lex.py, we need to determine what tokens our language consists of and
how each token can be described using a regular expression
● For yacc.py, we need to write an LALR(1) grammar over these tokens and attach
the code to execute to each rule
● PLY will create both a lexer object and a parser object at run-time which we
can use as our compiler
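The steps above can be sketched end to end with a deliberately tiny language that only adds integers. This is a minimal sketch, not part of the PLY distribution: the token names NUMBER and PLUS and the rule name sum are made up for illustration, and it assumes PLY 3.x is importable as the ply package. The debug=False and write_tables=False arguments only keep PLY from writing files while experimenting.

```python
import ply.lex as lex
import ply.yacc as yacc

# --- lex.py side: token names and their regular expressions ---
tokens = ('NUMBER', 'PLUS')   # hypothetical token names for this sketch

t_PLUS = r'\+'
t_ignore = ' \t'              # characters skipped between tokens

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)    # convert the matched text to an int
    return t

def t_error(t):
    print("Illegal character %r" % t.value[0])
    t.lexer.skip(1)

# --- yacc.py side: grammar rules, each with a Python action ---
def p_sum_plus(p):
    'sum : sum PLUS NUMBER'
    p[0] = p[1] + p[3]

def p_sum_number(p):
    'sum : NUMBER'
    p[0] = p[1]

def p_error(p):
    print("Syntax error")

# Both objects are built at run time from the definitions above.
lexer = lex.lex()
parser = yacc.yacc(debug=False, write_tables=False)

result = parser.parse("1 + 2 + 3", lexer=lexer)
print(result)   # 6
```

The value assigned to p[0] in the start rule is what parse() returns, which is why the sketch can print the sum directly.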
Calculator Example
● The code for this example can be found included with PLY:
○ ply-3.8/example/calc/calc.py
● Yes, we can have both our lex and yacc definitions in the same file (though
this is not required)
● This example looks at a simple arithmetic calculator
● First, we will look at the regular expressions we give to lex.py
● Next, we will look at the grammar we give to yacc.py
● Finally, we will run the code and test on input
calc.py > Part 1/2 of lex definitions
tokens = (
    'NAME','NUMBER',
)

# Tokens
t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_newline(t):
    r'\n+'
    t.lexer.lineno += t.value.count("\n")

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)
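To see what these rules produce, the lexer can be run on its own. The sketch below repeats the definitions above and adds two things the slide does not show, both labeled as assumptions: a literals list for the single-character operators (the grammar rules quote '+', "=" and so on, which relies on such a list) and a t_ignore string so that spaces and tabs are skipped.

```python
import ply.lex as lex

tokens = ('NAME', 'NUMBER')

# Assumption: operators are declared as single-character literals.
literals = ['=', '+', '-', '*', '/', '(', ')']

t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'

# Assumption: spaces and tabs between tokens are ignored.
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_newline(t):
    r'\n+'
    t.lexer.lineno += t.value.count("\n")

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input("x = 3 + 42")

# A lexer object is iterable; each item has a .type and a .value.
found = [(tok.type, tok.value) for tok in lexer]
print(found)
```

For a literal, the token type and the token value are both the character itself, so '=' shows up as the pair ('=', '=').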
calc.py > Part 1/4 of yacc definitions
precedence = (
    ('left','+','-'),
    ('left','*','/'),
    ('right','UMINUS'),
)

# dictionary of names
names = {}
calc.py > Part 2/4 of yacc definitions
def p_statement_assign(p):
    'statement : NAME "=" expression'
    names[p[1]] = p[3]

def p_statement_expr(p):
    'statement : expression'
    print(p[1])

def p_expression_binop(p):
    '''expression : expression '+' expression
                  | expression '-' expression
                  | expression '*' expression
                  | expression '/' expression'''
    if p[2] == '+': p[0] = p[1] + p[3]
    elif p[2] == '-': p[0] = p[1] - p[3]
    elif p[2] == '*': p[0] = p[1] * p[3]
    elif p[2] == '/': p[0] = p[1] / p[3]

def p_expression_uminus(p):
    "expression : '-' expression %prec UMINUS"
    p[0] = -p[2]
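The effect of the precedence table and of %prec UMINUS is easiest to see by evaluating a few expressions. The following is a reduced sketch of the rules above, assuming PLY 3.x: it keeps only NUMBER operands, returns the value instead of printing it, and adds a literals list and t_ignore that the slides do not show. The evaluate helper is hypothetical.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER',)
literals = ['+', '-', '*', '/']   # assumption: operators are literals
t_ignore = ' \t'                  # assumption: whitespace is skipped

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

# Later entries bind tighter: unary minus > * and / > + and -.
precedence = (
    ('left', '+', '-'),
    ('left', '*', '/'),
    ('right', 'UMINUS'),
)

def p_expression_binop(p):
    '''expression : expression '+' expression
                  | expression '-' expression
                  | expression '*' expression
                  | expression '/' expression'''
    if p[2] == '+':
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]
    elif p[2] == '*':
        p[0] = p[1] * p[3]
    elif p[2] == '/':
        p[0] = p[1] / p[3]

def p_expression_uminus(p):
    "expression : '-' expression %prec UMINUS"
    p[0] = -p[2]

def p_expression_number(p):
    "expression : NUMBER"
    p[0] = p[1]

def p_error(p):
    print("Syntax error")

lexer = lex.lex()
parser = yacc.yacc(debug=False, write_tables=False)

def evaluate(text):
    """Hypothetical helper: parse text and return the computed value."""
    return parser.parse(text, lexer=lexer)

print(evaluate("2 + 3 * 4"))   # 14: '*' binds tighter than '+'
print(evaluate("8 - 3 - 2"))   # 3: '-' is left-associative
print(evaluate("- 2 - 3"))     # -5: unary minus applies to 2 only
```

Without the UMINUS entry, the unary rule would take the precedence of the binary '-' token, and the last expression could be grouped as -(2 - 3) instead.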
calc.py > Part 3/4 of yacc definitions
def p_expression_group(p):
    "expression : '(' expression ')'"
    p[0] = p[2]

def p_expression_number(p):
    "expression : NUMBER"
    p[0] = p[1]

def p_expression_name(p):
    "expression : NAME"
    try:
        p[0] = names[p[1]]
    except LookupError:
        print("Undefined name '%s'" % p[1])
        p[0] = 0
calc.py > Part 4/4 of yacc definitions
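def p_error(p):
    if p:
        print("Syntax error at '%s'" % p.value)
    else:
        print("Syntax error at EOF")

# Build the parser from the p_* rules above
import ply.yacc as yacc
yacc.yacc()

# Read-eval-print loop (raw_input is the Python 2 name; use input on Python 3)
while 1:
    try:
        s = raw_input('calc > ')
    except EOFError:
        break
    if not s: continue
    yacc.parse(s)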
Multiple lexers/parsers
lexer = lex.lex()
parser = yacc.yacc()
while 1:
    try:
        s = raw_input('calc > ')
    except EOFError:
        break
    if not s: continue
    parser.parse(s, lexer)
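Holding on to the objects returned by lex.lex() and yacc.yacc() is what makes it possible to have more than one of them alive at once. As a small illustration on the lexer side only, the sketch below uses the clone() method of a PLY lexer to get a second lexer with the same rules but its own input and position. The token set is a made-up minimal one, and t_ignore is an assumption.

```python
import ply.lex as lex

tokens = ('NAME', 'NUMBER')

t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'
t_ignore = ' \t'   # assumption: whitespace is skipped

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

first = lex.lex()
second = first.clone()   # same rules, independent input and position

first.input("a 1 b")
second.input("7 8")

# Interleave calls: each lexer advances through its own input only.
seen = [
    first.token().value,
    second.token().value,
    first.token().value,
    second.token().value,
]
print(seen)   # ['a', 7, 1, 8]
```

Passing the lexer explicitly, as in parser.parse(s, lexer), tells the parser which of these objects to pull tokens from.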
Running on tux
$ ls
calc.py ply-3.8
● We can create and run our lexer and parser by simply invoking python on our
definitions file:
$ python calc.py
Generating LALR tables
calc >
● Since our file ends with code that reads input, we are given the prompt that
we specified. Notice also that running it created two new files:
$ ls
calc.py parser.out parsetab.py ply-3.8
parser.out
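● parser.out is a debugging log written while PLY builds the parser: it lists the
grammar, the LALR states, and any shift/reduce or reduce/reduce conflicts
● Debugging these conflicts is out of the scope of this presentation, but they can
generally be resolved with the understanding of LR parsing gained in this course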
parsetab.py
● Caches the generated parsing tables so the entire construction process need not
be rerun each time we use our new language (remember Python is interpreted, so
without this file the tables would be rebuilt on every run)
● PLY stores a signature of the grammar in _lr_signature so that it can detect
when the parsing definitions have changed enough to warrant reconstruction
● Most of the time the file is simply read back the next time you run your parser
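Where these files end up, and whether they are written at all, can be controlled through keyword arguments to yacc.yacc(). The sketch below assumes PLY 3.x, where debug, write_tables and outputdir are accepted; the tiny addition grammar and the scratch directory are made up for illustration.

```python
import os
import tempfile

import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'PLUS')   # hypothetical token names for this sketch
t_PLUS = r'\+'
t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

def p_sum_plus(p):
    'sum : sum PLUS NUMBER'
    p[0] = p[1] + p[3]

def p_sum_number(p):
    'sum : NUMBER'
    p[0] = p[1]

def p_error(p):
    print("Syntax error")

lexer = lex.lex()

# Write the debug log and the table module into a scratch directory
# so they are easy to inspect and do not clutter the working directory.
outdir = tempfile.mkdtemp()
parser = yacc.yacc(debug=True, outputdir=outdir)
generated = sorted(os.listdir(outdir))
print(generated)

# Same grammar, but with the debug log and table caching switched off:
# nothing is written, and the tables are rebuilt in memory on each run.
quiet_parser = yacc.yacc(debug=False, write_tables=False)
total = quiet_parser.parse("4 + 5", lexer=lexer)
print(total)   # 9
```

Switching the files off is convenient for small experiments; for a larger grammar, keeping the cached tables avoids paying the construction cost on every start.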
Using Our New Language
● We can test to make sure it works by running our definitions file and giving it
input:
$ python calc.py
calc > 3 * 5
15
calc > x=2-1
calc > x
1
calc > x+9
10
calc > 3 - + 2
Syntax error at '+'
2
calc >
Summary
Reference
● http://www.dabeaz.com/ply/