Python Lex-Yacc: Language Tool For Python CS 550 Programming Languages

PLY (Python Lex-Yacc) is a tool that allows users to define their own programming languages by specifying token definitions using regular expressions and grammars. It generates a lexer and parser from these specifications to act as a compiler for the language. The document provides an example of defining a simple calculator language with PLY and running it on the tux server, which already has PLY installed. It describes writing token and grammar rules, generating parsing tables, and testing the language implementation.


Python Lex-Yacc

Language Tool for Python


CS 550 Programming Languages

Alexander Gutierrez
May 12, 2016
Python Lex-Yacc

● Python Lex-Yacc (PLY) is a version of lex and yacc written in the Python interpreted programming language

● Attempts to be a faithful recreation of lex and yacc

● It reads regular expressions that define tokens in order to create a lexer, like lex does

● It reads an LALR(1) grammar and associated rule actions to create a parser

● Uses the lexer to generate tokens to feed to the parser, thereby acting as a compiler
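To make the lexing half concrete, the core idea behind a lex-style tokenizer can be sketched with only Python's standard re module. This is an illustration of the mechanism, not PLY's API; the names below (TOKEN_SPECS, tokenize) are made up for the example:

```python
import re

# Toy token definitions in the spirit of lex: (name, regex) pairs.
# These names are illustrative only, not PLY's API.
TOKEN_SPECS = [
    ('NUMBER', r'\d+'),
    ('NAME',   r'[a-zA-Z_][a-zA-Z0-9_]*'),
    ('OP',     r'[+\-*/=()]'),
    ('SKIP',   r'[ \t]+'),
]
MASTER = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPECS))

def tokenize(text):
    """Yield (type, value) pairs, skipping whitespace."""
    for m in MASTER.finditer(text):
        if m.lastgroup != 'SKIP':
            yield (m.lastgroup, m.group())

print(list(tokenize('x = 2 + 40')))
# [('NAME', 'x'), ('OP', '='), ('NUMBER', '2'), ('OP', '+'), ('NUMBER', '40')]
```

PLY does essentially this for us, but derives the master regular expression from the `t_` definitions we write in the sections that follow.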

2
Where to use?

● Download PLY from its website:
○ http://www.dabeaz.com/ply/

● The latest version (ply-3.8) works best on Python 2.6+ or Python 3.0+

● Since it is a tool that uses Python, you will need to install Python if you don’t have it in your environment

● Versions of PLY at 3.0 or above (ply-3.0+) support both Python 2 and Python 3 (both are maintained versions of the Python programming language, with some differences)

● If you don’t want to bother with installing Python, tux already has it, and PLY!

3
Python on tux.cs.drexel.edu

● Both Python 2.7.6 and Python 3.4.3 are available on tux


● Invoking the Python 2.7.6 interpreter:
○ Command name: python (or python2)
○ Both of these are symlinks. The interpreter lives at /usr/bin/python2.7

● Invoking the Python 3.4.3 interpreter:
○ Command name: python3
○ Also a symlink. This interpreter lives at /usr/bin/python3.4

4
Using PLY on tux.cs.drexel.edu

● tux already has PLY configured! I will cover it anyway.


● Download the latest version of PLY
○ http://www.dabeaz.com/ply/
● Extract the archive and you will get a directory called ply-3.8; put it wherever you want
● In this directory, lex.py and yacc.py live at:
○ ply-3.8/ply/lex.py
○ ply-3.8/ply/yacc.py
● We will be importing these as Python modules
● As for your token and grammar file(s), I suggest simply placing them in the same directory that contains ply-3.8
● My working directory looks like this:

$ ls
calc.py ply-3.8

5
The Bigger Picture

● Just like Flex/Bison, we can use PLY to (relatively) easily implement our own
programming language

● To do this, we need to write a python file that includes instruction manuals for
PLY

● For lex.py, we need to determine what tokens our language consists of and how each token can be described using a regular expression

● For yacc.py, we need to create an LALR(1) grammar over these tokens, with rule actions that execute code

● PLY will create both a lexer object and a parser object at run-time which we
can use as our compiler

6
Calculator Example

● The code for this example can be found included with PLY:
○ ply-3.8/example/calc/calc.py

● Yes, we can have both our lex and yacc definitions in the same file (though this is not necessary)
● This example implements a simple arithmetic calculator
● First, we will look at the regular expressions we give to lex.py
● Next, we will look at the grammar we give to yacc.py
● Finally, we will run the code and test on input

7
calc.py > Part 1/2 of lex definitions

tokens = (
    'NAME', 'NUMBER',
)

literals = ['=', '+', '-', '*', '/', '(', ')']

-- ALTERNATIVE -- (note: literals are checked last during matching)

tokens = (
    'NAME', 'NUMBER',
    'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'EQUALS',
    'LPAREN', 'RPAREN',
)

# Tokens
t_PLUS   = r'\+'
t_MINUS  = r'-'
t_TIMES  = r'\*'
t_DIVIDE = r'/'
t_EQUALS = r'='
t_LPAREN = r'\('
t_RPAREN = r'\)'
8
calc.py > Part 2/2 of lex definitions

# Tokens
t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

t_ignore = " \t"

def t_newline(t):
    r'\n+'
    t.lexer.lineno += t.value.count("\n")

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
import ply.lex as lex
lex.lex()
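Notice that t_NUMBER is a function: rules that need an action are written as functions whose docstring holds the regular expression, and the action here converts the matched lexeme to an int. A stand-alone sketch of that conversion step, using plain re (the function name number_token is hypothetical, not part of PLY):

```python
import re

NUMBER = re.compile(r'\d+')

def number_token(text, pos=0):
    """Mimic t_NUMBER: match digits at pos and convert the lexeme to int."""
    m = NUMBER.match(text, pos)
    if m is None:
        return None
    return ('NUMBER', int(m.group()))  # the same idea as t.value = int(t.value)

print(number_token('42 + x'))  # ('NUMBER', 42)
```

Simple rules like t_PLUS can stay plain strings because their action is just "emit the token"; a function is only needed when the token value must be transformed.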

9
calc.py > Part 1/4 of yacc definitions
precedence = (
    ('left', '+', '-'),
    ('left', '*', '/'),
    ('right', 'UMINUS'),
)

# dictionary of names
names = { }

10
calc.py > Part 2/4 of yacc definitions
def p_statement_assign(p):
    'statement : NAME "=" expression'
    names[p[1]] = p[3]

def p_statement_expr(p):
    'statement : expression'
    print(p[1])

def p_expression_binop(p):
    '''expression : expression '+' expression
                  | expression '-' expression
                  | expression '*' expression
                  | expression '/' expression'''
    if p[2] == '+': p[0] = p[1] + p[3]
    elif p[2] == '-': p[0] = p[1] - p[3]
    elif p[2] == '*': p[0] = p[1] * p[3]
    elif p[2] == '/': p[0] = p[1] / p[3]

def p_expression_uminus(p):
    "expression : '-' expression %prec UMINUS"
    p[0] = -p[2]
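What the precedence table buys us can be illustrated with a small hand-rolled evaluator for the same expression language, written here without PLY (a sketch of the resulting behavior, not of how yacc.py works internally): '*' and '/' bind tighter than '+' and '-', and unary minus binds tightest, mirroring the left/left/right declarations above.

```python
import re

TOKENS = re.compile(r'\d+|[+\-*/()]')

def evaluate(text):
    """Evaluate arithmetic with the same precedence as the PLY declarations:
    +,- lowest; *,/ higher; unary minus (UMINUS) highest."""
    toks = TOKENS.findall(text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def take():
        nonlocal pos
        t = toks[pos]
        pos += 1
        return t

    def atom():
        t = take()
        if t == '-':          # plays the role of %prec UMINUS
            return -atom()
        if t == '(':
            v = expr()
            take()            # consume ')'
            return v
        return int(t)

    def term():               # '*' and '/' (higher precedence)
        v = atom()
        while peek() in ('*', '/'):
            op = take()
            rhs = atom()
            v = v * rhs if op == '*' else v / rhs
        return v

    def expr():               # '+' and '-' (lower precedence)
        v = term()
        while peek() in ('+', '-'):
            op = take()
            rhs = term()
            v = v + rhs if op == '+' else v - rhs
        return v

    return expr()

print(evaluate('2 + 3 * 4'))  # 14, not 20
```

With the PLY declarations, the LALR parser resolves the same ambiguities by consulting the precedence table during shift/reduce decisions rather than by structuring the recursion as above.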

11
calc.py > Part 3/4 of yacc definitions
def p_expression_group(p):
    "expression : '(' expression ')'"
    p[0] = p[2]

def p_expression_number(p):
    "expression : NUMBER"
    p[0] = p[1]

def p_expression_name(p):
    "expression : NAME"
    try:
        p[0] = names[p[1]]
    except LookupError:
        print("Undefined name '%s'" % p[1])
        p[0] = 0

12
calc.py > Part 4/4 of yacc definitions
def p_error(p):
    if p:
        print("Syntax error at '%s'" % p.value)
    else:
        print("Syntax error at EOF")

import ply.yacc as yacc
yacc.yacc()

while 1:
    try:
        s = raw_input('calc > ')  # use input() under Python 3
    except EOFError:
        break
    if not s: continue
    yacc.parse(s)

13
Multiple lexers/parsers
lexer = lex.lex()
parser = yacc.yacc()

while 1:
    try:
        s = raw_input('calc > ')
    except EOFError:
        break
    if not s: continue
    parser.parse(s, lexer)
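The point of capturing the lexer and parser as explicit objects is that each instance carries its own state, so several can coexist (e.g., for nested or interleaved parses). The idea can be sketched with a minimal stateful tokenizer class, unrelated to PLY's internals:

```python
import re

class Tokenizer:
    """Minimal stateful tokenizer: each instance tracks its own position."""
    PATTERN = re.compile(r'\d+|[a-zA-Z_]\w*|[+\-*/=()]')

    def __init__(self, text):
        self.toks = self.PATTERN.findall(text)
        self.pos = 0

    def token(self):
        """Return the next token, or None when exhausted."""
        if self.pos >= len(self.toks):
            return None
        t = self.toks[self.pos]
        self.pos += 1
        return t

a = Tokenizer('x = 1')
b = Tokenizer('y = 2')
print(a.token(), b.token())  # x y  -- independent positions
```

In the same way, two PLY lexer objects can scan different inputs without stepping on each other, which the module-level lex.lex()/yacc.yacc() style does not make explicit.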

14
Running on tux

● My working directory looks like this:

$ ls
calc.py ply-3.8

● We can create and run our lexer and parser by simply invoking python on our
definitions file:

$ python calc.py
Generating LALR tables
calc >

● Since our file ends with a loop that reads input, we are given the prompt that we specified. Another thing to notice is that running it created other files:

$ ls
calc.py parser.out parsetab.py ply-3.8

15
parser.out

● This is a helpful file we can use in debugging.

● It is generated when we create our parser, but does not contain any code

● It is simply debug output that expresses the grammar as yacc.py understood it

● This can be useful if you have shift/reduce or reduce/reduce conflicts

● The file contains a pretty-printed grammar (your grammar, hopefully), the terminals and nonterminals, and the states that the machine enters

● Debugging these conflicts is out of the scope of this presentation, but they can generally be resolved with the understanding of LR parsing gained in this course

16
parsetab.py

● This file contains the parsing table used by your parser

● It is also generated when we create our parser
● Do not edit this file

● It mostly exists to avoid rerunning the entire table-construction process each time we want to use our new language (remember, Python is interpreted, so without it PLY would have to do compiler-compiling on every run)

● It stores a hash of the parsing definitions in _lr_signature so that PLY can detect whether there was a significant enough change to warrant reconstructing the tables

● Most of the time this file will just be read directly the next time you run your parser
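The caching idea can be sketched with hashlib: hash the grammar text, store the digest alongside the generated table, and regenerate only when the digest changes. This is an illustration of the mechanism, not PLY's actual code; PLY computes its own signature internally, and the function names below are made up.

```python
import hashlib

def signature(grammar_text):
    """Digest of the grammar source, analogous in spirit to _lr_signature."""
    return hashlib.md5(grammar_text.encode('utf-8')).hexdigest()

def tables_are_stale(grammar_text, cached_signature):
    """Rebuild tables only if the grammar changed since the cache was written."""
    return signature(grammar_text) != cached_signature

old = signature("expression : expression '+' expression")
print(tables_are_stale("expression : expression '+' expression", old))  # False
print(tables_are_stale("expression : expression '-' expression", old))  # True
```

Deleting parsetab.py is always safe: it just forces the table construction to run again on the next invocation.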

17
Using Our New Language

● We can test to make sure it works by running our definitions file and giving it
input:
$ python calc.py
calc > 3 * 5
15
calc > x=2-1
calc > x
1
calc > x+9
10
calc > 3 - + 2
Syntax error at '+'
2
calc >

18
Summary

● Use PLY on tux (already installed and configured)

● Design your own language by creating tokenization instructions via regular expressions and a grammar

● Implement the language by giving PLY these instructions to generate a lexical analyzer and a parser, respectively, through the use of Python

19
Reference

PLY (Python Lex-Yacc)

● http://www.dabeaz.com/ply/

20
