Write Compiler
(with PLY)
Dave Beazley
http://www.dabeaz.com
• What is a compiler?
• A program that processes other programs
• Typically implements a programming language
• Examples:
• gcc, javac, SWIG, Doxygen, Python
Compiler Design
in -> lexing -> parsing -> typecheck -> codegen -> out
Illegal Character
Parsing
• Makes sure input is structurally correct
b = 40 + 20*(2+3)/37.5
Parsing
• Detects syntax errors
b = 40 + “hello” (Syntax OK)
b = 3 * 4 7 / (Syntax error)
• Example: + operator
        +
   LHS     RHS
    2       3
Code Generation
• Processing the parse tree in some way
• Usually a traversal of the parse tree
b = 40 + 20*(2+3)/37.5
Parse tree (children indented under their operator):
+
  40
  /
    *
      20
      +
        2
        3
    37.5
Generated code:
LOAD R1, 40
LOAD R2, 20
LOAD R3, 2
LOAD R4, 3
ADD R3, R4, R3 ; R3 = (2+3)
MUL R2, R3, R2 ; R2 = 20*(2+3)
LOAD R3, 37.5
DIV R2, R3, R2 ; R2 = 20*(2+3)/37.5
ADD R1, R2, R1 ; R1 = 40+20*(2+3)/37.5
STORE R1, "b"
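The traversal above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout of the tree, the function names `emit` and `compile_assignment`, and the `SUB` opcode are hypothetical and do not come from the slides. The register scheme (left operand in R(n), right operand in R(n+1), result back in R(n)) is chosen so that the output matches the listing above.

```python
# Hypothetical opcode table; ADD, MUL and DIV appear on the slide, SUB is assumed.
OPCODES = {'+': 'ADD', '-': 'SUB', '*': 'MUL', '/': 'DIV'}

def emit(node, reg, code):
    """Post-order walk: left subtree into R<reg>, right subtree into R<reg+1>."""
    if isinstance(node, tuple):
        op, left, right = node
        emit(left, reg, code)
        emit(right, reg + 1, code)
        # Combine both operands and leave the result in R<reg>.
        code.append(f'{OPCODES[op]} R{reg}, R{reg + 1}, R{reg}')
    else:
        # A leaf is a plain number.
        code.append(f'LOAD R{reg}, {node}')

def compile_assignment(name, tree):
    """Generate code for `name = <tree>` and return it as a list of lines."""
    code = []
    emit(tree, 1, code)
    code.append(f'STORE R1, "{name}"')
    return code

# b = 40 + 20*(2+3)/37.5
tree = ('+', 40, ('/', ('*', 20, ('+', 2, 3)), 37.5))
for line in compile_assignment('b', tree):
    print(line)
```

Because registers are handed out by tree depth, R3 is reused for 37.5 once the (2+3) result has been consumed, just as in the listing.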
Comments
Lex/Yacc Big Picture
scanner.l (token specification)   -> lex  -> scanner.c
parser.y (grammar specification)  -> yacc -> parser.c
scanner.c + parser.c -> mycompiler
Lex/Yacc Comments
• Code generators
• Create a parser from a specification
• Classic versions create C code.
• Variants target other languages
PLY
• Python Lex-Yacc
• 100% Python version of lex/yacc toolset
• History:
- Late 90's : "Why don't you rewrite SWIG in Python?"
- 2000 : “No! Now stop bugging me about it!”
- 2001 : Dave teaches a compilers course at UofC. An experiment.
Students write a compiler in Python.
- 2001 : PLY-1.0 developed and released.
- 2002 - 2005 : Occasional maintenance and bug fixes.
- 2006 : Major update to PLY-2.x (in progress).
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)   # convert the matched text to an integer
    return t
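To show roughly what a lexer built from rules like t_NUMBER does, here is a small stand-alone tokenizer that uses only Python's `re` module. It is a sketch, not PLY's implementation: the `TOKEN_SPEC` table, the `tokenize` function and every pattern except `\d+` are assumptions made for illustration (the token names are borrowed from the grammar used later in these slides).

```python
import re

# (token name, regular expression); only the NUMBER pattern comes from the slide.
TOKEN_SPEC = [
    ('NUMBER', r'\d+'),
    ('NAME',   r'[A-Za-z_][A-Za-z0-9_]*'),
    ('EQUALS', r'='),
    ('PLUS',   r'\+'),
    ('MINUS',  r'-'),
    ('TIMES',  r'\*'),
    ('DIVIDE', r'/'),
    ('SKIP',   r'[ \t]+'),          # whitespace is matched but not returned
]
MASTER = re.compile('|'.join(f'(?P<{name}>{pattern})' for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (type, value) pairs, or raise on an illegal character."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f'Illegal character {text[pos]!r}')
        pos = match.end()
        kind = match.lastgroup
        if kind == 'SKIP':
            continue
        # Same conversion as t_NUMBER: the token value becomes an int.
        value = int(match.group()) if kind == 'NUMBER' else match.group()
        tokens.append((kind, value))
    return tokens

print(tokenize('X = 3 + 4 * 5'))
```

The token types produced here (NAME, EQUALS, NUMBER, ...) are the terminals that the grammar rules refer to.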
def p_assign(p):
    '''assign : NAME EQUALS expr'''

def p_expr(p):
    '''expr : expr PLUS term
            | expr MINUS term
            | term'''

def p_term(p):
    '''term : term TIMES factor
            | term DIVIDE factor
            | factor'''

def p_factor(p):
    '''factor : NUMBER'''
LR Example
Grammar                              PLY Rules
(1) assign : NAME EQUALS expr        -> p_assign(p)
(2) expr   : expr PLUS term          -> p_expr(p)
(3)        | expr MINUS term
(4)        | term
(5) term   : term TIMES factor       -> p_term(p)
(6)        | term DIVIDE factor
(7)        | factor
(8) factor : NUMBER                  -> p_factor(p)

Input: X = 3 + 4 * 5 $end

step  stack                                     values                            input                 action
1     (empty)                                                                     X = 3 + 4 * 5 $end    shift
2     NAME                                      'X'                               = 3 + 4 * 5 $end      shift
3     NAME EQUALS                               'X' '='                           3 + 4 * 5 $end        shift
4     NAME EQUALS NUMBER                        'X' '=' 3                         + 4 * 5 $end          reduce using rule 8
5     NAME EQUALS factor                        'X' '=' None                      + 4 * 5 $end          reduce using rule 7
6     NAME EQUALS term                          'X' '=' None                      + 4 * 5 $end          reduce using rule 4
7     NAME EQUALS expr                          'X' '=' None                      + 4 * 5 $end          shift
8     NAME EQUALS expr PLUS                     'X' '=' None '+'                  4 * 5 $end            shift
9     NAME EQUALS expr PLUS NUMBER              'X' '=' None '+' 4                * 5 $end              reduce using rule 8
10    NAME EQUALS expr PLUS factor              'X' '=' None '+' None             * 5 $end              reduce using rule 7
11    NAME EQUALS expr PLUS term                'X' '=' None '+' None             * 5 $end              shift
12    NAME EQUALS expr PLUS term TIMES          'X' '=' None '+' None '*'         5 $end                shift
13    NAME EQUALS expr PLUS term TIMES NUMBER   'X' '=' None '+' None '*' 5       $end                  reduce using rule 8
14    NAME EQUALS expr PLUS term TIMES factor   'X' '=' None '+' None '*' None    $end                  reduce using rule 5
15    NAME EQUALS expr PLUS term                'X' '=' None '+' None             $end                  reduce using rule 2
16    NAME EQUALS expr                          'X' '=' None                      $end                  reduce using rule 1
17    assign                                    None                              $end                  Done.

Notes:
• Step 4: the value 3 on the stack is the integer produced by t_NUMBER (t.value = int(t.value)).
• Step 5: the value is None because p_factor() didn't do anything. More later.
• Step 7: reduce using rule 1, or shift? The next token is PLUS, so the parser shifts.
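The shift/reduce sequence of this example can be reproduced with a short hand-written driver. Note the assumptions: yacc and PLY decide between shift and reduce by consulting generated LALR tables, whereas the hypothetical `choose_rule` function below hard-codes the decisions for this one grammar. The names `RULES`, `choose_rule` and `parse` are made up for the sketch.

```python
# Rule number -> (left-hand side, right-hand side), numbered as in the grammar.
RULES = {
    1: ('assign', ['NAME', 'EQUALS', 'expr']),
    2: ('expr',   ['expr', 'PLUS', 'term']),
    3: ('expr',   ['expr', 'MINUS', 'term']),
    4: ('expr',   ['term']),
    5: ('term',   ['term', 'TIMES', 'factor']),
    6: ('term',   ['term', 'DIVIDE', 'factor']),
    7: ('term',   ['factor']),
    8: ('factor', ['NUMBER']),
}

def choose_rule(stack, lookahead):
    """Return the rule number to reduce by, or None if the parser should shift."""
    # A term followed by * or / must keep growing before it is reduced.
    if stack[-1:] == ['term'] and lookahead in ('TIMES', 'DIVIDE'):
        return None
    # An expr is only complete once the end of the input has been reached.
    if stack[-1:] == ['expr'] and lookahead != '$end':
        return None
    # Otherwise reduce by the longest right-hand side found on top of the stack.
    matches = [n for n, (_, rhs) in RULES.items() if stack[-len(rhs):] == rhs]
    return max(matches, key=lambda n: len(RULES[n][1])) if matches else None

def parse(tokens):
    """Return the list of actions taken while parsing a list of token types."""
    tokens = list(tokens) + ['$end']
    stack, actions, pos = [], [], 0
    while True:
        lookahead = tokens[pos]
        if stack == ['assign'] and lookahead == '$end':
            actions.append('done')
            return actions
        rule = choose_rule(stack, lookahead)
        if rule is not None:
            lhs, rhs = RULES[rule]
            del stack[-len(rhs):]          # pop the right-hand side
            stack.append(lhs)              # push the left-hand side
            actions.append(f'reduce using rule {rule}')
        elif lookahead != '$end':
            stack.append(lookahead)        # shift the next token
            pos += 1
            actions.append('shift')
        else:
            raise SyntaxError(f'unexpected end of input, stack={stack}')

# X = 3 + 4 * 5
for action in parse(['NAME', 'EQUALS', 'NUMBER', 'PLUS', 'NUMBER', 'TIMES', 'NUMBER']):
    print(action)
```

Only symbol names are tracked here; the values that travel alongside them on the stack are left out to keep the control flow visible.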
def p_expr_plus(p):
    '''expr : expr PLUS term'''
    p[0] = p[1] + p[3]

def p_term_mul(p):
    '''term : term TIMES factor'''
    p[0] = p[1] * p[3]

def p_factor(p):
    '''factor : NUMBER'''
    p[0] = p[1]
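One way to picture how p[0], p[1], p[3] relate to the value stack is the simplified model below. It is an assumption-laden sketch: PLY passes its own sequence-like object rather than a plain list, and the helper name `reduce_values` and the do-nothing function `p_factor_empty` are hypothetical.

```python
def reduce_values(values, rhs_len, action):
    """Pop rhs_len values, expose them as p[1..n], run the rule, push p[0]."""
    start = len(values) - rhs_len
    p = [None] + values[start:]    # p[0] starts out as None
    del values[start:]
    action(p)
    values.append(p[0])

def p_factor_empty(p):
    pass                           # sets nothing, so p[0] stays None

def p_factor(p):
    p[0] = p[1]

def p_term_mul(p):
    p[0] = p[1] * p[3]

def p_expr_plus(p):
    p[0] = p[1] + p[3]

# A rule function that does nothing leaves None on the stack.
values = ['X', '=', 3]
reduce_values(values, 1, p_factor_empty)
print(values)                      # ['X', '=', None]

# With real actions the values are combined as the reductions happen.
values = [3, '+', 4, '*', 5]
reduce_values(values, 3, p_term_mul)    # 4 * 5
reduce_values(values, 3, p_expr_plus)   # 3 + 20
print(values)                      # [23]
```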
LR Example: Step 4
stack                 values       input
NAME EQUALS NUMBER    'X' '=' 3    + 4 * 5 $end
Action: reduce using rule 8 (p_factor() now sets p[0] = p[1])
NAME EQUALS factor    'X' '=' 3    + 4 * 5 $end
def p_expr_plus(p):
    '''expr : expr PLUS term'''
    p[0] = ('+', p[1], p[3])

def p_term_mul(p):
    '''term : term TIMES factor'''
    p[0] = ('*', p[1], p[3])

def p_factor(p):
    '''factor : NUMBER'''
    p[0] = ('NUM', p[1])
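Once the rules return tuples like these, later stages work on the tree instead of on the token stream. As a minimal sketch, the recursive walk below evaluates such a tree; the `evaluate` function and the `BINARY` table are hypothetical names, and only the two operators shown above are handled.

```python
import operator

# Operators used by the tuple nodes above.
BINARY = {'+': operator.add, '*': operator.mul}

def evaluate(node):
    """Evaluate a tree made of ('NUM', n) leaves and (op, left, right) nodes."""
    if node[0] == 'NUM':
        return node[1]
    op, left, right = node
    return BINARY[op](evaluate(left), evaluate(right))

# The tree the rules above would build for 3 + 4 * 5.
tree = ('+', ('NUM', 3), ('*', ('NUM', 4), ('NUM', 5)))
print(evaluate(tree))   # 23
```

The same traversal shape works for type checking or code generation; only the work done at each node changes.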
Ambiguous Grammars
def p_assign(p):
    '''assign : NAME EQUALS expr'''

def p_expr(p):
    '''expr : expr PLUS expr
            | expr MINUS expr
            | expr TIMES expr
            | expr DIVIDE expr
            | NUMBER'''
3 + 4 * 5

     ?               ?
     +               *
    / \             / \
   3   *           +   5
      / \         / \
     4   5       3   4
Ambiguous Grammars
• Multiple possible parse trees
• Is reported as a “shift/reduce conflict”
yacc: Generating LALR parsing table...
yacc: 16 shift/reduce conflicts
stack                         input
NAME EQUALS expr PLUS expr    * 5 $end
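At this point the parser can either reduce expr PLUS expr or shift the TIMES token, and the two choices produce the two different trees. The toy evaluator below makes the consequence concrete. It is a sketch under assumptions: it works directly on numbers and operator characters rather than PLY tokens, and the `on_conflict` parameter is a hypothetical knob that applies one fixed choice to every conflict.

```python
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def parse(tokens, on_conflict):
    """Evaluate tokens with the ambiguous grammar expr : expr OP expr | NUMBER.

    on_conflict is 'shift' or 'reduce' and is applied to every conflict.
    """
    stack, pos = [], 0
    while True:
        lookahead = tokens[pos] if pos < len(tokens) else '$end'
        # 'expr OP expr' is on top of the stack when the middle item is an operator.
        can_reduce = len(stack) >= 3 and stack[-2] in OPS
        if can_reduce and (lookahead == '$end' or on_conflict == 'reduce'):
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append(OPS[op](left, right))
        elif lookahead == '$end':
            return stack[0]
        else:
            stack.append(lookahead)
            pos += 1

print(parse([3, '+', 4, '*', 5], 'shift'))    # 23, i.e. 3 + (4 * 5)
print(parse([3, '+', 4, '*', 5], 'reduce'))   # 35, i.e. (3 + 4) * 5
print(parse([2, '-', 3, '-', 4], 'shift'))    # 3,  i.e. 2 - (3 - 4)
```

Neither fixed policy is right for every input: always shifting gets 3 + 4 * 5 right but groups 2 - 3 - 4 from the right.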
def p_expr(p):
    '''expr : expr PLUS expr
            | expr MINUS expr
            | expr TIMES expr
            | expr DIVIDE expr
            | NUMBER'''
Error handling/recovery
• Syntax errors first fed through p_error()
def p_error(p):
    print("Syntax error")
Error recovery
Grammar
(1) assign : NAME EQUALS expr
(2)        | NAME EQUALS error
(3) expr   : expr PLUS expr
(4)        | expr MINUS expr
(5)        | expr TIMES expr
(6)        | expr DIVIDE expr
(7)        | NUMBER

step  stack                               values             input     what happens
1     NAME EQUALS expr PLUS expr          'X' '=' 3 '+' 4    5 $end    syntax error; p_error() is called
2     NAME EQUALS expr PLUS expr error    'X' '=' 3 '+' 4    $end      error token takes the place of the bad token
3     NAME EQUALS expr PLUS error         'X' '=' 3 '+'      $end      stack entry discarded
4     NAME EQUALS expr error              'X' '=' 3          $end      stack entry discarded
5     NAME EQUALS error                   'X' '='            $end      rule 2 matches; p_assign_err() is called
6     assign                                                 $end

def p_assign_err(p):
    'assign : NAME EQUALS error'
    print("Bad assignment")
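The unwinding shown in the recovery steps can be mimicked with a few lines of Python. This is a deliberately simplified sketch: a real LR parser checks parser states to see whether the error token can be shifted, while the hypothetical `recover` function below just compares symbol names against the part of each error rule that precedes `error`.

```python
def ends_with(stack, suffix):
    """True if the last len(suffix) stack entries equal suffix."""
    return len(suffix) <= len(stack) and stack[len(stack) - len(suffix):] == suffix

def recover(stack, error_rules):
    """Return the successive stacks (with 'error' on top) seen during recovery."""
    # Symbols that must sit directly below 'error', e.g. ['NAME', 'EQUALS'].
    contexts = [rhs[:rhs.index('error')] for rhs in error_rules if 'error' in rhs]
    stack = list(stack)
    snapshots = [stack + ['error']]
    while not any(ends_with(stack, ctx) for ctx in contexts):
        if not stack:
            raise SyntaxError('no error rule applies; giving up')
        stack.pop()                        # discard one stack entry
        snapshots.append(stack + ['error'])
    return snapshots

rules = [['NAME', 'EQUALS', 'error']]      # right-hand side of rule (2)
for snapshot in recover(['NAME', 'EQUALS', 'expr', 'PLUS', 'expr'], rules):
    print(' '.join(snapshot))
```

The last stack printed is NAME EQUALS error, which is the right-hand side of rule (2) and can therefore be reduced to assign.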
Debugging Output
Rule 1 statement -> NAME = expression
Rule 2 statement -> expression
Rule 3 expression -> expression + expression
Rule 4 expression -> expression - expression
Rule 5 expression -> expression * expression
Rule 6 expression -> expression / expression
Rule 7 expression -> NUMBER

Terminals, with rules where they appear

* : 5
+ : 3
- : 4
/ : 6
= : 1
NAME : 1
NUMBER : 7
error :

Nonterminals, with rules where they appear

expression : 1 2 3 3 4 4 5 5 6 6
statement : 0

Parsing method: LALR

state 0

(0) S' -> . statement
(1) statement -> . NAME = expression
(2) statement -> . expression
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . NUMBER

state 1
...

(1) statement -> NAME = expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression

$end reduce using rule 1 (statement -> NAME = expression .)
+ shift and go to state 7
- shift and go to state 6
* shift and go to state 8
/ shift and go to state 9

state 11

(4) expression -> expression - expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression

! shift/reduce conflict for + resolved as shift.
! shift/reduce conflict for - resolved as shift.
! shift/reduce conflict for * resolved as shift.
! shift/reduce conflict for / resolved as shift.
$end reduce using rule 4 (expression -> expression - expression .)
+ shift and go to state 7
- shift and go to state 6
* shift and go to state 8
/ shift and go to state 9

! + [ reduce using rule 4 (expression -> expression - expression .) ]
! - [ reduce using rule 4 (expression -> expression - expression .) ]
! * [ reduce using rule 4 (expression -> expression - expression .) ]
! / [ reduce using rule 4 (expression -> expression - expression .) ]
class MyParser:
    def p_assign(self, p):
        '''assign : NAME EQUALS expr'''

    def p_expr(self, p):
        '''expr : expr PLUS term
                | expr MINUS term
                | term'''

    def p_term(self, p):
        '''term : term TIMES factor
                | term DIVIDE factor
                | factor'''

    def p_factor(self, p):
        '''factor : NUMBER'''

    def build(self):
        self.parser = yacc.yacc(object=self)
Summary
• Mailing list/group
http://groups.google.com/group/ply-hack
Reduce/Reduce Conflict Explained
stack                  input
NAME EQUALS NUMBER     $end
Two different reductions lead to the same result:
assign                 $end    (from NAME EQUALS NUMBER)
assign                 $end    (from NAME EQUALS expr)