6 Ast
DESIGN
Abstract Syntax Trees
Source Code
(Character stream)
if (b == 0) { a = 1; }
Lexical Analysis
Token stream:
if ( b == 0 ) { a = 1 ; }
Parsing
Abstract Syntax Tree:
If
  Eq
    b
    0
  Assn
    a
    1
  None
Analysis & Transformation
Backend
Assembly Code
l1:
cmpq $0, %rax
je l2
jmp l3
l2:
…
The Structure of Programs
• We can use a grammar to generate a parser that
recognizes terms in that grammar
• A parser gives us a derivation, a sequence of productions
we can apply to build the input program
• But what do we do with that derivation?
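• A small worked example, assuming the expression grammar S ⟼ E + S | E, E ⟼ number | (S) used on the following slides: a leftmost derivation of 1 + 2 applies one production per step.

```latex
\begin{align*}
S &\Rightarrow E + S && \text{by } S \to E + S \\
  &\Rightarrow 1 + S && \text{by } E \to \text{number} \\
  &\Rightarrow 1 + E && \text{by } S \to E \\
  &\Rightarrow 1 + 2 && \text{by } E \to \text{number}
\end{align*}
```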
From Derivations to Parse Trees
• Tree representation of the derivation
• Internal nodes are nonterminals
  – Children are parts of the production used on that nonterminal
• Leaves of the tree are terminals
  – In-order traversal yields the input sequence of tokens
• (1 + 2 + (3 + 4)) + 5
S ⟼ E + S | E
E ⟼ number | (S)
Parse Tree:
S
  E
    (
    S
      E
        1
      +
      S
        E
          2
        +
        S
          E
            (
            S
              E
                3
              +
              S
                E
                  4
            )
    )
  +
  S
    E
      5
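• The claim that the leaves, read left to right, give back the token sequence can be checked with a minimal C sketch. The `PT` type, the `leaves` function, and the hand-built tree for 1 + 2 are all hypothetical names chosen for illustration; they are not part of the slides.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical generic parse-tree node: a grammar symbol plus its ordered
   children. Leaves (nkids == 0) are terminals; internal nodes are
   nonterminals. */
typedef struct PT {
  const char* label;          /* "S", "E", "+", "1", ... */
  int nkids;
  const struct PT* kids[3];   /* no production here has more than 3 symbols */
} PT;

/* Append the labels of the leaves, left to right, to out.
   The caller must provide a large enough, NUL-terminated buffer. */
void leaves(const PT* t, char* out) {
  assert(t != NULL && out != NULL);
  if (t->nkids == 0) {
    strcat(out, t->label);
    return;
  }
  for (int i = 0; i < t->nkids; i++)
    leaves(t->kids[i], out);
}

/* Parse tree for "1 + 2" under S -> E + S | E, E -> number | (S). */
static const PT one  = {"1", 0, {NULL}};
static const PT two  = {"2", 0, {NULL}};
static const PT plus = {"+", 0, {NULL}};
static const PT e1   = {"E", 1, {&one}};
static const PT e2   = {"E", 1, {&two}};
static const PT s2   = {"S", 1, {&e2}};
static const PT s1   = {"S", 3, {&e1, &plus, &s2}};
```

Calling `leaves(&s1, buf)` on an empty buffer collects the terminals in input order; the nonterminal labels S and E never appear in the result.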
From Parse Trees to Abstract Syntax
• Parse tree: “concrete syntax”
[Figure: parse tree for (1 + 2 + (3 + 4)) + 5, with S and E nonterminal nodes and parenthesis leaves]
• Abstract syntax tree (AST):
+
  +
    1
    +
      2
      +
        3
        4
  5
• Hides, or abstracts, unneeded information
From Parse Trees to Abstract Syntax
• Internal nodes are operations (plus, times, if-then-else, while-loop, …)
• Leaves are ids and literals (numbers, strings, …)
• Nonterminals don’t appear at all!
• Captures logical structure of programs
• Hides, or abstracts, unneeded information
• Goal for parser: take in program code, output an AST
• Abstract syntax tree (AST):
+
  +
    1
    +
      2
      +
        3
        4
  5
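• To see why the AST needs no parentheses or nonterminals, here is a minimal C sketch that builds the tree for (1 + 2 + (3 + 4)) + 5 and evaluates it. The `Ast` type and the `Num`, `Plus`, `eval`, and `example` functions are hypothetical and deliberately simpler than the per-nonterminal structs on the following slides.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical AST for sums: a node is either a literal or a plus. */
typedef enum { AST_NUM, AST_PLUS } AstKind;

typedef struct Ast {
  AstKind kind;
  int val;                 /* used when kind == AST_NUM */
  struct Ast *lhs, *rhs;   /* used when kind == AST_PLUS */
} Ast;

Ast* Num(int val) {
  Ast* n = malloc(sizeof *n);
  assert(n != NULL);
  n->kind = AST_NUM;
  n->val = val;
  n->lhs = n->rhs = NULL;
  return n;
}

Ast* Plus(Ast* lhs, Ast* rhs) {
  Ast* n = malloc(sizeof *n);
  assert(n != NULL);
  n->kind = AST_PLUS;
  n->val = 0;
  n->lhs = lhs;
  n->rhs = rhs;
  return n;
}

/* Internal nodes are operations, leaves are literals. */
int eval(const Ast* t) {
  if (t->kind == AST_NUM) return t->val;
  return eval(t->lhs) + eval(t->rhs);
}

/* (1 + 2 + (3 + 4)) + 5: the grouping lives in the tree shape alone. */
Ast* example(void) {
  return Plus(Plus(Num(1), Plus(Num(2), Plus(Num(3), Num(4)))), Num(5));
}
```

The sketch never frees its nodes, which is acceptable for an illustration but not for a real compiler pass.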
Abstract Syntax Trees in Code
• A struct type for each nonterminal
• Each struct has an enum field indicating which production
was used, and a union for associated data (from terminals
with contents, like ID or NUM) and child nodes (from
nonterminals)
S ⟼ S + E | E
E ⟼ number | (S)
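• One possible layout for these structs, as a sketch. The names `E_node`, `S_node`, `E_num`, `E_parens`, `data.val`, and `data.child` come from the constructor code on these slides; the tags `S_plus` and `S_exp`, the fields `plus.lhs`, `plus.rhs`, and `e`, and the `count_nums_*` helpers are assumptions added for illustration.

```c
#include <assert.h>

typedef struct S_node S_node;
typedef struct E_node E_node;

/* E -> number | (S) */
typedef enum { E_num, E_parens } E_kind;
struct E_node {
  E_kind kind;               /* which production was used */
  union {
    int val;                 /* E_num: contents of the NUM terminal */
    S_node* child;           /* E_parens: the S between the parentheses */
  } data;
};

/* S -> S + E | E   (tag and field names below are assumed) */
typedef enum { S_plus, S_exp } S_kind;
struct S_node {
  S_kind kind;
  union {
    struct { S_node* lhs; E_node* rhs; } plus;   /* S_plus */
    E_node* e;                                   /* S_exp */
  } data;
};

/* The enum tag tells a tree walk which union member is valid.
   Example walk: count the number leaves under a node. */
int count_nums_S(const S_node* s);

int count_nums_E(const E_node* e) {
  switch (e->kind) {
    case E_num:    return 1;
    case E_parens: return count_nums_S(e->data.child);
  }
  assert(0 && "unknown E_kind");
  return 0;
}

int count_nums_S(const S_node* s) {
  switch (s->kind) {
    case S_plus:
      return count_nums_S(s->data.plus.lhs) + count_nums_E(s->data.plus.rhs);
    case S_exp:
      return count_nums_E(s->data.e);
  }
  assert(0 && "unknown S_kind");
  return 0;
}
```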
E_node* NumNode(int val){
  E_node* node = malloc(sizeof(E_node));
  node->kind = E_num;
  node->data.val = val;
  return node;
}
Abstract Syntax Trees in Code
• Write a constructor for the node type of each production
S ⟼ S + E | E
E ⟼ number | (S)
E_node* ParenNode(S_node* s){
  E_node* node = malloc(sizeof(E_node));
  node->kind = E_parens;
  node->data.child = s;
  return node;
}
Abstract Syntax Trees in Code
• Write a constructor for the node type of each production
S ⟼ S + E | E
E ⟼ number | (S)
E_node* NumNode(int val){ … }
E_node* ParenNode(S_node* s){ … }
S_node* PlusNode(S_node* lhs, E_node* rhs){ … }
S_node* ENode(E_node* e){ … }
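• The bodies above are elided on the slide. As a sketch of how all four constructors might fit together, the block below fills them in under assumed struct layouts (the `S_plus` and `S_exp` tags, the `plus.lhs`, `plus.rhs`, and `e` fields, and the `new_E`, `new_S`, and `example` helpers are hypothetical) and builds the tree for (1 + 2) + 3 bottom-up, in the order a parser's actions would call them.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct S_node S_node;
typedef struct E_node E_node;

struct E_node {                        /* E -> number | (S) */
  enum { E_num, E_parens } kind;
  union { int val; S_node* child; } data;
};

struct S_node {                        /* S -> S + E | E */
  enum { S_plus, S_exp } kind;         /* assumed tag names */
  union {
    struct { S_node* lhs; E_node* rhs; } plus;
    E_node* e;
  } data;
};

/* Allocation helpers (hypothetical); abort if malloc fails. */
static E_node* new_E(void) { E_node* n = malloc(sizeof *n); assert(n != NULL); return n; }
static S_node* new_S(void) { S_node* n = malloc(sizeof *n); assert(n != NULL); return n; }

E_node* NumNode(int val) {
  E_node* node = new_E();
  node->kind = E_num;
  node->data.val = val;
  return node;
}

E_node* ParenNode(S_node* s) {
  E_node* node = new_E();
  node->kind = E_parens;
  node->data.child = s;
  return node;
}

S_node* PlusNode(S_node* lhs, E_node* rhs) {
  S_node* node = new_S();
  node->kind = S_plus;
  node->data.plus.lhs = lhs;
  node->data.plus.rhs = rhs;
  return node;
}

S_node* ENode(E_node* e) {
  S_node* node = new_S();
  node->kind = S_exp;
  node->data.e = e;
  return node;
}

/* Tree for "(1 + 2) + 3", built innermost-first. */
S_node* example(void) {
  S_node* inner = PlusNode(ENode(NumNode(1)), NumNode(2));   /* 1 + 2 */
  return PlusNode(ENode(ParenNode(inner)), NumNode(3));      /* (1 + 2) + 3 */
}
```

Because the structs mirror the grammar, the result still contains an `E_parens` node and `S_exp` wrappers, so it is closer to a parse tree than to the fully abstract tree shown earlier.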
Abstract Syntax Trees in Code
E_node* NumNode(int val){ … }
E_node* ParenNode(S_node* s){ … }
S ⟼ S + E | E
E ⟼ number | (S)
%union { …
  E_node* e;
  S_node* s; }
exp:
    NUM { $$ = NumNode($1); }
  | LPAREN stmt RPAREN { $$ = ParenNode($2); }
stmt:
    stmt PLUS exp { $$ = PlusNode($1, $3); }
  | exp { $$ = ENode($1); }
Syntactic Sugar
• Sometimes two productions are for the same basic operation:
stmt:
    IF LPAREN exp RPAREN stmt ELSE stmt
  | IF LPAREN exp RPAREN stmt // no else
• Build the same AST node for both, and just pass NULL for the else stmt when there isn’t one:
IF LPAREN exp RPAREN stmt { $$ = IfNode($3, $5, NULL); }
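• A minimal sketch of what such an IfNode constructor could look like. Only the name `IfNode` and its three-argument call shape come from the slide; the `Stmt` and `Exp` types, the `STMT_IF` tag, the field names, and the `has_else` helper are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Exp Exp;      /* hypothetical expression node, left opaque here */
typedef struct Stmt Stmt;

typedef enum { STMT_IF /* other statement kinds would go here */ } StmtKind;

struct Stmt {
  StmtKind kind;
  union {
    struct {
      Exp* cond;
      Stmt* then_stmt;
      Stmt* else_stmt;       /* NULL when the source had no else branch */
    } if_;
  } data;
};

/* One constructor serves both productions: with and without ELSE. */
Stmt* IfNode(Exp* cond, Stmt* then_stmt, Stmt* else_stmt) {
  Stmt* node = malloc(sizeof *node);
  assert(node != NULL);
  node->kind = STMT_IF;
  node->data.if_.cond = cond;
  node->data.if_.then_stmt = then_stmt;
  node->data.if_.else_stmt = else_stmt;
  return node;
}

/* Later passes test for the missing branch instead of needing a second
   node kind for "if without else". */
int has_else(const Stmt* s) {
  return s->kind == STMT_IF && s->data.if_.else_stmt != NULL;
}
```

The trade-off is that every consumer of the node must remember the NULL check; in exchange, analyses only have one kind of if statement to handle.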
• Method:
1. Write a file (.y) that describes the grammar and disambiguates it
2. Write AST nodes (in normal code) representing program structure, and add actions to the grammar that build the AST node for each production
3. Use a parser generator to turn that grammar into a shift-reduce automaton, giving us a yyparse function that reads a program and, through those actions, builds its AST
• Next up: analyzing the meaning of a program