6 Ast

CS 473: COMPILER DESIGN
Abstract Syntax Trees
Source Code
(Character stream)
if (b == 0) { a = 1; }
Lexical Analysis
Token stream:

if ( b == 0 ) { a = 1 ; }
Parsing
Abstract Syntax Tree:
If
  Eq (b, 0)
  Assn (a, 1)
  None
Analysis & Transformation

Backend
Assembly Code
l1:
cmpq %eax, $0
jeq l2
jmp l3
l2:
…
The Structure of Programs
• We can use a grammar to generate a parser that
recognizes terms in that grammar
• A parser gives us a derivation, a sequence of productions
we can apply to build the input program
• But what do we do with that derivation?

• How do we represent the structure of programs?


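Grammar: S ⟼ S – S | number

1 – 2 – 3 could be “do (1 minus 2) minus 3” or “do 1 minus (2 minus 3)”:

Derivation 1: S ⟼ S – S ⟼ S – 3 ⟼ S – S – 3 ⟼ …
Tree 1 (children indented under their parent):
–
  –
    1
    2
  3

Derivation 2: S ⟼ S – S ⟼ 1 – S ⟼ 1 – S – S ⟼ …
Tree 2:
–
  1
  –
    2
    3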
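To see why the choice of tree matters, here is a minimal C sketch that builds both trees for 1 – 2 – 3 and evaluates them. The Node type and the helper names num, minus, eval, left_tree and right_tree are assumptions made for this illustration; they are not part of the course code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tree type for this sketch: a node is either a number
   leaf or a subtraction with two children. */
typedef struct Node {
    int is_leaf;
    int value;                  /* meaningful only when is_leaf is 1 */
    struct Node *left, *right;  /* meaningful only when is_leaf is 0 */
} Node;

/* Build a number leaf. */
Node *num(int value) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->is_leaf = 1;
    n->value = value;
    n->left = n->right = NULL;
    return n;
}

/* Build a subtraction node: left - right. */
Node *minus(Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->is_leaf = 0;
    n->value = 0;
    n->left = left;
    n->right = right;
    return n;
}

/* Evaluate a tree bottom-up. */
int eval(const Node *n) {
    return n->is_leaf ? n->value : eval(n->left) - eval(n->right);
}

/* The two trees that the ambiguous grammar allows for 1 - 2 - 3. */
Node *left_tree(void)  { return minus(minus(num(1), num(2)), num(3)); }
Node *right_tree(void) { return minus(num(1), minus(num(2), num(3))); }
```

Under these assumptions, eval(left_tree()) computes (1 – 2) – 3 = –4, while eval(right_tree()) computes 1 – (2 – 3) = 2: the same token sequence, two different meanings.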
From Derivations to Parse Trees
• Tree representation of the derivation
• Internal nodes are nonterminals
– Children are parts of the production used on that nonterminal
• Leaves of the tree are terminals
– In-order traversal yields the input sequence of tokens

Grammar:
S ⟼ E + S | E
E ⟼ number | ( S )

Parse tree for (1 + 2 + (3 + 4)) + 5 (children indented under their parent):
S
  E
    (
    S
      E
        1
      +
      S
        E
          2
        +
        S
          E
            (
            S
              E
                3
              +
              S
                E
                  4
            )
    )
  +
  S
    E
      5
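The claim that reading the leaves left to right gives back the token sequence can be checked with a small C sketch. It uses a generic parse-tree node and, to stay short, the parse tree for the smaller input 1 + 2 under the same grammar. The type PNode and the functions pnode, leaf, collect_tokens and example_tree are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHILDREN 3  /* the longest production here has three symbols */

typedef struct PNode {
    const char *label;                 /* nonterminal name or token text */
    int nchildren;                     /* 0 for a leaf (terminal) */
    struct PNode *child[MAX_CHILDREN];
} PNode;

/* Build a node with up to three children (unused slots are NULL). */
PNode *pnode(const char *label, int n, PNode *a, PNode *b, PNode *c) {
    PNode *p = malloc(sizeof *p);
    assert(p != NULL);
    p->label = label;
    p->nchildren = n;
    p->child[0] = a;
    p->child[1] = b;
    p->child[2] = c;
    return p;
}

PNode *leaf(const char *token) { return pnode(token, 0, NULL, NULL, NULL); }

/* Append the leaves to out, left to right, separated by single spaces.
   out must start as an empty string with room for cap bytes. */
void collect_tokens(const PNode *p, char *out, size_t cap) {
    if (p->nchildren == 0) {
        if (out[0] != '\0')
            strncat(out, " ", cap - strlen(out) - 1);
        strncat(out, p->label, cap - strlen(out) - 1);
        return;
    }
    for (int i = 0; i < p->nchildren; i++)
        collect_tokens(p->child[i], out, cap);
}

/* Parse tree for "1 + 2": S -> E + S, E -> 1, S -> E, E -> 2. */
PNode *example_tree(void) {
    PNode *e1 = pnode("E", 1, leaf("1"), NULL, NULL);
    PNode *e2 = pnode("E", 1, leaf("2"), NULL, NULL);
    PNode *s2 = pnode("S", 1, e2, NULL, NULL);
    return pnode("S", 3, e1, leaf("+"), s2);
}
```

Calling collect_tokens on example_tree() with an empty buffer fills the buffer with the string "1 + 2", which is exactly the token sequence that was parsed; the nonterminal labels S and E never appear in the output.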
From Parse Trees to Abstract Syntax
• Parse tree (“concrete syntax”): the full tree for (1 + 2 + (3 + 4)) + 5, with a node for every S, every E, and every parenthesis
• Abstract syntax tree (AST), children indented under their parent:
+
  +
    1
    +
      2
      +
        3
        4
  5
• Hides, or abstracts, unneeded information
From Parse Trees to Abstract Syntax
• Internal nodes are operations (plus, times, if-then-else, while-loop, …)
• Leaves are ids and literals (numbers, strings, …)
• Nonterminals don’t appear at all!
• Captures logical structure of programs
• Hides, or abstracts, unneeded information
• Goal for parser: take in program code, output an AST

Abstract syntax tree (AST) for (1 + 2 + (3 + 4)) + 5, children indented under their parent:
+
  +
    1
    +
      2
      +
        3
        4
  5
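As a rough illustration of an AST whose internal nodes are operations and whose leaves are literals, the following C sketch builds the tree for (1 + 2 + (3 + 4)) + 5 and walks it. The type Ast and the functions ast_num, ast_add, ast_eval, ast_size and slide_ast are names assumed for this sketch only; the node layout the course actually uses is the per-nonterminal struct design shown on the following slides.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { AST_NUM, AST_ADD } AstKind;

/* Hypothetical AST node: a number literal or an addition. */
typedef struct Ast {
    AstKind kind;
    int value;              /* AST_NUM only */
    struct Ast *lhs, *rhs;  /* AST_ADD only */
} Ast;

Ast *ast_num(int value) {
    Ast *a = malloc(sizeof *a);
    assert(a != NULL);
    a->kind = AST_NUM;
    a->value = value;
    a->lhs = a->rhs = NULL;
    return a;
}

Ast *ast_add(Ast *lhs, Ast *rhs) {
    Ast *a = malloc(sizeof *a);
    assert(a != NULL);
    a->kind = AST_ADD;
    a->value = 0;
    a->lhs = lhs;
    a->rhs = rhs;
    return a;
}

/* Compute the value of the expression the tree represents. */
int ast_eval(const Ast *a) {
    return a->kind == AST_NUM ? a->value
                              : ast_eval(a->lhs) + ast_eval(a->rhs);
}

/* Count every node in the tree (operations and literals). */
int ast_size(const Ast *a) {
    return a->kind == AST_NUM ? 1 : 1 + ast_size(a->lhs) + ast_size(a->rhs);
}

/* The AST from the slide: no S, E, or parenthesis nodes remain. */
Ast *slide_ast(void) {
    return ast_add(
        ast_add(ast_num(1),
                ast_add(ast_num(2),
                        ast_add(ast_num(3), ast_num(4)))),
        ast_num(5));
}
```

With these definitions, ast_eval(slide_ast()) yields 15 and ast_size(slide_ast()) yields 9 (four + nodes and five literals). The parentheses of the source text survive only as the shape of the tree.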
Abstract Syntax Trees in Code
• A struct type for each nonterminal
• Each struct has an enum field indicating which production
was used, and a union for associated data (from terminals
with contents, like ID or NUM) and child nodes (from
nonterminals)
S ⟼ S + E | E
E ⟼ number | ( S )

typedef struct S_node S_node; // forward declaration: E_node points to S_node

typedef struct E_node { // corresponds to E
  enum { E_num, E_parens } kind;
  union { int val; S_node* child; } data;
} E_node;

struct S_node { // corresponds to S
  enum { S_plus, S_E } kind;
  union { struct { S_node* left; E_node* right; } addition;
          E_node* e; } data;
};
Abstract Syntax Trees in Code
• Write a constructor for the node type of each production

S ⟼ S + E | E
E ⟼ number | ( S )

typedef struct E_node { // corresponds to E
  enum { E_num, E_parens } kind;
  union { int val; S_node* child; } data;
} E_node;

E_node* NumNode(int val){ // malloc requires #include <stdlib.h>
  E_node* node = malloc(sizeof(E_node));
  node->kind = E_num;     // production E ⟼ number
  node->data.val = val;
  return node;
}
Abstract Syntax Trees in Code
• Write a constructor for the node type of each production

S ⟼ S + E | E
E ⟼ number | ( S )

typedef struct E_node { // corresponds to E
  enum { E_num, E_parens } kind;
  union { int val; S_node* child; } data;
} E_node;

E_node* ParenNode(S_node* s){ // malloc requires #include <stdlib.h>
  E_node* node = malloc(sizeof(E_node));
  node->kind = E_parens;  // production E ⟼ ( S )
  node->data.child = s;
  return node;
}
Abstract Syntax Trees in Code
• Write a constructor for the node type of each production

S ⟼ S + E | E
E ⟼ number | ( S )

E_node* NumNode(int val){ … }
E_node* ParenNode(S_node* s){ … }
S_node* PlusNode(S_node* lhs, E_node* rhs){ … }
S_node* ENode(E_node* e){ … }

• Exercise: Write code for the AST of 1 + (2 + 3).


• Hint: Start from the bottom. The AST for 2 is NumNode(2). What’s
the AST for 2 + 3?
AST (children indented under their parent):
+
  1
  ()
    +
      2
      3

Code:
PlusNode(
  ENode(NumNode(1)),
  ParenNode(
    PlusNode(
      ENode(NumNode(2)),
      NumNode(3))))
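The pieces from the last few slides can be assembled into one compilable C sketch. The struct definitions and the four constructor signatures follow the slides; the bodies of PlusNode and ENode are filled in by analogy with NumNode and ParenNode, and the helpers eval_E, eval_S and exercise_ast are hypothetical additions used only to check the result.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct S_node S_node;  /* forward declaration */

typedef struct E_node {        /* corresponds to E */
    enum { E_num, E_parens } kind;
    union { int val; S_node *child; } data;
} E_node;

struct S_node {                /* corresponds to S */
    enum { S_plus, S_E } kind;
    union {
        struct { S_node *left; E_node *right; } addition;
        E_node *e;
    } data;
};

E_node *NumNode(int val) {                    /* E -> number */
    E_node *node = malloc(sizeof(E_node));
    assert(node != NULL);
    node->kind = E_num;
    node->data.val = val;
    return node;
}

E_node *ParenNode(S_node *s) {                /* E -> ( S ) */
    E_node *node = malloc(sizeof(E_node));
    assert(node != NULL);
    node->kind = E_parens;
    node->data.child = s;
    return node;
}

S_node *PlusNode(S_node *lhs, E_node *rhs) {  /* S -> S + E */
    S_node *node = malloc(sizeof(S_node));
    assert(node != NULL);
    node->kind = S_plus;
    node->data.addition.left = lhs;
    node->data.addition.right = rhs;
    return node;
}

S_node *ENode(E_node *e) {                    /* S -> E */
    S_node *node = malloc(sizeof(S_node));
    assert(node != NULL);
    node->kind = S_E;
    node->data.e = e;
    return node;
}

/* Hypothetical evaluators: switch on kind, then read the matching
   union member. */
int eval_S(const S_node *s);

int eval_E(const E_node *e) {
    return e->kind == E_num ? e->data.val : eval_S(e->data.child);
}

int eval_S(const S_node *s) {
    if (s->kind == S_plus)
        return eval_S(s->data.addition.left) + eval_E(s->data.addition.right);
    return eval_E(s->data.e);
}

/* The exercise answer: the AST of 1 + (2 + 3). */
S_node *exercise_ast(void) {
    return PlusNode(ENode(NumNode(1)),
                    ParenNode(PlusNode(ENode(NumNode(2)), NumNode(3))));
}
```

Under these assumptions, eval_S(exercise_ast()) gives 6. Note that the kind field must always be checked before touching data, since only one member of the union is valid at a time.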
Abstract Syntax Trees in Code
E_node* NumNode(int val){ … }
E_node* ParenNode(S_node* s){ … }

• Now we can use the constructors in the actions of our parser:

S ⟼ S + E | E
E ⟼ number | ( S )

%union { …
  E_node* e;
  S_node* s; }

%type <e> exp
%type <s> stmt

exp:
    NUM { $$ = NumNode($1); }
  | LPAREN stmt RPAREN { $$ = ParenNode($2); }

stmt:
    stmt PLUS exp { $$ = PlusNode($1, $3); }
  | exp { $$ = ENode($1); }
Syntactic Sugar
• Sometimes two productions are for the same basic operation:
stmt:
    IF LPAREN exp RPAREN stmt ELSE stmt
  | IF LPAREN exp RPAREN stmt // no else stmt

• Then we can make one AST node for both:


stmt_node* IfNode(exp_node* cond, stmt_node* then_stmt,
                  stmt_node* else_stmt); // "else" is a C keyword, so it cannot be a parameter name

• And just pass NULL for the else stmt when there isn’t one:
IF LPAREN exp RPAREN stmt { $$ = IfNode($3, $5, NULL); }

• Result: the user can use multiple different program structures, but they look the same to the compiler
– Other examples: for loops vs. while loops, less-than vs. greater-than

• Exercise: add x++ statements to parse2.y, syntactic sugar for x = x + 1
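The idea of one AST node serving both if productions can be sketched in C as follows. Everything here is a simplified assumption for illustration: exp_node holds only a literal value, stmt_node has just two kinds, and the names NumExp, SetNode and run_stmt are hypothetical. Only the IfNode signature comes from the slide, with the parameters renamed because else is a C keyword.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal node types; a real compiler would have many
   more kinds of expressions and statements. */
typedef struct exp_node { int value; } exp_node;  /* a literal condition */

typedef struct stmt_node {
    enum { STMT_SET, STMT_IF } kind;
    union {
        struct { int *target; int value; } set;   /* *target = value */
        struct { exp_node *cond;
                 struct stmt_node *then_stmt, *else_stmt; } if_;
    } data;
} stmt_node;

exp_node *NumExp(int value) {
    exp_node *e = malloc(sizeof *e);
    assert(e != NULL);
    e->value = value;
    return e;
}

stmt_node *SetNode(int *target, int value) {
    stmt_node *s = malloc(sizeof *s);
    assert(s != NULL);
    s->kind = STMT_SET;
    s->data.set.target = target;
    s->data.set.value = value;
    return s;
}

/* One constructor serves both productions: else_stmt is NULL when the
   source program has no else branch. */
stmt_node *IfNode(exp_node *cond, stmt_node *then_stmt, stmt_node *else_stmt) {
    stmt_node *s = malloc(sizeof *s);
    assert(s != NULL);
    s->kind = STMT_IF;
    s->data.if_.cond = cond;
    s->data.if_.then_stmt = then_stmt;
    s->data.if_.else_stmt = else_stmt;
    return s;
}

/* Tiny interpreter: later phases handle a single if form. */
void run_stmt(const stmt_node *s) {
    if (s == NULL)
        return;  /* missing else branch: nothing to do */
    if (s->kind == STMT_SET) {
        *s->data.set.target = s->data.set.value;
        return;
    }
    run_stmt(s->data.if_.cond->value != 0 ? s->data.if_.then_stmt
                                          : s->data.if_.else_stmt);
}
```

For example, running IfNode(NumExp(0), SetNode(&a, 2), NULL) leaves a unchanged, because the false branch is NULL and run_stmt treats NULL as an empty statement. The code that consumes the AST never needs to know which of the two productions was used.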
Parsing Summary
• Goal: take the sequence of tokens produced by lexing, and turn
them into an AST, to make the structure of the program clear

• Method:
1. Write a file (.y) that describes the grammar and disambiguates it
2. Write AST nodes (in normal code) representing program
structure, and add actions to the grammar that build the AST
node for each production
3. Use a parser generator to turn that grammar into a shift-reduce
automaton, giving us a yyparse function that takes programs
and returns ASTs

• Lexer generators and parser generators work together to define the syntax of a language
• Lexing and parsing together turn program text into a
representation that suggests meaning/behavior of programs

• Next up: analyzing the meaning of a program