
Module1 1

Uploaded by Rudhhi Shah
Copyright © All Rights Reserved
CSE309N

Chapter 2
A Simple One-Pass Compiler

The Entire Compilation Process
• Grammars for Syntax Definition
• Syntax-Directed Translation
• Parsing: Top-Down and Predictive
• Pulling Together the Pieces
• The Lexical Analysis Process
• Symbol Table Considerations
• A Brief Look at Code Generation
• Concluding Remarks/Looking Ahead

Overview

A programming language can be defined by describing:

1. The syntax of the language
   1. What its programs look like
   2. Specified with a CFG or BNF (Backus-Naur Form)
2. The semantics of the language
   1. What its programs mean
   2. Difficult to describe
   3. Usually given with informal descriptions and suggestive examples

Grammars for Syntax Definition
• A context-free grammar (CFG) is used to describe the syntactic structure of a language.
• A CFG is characterized by:
1. A set of tokens, or terminal symbols
2. A set of nonterminals
3. A set of production rules, each of the form
   NT → {T, NT}*
   (a nonterminal on the left; a possibly empty sequence of terminals and nonterminals on the right)
4. A nonterminal designated as the start symbol
Grammars for Syntax Definition
Example CFG

list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

(The "|" means "or", so the three list productions could also be written as
list → list + digit | list - digit | digit.)
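As an illustration only, the example grammar can be written down as plain data and used to enumerate short strings of its language. The encoding below (a dict named `GRAMMAR` mapping each nonterminal to its right-hand sides) and the function `language_up_to` are hypothetical; the sketch assumes single-character tokens and relies on the fact that no production of this grammar shortens a sentential form, so longer forms can be pruned.

```python
from collections import deque

# Hypothetical encoding of the example CFG: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
    "digit": [[d] for d in "0123456789"],
}
START = "list"


def language_up_to(max_len):
    """Return every token string of at most max_len tokens derivable from START."""
    results = set()
    queue = deque([(START,)])
    seen = {(START,)}
    while queue:
        form = queue.popleft()
        # Find the leftmost nonterminal in the sentential form.
        idx = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if idx is None:
            results.add("".join(form))  # only terminals remain: a sentence
            continue
        for rhs in GRAMMAR[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            # No production here shrinks a form, so forms longer than
            # max_len can never yield a short enough string.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results
```

For example, `"9-5+2" in language_up_to(5)` is true, while a string such as `"95"` is never produced, because two digits must be separated by `+` or `-`.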

Information
• A string of tokens is a sequence of zero or more tokens.
• The string containing zero tokens, written as ε, is called the empty string.
• A grammar derives strings by beginning with the start symbol and repeatedly replacing a nonterminal by the right side of a production for that nonterminal.
• The token strings that can be derived from the start symbol form the language defined by the grammar.

Grammars are Used to Derive Strings:

Using the CFG defined on the earlier slide, we can derive the string 9 - 5 + 2 as follows:

list ⇒ list + digit            P1: list → list + digit
     ⇒ list - digit + digit    P2: list → list - digit
     ⇒ digit - digit + digit   P3: list → digit
     ⇒ 9 - digit + digit       P4: digit → 9
     ⇒ 9 - 5 + digit           P4: digit → 5
     ⇒ 9 - 5 + 2               P4: digit → 2

Grammars are Used to Derive Strings:

This derivation could also be represented via a parse tree (parents on the left, children indented to the right):

list ⇒ list + digit
     ⇒ list - digit + digit
     ⇒ digit - digit + digit
     ⇒ 9 - digit + digit
     ⇒ 9 - 5 + digit
     ⇒ 9 - 5 + 2

list
    list
        list
            digit
                9
        -
        digit
            5
    +
    digit
        2

Defining a Parse Tree
• A parse tree pictorially shows how the start symbol of a grammar derives a string in the language.
• More formally, a parse tree for a CFG has the following properties:
  • The root is labeled with the start symbol.
  • Each leaf is labeled with a token or with ε.
  • Each interior node is labeled with a nonterminal.
  • If an interior node is labeled A and its children, from left to right, are labeled x1, x2, …, xn, then A → x1 x2 … xn must be a production; each xi may be a nonterminal or a token.
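The properties above can be checked mechanically. This is a sketch under assumptions: the `Node` class, the `check` function and the dict-of-productions grammar encoding are hypothetical, not from the slides. It builds the parse tree for 9 - 5 + 2, verifies that every interior node matches a production, and reads the derived string off the leaves from left to right.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                  # nonterminal, token, or "ε"
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children

    def leaves(self):
        """Leaf tokens from left to right: the string this tree derives."""
        if self.is_leaf():
            return [] if self.label == "ε" else [self.label]
        return [tok for child in self.children for tok in child.leaves()]


def check(node, grammar):
    """True if every leaf is a token (or ε) and every interior node A
    with children x1..xn matches some production A -> x1 ... xn."""
    if node.is_leaf():
        return node.label not in grammar        # leaves must not be nonterminals
    rhs = [c.label for c in node.children if c.label != "ε"]
    return rhs in grammar.get(node.label, []) and all(
        check(c, grammar) for c in node.children
    )


GRAMMAR = {
    "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
    "digit": [[d] for d in "0123456789"],
}

# Parse tree for 9 - 5 + 2 under the list grammar.
tree = Node("list", [
    Node("list", [
        Node("list", [Node("digit", [Node("9")])]),
        Node("-"),
        Node("digit", [Node("5")]),
    ]),
    Node("+"),
    Node("digit", [Node("2")]),
])
```

Here `tree.label == "list"` (the root is the start symbol), `check(tree, GRAMMAR)` holds, and `"".join(tree.leaves())` gives `"9-5+2"`.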

Other Important Concepts
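Ambiguity
A grammar is ambiguous when the same token string has two different derivations (parse trees).

Grammar:
string → string + string | string - string | 0 | 1 | … | 9

Parse tree 1 for 9 - 5 + 2, grouping (9 - 5) + 2 (parents on the left, children indented):
string
    string
        string
            9
        -
        string
            5
    +
    string
        2

Parse tree 2 for 9 - 5 + 2, grouping 9 - (5 + 2):
string
    string
        9
    -
    string
        string
            5
        +
        string
            2

Why is this a problem? The two trees give the string different meanings: (9 - 5) + 2 = 6, but 9 - (5 + 2) = 2.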
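To see the ambiguity concretely, the sketch below enumerates every parse tree that the ambiguous grammar allows for a token list and evaluates each one. It is an illustration only: the functions `parse_trees` and `evaluate` and the nested-tuple tree encoding `(op, left, right)` are hypothetical choices. Because any operator may be chosen as the root, the number of trees grows with the number of operators.

```python
def parse_trees(tokens):
    """All parse trees of the ambiguous grammar
        string -> string + string | string - string | 0 | 1 | ... | 9
    encoded as nested tuples (op, left, right) or single-digit strings."""
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0].isdigit() else []
    trees = []
    # Any operator position can be the root: string -> string op string.
    for i in range(1, len(tokens) - 1, 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees


def evaluate(tree):
    if isinstance(tree, str):
        return int(tree)
    op, left, right = tree
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) - evaluate(right)


for t in parse_trees(list("9-5+2")):
    print(t, "=", evaluate(t))
```

For `9-5+2` this finds exactly two trees, with values 2 and 6; for `1+2+3+4` it finds five.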

Other Important Concepts
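Associativity of Operators
Left vs. Right

Left associative (e.g., + and -):
list → list + digit | list - digit | digit
digit → 0 | 1 | 2 | … | 9

Parse tree for 9 - 5 + 2 grows down to the left (parents on the left, children indented):
list
    list
        list
            digit
                9
        -
        digit
            5
    +
    digit
        2

Right associative (e.g., assignment =):
right → letter = right | letter
letter → a | b | c | … | z

Parse tree for a = b = c grows down to the right:
right
    letter
        a
    =
    right
        letter
            b
        =
        right
            letter
                c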
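The two tree shapes can be produced by a minimal sketch, assuming single-character tokens and a hypothetical tuple encoding `(op, left, right)`; `parse_list` and `parse_right` are illustrative names. The left-recursive rule is applied as a loop that keeps putting a new root above the tree built so far, while the right-recursive rule becomes a recursive call on the right operand.

```python
def parse_list(tokens):
    """list -> list + digit | list - digit | digit  (left associative).

    Each operator becomes a new root whose left child is the tree built
    so far, so the tree grows to the left.
    """
    tree = tokens[0]
    for i in range(1, len(tokens), 2):
        tree = (tokens[i], tree, tokens[i + 1])
    return tree


def parse_right(tokens, i=0):
    """right -> letter = right | letter  (right associative).

    Returns (tree, next_index). The recursive call parses the right
    operand, so the tree grows to the right.
    """
    if i >= len(tokens) or not (tokens[i].isalpha() and tokens[i].islower()):
        raise SyntaxError(f"letter expected at position {i}")
    letter = tokens[i]
    if i + 1 < len(tokens) and tokens[i + 1] == "=":
        rest, j = parse_right(tokens, i + 2)
        return ("=", letter, rest), j
    return letter, i + 1
```

With these definitions, `parse_list(list("9-5+2"))` groups as `("+", ("-", "9", "5"), "2")`, and `parse_right(list("a=b=c"))[0]` groups as `("=", "a", ("=", "b", "c"))`.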
Embedding Associativity
• The language of arithmetic expressions with + and -:
  • An ambiguous grammar that does not enforce associativity:
    string → string + string | string - string | 0 | 1 | … | 9
  • An unambiguous grammar enforcing left associativity (the parse tree grows to the left):
    string → string + digit | string - digit | digit
    digit → 0 | 1 | 2 | … | 9
  • An unambiguous grammar enforcing right associativity (the parse tree grows to the right):
    string → digit + string | digit - string | digit
    digit → 0 | 1 | 2 | … | 9
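The choice of grammar changes the value an expression gets. This sketch evaluates single-digit expressions under each unambiguous grammar; the names `eval_left` and `eval_right` are hypothetical, and input validation is omitted. Left recursion is folded as a loop from the left; right recursion maps directly onto a recursive call on the rest of the input.

```python
def eval_left(tokens):
    """string -> string + digit | string - digit | digit  (left associative)."""
    value = int(tokens[0])
    for i in range(1, len(tokens), 2):
        op, digit = tokens[i], int(tokens[i + 1])
        value = value + digit if op == "+" else value - digit
    return value


def eval_right(tokens):
    """string -> digit + string | digit - string | digit  (right associative)."""
    digit = int(tokens[0])
    if len(tokens) == 1:
        return digit
    rest = eval_right(tokens[2:])        # evaluate the right operand first
    return digit + rest if tokens[1] == "+" else digit - rest
```

For `9-5+2`, the left-associative grammar gives (9 - 5) + 2 = 6 while the right-associative one gives 9 - (5 + 2) = 2; this is why + and - are normally given left-associative rules.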

Other Important Concepts
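Operator Precedence
What does 9 + 5 * 2 mean? Typically the precedence order, from highest to lowest, is ( ), then * /, then + -, so 9 + 5 * 2 means 9 + (5 * 2).

This can be incorporated into a grammar via the rules:
expr → expr + term | expr - term | term
term → term * factor | term / factor | factor
factor → digit | ( expr )
digit → 0 | 1 | 2 | 3 | … | 9

Precedence is achieved by:
• A separate nonterminal for each precedence level (expr for + -, term for * /, factor for digits and parenthesized expressions)
• Rules at each level that are left recursive, so operators at the same level associate to the left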
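The expr/term/factor grammar above can be turned into a small recursive-descent evaluator. This is a sketch, not part of the slides: the function name `evaluate` is hypothetical, tokens are single characters, left recursion is replaced by loops (which keeps operators left associative), and `/` is assumed to be integer floor division.

```python
def evaluate(text):
    """Evaluate an expression over single digits with + - * / and parentheses.

    Assumption: '/' is integer (floor) division.
    """
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def expr():                      # expr -> expr + term | expr - term | term
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():                      # term -> term * factor | term / factor | factor
        value = factor()
        while peek() in ("*", "/"):
            op = advance()
            rhs = factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    def factor():                    # factor -> digit | ( expr )
        tok = advance()
        if tok.isdigit():
            return int(tok)
        if tok == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

Because `*` is handled one level below `+`, `evaluate("9+5*2")` gives 19, whereas `evaluate("(9+5)*2")` gives 28.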


Syntax for Statements

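stmt → id := expr
     | if expr then stmt
     | if expr then stmt else stmt
     | while expr do stmt
     | begin opt_stmts end

Ambiguous grammar? Yes: because both if-then and if-then-else are allowed, a statement such as if E1 then if E2 then S1 else S2 has two parse trees, depending on which then the else is matched with (the "dangling else" problem).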

Syntax-Directed Translation
• Associate attributes with grammar rules and translate as parsing occurs.
• The translation follows the parse tree structure (and as a result, the structure and form of the parse tree affect the translation).
• First example: inductive translation.
  • Infix-to-postfix notation translation for expressions
  • The translation is defined inductively as Postfix(E), where E is an expression.

Rules
1. If E is a variable or constant, then Postfix(E) = E.
2. If E is E1 op E2, then Postfix(E) = Postfix(E1 op E2) = Postfix(E1) Postfix(E2) op.
3. If E is (E1), then Postfix(E) = Postfix(E1).
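The three rules translate almost line for line into a recursive function. In this sketch the expression is assumed to be already parsed into a hypothetical tree encoding: a string for a variable or constant, or a tuple `(op, E1, E2)`. Rule 3 needs no case of its own, because parentheses only decide the shape of the tree.

```python
from typing import Union

# An expression is a variable/constant (str) or a tuple (op, E1, E2).
Expr = Union[str, tuple]


def postfix(e: Expr) -> str:
    if isinstance(e, str):          # rule 1: variable or constant
        return e
    op, e1, e2 = e                  # rule 2: E1 op E2
    return postfix(e1) + postfix(e2) + op
    # rule 3, (E1), is implicit: the parenthesized E1 is just a subtree


print(postfix(("+", ("-", "9", "5"), "2")))   # (9 - 5) + 2
print(postfix(("-", "9", ("+", "5", "2"))))   # 9 - (5 + 2)
```

The two calls correspond to ( 9 - 5 ) + 2 and 9 - ( 5 + 2 ), and return `95-2+` and `952+-` respectively.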

Examples

Postfix( ( 9 - 5 ) + 2 )
= Postfix( ( 9 - 5 ) ) Postfix( 2 ) +
= Postfix( 9 - 5 ) Postfix( 2 ) +
= Postfix( 9 ) Postfix( 5 ) - Postfix( 2 ) +
= 9 5 - 2 +

Postfix( 9 - ( 5 + 2 ) )
= Postfix( 9 ) Postfix( ( 5 + 2 ) ) -
= Postfix( 9 ) Postfix( 5 + 2 ) -
= Postfix( 9 ) Postfix( 5 ) Postfix( 2 ) + -
= 9 5 2 + -

Syntax-Directed Definition
• Each production has a set of semantic rules.
• Each grammar symbol has a set of attributes.
• For the following example, a string attribute t is associated with each grammar symbol:

expr → expr - term | expr + term | term
term → 0 | 1 | 2 | 3 | … | 9

• Recall: what is a derivation for 9 + 5 - 2 (using the list grammar from earlier)?

list ⇒ list - digit ⇒ list + digit - digit ⇒ digit + digit - digit
     ⇒ 9 + digit - digit ⇒ 9 + 5 - digit ⇒ 9 + 5 - 2
Syntax-Directed Definition (2)
• Each production rule of the CFG has a semantic rule (|| denotes string concatenation; expr1 is the expr on the right side of the production):

Production              Semantic Rule
expr → expr1 + term     expr.t := expr1.t || term.t || '+'
expr → expr1 - term     expr.t := expr1.t || term.t || '-'
expr → term             expr.t := term.t
term → 0                term.t := '0'
term → 1                term.t := '1'
…                       …
term → 9                term.t := '9'

• Note: the semantic rules for expr define t as a "synthesized attribute"; that is, the value of t at a node is obtained from the t values of its children.
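One way to read the table is as a set of functions, one per production, applied bottom-up. The sketch below assumes a hypothetical encoding: `RULES` maps (head, body) pairs to Python functions, a parse tree node is a `(label, children)` pair, and a token leaf is a plain string. Python's string `+` stands in for ||, and `synthesize` is an illustrative name.

```python
# Semantic rules keyed by (head, body). Each rule receives the values of the
# children in order (a terminal child contributes its own symbol) and returns
# the parent's t. String concatenation (+) plays the role of ||.
RULES = {
    ("expr", ("expr", "+", "term")): lambda e1, op, t: e1 + t + "+",
    ("expr", ("expr", "-", "term")): lambda e1, op, t: e1 + t + "-",
    ("expr", ("term",)): lambda t: t,
}
for d in "0123456789":
    RULES[("term", (d,))] = lambda tok: tok     # term -> d: term.t := 'd'


def synthesize(node):
    """Compute the synthesized attribute t bottom-up (children before parent)."""
    if isinstance(node, str):
        return node
    label, children = node
    body = tuple(c if isinstance(c, str) else c[0] for c in children)
    return RULES[(label, body)](*[synthesize(c) for c in children])


# Parse tree for 9 - 5 + 2 as (label, children) pairs; strings are tokens.
tree = ("expr", [
    ("expr", [("expr", [("term", ["9"])]), "-", ("term", ["5"])]),
    "+",
    ("term", ["2"]),
])
print(synthesize(tree))
```

Since each node's value depends only on its children's values, `synthesize(tree)` yields the postfix string `95-2+`.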
Semantic Rules are Embedded in Parse Tree

Annotated parse tree for 9 - 5 + 2 (parents on the left, children indented):

expr.t = 95-2+
    expr.t = 95-
        expr.t = 9
            term.t = 9
                9
        -
        term.t = 5
            5
    +
    term.t = 2
        2

• The attribute values are computed by a depth-first traversal that starts at the root and recursively visits the children of each node in left-to-right order.
• The semantic rules at a given node are evaluated once all descendants of that node have been visited.
• A parse tree showing the attribute values at each node is called an annotated parse tree.
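The traversal just described can be sketched as follows. A node's line is reserved when the node is first reached, but its attribute is filled in only after all of its children have been visited. Assumptions: the `annotate` function and the `(label, children)` tree encoding are hypothetical, and the rule selection by child count is specific to this small expr/term grammar.

```python
def annotate(node, depth=0, lines=None):
    """Return (t, lines): the synthesized attribute t of `node` and the
    annotated parse tree as text, parents above, children indented."""
    if lines is None:
        lines = []
    indent = "    " * depth
    if isinstance(node, str):                  # token leaf: print it as-is
        lines.append(indent + node)
        return node, lines
    label, children = node
    slot = len(lines)
    lines.append("")                           # placeholder, filled in after the children
    # Visit children depth-first, left to right, before evaluating this node's rule.
    values = [annotate(child, depth + 1, lines)[0] for child in children]
    if len(children) == 3:                     # expr -> expr op term: expr.t || term.t || op
        t = values[0] + values[2] + values[1]
    else:                                      # expr -> term, term -> digit: copy the value
        t = values[0]
    lines[slot] = f"{indent}{label}.t = {t}"
    return t, lines


tree = ("expr", [
    ("expr", [("expr", [("term", ["9"])]), "-", ("term", ["5"])]),
    "+",
    ("term", ["2"]),
])

t, lines = annotate(tree)
print("\n".join(lines))
```

The printed result is the annotated parse tree for 9 - 5 + 2, with `expr.t = 95-2+` at the root.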
Translation Schemes
A translation scheme embeds semantic actions into the right sides of the productions. It is like a syntax-directed definition, except that the order of evaluation of the semantic rules is explicitly shown.

expr → expr + term   {print('+')}
     | expr - term   {print('-')}
     | term
term → 0             {print('0')}
term → 1             {print('1')}
…
term → 9             {print('9')}

Parse tree for 9 - 5 + 2 with the actions attached as extra leaves (parents on the left, children indented). Performing the actions in a depth-first, left-to-right traversal prints 95-2+:

expr
    expr
        expr
            term
                9
                {print('9')}
        -
        term
            5
            {print('5')}
        {print('-')}
    +
    term
        2
        {print('2')}
    {print('+')}
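A recursive-descent parser cannot follow the left-recursive productions directly, because `expr` would call itself without consuming input. The sketch below therefore uses the standard left-recursion elimination, which is an assumption and not shown on this slide: expr → term rest, rest → + term {print('+')} rest | - term {print('-')} rest | ε. The name `translate` is hypothetical, each action appends to a list instead of printing, and the tail call to `rest` is written as a loop.

```python
def translate(text):
    """Infix-to-postfix translator for single-digit expressions with + and -,
    using the rewritten (non-left-recursive) translation scheme:
        expr -> term rest
        rest -> + term {print('+')} rest | - term {print('-')} rest | ε
        term -> 0 {print('0')} | ... | 9 {print('9')}
    """
    tokens = [c for c in text if not c.isspace()]
    out = []          # collects the output of the print actions
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"digit expected at position {pos}")
        out.append(tokens[pos])          # action {print(digit)}
        pos += 1

    def rest():
        nonlocal pos
        # The tail-recursive call to rest is written as a loop.
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            term()
            out.append(op)               # action {print(op)} runs after the term
        # Otherwise take the ε alternative: emit nothing.

    term()
    rest()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return "".join(out)


print(translate("9 - 5 + 2"))
```

The emitted sequence is the same as the traversal of the annotated tree, `95-2+`, because each operator's action still runs after both of its operands have been translated.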