
Principles of Programming Languages - ASU 2014

Course Description: Formal lexical, syntactic, and semantic descriptions; compilation and implementation issues; and theoretical foundations for several programming paradigms.

Outline of Topics
1. Introduction: abstractions; language paradigms; language definition; compilation versus interpretation.
2. Lexical Analysis: regular expressions; deterministic finite automata.
3. Syntax: BNF, EBNF, and syntax diagrams; parse trees and abstract syntax trees; one-token look-ahead parsing; recursive-descent parsers.
4. Basic Semantics: data types and type matching; symbol tables; binding; scope; allocation and storage; variables; pointers.
5. Intermediate Representation
6. Programming Paradigms
7. Object-Oriented Programming
8. Functional Programming: functional algorithms; tail recursion; lambda calculus (conversions, Church-Rosser theorem, fixed points).
9. Logic Programming: Horn clause logic, resolution, and unification.

CSE340 - Principles of

Programming Languages
Lecture 01:

Course Presentation

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-21
Office Hours: By appointment
Definitions

CSE340 - Principles of
Programming Languages

Tell a computer what to do

Method of communication consisting of the use of signs or words in a structured and conventional way

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 2


Language Levels

C++
Java
Fortran C

High-Level Language

Assembly Language

Machine Language

Hardware

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 3


Machine Language

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 4


Assembly Language

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 5


High-Level Languages
[Figure: a sample program going through compilation and execution]

Compilation stages shown: Lexer, Parser, Semantic Analyzer, Code Generation, Assembler
Execution: Virtual Machine (interpreter)
Output shown: 5

// source code
int x;
int foo () {
  read (x);
  print (5);
}
main () {
  foo ();
}

Intermediate code:
X,E,G,O,O
#e1,I,I,0,7
@
OPR 19, AX
STO x, AX
LIT 5, AX
OPR 21, AX
LOD #e1,AX
CAL 1, AX
OPR 0, AX

Machine code:
01001010101000010
01010100101010010
10100100000011011
11010010110101111
00010010101010010
10101001010101011
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 6
Language Paradigms

Procedural program = algorithms + data

Object-Oriented program = objects + messages

Functional program = functions ° functions

Logic Programming program = facts + rules

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 7


Calendar

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 8


Calendar

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 9


Grading

Exams (2) 20% + 20% 40%

Final Comprehensive
20% 20%
Exam

Programming
10% + 10% + 10% + 10% 40%
Assignments (4)

100%

97 A+ 86 B+ 74 C+

93 A 82 B 70 C

89 A- 78 B-

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 10


Text book

Chapter 1. Introduction
Chapter 6. Syntax
Chapter 7. Basic Semantics
Chapter 8. Data Types
Chapter 9. Expressions and Statements
Chapter 10. Procedures

Chapter 3. Functional Programming


Chapter 4. Logic Programming
Chapter 5. OO Programming

Chapter 12. Formal Semantics

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 11


Homework

Read the Syllabus of the course

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 12


CSE340 - Principles of Programming Languages

Javier Gonzalez-Sanchez
[email protected]
Fall 2014

Disclaimer. These slides can only be used as study material for the class CSE340 at ASU. They cannot be distributed or used for another purpose.
CSE340 - Principles of
Programming Languages
Lecture 02:

Introduction

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-21
Office Hours: By appointment
Teaching Assistants

• Bolun Li

[email protected]
Graduate Student / Computer Science MS

• Steven Lombardi

[email protected]
Undergraduate Student / Computer Science BS

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 2


Instructor

• Javier Gonzalez-Sanchez

[email protected]
Graduate Teaching Associate

www.javiergs.com

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 3


Keywords

Language
  Level
  Paradigm
  Translate or Execute
  Analysis
    Lexical    Input: Symbols     Output: Words
    Syntax     Input: Words       Output: Sentences
    Semantic   Input: Sentences

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 4


Analysis

int x = 5;

float y = "hello;

String @z = "9.5";

int x = cse340;

if ( x > 14) while (5 == 5) if (int a) a = 1;

x = x; for ( ; ; );

y = 13.45.0;

int me = 99999000001111222000000111111222223443483045830948;

while { x != 9} ();

int {x} = 10;

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 5


Keywords

Alphabet Symbol

Lexical String Word

Token

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 7


Lexical Analysis

int x = 5;

float y = "hello;

String @z = "9.5";

int x = cse340;

if ( x > 14) while (5 == 5) if (int a) a = 1;

x = x; for ( ; ; );

y = 13.45.0;

int me = 99999000001111222000000111111222223443483045830948;

while { x != 9} ();

int {x} = 10;

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 8


Lexical Analysis | Steps

a) Read a text FILE line by line.

b) For each LINE:

• Read character by character.
• Create sets of consecutive characters (STRING). Try to group as many characters as possible.
• Start a new set whenever you need to. Watch for whitespace, delimiters, operators, end of line, and other special characters.

c) For each STRING: verify whether it is a valid WORD.

d) Create a VECTOR and store the STRINGs and WORDs.
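
The steps above can be sketched in Java roughly as follows. This is a minimal sketch, not the course's Lexer.java: the class name `LineSplitter`, the method `split`, and the particular sets of delimiters and operators are assumptions made for illustration, and quotation marks are ignored for brevity.

```java
import java.util.ArrayList;
import java.util.List;

final class LineSplitter {
    // Hypothetical character classes; the real language may define others.
    private static final String DELIMITERS = "(){}[];,";
    private static final String OPERATORS = "+-*/%<>=!";

    // Groups consecutive characters into strings. Whitespace, delimiters, and
    // operators end the current string; delimiters and operators are also
    // stored as one-character strings of their own.
    static List<String> split(String line) {
        List<String> strings = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : line.toCharArray()) {
            boolean special = DELIMITERS.indexOf(c) >= 0 || OPERATORS.indexOf(c) >= 0;
            if (Character.isWhitespace(c) || special) {
                if (current.length() > 0) {
                    strings.add(current.toString());
                    current.setLength(0);
                }
                if (special) {
                    strings.add(String.valueOf(c));
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            strings.add(current.toString());
        }
        return strings;
    }

    public static void main(String[] args) {
        System.out.println(split("int x = 5;"));
    }
}
```

Step c), deciding whether each STRING is a valid WORD, is deliberately left out here; it is what the regular expressions and the DFA are for.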


Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 9
Lexical Analysis

int x = 5; float y = "hello;


String@z="9.5”;intx=cse340;if(x>
14) while

(5 == 5) if (int a) a = 1; x = x;
for ( ; ; );y = 13.45.0;int me
=99999000001111222000000111111222
223443483045830948;while { x !=
9} ();int {x} = 10;

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 10


Lexical Analysis
LINE                                      # of STRINGS

int x = 5; float y = "hello;              9
String@z="9.5”;intx=cse340;if(x>          12
14) while                                 3
(5 == 5) if (int a) a = 1; x = x;         18
for ( ; ; );y = 13.45.0;int me            12
=99999000001111222000000111111222         2
223443483045830948;while { x !=           6
9} ();int {x} = 10;                       12
”hello "world" bye"                       3
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 11
Keywords

Alphabet Symbol

Lexical String Word

Regular
Expression

Token Rules

Deterministic
Finite
Automata

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 12


Homework

Review the following topics:

Regular Expressions (Text Book: Chapter 6)


and Deterministic Finite Automata

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 13


CSE340 - Principles of
Programming Languages
Lecture 03:

Lexical Analysis

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-21
Office Hours: By appointment
Keywords

Alphabet Symbol

Lexical String Word

Regular
Expression

Token Rules

Deterministic
Finite
Automata

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 2


Regular expression

§ A rule that describes the finite combinations (sequences) of symbols that are considered well-formed.

§ A regular expression has symbols and operators.

§ Symbols are defined in the alphabet.

§ The operators used in regular expressions are: * (0 or more), + (1 or more), ? (0 or 1), | (or). Besides those, we can use [ ] to enclose sets of symbols without enumerating all of them, such as [0-9] or [A-Z]. We can also use parentheses for grouping.
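
As a quick illustration, the operators listed above can be tried with Java's standard `java.util.regex` package. This is only a sketch: the class name `RegexOperators` and the helper `matches` are assumptions made for this example.

```java
import java.util.regex.Pattern;

final class RegexOperators {
    // Returns true when the whole word is described by the rule.
    static boolean matches(String rule, String word) {
        return Pattern.matches(rule, word);
    }

    public static void main(String[] args) {
        System.out.println(matches("a*", ""));            // * : zero or more
        System.out.println(matches("a+", ""));            // + : one or more
        System.out.println(matches("ab?", "a"));          // ? : zero or one
        System.out.println(matches("(a|b)(a|b)", "ba"));  // | and parentheses
        System.out.println(matches("[0-9]+", "1934"));    // [ ] : set of symbols
    }
}
```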

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 3


Regular expression | Examples

Token        Regular Expression (rule)   Examples (words)

foobarOne    (a | b)                     {a, b}
foobarTwo    (a | b)(a | b)              {aa, bb, ba, ab}
foobarThree  a*                          {ε, a, aa, aaa, aaaa, ...}
foobarFour   (a | b)*                    {ε, a, b, aa, bb, ..., abba, ...}
foobarFive   a+                          {a, aa, aaa, aaaa, ...}
foobarSix    [a-z]+                      {hello, world, etc, ...}
number       [0-9]+                      {1934, 0101, 33, 12321, ...}

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 4


Regular expression | Examples

Token1 Regular Expression (rule) Example (word)


digit 0 | 1 | 2 | 3 | ... | 9 3

integer digit+ 1945

fraction .digit+ .55

exponent e(+|-)?digit+ e+210

floatDraftOne integer(fraction?) (exponent?) 340.08e-14

floatDraftTwo {[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?}

binary 0b(0|1)+ 0b1010

1. These definitions are NOT fully complete or correct. Their purpose is only to exemplify regular expressions. For instance, 07 matches as an integer, which will NOT be the case for our language.
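
The floatDraftTwo and binary rules above can be checked with Java's `java.util.regex`. This is a hedged sketch: the class and constant names are assumptions for this example, and the outer braces of floatDraftTwo are treated as slide notation rather than as part of the rule.

```java
import java.util.regex.Pattern;

final class NumberRules {
    // floatDraftTwo from the table, written as a Java string literal.
    static final Pattern FLOAT_DRAFT_TWO =
        Pattern.compile("[-+]?([0-9]+\\.?[0-9]*|\\.[0-9]+)([eE][-+]?[0-9]+)?");
    // binary from the table.
    static final Pattern BINARY = Pattern.compile("0b(0|1)+");
    // integer, i.e. digit+.
    static final Pattern INTEGER = Pattern.compile("[0-9]+");

    static boolean isFloat(String word)   { return FLOAT_DRAFT_TWO.matcher(word).matches(); }
    static boolean isBinary(String word)  { return BINARY.matcher(word).matches(); }
    static boolean isInteger(String word) { return INTEGER.matcher(word).matches(); }

    public static void main(String[] args) {
        System.out.println(isFloat("340.08e-14"));
        System.out.println(isBinary("0b1010"));
        System.out.println(isInteger("07")); // accepted by this draft rule, as the footnote warns
    }
}
```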
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 5
Regular expression | Operators

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 6


Deterministic Finite Automata

§ A deterministic finite automaton (DFA) is a finite state machine that accepts or rejects finite strings of symbols and produces a unique result for each input string.

§ An automaton is drawn as a set of states (denoted graphically by circles) and transition arrows connecting one state with another.

§ Upon reading a symbol, a DFA jumps deterministically from one state to another by following the transition arrow labeled with that symbol.

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 7


DFA | Examples

binary

0b(0|1)+

string

“([a-z] | [A-Z] | [0-9])*”
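
The string rule above can be written as a small hand-coded DFA. The sketch below assumes straight double quotes as the quotation mark; the state names and the class `StringDfa` are made up for this example and are not taken from the slides.

```java
final class StringDfa {
    enum State { START, INSIDE, CLOSED, ERROR }

    // One deterministic transition: exactly one next state per (state, symbol).
    static State next(State state, char c) {
        switch (state) {
            case START:
                return c == '"' ? State.INSIDE : State.ERROR;
            case INSIDE:
                if (c == '"') {
                    return State.CLOSED;
                }
                boolean allowed = (c >= 'a' && c <= 'z')
                        || (c >= 'A' && c <= 'Z')
                        || (c >= '0' && c <= '9');
                return allowed ? State.INSIDE : State.ERROR;
            default:
                // CLOSED and ERROR have no valid outgoing transitions.
                return State.ERROR;
        }
    }

    // A word is accepted when the automaton ends in the CLOSED state.
    static boolean accepts(String word) {
        State state = State.START;
        for (char c : word.toCharArray()) {
            state = next(state, c);
        }
        return state == State.CLOSED;
    }

    public static void main(String[] args) {
        System.out.println(accepts("\"hello\""));
        System.out.println(accepts("\"hello"));
    }
}
```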

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 8


DFA | Examples

[Figure: a single DFA with states Start, Char, Operator, String, Delimiter, ID, Integer, and Float. Edge labels visible in the figure: {‘}, {.}, {“}, {+,-,*,/,%,<,>,=,!,…}, {(, ), {, }, [, ]}, {a-z}, {_}, {$}, {0-9}, {\.}, {$, _, 0-9, a-z}]

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 9


Additional Examples
Regular Expressions and Deterministic Finite Automata

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 10


Handwritten notes

[Slides 11-16: handwritten worked examples of regular expressions and deterministic finite automata, including three drawings captioned "Is this correct?" and a pair labeled "Error" and "Correct"]


Regular expressions - Examples

Token     Regular Expression (rule)   Examples (words)

foobar4   {"a", "b", "c"}*            {ε, "a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", ...}
foobar5   {"ab", "c"}*                {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "ababc", "abcab", ...}

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 17


Regular expressions - Examples

Define a regular expression for each case

a) URLs

b) Email addresses

c) ZIP codes

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 18


DFA- Examples

Define a DFA for each case

a) URLs

b) Email addresses

c) ZIP codes

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 19


Our Tokens

Which tokens are needed for a programming language?

a) Reserved words

b) Special Symbols: Operators and delimiters

c) Identifiers

d) Literals or constants

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 20


Drafting a Lexer

§ Keywords =

§ Operator =

§ Delimiter =

§ ID =

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 21


Drafting a Lexer

§ Float =

§ Integer =

§ Hexadecimal =

§ Octal =

§ Binary =

§ String =

§ Char =
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 22
Homework

Define the necessary lexical rules for a programming language

Express these rules using a DFA and Regular Expressions

Share them on Blackboard and discuss their correctness with your classmates.

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 23


CSE340 - Principles of
Programming Languages
Lecture 04:

Lexer Implementation 1

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-21
Office Hours: By appointment
Review

Given the following token definitions (using regular expressions)

t1 = aabb
t2 = aab
t3 = (a | b)*

1. Are the following strings correct?

aaba
a
aab
∑

2. Which is the token for each of them?

3. Create a DFA that represents the previous rules.

4. Which symbols are in the alphabet?

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 2


Review

1. How many words are there in each of the following?

-5
-5.5e-5
5-5

2. What is the difference between these regular expressions?

[0-9]+.[0-9]+

[0-9]+\.[0-9]+
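
One way to explore question 2 is to try both rules on the same words in Java. This is a sketch only; the class and constant names are assumptions for this example. In `java.util.regex`, an unescaped `.` stands for any single character (except line terminators by default), while `\.` stands for a literal dot.

```java
import java.util.regex.Pattern;

final class DotDemo {
    // "." unescaped: any one symbol may sit between the two digit groups.
    static final Pattern ANY_SYMBOL = Pattern.compile("[0-9]+.[0-9]+");
    // "\." escaped: only a literal dot may sit between the two digit groups.
    static final Pattern LITERAL_DOT = Pattern.compile("[0-9]+\\.[0-9]+");

    static boolean anySymbol(String word)  { return ANY_SYMBOL.matcher(word).matches(); }
    static boolean literalDot(String word) { return LITERAL_DOT.matcher(word).matches(); }

    public static void main(String[] args) {
        for (String word : new String[] {"12.34", "12a34", "1234"}) {
            System.out.println(word + " " + anySymbol(word) + " " + literalDot(word));
        }
    }
}
```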

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 3


Programming a Lexer

Regular
Expression

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 4


Programming a Lexer

Regular
Expression

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 5


Using IF-ELSE

It is not a good idea!

February 13th, 2008 by Rich Sharpe. Posted in Software Quality, Software Quality Metrics

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 6


Using a State Machine

1. Put the DFA in a Table

      b     0     1     ...   delimiter, operator, whitespace, quotation mark

S0    SE    S1    SE    SE    Stop
S1    S2    SE    SE    SE    Stop
S2    SE    S3    S3    SE    Stop
S3    SE    S3    S3    SE    Stop
SE    SE    SE    SE    SE    Stop

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 7


Using a State Machine

2. Put the Table in Java


// constants: column indexes (one per class of input symbol)
private static final int B = 0;
private static final int ZERO = 1;
private static final int ONE = 2;
private static final int OTHER = 3;
private static final int DELIMITER = 4;

// constants: special states (ERROR is row 4 of the table, i.e. SE)
private static final int ERROR = 4;
private static final int STOP = -2;

// table as a 2D array: rows are S0, S1, S2, S3, SE; columns are b, 0, 1, other, delimiter
private static int[][] stateTable = {
  {ERROR, 1, ERROR, ERROR, STOP},
  { 2, ERROR, ERROR, ERROR, STOP},
  {ERROR, 3, 3, ERROR, STOP},
  {ERROR, 3, 3, ERROR, STOP},
  {ERROR, ERROR, ERROR, ERROR, STOP}
};
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 8
Using a State Machine
STEP 3. Algorithm

// Splits one line into strings using the state table and reports what it finds.
void splitLine(String line) {
  int state = 0;                  // S0
  int go = state;
  int index = 0;
  char l = ' ';
  String string = "";
  // read letters until the table says STOP or the line ends
  while (index < line.length()) {
    l = line.charAt(index);
    go = calculateNextState(state, l);
    if (go == STOP) break;
    string = string + l;
    state = go;
    index++;
  }
  if (state == 3)                 // S3 is the accepting state for BINARY
    System.out.println("It is a BINARY number");
  else if (!string.isEmpty())
    System.out.println("error");
  if (go == STOP) {
    if (isDelimiter(l))
      System.out.println("Also, there is a DELIMITER");
    else if (isOperator(l))
      System.out.println("Also, there is an OPERATOR");
    index++;                      // skip the letter that stopped the machine
  }
  // loop: process the rest of the line
  if (index < line.length())
    splitLine(line.substring(index));
}

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 9


Programming Assignment #1

1. Read a file; split it into lines using System.lineSeparator.

2. For each line, read character by character and use each character as an input for the state machine.

3. Concatenate the characters, creating the largest STRING possible. Stop when you find a delimiter, whitespace, operator, or quotation mark and the current state allows it. If there are more characters in the line, create a new line with those characters and go to step 2.

4. For each STRING and WORD, report its TOKEN or ERROR as appropriate.
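
Steps 2 to 4 can be sketched as one self-contained Java class for the BINARY rule only. This is a sketch under stated assumptions, not the provided Lexer.java: the class `BinaryLexer`, the method `tokenize`, the helper `column`, and the delimiter and operator sets are hypothetical; only the transition table follows the earlier slide. Quotation marks stop the machine but are not reported, to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

final class BinaryLexer {
    private static final int B = 0, ZERO = 1, ONE = 2, OTHER = 3, DELIMITER = 4;
    private static final int ERROR = 4, STOP = -2;

    // Rows: S0, S1, S2, S3 (accepting), SE. Columns: b, 0, 1, other, delimiter.
    private static final int[][] STATE_TABLE = {
        {ERROR, 1,     ERROR, ERROR, STOP},
        {2,     ERROR, ERROR, ERROR, STOP},
        {ERROR, 3,     3,     ERROR, STOP},
        {ERROR, 3,     3,     ERROR, STOP},
        {ERROR, ERROR, ERROR, ERROR, STOP}
    };

    // Assumed character classes, chosen only for this example.
    private static final String DELIMITERS = "(){}[];,";
    private static final String OPERATORS = "+-*/%<>=!";

    private static int column(char c) {
        if (c == 'b') return B;
        if (c == '0') return ZERO;
        if (c == '1') return ONE;
        if (Character.isWhitespace(c) || c == '"'
                || DELIMITERS.indexOf(c) >= 0 || OPERATORS.indexOf(c) >= 0) {
            return DELIMITER;
        }
        return OTHER;
    }

    static List<String> tokenize(String line) {
        List<String> report = new ArrayList<>();
        StringBuilder string = new StringBuilder();
        int state = 0;
        for (int i = 0; i <= line.length(); i++) {
            // The end of the line is treated like whitespace so the last string is flushed.
            char c = i < line.length() ? line.charAt(i) : ' ';
            int go = STATE_TABLE[state][column(c)];
            if (go != STOP) {
                string.append(c);
                state = go;
                continue;
            }
            if (string.length() > 0) {
                report.add((state == 3 ? "BINARY " : "ERROR ") + string);
            }
            if (DELIMITERS.indexOf(c) >= 0) {
                report.add("DELIMITER " + c);
            } else if (OPERATORS.indexOf(c) >= 0) {
                report.add("OPERATOR " + c);
            }
            string.setLength(0);
            state = 0;
        }
        return report;
    }

    public static void main(String[] args) {
        tokenize("0b101; 0b2+1").forEach(System.out::println);
    }
}
```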

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 10


Homework

Define the necessary lexical rules for a programming language

Express these rules using a DFA and Regular Expressions

Share them on Blackboard and discuss their correctness with your classmates.

Remember: use a DETERMINISTIC finite automaton.

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 11


CSE340 - Principles of
Programming Languages
Lecture 05:

Lexer Implementation 2

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-21
Office Hours: By appointment
Review

1. How do we exclude keywords from identifiers?

2. The dot:
foobar1 = .
foobar1 = .*
foobar1 = .+
foobar1 = \.

3. What is the problem here?

string = " \.* "

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 2


Review

4. Rules and Sub-rules

URL definition:

https?://([a-z]+ | [A-Z]+ | [0-9]+ | - | . | _ | ~ | %21 | %23 | %24 | %26 | %27 | %28 | %29 | %2A | %2B | %2C | %3A | %3B | %3D | %3F | %40 | %5B | %5D)*
%2F
([a-z]+ | [A-Z]+ | [0-9]+ | - | . | _ | ~ | ! | # | $ | & | ' | ( | ) | * | + | , | / | : | ; | = | ? | @ | [ | ])+

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 3


Review

5. Making things shorter

ZIP definition:

[0-9][0-9][0-9][0-9][0-9](-[0-9][0-9][0-9][0-9])?

[0-9]{5}(-[0-9]{4})?
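
A short Java check of the counted-repetition shorthand, assuming both rules are meant to accept exactly the same words (any digit from 0 to 9 in every position). The class and constant names are assumptions for this sketch.

```java
import java.util.regex.Pattern;

final class ZipRule {
    // Every digit position spelled out.
    static final Pattern LONG_FORM =
        Pattern.compile("[0-9][0-9][0-9][0-9][0-9](-[0-9][0-9][0-9][0-9])?");
    // The same rule using {n} for "exactly n occurrences".
    static final Pattern SHORT_FORM = Pattern.compile("[0-9]{5}(-[0-9]{4})?");

    static boolean isZip(String word) {
        return SHORT_FORM.matcher(word).matches();
    }

    // True when both forms give the same verdict for the word.
    static boolean sameVerdict(String word) {
        return LONG_FORM.matcher(word).matches() == SHORT_FORM.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"85287", "85287-1234", "8528", "85287-12"}) {
            System.out.println(word + " " + isZip(word) + " " + sameVerdict(word));
        }
    }
}
```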

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 4


Programming Assignment #1

• Only BINARY, DELIMITER, and OPERATOR are implemented. You will implement the rest of the
required tokens (rules).

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 5


Programming Assignment #1

* Lexer.java is the only file that you are allowed to modify

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 6


Code | Token.java

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 7


Code | Gui.java

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 8


Code | Lexer.java

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 9




Code | input.txt

hello;world cse340 asu 2013/05/31 // end

boolean $xx= ((((((((23WE + 44 - 3 / 2 % 45 <=17) > 0xfffff.34.45;

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 15


Code | output.txt
IDENTIFIER hello
DELIMITER ;
IDENTIFIER world
IDENTIFIER cse340
IDENTIFIER asu
INTEGER 2013
OPERATOR /
OCTAL 05
OPERATOR /
INTEGER 31
OPERATOR /
OPERATOR /
IDENTIFIER end
KEYWORD boolean
IDENTIFIER $xx
OPERATOR =
DELIMITER (
DELIMITER (
DELIMITER (
DELIMITER (
DELIMITER (
DELIMITER (
DELIMITER (
DELIMITER (
ERROR 23WE
OPERATOR +
INTEGER 44
OPERATOR -
INTEGER 3
OPERATOR /
INTEGER 2
OPERATOR %
INTEGER 45
OPERATOR <
OPERATOR =
INTEGER 17
DELIMITER )
OPERATOR >
ERROR 0xfffff.34.45
DELIMITER ;

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 16


Homework

Programming Assignment #1

Develop a Lexical Analyzer by coding a DFA

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 17


CSE340 - Principles of
Programming Languages
Lecture 06:

Closing with Lexical Analysis

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-21
Office Hours: By appointment
Errata #1

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 2


Errata #2

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 3


Errata #3

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 4


Programming Assignment #1

?
70%

?
100%

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 5


Programming Assignment #1

?
100%

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 6


Programming Assignment #1

?
70%

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 7


Programming Assignment #1

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 8


Programming Assignment #1

approx. 20%
§ Firstname_Lastname_P1.zip
§ Compile and Run
§ Recognize BINARY
§ Recognize DELIMITER and OPERATOR

approx. 40%
§ Recognize INTEGER
§ Recognize OCTAL
§ Recognize HEXADECIMAL
§ Recognize IDENTIFIER

approx. 40%
§ Recognize STRING
§ Recognize CHAR
§ Recognize KEYWORD
§ Recognize FLOAT

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 9


Review | Lexical Analysis

Are the following STRINGS correct or not? Why?

§ 000000005

§ 000000009

§ 000000009.1

§ 000000005

§ 000000005.1

§ 0x0000002

§ 0123456789
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 10
Review | Lexical Analysis

Are the following STRINGS correct or not? Why?

§ 1.2e---3++

§ $50

§ float ________________ = 5;

§ double x = 000000.1;

§ '''a'

§ '\''b'

§ '\'b'
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 11
Review | Lexical Analysis

Are the following STRINGS correct or not? Why?

§ " \\\\\\\\\\a"

§ "Hello""world"

§ abc"Hello"

§ ''’

§ '\x’

§ ’\a'

§ ’\w’

§ "\\\"

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 12


State Transition Table
for our Lexer

(step by step)

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 13


BINARY

INTEGER

OCTAL

HEXADECIMAL

IDENTIFIER

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 14


Lexer – Step by Step

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 15


Lexer – Step by Step

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 16


Lexer – Step by Step
columns: 0, $, _, [1], [2-7], [8-9], [A], B, [C-F], [G-W], X, [Y-Z]
  [a-z] = [A] B [C-F] [G-W] X [Y-Z]
  [a-f] = [A] B [C-F]

states:
  1 = INTEGER
  2 = INTEGER
  3 = IDENTIFIER
  5 = OCTAL
  7 = INTEGER
  8 = IDENTIFIER
  9 = BINARY
  10 = HEXADECIMAL
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 17
Lexer – Step by Step
(last column: delimiter, operator, whitespace, quotation mark)

      0     $     _     [1]   [2-7] [8-9] A     B     [C-F] [G-W] X     [Y-Z] ...   Delimiter

S0    S1    S3    S3    S2    S2    S2    S3    S3    S3    S3    S3    S3    SE    Stop
S1    S5    SE    SE    S5    S5    SE    SE    S4    SE    SE    S6    SE    SE    Stop
S2    S7    SE    SE    S7    S7    S7    SE    SE    SE    SE    SE    SE    SE    Stop
S3    S8    S8    S8    S8    S8    S8    S8    S8    S8    S8    S8    S8    SE    Stop
S4    S9    SE    SE    S9    SE    SE    SE    SE    SE    SE    SE    SE    SE    Stop
S5    S5    SE    SE    S5    S5    SE    SE    SE    SE    SE    SE    SE    SE    Stop
S6    S10   SE    SE    S10   S10   S10   S10   S10   S10   SE    SE    SE    SE    Stop
S7    S7    SE    SE    S7    S7    S7    SE    SE    SE    SE    SE    SE    SE    Stop
S8    S8    S8    S8    S8    S8    S8    S8    S8    S8    S8    S8    S8    SE    Stop
S9    S9    SE    SE    S9    SE    SE    SE    SE    SE    SE    SE    SE    SE    Stop
S10   S10   SE    SE    S10   S10   S10   S10   S10   S10   SE    SE    SE    SE    Stop
SE    SE    SE    SE    SE    SE    SE    SE    SE    SE    SE    SE    SE    SE    Stop
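
The table above can be transcribed into Java and used to classify a single STRING. This is a sketch under assumptions: the class `NumberAndIdLexer`, the methods `column` and `classify`, and the constant `E` are made-up names; the Stop column is omitted because the sketch classifies one already-split string at a time; lowercase letters are mapped onto the uppercase columns, following the [a-z] note on the previous slide.

```java
final class NumberAndIdLexer {
    private static final int E = 11; // SE, the error state (last row)

    // Columns: 0 $ _ 1 [2-7] [8-9] A B [C-F] [G-W] X [Y-Z] other
    private static final int[][] TABLE = {
        /* S0  */ {1,  3, 3, 2,  2,  2,  3,  3,  3,  3, 3, 3, E},
        /* S1  */ {5,  E, E, 5,  5,  E,  E,  4,  E,  E, 6, E, E},
        /* S2  */ {7,  E, E, 7,  7,  7,  E,  E,  E,  E, E, E, E},
        /* S3  */ {8,  8, 8, 8,  8,  8,  8,  8,  8,  8, 8, 8, E},
        /* S4  */ {9,  E, E, 9,  E,  E,  E,  E,  E,  E, E, E, E},
        /* S5  */ {5,  E, E, 5,  5,  E,  E,  E,  E,  E, E, E, E},
        /* S6  */ {10, E, E, 10, 10, 10, 10, 10, 10, E, E, E, E},
        /* S7  */ {7,  E, E, 7,  7,  7,  E,  E,  E,  E, E, E, E},
        /* S8  */ {8,  8, 8, 8,  8,  8,  8,  8,  8,  8, 8, 8, E},
        /* S9  */ {9,  E, E, 9,  E,  E,  E,  E,  E,  E, E, E, E},
        /* S10 */ {10, E, E, 10, 10, 10, 10, 10, 10, E, E, E, E},
        /* SE  */ {E,  E, E, E,  E,  E,  E,  E,  E,  E, E, E, E}
    };

    // Maps a character to its column index in TABLE.
    private static int column(char raw) {
        char c = Character.toUpperCase(raw);
        if (c == '0') return 0;
        if (c == '$') return 1;
        if (c == '_') return 2;
        if (c == '1') return 3;
        if (c >= '2' && c <= '7') return 4;
        if (c == '8' || c == '9') return 5;
        if (c == 'A') return 6;
        if (c == 'B') return 7;
        if (c >= 'C' && c <= 'F') return 8;
        if (c >= 'G' && c <= 'W') return 9;
        if (c == 'X') return 10;
        if (c == 'Y' || c == 'Z') return 11;
        return 12;
    }

    // Runs the DFA over the word and names the token of the final state.
    static String classify(String word) {
        int state = 0;
        for (char c : word.toCharArray()) {
            state = TABLE[state][column(c)];
        }
        switch (state) {
            case 1: case 2: case 7: return "INTEGER";
            case 3: case 8:         return "IDENTIFIER";
            case 5:                 return "OCTAL";
            case 9:                 return "BINARY";
            case 10:                return "HEXADECIMAL";
            default:                return "ERROR";
        }
    }

    public static void main(String[] args) {
        for (String word : new String[] {"2013", "05", "0b101", "0xff", "$xx", "23WE"}) {
            System.out.println(classify(word) + " " + word);
        }
    }
}
```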

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 18


Homework

Review Recursion

Solve the Problem Set #1 in preparation for your exam

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 19


CSE340 - Principles of
Programming Languages
Lecture 07:

Syntactic Analysis 1

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-21
Office Hours: By appointment
Next Step

þ Lexical Analysis ☐ Syntactic Analysis

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 2


Question

For each case, indicate whether or not it is possible to write a regular expression or a DFA.

i. Detect whether N nested parentheses are balanced in a string that may contain any characters between the parentheses.

ii. Detect binary strings with the same number of 0's and 1's (the order or sequence does not matter).

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 3


Where are we now?

After lexical analysis, we have a series of tokens.

But we cannot:

I. define a regular expression matching all expressions with properly balanced parentheses.

II. that is, define a regular expression matching all functions with properly nested block structure.

void a () { b (c); for (;;) {a=(-(1+2)+5); } }
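
To see why a finite automaton is not enough, compare it with the usual way to check balance in code: a counter that can grow without bound. A DFA has a fixed number of states, so it cannot remember an arbitrary nesting depth. The Java sketch below is only an illustration; the class `Balance` and the method `balanced` are assumed names.

```java
final class Balance {
    // Returns true when every '(' has a matching ')' in the right order.
    // All other characters are ignored.
    static boolean balanced(String text) {
        int depth = 0; // unbounded counter: exactly what a DFA lacks
        for (char c : text.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // a ')' appeared before its '('
                }
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        System.out.println(balanced("a=(-(1+2)+5);"));
        System.out.println(balanced("(()"));
    }
}
```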

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 4


Where are we now?

Now, we want to:

I. Review the structure described by that series of tokens

II. Report errors if those tokens do not properly encode a structure

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 5


Outline
Language
  Lexical Analysis (Lexer)
    Rules: Symbols, Token
    Tools: Regular Expression, DFA
  Syntactic Analysis (Parser)
    Grammar (Rules): Terminal, Non-terminal
    Tools: BNF (Backus-Naur Form), Syntax Diagrams

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 7


Grammar | Example

Describe all legal arithmetic expressions using addition, subtraction, multiplication, and division with integer values

E → E OP E

E → integer

OP → + | - | * | /

E → ( E )

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 8


Grammar | Definition

A grammar is a collection of four elements:

§ A set of nonterminal symbols (uppercase).

§ A set of terminal symbols (lowercase). Terminals can be tokens or specific words.

§ A set of production rules saying how each nonterminal can be replaced by a string of terminals and nonterminals.

§ A start symbol.

Example rules:
E → E OP E
E → integer
OP → + | - | * | /
E → ( E )

Javier Gonzalez-Sanchez | CSE340 | Summer 2013 | 9


Grammar | Derivation
Input: 5 / 20
Tokens: integer operator integer

Rules:
E → E OP E
E → integer
OP → + | - | * | /
E → ( E )

Derivation:
E
⇒ E OP E
⇒ integer OP E
⇒ integer / E
⇒ integer / integer

Javier Gonzalez-Sanchez | CSE340 | Summer 2013 | 10


Grammar | Derivation
Input: 5 * ( 7 + 20 )
Tokens: integer operator delimiter integer operator integer delimiter

Rules:
E → E OP E
E → integer
OP → + | - | * | /
E → ( E )

Derivation:
E
⇒ E OP E
⇒ integer OP E
⇒ integer * E
⇒ integer * (E)
⇒ integer * (E OP E)
⇒ integer * (integer OP E)
⇒ integer * (integer + E)
⇒ integer * (integer + integer)

Javier Gonzalez-Sanchez | CSE340 | Summer 2013 | 11


Grammar | Derivation
Input: 5 * ( 7 + 20 )
Tokens: integer operator delimiter integer operator integer delimiter

Rules:
E → E OP E
E → integer
OP → + | - | * | /
E → ( E )

Derivation:
E
⇒ E OP E
⇒ E OP (E)
⇒ E OP (E OP E)
⇒ E OP (E OP integer)
⇒ E OP (E + integer)
⇒ E OP (integer + integer)
⇒ E * (integer + integer)
⇒ integer * (integer + integer)

Javier Gonzalez-Sanchez | CSE340 | Summer 2013 | 12


Derivations

§ A leftmost derivation is a derivation in which each step expands the leftmost nonterminal.

§ A rightmost derivation is a derivation in which each step expands the rightmost nonterminal.

§ Derivation will be very important when we talk about parsing.
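
A leftmost derivation can be replayed mechanically: at each step, find the leftmost nonterminal in the current sentential form and replace it with the body of a chosen production. The Java sketch below replays the derivation of `integer / integer` with the expression grammar from the previous slides. The class `LeftmostDerivation` and the method `step` are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LeftmostDerivation {
    private static final List<String> NONTERMINALS = Arrays.asList("E", "OP");

    // Replaces the leftmost nonterminal of the form, which must be `head`,
    // with the symbols in `body`, and returns the new sentential form.
    static List<String> step(List<String> form, String head, String... body) {
        for (int i = 0; i < form.size(); i++) {
            if (NONTERMINALS.contains(form.get(i))) {
                if (!form.get(i).equals(head)) {
                    throw new IllegalArgumentException(
                        "leftmost nonterminal is " + form.get(i) + ", not " + head);
                }
                List<String> next = new ArrayList<>(form.subList(0, i));
                next.addAll(Arrays.asList(body));
                next.addAll(form.subList(i + 1, form.size()));
                return next;
            }
        }
        throw new IllegalArgumentException("no nonterminal left to expand");
    }

    public static void main(String[] args) {
        List<String> form = Arrays.asList("E");
        System.out.println(String.join(" ", form));
        form = step(form, "E", "E", "OP", "E");   // E → E OP E
        System.out.println("=> " + String.join(" ", form));
        form = step(form, "E", "integer");        // E → integer
        System.out.println("=> " + String.join(" ", form));
        form = step(form, "OP", "/");             // OP → /
        System.out.println("=> " + String.join(" ", form));
        form = step(form, "E", "integer");        // E → integer
        System.out.println("=> " + String.join(" ", form));
    }
}
```

A rightmost version would scan the form from the end instead of from the start.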

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 13


Grammar | Example

Example words: H2 O, C O2, (S O4)3, Na Cl, S O3

Notation 1:

Comp → Mix | Mix Num | Comp Comp
Mix → Elem | ( Comp )
Elem → H|O|C|S|Na|Cl| ...
Num → 1|2|3|4| ...

Notation 2:

<Comp> → <Mix> | <Mix><Num> | <Comp><Comp>
<Mix> → <Elem> | ( <Comp> )
<Elem> → H|O|C|S|Na|Cl| ...
<Num> → 1|2|3|4| ...

Javier Gonzalez-Sanchez | CSE340 | Summer 2013 | 14


Grammar | Derivation

Input: C O 2

Rules:
Comp → Mix | Mix Num | Comp Comp
Mix → Elem | ( Comp )
Elem → H|O|C|S|Na|Cl| ...
Num → 1|2|3|4| ...

Derivation:
Comp
⇒ Comp Comp
⇒ Mix Comp
⇒ Elem Comp
⇒ C Comp
⇒ C Mix Num
⇒ C Elem Num
⇒ C O Num
⇒ C O 2

Javier Gonzalez-Sanchez | CSE340 | Summer 2013 | 15


What about this?

BLOCK → STMT | { STMTS } | { }

STMTS → STMT | STMT STMTS

STMT → EXPR; |
       if (EXPR) BLOCK |
       while (EXPR) BLOCK |
       BLOCK |
       ...

EXPR → EXPR + EXPR |
       EXPR - EXPR |
       EXPR * EXPR |
       identifier |
       integer |
       ...
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 16
Homework

Using the rules in the previous slide, apply derivation to show that the following
expression is syntactically correct

while ( 5 ) { if ( 6 ) { } }

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 17


CSE340 - Principles of
Programming Languages
Lecture 08:

Syntactic Analysis II

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-21
Office Hours: By appointment
Outline

Language
  Syntactic Analysis (Parser)
    Grammar (Rules): Non-terminal, Terminal
    Derivation
    Parse Tree
    Tools: BNF, Syntax Diagrams

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 2




Parse Tree

§ A parse tree is a tree encoding the steps in a derivation.

§ Internal nodes represent nonterminal symbols used in the production.

§ Inorder walk of the leaves contains the generated string.

§ Encodes what productions are used, not the order in which those productions are applied.

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 5


Parse Tree

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 6


Parse Tree

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 7


Goal

Goal of syntax analysis:

§ Recover the structure described by a series of tokens.

§ Recover a parse tree for the given input.

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 8


The problem
Input: 5 * 7 + 20
Tokens: integer operator integer operator integer

E → E OP E

E → integer

OP → + | - | * | /

E → ( E )

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 9


A serious problem

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 10


Ambiguity

• A grammar is said to be ambiguous if there is at least one string with two or more parse trees.

• Note that ambiguity is a property of grammars, not languages.

We will review this topic in the next lecture
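
To make the problem concrete, the input 5 * 7 + 20 has two parse trees under the rule E → E OP E, and they group the operands differently. The Java sketch below builds both trees by hand and evaluates them; the `Node` class, the method names, and the idea of evaluating each tree bottom-up are assumptions added for illustration and are not part of the slides.

```java
final class AmbiguityDemo {
    // A parse tree node: either an integer leaf or an "E OP E" node.
    static final class Node {
        final Node left;
        final Node right;
        final char op;
        final int value;

        Node(int value) {
            this.left = null;
            this.right = null;
            this.op = '\0';
            this.value = value;
        }

        Node(Node left, char op, Node right) {
            this.left = left;
            this.right = right;
            this.op = op;
            this.value = 0;
        }

        // Evaluates the tree bottom-up, following its shape.
        int eval() {
            if (left == null) {
                return value;
            }
            int a = left.eval();
            int b = right.eval();
            switch (op) {
                case '+': return a + b;
                case '-': return a - b;
                case '*': return a * b;
                default:  return a / b;
            }
        }
    }

    // Tree 1: the root is '+', so 5 * 7 is grouped first.
    static Node multiplyFirst() {
        return new Node(new Node(new Node(5), '*', new Node(7)), '+', new Node(20));
    }

    // Tree 2: the root is '*', so 7 + 20 is grouped first.
    static Node addFirst() {
        return new Node(new Node(5), '*', new Node(new Node(7), '+', new Node(20)));
    }

    public static void main(String[] args) {
        System.out.println(multiplyFirst().eval()); // prints 55
        System.out.println(addFirst().eval());      // prints 135
    }
}
```

Both trees are valid for the same string, yet they lead to different results, which is why an ambiguous grammar is a problem for a compiler.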

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 11


Our Tools
Backus-Naur Form (BNF)
Extended Backus-Naur Form (EBNF)

Syntax Diagrams

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 12


BNF (Backus-Naur Form)

Formal, mathematical way to specify grammars.

All the previous examples are written in BNF, where we use:

→ or ::= meaning "is defined as"
| as the "or" operator
<nonterminal>, or uppercase names, for nonterminals
lowercase names for terminals

* John Backus and Peter Naur



EBNF

Extended BNF includes notation to indicate:

• 0 or more occurrences {…}
• 1 or more occurrences +
• 0 or 1 occurrences […]
• Use of parentheses for grouping ( )

* Niklaus Wirth

Example

Grammar rule for calling a method:

§ draw(x, y, z);

§ print (a, b, c, d);

§ done();

§ foobar(one, two, three, four, five);

§ sqrt(x);



BNF vs EBNF

BNF
<call_method> → identifier(<identifiers>); | identifier();
<identifiers> → identifier | identifier,<identifiers>

EBNF
<call_method> → identifier ('('<identifiers>')' | '('')' ) ';'
<identifiers> → identifier | identifier,<identifiers>

EBNF
<call_method> → identifier'('[<identifiers>]')' ';'
<identifiers> → identifier | identifier,<identifiers>

EBNF
<call_method> → identifier'('[<identifiers>]')' ';'
<identifiers> → identifier { ',' identifier }



Syntax Diagrams

[Figure: syntax diagrams for Call_method and Identifiers]


Is it BNF or EBNF?

BLOCK → STMT | { STMTS } | { }

STMTS → STMT | STMT STMTS

STMT → EXPR; |
if (EXPR) BLOCK |
while (EXPR) BLOCK |
BLOCK |
. . .

EXPR → EXPR + EXPR |
EXPR – EXPR |
EXPR * EXPR |
identifier |
integer |
...

Syntax Diagrams

[Figure: syntax diagrams for BLOCK, STMTS, STMT, and EXPR]


Is it BNF or EBNF?

BLOCK → STMT | '{' { STMT } '}'

STMT → EXPR; |
if '(' EXPR ')' BLOCK |
while '(' EXPR ')' BLOCK |
BLOCK |
. . .

EXPR → EXPR '+' EXPR |
EXPR '–' EXPR |
EXPR '*' EXPR |
identifier |
integer |
...

Syntax Diagrams

[Figure: syntax diagrams for BLOCK, STMT, and EXPR]


Homework

Create a parse tree for the following input, using the rules stated in the previous lecture:

while ( 5 ) { if ( 6 ) { } }



CSE340 - Principles of Programming Languages

Javier Gonzalez-Sanchez
[email protected]
Fall 2014

Disclaimer. These slides can only be used as study material for the class CSE340 at ASU. They cannot be distributed or used for another purpose.
CSE340 - Principles of
Programming Languages
Lecture 09:

Grammars 1

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-21
Office Hours: By appointment
Ambiguity

• A grammar is said to be ambiguous if there is at least one string with two or more parse trees.

• Note that ambiguity is a property of grammars, not languages.


Solution

• If a grammar can be made unambiguous at all, it is usually made unambiguous through layering.

• Have exactly one way to build each piece of the string.

• Have exactly one way of combining pieces back together.

• Recursive constructions



Layering

Root: start symbol
Layers: Rule 1, Rule 2, Rule 3, Rule 4, Rule 5, ...
Leaves: terminals (i.e., tokens)


Layering

inputs:
§ 1+2+3<4*5
§ 1*2+3+4<5
§ 1<2+3+4*5
§ 1+2<3*4+5
§ 1+2*3<4+5

1 2 3 4 5
Exercise

Provide a grammar that is not ambiguous for arithmetic expressions.

10 * 20 + 15

Precedence of operators and associativity:

✓ (10 * 20) + 15

✗ 10 * (20 + 15)

Exercise | Precedence

Original Grammar:

E → E OP E
E → integer
OP → + | - | * | /

New Grammar:

<E> → <__> + <__>
< > → <__> - <__>
< > → <__> * <__>
< > → <__> / <__>
< > → - <__>


Exercise | Precedence

<E> à <A> | <A> {'+' <A>} | <A> {'-' <A>}

<A> à <B> | <B> {'*' <B>} | <B> {'/' <B>}

<B> à '-'<C> | <C>

<C> à integer

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 11


Exercise | Precedence

<E> à <A> {(’+'|'-’) <A>}

<A> à <B> {('*'|'/') <B>}

<B> à ['-'] <C>

<C> à integer



Syntax Diagrams

[Figure: syntax diagrams for E, A, B, and C]


Exercise 1

Include rules for handling parentheses in the previous grammar. The grammar should accept the following expressions as correct:

10 * 20 + 15
(10 * 20) + 15
10 * (20 + 15)
(10) * (20) + (15)
(10 * 20 + 15)
10 * (20) + 15



Syntax Diagrams

[Figure: syntax diagrams for E, A, B, and C]


Exercise 2

Include rules to accept variable names (identifiers) in expressions. The grammar should accept the following expressions as correct:

A * 20 + time
(x * y) + 15
10 * (ASU + cse340)
(10) * (20) + (15)
(hello * world + Arizona)
10 * (counter) + 15



Syntax Diagrams

[Figure: syntax diagrams for E, A, B, and C]


Exercise 3

Provide a grammar that is not ambiguous, with precedence of operators and associativity, for this:

10 + 20 > 15 & -10 != 1 | 20 / ( 10 + 1) < 5



Exercise 3 | Note

Precedence of operators (lowest to highest)

|
&
!
< > == != <= >=
+ -
* /
- (unary)
( )

Exercise 3

[Figure: syntax diagrams for EXPRESSION, Y, and R]


CSE340 - Principles of
Programming Languages

Lecture 10:

Grammars II

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-21
Office Hours by appointment
Exam 1 | Review

Draw a DFA equivalent to the regular expression

email2 = character+ \. character+ @ character+ \. domain

Note:

a*

a+ = a a*



Exam 1 | Review

• Which tokens have been defined?


Exam 1 | Review

Create a grammar to validate well-written references:

Byron Lahey, Audrey Girouard, Winslow Burleson, and Roel
Vertegaal. 2011. Understanding the use of bend gestures in
mobile devices with flexible electronic paper displays. In
Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems (CHI '11). ACM, New York, NY, USA,
1303-1312.


Exam 1 | Review

Create a grammar to validate well-written references:

Byron Lahey, Audrey Girouard, Winslow Burleson, and Roel Vertegaal.

2011.

Understanding the use of bend gestures in mobile devices with flexible electronic paper displays.

In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '11).

ACM,

New York, NY, USA,

1303-1312.


Exam 1 | Review

Terminals: string, number, '.', ',', '-'

Non-terminals: <AUTHORS>, <YEAR>, <TITLE>, <CONFERENCE>, <PUBLISHER>, <ADDRESS>, <PAGES>


Exam 1 | Review

Rules

<REFERENCE> →
<AUTHORS> →
<YEAR> →
<TITLE> →
<CONFERENCE> →
<PUBLISHER> →
<ADDRESS> →
<PAGES> →

[Hand-written notes, not reproduced: syntax diagram and (incomplete) parse tree for the reference grammar]


CSE340 - Principles of
Programming Languages

Lecture 11:

Parser Implementation I

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-21
Office Hours: By appointment
Parser

[Figure: Grammar (BNF, EBNF) → Parser]


Parser | Grammar 1

<EXPRESSION> à <X> {'|' <X>}


<X> à <Y> {'&' <Y>}
<Y> à ['!'] <R>
<R> à <E> {('>'|'<'|'=='|'!=') <E>}
<E> à <A> {(’+'|'-’) <A>}
<A> à <B> {('*'|'/') <B>}
<B> à ['-'] <C>
<C> à integer|
identifier|'(' <EXPRESSION> ')'

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 3


Parser | Grammar 2

<PROGRAM> à '{' <BODY> '}'

<BODY> à {<EXPRESSION>';'}

<EXPRESSION> à <X> {'|' <X>}


<X> à <Y> {'&' <Y>}
<Y> à ['!'] <R>
<R> à <E> {('>'|'<'|'=='|'!=') <E>}

<E> à <A> {(’+'|'-’) <A>}


<A> à <B> {('*'|'/') <B>}
<B> à ['-'] <C>

<C> à integer|
identifier|'(' <EXPRESSION> ')'
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 4
Parser | Input and Output

0;

1 + 2;

3 * (4 + hello);



Parser | Step by Step

For each rule in the grammar:

§ Step 1. Left-hand side: create a new method.

§ Step 2. Right-hand side: translate it into loops, ifs, and method calls.

§ Step 3. Identify errors (expected terminals).

§ Step 4. Synchronize errors (FIRST and FOLLOW sets).


Parser

import java.util.Vector;

public class Parser {

  private static Vector<Token> tokens;   // tokens produced by the lexer
  private static int currentToken;       // index of the token being analyzed

  // One method per grammar rule
  public static void RULE_PROGRAM () {}
  public static void RULE_BODY () {}
  public static void RULE_EXPRESSION () {}
  public static void RULE_X () {}
  public static void RULE_Y () {}
  public static void RULE_R () {}
  public static void RULE_E () {}
  public static void RULE_A () {}
  public static void RULE_B () {}
  public static void RULE_C () {}

}

Parser

PROGRAM

public static void RULE_PROGRAM() {
  // <PROGRAM> → '{' <BODY> '}'
  if (tokens.get(currentToken).getWord().equals("{"))
    currentToken++;
  else
    error(1);

  RULE_BODY();

  if (tokens.get(currentToken).getWord().equals("}"))
    currentToken++;
  else
    error(2);
}

Parser

BODY

public static void RULE_BODY() {
  // <BODY> → {<EXPRESSION> ';'}
  while (!tokens.get(currentToken).getWord().equals("}")) {

    RULE_EXPRESSION();

    if (tokens.get(currentToken).getWord().equals(";"))
      currentToken++;
    else
      error(3);

  }
}


Parser

EXPRESSION

public static void RULE_EXPRESSION() {
  // <EXPRESSION> → <X> {'|' <X>}
  RULE_X();

  while (tokens.get(currentToken).getWord().equals("|")) {
    currentToken++;
    RULE_X();
  }
}


Parser

public static void RULE_X() {
  // <X> → <Y> {'&' <Y>}
  RULE_Y();

  while (tokens.get(currentToken).getWord().equals("&")) {
    currentToken++;
    RULE_Y();
  }
}


Parser

public static void RULE_Y() {
  // <Y> → ['!'] <R>
  if (tokens.get(currentToken).getWord().equals("!")) {
    currentToken++;
  }

  RULE_R();
}


Parser

R
public static void RULE_R() {
  // <R> → <E> {('>'|'<'|'=='|'!=') <E>}
  RULE_E();

  while (tokens.get(currentToken).getWord().equals("<")
      || tokens.get(currentToken).getWord().equals(">")
      || tokens.get(currentToken).getWord().equals("==")
      || tokens.get(currentToken).getWord().equals("!=")
  ) {
    currentToken++;
    RULE_E();
  }
}


Parser

public static void RULE_E() {
  // <E> → <A> {('+'|'-') <A>}
  RULE_A();

  while (tokens.get(currentToken).getWord().equals("-")
      || tokens.get(currentToken).getWord().equals("+")
  ) {
    currentToken++;
    RULE_A();
  }

}

Parser

public static void RULE_A() {
  // <A> → <B> {('*'|'/') <B>}
  RULE_B();

  while (tokens.get(currentToken).getWord().equals("/")
      || tokens.get(currentToken).getWord().equals("*")
  ) {
    currentToken++;
    RULE_B();
  }

}

Parser

public static void RULE_B() {
  // <B> → ['-'] <C>
  if (tokens.get(currentToken).getWord().equals("-")) {
    currentToken++;
  }

  RULE_C();
}


Parser
C

public static void RULE_C() {
  // <C> → integer | identifier | '(' <EXPRESSION> ')'
  if (tokens.get(currentToken).getToken().equals("integer")) {
    currentToken++;
  } else if (tokens.get(currentToken).getToken().equals("identifier")) {
    currentToken++;
  } else if (tokens.get(currentToken).getWord().equals("(")) {
    currentToken++;
    RULE_EXPRESSION();
    if (tokens.get(currentToken).getWord().equals(")")) {
      currentToken++;
    } else {
      error(4);
    }
  } else {
    error(5);
  }
}

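The rule methods above depend on the course's Token class and lexer. As a stand-alone illustration of the same technique, the sketch below recognizes Grammar 2 over plain strings. MiniParser, accepts, and the whitespace-based tokenization are assumptions made for this example, and it only reports accept or reject instead of calling error().

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone recognizer for Grammar 2; it works on plain
// strings instead of the course's Token class.
class MiniParser {
    private final List<String> tokens;
    private int current = 0;
    private boolean ok = true;

    MiniParser(String input) { tokens = Arrays.asList(input.trim().split("\\s+")); }

    static boolean accepts(String input) {
        MiniParser p = new MiniParser(input);
        p.program();
        return p.ok && p.current == p.tokens.size();
    }

    private boolean at(String w) { return current < tokens.size() && tokens.get(current).equals(w); }
    private boolean atAny(String... ws) { for (String w : ws) if (at(w)) return true; return false; }
    private void expect(String w) { if (at(w)) current++; else ok = false; }

    // <PROGRAM> → '{' <BODY> '}'
    private void program() { expect("{"); body(); expect("}"); }

    // <BODY> → {<EXPRESSION> ';'}   (stops at the first error)
    private void body() {
        while (ok && current < tokens.size() && !at("}")) { expression(); expect(";"); }
    }

    // { ... } becomes a while loop, [ ... ] becomes an if
    private void expression() { x(); while (at("|")) { current++; x(); } }
    private void x() { y(); while (at("&")) { current++; y(); } }
    private void y() { if (at("!")) current++; r(); }
    private void r() { e(); while (atAny(">", "<", "==", "!=")) { current++; e(); } }
    private void e() { a(); while (atAny("+", "-")) { current++; a(); } }
    private void a() { b(); while (atAny("*", "/")) { current++; b(); } }
    private void b() { if (at("-")) current++; c(); }

    // <C> → integer | identifier | '(' <EXPRESSION> ')'
    private void c() {
        if (current < tokens.size()
                && tokens.get(current).matches("[0-9]+|[A-Za-z_][A-Za-z0-9_]*")) {
            current++;
        } else if (at("(")) {
            current++;
            expression();
            expect(")");
        } else {
            ok = false;
        }
    }
}
```

For instance, `MiniParser.accepts("{ 0 ; 1 + 2 ; 3 * ( 4 + hello ) ; }")` returns true, while `MiniParser.accepts("{ 1 + ; }")` returns false because rule C finds no operand after the +.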
Homework

Programming Assignment 2
Level 1

Review and understand the source code posted in Blackboard (particularly the use of DefaultMutableTreeNode).



Homework

Programming Assignment 2
Level 2

Modify the source code to include the rules PROGRAM, BODY, EXPRESSION, X, Y, and R (from Grammar 2).



CSE340 - Principles of
Programming Languages

Lecture 12:

Parser Implementation II

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-21
Office Hours: By appointment


Review

* Parser.java is the only file that you are allowed to modify


Programming Assignment 2
Level 3

The complete grammar for our language



Assignment 2 | Language

Language

Expressions (operators)
  ARITHMETIC OPERATORS { +, -, *, /, = }
  LOGIC OPERATORS { &, |, ! }
  RELATIONAL OPERATORS { <, >, ==, != }

Actions (instructions)
  KEYWORD { return, print }

Control structures
  KEYWORD { if, else, while, switch, case }

Data
  KEYWORD { void, int, char, string, float, boolean }
  KEYWORD { true, false }
  BINARY
  INTEGER
  FLOAT
  HEXADECIMAL
  OCTAL
  CHAR
  STRING
  IDENTIFIER

Delimiter
  : ; , ( ) { } [ ]

Assignment 2 | Grammar

<PROGRAM> à '{' <BODY> '}’

<BODY> à {<PRINT>';'|<ASSIGNMENT>';'|<VARIABLE>';’|<WHILE>|<IF>|<RETURN>';'}
<ASSIGNMENT> à identifier '=' <EXPRESSION>
<VARIABLE> à ('int'|'float'|'boolean'|'char’|'string'|'void')identifier
<WHILE> à 'while' '(' <EXPRESSION> ')' <PROGRAM>
<IF> à 'if' '(' <EXPRESSION> ')' <PROGRAM> ['else' <PROGRAM>]
<RETURN> à 'return'
<PRINT> à ’print’ ‘(‘ <EXPRESSION> ‘)’

<EXPRESSION> à <X> {'|' <X>}


<X> à <Y> {'&' <Y>}
<Y> à ['!'] <R>
<R> à <E> {('>'|'<'|'=='|'!=') <E>}
<E> à <A> {(’+'|'-’) <A>}
<A> à <B> {('*'|'/') <B>}
<B> à ['-'] <C>
<C> à integer | octal | hexadecimal | binary | true | false |
string | char | float | identifier|'(' <EXPRESSION> ')'

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 8


Assignment 2 | Input

Are there syntactical errors?

{
float a;
x = 0;
int x;
y = 1 + 1;
x = (0b11) + (05 - 0xFF34);
while (2 == "hi") {
a = 2 > (4 + Y);
if (true) { if( 2 + 2 ) {} else {} }
}
print ("hello" + "world");
}



Assignment 2 | Input

Are there syntactical errors?

{
int x;
x = 5;
x = 05;
x = 0x5ff;
x = 5.55;
x = "five";
x = '5';
x = false;
}



Assignment 2 | Input

Are there syntactical errors?

{
int x;
float x;
string x;
char x;
void x;
boolean x;
}



Assignment 2 | Input

Are there syntactical errors?

{
x = "hello" + "world" - 'w' * 5 / 3.4;
x = y - hello & 0xffff | 05;
x = -7;
x = !y;
x = (cse340 + cse310) / cse101 ;
}



Assignment 2 | Code

------------ program(root);



Assignment 2 | Code

PROGRAM

public static void RULE_PROGRAM() {
  // <PROGRAM> → '{' <BODY> '}'
  if (tokens.get(currentToken).getWord().equals("{"))
    currentToken++;
  else
    error(1);

  RULE_BODY();

  if (tokens.get(currentToken).getWord().equals("}"))
    currentToken++;
  else
    error(2);
}

Assignment 2 | Code
BODY

public static void RULE_BODY() {
  while (!tokens.get(currentToken).getWord().equals("}")) {
    if (tokens.get(currentToken).getToken().equals("identifier")) {
      // <ASSIGNMENT> ';'
      RULE_ASSIGNMENT();
      if (tokens.get(currentToken).getWord().equals(";"))
        currentToken++;
      else
        error(3);
    } else if (tokens.get(currentToken).getWord().equals("int") || ...) {
      // <VARIABLE> ';'
      RULE_VARIABLE();
      if (tokens.get(currentToken).getWord().equals(";"))
        currentToken++;
      else
        error(3);
    } else if (tokens.get(currentToken).getWord().equals("while")) {
      RULE_WHILE();
    } else if (tokens.get(currentToken).getWord().equals("if")) {
      RULE_IF();
    } else if (tokens.get(currentToken).getWord().equals("return")) {
      // <RETURN> ';'
      RULE_RETURN();
      if (tokens.get(currentToken).getWord().equals(";"))
        currentToken++;
      else
        error(3);
    } else {
      error(4);
    }
  }
}

Assignment 2 | Code

ASSIGNMENT

public static void RULE_ASSIGNMENT() {

}

Assignment 2 | Code

VARIABLE

public static void RULE_VARIABLE() {

}

Assignment 2 | Code

WHILE

public static void RULE_WHILE() {

}

Assignment 2 | Code

IF

public static void RULE_IF() {

}

Assignment 2 | Code

RETURN

public static void RULE_RETURN() {

}

Assignment 2 | Code

PRINT

public static void RULE_PRINT () {

}

Assignment 2 | Code

C

public static void RULE_C() {

}

PREDICTIVE RECURSIVE DESCENT PARSER

Concepts

PREDICTIVE RECURSIVE DESCENT PARSER

{
int a;
a = 0xFF + 0b111;
while (a != 05) {
if (true) {
a = 2.5e-1 / 7;
} else {
a = 'A';
while(true) {
}
}
}

}
print ("hello");

Homework

Programming Assignment #2

(Complete Levels 1 to 3)



CSE340 - Principles of
Programming Languages

Lecture 13:

Parsing Techniques I

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-38
Office Hours: By appointment
Assignment 2

Level 1 (3 rules): Understand the provided source code

Level 2 (11 rules): Program the rules PROGRAM, BODY, EXPRESSION, X, Y, R, E, C

Level 3 (16 rules): Program the full set of rules in the grammar

Level 4 (PANIC MODE): Report syntactical errors (one error and stop)

Level 5 (ERROR RECOVERY): Implement error synchronization


Programming Assignment 2
Level 4

Handling Syntactical Errors (part 1): Error messages



Error Synchronization
[Figure: syntax diagrams for PROGRAM, BODY, RETURN, PRINT, ASSIGNMENT, VARIABLE, WHILE, IF, EXPRESSION, X, Y, R, E, A, B, and C]

Assignment 2

Input:
{}

Output:
Build successful



Assignment 2

Input:
{
hello word
}

Output:
Line 2: expected =
Line 2: expected ;



Assignment 2

Input:
{
int x
int
int int x;
}

Output:
Line 2: expected ;
Line 3: expected identifier
Line 3: expected ;
Line 4: expected identifier



Assignment 2

Input:
{
x = a;
x = 0x36AW;
x = ((((((((((y))))))))));
x = (5+(4-(3+(5+5/(2+(3+(1+(77+(1-(y)))))))))) + "hello" + 'q';
if (a < b) {} else {}
if (a < b) {
if (a < b) {
} else {
}
}
}

Output:
Line 3: expected value, identifier or (



Parser | Error Points

PROGRAM
  Line N: expected {
  Line N: expected }

BODY
  Line N: expected ;
  Line N: expected identifier or keyword

ASSIGNMENT
  Line N: expected =

VARIABLE
  Line N: expected identifier

WHILE
  Line N: expected (
  Line N: expected )

IF
  Line N: expected (
  Line N: expected )

RETURN
  (no error points)

PRINT
  Line N: expected (
  Line N: expected )

EXPRESSION
  (no error points)

C
  Line N: expected value, identifier or (
  Line N: expected )

Assignment 2
public static void error(int err) {
  // Line of the token where the error was detected
  int n = tokens.get(currentToken).getLine();
  switch (err) {
    case 1: gui.writeConsole("Line " + n + ": expected {"); break;
    case 2: gui.writeConsole("Line " + n + ": expected }"); break;
    case 3: gui.writeConsole("Line " + n + ": expected ;"); break;
    case 4:
      gui.writeConsole("Line " + n + ": expected identifier or keyword");
      break;
    case 5:
      gui.writeConsole("Line " + n + ": expected ="); break;
    case 6:
      gui.writeConsole("Line " + n + ": expected identifier"); break;
    case 7:
      gui.writeConsole("Line " + n + ": expected )"); break;
    case 8:
      gui.writeConsole("Line " + n + ": expected ("); break;
    case 9:
      gui.writeConsole("Line " + n + ": expected value, identifier, (");
      break;
  }
}

Updates
1. In Gui.java: make the method writeConsole public

2. In Gui.java: pass this as a second argument to the method Parser.run()

3. In Parser.java:
add the attribute gui (line 15)
add a second parameter to the method run (line 17)
initialize gui (line 18)


Programming Assignment 2
Level 5

Handling Syntactical Errors (part 2): Error Recovery


Error Recovery

Input:
{
x = a;
x = 1 + (
x = (y;)
if (a < b + ) {} else {}
if (a < b) {
if (a < b) {
} else {

}
}

Note: We will not allow multi-line expressions; lines 3 and 4 should not be considered as follows:
x = 1 + ( x = ( y;)

Error Recovery

Output:
Line 3: expected value, identifier, (
Line 3: expected )
Line 3: expected ;
// move to the next line

Line 4: expected )
Line 4: expected identifier or keyword
// infinite loop or end

Line 5: expected value, identifier, (
// simple

Line 12: expected }
// reported by program


Error Recovery

At this point:

• Errors do not increment currentToken.

• currentToken increases when the token is used (added to the tree).

• Error recovery is about ignoring tokens.

• How do we know which tokens should be ignored?


Error Recovery

to be continued...



Homework

Programming Assignment #2



CSE340 - Principles of
Programming Languages

Lecture 14:

Parsing Techniques II

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-38
Office Hours: By appointment


Parser | Error Recovery

PROGRAM
  Line N: expected {
    currentToken++; searching for FIRST(BODY) or }
  Line N: expected }

BODY
  Line N: expected ;
  Line N: expected identifier or keyword
    currentToken++; searching for FIRST(BODY) or FOLLOW(BODY)

ASSIGNMENT
  Line N: expected =
    currentToken++; searching for FIRST(EXPRESSION) or FOLLOW(EXPRESSION)

VARIABLE
  Line N: expected identifier

WHILE
  Line N: expected (
    currentToken++; searching for FIRST(EXPRESSION) or )
  Line N: expected )
    currentToken++; searching for FIRST(PROGRAM) or FOLLOW(PROGRAM)

IF
  Line N: expected (
    currentToken++; searching for FIRST(EXPRESSION) or )
  Line N: expected )
    currentToken++; searching for FIRST(PROGRAM) or FOLLOW(PROGRAM)

RETURN
  (no error points)

PRINT
  Line N: expected (
    currentToken++; searching for FIRST(EXPRESSION) or )
  Line N: expected )

EXPRESSION
  (no error points)

C
  Line N: expected value, identifier or (
  Line N: expected )

Parser | Error Recovery
Rule: FIRST set / FOLLOW set

PROGRAM: FIRST = '{' / FOLLOW = EOF
BODY: FIRST = FIRST(PRINT) U FIRST(ASSIGNMENT) U FIRST(VARIABLE) U FIRST(WHILE) U FIRST(IF) U FIRST(RETURN) / FOLLOW = '}'
PRINT: FIRST = print / FOLLOW = ';'
ASSIGNMENT: FIRST = identifier / FOLLOW = ';'
VARIABLE: FIRST = int, float, boolean, void, char, string / FOLLOW = ';'
WHILE: FIRST = while / FOLLOW = '}' U FIRST(BODY)
IF: FIRST = if / FOLLOW = '}' U FIRST(BODY)
RETURN: FIRST = return / FOLLOW = ';'
EXPRESSION: FIRST = FIRST(X) / FOLLOW = ')', ';'
X: FIRST = FIRST(Y) / FOLLOW = '|' U FOLLOW(EXPRESSION)
Y: FIRST = '!' U FIRST(R) / FOLLOW = '&' U FOLLOW(X)
R: FIRST = FIRST(E) / FOLLOW = FOLLOW(Y)
E: FIRST = FIRST(A) / FOLLOW = '!=', '==', '>', '<' U FOLLOW(R)
A: FIRST = FIRST(B) / FOLLOW = '-', '+' U FOLLOW(E)
B: FIRST = '-' U FIRST(C) / FOLLOW = '*', '/' U FOLLOW(A)
C: FIRST = integer, octal, hexadecimal, binary, true, false, string, char, float, identifier, '(' / FOLLOW = FOLLOW(B)


Parser | Error Recovery

if ( tokens.get(currentToken).getLine() <
     tokens.get(currentToken+1).getLine() ) {

  // go back until reaching RULE_BODY()

}
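The "currentToken++; searching for ..." steps amount to skipping tokens until one from a synchronization set appears. A minimal sketch of that loop follows; it is not the assignment's solution. PanicMode, synchronize, and the use of plain strings for tokens are hypothetical, and the synchronization set would be built from the FIRST and FOLLOW sets listed in the table.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper; the names below are not from the course code.
class PanicMode {
    // Returns the index of the first token at or after 'from' that belongs
    // to the synchronization set, or tokens.size() if there is none.
    static int synchronize(List<String> tokens, int from, Set<String> syncSet) {
        int i = from;
        while (i < tokens.size() && !syncSet.contains(tokens.get(i))) {
            i++; // ignore this token
        }
        return i;
    }
}
```

A rule method would then assign the returned index to currentToken after reporting the error, so that parsing resumes at a token the grammar can continue from.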



Homework

Programming Assignment #2



CSE340 - Principles of
Programming Languages

Lecture 15:

Parsing Techniques III

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-38
Office Hours: By appointment


Calculating the First Set



FIRST set

Definition

FIRST(α) is the set of tokens that can begin the construction α.

Example

<E> → <A> {+ <A>}
<A> → <B> {* <B>}
<B> → -<C> | <C>
<C> → integer

FIRST(E) = {-, integer}
FIRST(A) = {-, integer}
FIRST(B) = {-, integer}
FIRST(C) = {integer}


FIRST set

Define FIRST (BODY)


FIRST(BODY) =
FIRST (PRINT) U FIRST (ASSIGNMENT) U FIRST(VARIABLE) U FIRST(WHILE) U
FIRST(IF) U FIRST(RETURN)

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 5


FIRST set

Define FIRST (C)

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 6


FIRST set

Define FIRST (A)

Define FIRST (B)

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 7


FIRST set

<S> → <A><B><C>
<S> → <F>
<A> → <E><F>d
<A> → a
<B> → a<B>b
<B> → ε
<C> → c<C>
<C> → d
<E> → e<E>
<E> → <F>
<F> → <F>f
<F> → ε
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 8
Calculate the FIRST set

1. FIRST(X) = {X} if X is a terminal.

2. FIRST(ε) = {ε}. Note that this is not covered by the first rule because ε is not a terminal.

3. If A → Xα, add FIRST(X) − {ε} to FIRST(A).

4. If A → A1 A2 A3 ... Ai Ai+1 ... Ak and ε ∈ FIRST(A1) and ε ∈ FIRST(A2) and ... and ε ∈ FIRST(Ai), then add FIRST(Ai+1) − {ε} to FIRST(A).

5. If A → A1 A2 A3 ... Ak and ε ∈ FIRST(A1) and ε ∈ FIRST(A2) and ... and ε ∈ FIRST(Ak), then add ε to FIRST(A).
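
The five rules can be applied repeatedly until no FIRST set changes. The Java sketch below is one possible way to do that for the S/A/B/C/E/F grammar shown earlier; the class name FirstSets, the helper names, and the space-separated encoding of each production are assumptions made for illustration, not part of the course material.

```java
import java.util.*;

class FirstSets {
    static final String EPS = "\u03b5"; // the empty string, ε

    static Set<String> set(String... s) { return new TreeSet<>(Arrays.asList(s)); }

    // Each alternative is written as space-separated symbols (illustrative encoding).
    static List<List<String>> alts(String... rhs) {
        List<List<String>> out = new ArrayList<>();
        for (String r : rhs) out.add(Arrays.asList(r.split(" ")));
        return out;
    }

    static Map<String, List<List<String>>> grammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("S", alts("A B C", "F"));
        g.put("A", alts("E F d", "a"));
        g.put("B", alts("a B b", EPS));
        g.put("C", alts("c C", "d"));
        g.put("E", alts("e E", "F"));
        g.put("F", alts("F f", EPS));
        return g;
    }

    static Map<String, Set<String>> first(Map<String, List<List<String>>> g) {
        Map<String, Set<String>> first = new LinkedHashMap<>();
        for (String nt : g.keySet()) first.put(nt, new TreeSet<>());
        boolean changed = true;
        while (changed) {                       // repeat until nothing is added
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : g.entrySet()) {
                Set<String> target = first.get(e.getKey());
                for (List<String> alt : e.getValue()) {
                    boolean allNullable = true;
                    for (String sym : alt) {
                        if (sym.equals(EPS)) continue;          // rule 2
                        if (!g.containsKey(sym)) {              // rule 1: terminal
                            changed |= target.add(sym);
                            allNullable = false;
                            break;
                        }
                        // rules 3 and 4: copy FIRST(sym) without ε
                        for (String t : new ArrayList<>(first.get(sym))) {
                            if (!t.equals(EPS)) changed |= target.add(t);
                        }
                        if (!first.get(sym).contains(EPS)) {
                            allNullable = false;
                            break;
                        }
                    }
                    if (allNullable) changed |= target.add(EPS); // rule 5
                }
            }
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(first(grammar()));
    }
}
```

Under these assumptions the loop should settle on the same sets as the final column of the evolution table, for example FIRST(A) = {a, e, f, d}.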

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 9




FIRST set

Grammar          Rule   FIRST set evolution
S → ABC | F      S      ø, {a, ε}, {a, ε, e, f}, {a, ε, e, f, d}
A → EFd | a      A      ø, {a}, {a, e}, {a, e, f, d}
B → aBb | ε      B      ø, {a, ε}
C → cC | d       C      ø, {c, d}
E → eE | F       E      ø, {e}, {e, ε}, {e, ε, f}
F → Ff | ε       F      ø, {ε}, {ε, f}

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 11


FIRST set | Exercise

<X> → <A> | <A> a


<A> → <B> | <B> b
<B> → <C><D><E> | c d e | <C> c <D> d <E> e
<C> → one
<D> → two
<E> → three

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 12


FIRST set | Exercise

<X> → <A>
<X> → <A> a
<A> → <B>
<A> → <B> b
<B> → <C><D><E>
<B> → c d e
<B> → <C> c <D> d <E> e
<C> → one
<D> → two
<E> → three

OPTION 1:
FIRST(X) = {c, one}
FIRST(A) = {c, one}
FIRST(B) = {c, one}
FIRST(C) = {one}
FIRST(D) = {two}
FIRST(E) = {three}

OPTION 2:
FIRST(X) = {c, one, ε}
FIRST(A) = {b, c, one}
FIRST(B) = {c, one}
FIRST(C) = {one}
FIRST(D) = {two}
FIRST(E) = {three}

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 13


FIRST set | Exercise

<X> → <A>
<X> → <A> a
<A> → <B>
<A> → <B> b
<B> → <C><D><E>
<B> → c d e
<B> → <C> c <D> d <E> e
<C> → one
<C> → ε
<D> → two
<D> → ε
<E> → three

FIRST(X) = {c, one, three, two}
FIRST(A) = {c, one, three, two}
FIRST(B) = {c, one, three, two}
FIRST(C) = {one, ε}
FIRST(D) = {two, ε}
FIRST(E) = {three}

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 14


Calculating the Follow Set

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 15


FOLLOW set
Definition

FOLLOW (a) is the set of tokens that can follow the construction a.

Example

<E> → <A> {+ <A>}


<A> → <B> {* <B>}
<B> → -<C> | <C>
<C> → integer

5 + 4 + -7 * 12 + 75
5 + 4 + ((-7) * 12) + 75

What follows <C> ?


What follows <A> ?
What follows <E> ?
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 16
FOLLOW set
Definition

FOLLOW (a) is the set of tokens that can follow the construction a.

Example

<E> → <A> {+ <A>}


<A> → <B> {* <B>}
<B> → -<C> | <C>
<C> → integer

FOLLOW(E) = {$} // $ represents end of input, i.e., EOF


FOLLOW(A) = {+, $}
FOLLOW(B) = {*, +, $}
FOLLOW(C) = {*, +, $}

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 17


FOLLOW set

Define FOLLOW (BODY)


FOLLOW(BODY) = }

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 18


FOLLOW set

Define FOLLOW (C)

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 19


FOLLOW set

<S> → <A><B><C>
<S> → <F>
<A> → <E><F>d
<A> → a
<B> → a<B>b
<B> → ε
<C> → c<C>
<C> → d
<E> → e<E>
<E> → <F>
<F> → <F>f
<F> → ε
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 20
Calculate the FOLLOW set

1. First put $ (the end of input marker) in FOLLOW(S), where S is the start symbol.

2. If there is a production A → aBb (where a can be a whole string), then everything in FIRST(b) except for ε is placed in FOLLOW(B) (apply rule 4 of the FIRST set calculation).

3. If there is a production A → aB, then add FOLLOW(A) to FOLLOW(B).

4. If there is a production A → aBb, where FIRST(b) contains ε, then add FOLLOW(A) to FOLLOW(B).
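
As with FIRST, the four rules can be iterated until the sets stop growing. The following Java sketch does this for the same S/A/B/C/E/F grammar, taking the FIRST sets as given input (the values listed on the FOLLOW evolution slide). The names FollowSets, alts, and firstOf are hypothetical helpers, and $ stands for the eof marker.

```java
import java.util.*;

class FollowSets {
    static final String EPS = "\u03b5"; // the empty string, ε
    static final String EOF = "$";      // end of input marker

    static Set<String> set(String... s) { return new TreeSet<>(Arrays.asList(s)); }

    static List<List<String>> alts(String... rhs) {
        List<List<String>> out = new ArrayList<>();
        for (String r : rhs) out.add(Arrays.asList(r.split(" ")));
        return out;
    }

    static Map<String, List<List<String>>> grammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("S", alts("A B C", "F"));
        g.put("A", alts("E F d", "a"));
        g.put("B", alts("a B b", EPS));
        g.put("C", alts("c C", "d"));
        g.put("E", alts("e E", "F"));
        g.put("F", alts("F f", EPS));
        return g;
    }

    // FIRST sets copied from the slide, used here as given input.
    static Map<String, Set<String>> firstSets() {
        Map<String, Set<String>> f = new HashMap<>();
        f.put("S", set("a", "e", "f", "d", EPS));
        f.put("A", set("a", "e", "f", "d"));
        f.put("B", set("a", EPS));
        f.put("C", set("c", "d"));
        f.put("E", set("e", "f", EPS));
        f.put("F", set("f", EPS));
        return f;
    }

    // FIRST of one symbol: a terminal (or ε) is its own FIRST set.
    static Set<String> firstOf(String sym, Map<String, Set<String>> first) {
        return first.containsKey(sym) ? first.get(sym) : Collections.singleton(sym);
    }

    static Map<String, Set<String>> follow(Map<String, List<List<String>>> g,
                                           Map<String, Set<String>> first, String start) {
        Map<String, Set<String>> follow = new LinkedHashMap<>();
        for (String nt : g.keySet()) follow.put(nt, new TreeSet<>());
        follow.get(start).add(EOF);                               // rule 1
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : g.entrySet()) {
                for (List<String> alt : e.getValue()) {
                    for (int i = 0; i < alt.size(); i++) {
                        if (!g.containsKey(alt.get(i))) continue; // only nonterminals
                        Set<String> target = follow.get(alt.get(i));
                        boolean restNullable = true;
                        for (int j = i + 1; j < alt.size() && restNullable; j++) {
                            Set<String> f = firstOf(alt.get(j), first);
                            for (String t : f) {                  // rule 2
                                if (!t.equals(EPS)) changed |= target.add(t);
                            }
                            restNullable = f.contains(EPS);
                        }
                        if (restNullable) {                       // rules 3 and 4
                            changed |= target.addAll(new ArrayList<>(follow.get(e.getKey())));
                        }
                    }
                }
            }
        }
        return follow;
    }

    public static void main(String[] args) {
        System.out.println(follow(grammar(), firstSets(), "S"));
    }
}
```

Rules 3 and 4 collapse into a single check here: whenever everything to the right of B can derive ε (including the case where nothing is to the right), FOLLOW(A) is copied into FOLLOW(B).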

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 21


Calculate the FOLLOW set

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 22


FOLLOW set

FIRST sets: S = {a, ε, e, f, d}, A = {a, e, f, d}, B = {a, ε}, C = {c, d}, E = {e, ε, f}, F = {ε, f}

Grammar          Rule   FOLLOW set evolution
S → ABC | F      S      {eof}
A → EFd | a      A      {a}, {a, c, d}
B → aBb | ε      B      {c, d}, {c, d, b}
C → cC | d       C      {eof}
E → eE | F       E      {f}, {f, d}
F → Ff | ε       F      {eof}, {eof, d}, {eof, d, f}

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 23


Another Example

<E> → <T> {+<T>}


<T> → <F> {*<F>}
<F> → (<E>) | integer

FIRST (E) = {(, integer}


FIRST (T) = {(, integer}
FIRST (F) = {(, integer}

FOLLOW(E) = {$, )}
FOLLOW(T) = {$, ),+ }
FOLLOW(F) = {$, ),+, * }

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 24


Prediction Rules

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 25


Prediction Rules

Rule 1.
It should always be possible to choose among several
alternatives in a grammar rule.

FIRST(R1) ∩ FIRST(R2) ∩ FIRST(R3) ∩ ... ∩ FIRST(Rn) = Ø

BODY

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 26


Prediction Rules

Rule 1.1
The FIRST sets of any two choices in one rule must not
have tokens in common in order to implement a single-
symbol look ahead predictive parser.
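
Rule 1.1 can be checked mechanically once the FIRST sets of the alternatives are known. The Java sketch below tests the alternatives of BODY, using the FIRST sets from the error-recovery table; the class and method names are illustrative assumptions.

```java
import java.util.*;

class PredictionRuleCheck {
    static Set<String> set(String... tokens) {
        return new HashSet<>(Arrays.asList(tokens));
    }

    // Returns true when no two FIRST sets share a token (Rule 1.1).
    static boolean pairwiseDisjoint(List<Set<String>> firstSets) {
        for (int i = 0; i < firstSets.size(); i++) {
            for (int j = i + 1; j < firstSets.size(); j++) {
                if (!Collections.disjoint(firstSets.get(i), firstSets.get(j))) {
                    return false;
                }
            }
        }
        return true;
    }

    // FIRST sets of the six alternatives of BODY, taken from the table.
    static List<Set<String>> bodyAlternatives() {
        return Arrays.asList(
            set("print"),                                            // PRINT
            set("identifier"),                                       // ASSIGNMENT
            set("int", "float", "boolean", "void", "char", "string"), // VARIABLE
            set("while"),                                            // WHILE
            set("if"),                                               // IF
            set("return"));                                          // RETURN
    }

    public static void main(String[] args) {
        System.out.println(pairwiseDisjoint(bodyAlternatives()));
    }
}
```

If two alternatives both started with identifier, for instance, the check would fail and one token of look-ahead would not be enough to choose between them.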

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 27


Prediction Rules

Rule 2.
For any optional part, no token beginning the optional part
can also come after the optional part.

FIRST(RULE) ∩ FOLLOW(RULE) = Ø

BODY PROGRAM

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 28


Homework

Programming Assignment #2

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 29


CSE340 - Principles of
Programming Languages
Lecture 16:

Grammars III

Javier Gonzalez-Sanchez

[email protected]
BYENG M1-38
Office Hours: By appointment
Chomsky Hierarchy

*Noam Chomsky
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 2
Chomsky Hierarchy

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 3


Chomsky Hierarchy

Type  Name                    Example Use                             Recognizing Automata      Parsing Complexity
0     Recursively Enumerable                                          Turing Machine            Undecidable
1     Context Sensitive                                               Linear Bounded Automata   NP Complete
2     Context Free            Arithmetic expression: x = a + b * 75   Pushdown Automata         O(n^3)
3     Regular                 Identifier: a110                        Finite Automata           O(n)

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 4


Chomsky Hierarchy

Type 0 - Recursively Enumerable


structure: α → β
where α and β are any string of terminals and nonterminals

Type 1 - Context-sensitive
structure: αXβ → αγβ
where X is a non-terminal, and α,β,γ are any string of terminals and nonterminals, (γ
must be non-empty).

Type 2 - Context-free
structure: X → γ|ε
where X is a nonterminal and γ is any string of terminals and nonterminals (may be
empty). It is discouraged to use only one nonterminal as γ.

Type 3 – Regular
structure: X → αY | α |ε
where X,Y are single nonterminals, and α is a string of terminals;

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 5


Regular Grammar

Type 3 – Regular
structure: X → αY | α |ε
where X,Y are single nonterminals, and α is a string of terminals;

<DIGIT> → integer | integer<DIGIT>

<Q0> → a<Q1> | b<Q0>


<Q1> → a<Q1> | b<Q0> | ε

<S> → a<S> |b<A>


<A> → c<A> | ε

<A> → ε
<A> → a
<A> → <B>a
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 6

 
Regular Grammar  

Regular grammars are either left or right:

Right Regular Grammars: rules of the forms
<A> → ε
<A> → a
<A> → a<B>

Left Regular Grammars: rules of the forms
<A> → ε
<A> → a
<A> → <B>a

In both cases A and B are nonterminals and a is a terminal.

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 7


Regular Grammar

• Key point: The derivation process is “linear”

• Example of Derivation process:

<S> → a<S> | b<A>


<A> → c<A> | ε
 
The grammar is equivalent to the regular expression a*bc*

S → aS → aaS → …
→ a…aS →a…abA →a…abcA
→ a…abccA → …
→ a…abc…c
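
Because the derivation is linear, the grammar can be read as a small automaton with one state per nonterminal. The Java sketch below follows the two rules of the example directly and also offers a comparison against the regular expression a*bc*; the class and method names are assumptions for illustration.

```java
import java.util.regex.Pattern;

class RegularGrammarDemo {
    // <S> → a<S> | b<A>  and  <A> → c<A> | ε, read as a two-state automaton.
    static boolean derives(String input) {
        char state = 'S';
        for (char ch : input.toCharArray()) {
            if (state == 'S' && ch == 'a') {
                state = 'S';            // <S> → a<S>
            } else if (state == 'S' && ch == 'b') {
                state = 'A';            // <S> → b<A>
            } else if (state == 'A' && ch == 'c') {
                state = 'A';            // <A> → c<A>
            } else {
                return false;           // no rule applies
            }
        }
        return state == 'A';            // only <A> can finish with ε
    }

    // The equivalent regular expression from the slide.
    static boolean matchesRegex(String input) {
        return Pattern.matches("a*bc*", input);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"b", "aabcc", "", "abca"}) {
            System.out.println(s + " " + derives(s) + " " + matchesRegex(s));
        }
    }
}
```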

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 8


Context-Free Grammar

Type 2 - Context-free
structure: X → γ|ε

where X is a nonterminal and γ is any string of terminals and


nonterminals (may be empty). It is discouraged to use only one
nonterminal as γ.

<S> → <A><S> | ε
<A> → 0 <A> 1| <A>1 | 0 1

<S> → <NP><VP>
<NP> → the <N>
<VP> → <V><NP>
<V> → sings | eats
<N> → cat | song | canary  

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 9


Context-Sensitive Grammar

Type 1 - Context-sensitive
structure: αXβ → αγβ

where X is a non-terminal, and α,β,γ are any string of terminals


and nonterminals, (γ must be non-empty).

<S>→ a<S><B><C>
<S> → a<B><C>
<C><B> → <H><B>
<H><B> → <H><C>
<H><C> → <B><C>
a<B> → a b
b<B> → b b
b<C> → b c
c<C> → c c
*The language generated by this grammar is {a^n b^n c^n | n ≥ 1}.


Exercise

G1 {
<s> → <A>
<s> → <A><A><B>
<A> → aa
<A>a → <A><B>a
<A><B> → <A><B><B>
<B>b → <A><B>b
<B> → b
}

G2 {
<s> → b<s>
<s> → a<A>
<s> → b
<A> → a<s>
<A> → b<A>
<A> → a
}

G3 {
<s> → <B><A><B>
<s> → <A><B><A>
<A> → <A><B>
<A> → a<A>
<A> → ab
<B> → <B><A>
<B> → b
}

G4 {
<s> → <A><B>
<A> → a<A>
<A> → a
<B> → <B>b
<B> → b
<A><B> → <B><A>
}

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 12


Examples

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 13


Types of Grammars

<E> → e<E>
<E> → <F>
<F> → <F>f
<F> → ε

Is this a regular grammar (X → αY | α |ε) ?

Is this a context-free grammar (X → γ|ε ) ?

Is this a context-sensitive grammar (αXβ → αγβ) ?

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 14


Types of Grammars

if <condition> then → if "(" <condition> ")" then

<condition> → <A> "=" <B>

Is this a regular grammar (X → αY | α |ε) ?

Is this a context-free grammar (X → γ|ε ) ?

Is this a context-sensitive grammar (αXβ → αγβ) ?

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 15


Types of Grammars

if "(" <condition> ")" then → if <condition> then

<condition> → <A> "=" <B>

Is this a regular grammar (X → αY | α |ε) ?

Is this a context-free grammar (X → γ|ε ) ?

Is this a context-sensitive grammar (αXβ → αγβ) ?

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 16


Types of Grammars

<A> → ε

Is this a regular grammar (X → αY | α |ε) ?

Is this a context-free grammar (X → γ|ε ) ?

Is this a context-sensitive grammar (αXβ → αγβ) ?

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 17




CSE340 - Principles of
Programming Languages
Lecture 17:

Semantic Analysis I

Javier Gonzalez-Sanchez
BYENG M1-38
Office Hours: By appointment
Semantic Analysis

Understanding the meaning

i.e.,

Interpreting expressions in their context

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 2


Semantic Analysis

1. Declaration and Unicity. Check that each variable is unique and has been declared before its usage.

2. Types. Check that the types of variables match the values assigned to them.

3. Array indexes. Check that the indexes are integers.

4. Conditions. Check that all expressions in the conditions return a boolean value.

5. Return type. Check that the value returned by a method matches the type of the method.

6. Parameters. Check that the parameters in a method call match in type and number with the declaration of the method.

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 3


Study Cases
Case 1:
int i;
char j; int m;
void method(int n, char c) {
  int n; short l;
  i = j; i = m;
}

Case 2:
int i, j;
void method() {
  int i = 5;
  int j = i + i;
  int i = i + i;
}

Case 3:
int i, m, k; boolean j;
void main() {
  if (i>5) { ++i; }
  while (i + 1) { ++i; }
  do {++i; } while (i);
  for (i = 0; m; ++i) {
    k++;
  }
}

Case 4:
int a; int b; int c, d;
char c1, c2;
int test1(int x, int y) {
  return x+y;
}
void main() {
  int i; i = a++;
  i = test1(a, b);
  i = test1(c1, c2);
  i = test1(a, c1);
}

Case 5:
int i, m; boolean j;
public void main() {
  int m; int a[];
  a = new int[j];
}

Case 6:
int i;
void main(int m) {
  i++; return i;
}

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 4


1. Variable Declaration and Unicity

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 5


Symbol Table
int i;
char j; int m;
void method(int n, char c) {
  int n; short l;
  i = j; i = m;
}
void method() {
  int i = 5;
  int j = i + i;
}
int k;
int method(int i) {
  if (i>5) { ++i; }
  while (i + 1) { ++i; }
  do {++i; } while (i);
  for (i = 0; m; ++i) {
    k++;
  }
}

name              type    scope             value
i                 int     global
j                 char    global
m                 int     global
method-int-char   void    function
n                 int     method-int-char
l                 short   method-int-char
method            void    function
i                 int     method
j                 int     method
k                 int     global
method-int        int     function
i                 int     method-int




Symbol Table
names             bindings (type, scope)
i                 (int, method-int), (int, method), (int, global)
j                 (int, method), (char, global)
m                 (int, global)
method-int-char   (void, function)
n                 (int, method-int-char)
l                 (short, method-int-char)
method            (void, function)
k                 (int, global)
method-int        (int, function)

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 8


Exercise

int a, b; char c, d; float e,f;

void foo1(int a) {
  // float a = 1;
  float w = a;
}

void foo2(char b) {
  int a = c + d;
}

int foo3() {
  int i = a + b;
}

Create the symbol table.

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 9


Programming Assignment 3

Level 1

Reviewing Declaration and Unicity

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 10


Symbol Table

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 11


Symbol Table

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 12


Grammar

<PROGRAM> → '{' <BODY> '}'
<BODY> → {<PRINT>';'|<ASSIGNMENT>';'|<VARIABLE>';'|<WHILE>|<IF>|<RETURN>';'}
<ASSIGNMENT> → identifier '=' <EXPRESSION>
<VARIABLE> → ('int'|'float'|'boolean'|'char'|'string'|'void') identifier
<WHILE> → 'while' '(' <EXPRESSION> ')' <PROGRAM>
<IF> → 'if' '(' <EXPRESSION> ')' <PROGRAM> ['else' <PROGRAM>]
<RETURN> → 'return'
<PRINT> → 'print' '(' <EXPRESSION> ')'
<EXPRESSION> → <X> {'|' <X>}
<X> → <Y> {'&' <Y>}
<Y> → ['!'] <R>
<R> → <E> {('>'|'<'|'=='|'!=') <E>}
<E> → <A> {('+'|'-') <A>}
<A> → <B> {('*'|'/') <B>}
<B> → ['-'] <C>
<C> → integer | octal | hexadecimal | binary | true | false |
      string | char | float | identifier | '(' <EXPRESSION> ')'

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 13


A3 :: Parser Update
VARIABLE

void rule_variable() {
    . . .

    if (tokens.get(currentToken).getType().equals("identifier")) {
        // previous token is the type, current token is the identifier
        SemanticAnalizer.CheckVariable(
            tokens.get(currentToken - 1).getWord(),
            tokens.get(currentToken).getWord()
        );
        currentToken++;
    } else {
        error(6);
    }

    . . .

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 14


A3 :: SemanticAnalyzer.java

public class SemanticAnalizer {

    // static, because CheckVariable is a static method
    private static Hashtable<String, Vector<SymbolTableItem>> symbolTable;

    public static void CheckVariable(String type, String id) {

        // A. search the id in the symbol table

        // B. if !exist then insert: type, scope=global, value={0, false, "", ''}

        // C. else error: "variable id is already defined"
    }
}
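
One possible shape for steps A to C is sketched below in Java. It is only an illustration under stated assumptions: the fields of SymbolTableItem, the class name SymbolTableSketch, the boolean return value, and the defaultValue helper are all hypothetical, and every variable is placed in the global scope as step B describes.

```java
import java.util.Hashtable;
import java.util.Vector;

// Hypothetical item layout: the slides only name the class.
class SymbolTableItem {
    final String type;
    final String scope;
    final String value;

    SymbolTableItem(String type, String scope, String value) {
        this.type = type;
        this.scope = scope;
        this.value = value;
    }
}

class SymbolTableSketch {
    private static final Hashtable<String, Vector<SymbolTableItem>> symbolTable =
        new Hashtable<>();

    // Returns true if the variable was inserted, false if it already existed.
    static boolean checkVariable(String type, String id) {
        // A. search the id in the symbol table
        if (symbolTable.containsKey(id)) {
            // C. the caller would report "variable id is already defined"
            return false;
        }
        // B. insert: type, scope = global, default value for the type
        Vector<SymbolTableItem> bindings = new Vector<>();
        bindings.add(new SymbolTableItem(type, "global", defaultValue(type)));
        symbolTable.put(id, bindings);
        return true;
    }

    // Default values as listed in step B: 0, false, "" and ''.
    static String defaultValue(String type) {
        switch (type) {
            case "int":
            case "float":
                return "0";
            case "boolean":
                return "false";
            default:
                return "";
        }
    }

    public static void main(String[] args) {
        System.out.println(checkVariable("int", "i"));
        System.out.println(checkVariable("char", "i"));
    }
}
```

Keeping a Vector of bindings per name leaves room for the same identifier to appear later in several scopes, as in the names/bindings view of the symbol table.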

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 15


A3 :: Review

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 16


Homework

Programming Assignment 3

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 17


CSE340 - Principles of
Programming Languages
Lecture 18:

Semantic Analysis II

Javier Gonzalez-Sanchez
BYENG M1-38
Office Hours: By appointment


2. Type Matching

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 3


Type matching | Example 1

int x, y, z;
char p, q, r;
float a, b, c;
boolean foo;

void method() {
  x = a * c + p;
}

[Figure: expression tree for x = a * c + p, with = at the root, x and + below it, and a, c, p as leaves]
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 4
Type matching | Cube

Fill one sheet of the cube for each operator in the language.
Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 5
Type matching | Cube

OPERATOR int float char string boolean void

int

float

char

string

boolean

void

cube

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 6


Type matching | Cube

OPERATOR +   int      float    char     string   boolean  void
int          int      float    X        string   X        X
float        float    float    X        string   X        X
char         X        X        X        string   X        X
string       string   string   string   string   string   X
boolean      X        X        X        string   X        X
void         X        X        X        X        X        X

cube

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 7


Type matching | Cube

OPERATOR &   int      float    char     string   boolean  void
int          X        X        X        X        X        X
float        X        X        X        X        X        X
char         X        X        X        X        X        X
string       X        X        X        X        X        X
boolean      X        X        X        X        boolean  X
void         X        X        X        X        X        X

cube

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 8


Type matching | Cube

OPERATOR <   int      float    char     string   boolean  void
int          boolean  boolean  X        X        X        X
float        boolean  boolean  X        X        X        X
char         X        X        X        X        X        X
string       X        X        X        X        X        X
boolean      X        X        X        X        X        X
void         X        X        X        X        X        X

cube

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 9


Type matching | Cube

OPERATOR =   int      float    char     string   boolean  void
int          OK       X        X        X        X        X
float        OK       OK       X        X        X        X
char         X        X        OK       X        X        X
string       X        X        X        OK       X        X
boolean      X        X        X        X        OK       X
void         X        X        X        X        X        OK

cube
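
The sheets above can be stored as two-dimensional arrays indexed by type. The Java sketch below encodes only the + and = sheets exactly as tabulated; the class name TypeCube, the parameter order (row type first, column type second), and returning "X" for unknown types or operators are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

class TypeCube {
    static final List<String> TYPES =
        Arrays.asList("int", "float", "char", "string", "boolean", "void");

    // Sheet for '+', rows and columns in the order of TYPES.
    static final String[][] PLUS = {
        {"int",    "float",  "X",      "string", "X",      "X"},
        {"float",  "float",  "X",      "string", "X",      "X"},
        {"X",      "X",      "X",      "string", "X",      "X"},
        {"string", "string", "string", "string", "string", "X"},
        {"X",      "X",      "X",      "string", "X",      "X"},
        {"X",      "X",      "X",      "X",      "X",      "X"}
    };

    // Sheet for '=': the row is the variable, the column is the assigned value.
    static final String[][] ASSIGN = {
        {"OK", "X",  "X",  "X",  "X",  "X"},
        {"OK", "OK", "X",  "X",  "X",  "X"},
        {"X",  "X",  "OK", "X",  "X",  "X"},
        {"X",  "X",  "X",  "OK", "X",  "X"},
        {"X",  "X",  "X",  "X",  "OK", "X"},
        {"X",  "X",  "X",  "X",  "X",  "OK"}
    };

    static String checkCube(String rowType, String columnType, String operator) {
        int row = TYPES.indexOf(rowType);
        int col = TYPES.indexOf(columnType);
        if (row < 0 || col < 0) return "X";   // unknown type
        switch (operator) {
            case "+": return PLUS[row][col];
            case "=": return ASSIGN[row][col];
            default:  return "X";             // sheet not encoded in this sketch
        }
    }

    public static void main(String[] args) {
        String sum = checkCube("float", "float", "+");
        System.out.println(sum + " " + checkCube("int", sum, "="));
    }
}
```

With these sheets, adding two floats yields float, and assigning that result to an int variable is rejected, which is the kind of mismatch the type-matching review is meant to catch.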

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 10


Type matching | Cube

cube (- unary)
OPERATOR -   int      float    char     string   boolean  void
             int      float    X        X        X        X

cube (- binary), shown on the slide under the operator label +
OPERATOR     int      float    char     string   boolean  void
int          int      float    X        string   X        X
float        float    float    X        string   X        X
char         X        X        X        string   X        X
string       string   string   string   string   string   X
boolean      X        X        X        string   X        X
void         X        X        X        X        X        X

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 11


Type matching | Example 2
symbol table
int a;

int c (int b) {
return b * 3 * 2 * 1 ;
}

void main () {
a = 1;
boolean a= c(14)/2 > 1;
}

cube for stack


matching types

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 13


Programming Assignment 3

Level 2

Reviewing Type Matching

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 14




Assignment 2 | Code
C

1.
// except for the open parenthesis

SemanticAnalizer.pushStack(
tokens.get(currentToken).getToken()
);

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 16


Parser
1. Store in a flag (operatorWasUsed): did the operator '-' exist?

2.
if (operatorWasUsed) {
    String x = SemanticAnalizer.popStack();
    String result = SemanticAnalizer.checkCube(x, "-");
    SemanticAnalizer.pushStack(result);
}

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 17


Parser
1. Store in a flag (twiceHere): did we pass this point twice?

2. Store the operator that creates the loop.

3.
if (twiceHere) {
    String x = SemanticAnalizer.popStack();
    String y = SemanticAnalizer.popStack();
    String result = SemanticAnalizer.checkCube(x, y, operator);
    SemanticAnalizer.pushStack(result);
    twiceHere = false; // reset the flag
}
Parser
1. Store in a flag (operatorWasUsed): did the operator '!' exist?

2.
if (operatorWasUsed) {
    String x = SemanticAnalizer.popStack();
    String result = SemanticAnalizer.checkCube(x, "!");
    SemanticAnalizer.pushStack(result);
}

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 21


Parser
1. Store in a flag (twiceHere): did we pass this point twice?

2.
if (twiceHere) {
    String x = SemanticAnalizer.popStack();
    String y = SemanticAnalizer.popStack();
    String result = SemanticAnalizer.checkCube(x, y, "&");
    SemanticAnalizer.pushStack(result);
    twiceHere = false; // reset the flag
}
Assignment 2 | Code

String x = SemanticAnalizer.popStack();
String y = SemanticAnalizer.popStack();

String result = SemanticAnalizer.checkCube(x, y, "=");

if (!result.equals("OK")) {
    error(2);
}
Homework

Programming Assignment 3

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 25


CSE340 - Principles of
Programming Languages
Lecture 19:

Semantic Analysis III

Javier Gonzalez-Sanchez
BYENG M1-38
Office Hours: By appointment


3. Conditions

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 3


Example
symbol table
int a, b;
boolean c;
{
a = 4;
b = a + 1;
if (a > b) {
print (a);
}
}

cube for stack


matching types

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 4


Programming Assignment 3

Level 3

Reviewing Conditions

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 5




Conditions

WHILE

String x = SemanticAnalizer.popStack();

// the condition of a while must evaluate to boolean
if (!x.equals("boolean")) {
    error(3);
}

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 7


Conditions

IF

String x = SemanticAnalizer.popStack();

// the condition of an if must evaluate to boolean
if (!x.equals("boolean")) {
    error(3);
}

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 8


Semantic Analysis

þ Declaration and Unicity. Review for uniqueness and that the variable
has been declared before its usage.

þ Type Matching. Review that the types of variables match the values
assigned to them.

¨ Array’s indexes. Review that the indexes are integers.

þ Conditions. Review that all expressions on the conditons return a


boolean value.

¨ Return type. Review that the value returned by a method match the
type of the method.

¨ Parameters. Review that the parameters in a method match in type


and number with the declaration of the method.

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 9


Workshop
SemanticAnalyzer.java

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 11


SemanticAnalyzer.java

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 12


SemanticAnalyzer.java

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 13


Homework

Programming Assignment 3

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 14


CSE340 - Principles of
Programming Languages
Lecture 20:

Intermediate Code I

Javier Gonzalez-Sanchez
BYENG M1-38
Office Hours: By appointment
A Compiler in Action

Analysis
  Lexer: Words → Tokens
  Parser: Sentences
  Semantic Analyzer: Symbol table, Uniqueness, Type matching

Compilation
  Code Generation and Assembly: Translation
    Source Code → Intermediate Code
    Intermediate Code → Machine or Binary Code

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 2


Source Code

{
  int a;
  int b;
  int c;
  int d;
  if (a != 5) {
    b = c + d;
  }
  print (a);
}

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 3


Intermediate (Object) Code
Symbol table:
a,int,global 0
b,int,global 0
c,int,global 0
d,int,global 0
#E1,int,label,9
#P,int,label,1
@

Instructions:
lod a, 0
lit 5, 0
opr 14, 0
jmc #e1, false
lod c, 0
lod d, 0
opr 2, 0
sto b, 0
lod a, 0
opr 21, 0
opr 1, 0

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 4


Translate Source to Object
Source                    Object
{
int a; int b;             a,int,global 0 / b,int,global 0
int c; int d;             c,int,global 0 / d,int,global 0
                          #E1,int,label,9 / #P,int,label,1 / @
if (a != 5) {             lod a, 0 / lit 5, 0 / opr 14, 0 / jmc #e1, false
  b = c + d;              lod c, 0 / lod d, 0 / opr 2, 0 / sto b, 0
}
print (a);                lod a, 0 / opr 21, 0
}                         opr 1, 0

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 5


High-Level Languages
[Figure: compilation (Lexer, Parser, Semantic Analyzer, Code Generation, Assembler) and execution (Virtual Machine, an interpreter); the program prints 5]

Source code:
int x;
int foo () {
  read (x);
  print (5);
}
main () {
  foo ();
}

Output of Code Generation:
X,E,G,O,O
#e1,I,I,0,7
@
OPR 19, AX
STO x, AX
LIT 5, AX
OPR 21, AX
LOD #e1,AX
CAL 1, AX
OPR 0, AX

Output of the Assembler:
01001010101000010
01010100101010010
10100100000011011
11010010110101111
00010010101010010
10101001010101011
A Simple Virtual Machine

[Figure: a virtual machine made of a Memory and a CPU; the CPU contains an ALU and a Register]
A Simple Virtual Machine

[Figure: the Memory holds the program code and the symbol table; the CPU contains an ALU and a Register]

Program code shown in Memory:
sto 0, s
sto 0, d
sto 0, c
sto 0, d
lod s, 0
lit "s", 0
opr 14, 0
jmc #a1, false
lod b, 0
A Simple Virtual Machine

[Figure: the same machine, with a PC added to the CPU next to the ALU and the Register]


Instructions




<instruction> <first parameter>, <second parameter>


Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 11


Instructions

• LIT <value>, <register_id>


Put a constant value in a CPU register

Examples:

LIT 5, 0
LIT 5.5, 0
LIT 'a', 0
LIT "hello", 0
LIT true, 0

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 12


Instructions

• LOD <variable>, <register_id>


Search for <variable> in the symbol table
Read its value
Put the value of <variable> in the CPU register

Examples:

LOD a, 0
LOD hello, 0
LOD cse340, 0

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 13


Instructions

• STO <variable>, <register_id>


Read a value from the CPU register
Search for <variable> in the symbol table
Store the value into the variable

Examples:

STO a, 0
STO hello, 0
STO cse340, 0

Javier Gonzalez-Sanchez | CSE340 | Fall 2014 | 14


Example

Source Code:
{
int a; int b;
a = 10;
b = a;

Symbol Table:
Type Name Scope Value
int a global 0
int b global 0

Intermediate (Object) Code:


LIT 10,0
STO a,0
LOD a,0
STO b,0
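
A minimal sketch of how a virtual machine could execute the four instructions above. It is written in Python for illustration only; the function name `run`, the use of a list as the CPU register, and the dictionary used as the symbol table are assumptions, not the course's VM. Only integer constants are handled.

```python
# Sketch of LIT, LOD, and STO. The CPU register is modeled as a stack
# (an assumption) and the symbol table as a dict from name to value.

def run(program, symbols):
    register = []
    for line in program:
        name, args = line.split(maxsplit=1)
        # First parameter; the second one (the register id) is ignored here.
        operand = args.split(",")[0].strip()
        if name == "LIT":
            register.append(int(operand))      # constant -> register
        elif name == "LOD":
            register.append(symbols[operand])  # variable value -> register
        elif name == "STO":
            symbols[operand] = register.pop()  # register -> variable
        else:
            raise ValueError(f"unsupported instruction: {name}")
    return symbols


table = run(["LIT 10,0", "STO a,0", "LOD a,0", "STO b,0"], {"a": 0, "b": 0})
print(table)  # {'a': 10, 'b': 10}
```

After the run, both a and b hold 10, which is what the source code asks for.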
Instructions

• OPR <operation>, <register_id>


Read one or two values from the CPU register
Do the operation <operation>
Put the result into the CPU register

Examples:

OPR 1, 0 ; return
OPR 2, 0 ; add
OPR 3, 0 ; subtract



Operations
Number Action
0 Exit program
1 Return
2 ADD: POP A, POP B, PUSH B+A
3 SUBTRACT: POP A, POP B, PUSH B-A
4 MULTIPLY: POP A, POP B, PUSH B*A
5 DIVIDE: POP A, POP B, PUSH B/A
6 MOD: POP A, POP B, PUSH (B mod A)
7 POW: POP A, POP B, PUSH (A to the power B)
8 OR: POP A, POP B, PUSH (B or A)
9 AND: POP A, POP B, PUSH (B and A)
10 NOT: POP A, PUSH (not A)
11 TEST GREATER THAN: POP A, POP B, PUSH (B>A)
12 TEST LESS THAN: POP A, POP B, PUSH (B<A)
13 TEST GREATER THAN OR EQUAL: POP A, POP B, PUSH (B>=A)
14 TEST LESS THAN OR EQUAL: POP A, POP B, PUSH (B<=A)
15 TEST EQUAL: POP A, POP B, PUSH (B=A)
16 TEST NOT EQUAL: POP A, POP B, PUSH (B<>A)
17
18 clear screen
19 read a value from the standard input
20 print a value to the standard output
21 print a value to the standard output and a newline character
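
The POP A, POP B, PUSH pattern in the table can be sketched as a small dispatch function. This is an illustrative Python sketch: the name `opr`, the list used as a stack, and integer division for operation 5 are assumptions. Operations not listed in `BINARY` (exit, return, POW, input and output) are left out.

```python
import operator

# Operation numbers come from the table; each function receives (B, A).
BINARY = {
    2: operator.add,
    3: operator.sub,
    4: operator.mul,
    5: operator.floordiv,  # integer division is an assumption
    6: operator.mod,
    8: lambda b, a: b or a,
    9: lambda b, a: b and a,
    11: operator.gt,
    12: operator.lt,
    13: operator.ge,
    14: operator.le,
    15: operator.eq,
    16: operator.ne,
}


def opr(stack, number):
    if number == 10:  # NOT: POP A, PUSH (not A)
        stack.append(not stack.pop())
    elif number in BINARY:
        a = stack.pop()  # POP A (the value pushed last)
        b = stack.pop()  # POP B (the value pushed before A)
        stack.append(BINARY[number](b, a))
    else:
        raise ValueError(f"operation {number} is not covered by this sketch")
    return stack


print(opr([7, 2], 3))   # [5]      7 - 2
print(opr([3, 4], 11))  # [False]  3 > 4
```

Popping A before B matters for the operations that are not commutative: the operand that was pushed first ends up on the left-hand side.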



Example

Source Code:
{
int a;
int b;
a = 10;
b = a;
a = a * 10;
b = 2 + 3 + 4;
}

Symbol Table:
Type Name Scope Value
int a global 0
int b global 0

Intermediate (Object) Code:
LIT 10, 0
STO a, 0
LOD a, 0
STO b, 0
LOD a, 0
LIT 10, 0
OPR 4, 0 ; multiply
STO a, 0
LIT 2, 0
LIT 3, 0
OPR 2, 0 ; add
LIT 4, 0
OPR 2, 0 ; add
STO b, 0
OPR 1, 0
OPR 0, 0

to be continued...



CSE340 - Principles of Programming Languages

Javier Gonzalez-Sanchez
Fall 2014

Disclaimer. These slides can only be used as study material for the class CSE340 at ASU. They cannot be distributed or used for another purpose.
CSE340 - Principles of Programming Languages
Lecture 21: Intermediate Code II


Javier Gonzalez-Sanchez
BYENG M1-38
Office Hours: By appointment
Review

[Figure: the source code passes through the Lexer, Parser, Semantic Analyzer, and Code Generation; the Virtual Machine (interpreter) runs the resulting intermediate code.]

Source Code:
{
int a;
int b;
a = 5;
b = 9;
a = a + b / 3;
print (a);
print ("hello");
}

Intermediate (Object) Code:
a, int, global, 0
b, int, global, 0
@
LIT 5, 0
STO a, 0
LIT 9, 0
STO b, 0
LOD a, 0
LOD b, 0
LIT 3, 0
OPR 5, 0
OPR 2, 0
STO a, 0
LOD a, 0
OPR 21, 0
LIT "hello", 0
OPR 21, 0
OPR 0, 0

Output:
8
hello



Review

[Figure: the source code passes through the Lexer, Parser, Semantic Analyzer, and Code Generation; the Virtual Machine (interpreter) runs the result.]

Source Code:
{
int a;
int b;
boolean foo;

a = 10 + 20 + 30 + 40;
print (a);

foo = 340 > 126;
print (foo);

a = a / 2;
print ("total:" + a);

return;
}

Output:
100
true
total: 50


Assignment #4



Assignment #4

Goal: to show proficiency building a descendent parser.

Lexer: use assignment #1 or Lexer.jar
Parser: programming workshops
Semantic Analyzer: not required. Bonus points include Semantic Analysis from assignment #3
Code Generation: following lectures

Deadline: December 4

Assignment #4 | Grammar
<PROGRAM> → '{' <BODY> '}'
<BODY> → {<PRINT>';'|<ASSIGNMENT>';'|<VARIABLE>';'|<WHILE>|<IF>|<SWITCH>|<RETURN>';'}
<ASSIGNMENT> → identifier '=' <EXPRESSION>
<VARIABLE> → ('int'|'float'|'boolean'|'char'|'string'|'void') identifier
<WHILE> → 'while' '(' <EXPRESSION> ')' <PROGRAM>
<IF> → 'if' '(' <EXPRESSION> ')' <PROGRAM> ['else' <PROGRAM>]
<RETURN> → 'return'
<PRINT> → 'print' '(' <EXPRESSION> ')'
<EXPRESSION> → <X> {'|' <X>}
<X> → <Y> {'&' <Y>}
<Y> → ['!'] <R>
<R> → <E> {('>'|'<'|'=='|'!=') <E>}
<E> → <A> {('+'|'-') <A>}
<A> → <B> {('*'|'/') <B>}
<B> → ['-'] <C>
<C> → integer | octal | hexadecimal | binary | true | false | string | char | float | identifier | '(' <EXPRESSION> ')'
<SWITCH> → 'switch' '(' id ')' '{' <CASES> [<DEFAULT>] '}'
<CASES> → ('case' (integer|octal|hexadecimal|binary) ':' <PROGRAM>)+
<DEFAULT> → 'default' ':' <PROGRAM>

Assignment #4 | Compiler

Assignment #4 | Compiler
Bonus Points

Assignment #4 | VM
Use it to test your compiler.

No changes required.

Assignment #4 | Code

Add these lines to your Parser.run()

Assignment #4 | Code

The CodeGenerator.java file



Assignment #4 | Grading

• Bonus 10 %
Implement SWITCH statement: (a) parser; and (b) code generation.

• Bonus 10 %
Integration: (a) graphical user interface including token table, parse tree, and symbol table; (b) syntactic error handling and recovery; and (c) semantic analysis.

Assignment #4 | Grading

assignment: 0–100%
bonus: 0–20%


Implementing Intermediate Code Generation


Code Generation
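Expression: 1 * 2 > 3 + 4 * 5

Expression tree (operators are inner nodes, operands are leaves):
>
  *
    1
    2
  +
    3
    *
      4
      5

Intermediate (Object) Code:
LIT 1, 0
LIT 2, 0
OPR 4, 0
LIT 3, 0
LIT 4, 0
LIT 5, 0
OPR 4, 0
OPR 2, 0
OPR 11, 0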
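
The instruction order above is a post-order walk of the expression tree: left operand, right operand, then the operator. A minimal Python sketch of that idea follows; the tuple representation of the tree and the names `generate` and `OPERATOR_NUMBER` are assumptions made for illustration.

```python
# Operation numbers follow the Operations table of the virtual machine.
OPERATOR_NUMBER = {"+": 2, "-": 3, "*": 4, "/": 5, ">": 11, "<": 12}


def generate(node):
    """Return the instructions for an expression tree.

    A node is either an int (a leaf) or a tuple (operator, left, right).
    """
    if isinstance(node, int):
        return [f"LIT {node}, 0"]
    op, left, right = node
    # Post-order: code for both operands first, then the operator.
    return generate(left) + generate(right) + [f"OPR {OPERATOR_NUMBER[op]}, 0"]


# 1 * 2 > 3 + 4 * 5
tree = (">", ("*", 1, 2), ("+", 3, ("*", 4, 5)))
code = generate(tree)
print("\n".join(code))
```

For this tree the sketch prints the same nine instructions listed on the slide.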

Code

[Figures: code listings for the parser rules PROGRAM, BODY, PRINT, ASSIGNMENT, VARIABLE, WHILE, IF, RETURN, EXPRESSION and the rules that follow it, through C.]


Homework

Assignment #4

(LIT, LOD, STO, OPR)

CSE340 - Principles of Programming Languages
Lecture 22: Intermediate Code III

Javier Gonzalez-Sanchez
BYENG M1-38
Office Hours: By appointment
Review

Programming Assignment 4
Level 1

LIT, LOD, STO, OPR



Code

Text emitted by each parser rule:

PROGRAM: OPR 0, 0
BODY: nothing
PRINT: OPR 21, 0
ASSIGNMENT: STO identifier, 0
VARIABLE: identifier, <type>
RETURN: OPR 1, 0


Parser

Instructions emitted by each expression rule. The slides label only EXPRESSION and C; the names in between follow the order of the grammar.

EXPRESSION
if (twice) {
OPR 8, 0
}

X
if (twice) {
OPR 9, 0
}

Y
if (operatorUsed) {
OPR 10, 0
}

R
if (twice) {
OPR <operatorNumber>, 0
}

E
if (twice) {
OPR <operatorNumber>, 0
}

A
if (twice) {
OPR <operatorNumber>, 0
}

B
LIT 0, 0

if (operatorUsed) {
OPR 3, 0
}

Assignment 2 | Code
C

LIT <value>, 0

LOD identifier, 0
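
A minimal recursive-descent sketch of the rules <E>, <A>, <B>, and <C>, showing where each instruction could be emitted. It is written in Python for illustration and is not the course's Parser or CodeGenerator; the class and function names, the regular-expression tokenizer, and the restriction to integers, identifiers, and parentheses are all assumptions. The loop body plays the role of the `twice` flag: the OPR is emitted only after a second operand has been parsed.

```python
import re

ADDITIVE = {"+": 2, "-": 3}
MULTIPLICATIVE = {"*": 4, "/": 5}


class ExpressionParser:
    def __init__(self, text):
        self.tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)
        self.position = 0
        self.code = []

    def peek(self):
        if self.position < len(self.tokens):
            return self.tokens[self.position]
        return None

    def advance(self):
        token = self.peek()
        self.position += 1
        return token

    def rule_e(self):  # <E> -> <A> {('+'|'-') <A>}
        self.rule_a()
        while self.peek() in ADDITIVE:
            number = ADDITIVE[self.advance()]
            self.rule_a()
            self.code.append(f"OPR {number}, 0")

    def rule_a(self):  # <A> -> <B> {('*'|'/') <B>}
        self.rule_b()
        while self.peek() in MULTIPLICATIVE:
            number = MULTIPLICATIVE[self.advance()]
            self.rule_b()
            self.code.append(f"OPR {number}, 0")

    def rule_b(self):  # <B> -> ['-'] <C>
        operator_used = self.peek() == "-"
        if operator_used:
            self.advance()
            self.code.append("LIT 0, 0")  # -x is computed as 0 - x
        self.rule_c()
        if operator_used:
            self.code.append("OPR 3, 0")

    def rule_c(self):  # <C> -> integer | identifier | '(' <E> ')'
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of expression")
        if token == "(":
            self.rule_e()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        elif token.isdigit():
            self.code.append(f"LIT {token}, 0")
        else:
            self.code.append(f"LOD {token}, 0")


def compile_expression(text):
    parser = ExpressionParser(text)
    parser.rule_e()
    return parser.code


print(compile_expression("a + b / 3"))
print(compile_expression("-x * 2"))
```

Because <A> is called from inside <E>, the division in a + b / 3 is emitted before the addition, which gives the operator precedence without any extra bookkeeping.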



Review

Programming Assignment 4
Level 2

JMP, JMC



Instructions

• JMP <line>, 0
Put the value <line> in the program counter;
thus, the next line to be executed will be <line>

Examples:

JMP 1, 0
JMP 14, 0
JMP 75, 0



Instructions

• JMC <line>, <condition>


Read one value from the CPU register
If the value is equal to <condition>, put the value
<line> in the program counter; thus, the next line to
be executed will be <line>

Examples:

JMC 1, true
JMC 14, false
JMC 75, true



Example

Source Code:
{
int a; int b;
a = 10;
while (a>1){
print (a);
}
return;
}

Intermediate (Object) Code:
int a global 0
int b global 0
@
LIT 10, 0
STO a, 0
LOD a, 0
LIT 1, 0
OPR 11, 0 ; >
JMC #e1, false
LOD a, 0
OPR 20, 0 ; print
JMP #e2, 0
OPR 1, 0
OPR 0, 0



Labels

• Compiler creates variables and adds them to the symbol table to remember positions in the code.

• This is useful for loops and conditions.



Labels | how while works

Source code:
while (a < b) {
//code
//code
}

Equivalent with labels:
label#1
if (a < b) {
//code
//code
} else
goto label#2
goto label#1
label#2



Labels | how if works

Source code:
if (a < b) {
//code1
} else {
// code2
}

Equivalent with labels:
if (a >= b) goto label#1
//code1
goto label#2
label#1
// code2
label#2
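
The two label schemes above can be sketched as a small code generator that creates label names and records the line each one points to. This is a Python sketch under stated assumptions: the class `CodeGenerator` here is hypothetical and unrelated to the course's CodeGenerator.java, the condition and bodies are passed in as already generated instruction lists, and nested statements are not handled.

```python
class CodeGenerator:
    def __init__(self):
        self.code = []     # emitted instructions; line numbers start at 1
        self.labels = {}   # label name -> line number (symbol table entries)
        self.counter = 0

    def emit(self, instruction):
        self.code.append(instruction)

    def new_label(self):
        self.counter += 1
        return f"#e{self.counter}"

    def mark(self, label):
        # The label points at the next instruction to be emitted.
        self.labels[label] = len(self.code) + 1

    def while_loop(self, condition, body):
        exit_label = self.new_label()
        start_label = self.new_label()
        self.mark(start_label)
        self.code.extend(condition)
        self.emit(f"JMC {exit_label}, false")  # leave the loop when false
        self.code.extend(body)
        self.emit(f"JMP {start_label}, 0")     # go back and test again
        self.mark(exit_label)

    def if_else(self, condition, then_body, else_body):
        else_label = self.new_label()
        end_label = self.new_label()
        self.code.extend(condition)
        self.emit(f"JMC {else_label}, false")  # skip to the else part
        self.code.extend(then_body)
        self.emit(f"JMP {end_label}, 0")       # skip over the else part
        self.mark(else_label)
        self.code.extend(else_body)
        self.mark(end_label)


generator = CodeGenerator()
generator.emit("LIT 10, 0")
generator.emit("STO a, 0")
generator.while_loop(
    condition=["LOD a, 0", "LIT 1, 0", "OPR 11, 0"],
    body=["LOD a, 0", "OPR 20, 0"],
)
generator.emit("OPR 1, 0")
generator.emit("OPR 0, 0")
print(generator.labels)  # {'#e2': 3, '#e1': 10}
```

The exit label can be named before its line number is known; the number is filled in by `mark` once the body has been emitted.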



Example | while
Source Code:
{
int a; int b;
a = 10;
while (a>1){
print (a);
}
return;
}

Symbol Table:
int a global 0
int b global 0
int #e1 global 10
int #e2 global 3

Intermediate (Object) Code:
LIT 10, 0
STO a, 0
LOD a, 0
LIT 1, 0
OPR 11, 0 ; >
JMC #e1, false
LOD a, 0
OPR 20, 0 ;print
JMP #e2, 0
OPR 1, 0
OPR 0, 0
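
A minimal sketch of how a virtual machine could run code with JMP and JMC, looking up label values in the symbol table and writing them into the program counter. It is a Python illustration, not the course's VM: the function `run`, the stack used as the register, and the small subset of operations are assumptions. The loop on the slide never changes a, so it would print forever; the program below adds a = a - 1 to the loop body so that it terminates.

```python
def run(program, symbols):
    stack, output = [], []
    pc = 1  # program counter; lines are numbered from 1
    while pc <= len(program):
        name, args = program[pc - 1].split(maxsplit=1)
        first, second = [part.strip() for part in args.split(",")]
        pc += 1  # by default, continue with the next line
        if name == "LIT":
            stack.append(int(first))
        elif name == "LOD":
            stack.append(symbols[first])
        elif name == "STO":
            symbols[first] = stack.pop()
        elif name == "JMP":
            pc = symbols[first]  # the label value is a line number
        elif name == "JMC":
            if stack.pop() == (second == "true"):
                pc = symbols[first]
        elif name == "OPR":
            number = int(first)
            if number in (0, 1):  # exit program / return
                break
            if number == 20:      # print
                output.append(stack.pop())
                continue
            a = stack.pop()
            b = stack.pop()
            if number == 3:
                stack.append(b - a)
            elif number == 11:
                stack.append(b > a)
            else:
                raise ValueError(f"operation {number} is not covered")
        else:
            raise ValueError(f"unsupported instruction: {name}")
    return output


# a = 3; while (a > 1) { print (a); a = a - 1; } return;
program = [
    "LIT 3, 0", "STO a, 0",
    "LOD a, 0", "LIT 1, 0", "OPR 11, 0",
    "JMC #e1, false",
    "LOD a, 0", "OPR 20, 0",
    "LOD a, 0", "LIT 1, 0", "OPR 3, 0", "STO a, 0",
    "JMP #e2, 0",
    "OPR 1, 0", "OPR 0, 0",
]
printed = run(program, {"a": 0, "#e1": 14, "#e2": 3})
print(printed)  # [3, 2]
```

Here #e1 is 14 (the line after the loop) and #e2 is 3 (the first line of the condition), following the same convention as the symbol table above.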



Example | if
Source Code:
{
int a; int b;
a = 10;
if (a>1) {
print (a);
} else {
print (b);
}
return;
}

Symbol Table:
int a global 0
int b global 0
int #e1 global 10
int #e2 global 12

Intermediate (Object) Code:
LIT 10, 0
STO a, 0
LOD a, 0
LIT 1, 0
OPR 11, 0 ; >
JMC #e1, false
LOD a, 0
OPR 20, 0 ;print
JMP #e2, 0
LOD b, 0
OPR 20, 0 ;print
OPR 1, 0
OPR 0, 0



Exercise

{
int a; int b;
a = 10;
while (a>1) {
if (a != 0) {
print (a);
} else {
print (b);
}
a = a - 1;
}
}
[Figures: handwritten notes; code listings for the parser rules WHILE and IF.]

What about a switch-case?

[Figure: handwritten notes. Add the symbol table here…]


Homework

Translate this source code to intermediate code:

int a, b;
switch (a) {
case 1: { b = 11; break;}
case 2: { b = 22; break;}
case 3: { b = 33; break;}
default:{ b = 99; break;}
}


Homework

Assignment #4
