Lec02 Programming Language Specification

Uploaded by ASTROHALT C4


COMPILER

Specification

Lecture Objectives

• To understand the significance of specifying programming languages
• To describe the formal methods used in specifying programming languages
• To recognise the processing requirements for compiling programs based on programming language specifications

Language Processors: Why do we need them?
[Figure: language processors bridge the "semantic gap" between the programmer's concepts and ideas (e.g. "Compute the surface area of a triangle?") and the hardware, which executes machine code such as 0101001001... The example chain shown is: Java program → JVM byte code → JVM interpreter → x86 processor.]

Programming Language Specification
• Why?
– A means of communication between people who need a common understanding of the programming language (PL):
• language designer, language implementer, user
• What to specify?
– Specify what is a ‘well formed’ program
• syntax
• contextual constraints (also called static semantics):
– scoping rules
– type rules
– Specify what is the meaning of (well formed) programs
• semantics (also called runtime semantics)

Programming Language Specification
• Why?
• What to specify?
• How to specify?
– Formal specification: use a precisely defined formalism.
– Informal specification: a description in English.
– Usually a mix of both (e.g. the Java Language Specification):
• Syntax => formal specification using CFG/BNF
• Contextual constraints and semantics => informal specification

Syntax Definition

Syntax: the form of expressions, statements and program units.
Grammar: a formal set of rules that describes the valid syntax of a language.
Context-Free Grammar (CFG): a formal way of describing syntax.
Backus-Naur Form (BNF): a particular notation for expressing CFGs.

How do we start?

• Input: the source program
• The input stream of characters must first be converted into units that later phases can process (tokens)

Lexemes

Lexemes are the lowest level syntactic units.

Example:
val = (int)(xdot + y*0.3) ;

The lexemes in this statement are:

val, =, (, int, ), (, xdot, +, y, *, 0.3, ), ;
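A scanner performs exactly this split. The sketch below is a minimal Python illustration, assuming a hypothetical regular expression `LEXEME` that recognises only three lexeme classes (numbers with an optional fraction, identifiers/keywords, and the single-character symbols `= ( ) + * ;`). It is not a full scanner for any real language.

```python
import re

# Assumed lexeme classes, enough for the example statement only:
# numbers with an optional fraction, identifiers/keywords, single-char symbols.
# Leading whitespace is skipped before each lexeme.
LEXEME = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|[=()+*;])")


def lexemes(source):
    """Split source text into lexemes; reject input no class matches."""
    source = source.rstrip()  # trailing whitespace would otherwise be left over
    pos, result = 0, []
    while pos < len(source):
        match = LEXEME.match(source, pos)
        if match is None:
            raise SyntaxError(f"no lexeme matches at position {pos}")
        result.append(match.group(1))
        pos = match.end()
    return result


print(lexemes("val = (int)(xdot + y*0.3) ;"))
```

Running it prints `['val', '=', '(', 'int', ')', '(', 'xdot', '+', 'y', '*', '0.3', ')', ';']`, the same lexemes as listed above. Whitespace separates lexemes but never becomes one.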

Tokens
Tokens are the categories into which lexemes are grouped.
• Identifiers: names chosen by the programmer.
val, xdot, y

• Keywords: names chosen by the language designer to help define syntax and structure.
int, return, void

(Keywords that cannot be used as identifiers are known as RESERVED WORDS.)

Tokens (Contd.)

• Operators: identify actions.
+, &&, !

• Literals: denote values directly.
3.14, -10, ‘a’, true, null

• Punctuation symbols: support syntactic structure.
(, ), ;, {, }

Tokens (Contd.)
• Integers: 2 1000 -20
• Floating-point: 2.0 -0.010 .02
• Symbols: $ # @ { } << >> [ ]
• Strings: “x” “He said, I love Compilers”
• Comments: /* Hi and Bye */
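Once lexemes are found, each is assigned a token category. A minimal Python sketch, assuming the hypothetical sets below: they are seeded from the examples on these slides, with `=` and `*` added as operators so that the earlier statement classifies. Keywords are checked before identifiers, which is what makes them reserved words: `int` can never come back as an identifier.

```python
import re

# Sample sets from the slides; `=` and `*` are added as operators (assumption).
KEYWORDS = {"int", "return", "void"}
LITERAL_WORDS = {"true", "null"}
OPERATORS = {"=", "+", "*", "&&", "!"}
PUNCTUATION = {"(", ")", ";", "{", "}"}


def classify(lexeme):
    """Map one lexeme to its token category."""
    if lexeme in KEYWORDS:            # reserved: checked before identifiers
        return "keyword"
    if lexeme in LITERAL_WORDS or re.fullmatch(r"\d+(\.\d+)?", lexeme):
        return "literal"
    if lexeme in OPERATORS:
        return "operator"
    if lexeme in PUNCTUATION:
        return "punctuation"
    if re.fullmatch(r"[A-Za-z_]\w*", lexeme):
        return "identifier"
    raise ValueError(f"unknown lexeme {lexeme!r}")


example = ["val", "=", "(", "int", ")", "(", "xdot", "+", "y", "*", "0.3", ")", ";"]
tokens = [(classify(lx), lx) for lx in example]
print(tokens)
```

Strings, characters and comments would need their own patterns; they are left out to keep the sketch short.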

Token Structure (Example)
[Figure: example of token structure.]

What do we do with tokens?

• The sequence of tokens must conform to the grammar of the language
• The tokens have to be checked against the rules given in the grammar

Grammars

Like natural languages (e.g. English), programming languages are described by their grammars.

It is essential to know the grammars of the source and target languages when writing a compiler.

Context-Free Grammar (CFG): a formal way of describing syntax

Backus-Naur Form (BNF): a particular notation for expressing CFGs

Context Free Grammar

The Components of a CFG

1. A set of tokens, known as terminal symbols
2. A set of nonterminals
3. A set of productions, LHS → RHS
4. A designation of one of the nonterminals as the start symbol

A CFG can also be used to help guide the translation of programs.

Grammar, Formally
A grammar G of a programming language is a 4-tuple (quadruple) G = (T, N, S, P), where:
T is a finite set of terminal symbols
N is a finite set of nonterminal symbols
S is the start symbol (S ∈ N)
P is a finite set of production rules

Example:
<assign> → <ident> = <expr>
<ident> → A | B | C
<expr> → <ident> + <expr>
       | <ident> * <expr>
       | ( <expr> )
       | <ident>

T = { =, A, B, C, *, +, (, ) }
N = { <assign>, <ident>, <expr> }
S = <assign>
P = { <assign> → <ident> = <expr>, <ident> → A | B | C,
      <expr> → <ident> + <expr> | <ident> * <expr> | ( <expr> ) | <ident> }

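The quadruple maps directly onto data. In this sketch (a hypothetical Python encoding, not a standard one), each `|` alternative becomes a separate production, and the hypothetical `check_grammar` verifies the basic conditions of the definition: T and N are disjoint, S ∈ N, every LHS is a single nonterminal, and every RHS symbol is declared.

```python
# G = (T, N, S, P) for the <assign> grammar. Each alternative separated by |
# is stored as its own (LHS, RHS) pair; an RHS is a tuple of symbols.
T = {"=", "A", "B", "C", "*", "+", "(", ")"}
N = {"<assign>", "<ident>", "<expr>"}
S = "<assign>"
P = [
    ("<assign>", ("<ident>", "=", "<expr>")),
    ("<ident>", ("A",)),
    ("<ident>", ("B",)),
    ("<ident>", ("C",)),
    ("<expr>", ("<ident>", "+", "<expr>")),
    ("<expr>", ("<ident>", "*", "<expr>")),
    ("<expr>", ("(", "<expr>", ")")),
    ("<expr>", ("<ident>",)),
]


def check_grammar(T, N, S, P):
    """Raise ValueError unless (T, N, S, P) is a well-formed context-free grammar."""
    if T & N:
        raise ValueError(f"symbols are both terminal and nonterminal: {T & N}")
    if S not in N:
        raise ValueError("the start symbol must be a nonterminal")
    for lhs, rhs in P:
        if lhs not in N:  # context-free: the LHS is exactly one nonterminal
            raise ValueError(f"LHS {lhs!r} is not a nonterminal")
        unknown = [sym for sym in rhs if sym not in T and sym not in N]
        if unknown:
            raise ValueError(f"undeclared symbols in RHS of {lhs}: {unknown}")
    return True


print(check_grammar(T, N, S, P))
```

Storing S as a single symbol rather than a set reflects the definition: a grammar has exactly one start symbol.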
Production rules

* Consists of a nonterminal (LHS), an arrow (-> or ::=), and a sequence of tokens (terminals) and/or nonterminals (RHS)
* Describes how the nonterminal on the LHS can be expanded into the RHS
* Productions with the same LHS can have their RHSs combined using a vertical bar ('|')

Backus-Naur Form (BNF)
* Useful for describing the syntax of programming languages
Example: the if-else statement in Java has the form

if (expression) statement else statement

Its structuring rule, written as a production, is

stat → if (exp) stat else stat

Here if, (, ), and else are tokens (terminals); stat and exp are nonterminals.

Logical OR in BNF
The productions

list → list + digit
list → list – digit
list → digit
digit → 0
digit → 1
digit → 2
digit → 3
digit → 4
digit → 5
digit → 6
digit → 7
digit → 8
digit → 9

can be combined using the logical OR symbol |:

list → list + digit | list – digit | digit
digit → 0|1|2|3|4|5|6|7|8|9

Tokens (terminals): + – 0 1 2 3 4 5 6 7 8 9
Nonterminals: list, digit

A string containing zero tokens is called the empty string and is written ε.

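The left-recursive rule list → list + digit cannot be coded as a function that immediately calls itself on the same input, because that recursion never terminates. A common alternative, sketched below in Python, reads one digit and then loops over (operator, digit) pairs, which keeps + and – left-associative. The function name `eval_list` is hypothetical, ASCII `-` stands in for the slide's –, and spaces are not allowed in the input.

```python
DIGITS = "0123456789"


def eval_list(s):
    """Recognise and evaluate a string of
         list  -> list + digit | list - digit | digit
         digit -> 0|1|2|3|4|5|6|7|8|9
    The left recursion is replaced by a loop: read one digit, then repeatedly
    read an operator and a digit, folding the result from the left."""
    if not s or s[0] not in DIGITS:
        raise SyntaxError("expected a digit at position 0")
    value, i = int(s[0]), 1
    while i < len(s):
        if s[i] not in "+-" or i + 1 >= len(s) or s[i + 1] not in DIGITS:
            raise SyntaxError(f"unexpected input at position {i}")
        digit = int(s[i + 1])
        value = value + digit if s[i] == "+" else value - digit
        i += 2
    return value


print(eval_list("9-5+2"))
```

For `9-5+2` the loop computes (9 - 5) + 2 = 6, the grouping this grammar assigns.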
Logical OR in BNF is denoted by |. The same rules in arrow notation and in ::= notation:

digit → 0|1|2|3|4|5|6|7|8|9
<digit> ::= 0|1|2|3|4|5|6|7|8|9

if_stmt → if expr then stmt
        | if expr then stmt else stmt
<if_stmt> ::= if <expr> then <stmt>
            | if <expr> then <stmt> else <stmt>

sign → + | −
<sign> ::= + | −

Recursive Rules in BNF

A BNF rule is recursive if its LHS nonterminal also appears on its RHS.

<ident_list> ::= <identifier>
               | <identifier> , <ident_list>

<integer> ::= <digit>
            | <digit> <integer>

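Recursive rules translate naturally into recursive functions: one branch per alternative, and a recursive call where the LHS reappears on the RHS. A minimal Python sketch; `is_integer` and `is_ident_list` are hypothetical names, and `str.isidentifier()` is used as an assumed stand-in for <identifier>.

```python
DIGITS = set("0123456789")


def is_integer(s):
    """<integer> ::= <digit> | <digit> <integer>"""
    if len(s) == 1:
        return s in DIGITS                                       # <digit>
    return len(s) > 1 and s[0] in DIGITS and is_integer(s[1:])   # <digit> <integer>


def is_ident_list(tokens):
    """<ident_list> ::= <identifier> | <identifier> , <ident_list>
    str.isidentifier() stands in for <identifier> (assumption)."""
    if len(tokens) == 1:
        return tokens[0].isidentifier()                          # <identifier>
    return (len(tokens) >= 3 and tokens[0].isidentifier()        # <identifier> , <ident_list>
            and tokens[1] == "," and is_ident_list(tokens[2:]))


print(is_integer("2024"), is_ident_list(["a", ",", "b", ",", "c"]))
```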
Extended BNF
• [ ] Optional element:
<if_stmt> ::= if (<logic_expr>) <stmt> [ else <stmt> ]
<real_num> ::= [<int_num>] . <int_num>

• { } Zero or more repetitions:
<ident_list> ::= <identifier> { , <identifier> }

• ( … | … ) Multiple choice: exactly one element must be chosen from the group. For example, the "for" loop in Pascal:
<for_stmt> ::= for <var> := <expr> (to | downto) <expr> do <stmt>

EBNF enhances the readability and writability of BNF.

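EBNF's extra notation also maps onto familiar programming constructs: { } becomes a loop and [ ] becomes an optional part. The Python sketch below is illustrative only; it assumes <int_num> means one or more decimal digits, uses `str.isidentifier()` as a stand-in for <identifier>, and the names `is_ident_list_loop` and `is_real_num` are hypothetical.

```python
import re


def is_ident_list_loop(tokens):
    """<ident_list> ::= <identifier> { , <identifier> }
    The EBNF repetition { ... } becomes a loop instead of recursion."""
    if not tokens or not tokens[0].isidentifier():
        return False
    i = 1
    while i < len(tokens):  # each pass consumes one ", <identifier>"
        if tokens[i] != "," or i + 1 >= len(tokens) or not tokens[i + 1].isidentifier():
            return False
        i += 2
    return True


# <real_num> ::= [<int_num>] . <int_num>
# The optional [<int_num>] becomes (?:...)?; <int_num> is assumed to be [0-9]+.
REAL_NUM = re.compile(r"(?:[0-9]+)?\.[0-9]+")


def is_real_num(s):
    return REAL_NUM.fullmatch(s) is not None


print(is_real_num(".02"), is_real_num("2."))
```

With this rule `.02` is accepted (matching the floating-point example on the tokens slide), while `2.` is rejected because the fraction part is mandatory.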
Parse Tree
A parse tree shows how the start symbol of a grammar derives a string in the language.
Parse trees describe the hierarchical structure of sentences.
Parser: the program that carries out the parsing.
Parsing: the process of determining whether a string of tokens can be generated by a grammar.
Parse tree: a graphical (tree) proof showing the steps in the derivation of a string from the start symbol. It has the following properties:
1. The root is labeled by the start symbol.
2. Each leaf is labeled by a token or by ε.
3. Each interior node is labeled by a nonterminal; if a node labeled A has children labeled X, Y, Z (from left to right), then A → XYZ is a production.

Parse Tree
A parse tree (concrete syntax tree) differs from an abstract syntax tree (AST): the AST omits superficial distinctions of form that are unimportant for translation.

[Figure: parse tree (left) and syntax tree (right) for the string 1 + 1 - 0. In the AST, - is the root, with the subtree + (1, 1) as its left child and 0 as its right child.]

Example 1
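list → list + digit | list – digit | digit
digit → 0|1|2|3|4|5|6|7|8|9

[Figure: parse tree for 9–5+2. The root list has children list, +, digit; that inner list has children list, –, digit; the innermost list has the single child digit. Reading the digit leaves from left to right gives 9, 5, 2, so the tree groups the string as (9 – 5) + 2.]
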
Example 2
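Parse Tree for A=B*C
<assign> ::= <ident> = <expr>

[Figure: parse tree for A = B * C. The root <assign> has children <ident>, = and <expr>; <ident> derives A, and <expr> derives B * C, ending in the leaf C.]
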
Derivation
Derivation is a mechanism by which the rules of a grammar are applied repeatedly to generate a sentence. At each step, a nonterminal is replaced by the RHS of one of its rules, until the whole sentence has been generated.

Grammar:
<assign> → <ident> = <expr>
<ident> → A | B | C
<expr> → <ident> + <expr>
       | <ident> * <expr>
       | ( <expr> )
       | <ident>

Derivation of A = B * C:
<assign> => <ident> = <expr>
         => A = <expr>
         => A = <ident> * <expr>
         => A = B * <expr>
         => A = B * <ident>
         => A = B * C

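A leftmost derivation is mechanical once the rule to apply at each step is chosen. The sketch below (hypothetical names; grammar encoded as a Python dict) always rewrites the leftmost nonterminal, using a list of alternative indices to pick each rule. The indices given reproduce the derivation of A = B * C shown above.

```python
# The <assign> grammar: nonterminal -> list of alternatives (lists of symbols).
GRAMMAR = {
    "<assign>": [["<ident>", "=", "<expr>"]],
    "<ident>": [["A"], ["B"], ["C"]],
    "<expr>": [
        ["<ident>", "+", "<expr>"],   # alternative 0
        ["<ident>", "*", "<expr>"],   # alternative 1
        ["(", "<expr>", ")"],         # alternative 2
        ["<ident>"],                  # alternative 3
    ],
}


def leftmost_derivation(start, choices):
    """Rewrite the leftmost nonterminal once per entry in choices (an index
    into that nonterminal's alternatives); return every sentential form."""
    form = [start]
    forms = [" ".join(form)]
    for choice in choices:
        i = next((k for k, sym in enumerate(form) if sym in GRAMMAR), None)
        if i is None:
            raise ValueError("no nonterminal left to rewrite")
        form[i:i + 1] = GRAMMAR[form[i]][choice]
        forms.append(" ".join(form))
    return forms


for line in leftmost_derivation("<assign>", [0, 0, 1, 1, 3, 2]):
    print(line)
```

The printed lines are the seven sentential forms of the derivation, from `<assign>` to `A = B * C`. Choosing the indices is exactly the work a parser has to do automatically.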
Example
<exp> ::= <exp> <op> <exp> | ( <exp> ) | <number>
<op> ::= + | - | *
<number> ::= {0..9}+   (one or more digits)

Derivation for (34-3)*42:

<exp> => <exp> <op> <exp>
      => ( <exp> ) <op> <exp>
      => ( <exp> <op> <exp> ) <op> <exp>
      => ( <number> <op> <exp> ) <op> <exp>
      => ( 34 <op> <exp> ) <op> <exp>
      => ( 34 - <exp> ) <op> <exp>
      => ( 34 - <number> ) <op> <exp>
      => ( 34 - 3 ) <op> <exp>
      => ( 34 - 3 ) * <exp>
      => ( 34 - 3 ) * <number>
      => ( 34 - 3 ) * 42

Invalid Sentence <assign> <ident> = <expr>
 A = <expr>
 A = <ident> * <expr>
A = B * C *  A = B * <expr>
 A = B * <ident>
 A = B * C
<assign>→<ident>=<expr>
 invalid
<ident> →A|B|C
<expr> → <ident>+<expr>
| <ident>*<expr> <assign> <ident> = <expr>
| ( <expr> )  A = <expr>
| <ident>  A = <ident> * <expr>
 A = B * <expr>
 A = B * <ident> * <ident>
 A = B * C * <ident>
 invalid
29
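A recursive-descent recognizer for this grammar, sketched in Python, reaches the same verdict without trying derivations by hand. It assumes the tokens are separated by spaces and uses one token of lookahead to choose between <ident> + <expr>, <ident> * <expr> and <ident>; the name `is_assign` is hypothetical.

```python
IDENTS = {"A", "B", "C"}


def is_assign(tokens):
    """Recursive-descent recognizer for
         <assign> -> <ident> = <expr>
         <ident>  -> A | B | C
         <expr>   -> <ident> + <expr> | <ident> * <expr> | ( <expr> ) | <ident>"""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(allowed):
        nonlocal pos
        if peek() in allowed:
            pos += 1
            return True
        return False

    def expr():
        if eat({"("}):                 # ( <expr> )
            return expr() and eat({")"})
        if not eat(IDENTS):            # all other alternatives start with <ident>
            return False
        if eat({"+", "*"}):            # <ident> + <expr> | <ident> * <expr>
            return expr()
        return True                    # <ident>

    # The whole input must be consumed, not just a prefix of it.
    return eat(IDENTS) and eat({"="}) and expr() and pos == len(tokens)


print(is_assign("A = B * C".split()), is_assign("A = B * C *".split()))
```

It accepts `A = B * C` and rejects `A = B * C *`: after the final `*` it expects an <expr>, finds no token, and fails.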
Ambiguity
A grammar that generates a sentence with two or more distinct parse trees is said to be an ambiguous grammar.
If we rewrite the grammar as
<string> ::= <string> + <string> | <string> – <string>
           | 0|1|2|3|4|5|6|7|8|9
then the sentence 9 – 5 + 2 has two distinct parse trees, (9 – 5) + 2 and 9 – (5 + 2), so this grammar is ambiguous.

Example 1
string → string + string | string – string | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

[Figure: two parse trees for 9 – 5 + 2 under this grammar, one grouping the sentence as (9 – 5) + 2 and the other as 9 – (5 + 2).]

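For short sentences, ambiguity can be checked by brute force: try every operator as the top-level split and collect the resulting trees. The Python sketch below (hypothetical names; trees as nested tuples, ASCII `-` for –) finds exactly two trees for 9 – 5 + 2, and evaluating them shows why ambiguity matters: the trees give different values.

```python
DIGITS = set("0123456789")


def parses(tokens):
    """Every parse tree of tokens under the ambiguous grammar
         string -> string + string | string - string | 0 | 1 | ... | 9
    A tree is a digit string or a tuple (op, left, right)."""
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0] in DIGITS else []
    trees = []
    for i in range(1, len(tokens) - 1):      # try each operator as the root
        if tokens[i] in ("+", "-"):
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    trees.append((tokens[i], left, right))
    return trees


def value(tree):
    if isinstance(tree, str):
        return int(tree)
    op, left, right = tree
    return value(left) + value(right) if op == "+" else value(left) - value(right)


trees = parses(list("9-5+2"))
print(len(trees), [value(t) for t in trees])
```

This prints `2 [2, 6]`: 9 – (5 + 2) = 2 and (9 – 5) + 2 = 6. Under the unambiguous list grammar, only the second grouping can be derived.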
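Example 2
[Figure: two distinct parse trees for A = B * C + A, using expression rules of the form <expr> → <expr> + <expr> | <expr> * <expr>. In both trees, <assign> expands to <ident> = <expr> with <ident> deriving A. In the first, <expr> expands to <expr> + <expr>, grouping the right-hand side as (B * C) + A; in the second, <expr> expands to <expr> * <expr>, grouping it as B * (C + A).]
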

• If the tokens can be matched against the grammar, a parse tree can be produced
• This means the source program is syntactically correct
• However, most programming languages also have semantic specifications that must be checked in order to generate correct code

Contextual Constraints
Syntax rules alone are not enough to specify the format of well-formed programs.

Example 1:
let const m ~ 2
in putint(m + x)
Undefined! x is never declared, which violates the scope rules.

Example 2:
let const m ~ 2;
    var n: Boolean
in begin
    n := m < 4;
    n := n + 1
end
Type error! n is Boolean, so n + 1 violates the type rules.

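Both checks need information that a context-free grammar does not carry: which names are declared, and with which types. A minimal Python sketch of a checker over a hypothetical tuple encoding of expressions (literals, identifier strings, and `(op, left, right)`); the type names follow the examples, and Python's built-in `NameError`/`TypeError` are reused only for convenience.

```python
def type_of(expr, env):
    """Return the type of expr, or raise on a scope or type violation.
    expr is an int/bool literal, an identifier (str), or (op, left, right);
    env maps each declared identifier to its type."""
    if isinstance(expr, bool):        # test bool first: bool is a subclass of int
        return "Boolean"
    if isinstance(expr, int):
        return "Integer"
    if isinstance(expr, str):
        if expr not in env:
            raise NameError(f"scope rule violated: {expr!r} is not declared")
        return env[expr]
    op, left, right = expr
    lt, rt = type_of(left, env), type_of(right, env)
    if lt != "Integer" or rt != "Integer":
        raise TypeError(f"type rule violated: {lt} {op} {rt}")
    if op == "+":
        return "Integer"
    if op == "<":
        return "Boolean"
    raise ValueError(f"unknown operator {op!r}")


def check_assign(name, expr, env):
    """name := expr is well typed only if expr has the declared type of name."""
    if type_of(expr, env) != type_of(name, env):
        raise TypeError(f"type rule violated: cannot assign to {name!r}")


env = {"m": "Integer", "n": "Boolean"}   # after: const m ~ 2; var n: Boolean
check_assign("n", ("<", "m", 4), env)    # n := m < 4 is accepted
```

With this checker, `putint(m + x)` fails with a scope error because `x` is not in the environment, and `n := n + 1` fails with a type error because `+` requires two Integer operands.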
Semantics
Specification of semantics is concerned with specifying the
“meaning” of well-formed programs.
Terminology:
Expressions are evaluated and yield values (and may or may not
perform side effects).
Commands are executed and perform side effects.
Declarations are elaborated to produce bindings.

Side effects:
• change the values of variables
• perform input/output
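The three verbs correspond to three functions in a simple interpreter. The sketch below uses a hypothetical Python encoding (constants bound directly to values, variables bound to one-element lists acting as mutable cells). It illustrates the terminology and is not a definitive semantics of any language.

```python
def evaluate(expr, env):
    """Expressions are evaluated and yield values (no side effects here)."""
    if isinstance(expr, (bool, int)):
        return expr
    if isinstance(expr, str):
        kind, binding = env[expr]
        return binding if kind == "const" else binding[0]
    op, left, right = expr
    a, b = evaluate(left, env), evaluate(right, env)
    if op == "+":
        return a + b
    if op == "<":
        return a < b
    raise ValueError(f"unknown operator {op!r}")


def elaborate(decl, env):
    """Declarations are elaborated and produce bindings (a new environment)."""
    kind, name, init = decl
    new_env = dict(env)
    if kind == "const":
        new_env[name] = ("const", evaluate(init, env))
    else:  # "var": bind the name to a fresh mutable cell
        new_env[name] = ("var", [init])
    return new_env


def execute(cmd, env, output):
    """Commands are executed and perform side effects."""
    if cmd[0] == ":=":                  # change the value of a variable
        _, name, expr = cmd
        kind, cell = env[name]
        if kind != "var":
            raise TypeError(f"{name!r} is not a variable")
        cell[0] = evaluate(expr, env)
    elif cmd[0] == "putint":            # perform output
        output.append(evaluate(cmd[1], env))
    else:
        raise ValueError(f"unknown command {cmd[0]!r}")


env = elaborate(("const", "m", 2), {})
env = elaborate(("var", "n", None), env)
out = []
execute((":=", "n", ("<", "m", 4)), env, out)
execute(("putint", ("+", "m", 3)), env, out)
print(evaluate("n", env), out)
```

After the two commands, `n` holds `True` and the output list holds `[5]`; only `execute` ever changes a variable or produces output.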

The End
