Dr. Atif Ishaq Khan – Assistant Professor, GC University, Lahore

Compiler Construction
CS-4207
Lecture – 03

Disclaimer: The Contents of this reader are borrowed from the book(s) mentioned
in the reference section.
Introduction
As discussed in the previous lecture, a compiler is structured in two parts: analysis,
which breaks up a source program into its constituent pieces and produces an internal
representation of it called intermediate code, and synthesis, which translates the
intermediate code into the target program. On the basis of this division, the analysis part
is referred to as the front end of the compiler, while the synthesis part is referred to as
the back end.
Analysis is organized around the “syntax” and “semantics” of the language to be
compiled. The syntax of a language describes the correct form in which its programs
should be written, while the semantics describes what its programs signify, i.e., what
each program does when it is executed. Syntax is specified using a Context Free
Grammar, whereas semantics is difficult to express formally, so we rely on informal
descriptions and suggestive examples. A grammar-oriented compiling technique is
known as Syntax-Directed Translation.

A model of front end of compiler [1]


Syntax Directed Translation – An Example

Java code fragment and Simplified intermediate code [1]


We discussed in the previous lecture that the original code passes through intermediate
representations during compilation. The two intermediate representations are the syntax
tree and intermediate code. For the code fragment “do i = i + 1; while (a[i] < v);”, the
syntax tree (a) and the intermediate code (b) have the following features.

• The root of the syntax tree represents the entire do-while loop.
• The left child represents the body of the loop, which assigns a new value to the variable i.
• The right child represents the condition, which tests the array element a[i] against v.
• The code in (b) is intermediate code in three-address form.
What is Syntax?
Syntax defines the grammatical rules or structure of the language. Many definitions of a
sentence's syntax in logic or computer languages take the following pattern.
• A number or identifier is an expression (BASIS)
• If E1 and E2 are expressions, then so are E1 + E2, E1 − E2, E1 ∗ E2, E1 / E2, and (E1) (INDUCTION)
Such inductive definitions are characterized by the fact that they begin with some atomic
constructs (such as a number or an identifier), introduce one or more categorical
constructs (such as an expression), and then give rules for how constructs can be joined
to produce more instances of constructs.
This mode of syntax definition is generalized by the notion of a context free grammar,
which we will use to define the syntax of a language. Such grammars define the
hierarchical structure of programming languages. For example, the following “if”
statement can be represented by a context free grammar

if (expression) statement else statement


In this example “if” and “else” are keywords, while for the expression and the statement
we use variables, which are called non-terminals in a CFG.

stmt → if ( expr ) stmt else stmt


Such a rule is called a production. In a production, lexical items such as the keyword if
and the parentheses are terminals, while variables such as expr and stmt are non-terminals.

Context Free Grammar (CFG)


A CFG is a four-tuple G = (T, V, S, P), where
1. T is a finite set of symbols, sometimes referred to as tokens. These are called terminals
or terminal symbols.
2. V is a finite set of variables, sometimes called non-terminals or syntactic categories
(corresponding to categorical concepts). Each variable represents a language, i.e., a
set of strings.
3. One of the variables, S in this case, represents the language being defined; it is
called the start symbol. If there is only one variable, it is also the start symbol.
4. P is a finite set of productions or rules that represent the recursive definition of the
language. Each production consists of:
a) A variable that is being defined by the production. This is often called the head
of the production.
b) The production symbol →
c) A string of zero or more terminals and variables. This string, called the body of
the production, represents one way to form strings in the language of the
variable of the head.
General Form of production rules
A → X1 X2 …… Xn
An Example – List of digits separated by + or -
list → list + digit
list → list – digit
list → digit
digit → 0|1|2|3|4|5|6|7|8|9
In this example list and digit are non-terminals, while +, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 are
terminals.
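The four-tuple definition can be made concrete with a small data structure. The sketch below is one possible Python encoding of the list-of-digits grammar (list → list + digit | list - digit | digit). The names TERMINALS, VARIABLES, START, PRODUCTIONS and is_well_formed are illustrative assumptions, not part of the lecture.

```python
# Hypothetical encoding of the list-of-digits grammar as a four-tuple.
TERMINALS = set("+-0123456789")
VARIABLES = {"list", "digit"}
START = "list"
PRODUCTIONS = {
    # head: list of bodies, each body a list of symbols
    "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
    "digit": [[d] for d in "0123456789"],
}


def is_well_formed(terminals, variables, start, productions):
    """Check the structural conditions stated in the four-tuple definition."""
    # Terminals and variables must be disjoint, and the start symbol is a variable.
    if start not in variables or terminals & variables:
        return False
    symbols = terminals | variables
    for head, bodies in productions.items():
        if head not in variables:          # every head is a variable
            return False
        for body in bodies:                # every body uses known symbols only
            if any(sym not in symbols for sym in body):
                return False
    return True


print(is_well_formed(TERMINALS, VARIABLES, START, PRODUCTIONS))
```

Keeping the bodies as lists of symbols, rather than as one string, makes it easy to tell the multi-character variable list apart from single-character terminals.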
How to derive a string from the grammar?
The derivation of a string begins with the start symbol. At each step, a non-terminal is
replaced by the body of one of its productions, which may contain both terminals and
non-terminals. The process is repeated until no non-terminals remain, i.e., until the string
consists of terminals only. The token (terminal) strings that can be derived from the start
symbol form the language defined by the grammar.
Example – Language for Function call
Consider a language for function calls. A call contains a list of parameters, and
sometimes this parameter list is empty. The grammar for such a function call is defined
below, where ε denotes the empty string
call → id ( optparams )
optparams → params | ε
params → params , param | param
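As a rough illustration, a function-call grammar of this kind (a name, an opening parenthesis, an optional comma-separated parameter list, a closing parenthesis) can be checked with a few lines of Python. This is a sketch under assumptions: the grammar does not say what a single parameter is, so here a parameter is taken to be an identifier or a number, and the names tokenize and is_call are hypothetical.

```python
import re

# One token: an identifier, a number, or one of the punctuation marks ( ) ,
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[(),])")
PUNCTUATION = {"(", ")", ","}


def tokenize(text):
    """Split text into tokens; raise ValueError on an unexpected character."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_call(text):
    """Return True if text has the form: id ( optional-parameter-list )."""
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    if len(tokens) < 3:
        return False
    name, opening, closing = tokens[0], tokens[1], tokens[-1]
    if not (name[0].isalpha() or name[0] == "_"):
        return False
    if opening != "(" or closing != ")":
        return False
    inner = tokens[2:-1]
    if not inner:                      # empty parameter list
        return True
    if len(inner) % 2 == 0:            # must be: param , param , ... , param
        return False
    for index, token in enumerate(inner):
        if index % 2 == 0 and token in PUNCTUATION:
            return False               # a parameter is expected here
        if index % 2 == 1 and token != ",":
            return False               # a comma is expected here
    return True


print(is_call("f()"), is_call("max(a, b)"), is_call("f(a,)"))
```

The left-recursive parameter list is handled by a loop over alternating positions instead of recursion, which is a common way to recognize such lists by hand.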

Example of Derivation
Let’s derive the string 9 − 5 + 2 as follows [2]
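One way to trace such a derivation is to replay it mechanically. The sketch below assumes the list-of-digits grammar (list → list + digit | list - digit | digit) and always rewrites the leftmost non-terminal; the function name leftmost_derivation and the choice of a leftmost order are assumptions made for illustration.

```python
# Productions of the list-of-digits grammar (ASCII "-" stands for the minus sign).
PRODUCTIONS = {
    "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
    "digit": [[d] for d in "0123456789"],
}


def leftmost_derivation(start, bodies):
    """Rewrite the leftmost non-terminal with each body in turn.

    Returns every sentential form, from the start symbol to the final string.
    """
    form = [start]
    steps = [form]
    for body in bodies:
        # Position of the leftmost symbol that is still a non-terminal.
        index = next(i for i, sym in enumerate(form) if sym in PRODUCTIONS)
        if body not in PRODUCTIONS[form[index]]:
            raise ValueError(f"{form[index]} -> {' '.join(body)} is not a production")
        form = form[:index] + body + form[index + 1:]
        steps.append(form)
    return steps


steps = leftmost_derivation("list", [
    ["list", "+", "digit"],   # list  -> list + digit
    ["list", "-", "digit"],   # list  -> list - digit
    ["digit"],                # list  -> digit
    ["9"], ["5"], ["2"],      # digit -> 9, digit -> 5, digit -> 2
])
for form in steps:
    print(" ".join(form))
```

Each printed line is one sentential form; the last line contains terminals only, so it belongs to the language of the grammar.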

The process inverse to derivation is recognition or parsing.


What is Parsing?
Parsing is the process of determining whether a given string of terminals can be derived
from the start symbol. If the string cannot be derived from the start symbol, a syntax
error is reported.
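As a minimal sketch of this idea, the recognizer below accepts exactly the strings of the list-of-digits grammar (single digits separated by + or -) and reports a syntax error otherwise. The function name parse_list and the wording of the error messages are assumptions for illustration.

```python
DIGITS = "0123456789"


def parse_list(text):
    """Return True if text is a list of digits separated by + or -.

    Raises SyntaxError with the offending position otherwise.
    """
    tokens = [ch for ch in text if not ch.isspace()]
    if not tokens:
        raise SyntaxError("empty input: expected a digit")
    for position, token in enumerate(tokens):
        if position % 2 == 0 and token not in DIGITS:
            raise SyntaxError(f"token {position}: expected a digit, found {token!r}")
        if position % 2 == 1 and token not in "+-":
            raise SyntaxError(f"token {position}: expected + or -, found {token!r}")
    if len(tokens) % 2 == 0:
        raise SyntaxError("input ends with an operator: expected a digit")
    return True


print(parse_list("9 - 5 + 2"))
try:
    parse_list("9 - + 2")
except SyntaxError as error:
    print("syntax error:", error)
```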
A parse tree for the production A → XYZ is of the form

        A
      / | \
     X  Y  Z

In a complete parse tree, the root node is the start symbol, the leaf nodes are terminals,
and the interior nodes are non-terminals.
Example of Derivation
Consider the example 9 – 5 + 2

This gives another definition of the language generated by a grammar: it is the set of
strings that can be generated by some parse tree. The process of finding a parse tree
for a given string of terminals is called parsing that string.
What is ambiguity in grammar?
A grammar can have more than one parse tree generating a given string of terminals. If
some string has more than one parse tree (equivalently, more than one leftmost
derivation), the grammar is said to be ambiguous. Since a string with more than one
parse tree usually has more than one meaning, we need to design unambiguous grammars
for compiling applications, or to use ambiguous grammars with additional rules to
resolve the ambiguities. The following grammar is ambiguous
string → string + string
string → string – string
string → 0|1|2|3|4|5|6|7|8|9

Now we can generate more than one parse tree for the same string 9 – 5 + 2
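The ambiguity can be checked by brute force. The sketch below assumes the ambiguous grammar string → string + string | string - string | digit, enumerates every parse tree for a token list, and evaluates each tree; the names parse_trees and evaluate are illustrative.

```python
DIGITS = "0123456789"


def parse_trees(tokens):
    """Return all parse trees for tokens under the ambiguous string grammar.

    A tree is either a digit (leaf) or a tuple (left, operator, right).
    """
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0] in DIGITS else []
    trees = []
    # Try every operator position as the root of the tree.
    for i in range(1, len(tokens) - 1, 2):
        if tokens[i] in "+-":
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees


def evaluate(tree):
    """Compute the value that a parse tree assigns to the string."""
    if isinstance(tree, str):
        return int(tree)
    left, operator, right = tree
    if operator == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) - evaluate(right)


trees = parse_trees(list("9-5+2"))
for tree in trees:
    print(tree, "=", evaluate(tree))
```

For 9 - 5 + 2 this finds two trees: one groups the string as (9 - 5) + 2 with value 6, the other as 9 - (5 + 2) with value 2, which is why the ambiguity matters.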

Associativity of Operators
Associativity is the left-to-right or right-to-left order for grouping operands to
operators that have the same precedence. An operator's precedence is meaningful only
if other operators with higher or lower precedence are present. Expressions with
higher-precedence operators are evaluated first [3].
Operators are either left-associative or right-associative. Associativity matters only when
an operand lies between two operators of the same precedence; if the operators on either
side have different precedence, precedence alone decides the grouping.
For example, 9 + 5 + 2 is equivalent to (9 + 5) + 2 and 9 – 5 – 2 is equivalent to
(9 – 5) – 2. The four arithmetic operators, addition, subtraction, multiplication, and
division, are left-associative. An operand with a + sign on both of its sides belongs to the
operator to its left.
A grammar [2] such as
list → list + digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
is called left recursive and captures left associativity.
It will parse the string 1 + 2 + 3 + 4 as
{{1 + 2} + 3} + 4
In contrast, the grammar [2]
right → letter := right | letter := digit
letter → a | …… | z
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
is called right recursive and captures right associativity.
It will parse the string a := b := c := d := 5 as
a := {b := {c := {d := 5}}}
The parse tree for left-associative operators grows down towards the left, while the
parse tree for right-associative operators grows down towards the right.
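The two tree shapes can be compared with a short sketch. It builds the grouping that a left-recursive rule produces for 1 + 2 + 3 + 4 and the grouping that a right-recursive rule produces for a chain of assignments. The function names and the operand names a, b, c, d are assumptions chosen for illustration.

```python
def group_left(operands, operator):
    """Grouping produced by a left-recursive rule: the tree grows down to the left."""
    tree = operands[0]
    for operand in operands[1:]:
        tree = (tree, operator, operand)
    return tree


def group_right(operands, operator):
    """Grouping produced by a right-recursive rule: the tree grows down to the right."""
    tree = operands[-1]
    for operand in reversed(operands[:-1]):
        tree = (operand, operator, tree)
    return tree


def show(tree):
    """Render a tree with braces around every operator application."""
    if isinstance(tree, str):
        return tree
    left, operator, right = tree
    return "{" + show(left) + " " + operator + " " + show(right) + "}"


print(show(group_left(["1", "2", "3", "4"], "+")))
print(show(group_right(["a", "b", "c", "d", "5"], ":=")))
```

The first line printed nests its braces on the left and the second nests them on the right, mirroring the direction in which each parse tree grows.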
Precedence of Operators
Associativity rules do not resolve the ambiguity that arises when operators of different
precedence appear in the same expression.
Consider the following language of parenthesis-free arithmetic expressions over digits
and the two operators + and ∗. A possible grammar is
expr → expr + digit | expr ∗ digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Consider the example 9 + 5 ∗ 2, which can be interpreted as (9 + 5) ∗ 2 or
9 + (5 ∗ 2). This grammar yields the grouping {9 + 5} ∗ 2, which evaluates the string
to 28 instead of the correct value 19. We know that multiplication and division have
higher precedence than addition and subtraction.
For the arithmetic expressions 9 ∗ 5 + 2 and 9 + 5 ∗ 2, which operator takes the
operand 5 at the time of evaluation? The operator ∗ in both cases, because it has higher
precedence than +.

Solution
We need to define our grammar in such a way that neither associativity nor precedence
creates any ambiguity.
For example, we can use two different variables, expr and term, for the two different
levels of precedence, and an extra variable, factor, for generating the basic units (digits
and parenthesized expressions) in expressions.

This grammar resolves the issue of associativity as well as that of precedence, and
expressions can be nested using parentheses.
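A layered grammar of this kind maps directly onto a small recursive-descent evaluator, with one function per precedence level. The sketch below assumes the productions expr → expr + term | expr - term | term, term → term * factor | term / factor | factor, and factor → digit | ( expr ); these productions and the function names are assumptions, since the grammar itself is not reproduced in the text. Each left-recursive rule is implemented as a loop, which keeps the operators left-associative.

```python
DIGITS = "0123456789"


def evaluate(text):
    """Evaluate an expression over single digits, + - * / and parentheses."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        # factor -> digit | ( expr )
        token = peek()
        if token is not None and token in DIGITS:
            advance()
            return int(token)
        if token == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        raise SyntaxError(f"unexpected token {token!r}")

    def term():
        # term -> term * factor | term / factor | factor
        value = factor()
        while peek() in ("*", "/"):
            operator = peek()
            advance()
            right = factor()
            value = value * right if operator == "*" else value / right
        return value

    def expr():
        # expr -> expr + term | expr - term | term
        value = term()
        while peek() in ("+", "-"):
            operator = peek()
            advance()
            right = term()
            value = value + right if operator == "+" else value - right
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("9 + 5 * 2"), evaluate("(9 + 5) * 2"), evaluate("9 - 5 - 2"))
```

Because term is called from expr, multiplication and division bind more tightly than addition and subtraction, and the parenthesized case in factor lets the user override that grouping.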

The following is a grammar for a subset of Java statements.

References
1. Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, “Compilers:
Principles, Techniques, and Tools”, Chapter 2, Second Edition, Pearson.
2. Honor Compilers, NYU, Lecture 1, Fall 2009, A. Pnueli
3. https://www.ibm.com/docs/en/xl-c-and-cpp-aix/11.1.0?topic=operators-operator-precedence-associativity
