Compiler Week 2
By
DD/DD/DDDD
• Grammars are convenient enough for syntax that they are used
by all communities: programmers, implementers, designers.
• Semantic descriptions, however, are seldom both readable enough
to be suitable for a beginner and precise enough to specify a
language fully.
• Several distinct styles of language description have arisen to meet
the conflicting needs of readability and precision.
• Tutorials: A tutorial introduction is a guided tour of the language. It
provides impressions of what the main constructs of the language
are and how they are meant to be used.
• The syntax and semantics are introduced gradually, as needed.
• Reference Manuals: A reference manual describing the syntax
and semantics of a language is traditionally organized around the
syntax of the language.
• Formal Definitions: A formal definition is a precise description of
the syntax and semantics of a language; it is aimed at specialists.
English descriptions leave room for conflicting interpretations, so
precise formal notations have been developed. The training and
effort needed to learn such notations are balanced by their
promise of clarifying particularly subtle points.
Expression Notations:
• Expressions such as a + b * c have been in use for centuries and were
a starting point for the design of programming languages.
• For example, the following expression can be written in Fortran:
$(-b + \sqrt{b^2 - 4 * a * c}) / (2 * a)$
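For comparison, here is a minimal sketch of the same expression in Python. The function name `quadratic_root` is hypothetical. In this notation, multiplication must be written explicitly, and the parentheses around the numerator and around `2 * a` are required because `/` would otherwise bind only to its neighbours.

```python
import math


def quadratic_root(a: float, b: float, c: float) -> float:
    """Return (-b + sqrt(b*b - 4*a*c)) / (2*a).

    * binds tighter than -, so 4*a*c is computed before it is
    subtracted from b*b. The outer parentheses group the whole
    numerator and the whole denominator.
    """
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


print(quadratic_root(1, -3, 2))
```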
Postfix Notation:
Infix Notation:
Mixfix Notation:
[Figure: tree for a + b; the operator + is the root node, and the operands a and b are its children.]
• An operator and its operands are represented by a node and its
children.
• A tree consists of a node with k ≥ 0 trees as its children.
• When k = 0, a tree consists of just a node, with no children.
• A node with no children is called a leaf.
• The root of a tree is a node with no parent; that is, it is not a child
of any node.
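The node-and-children view of expressions can be sketched as a small Python data structure. The class `Node` and the functions `prefix`, `infix`, and `postfix` are hypothetical names for this sketch. Infix output is fully parenthesized and assumes binary operators, so mixfix notation is not covered. Walking the same tree in different orders yields the different notations.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf is a node with k = 0 children.
        return not self.children


def prefix(t: Node) -> str:
    # Operator first, then its operands.
    return " ".join([t.label] + [prefix(c) for c in t.children])


def postfix(t: Node) -> str:
    # Operands first, then their operator.
    return " ".join([postfix(c) for c in t.children] + [t.label])


def infix(t: Node) -> str:
    # Binary operators only; every subexpression is parenthesized.
    if t.is_leaf():
        return t.label
    left, right = t.children
    return f"({infix(left)} {t.label} {infix(right)})"


# Tree for a + b * c: + is the root, and * is its right child.
tree = Node("+", [Node("a"), Node("*", [Node("b"), Node("c")])])
print(prefix(tree))
print(infix(tree))
print(postfix(tree))
```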
Lexical Syntax:
• For example, the expression B*B-4*A*C consists of the tokens B, *, B, -, 4, *, A, *, C.
• White space in the form of blank, tab, and newline characters can
typically be inserted between tokens without changing the
meaning of a program.
• Similarly, comments between tokens are ignored.
• Informal descriptions usually suffice for white space, comments
and the correspondence between tokens and their spellings, so
lexical syntax will not be formalized.
• Real numbers are a possible exception.
• The most complex rules in a lexical syntax are typically the ones
describing the syntax of real numbers, because parts of the syntax
are optional.
• The following are some of the ways of writing the same number:
• 314.E-2 = 3.14 = 0.314E+1 = 0.314E1
• The leading 0 can sometimes be dropped, as in .314E1.
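The lexical rules above can be sketched as a small regular-expression tokenizer in Python. It drops white space and comments and recognizes the real-number spellings listed above as single tokens. The token classes, the `#`-to-end-of-line comment syntax, and the names `TOKEN_RE` and `tokenize` are assumptions for this sketch; the notes do not fix a comment syntax. The number pattern also accepts plain integers such as 4.

```python
import re

# Assumed token classes. Comments are assumed to run from '#' to end of line.
# A number is digits with an optional point, fraction, and exponent,
# or a point followed by digits (as in .314E1).
TOKEN_RE = re.compile(r"""
    (?P<skip>\s+|\#[^\n]*)                              # white space, comments
  | (?P<number>(?:\d+\.?\d*|\.\d+)(?:[Ee][+-]?\d+)?)    # 4, 3.14, 314.E-2
  | (?P<name>[A-Za-z]\w*)                               # identifiers: A, B, C
  | (?P<op>[-+*/()])                                    # operators, parentheses
""", re.VERBOSE)


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into (kind, spelling) pairs, dropping white space and comments."""
    tokens: list[tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# White space and a trailing comment do not change the token sequence.
print(tokenize("B * B - 4*A*C   # discriminant"))

# Each spelling of the same real number becomes a single number token.
for s in ["314.E-2", "3.14", "0.314E+1", "0.314E1", ".314E1"]:
    print(tokenize(s), float(s))
```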
Context-Free Grammars:
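[Figure: parse tree for 3.14. The root, real-number, has the children integer-part, the token ".", and fraction; each of the leaves 3, 1, and 4 sits below a digit node.]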
• The leaves at the bottom of a parse tree are labeled with terminals
or tokens like 3; tokens represent themselves.
• By contrast, the other nodes of a parse tree are labeled with
non-terminals like real-number and digit; non-terminals represent
language constructs.
• Each node in the parse tree is based on a production, a rule that
defines a non-terminal in terms of a sequence of terminals and
non-terminals.
• The root of the parse tree for 3.14 is based on the following
informally stated production:
• “A real number consists of an integer part, a point, and a fraction
part”.
• Together the tokens, the non-terminals, the productions, and a
distinguished non-terminal, called the starting non-terminal,
constitute a grammar for a language.
• The starting non-terminal may represent a portion of a complete
program when fragments of a programming language are studied.
• Both tokens and non-terminals are referred to as grammar
symbols, or simply symbols.
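The grammar for real numbers can be sketched as a recursive-descent parser in Python. The BNF in the comments is an assumption reconstructed from the informal production for a real number and the non-terminals real-number, integer-part, fraction, and digit. The class `Parser`, its methods, the helper `leaves`, and the tuple representation of parse trees are hypothetical. Each method handles one non-terminal, and `real_number` plays the role of the starting non-terminal.

```python
# Grammar assumed for this sketch (BNF):
#   <real-number>  ::= <integer-part> . <fraction>
#   <integer-part> ::= <digit> | <integer-part> <digit>
#   <fraction>     ::= <digit> | <digit> <fraction>
#   <digit>        ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
from typing import Union

# A parse tree is a tuple (non-terminal, child, ...); leaves are tokens.
Tree = Union[str, tuple]
DIGITS = set("0123456789")


class Parser:
    """Recursive-descent parser with one method per non-terminal."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        # Next token, or "" at end of input; every token is one character.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, token: str) -> str:
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r} at position {self.pos}")
        self.pos += 1
        return token

    def digit(self) -> Tree:
        ch = self.peek()
        if ch not in DIGITS:
            raise SyntaxError(f"expected a digit at position {self.pos}")
        self.pos += 1
        return ("digit", ch)

    def integer_part(self) -> Tree:
        # The left-recursive production is applied with a loop, so the
        # tree grows to the left, with the first digit deepest.
        tree: Tree = ("integer-part", self.digit())
        while self.peek() in DIGITS:
            tree = ("integer-part", tree, self.digit())
        return tree

    def fraction(self) -> Tree:
        # The right-recursive production maps directly onto recursion.
        first = self.digit()
        if self.peek() in DIGITS:
            return ("fraction", first, self.fraction())
        return ("fraction", first)

    def real_number(self) -> Tree:
        # Starting non-terminal: the whole input must be consumed.
        tree = ("real-number", self.integer_part(), self.expect("."), self.fraction())
        if self.pos != len(self.text):
            raise SyntaxError(f"unexpected input at position {self.pos}")
        return tree


def leaves(tree: Tree) -> list[str]:
    """Terminals at the bottom of the parse tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]


print(Parser("3.14").real_number())
print(leaves(Parser("3.14").real_number()))
```

Reading the leaves from left to right recovers the tokens 3, ., 1, 4, while the interior nodes record which productions were used.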
Variants of Grammars: