Syntax and Semantics
The study of PLs includes examination of:
• Syntax - the form or structure of the expressions, statements, and program units.
• Semantics - the meaning of the expressions, statements, and program units.
In a well-designed PL, semantics should follow directly from syntax.
Describing syntax is easier than describing semantics.
• Ex: An if statement in the C language:
    if ( <expr> ) <statement>

• A sentence is a string of characters over some alphabet.
• A language is a set of sentences.
  – Syntax rules specify which sentences are in the language.
• A lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, begin).
  – The description of lexemes is given by a lexical specification, separate from the syntactic description of the language.
  – Lexemes include identifiers, constants, operators, and special words.
• A token is a category of lexemes (e.g., identifier, semicolon, or equal_sign).
You can think of programs as strings of lexemes rather than characters.
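The lexeme/token distinction can be made concrete with a small classifier. The following is only a sketch: the token names other than identifier, semicolon, and equal_sign, the choice of begin/end as special words, and the function name classify_lexeme are assumptions made for illustration, not part of the slides.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token categories; the lexemes "sum" and "total" both map to identifier. */
enum token { identifier, int_constant, semicolon, equal_sign,
             mult_op, special_word, unknown };

/* Map one lexeme (a string) to its token (a category of lexemes). */
enum token classify_lexeme(const char *lexeme) {
    assert(lexeme != NULL);
    /* Special words are checked first so they are not taken as identifiers. */
    if (strcmp(lexeme, "begin") == 0 || strcmp(lexeme, "end") == 0)
        return special_word;
    if (strcmp(lexeme, ";") == 0) return semicolon;
    if (strcmp(lexeme, "=") == 0) return equal_sign;
    if (strcmp(lexeme, "*") == 0) return mult_op;
    if (isalpha((unsigned char)lexeme[0])) {
        /* Assumed identifier form: a letter, then letters, digits, or '_'. */
        for (size_t i = 1; lexeme[i] != '\0'; i++)
            if (!isalnum((unsigned char)lexeme[i]) && lexeme[i] != '_')
                return unknown;
        return identifier;
    }
    if (isdigit((unsigned char)lexeme[0])) {
        /* Assumed constant form: one or more decimal digits. */
        for (size_t i = 1; lexeme[i] != '\0'; i++)
            if (!isdigit((unsigned char)lexeme[i]))
                return unknown;
        return int_constant;
    }
    return unknown;
}
```

Many different lexemes share one token here, which is why a parser can work on tokens without caring which particular identifier or constant it is looking at.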
Chapter 3 Programming Languages 3 Chapter 3 Programming Languages 4
• Recognizers - used in the syntax analysis part of compilers
  – Consider a language L that uses an alphabet ∑ of characters.
  – We construct a recognition device, R, which is capable of
    • inputting strings of characters from the alphabet ∑ and
    • indicating whether a given input string is in L or not.
• Generators - what we'll study
  – A language generator is a device that can be used to generate the sentences of a language.
  – Generators are more readable and understandable than recognizers.
  – Language recognizers are not useful as a language description mechanism.

Grammars are formal language generation mechanisms commonly used to describe the syntax of PLs.
Context-Free Grammars (CFG) (mid-1950s)
• Developed by Noam Chomsky.
• Defined a class of languages called context-free languages.
• Context-free grammars can describe whole languages, with minor exceptions.
• Regular grammars can describe the languages of tokens of PLs.
Backus-Naur Form (BNF) (1959)
• Invented by John Backus to describe Algol 58.
• BNF is equivalent to context-free grammars.
• BNF is a very natural notation for describing syntax.
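As one concrete instance of a recognition device R, the sketch below accepts or rejects strings for a toy language. The language L = { a^n b^n | n >= 1 } over the alphabet {a, b} and the function name recognize are assumptions chosen for illustration; the slides do not name a specific L.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* R for the assumed language L = { a^n b^n | n >= 1 } over {a, b}:
   reads a string and reports whether it is in L. */
bool recognize(const char *input) {
    assert(input != NULL);
    size_t a_count = 0, b_count = 0;
    const char *p = input;
    while (*p == 'a') { a_count++; p++; }   /* leading run of a's */
    while (*p == 'b') { b_count++; p++; }   /* following run of b's */
    /* In L only if nothing is left over and the two runs have equal,
       nonzero length. Any character outside {a, b} stops both loops
       before the end of the string, so such input is rejected. */
    return *p == '\0' && a_count >= 1 && a_count == b_count;
}
```

The code answers yes or no for any input, but reading it does not make the shape of L obvious, which is the slide's point about recognizers being poor description mechanisms. The grammar <S> -> a <S> b | a b states the same language in one line.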
Fundamentals
• Each of the strings in a derivation, including the start symbol, is called a sentential form.
• A sentence is a sentential form that has only terminal symbols, or lexemes.
• A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded.

Examples
An example grammar for a small language:
    <program> -> <stmts>
    <stmts> -> <stmt> | <stmt> ; <stmts>
    <stmt> -> <var> = <expr>
    <var> -> a | b | c | d
    <expr> -> <term> + <term> | <term> - <term>
    <term> -> <var> | const
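A leftmost derivation with this grammar can be traced mechanically. The sketch below assumes sentential forms are stored as C strings in which nonterminals are written between < and >; the helper names expand_leftmost and is_sentence are hypothetical. The caller chooses which right-hand side to apply at each step, and the code does not check that the choice is a legal rule for that nonterminal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Replace the leftmost nonterminal (text of the form <...>) in `form`
   with `rhs`. Returns false if there is no nonterminal left or the
   result would not fit in `capacity` bytes. */
bool expand_leftmost(char *form, size_t capacity, const char *rhs) {
    assert(form != NULL && rhs != NULL);
    char *open = strchr(form, '<');
    if (open == NULL) return false;
    char *close = strchr(open, '>');
    if (close == NULL) return false;
    size_t tail_len = strlen(close + 1);
    size_t rhs_len = strlen(rhs);
    size_t new_len = (size_t)(open - form) + rhs_len + tail_len;
    if (new_len + 1 > capacity) return false;
    /* Shift the tail (including the terminating NUL), then drop in rhs. */
    memmove(open + rhs_len, close + 1, tail_len + 1);
    memcpy(open, rhs, rhs_len);
    return true;
}

/* A sentence is a sentential form with no nonterminals left. */
bool is_sentence(const char *form) {
    assert(form != NULL);
    return strchr(form, '<') == NULL;
}
```

Starting from <program> and applying the right-hand sides <stmts>, <stmt>, <var> = <expr>, a, <term> + <term>, <var>, b, const in that order passes through the sentential forms <stmts>, <stmt>, <var> = <expr>, a = <expr>, a = <term> + <term>, a = <var> + <term>, a = b + <term>, and ends at the sentence a = b + const.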
Parse Trees
A parse tree is a hierarchical representation of a derivation.
[Figure: parse tree with root <program> and child <stmts>]
A grammar is ambiguous iff it generates a sentential form that has two or more distinct parse trees.
• Ex: An ambiguous expression grammar:
    <expr> -> <expr> <op> <expr> | const
    <op> -> / | -
The following derivation uses the above grammar:
    <expr> => <expr> - <term> => <term> - <term>
           => const - <term>
           => const - <term> / const
           => const - const / const
• Operator associativity can also be indicated by a grammar:
    <expr> -> <expr> + <expr> | const (ambiguous)
    <expr> -> <expr> + const | const (unambiguous)

Associativity of Operators
Make sure that the associativity is correctly described.
  – Ex: A := B + C + A (See Figure 3.4)
In most cases, associativity of operators is irrelevant:
  – In math, + is associative, i.e., (A + B) + C = A + (B + C).
  – In computers, + is sometimes not associative. Ex: floating-point addition with limited precision.
  – Subtraction (-) and division (/) are not associative either in math or in a computer.
A left (right) recursive BNF rule is a rule whose LHS also appears at the beginning (end) of its RHS.
  – Left recursion specifies left associativity (as in + - / *).
  – Right recursion specifies right associativity (as in **).
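The floating-point case can be checked directly. The sketch below sums the same values with left-associative and right-associative grouping; the function names and the sample values are assumptions chosen so that the limited precision of double becomes visible.

```c
#include <assert.h>
#include <stddef.h>

/* Left-associative grouping: ((v[0] + v[1]) + v[2]) + ...
   volatile forces each partial sum to be stored as a double. */
double sum_left(const double *v, size_t n) {
    assert(v != NULL && n > 0);
    volatile double acc = v[0];
    for (size_t i = 1; i < n; i++)
        acc = acc + v[i];
    return acc;
}

/* Right-associative grouping: v[0] + (v[1] + (v[2] + ...)) */
double sum_right(const double *v, size_t n) {
    assert(v != NULL && n > 0);
    volatile double acc = v[n - 1];
    for (size_t i = n - 1; i-- > 0; )
        acc = v[i] + acc;
    return acc;
}
```

With the values 1e20, -1e20, 1.0, the left grouping first cancels the two large terms and yields 1.0. The right grouping first adds 1.0 to -1e20, where it is lost to rounding, and yields 0.0. Subtraction differs even with exact arithmetic: (10 - 4) - 3 is 3, while 10 - (4 - 3) is 9.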
[Figure: characters representing the sentence → Lexical Analyzer → lexemes/tokens → Syntax Analyzer (Parser). The lexical analyzer plays the role of a front end to the parser.]
• lexical() gets the leftmost token of the input and puts it into the global variable next_token.
Recursive descent parsers, like other top-down parsers, cannot be built from left-recursive grammars.

Example
Given the grammar:
    <expr> -> <term> {(+ | -) <term>}
    <term> -> <factor> {(* | /) <factor>}
    <factor> -> <id> | ( <expr> )
The recursive descent subprogram in C for the second rule:
    void term() {
        factor();    /* parse the first factor */
        while (next_token == ast_code || next_token == slash_code) {
            lexical();    /* get the next token from the input */
            factor();     /* parse the next factor */
        }
    }
    void factor() {
        if (next_token == id_code) {
            lexical();
            return;
        }
        else if (next_token == left_paren_code) {
            lexical();
            expr();
            if (next_token == right_paren_code) {
                lexical();
                return;
            }
            else
                error();    /* expecting right parenthesis */
        }
        else
            error();    /* it was neither an id nor a left parenthesis */
    }
Parsers of real compilers report a diagnostic message when an error is detected, and recover from the error so that the parsing process can continue.

Static Semantics
(Static semantics have nothing to do with meaning, but with the legal forms of programs: syntax rather than semantics.)
Some characteristics of PLs are:
1. Context-free but cumbersome (e.g., type checking)
  – The grammar would become too large to be useful. The size of the grammar determines the size of the parser.
2. Non-context-free (e.g., variables must be declared before they are used)
Because of the inability to describe static semantics with BNF, a variety of more powerful mechanisms has been described for that task, such as attribute grammars.
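The term() and factor() subprograms from the recursive descent example can be assembled into a complete recognizer for the three-rule expression grammar. This is a minimal sketch, not the course's implementation. Everything beyond next_token, lexical(), expr(), term(), factor(), error() and the *_code names used on the slides is an assumption: the single-character operator lexemes, the identifier form, the extra codes plus_code, minus_code, end_code and bad_code, and the wrapper parse_expression. Unlike a real compiler, it only records that an error occurred and does not produce diagnostics or recover.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

enum token { id_code, plus_code, minus_code, ast_code, slash_code,
             left_paren_code, right_paren_code, end_code, bad_code };

static const char *input;        /* remaining, unread input */
static enum token next_token;    /* set by lexical() */
static bool had_error;

static void expr(void);

/* Get the leftmost token of the remaining input into next_token. */
static void lexical(void) {
    while (*input == ' ') input++;
    char c = *input;
    if (c == '\0') { next_token = end_code; return; }
    if (isalpha((unsigned char)c)) {          /* assumed <id> form */
        while (isalnum((unsigned char)*input)) input++;
        next_token = id_code;
        return;
    }
    input++;
    switch (c) {
    case '+': next_token = plus_code; break;
    case '-': next_token = minus_code; break;
    case '*': next_token = ast_code; break;
    case '/': next_token = slash_code; break;
    case '(': next_token = left_paren_code; break;
    case ')': next_token = right_paren_code; break;
    default:  next_token = bad_code; break;
    }
}

static void error(void) { had_error = true; }

/* <factor> -> <id> | ( <expr> ) */
static void factor(void) {
    if (next_token == id_code) {
        lexical();
    } else if (next_token == left_paren_code) {
        lexical();
        expr();
        if (next_token == right_paren_code) lexical();
        else error();            /* expecting right parenthesis */
    } else {
        error();                 /* neither an id nor a left parenthesis */
    }
}

/* <term> -> <factor> {(* | /) <factor>} */
static void term(void) {
    factor();
    while (next_token == ast_code || next_token == slash_code) {
        lexical();
        factor();
    }
}

/* <expr> -> <term> {(+ | -) <term>} */
static void expr(void) {
    term();
    while (next_token == plus_code || next_token == minus_code) {
        lexical();
        term();
    }
}

/* True if the whole text is a sentence of the grammar. */
bool parse_expression(const char *text) {
    assert(text != NULL);
    input = text;
    had_error = false;
    lexical();
    expr();
    return !had_error && next_token == end_code;
}
```

Each EBNF repetition {...} becomes a while loop, which is how the parser avoids the left recursion that a plain BNF rule such as <expr> -> <expr> + <term> would require.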
CFGs cannot describe all of the syntax of programming languages. Attribute grammars are additions to CFGs that carry some semantic info along through parse trees.
Attribute grammars are grammars to which have been added:
• Attributes, which are associated with grammar symbols and are similar to variables that can be assigned values.
• Attribute computation functions (semantic functions), which are associated with grammar rules to specify how attribute values are computed.
• Predicate functions, which state some of the syntax and semantic rules of the language and are associated with grammar rules.

An attribute grammar is a CFG G = (S, N, T, P) with the following additions:
1. For each grammar symbol x there is a set A(x) of attribute values.
2. Each rule has a set of functions that define certain attributes of the nonterminals in the rule.
3. Each rule has a (possibly empty) set of predicates to check for attribute consistency.
Primary value of AGs:
1. Static semantics specification
2. Compiler design (static semantics checking)
Attributes and Attribute Computation Functions
Let X0 -> X1 ... Xn be a rule.
Associated with each grammar symbol X is a set of attributes A(X) that consists of two disjoint sets: S(X) and I(X).
• Functions of the form S(X0) = f(A(X1), ... , A(Xn)) define synthesized attributes.
  – Used to pass semantic info up a parse tree.
  – f is a semantic function, and the value for X0 depends only on the values of attributes on that node’s children.
• Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for 1 <= j <= n, define inherited attributes.
  – Used to pass semantic info down a parse tree.
  – f is a semantic function, and the value for Xj depends on the values of attributes on that node’s parent and siblings.

Predicate Functions
• A predicate function has the form of a Boolean expression on the attribute set {A(X0), ... , A(Xn)}.
  – The only derivations allowed with an attribute grammar are those in which the predicates associated with every nonterminal are all true.
  – A false predicate function value indicates a violation of the syntax or static semantics rules of the language.
• A parse tree is based on its underlying BNF grammar, with a possibly empty set of attribute values attached to each node.
• If all the attribute values in a parse tree have been computed, the tree is said to be fully attributed.
• Assume that attribute values are computed after the complete unattributed tree has been constructed.
Intrinsic attributes are synthesized attributes of leaf nodes whose values are determined outside the parse tree.

Example 1: Ada procedure names.
Rule: In the Ada language, the name at the end of a procedure should match the procedure’s name.
Syntax rule:
    <proc_def> → procedure <proc_name>[1] <proc_body> end <proc_name>[2];
Semantic rule:
    <proc_name>[1].string = <proc_name>[2].string
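The semantic rule for Ada procedure names is a predicate over the string attribute of the two <proc_name> occurrences. A checker might evaluate it as sketched below. The function name proc_names_match is hypothetical, and the comparison ignores letter case on the assumption that the attribute holds the raw spelling, since Ada identifiers are case-insensitive.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Predicate: <proc_name>[1].string = <proc_name>[2].string
   Compared without regard to letter case (assumption, see lead-in). */
bool proc_names_match(const char *head_name, const char *end_name) {
    assert(head_name != NULL && end_name != NULL);
    while (*head_name != '\0' && *end_name != '\0') {
        if (tolower((unsigned char)*head_name) !=
            tolower((unsigned char)*end_name))
            return false;
        head_name++;
        end_name++;
    }
    /* Equal only if both strings ended at the same position. */
    return *head_name == *end_name;
}
```

A false result corresponds to a false predicate in the attribute grammar: the program is rejected even though it matches the context-free syntax rule.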
Example 3: Simple Expression Attribute Grammar
Expressions of the form: id + id
• id's can be either int_type or real_type
• types of the two id's must be the same
• type of the expression must match its expected type
BNF:
    <expr> -> <var> + <var>
    <var> -> id

1. Syntax rule: <expr> -> <var>[1] + <var>[2]
   Semantic rules:
       <var>[1].env ← <expr>.env
       <var>[2].env ← <expr>.env
       <expr>.actual_type ← <var>[1].actual_type
   Predicate:
       <var>[1].actual_type = <var>[2].actual_type
       <expr>.expected_type = <expr>.actual_type
2. Syntax rule: <var> -> id

1. If all attributes were inherited, the tree could be decorated in top-down order.
2. If all attributes were synthesized, the tree could be decorated in bottom-up order.
3. In many cases, both kinds of attributes are used, and it is some combination of top-down and bottom-up that must be used.

1. <expr>.env ← inherited from parent
   <expr>.expected_type ← inherited from parent
2. <var>[1].env ← <expr>.env
   <var>[2].env ← <expr>.env
3. <var>[1].actual_type ← lookup(A, <var>[1].env)
   <var>[2].actual_type ← lookup(B, <var>[2].env)
   <var>[1].actual_type =? <var>[2].actual_type
4. <expr>.actual_type ← <var>[1].actual_type
   <expr>.actual_type =? <expr>.expected_type
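The four evaluation steps for A + B can be traced in code. The sketch below is one possible rendering under stated assumptions: the environment is a flat table of name/type pairs, lookup returns an extra unknown_type value for undeclared names, and the struct and function names are hypothetical. Only env, expected_type, actual_type, lookup, int_type and real_type come from the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum type { int_type, real_type, unknown_type };

struct binding { const char *name; enum type type; };
struct env { const struct binding *bindings; size_t count; };

struct var_node {
    const char *id;            /* the id lexeme at the leaf */
    const struct env *env;     /* inherited */
    enum type actual_type;     /* synthesized */
};

struct expr_node {
    struct var_node var[2];    /* <var>[1] and <var>[2] */
    const struct env *env;     /* step 1: inherited from parent */
    enum type expected_type;   /* step 1: inherited from parent */
    enum type actual_type;     /* synthesized */
};

/* Find the declared type of id; unknown_type if it is not declared
   (assumption: the slides do not say what lookup does in that case). */
enum type lookup(const char *id, const struct env *env) {
    assert(id != NULL && env != NULL);
    for (size_t i = 0; i < env->count; i++)
        if (strcmp(env->bindings[i].name, id) == 0)
            return env->bindings[i].type;
    return unknown_type;
}

/* Decorate the tree for <expr> -> <var>[1] + <var>[2].
   Returns true only if every predicate holds. */
bool decorate_expr(struct expr_node *e) {
    assert(e != NULL && e->env != NULL);
    e->actual_type = unknown_type;
    for (int i = 0; i < 2; i++) {
        e->var[i].env = e->env;                          /* step 2 */
        e->var[i].actual_type =
            lookup(e->var[i].id, e->var[i].env);         /* step 3 */
    }
    if (e->var[0].actual_type == unknown_type)
        return false;              /* undeclared id (assumed to be an error) */
    if (e->var[0].actual_type != e->var[1].actual_type)
        return false;              /* step 3 predicate failed */
    e->actual_type = e->var[0].actual_type;              /* step 4 */
    return e->actual_type == e->expected_type;           /* step 4 predicate */
}
```

The env attribute flows down from <expr> to both <var> nodes, and actual_type flows back up. This matches the observation that a mix of top-down and bottom-up evaluation is needed when both kinds of attributes are present.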
There is no single widely acceptable notation or formalism for describing semantics; all are complicated and very theoretical.
Three common types:
1. Operational Semantics
2. Axiomatic Semantics
  – Based on formal logic (first order predicate calculus)
  – Original purpose: formal program verification
3. Denotational Semantics
  – Based on recursive function theory
  – The most abstract semantics description method.

Due: March 2nd, 1999 Tuesday
1-) Answer the following Review Questions: 2.5, 3.5, 3.9, and 3.12 (Each 10 points)
2-) Solve the following problems in the Problem Sets: 2.1, 3.5, 3.7, 3.8 (Each 15 points)