Describing Syntax and Semantics

Uploaded by

amc204849
Copyright
© © All Rights Reserved

Describing Syntax and Semantics (Study of meaning in language)

 The task of providing a concise (brief but comprehensive) yet understandable description of a programming
language is difficult but essential to the language’s success.
 One of the problems in describing a language is the diversity of the people who must understand the description.
 Most new programming languages are subjected to a period of scrutiny by potential users, who act as the initial evaluators.
 The success of this feedback cycle depends heavily on the clarity of the description.
 Programming language implementors decide how the expressions, statements, and program units of a language should
be implemented.
 The difficulty of the implementors’ job is determined by the completeness and precision of the language description.
 Language users must be able to determine how to write programs by referring to a language reference manual.
 The study of programming languages, like the study of natural languages, can be divided into two parts:
o syntax
 the form or structure of the expressions, statements, and program units
o semantics
 the meaning of the expressions, statements, and program units
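 Together, syntax and semantics provide a language’s definition.
 For example, the syntax of a Java while statement is
o while (boolean_expr) statement
o The semantics of this statement form is that when the current value of the Boolean expression is true, the
embedded statement is executed, and control then returns to the Boolean expression to repeat the process. When the
value is false, control continues after the while construct.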
 The General Problem of Describing Syntax
o A language, whether natural (such as English) or artificial (such as Java), is a set of strings of characters from
some alphabet.
o The strings of a language are called sentences or statements.
o A sentence is an expression or a program that conforms to the grammar (or rules) of the programming
language.
o The syntax rules of a language specify which strings of characters from the language’s alphabet are in the
language.
o English, for example, has a large and complex collection of rules for specifying the syntax of its sentences.
o By comparison, even the largest and most complex programming languages are syntactically very simple.
o A lexeme is the lowest level syntactic unit of a language (e.g., *, sum, begin)
o A token is a category of lexemes (e.g., identifier)
o Consider the following Java statement:
o index = 2 * count + 17;
Lexeme    Token
index     identifier
=         equal sign (assignment operator)
2         integer literal
*         multiplication operator
count     identifier
+         plus operator
17        integer literal
;         semicolon (end of statement)
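The lexeme/token split above can be sketched as a small tokenizer. This is only an illustration: the token names (`identifier`, `int_literal`, and so on) and the regular expressions are assumptions made for this example, not part of any Java specification.

```python
import re

# Hypothetical token names chosen for this sketch; alternatives are tried in order.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("int_literal", r"\d+"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (lexeme, token) pairs for the given source text."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":  # whitespace separates lexemes but is not one
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:<6} {token}")
```

Note that `index` and `count` are different lexemes that fall into the same token category, which is the distinction the table makes.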

 Language Recognizers
o A language can be formally defined in two distinct ways: by recognition and by generation.
o A recognition device reads input strings over the alphabet of the language and decides whether the input
strings belong to the language.
o Example: the syntax analysis part of a compiler, also called the parser, is a recognizer for the language the
compiler translates.
o Generators
o A generator is a device that generates sentences of a language.
o One can determine whether a particular sentence is syntactically correct by comparing it to the
structure of the generator.
 Formal Methods of Describing Syntax
o This section discusses the formal language-generation mechanisms, usually called grammars, that are
commonly used to describe the syntax of programming languages.
 Backus-Naur Form and Context-Free Grammars (Noam Chomsky and John Backus)
o In the mid-1950s, Chomsky described four classes of grammars that define four classes of languages.
o Two of these grammar classes, named context-free and regular, turned out to be useful for describing the syntax
of programming languages.
o The forms of tokens can be described by regular grammars, and the syntax of whole programming languages, with
minor exceptions, can be described by context-free grammars.
o A context-free grammar is a set of recursive rules used to generate patterns of strings.
o A context-free grammar can describe all regular languages.
o Chomsky’s focus was on natural languages, but his work was later applied to artificial languages.
 Origins of Backus-Naur Form
o Shortly after Chomsky’s work on language classes, the ACM-GAMM group began designing ALGOL 58.
o John Backus presented a paper in 1959 that introduced a new formal notation for specifying programming language syntax.
o The notation was revised slightly by Peter Naur in 1960 for the description of ALGOL 60; the revised notation became known as BNF (Backus-Naur Form).
o BNF is a natural notation for describing syntax.
 Fundamentals
o A metalanguage is a language that is used to describe another language.
o BNF is a metalanguage for programming languages.
o A simple Java assignment statement might be represented by the abstraction <assign>, which is defined by the rule
o <assign> → <var> = <expression>
o An example sentence whose syntactic structure is described by this rule is
o total = subtotal1 + subtotal2
o The abstractions in a BNF description, or grammar, are called nonterminal symbols, or simply non-terminals.
o The lexemes and tokens of the rules are called terminal symbols, or simply terminals.
o A BNF description, or grammar, is a collection of rules.
o Nonterminals are often enclosed in angle brackets
o Examples of BNF rules:
o <ident_list> → identifier | identifier, <ident_list>
o <if_stmt> → if <logic_expr> then <stmt>
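o As a further example, consider the SQL statement:
o Select case when productname = 'chai' then 'tea' else productname end
o From Products
o The keywords (Select, case, when, then, else, end, From), the operator =, and the string literals are terminals that appear as written.
o productname and Products are identifiers; a grammar would describe their positions with non-terminals (for a column name and a table name), which are then replaced by these lexemes.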
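The recursive rule <ident_list> → identifier | identifier, <ident_list> can be read both as a generator and as a recognizer. The sketch below is a minimal illustration of the two views; the function names and the identifier pattern are assumptions made for this example.

```python
import re

# Assumed shape of an identifier lexeme for this sketch.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def generate_ident_list(names):
    """Generator view: apply <ident_list> -> identifier, <ident_list> for every
    name but the last, then <ident_list> -> identifier for the last one."""
    if not names:
        raise ValueError("the rule always produces at least one identifier")
    if len(names) == 1:
        return names[0]
    return names[0] + ", " + generate_ident_list(names[1:])

def is_ident_list(text):
    """Recognizer view: decide whether text is a sentence of <ident_list>."""
    head, comma, rest = text.partition(",")
    if IDENTIFIER.fullmatch(head.strip()) is None:
        return False
    if not comma:
        return True           # matched the alternative: identifier
    return is_ident_list(rest)  # matched: identifier, <ident_list>
```

For instance, `generate_ident_list(["x", "y", "z"])` builds the sentence `x, y, z`, and `is_ident_list` accepts that sentence but rejects `x,` because the recursive alternative requires another identifier after the comma.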
 Grammars and Derivations
o A grammar is a set of rules about how to write statements that are valid for a programming language.
o Formally, a grammar is a finite non-empty set of rules.
o A start symbol is a special element of the non-terminals of a grammar
o A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all
terminal symbols)
o A Grammar for a Small Language
o <program> → begin <stmt_list> end
o <stmt_list> → <stmt>
o | <stmt> ; <stmt_list>
o <stmt> → <var> = <expression>
o <var> → A | B | C
o <expression> → <var> + <var>
o | <var> - <var>
o | <var>
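A recognizer for this small language can be sketched as a recursive-descent parser with one function per nonterminal. This is a sketch under stated assumptions: lexemes are separated by whitespace, the minus sign is written as an ASCII hyphen, and the function names simply mirror the nonterminals.

```python
def recognizes(source):
    """Return True if source is a sentence of the small language."""
    tokens = source.split()  # assumption: lexemes are separated by whitespace
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(lexeme):
        nonlocal pos
        if peek() != lexeme:
            raise SyntaxError(f"expected {lexeme!r}, found {peek()!r}")
        pos += 1

    def var():                      # <var> -> A | B | C
        nonlocal pos
        if peek() not in ("A", "B", "C"):
            raise SyntaxError(f"expected a variable, found {peek()!r}")
        pos += 1

    def expression():               # <expression> -> <var> + <var> | <var> - <var> | <var>
        var()
        if peek() in ("+", "-"):
            expect(peek())
            var()

    def stmt():                     # <stmt> -> <var> = <expression>
        var()
        expect("=")
        expression()

    def stmt_list():                # <stmt_list> -> <stmt> | <stmt> ; <stmt_list>
        stmt()
        if peek() == ";":
            expect(";")
            stmt_list()

    def program():                  # <program> -> begin <stmt_list> end
        expect("begin")
        stmt_list()
        expect("end")
        if peek() is not None:
            raise SyntaxError("unexpected input after 'end'")

    try:
        program()
    except SyntaxError:
        return False
    return True
```

With this sketch, `recognizes("begin A = B + C ; B = C end")` is accepted, while `begin A = B + C ; end` is rejected because the grammar requires a statement after each semicolon.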
 A derivation of a program in this language follows:
o <program> => begin <stmt_list> end
o => begin <stmt> end
o => begin <var> = <expression> end
o => begin A = <expression> end
o => begin A = <var> + <var> end
o => begin A = B + <var> end
o => begin A = B + C end
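A derivation can be mechanized by repeatedly replacing the leftmost nonterminal of the current sentential form. The sketch below assumes the small-language grammar given earlier, encoded as a Python dictionary; the encoding and the `choices` list (which alternative to apply at each step) are illustrative assumptions.

```python
# Each nonterminal maps to its list of alternatives (right-hand sides).
GRAMMAR = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", "=", "<expression>"]],
    "<var>": [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}

def leftmost_derivation(choices, start="<program>"):
    """Apply one rule per entry in choices, always to the leftmost nonterminal.
    Each choice is the index of the alternative to use for that nonterminal."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that still has rules, i.e. a nonterminal.
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps

steps = leftmost_derivation([0, 0, 0, 0, 0, 1, 2])
print("\n=> ".join(steps))
```

Each printed line is a sentential form; the last one, `begin A = B + C end`, contains only terminals and is therefore a sentence of the language.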
 Parse Tree


 The start symbol of the grammar must be used as the root of the parse tree. The leaves of the parse tree represent
terminals.
 Rules to draw a parse tree:
o All leaf nodes need to be terminals.
o All interior nodes need to be non-terminals.
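The two rules can be checked mechanically. The sketch below hand-builds a parse tree for the sentence begin A = B + C end of the small language and verifies both rules; the tuple representation and helper names are assumptions made for this illustration.

```python
# A node is (symbol, children); a leaf has an empty list of children.
def node(symbol, *children):
    return (symbol, list(children))

# Hand-built parse tree for: begin A = B + C end
tree = node("<program>",
    node("begin"),
    node("<stmt_list>",
        node("<stmt>",
            node("<var>", node("A")),
            node("="),
            node("<expression>",
                node("<var>", node("B")),
                node("+"),
                node("<var>", node("C"))))),
    node("end"))

def is_nonterminal(symbol):
    # Convention used in these notes: nonterminals are enclosed in angle brackets.
    return symbol.startswith("<") and symbol.endswith(">")

def follows_rules(tree):
    """Check that leaves are terminals and interior nodes are nonterminals."""
    symbol, children = tree
    if not children:
        return not is_nonterminal(symbol)
    return is_nonterminal(symbol) and all(follows_rules(child) for child in children)

def leaves(tree):
    """Read the leaves left to right; they spell out the derived sentence."""
    symbol, children = tree
    if not children:
        return [symbol]
    return [leaf for child in children for leaf in leaves(child)]
```

Reading the leaves from left to right recovers the original sentence, which is a quick sanity check that the tree describes the intended program.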
 Ambiguity
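o A grammar is ambiguous if it generates a sentence that has two or more distinct parse trees.
o Ambiguity in an arithmetic expression raises the question of operator precedence.
o For example, with an ambiguous expression grammar, A = B + C * A has two parse trees: one groups B + C first, and the other groups C * A first.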
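Ambiguity can be made concrete by counting parse trees. The sketch below assumes a typical ambiguous expression grammar, <expr> → <expr> + <expr> | <expr> * <expr> | <id> with <id> → A | B | C; this grammar is an assumption for the example, since the notes do not reproduce one.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count distinct parse trees for tokens under the assumed grammar
    <expr> -> <expr> + <expr> | <expr> * <expr> | <id>, <id> -> A | B | C."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from <expr>.
        if hi - lo == 1:
            return 1 if tokens[lo] in ("A", "B", "C") else 0
        total = 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                # tokens[mid] is the operator at the root of this subtree.
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees("B + C * A".split()))
```

A count greater than one for any sentence means the grammar is ambiguous; for `B + C * A` the two trees correspond to (B + C) * A and B + (C * A).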
 Operator Precedence
o When an expression includes two different operators, for example, x + y * z,
o One obvious semantic issue is the order of evaluation of the two operators (for example, in this expression is
it add and then multiply, or vice versa?).
o This semantic question can be answered by assigning different precedence levels to operators.
o For example, if * has been assigned higher precedence than + (by the language designer),
o Multiplication will be done first, regardless of the order of appearance of the two operators in the expression.
o A grammar describes the syntactic structure of a sentence, and that structure can be read from its parse tree.
o The fact that an operator in an arithmetic expression is generated lower in the parse tree (and therefore must be
evaluated first) can be used to indicate that it has precedence over an operator produced higher up in the tree.
o In the first parse tree of Figure 3.2, the multiplication operator is generated lower in the tree, which could indicate that it
has precedence over the addition operator in the expression.
o The second parse tree, however, indicates just the opposite.
o It appears, therefore, that the two parse trees indicate conflicting precedence information.
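o An unambiguous grammar for expressions such as A = B + (C * D) generates each operator at its own level: <expr> for +,
<term> for *, and <factor> for operands.
o With this grammar, the operand B of the expression A = B + C * D is derived as follows:
o <expr> => <term> => <factor> => <id> => B
o In a leftmost derivation of A = B + C * A, the leftmost nonterminal is replaced at each step.
o In a rightmost derivation, the rightmost nonterminal is replaced at each step.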
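The way grammar levels encode precedence can be sketched with a small recursive-descent parser. The grammar assumed here is the usual layered one, <expr> → <expr> + <term> | <term>, <term> → <term> * <factor> | <factor>, <factor> → ( <expr> ) | <id>; it is an assumption consistent with the derivation <expr> => <term> => <factor> => <id> above, and the left-recursive rules are implemented as loops, which is a common implementation choice rather than part of the grammar.

```python
def parenthesize(source):
    """Parse an expression and return it fully parenthesized, so the grouping
    chosen by the grammar is visible."""
    tokens = source.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():                   # <factor> -> ( <expr> ) | <id>
        token = advance()
        if token == "(":
            inner = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if token is None or not token.isalpha():
            raise SyntaxError(f"expected an identifier, found {token!r}")
        return token

    def term():                     # <term> -> <term> * <factor> | <factor>
        left = factor()
        while peek() == "*":
            advance()
            left = f"({left} * {factor()})"
        return left

    def expr():                     # <expr> -> <expr> + <term> | <term>
        left = term()
        while peek() == "+":
            advance()
            left = f"({left} + {term()})"
        return left

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return result
```

Because * is only generated inside <term>, which sits below <expr>, `parenthesize("B + C * A")` groups the multiplication first and yields `(B + (C * A))` regardless of the order in which the operators appear.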
 Associativity of Operators
o When an expression includes two operators that have the same precedence, for example, A / B * C,
o a semantic rule is required to specify which should have precedence.
o This rule is named associativity.
o For example, the expression A = B + C + A can be grouped as
 A = (B + C) + A (left associative) OR
 A = B + (C + A) (right associative)
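The effect of associativity on A / B * C can be shown by evaluating the same operator sequence with both groupings. The helper below is a hypothetical illustration, not a description of how any particular language evaluates expressions.

```python
import operator

OPS = {"/": operator.truediv, "*": operator.mul}

def evaluate(tokens, associativity):
    """tokens alternates operands and operators, e.g. [8, "/", 4, "*", 2].
    All operators are assumed to share one precedence level."""
    tokens = list(tokens)
    if associativity == "left":
        # ((a op b) op c): fold from the left end.
        result = tokens[0]
        for i in range(1, len(tokens), 2):
            result = OPS[tokens[i]](result, tokens[i + 1])
        return result
    # (a op (b op c)): fold from the right end.
    result = tokens[-1]
    for i in range(len(tokens) - 2, 0, -2):
        result = OPS[tokens[i]](tokens[i - 1], result)
    return result

print(evaluate([8, "/", 4, "*", 2], "left"))   # (8 / 4) * 2
print(evaluate([8, "/", 4, "*", 2], "right"))  # 8 / (4 * 2)
```

The two calls give 4.0 and 1.0 respectively, so the language definition has to say which grouping applies.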
 An Unambiguous Grammar for if-then-else
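o The BNF rules for an if-then-else statement are as follows:
o <if_stmt> → if <logic_expr> then <stmt>
o | if <logic_expr> then <stmt> else <stmt>
o Together with the rule <stmt> → <if_stmt>, these rules are ambiguous: in if <logic_expr> then if <logic_expr> then <stmt> else <stmt>, the else could belong to either if.
o An unambiguous grammar separates matched statements (every if has an else) from unmatched ones, so that each else pairs with the nearest previous unmatched if.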
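The usual resolution, pairing each else with the nearest unmatched if, can be sketched as a tiny parser. This is an illustration under assumptions: conditions and simple statements are single whitespace-separated words, and the nested-tuple tree format is invented for this example.

```python
def parse_stmt(tokens, pos=0):
    """Return (tree, next_pos). An else is attached to the nearest open if."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "if":
        if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
            raise SyntaxError("expected 'then'")
        condition = tokens[pos + 1]
        then_part, pos = parse_stmt(tokens, pos + 3)
        # The innermost if sees the else first, so it claims it.
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_stmt(tokens, pos + 1)
            return ("if", condition, then_part, else_part), pos
        return ("if", condition, then_part), pos
    return tokens[pos], pos + 1

tree, _ = parse_stmt("if c1 then if c2 then s1 else s2".split())
print(tree)
```

In the resulting tree the else branch `s2` sits inside the inner if on `c2`, and the outer if on `c1` has no else branch.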
 Attribute Grammars
o An attribute grammar is a device used to describe more of the structure of a programming language than can
be described with a context-free grammar.
o It is an extension of a context-free grammar.
o The extension allows certain language rules to be conveniently described, such as type compatibility.
o Before we go further, we have to clarify the concept of static semantics
o Static Semantics
 Not all of the structure of a programming language can be described through BNF.
 One example is type compatibility.
 In Java, a floating-point value cannot be assigned to an integer type variable, although the
opposite is legal.
 If all of the typing rules of Java were specified in BNF, the grammar would become too
large to be useful.
 As an example of a syntax rule that cannot be specified in BNF,
 consider the common rule that all variables must be declared before they are referenced.
 It has been proven that this rule cannot be specified in BNF.
 Rules of this kind are called static semantics: they concern the legal forms of programs and can be
checked at compile time.
 Basic Concepts
o Attributes, which are associated with grammar symbols, are similar to variables in the sense that they can
have values assigned to them.
o Attribute computation functions, sometimes called semantic functions, are associated with grammar rules.
o They are used to specify how attribute values are computed.
o Predicate functions, which state the static semantic rules of the language, are associated with grammar rules.
o A predicate function returns a single value, TRUE or FALSE; a FALSE value indicates a violation of the static semantic rules.
