CSC 305: PROGRAMMING PARADIGM
CHAPTER 2: Introduction to Language, Syntax and Semantics
Contents
Describing languages (Sentences, Language, Lexemes and Tokens)
Describing Syntax (Language Recognizers, Generators, Grammars)
Describing Semantics (Operational, Axiomatic, Denotational)
Group Work 1.0 (Chapter 1)
Find ONE programming language for each of the paradigms (Imperative, Object-oriented, Logic, Functional).
Give an overview of each language and explain its design process.
Present your findings during the next class.
Languages
Method of communication
Spoken and written languages can be described as a system of symbols (sometimes known as lexemes) and the grammars (rules) by which the symbols are manipulated.
A language is a set of sentences.
A sentence is a string of characters over some alphabet.
A programming language is a system of signs used to communicate a task (an algorithm) to a computer, causing the task to be performed. The task to be performed is called a computation, which follows absolutely precise and unambiguous rules.
Syntax and Semantics
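Syntax is the form of a language's expressions, statements and program units.
Semantics is the meaning of those expressions, statements and program units.
Example: the while statement in Java
while (boolean_expression) <statement>
Its semantics: when the current value of the Boolean expression is true, the embedded statement is executed and control returns to the expression to repeat the process; when it is false, control continues after the while construct.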
Programming Language
Example of a program that adds two integers and prints: 1 + 1 = 2

#include <stdio.h>

/* Syntax for function add() */
int add(int x, int y)
{
    return x + y;
}

/* Syntax for main() function */
int main(void)
{
    int foo = 1, bar = 1;
    printf("%d + %d = %d\n", foo, bar, add(foo, bar));
    return 0;
}
Lexeme
A lexeme is the lowest-level syntactic unit of a language.
Lexemes include identifiers, literals, operators and special words.
Example:
{
    return x + y;
}

No.  Lexeme
1    {
2    return
3    x
4    +
5    y
6    ;
7    }
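The split shown in the table can be reproduced mechanically. The C sketch below uses a toy rule set chosen only for this example. Whitespace separates lexemes, a run of letters, digits and underscores forms one lexeme, and any other character is a lexeme on its own. The function split_lexemes and the limits MAX_LEXEMES and MAX_LEN are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEXEMES 64   /* hypothetical limit on lexemes per input */
#define MAX_LEN     32   /* hypothetical limit on characters per lexeme */

/* Split src into lexemes using a toy rule set (an assumption for this
 * example): blanks are skipped, a run of letters, digits and '_' is one
 * lexeme, and any other character is a one-character lexeme.
 * Stores at most max lexemes in out and returns how many were stored. */
size_t split_lexemes(const char *src, char out[][MAX_LEN], size_t max)
{
    size_t count = 0;
    const char *p = src;

    while (*p != '\0' && count < max) {
        if (isspace((unsigned char)*p)) {        /* skip blanks */
            p++;
            continue;
        }
        size_t len = 0;
        if (isalnum((unsigned char)*p) || *p == '_') {
            /* keyword, identifier or number: take the whole run */
            while ((isalnum((unsigned char)*p) || *p == '_') && len < MAX_LEN - 1)
                out[count][len++] = *p++;
        } else {
            /* operator or punctuation: a single character */
            out[count][len++] = *p++;
        }
        out[count][len] = '\0';
        count++;
    }
    return count;
}
```

Calling split_lexemes on "{ return x + y; }" yields the seven lexemes listed in the table. Real lexical analyzers also handle multi-character operators such as == and literals such as strings, which this sketch ignores.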
Token
A token is a category of lexemes.
Example:
{
    return x + y;
}

No.  Lexeme   Token
1    {        open
2    return   keyword
3    x        identifier
4    +        plus op
5    y        identifier
6    ;        separator
7    }        close
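Once the lexemes are known, assigning each one to a token category can be sketched as a lookup. This hypothetical C classifier knows only the seven categories in the table and treats return as the only keyword. A real lexical analyzer would consult the language's full keyword list.

```c
#include <ctype.h>
#include <string.h>

/* Token categories taken from the table above. */
typedef enum {
    TOK_OPEN, TOK_KEYWORD, TOK_IDENTIFIER, TOK_PLUS_OP,
    TOK_SEPARATOR, TOK_CLOSE, TOK_UNKNOWN
} Token;

/* Map one lexeme to its token category. Treating only "return" as a
 * keyword is a simplification made for this example. */
Token classify(const char *lexeme)
{
    if (strcmp(lexeme, "{") == 0)      return TOK_OPEN;
    if (strcmp(lexeme, "}") == 0)      return TOK_CLOSE;
    if (strcmp(lexeme, "+") == 0)      return TOK_PLUS_OP;
    if (strcmp(lexeme, ";") == 0)      return TOK_SEPARATOR;
    if (strcmp(lexeme, "return") == 0) return TOK_KEYWORD;

    /* identifier: a letter or '_' followed by letters, digits or '_' */
    if (isalpha((unsigned char)lexeme[0]) || lexeme[0] == '_') {
        for (const char *p = lexeme + 1; *p != '\0'; p++)
            if (!isalnum((unsigned char)*p) && *p != '_')
                return TOK_UNKNOWN;
        return TOK_IDENTIFIER;
    }
    return TOK_UNKNOWN;
}

/* Printable name of each category, in the same order as the enum. */
const char *token_name(Token t)
{
    static const char *names[] = { "open", "keyword", "identifier",
                                   "plus op", "separator", "close", "unknown" };
    return names[t];
}
```

Applied to the seven lexemes of { return x + y; }, the classifier returns open, keyword, identifier, plus op, identifier, separator and close, matching the table.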
Language Recognizers
A recognizer determines whether a given program is in the language, i.e. whether it is syntactically correct.
Example: a compiler. Its syntax analyzer, also known as the parser, is a recognizer.
Compiler
A program that converts the entire source program into machine language before the program is executed.
Compiler process
Source code
→ Lexical Analyzer → Tokenized code
→ Syntactic Analyzer → Parsed code
→ Semantic Analyzer → Qualified code
→ Code Generator → Object code
→ Optimizer → Final code
Interpreter
A program that translates and executes a program one statement at a time.
It does not produce an object program.
Language Generators
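A language generator is a device used to generate the sentences of a language.
Whether a sentence is syntactically correct can be determined by comparing it with the structure of the generator.
The formal method for describing syntax in this way is:
Grammars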
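A generator can be sketched by systematically expanding every alternative of each rule. The hypothetical C function below enumerates every sentence of the <stmt> rule of the begin/end grammar presented later in this chapter (<stmt> → <var> = <expression>, <var> → A | B | C, <expression> → <var> + <var> | <var> - <var> | <var>). That rule involves no recursion, so its sentences form a finite set that can be listed completely.

```c
#include <stdio.h>
#include <string.h>

/* Alternatives of <var> and the two operators allowed in <expression>. */
static const char *vars[] = { "A", "B", "C" };
static const char *ops[]  = { "+", "-" };

/* Write every sentence of  <stmt> -> <var> = <expression>  into out,
 * one per row, where
 *   <expression> -> <var> + <var> | <var> - <var> | <var>
 * Stops after max sentences; returns how many were generated. */
int generate_stmts(char out[][16], int max)
{
    int n = 0;
    for (int l = 0; l < 3; l++) {                 /* left-hand <var> */
        for (int a = 0; a < 3; a++) {             /* first <var> of <expression> */
            for (int o = 0; o < 2; o++)           /* <var> op <var> */
                for (int b = 0; b < 3; b++)
                    if (n < max)
                        snprintf(out[n++], 16, "%s = %s %s %s",
                                 vars[l], vars[a], ops[o], vars[b]);
            if (n < max)                          /* <var> on its own */
                snprintf(out[n++], 16, "%s = %s", vars[l], vars[a]);
        }
    }
    return n;
}
```

The function produces 63 statements: 3 left-hand variables times 21 expressions. A statement such as B = C - A is syntactically correct exactly when it appears in this list. Here, that is what comparing a sentence with the structure of the generator amounts to.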
Grammars
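Describe the syntax of programming languages.
Backus-Naur Form (BNF) and context-free grammars:
context-free grammars were developed by Noam Chomsky and BNF by John Backus; the two notations are nearly identical.
Grammar classes:
Context-free grammars: describe the syntax of whole PLs
Regular grammars: describe the tokens of PLs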
Context-Free grammars
Context-free grammars are powerful enough to
describe the syntax of most programming languages
The syntax of most programming languages is
specified using context-free grammars.
Context-free grammars are simple enough to allow the
construction of efficient parsing algorithms which, for a
given string, determine whether and how it can be
generated from the grammar.
BNF (Backus-Naur Form) is the most common
notation used to express context-free grammars.
Regular grammars
A regular grammar is a formal grammar.
The two main categories of formal grammar are:
generative grammars, which are sets of rules for how strings in a language can be generated
analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Classification of grammars
The Chomsky hierarchy (Chomsky, 1959) consists of the following:
Type 0 grammar (unrestricted)
Type 1 grammar (context-sensitive)
Type 2 grammar (context-free)
Type 3 grammar (regular)
Type 0 grammar (unrestricted)
An unrestricted grammar is a formal grammar $G = (N, \Sigma, P, S)$, where $N$ is a set of nonterminal symbols, $\Sigma$ is a set of terminal symbols, $N$ and $\Sigma$ are disjoint, $P$ is a set of production rules of the form $\alpha \to \beta$, where $\alpha$ and $\beta$ are strings of symbols in $N \cup \Sigma$ and $\alpha$ is not the empty string, and $S \in N$ is a specially designated start symbol. As the name implies, there are no real restrictions on the types of production rules that unrestricted grammars can have.
They generate exactly all languages that can be recognized by a Turing machine. These languages are also known as the recursively enumerable languages.
Type 1 grammar (context-sensitive)
A formal grammar $G = (N, \Sigma, P, S)$ is context-sensitive if all rules in $P$ are of the form $\alpha A \beta \to \alpha \gamma \beta$, where $A \in N$, $\alpha, \beta \in (N \cup \Sigma)^*$ and $\gamma \in (N \cup \Sigma)^+$ (a nonempty string).
The name context-sensitive is explained by the $\alpha$ and $\beta$ that form the context of $A$ and determine whether $A$ can be replaced with $\gamma$ or not. This is different from a context-free grammar, where the context of a nonterminal is not taken into consideration.
It generates the context-sensitive languages.
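A standard example of a language that a context-sensitive grammar can generate, but no context-free grammar can, is $\{a^n b^n c^n \mid n \ge 1\}$. The C function below is only a membership test for that language, written by counting. It is not derived from a particular grammar, and the name is_anbncn is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if s is in { a^n b^n c^n | n >= 1 }: a block of a's,
 * then an equally long block of b's, then an equally long block of c's. */
bool is_anbncn(const char *s)
{
    size_t a = 0, b = 0, c = 0;
    while (*s == 'a') { a++; s++; }
    while (*s == 'b') { b++; s++; }
    while (*s == 'c') { c++; s++; }
    /* the whole string must be consumed and the three counts must match */
    return *s == '\0' && a >= 1 && a == b && b == c;
}
```

For example, abc and aabbcc are accepted, while aabbc (counts differ) and abcabc (blocks repeat) are rejected.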
Type 2 grammar (context-free grammar)
It generates the context-free languages.
Context-free languages are the theoretical basis for the syntax of most PLs.
A context-free grammar $G$ can be defined as a 4-tuple $G = (V_t, V_n, P, S)$, where
$V_t$ is a finite set of terminals
$V_n$ is a finite set of non-terminals
$P$ is a finite set of production rules
$S$ is an element of $V_n$, the distinguished starting non-terminal.
Elements of $P$ are of the form $V \to w$, where $V \in V_n$ and $w \in (V_t \cup V_n)^*$.
Example:
S → x | y | z | S + S | S - S | S * S | S / S | (S)
This grammar can, for example, generate the string
"( x + y ) * x - z * y / ( x + x )".
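The grammar above is ambiguous and left-recursive (S → S + S), so a recursive-descent recognizer cannot follow it rule for rule. The C sketch below instead uses the equivalent rules S → T | T op S and T → x | y | z | ( S ), with op one of + - * /. These rules generate the same strings. Spaces in the input are skipped, and all function names are hypothetical.

```c
#include <stdbool.h>
#include <ctype.h>

static const char *pos;   /* current position in the input string */

static void skip_spaces(void)
{
    while (isspace((unsigned char)*pos))
        pos++;
}

static bool parse_S(void);

/* T -> x | y | z | ( S ) */
static bool parse_T(void)
{
    skip_spaces();
    if (*pos == 'x' || *pos == 'y' || *pos == 'z') {
        pos++;
        return true;
    }
    if (*pos == '(') {
        pos++;
        if (!parse_S())
            return false;
        skip_spaces();
        if (*pos != ')')
            return false;
        pos++;
        return true;
    }
    return false;
}

/* S -> T | T op S, with op one of + - * / (the recursion becomes a loop) */
static bool parse_S(void)
{
    if (!parse_T())
        return false;
    skip_spaces();
    while (*pos == '+' || *pos == '-' || *pos == '*' || *pos == '/') {
        pos++;
        if (!parse_T())
            return false;
        skip_spaces();
    }
    return true;
}

/* Recognizer: true if the whole input is a sentence of the grammar. */
bool recognize_expr(const char *input)
{
    pos = input;
    if (!parse_S())
        return false;
    skip_spaces();
    return *pos == '\0';
}
```

recognize_expr accepts the example string above and rejects inputs such as "x +", "(x" and "x y".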
Type 3 grammar (regular)
In computer science, a right regular grammar is a formal grammar $(N, \Sigma, P, S)$ such that all the production rules in $P$ are of one of the following forms:
$A \to a$, where $A$ is a non-terminal in $N$ and $a$ is a terminal in $\Sigma$
$A \to aB$, where $A$ and $B$ are in $N$ and $a$ is in $\Sigma$
$A \to \varepsilon$, where $A$ is in $N$ and $\varepsilon$ denotes the empty string, i.e. the string of length 0.
In a left regular grammar, all rules obey the forms
$A \to a$, where $A$ is a non-terminal in $N$ and $a$ is a terminal in $\Sigma$
$A \to Ba$, where $A$ and $B$ are in $N$ and $a$ is in $\Sigma$
$A \to \varepsilon$, where $A$ is in $N$ and $\varepsilon$ is the empty string.
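Each nonterminal of a right regular grammar can be read as a state of a finite automaton, which is why regular grammars suit tokens. The C sketch below uses a hypothetical right regular grammar for identifiers: I → l R and R → l R | d R | ε, where l stands for any letter and d for any digit. It checks a string by moving between the states I and R.

```c
#include <stdbool.h>
#include <ctype.h>

/* Nonterminals of the assumed right regular grammar, used as states:
 *   I -> l R                   (l = any letter)
 *   R -> l R | d R | epsilon   (d = any digit)                        */
typedef enum { STATE_I, STATE_R, STATE_ERROR } State;

bool is_identifier(const char *s)
{
    State state = STATE_I;                      /* start symbol I */
    for (; *s != '\0'; s++) {
        unsigned char ch = (unsigned char)*s;
        switch (state) {
        case STATE_I:                           /* I -> l R */
            state = isalpha(ch) ? STATE_R : STATE_ERROR;
            break;
        case STATE_R:                           /* R -> l R | d R */
            state = isalnum(ch) ? STATE_R : STATE_ERROR;
            break;
        case STATE_ERROR:                       /* no rule applies */
            return false;
        }
    }
    /* only R has the rule R -> epsilon, so the input may end only in R */
    return state == STATE_R;
}
```

is_identifier accepts sum1 and a, and rejects 1sum, the empty string and a_b. The last is rejected because this grammar has no rule for underscores.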
Grammar
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var>
             | <var> - <var>
             | <var>
A program consists of the special word begin, followed by a list of statements separated by semicolons, followed by the special word end.
An expression is either a single variable or two variables separated by either the + or the - operator. The only variable names are A, B and C.
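A recognizer for this grammar can be written by recursive descent, with one C function per nonterminal. This is a sketch with two assumptions of its own: lexemes may be separated by spaces, and begin and end must appear as whole words. All function names are hypothetical.

```c
#include <stdbool.h>
#include <ctype.h>
#include <string.h>

static const char *p;   /* cursor into the program text */

static void skip(void) { while (isspace((unsigned char)*p)) p++; }

/* Consume the special word w if it appears next as a whole word. */
static bool word(const char *w)
{
    size_t n = strlen(w);
    skip();
    if (strncmp(p, w, n) == 0 && !isalnum((unsigned char)p[n])) {
        p += n;
        return true;
    }
    return false;
}

/* Consume the single character c if it appears next. */
static bool sym(char c)
{
    skip();
    if (*p == c) { p++; return true; }
    return false;
}

/* <var> -> A | B | C */
static bool var(void)
{
    skip();
    if ((*p == 'A' || *p == 'B' || *p == 'C') && !isalnum((unsigned char)p[1])) {
        p++;
        return true;
    }
    return false;
}

/* <expression> -> <var> + <var> | <var> - <var> | <var> */
static bool expression(void)
{
    if (!var()) return false;
    if (sym('+') || sym('-')) return var();
    return true;
}

/* <stmt> -> <var> = <expression> */
static bool stmt(void) { return var() && sym('=') && expression(); }

/* <stmt_list> -> <stmt> | <stmt> ; <stmt_list>  (recursion as a loop) */
static bool stmt_list(void)
{
    if (!stmt()) return false;
    while (sym(';'))
        if (!stmt()) return false;
    return true;
}

/* <program> -> begin <stmt_list> end, with nothing after end */
bool is_program(const char *text)
{
    p = text;
    if (!word("begin") || !stmt_list() || !word("end")) return false;
    skip();
    return *p == '\0';
}
```

For example, is_program accepts "begin A = B + C; B = C end". It rejects "begin A = B * C end", because * is not one of the operators the grammar allows.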
Grammar Example
A = B * (A + C)
<assign> => <id> = <expr>
         => A = <expr>
         => A = <id> * <expr>
         => A = B * <expr>
         => A = B * ( <expr> )
         => A = B * ( <id> + <expr> )
         => A = B * ( A + <expr> )
         => A = B * ( A + <id> )
         => A = B * ( A + C )
BNF
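Invented by John Backus (and slightly modified by Peter Naur); it is nearly identical to Noam Chomsky's context-free grammars.
A BNF specification is a set of derivation rules.
Context-free grammars
Like context-free grammars, BNF can describe the syntax of a whole programming language.
Fundamentals:
BNF is a metalanguage, i.e. a language used to describe another language.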
Example of BNF
<postal-address> ::=
<name-part> <street-address> <zip-part>
This translates into English as:
A postal address consists of a name-part,
followed by a street address part, followed by a
zip-code part.
Example of BNF
<street-address> ::=
[<apt>] <house-num> <street-name> <EOL>
This translates into English as:
A street address consists of an optional
apartment specifier, followed by a house
number, followed by a street name, followed by
an end-of-line.
EBNF
Addresses a drawback of BNF: it increases the readability and writability of the production rules.
New notations:
Braces { } represent a sequence of zero or more instances of the enclosed elements.
Brackets [ ] enclose optional elements.
Parentheses ( ) group elements, e.g. a choice between alternatives separated by |.
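Each EBNF notation maps naturally onto control flow in a recursive-descent recognizer. Brackets become an if, braces become a loop, and a parenthesized group of alternatives becomes a test for any one of them. The C sketch below uses a hypothetical EBNF rule <expr> → [ - ] <var> { ( + | - ) <var> } with <var> → A | B | C, and expects input written without spaces. In plain BNF the same language would need extra rules, including a recursive one for the repetition.

```c
#include <stdbool.h>

static const char *cur;   /* cursor into the input (written without spaces) */

/* <var> -> A | B | C */
static bool var_ebnf(void)
{
    if (*cur == 'A' || *cur == 'B' || *cur == 'C') {
        cur++;
        return true;
    }
    return false;
}

/* <expr> -> [ - ] <var> { ( + | - ) <var> } */
bool is_ebnf_expr(const char *s)
{
    cur = s;
    if (*cur == '-')                       /* [ - ]     : optional, so an if */
        cur++;
    if (!var_ebnf())
        return false;
    while (*cur == '+' || *cur == '-') {   /* { ... }   : zero or more, so a loop */
        cur++;                             /* ( + | - ) : exactly one of the group */
        if (!var_ebnf())
            return false;
    }
    return *cur == '\0';                   /* the whole input must be used */
}
```

is_ebnf_expr accepts A and -A+B-C and rejects A+, --A and AB.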
Parse Tree
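Grammars naturally describe the hierarchical syntactic structure of the sentences of the language they define; this structure is called a parse tree.
Every internal node is labeled with a non-terminal symbol.
Every leaf is labeled with a terminal symbol.
Every subtree describes one instance of an abstraction in the sentence.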
Parse Tree Example
A = B * (A + C)
<assign>
  <id>
    A
  =
  <expr>
    <id>
      B
    *
    <expr>
      (
      <expr>
        <id>
          A
        +
        <expr>
          <id>
            C
      )
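The same tree can be built by a recursive-descent parser that creates one node for each rule it applies. The C sketch below assumes the grammar implied by the derivation of A = B * (A + C) earlier in this chapter: <assign> → <id> = <expr>, <id> → A | B | C and <expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>. It performs no error checking and expects well-formed input. Node and all function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A parse-tree node: internal nodes are labeled with a nonterminal,
 * leaves with a terminal. No rule used here has more than 3 children. */
typedef struct Node {
    char label[16];
    struct Node *kids[3];
    int nkids;
} Node;

static const char *in;   /* cursor into the input sentence */

static Node *make(const char *label)
{
    Node *n = calloc(1, sizeof *n);
    if (n == NULL)
        exit(EXIT_FAILURE);
    strncpy(n->label, label, sizeof n->label - 1);
    return n;
}

static void add_child(Node *parent, Node *child) { parent->kids[parent->nkids++] = child; }

/* Next non-blank character, without consuming it. */
static char peek(void) { while (*in == ' ') in++; return *in; }

/* Consume one terminal character and return it as a leaf. */
static Node *leaf(void)
{
    char s[2] = { peek(), '\0' };
    in++;
    return make(s);
}

/* <id> -> A | B | C */
static Node *id(void)
{
    Node *n = make("<id>");
    add_child(n, leaf());
    return n;
}

/* <expr> -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id> */
static Node *expr(void)
{
    Node *n = make("<expr>");
    if (peek() == '(') {
        add_child(n, leaf());            /* ( */
        add_child(n, expr());
        add_child(n, leaf());            /* ) */
    } else {
        add_child(n, id());
        if (peek() == '+' || peek() == '*') {
            add_child(n, leaf());        /* + or * */
            add_child(n, expr());
        }
    }
    return n;
}

/* <assign> -> <id> = <expr>; assumes text is a valid sentence. */
Node *parse_assign(const char *text)
{
    in = text;
    Node *n = make("<assign>");
    add_child(n, id());
    add_child(n, leaf());                /* = */
    add_child(n, expr());
    return n;
}

/* Print one node per line, indented two spaces per level. */
void print_tree(const Node *n, int depth)
{
    printf("%*s%s\n", depth * 2, "", n->label);
    for (int i = 0; i < n->nkids; i++)
        print_tree(n->kids[i], depth + 1);
}

/* Release a tree built by parse_assign. */
void free_tree(Node *n)
{
    for (int i = 0; i < n->nkids; i++)
        free_tree(n->kids[i]);
    free(n);
}
```

Calling print_tree(parse_assign("A = B * (A + C)"), 0) prints each node on its own line, indented two spaces per level. Internal nodes carry nonterminals and leaves carry terminals. Afterwards, free_tree releases the nodes.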
Grammar and Recognizers
A recognizer for the language generated by a grammar can be constructed algorithmically from the grammar.
One of the first syntax analyzer generators is named yacc (yet another compiler-compiler) (Johnson, 1975).
Semantics
The meaning of words and other parts of languages.
Reveals the meaning of the syntax/grammar.
Categorized as follows:
Static semantics
Dynamic semantics
Static semantics
Static semantics can be described with an attribute grammar (AG), which is an extension of a context-free grammar (CFG).
An AG is a mechanism for formalizing syntax that needs both context-free (CFG) and context-sensitive (CSG) rules.
AGs are used to define the static semantics of language features, such as type compatibility rules.
Static semantic checks can be done by the compiler at compile time.
Attribute Grammar
Example: A = A + B
Syntax rule: <expr> → <var>[2] + <var>[3]
Semantic rule:
<expr>.actual_type ← if (<var>[2].actual_type = int) and
                        (<var>[3].actual_type = int)
                     then int
                     else real
                     end if
Predicate: <expr>.actual_type == <expr>.expected_type
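The semantic rule and the predicate translate directly into code. The C sketch below is a hypothetical illustration with only the two types int and real. It assumes, as in the usual treatment of this example, that <expr>.expected_type is the type of the target variable A.

```c
#include <stdbool.h>

/* The two types used in this example. */
typedef enum { TYPE_INT, TYPE_REAL } Type;

/* Semantic rule for <expr> -> <var>[2] + <var>[3]:
 * the result is int only when both operands are int, otherwise real. */
Type expr_actual_type(Type var2_actual, Type var3_actual)
{
    if (var2_actual == TYPE_INT && var3_actual == TYPE_INT)
        return TYPE_INT;
    return TYPE_REAL;
}

/* Predicate: <expr>.actual_type == <expr>.expected_type.
 * Assumption: expected_type is the type of the target variable. */
bool predicate_holds(Type actual, Type expected)
{
    return actual == expected;
}
```

If A and B are both int, A + B is int and the predicate holds. If B is real, A + B is real. With an int target, the predicate then fails and the compiler reports a type error at compile time.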
Dynamic semantics
Concerns the meaning of programs at run time.
There are several ways to specify dynamic semantics:
By a language reference manual: the common method of describing a PL.
By a defining translator: the common method of questioning the behavior of a PL.
By a formal definition: describes the behavior of a PL by using mathematical methods. Formal definitions include operational, axiomatic and denotational semantics.
Operational semantics
Example: the while structure in C programming
while (expression)
    statement;
might be defined as the following operations:
1. Evaluate the expression, yielding a value.
2. If the value is true (nonzero), execute the statement and repeat step 1.
3. If the value is false (zero), terminate the while statement.
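The three operations can be made explicit by rewriting a while loop with only if and goto. The C function below is a hypothetical example: its loop computes 1 + 2 + ... + n, and the two statements in the loop body play the role of the while statement's body.

```c
/* Operational view of  while (i <= n) { sum += i; i++; }
 * written with if and goto so each step is an explicit operation.
 * Computes 1 + 2 + ... + n (0 when n < 1). */
int sum_to(int n)
{
    int i = 1, sum = 0;
check:
    if (!(i <= n))      /* step 1: evaluate the expression;            */
        goto done;      /* step 3: if it is false, leave the loop      */
    sum += i;           /* step 2: it is true, so execute the statement */
    i++;
    goto check;         /* ... and repeat step 1 */
done:
    return sum;
}
```

sum_to(4) returns 10, the same result as the loop while (i <= n) { sum += i; i++; }.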
Axiomatic semantics
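Based on logical statements called assertions.
Pre-condition: an assertion that holds before a program statement executes.
Post-condition: an assertion that holds after it executes.
Example: swapping x and y
Pre-condition:     x = n, y = m
Program statement: z = x; x = y; y = z;
Post-condition:    y = n, x = m
For instance, with the pre-condition x = 5, y = 7, the post-condition is x = 7, y = 5.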
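C's assert macro can check the pre-condition and post-condition of the swap at run time. This only tests the assertions for the particular values passed in, whereas axiomatic semantics proves them for all values of n and m. The function name swap_checked is hypothetical.

```c
#include <assert.h>

/* Swap *x and *y through a temporary z, checking the assertions
 *   pre-condition:  *x == n && *y == m
 *   post-condition: *y == n && *x == m                              */
void swap_checked(int *x, int *y)
{
    const int n = *x, m = *y;      /* record the initial values */
    assert(*x == n && *y == m);    /* pre-condition (true by construction here) */

    int z = *x;                    /* the program statement */
    *x = *y;
    *y = z;

    assert(*y == n && *x == m);    /* post-condition */
}
```

With x = 5 and y = 7, both assertions pass and the values become x = 7 and y = 5.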
Denotational semantics
Defines PL behavior by applying mathematical functions to programs and program components to represent their meaning.
The definition uses double brackets [[ ]] to separate the syntactic definition from the semantic definition.
Example
Syntactic: the expressions 2*4, 5+3 and 008 are written differently but all map to the integer 8.
Semantic: [[2*4]] = [[5+3]] = [[008]] = [[8]]
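A meaning function can be sketched in C for a tiny, hypothetical expression language: a numeral, or two numerals joined by + or *. The function meaning maps each syntactic form to the integer it denotes. Numerals are parsed by hand in base 10 because, in C source code, a literal written 008 would be treated as an invalid octal constant.

```c
#include <ctype.h>

/* Meaning of a numeral: its digits read in base 10, so leading zeros
 * do not change the value. Advances *s past the digits it reads. */
static int meaning_numeral(const char **s)
{
    int value = 0;
    while (isdigit((unsigned char)**s)) {
        value = value * 10 + (**s - '0');
        (*s)++;
    }
    return value;
}

/* Meaning function for a tiny, hypothetical expression language:
 *   <e> -> <numeral> | <numeral> + <numeral> | <numeral> * <numeral>
 * Each syntactic form is mapped to the integer it denotes. */
int meaning(const char *expr)
{
    const char *s = expr;
    int left = meaning_numeral(&s);
    if (*s == '+') {
        s++;
        return left + meaning_numeral(&s);
    }
    if (*s == '*') {
        s++;
        return left * meaning_numeral(&s);
    }
    return left;
}
```

meaning("2*4"), meaning("5+3"), meaning("008") and meaning("8") all return 8: different syntax, same denotation.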