
Topic 2: Language Design Principles: 2.1 Describing Syntax and Semantics

This document discusses language design principles including describing syntax and semantics, lexical and syntax analysis, and formal methods of describing syntax like context free grammars, Backus-Naur Form, Extended Backus-Naur Form, and attribute grammars. It also discusses the compilation process including lexical analysis, syntax analysis, semantic analysis, code generation, and linking/loading. Finally, it covers topics like variables, data types, and arithmetic errors.

Uploaded by

Muhd Fahmi
Copyright © All Rights Reserved

TOPIC 2: LANGUAGE DESIGN PRINCIPLES

2.1 Describing Syntax and Semantics


A sentence is a string of characters over some alphabet.
A language is a set of sentences or statements.
A lexeme is the lowest-level syntactic unit of a language. The lexemes of a language include its identifiers, literals,
operators and special words.
A token is a category of lexemes.
A program is a string of lexemes.
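To make the lexeme/token distinction concrete, here is a minimal C++ sketch that splits the statement index = 2 * count + 17; into lexemes and labels each one with its token category. The token names (IDENT, INT_LIT, ASSIGN_OP and so on) and the functions lex and classifyOperator are hypothetical, chosen only for illustration, and only single-character operators are handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token categories for a tiny language.
enum class Token { IDENT, INT_LIT, ASSIGN_OP, MULT_OP, PLUS_OP, SEMICOLON, UNKNOWN };

// Map a one-character operator lexeme to its token category.
Token classifyOperator(char c) {
    switch (c) {
        case '=': return Token::ASSIGN_OP;
        case '*': return Token::MULT_OP;
        case '+': return Token::PLUS_OP;
        case ';': return Token::SEMICOLON;
        default:  return Token::UNKNOWN;
    }
}

// Split a statement into (lexeme, token) pairs, skipping white space.
std::vector<std::pair<std::string, Token>> lex(const std::string& src) {
    std::vector<std::pair<std::string, Token>> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t start = i;
        if (std::isalpha(c)) {            // identifier: a letter, then letters or digits
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            out.emplace_back(src.substr(start, i - start), Token::IDENT);
        } else if (std::isdigit(c)) {     // integer literal: one or more digits
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.emplace_back(src.substr(start, i - start), Token::INT_LIT);
        } else {                          // single-character operator or punctuation
            out.emplace_back(std::string(1, src[i]), classifyOperator(src[i]));
            ++i;
        }
    }
    return out;
}
```

Here index and count are different lexemes but belong to the same token, IDENT; likewise 2 and 17 are both INT_LIT.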

2.2 Lexical and Syntax Analysis


The formal rules for how expressions, statements and program units are written or arranged in a programming
language are called the syntax of that language.
Its semantics is the meaning of those expressions, statements, and program units.
Syntax and semantics are closely related. The semantics should follow directly from the syntax (the form of a
statement should suggest what the statement is meant to accomplish).

Language can be defined in two distinct ways:

Language Recognizers
A language recognizer is designed to indicate whether a given input string is or is not in the language.
Given a string, a recognizer for a language L tells whether or not the string is in L.
A recognizer determines whether given programs are in the language and are syntactically correct.
Syntax analysis is a part of a compiler: the syntax analyzer is a recognizer for the language that the compiler
translates. The syntax analyzer is also known as the parser.

Language Generators
A language generator is a device that can be used to generate the sentences of a language.
A generator for L will produce an arbitrary string in L on demand.
Whether a particular sentence is syntactically correct can be determined by comparing it with the structure
of the generator.
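As an illustration of the two roles, here is a minimal C++ sketch for a hypothetical language L of balanced parentheses, defined by the grammar <S> -> ( <S> ) <S> | (empty). The function recognize acts as a recognizer for L, and generate acts as a generator that produces every sentence of L with a given number of parenthesis pairs; both names are illustrative only.

```cpp
#include <string>
#include <vector>

// Recognizer: returns true if s is in L, the language of balanced parentheses.
bool recognize(const std::string& s) {
    int depth = 0;
    for (char c : s) {
        if (c == '(') ++depth;
        else if (c == ')') { if (--depth < 0) return false; }  // ')' with no matching '('
        else return false;                                       // symbol not in the alphabet
    }
    return depth == 0;
}

// Generator: produces every sentence of L with exactly n pairs by expanding
// the grammar  <S> -> ( <S> ) <S> | empty.
std::vector<std::string> generate(int n) {
    if (n == 0) return {""};
    std::vector<std::string> result;
    for (int inner = 0; inner < n; ++inner)                // pairs nested inside the first "( )"
        for (const auto& a : generate(inner))
            for (const auto& b : generate(n - 1 - inner))  // pairs that follow it
                result.push_back("(" + a + ")" + b);
    return result;
}
```

Every string the generator produces is accepted by the recognizer, which is one way to check that the two describe the same language.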

2.2.1 Formal Methods of Describing Syntax

Context Free Grammar


Context-free grammars were developed by Noam Chomsky in the mid-1950s as language generators meant
to describe the syntax of natural languages. They define a class of languages called context-free languages.
Context-free grammars are useful for describing the syntax of programming languages.

Backus Naur Form (BNF)

Backus-Naur Form (BNF) was invented by John Backus to describe Algol 58. BNF is equivalent to
context-free grammars. BNF is a metalanguage: a language used to describe another language. In BNF, abstractions
are used to represent classes of syntactic structures; they act like syntactic variables (also called
nonterminal symbols). For example, the rule <assign> -> <var> = <expression> states that an assignment
is a variable, followed by the = symbol, followed by an expression. Backus's notation, later revised slightly by
Peter Naur to describe Algol 60, is a natural notation for describing syntax. BNF is simple but sufficiently
powerful to describe nearly all of the syntax of programming languages.

Extended BNF (EBNF)


EBNF is an extended version of BNF. It does not enhance the descriptive power of BNF; it only increases its
readability and writability. The 3 extensions included in EBNF are:

1. Brackets [ ] denote an optional part of the right-hand side of a rule.
   For example: <if_stmt> -> if ( <expression> ) <statement> [ else <statement> ]
2. Braces { } on the right-hand side indicate that the enclosed part can be repeated indefinitely or left
   out altogether. For example: <ident_list> -> <identifier> { , <identifier> }
3. When a single element must be chosen from a group, the options are placed in parentheses and
   separated by the OR operator, |. For example: <term> -> <term> ( * | / | % ) <factor>
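EBNF maps naturally onto recursive-descent parsing: each pair of braces { } becomes a loop, and each parenthesized choice ( | ) becomes a test on the next symbol. The following C++ sketch assumes a hypothetical three-rule expression grammar over integer literals (shown in the comment); the class name Parser is illustrative, and the sketch does not guard against division by zero.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Recursive-descent evaluator for the hypothetical EBNF grammar
//   <expr>   -> <term> {(+ | -) <term>}
//   <term>   -> <factor> {(* | /) <factor>}
//   <factor> -> <number> | ( <expr> )
class Parser {
public:
    explicit Parser(std::string text) : src(std::move(text)) {}

    int parse() {
        int value = expr();
        skipSpaces();
        if (pos != src.size()) throw std::runtime_error("unexpected input");
        return value;
    }

private:
    std::string src;
    std::size_t pos = 0;

    void skipSpaces() {
        while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
    }
    char peek() { skipSpaces(); return pos < src.size() ? src[pos] : '\0'; }

    int expr() {
        int value = term();
        while (peek() == '+' || peek() == '-') {   // { (+ | -) <term> }
            char op = src[pos++];
            int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    int term() {
        int value = factor();
        while (peek() == '*' || peek() == '/') {   // { (* | /) <factor> }
            char op = src[pos++];
            int rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    int factor() {
        if (peek() == '(') {                       // ( <expr> )
            ++pos;
            int value = expr();
            if (peek() != ')') throw std::runtime_error("expected )");
            ++pos;
            return value;
        }
        if (!std::isdigit(static_cast<unsigned char>(peek())))
            throw std::runtime_error("expected a number");
        int value = 0;
        while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos])))
            value = value * 10 + (src[pos++] - '0');
        return value;
    }
};
```

Because the braces become loops, 10 - 2 - 3 is evaluated left to right as (10 - 2) - 3.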

Attribute Grammars
A grammar is a powerful tool for describing and analyzing languages. It is a set of rules by which valid
sentences in a language are constructed, and it is used to describe the syntax of programming
languages. An attribute grammar is a device used to describe more of the structure of a programming language
than can be described with a context-free grammar (for example, type compatibility rules). It is an extension of a
context-free grammar.
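A small attribute grammar can be simulated in C++ to show how attributes extend a context-free rule. The sketch below is modelled on the common textbook example for <assign> -> <var> = <var> + <var>: the expression's actual_type is a synthesized attribute (int only if both operands are int, otherwise real), its expected_type is inherited from the target variable, and the predicate requires the two to match. The Env map standing in for a symbol table and the function names are assumptions made for illustration.

```cpp
#include <map>
#include <string>

enum class Type { Int, Real };

// Hypothetical stand-in for a symbol table: each variable's actual_type.
using Env = std::map<std::string, Type>;

// Synthesized attribute: <expr>.actual_type for  <expr> -> <var> + <var>
Type exprActualType(const Env& env, const std::string& left, const std::string& right) {
    bool bothInt = env.at(left) == Type::Int && env.at(right) == Type::Int;
    return bothInt ? Type::Int : Type::Real;
}

// <assign> -> <var> = <var> + <var>
// Inherited attribute: <expr>.expected_type <- <var>.actual_type (the target)
// Predicate:           <expr>.actual_type == <expr>.expected_type
bool assignmentIsValid(const Env& env, const std::string& target,
                       const std::string& left, const std::string& right) {
    Type expected = env.at(target);
    Type actual = exprActualType(env, left, right);
    return actual == expected;
}
```

With A declared real and B, C declared int, A = A + B satisfies the predicate, while B = A + C does not, because a real sum cannot be assigned to an int target under these rules. A context-free grammar alone cannot express this check.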

Four classifications of grammars (the Chomsky hierarchy):

1. Type 0 Grammar - unrestricted grammar
2. Type 1 Grammar - context-sensitive grammar
3. Type 2 Grammar - context-free grammar
4. Type 3 Grammar - regular grammar or restrictive grammar
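Type 3 (regular) grammars describe exactly the languages that finite automata recognize, which is why lexical analyzers are usually built as state machines. As a sketch, assume identifiers are defined by the regular grammar <id> -> letter <rest>, <rest> -> letter <rest> | digit <rest> | (empty). The hypothetical C++ function isIdentifier below recognizes that language with a small set of states.

```cpp
#include <cctype>
#include <string>

// Finite-state recognizer for identifiers: a letter followed by any number of
// letters or digits. Start -> InId on a letter; InId loops on letters/digits.
bool isIdentifier(const std::string& s) {
    enum class State { Start, InId, Error };
    State state = State::Start;
    for (char ch : s) {
        unsigned char c = static_cast<unsigned char>(ch);
        switch (state) {
            case State::Start:
                state = std::isalpha(c) ? State::InId : State::Error;
                break;
            case State::InId:
                state = std::isalnum(c) ? State::InId : State::Error;
                break;
            case State::Error:
                return false;   // once in Error, the string can never be accepted
        }
    }
    return state == State::InId;   // InId is the only accepting state
}
```

Under this assumed rule the underscore is not allowed, so a_b is rejected; a real language would add it to the transitions.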

Compilation Process
Before a program can be run, it must be translated into a form in which it can be executed by a
computer. The software systems that do this are called compilers. An important role of the compiler is to
report any errors in the source program that it detects during the translation process.
A compiler or an interpreter:
translates high-level language statements into machine code
issues error messages
A compiler translates an entire program before any of its statements are executed. The language that a compiler
translates is called the source language.
An interpreter translates one statement at a time and executes each statement as soon as it is translated.
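The interpreter's translate-then-execute cycle can be sketched in a few lines of C++. The mini-language here is hypothetical: each statement has the form name = operand [+ operand], where an operand is an integer literal or a previously assigned name. The class MiniInterpreter and its methods are illustrative names, not a real library.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical interpreter: each statement is translated and executed at once.
class MiniInterpreter {
public:
    void execute(const std::string& stmt) {
        std::istringstream in(stmt);
        std::string target, eq, a, plus, b;
        in >> target >> eq >> a;
        if (eq != "=") throw std::runtime_error("expected '=' in: " + stmt);
        int value = operand(a);
        if (in >> plus >> b) {                 // optional "+ operand"
            if (plus != "+") throw std::runtime_error("expected '+' in: " + stmt);
            value += operand(b);
        }
        vars[target] = value;                  // the statement takes effect immediately
    }
    int get(const std::string& name) const { return vars.at(name); }

private:
    std::map<std::string, int> vars;

    int operand(const std::string& tok) const {
        if (!tok.empty() && std::isdigit(static_cast<unsigned char>(tok[0]))) return std::stoi(tok);
        auto it = vars.find(tok);
        if (it == vars.end()) throw std::runtime_error("undefined name: " + tok);
        return it->second;
    }
};
```

Calling execute on "x = 5" and then "y = x + 2" leaves y equal to 7. An error in a later statement is reported only when that statement is reached, after the earlier ones have already run; a compiler would instead report it before any statement executes.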
The lexical analyzer is the first phase of a compiler.
Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for
syntax analysis.
Figure 2.1: The Compilation Process
The process of compilation and program execution takes place in several phases, the most important of
which are shown in Figure 2.1.

The lexical analyzer gathers the characters of the source program into lexical units. The lexical units of a
program are identifiers, special words, operators, and punctuation symbols. The lexical analyzer ignores
comments in the source program, because the compiler has no use for them. The syntax analyzer takes
the lexical units from the lexical analyzer and uses them to construct hierarchical structures called parse
trees. These parse trees represent the syntactic structure of the program.

The intermediate code generator produces a program in a different language, at an intermediate level
between the source program and the final output of the compiler, the machine language program.
Intermediate languages sometimes look very much like assembly languages and in fact sometimes are
actually assembly languages. The semantic analyzer is an integral part of the intermediate code
generator. The semantic analyzer checks for errors that are difficult if not impossible to detect during
syntax analysis, such as type errors. Optimization, which improves programs (usually in their
intermediate code version) by making them smaller or faster or both, is often an optional part of
compilation. In fact, some compilers are incapable of doing any significant optimization.

The code generator translates the optimized intermediate code version of the program into an
equivalent machine language program. The symbol table serves as a database for the compilation
process. The primary contents of the symbol table are the type and attribute information of each user-
defined name in the program. This information is placed in the symbol table by the lexical analyzer and
syntax analyzer and is used by the semantic analyzer and the code generator.
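A symbol table can be sketched as a map from each user-defined name to its attributes. The C++ below is a minimal single-scope sketch; the SymbolInfo fields and the declare/lookup methods are hypothetical, and real compilers also track nested scopes, storage locations and more.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical attribute record kept for each user-defined name.
struct SymbolInfo {
    std::string type;   // e.g. "int", "float"
    std::string kind;   // e.g. "variable", "function"
};

// Minimal symbol table: earlier phases insert names, later phases look them up.
class SymbolTable {
public:
    // Returns false if the name was already declared (a redeclaration error).
    bool declare(const std::string& name, SymbolInfo info) {
        return table.emplace(name, std::move(info)).second;
    }
    // Returns the stored attributes, or nothing if the name is undeclared.
    std::optional<SymbolInfo> lookup(const std::string& name) const {
        auto it = table.find(name);
        if (it == table.end()) return std::nullopt;
        return it->second;
    }
private:
    std::map<std::string, SymbolInfo> table;
};
```

The semantic analyzer would call lookup to find the type of each name used in an expression and report an error when lookup returns nothing.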

Although the machine language generated by a compiler can be executed directly on the hardware, it
must nearly always be run along with some other code. Most user programs also require programs from
the operating system. Among the most common of these are programs for input and output. Before the
machine language programs produced by a compiler can be executed, the required programs from the
operating system must be found and linked to the user program. The process of collecting system
programs and linking them to user programs is called linking and loading, or sometimes just linking. It is
accomplished by a systems program called a linker.

Names, Binding, Type Checking and Scopes


Primitive values are those which cannot be decomposed into more primitive parts. Composite values,
such as arrays and lists, are aggregations of parts, each of which is itself a primitive or composite
value.

Variables : a variable is a name which is associated with a memory location to store a value.

Attributes of a variable :
Name : the identifier used to refer to the variable
Memory address : the storage location of the variable
Value : the contents of the memory location associated with the variable
Type : the type of a variable determines the range of values the variable can have and the set of
operations that are defined for values of the type.
Lifetime : the period of time during which the variable is bound to a memory address (starting when
it is created and ending when it is destroyed)
Scope : the part of the program in which the variable can be accessed (is it local or global?)
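Lifetime and scope are easy to confuse, so here is a small C++ sketch that separates them. Both hypothetical counters below have the same scope, the body of their function, but different lifetimes: the automatic variable is created and destroyed on every call, while the static one stays bound to the same address for the whole run.

```cpp
// Automatic local: created on each call and destroyed on return.
int countWithAutomatic() {
    int calls = 0;
    return ++calls;       // always returns 1
}

// Static local: same scope, but its lifetime is the whole program run.
int countWithStatic() {
    static int calls = 0; // initialized once, keeps its value between calls
    return ++calls;       // returns 1, 2, 3, ... on successive calls
}
```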

Binding:
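Binding is the association of an attribute with a variable; it can take place during compilation or during
execution of a program. There are 2 types of binding:
Static binding : early binding; the attributes are bound before run time (usually at compile time, when the
program is translated into machine language) and do not change during execution
Dynamic binding : the attributes are bound at run time and may change during execution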

Data Types

Constants
A constant stores only one value, and that value cannot be changed during program execution. The
advantage is that if the value ever has to change, you don't have to edit the entire program; you just have to
edit the declaration statement.
Examples :
C++ : const int MAX = 100;
Java : final int MAX = 100;
Pascal : const MAX = 100;

Primitive data types:


Numeric data types : integer, real.

Integer                          Floating point
Operations are faster            Operations are slower
Need less storage                Requires more bytes of computer memory
Always precise                   Some loss of accuracy

Example of floating point number inaccuracies :

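#include <iostream>
using namespace std;

int main()
{
    float trial = 0.0;
    // Deliberately flawed: trial may never be exactly 11.0, so the loop may not terminate
    while (trial != 11.0)
    {
        cout << " trial no :" << trial << endl;
        trial = trial + 1.1;
    }
    cout << "last trial : " << trial;
}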
In the code above, the loop may fail to terminate because of representational error. Representational error
is an error caused by the imprecise representation of a decimal number in binary form. Because 1.1 cannot be
stored exactly, repeatedly adding it does not produce exactly 11.0, so the condition trial != 11.0 stays true and the
result is an infinite loop.
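A safer version of the trial loop, sketched in C++ under the assumption that exactly eleven trials (0 to 11 in steps of 1.1) are wanted: an integer counter controls the loop, and each value is computed from the counter instead of by repeated addition, so rounding errors can neither accumulate nor stop the loop from ending. The helper nearlyEqual and its tolerance are hypothetical; a suitable tolerance depends on the data.

```cpp
#include <cmath>
#include <iostream>
#include <vector>

// Integer-controlled loop: always runs exactly 11 times.
std::vector<double> runTrials() {
    std::vector<double> trials;
    for (int i = 0; i <= 10; ++i) {
        double trial = i * 1.1;            // computed, not accumulated
        std::cout << " trial no :" << trial << '\n';
        trials.push_back(trial);
    }
    std::cout << "last trial : " << trials.back() << '\n';
    return trials;
}

// Compare floating-point values against a tolerance instead of testing equality.
bool nearlyEqual(double a, double b, double tolerance = 1e-9) {
    return std::fabs(a - b) < tolerance;
}
```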

Cancellation error, arithmetic underflow and arithmetic overflow

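#include <iostream>
using namespace std;
int main() {
    float x = 100000.00;
    float y = 0.000001234;
    float z;
    z = x + y;   // y is lost: a float keeps only about 7 significant digits
    cout << " Z : " << z;
}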

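In the example above, the output is 100000 and z is exactly equal to x, which is wrong. This is caused by
adding a very large number to a much smaller one: the larger number cancels out the smaller number because a
float cannot hold enough significant digits to represent both. This error is called cancellation error. (If x, y and
z are declared double, which holds about 15 to 16 significant digits, the sum keeps the contribution of y, although
cout's default precision of 6 significant digits would still display 100000.)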

Multiplying 2 very small numbers may produce a result too small to be represented, so it is stored as zero; this is
called arithmetic underflow. Multiplying 2 very large numbers may produce a result too large to
be represented; this is called arithmetic overflow.
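Both effects can be observed with float values, assuming the usual IEEE 754 single-precision format: the smallest positive float is about 1.4e-45 and the largest is about 3.4e38. The function names below are hypothetical. Note that overflow of signed integers in C++ is undefined behaviour, so this sketch sticks to floating point.

```cpp
#include <cmath>

// Underflow: the exact product 1e-60 is below the smallest positive float,
// so the stored result is 0.
float underflowExample() {
    float tiny = 1e-30f;
    return tiny * tiny;
}

// Overflow: the exact product 1e60 exceeds the largest float;
// under IEEE 754 the result becomes +infinity.
float overflowExample() {
    float huge = 1e30f;
    return huge * huge;
}
```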

Boolean : the simplest data type, has only 2 values, true or false. Booleans are often used to represent
switches or flags in a program. In some programming languages Boolean variables are represented as the
binary numbers 1 ( true ) and 0 ( false ).

Character : character data are stored as numeric codings. The most commonly used coding was ASCII (for example, 'A' is stored as 65).
Structured data types: arrays, strings, enumeration, pointers
User defined ordinal types
An ordinal type is one in which the range of possible values can be easily associated with the set of
positive integers. 2 types of ordinal types : enumeration and subrange
Strings : a string is a sequence of characters.
Enumeration types: an enumeration type is one in which all of the possible values, which are named constants,
are listed in the definition; each value is associated with an integer.

Example 1 -> C++ :


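enum days { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday };
days hari;
// C++ defines no ++ for enumeration types, so step to the next value with a cast
for (hari = Monday; hari <= Sunday; hari = static_cast<days>(hari + 1))
    cout << hari;   // prints 0123456: each value is stored as an integer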
Example 2
enum sizes { small, medium, large, jumbo };
sizes drink_size, popcorn_size;
drink_size = large;
popcorn_size = jumbo;

Subrange : a subrange type is a contiguous subsequence of an ordinal type. Subrange types were
introduced by Pascal and are included in Ada.

Pascal :
type
Letter = 'A' .. 'Z';
DaysInMonth = 1..31;
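C++ has no subrange types, but the idea can be sketched with a small class template that checks the range at run time, similar in spirit to the range checks Pascal and Ada compilers can insert. The template Subrange and the alias DaysInMonth are hypothetical names for this illustration.

```cpp
#include <stdexcept>

// Hypothetical run-time checked subrange, e.g. Subrange<1, 31> for days of a month.
template <int Low, int High>
class Subrange {
    static_assert(Low <= High, "empty subrange");
public:
    explicit Subrange(int v) : value_(check(v)) {}
    Subrange& operator=(int v) { value_ = check(v); return *this; }
    int value() const { return value_; }
private:
    int value_;
    // Reject any value outside Low..High before it is stored.
    static int check(int v) {
        if (v < Low || v > High) throw std::out_of_range("value outside subrange");
        return v;
    }
};

using DaysInMonth = Subrange<1, 31>;
```

Assigning 32 to a DaysInMonth throws std::out_of_range and leaves the old value unchanged.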

Array : an array is a group of contiguous memory locations that all have the same name and the same
type. Each element is identified by its position in the array.

C++ : int num[10];


Pascal : num : Array[1..10] of integer;
JAVA : int num[] = new int[10];
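The phrase "contiguous memory locations" can be checked directly in C++: element i of an array starts i * sizeof(element) bytes after element 0. Note also that C++ and Java index from 0 (num[0] to num[9]), whereas the Pascal declaration above indexes from 1. The helper byteOffset is a hypothetical name.

```cpp
#include <cstddef>

// Byte distance from the first element to element i of an int array.
std::ptrdiff_t byteOffset(const int* base, std::size_t i) {
    const char* first = reinterpret_cast<const char*>(base);
    const char* element = reinterpret_cast<const char*>(base + i);
    return element - first;   // equals i * sizeof(int) because elements are contiguous
}
```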

Record type : first introduced by COBOL in the early 1960s, a record is a collection of related items.
Unlike an array, the individual components of a record can contain data of different types.
Record data types are used in file and database applications.
In C and C++, records are supported with the struct data type. In Pascal, we can use the record data
type.

Cobol :
01 EMPLOYEE-RECORD.
   02 EMPLOYEE-NAME.
      05 FIRST PICTURE IS X(20).
      05 MIDDLE PICTURE IS X(10).
      05 LAST PICTURE IS X(20).
   02 HOURLY-RATE PICTURE IS 99V99.
C++ :
struct Student
{ int id;
float cgpa; };

Student stu1;
Student array1 [20];

Accessing members of a structure (struct) :


cin >> stu1.id;
cout << stu1.id << endl;
cin >> array1[m].cgpa;
cout << array1[m].cgpa << endl;
Pascal :
type
Student = record
Id : Integer;
cgpa : Real;
end;
var
stu1 : Student;
array1 : array [1..20] of Student;

Accessing members of a record:


ReadLn (stu1.Id);
WriteLn ('ID = ', stu1.Id);
ReadLn (array1[m].Id);
WriteLn ('ID = ', array1[m].Id);
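Putting the C++ record pieces together, here is a small self-contained sketch that stores Student records in an array and reads one field of each with the dot operator. The function averageCgpa is a hypothetical example of processing an array of records.

```cpp
#include <cstddef>

// Record whose fields have different types.
struct Student {
    int id;
    float cgpa;
};

// Average CGPA over an array of records; returns 0 for an empty array.
float averageCgpa(const Student students[], std::size_t count) {
    if (count == 0) return 0.0f;
    float total = 0.0f;
    for (std::size_t m = 0; m < count; ++m)
        total += students[m].cgpa;   // field access with the dot operator
    return total / count;
}
```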

Pointers : pointers contain memory addresses as their values.


Pointers provide the power of indirect addressing
Pointers also provide a method of dynamic storage management
C++ :
int *iptr;
float *yptr;
box *boxPtr;
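The two uses of pointers listed above can be sketched in C++. doubleThrough shows indirect addressing (changing a variable through its address), and makeCounter shows dynamic storage, using std::unique_ptr so the heap storage is released automatically; raw new and delete would also work. Both function names are hypothetical.

```cpp
#include <memory>

// Indirect addressing: modify the caller's variable through its address.
void doubleThrough(int* iptr) {
    *iptr = *iptr * 2;   // dereference to read and write the pointed-to int
}

// Dynamic storage: allocate an int on the heap at run time.
std::unique_ptr<int> makeCounter(int start) {
    return std::make_unique<int>(start);
}
```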

Expressions & Assignment Statements


Expressions are the fundamental means of specifying computations in a programming language.
In programming languages, arithmetic expressions consist of operators, operands, parentheses, and
function calls.
The purpose of an arithmetic expression is to specify an arithmetic computation.
The operators can be unary, meaning they have a single operand, or binary, meaning they have two
operands.
Examples (C++) :

Unary operator : x++ x-- ++x --x


binary operator : a+b a*b a>b a<b
Operator                C++                     Pascal
Assignment operator     =                       :=
Relational operators    < > <= >= == !=         < > <= >= = <>
Arithmetic operators    + - / * %               + - / * mod div
Logical operators       && ||                   and or
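A few of these operators behave in ways worth seeing in code. The C++ sketch below shows that prefix ++ yields the new value while postfix ++ yields the old one, and that with int operands / truncates like Pascal div while % gives the remainder like Pascal mod (the correspondence is assumed here only for non-negative operands). All function names are hypothetical.

```cpp
// Prefix yields the incremented value; postfix yields the original value.
int postfixResult(int x) { return x++; }
int prefixResult(int x)  { return ++x; }

// Integer division and remainder.
int intQuotient(int a, int b)  { return a / b; }   // like Pascal div
int intRemainder(int a, int b) { return a % b; }   // like Pascal mod
```

With a double operand, / performs real division instead, so 7.0 / 2 is 3.5.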
An assignment statement can simply cause a value to be copied from one memory cell to
another. But in many cases, assignment statements include expressions with operators.

Examples (C++) :
x = 10;
y = (x + b) / (k * p);

A relational operator is an operator that compares the values of its two operands.
A relational expression has two operands and one relational operator.
Examples : a>b a<b a >= b a <= b

Type checking
Type checking is the process of identifying errors in a program based on explicitly or implicitly stated
type information. It is an effective technique for catching inconsistencies in programs. For example, it can
catch errors such as parameters passed to a procedure that do not match the signature of the procedure.
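Type conversion (casting)
Implicit type casting (widening): a value is converted automatically to a type higher in this list
(this is the Java ordering):
double
float
long
int
short
byte
double t = 7;   (the int value 7 is implicitly converted to the double 7.0)
(you can go up, but not down: going down the list is a narrowing conversion and needs an explicit cast)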

Explicit type casting


Ada : FLOAT ( INDEX )
C : (int) speed
C++ : int(speed) or static_cast<int>(speed)
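Both kinds of conversion, sketched in C++ with hypothetical function names. One difference from Java is worth knowing: C++ still accepts an implicit narrowing such as int i = 7.9; (compilers may warn), but brace initialization such as int i{7.9}; is rejected as a narrowing conversion.

```cpp
// Implicit widening: the int literal 7 becomes the double 7.0.
double widen() {
    double t = 7;
    return t;
}

// Explicit narrowing: static_cast<int> drops the fractional part (truncates toward zero).
int truncateSpeed(double speed) {
    return static_cast<int>(speed);
}
```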

Short cut assignment operators :


total = total + a; is equivalent to total += a;

A short-circuit evaluation of an expression is one in which the result is determined without evaluating all
of the operands and/or operators. This can speed up evaluation.
For example, the value of the arithmetic expression ( 13 * a ) * ( b / 13 - 1 ) is independent of the value of
( b / 13 - 1 ) if a is 0, because 0 * x = 0 for any x. So, when a is 0, there is no need to evaluate ( b / 13 - 1 ).
Example 1 : 0 * ( y + x / 2 ) -> multiplication by 0, so the values of x and y are irrelevant and evaluation need
not take place.
Example 2 : ( X and Y ) -> if X is false then the value of Y is irrelevant
In practice, arithmetic short cuts like Example 1 are hard to detect during execution; languages such as C++ and Java
short-circuit only the Boolean operators && and ||.
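Short-circuit evaluation is often used as a guard. In this C++ sketch, the right operand of && would divide by zero, but it is never evaluated when the left operand is false; the hypothetical counter evaluations records how often the right operand actually runs.

```cpp
// Counts how many times the right operand is evaluated.
int evaluations = 0;

bool rightOperand(int x) {
    ++evaluations;
    return 10 / x > 1;   // would divide by zero if called with x == 0
}

// && short-circuits: when x != 0 is false, rightOperand is never called.
bool safeTest(int x) {
    return x != 0 && rightOperand(x);
}
```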
Statement-Level Control Structures
A control structure is a control statement and the collection of statements whose execution it controls.
There are three types of control structures in programs:

Sequential control structure
Selection control structure
Iteration/ Repetition control structure

Sequential Control Structure


The sequential control structure is the simplest of all the structures. The program statements are
executed sequentially one by one, starting from the first statement until the last statement, in the
order in which they are written.

The basic operations in a program:

input -> process -> output
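A minimal sequential program following the input, process, output pattern, sketched in C++. To keep it testable, the hypothetical function inputProcessOutput reads from any input stream and writes to any output stream; runProgram is a hypothetical convenience wrapper that feeds it a string. The rectangle-area task is only an example.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Three statements, each executed once, in the order written.
void inputProcessOutput(std::istream& in, std::ostream& out) {
    double length = 0, width = 0;
    in >> length >> width;               // 1. input
    double area = length * width;        // 2. process
    out << "Area: " << area << '\n';     // 3. output
}

// Feed input text to the program and return what it printed.
std::string runProgram(const std::string& input) {
    std::istringstream in(input);
    std::ostringstream out;
    inputProcessOutput(in, out);
    return out.str();
}
```

In an interactive program the same function would be called as inputProcessOutput(std::cin, std::cout).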

Selection Control Structure


In selection control structures, the program executes particular statements depending on some
condition(s). Certain statements are to be executed only if certain conditions are true.
There are different types of selection statements such as one-way selection, two-way selection, and
multiple selection.

Examples of statements:
C++ : if statement, switch statement
Pascal : if statement, case statement
Java : if statement, switch statement
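The three kinds of selection, sketched in C++. The pass mark of 50 and the day numbering (0 = Monday to 6 = Sunday) are assumptions made for illustration, and the function names are hypothetical.

```cpp
#include <string>

// One-way and two-way selection.
std::string passOrFail(int score) {
    if (score < 0) score = 0;   // one-way: runs only when the condition is true
    if (score >= 50)            // two-way: exactly one of the branches runs
        return "pass";
    else
        return "fail";
}

// Multiple selection: switch picks a branch from the value of day.
std::string dayType(int day) {
    switch (day) {
        case 5:
        case 6:
            return "weekend";
        case 0: case 1: case 2: case 3: case 4:
            return "weekday";
        default:
            return "invalid";
    }
}
```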

Iteration/ Repetition/ Looping Control Structure


An iterative statement is one that causes a statement or collection of statements to be executed zero,
one, or more times. Every programming language has included some method of repeating the execution
of segments of code.
Examples of statements:

C++ : for statement, while statement, do while statement.


Pascal : for statement, while statement, repeat statement
Java : for statement, while statement, do while statement

There are different types of loops that can be implemented in programs:

Counter-controlled loops
The loop is controlled by a control variable, and the number of repetitions is known before the loop starts.

Sentinel-controlled loops
The loop continues to execute as long as the program has not read the sentinel value.
Flag-controlled loops
A flag-controlled loop uses a Boolean variable to control the loop.
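One C++ sketch of each kind of loop. To keep the examples testable, a std::vector stands in for values that a real program would read from input; all function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Counter-controlled: the number of repetitions (n) is known in advance.
int sumFirstN(int n) {
    int sum = 0;
    for (int i = 1; i <= n; ++i) sum += i;
    return sum;
}

// Sentinel-controlled: keep reading values until the sentinel appears.
int sumUntilSentinel(const std::vector<int>& input, int sentinel) {
    int sum = 0;
    std::size_t next = 0;
    while (next < input.size() && input[next] != sentinel) {
        sum += input[next];
        ++next;
    }
    return sum;
}

// Flag-controlled: a bool variable decides when the loop stops.
int indexOfFirstNegative(const std::vector<int>& values) {
    bool found = false;
    std::size_t i = 0;
    while (!found && i < values.size()) {
        if (values[i] < 0) found = true;
        else ++i;
    }
    return found ? static_cast<int>(i) : -1;
}
```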

Subprograms
Subprograms, the fundamental building blocks of programs, are sometimes called subroutines, functions,
procedures or methods. A subprogram is a sequence of code which performs a specific task. A
function should compute and return a value and should not have any observable side effects, while
procedures may produce side effects by acting on either non-local variables or on parameters which
allow the transfer of data to the caller. The term subroutine is traditionally used in assembly language, and
method is the term used in the object-oriented paradigm.
The subprogram which calls another subprogram is called the caller, while the subprogram that is called
is known as the callee.
Definition: A subprogram is either a procedure or a function. A procedure specifies a sequence of
actions and is invoked by a procedure call statement.
Advantages of using subprograms :
Subprograms help reduce the amount of redundancy in a program
Subprograms facilitate top-down design
( Top-down design is the tackling of a program from the most general to the most specific)

All subprograms have the following characteristics:

1. Each subprogram has a single entry point
2. There is only one subprogram in execution at any given time
3. The caller is suspended during execution of the called subprogram
4. Control always returns to the caller when the subprogram execution terminates

The components of a subprogram

Subprogram definition : describes the interface to and the actions of the subprogram.
Subprogram declaration or prototype : defines the name and type of the subprogram and its
parameter list, but does not include the subprogram body.
Header : declares the name, type and parameters(if any) of the subprogram.
Call : explicit request to execute the particular subprogram
Parameter profile ( signature ) : describes the number of parameters, parameter order and
parameter type.
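The components listed above, labelled on a small C++ example. The functions average and computeMean are hypothetical; average is also a function in the sense used earlier, since it returns a value and has no side effects.

```cpp
// Declaration (prototype): name, return type and parameter list, no body.
double average(double a, double b);

// Definition: header plus body.
// Parameter profile (signature): two parameters, both double, in this order.
double average(double a, double b) {   // header
    return (a + b) / 2.0;              // body
}

// computeMean is the caller and average is the callee.
double computeMean() {
    return average(3.0, 5.0);          // call, with actual parameters 3.0 and 5.0
}
```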

Variables

A local variable is a variable that is given local scope. This variable is accessible only from the
function or block in which it is declared.
A global variable is a variable that is accessible in every scope. A global variable can be modified
from anywhere. The use of global variables makes software harder to read and understand.
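Local and global scope, sketched in C++ with hypothetical names. A local variable with the same name as a global one hides it inside its block; the scope resolution operator :: still reaches the global. changeGlobal shows how any function can modify a global, which is what makes programs that rely on globals harder to follow.

```cpp
int total = 100;        // global variable: visible in every function below

int localShadowsGlobal() {
    int total = 5;      // local variable with the same name hides the global
    return total;       // refers to the local total
}

int readGlobal() {
    return ::total;     // :: names the global explicitly
}

void changeGlobal() {
    total += 1;         // any function can modify the global
}
```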

Parameters

Parameter profile (signature) describes the number of parameters, the type of the parameters and
the order of the parameters.
Actual parameters : those in the subprogram call
Formal parameters : those in the subprogram header
The correspondence (binding) between actual and formal parameters is done by position,
meaning that the first actual parameter is bound to the first formal parameter, and so forth.
There are 2 types of formal parameters: value parameters and reference parameters.
A value parameter receives the value of the corresponding actual parameter.
A reference parameter receives the memory address of the corresponding actual parameter.
Subprograms can define their own local variables.
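The two kinds of formal parameter, sketched in C++ with hypothetical function names: the value parameter works on a copy, so the caller's variable is unchanged, while the reference parameter is another name for the caller's variable.

```cpp
// Value parameter: x is a copy of the actual parameter.
void addOneByValue(int x) {
    x = x + 1;           // changes only the copy
}

// Reference parameter: x refers to the actual parameter's memory location.
void addOneByReference(int& x) {
    x = x + 1;           // changes the caller's variable
}
```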

Implementation Models of parameter passing(parameter passing methods)

i. Pass-by-Value ( IN-MODE )
When a parameter is passed by value, the value of the actual parameter is used to
initialize the corresponding formal parameter, usually by copying it. Any changes made to the formal
parameter do not change the value of the actual parameter. Pass-by-value is used by
default in C, C++, Pascal and Java.
ii. Pass-by-Result ( OUT-MODE )
Pass-by-result is an implementation model for out-mode parameters. When a parameter
is passed by result, no value is transmitted to the subprogram. The corresponding formal
parameter acts as a local variable, but just before control is transferred back to the caller,
its value is transmitted back to the caller.
iii. Pass-by-Value-Result ( INOUT-MODE )
Pass-by-value-result is an implementation model for inout-mode parameters in which
actual values are copied. It is in effect a combination of pass-by-value and pass-by-result.
The value of the actual parameter is used to initialize the corresponding formal
parameter. At subprogram termination, the value of the formal parameter is transmitted
back to the caller.
iv. Pass-by-Reference ( INOUT-MODE )
Pass-by-reference is a second implementation model for inout-mode parameters. Rather
than copying data values back and forth, however, as in pass-by-value-result, the pass-
by-reference method transmits an access path, usually an address, to the called
subprogram. Thus, the called subprogram is allowed to access the actual parameter.
In effect, the actual and formal parameters refer to the same memory location, so any change
made through the formal parameter immediately and permanently changes the value of the
actual parameter.
v. Pass-by-name
With pass-by-name, the actual parameter is, in effect, textually substituted for the corresponding
formal parameter everywhere it occurs in the body of the subprogram. Pass-by-name was
used in Algol 60 and Simula 67, but most imperative languages do not
support this feature.
vi. Passing subprograms as parameters
Functions and procedures may be passed as arguments to other subprograms. In the
parameter declarations for a subprogram that receives a function or procedure
argument, you must also declare the parameters for the function or procedure being
passed. For example,
procedure proc(function f(x:real):real; i,j:integer);
starts off a declaration for a procedure named proc. The parameters for this procedure
are a function and two integers. The function takes one real parameter and returns a
real value.
When a subprogram is passed as an argument to another subprogram, the parameter
lists must match exactly in both caller and callee. It is not enough for the parameters to
be compatible.
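C++ has no built-in pass-by-value-result, but it can be emulated by hand to show how it differs from pass-by-reference. In the sketch below the global g is passed as the actual parameter and is also read directly inside the subprogram, so it acts as an alias. With pass-by-reference the change is visible through g at once; with the emulated value-result version it appears only when the copy-out happens at the end. The sketch also passes a subprogram as a parameter, using std::function. All names are hypothetical.

```cpp
#include <functional>

int g = 1;   // global that the subprogram can also reach directly

// Pass-by-reference: p and g are the same location, so g already reads 2.
int byReference(int& p) {
    p = 2;
    return g;
}

// Pass-by-value-result, emulated: copy in, work on a local copy, copy out at the end.
int byValueResult(int& actual) {
    int p = actual;    // copy in (value part)
    p = 2;
    int seen = g;      // still the old value: the actual has not been updated yet
    actual = p;        // copy out (result part) just before returning
    return seen;
}

// Passing a subprogram as a parameter: f takes and returns a double.
double applyTwice(const std::function<double(double)>& f, double x) {
    return f(f(x));
}
```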

Support for Object-Oriented Programming


A language that is object-oriented must provide support for 3 key language features:

1. Abstract Data Types
2. Inheritance
3. Dynamic Binding

Abstract Data Types and Encapsulation Constructs


An abstraction is a view or representation of an entity that includes only the most significant attributes.
Abstraction allows one to collect instances of entities into groups in which their common attributes need
not be considered. Abstraction allows programmers to focus on essential attributes, while ignoring
subordinate attributes. There are 2 fundamental kinds of abstraction:

1. Process abstraction
2. Data abstraction

Introduction to Data Abstraction


An abstract data type is an enclosure that includes only the data representation of one specific data type
& the subprograms that provide the operations for that type. Through access controls, unnecessary
details of the type can be hidden from units outside the enclosure that use the type. Program units that
use an abstract data type can declare variables of that type, even though the actual representation is
hidden from them. An instance of an abstract data type is called an object. Data abstraction is one of the
most important components in object-oriented programming.

A user-defined abstract data type must provide:

1. A type definition that allows program units to declare variables of the type but hides the
representation of objects of the type.
2. A set of operations for manipulating objects of the type.
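A small abstract data type in C++ that has both characteristics: clients can declare IntStack variables and use the public operations, but the representation (here a std::vector) is private and could be replaced without changing client code. C++ uses private and public clauses in the class definition, unlike the per-member modifiers Java uses. The class IntStack is a hypothetical example.

```cpp
#include <stdexcept>
#include <vector>

// Abstract data type: public operations, hidden representation.
class IntStack {
public:
    void push(int value) { items.push_back(value); }
    int pop() {
        if (items.empty()) throw std::underflow_error("pop from empty stack");
        int top = items.back();
        items.pop_back();
        return top;
    }
    bool empty() const { return items.empty(); }
private:
    std::vector<int> items;   // hidden from clients
};
```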

Abstract Data Types in Java


All user-defined data types in Java are classes. All objects are allocated from the heap & accessed
through reference variables. Methods in Java can be defined only in classes.

A Java abstract data type is both declared and defined in a single syntactic unit. Definitions are hidden
from clients by making them private. Rather than having private and public clauses in its class definitions,
Java attaches access modifiers to individual method and variable definitions.
Inheritance
Inheritance offers a solution to both the modification problem posed by abstract data type reuse and the
program organization problem. It provides a framework for the definition of hierarchies of related classes
that can reflect the descendant relationships in the problem space.
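A minimal C++ sketch of reuse through inheritance: SavingsAccount inherits deposit and balance from Account unchanged and adds one new operation, so the base class is reused without being modified. Both classes and the interest rule are hypothetical.

```cpp
// Base class: a reusable abstract data type.
class Account {
public:
    explicit Account(double balance) : balance_(balance) {}
    void deposit(double amount) { balance_ += amount; }
    double balance() const { return balance_; }
protected:
    double balance_;   // accessible to derived classes, hidden from clients
};

// Derived class: inherits deposit() and balance(), adds addInterest().
class SavingsAccount : public Account {
public:
    SavingsAccount(double balance, double rate) : Account(balance), rate_(rate) {}
    void addInterest() { balance_ += balance_ * rate_; }
private:
    double rate_;
};
```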

Dynamic Binding
One purpose of dynamic binding is to allow software systems to be more easily extended during both
development and maintenance. In Java, all method calls are dynamically bound unless the called method has
been defined as final, in which case it cannot be overridden and all bindings are static. Static binding is
also used if the method is static or private, both of which disallow overriding.
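C++ takes the opposite default to Java: a member function is dynamically bound only if it is declared virtual. The sketch below, with hypothetical classes, calls both kinds of member function through a base-class reference: name() is dispatched on the object's actual class at run time, while the non-virtual kind() is bound at compile time to the declared type Shape.

```cpp
#include <string>

class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }  // dynamically bound
    std::string kind() const { return "generic"; }         // statically bound
};

class Circle : public Shape {
public:
    std::string name() const override { return "circle"; } // overrides name()
    std::string kind() const { return "round"; }             // hides kind(), does not override
};

// Calls through a base-class reference.
std::string describe(const Shape& s) {
    return s.name() + "/" + s.kind();
}
```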
