Topic 2: Language Design Principles: 2.1 Describing Syntax and Semantics
Brackets [ ] denote an optional part of the right-hand side of a grammar rule.
Braces { } on the right-hand side indicate that the enclosed part can be repeated indefinitely or
left out altogether.
When a single element must be chosen from a group, the options are placed in parentheses ( ) and
separated by the OR operator, |.
Attribute Grammars
A grammar is a powerful tool for describing and analyzing languages. It is a set of rules by which valid
sentences in a language are constructed, and it is used to describe the syntax of programming
languages. An attribute grammar is a device used to describe more of the structure of a programming
language than can be described with a context-free grammar. It is an extension of a context-free grammar.
Compilation Process
Before a program can be run, it must be translated into a form in which it can be executed by a
computer. The software systems that do this are called compilers. An important role of the compiler is to
report any errors in the source program that it detects during the translation process.
A compiler or an interpreter:
translates high-level language statements into machine code
issues error messages
A compiler translates an entire program before executing the statements. A language that a compiler
translates is called a source language.
An interpreter translates one statement at a time and executes each statement as soon as it is translated.
The lexical analyzer is the first phase of a compiler.
Its main task is to read input characters and produce as output a sequence of tokens that the parser uses
for syntax analysis.
Figure 2.1: The Compilation Process
The language that a compiler translates is called the source language. The process of compilation and
program execution takes place in several phases, the most important of which are shown in Figure 2.1.
The lexical analyzer gathers the characters of the source program into lexical units. The lexical units of a
program are identifiers, special words, operators, and punctuation symbols. The lexical analyzer ignores
comments in the source program, because the compiler has no use for them. The syntax analyzer takes
the lexical units from the lexical analyzer and uses them to construct hierarchical structures called parse
trees. These parse trees represent the syntactic structure of the program.
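As a rough illustration of what the lexical analyzer does, the following C++ sketch groups the characters of a source string into lexical units. The Token struct, the tokenize function and the three token categories are assumptions made for this example and do not come from any particular compiler; special words and comments are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories, for this sketch only.
enum class TokenKind { Identifier, Number, Symbol };

struct Token {
    TokenKind kind;
    std::string text;
};

// Groups the characters of `source` into lexical units, skipping whitespace.
std::vector<Token> tokenize(const std::string& source) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        unsigned char c = static_cast<unsigned char>(source[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace separates tokens but is not a token itself
        } else if (std::isalpha(c) || c == '_') {
            std::size_t start = i;
            while (i < source.size() &&
                   (std::isalnum(static_cast<unsigned char>(source[i])) || source[i] == '_')) {
                ++i;
            }
            tokens.push_back({TokenKind::Identifier, source.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            std::size_t start = i;
            while (i < source.size() && std::isdigit(static_cast<unsigned char>(source[i]))) {
                ++i;
            }
            tokens.push_back({TokenKind::Number, source.substr(start, i - start)});
        } else {
            // Any other character is treated as a one-character operator or punctuation symbol.
            tokens.push_back({TokenKind::Symbol, std::string(1, source[i])});
            ++i;
        }
    }
    return tokens;
}
```

For the input "sum = a1 + 42;" this sketch yields six tokens: sum, =, a1, +, 42 and ;.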
The intermediate code generator produces a program in a different language, at an intermediate level
between the source program and the final output of the compiler, the machine language program.
Intermediate languages sometimes look very much like assembly languages and in fact sometimes are
actually assembly languages. The semantic analyzer is an integral part of the intermediate code
generator. The semantic analyzer checks for errors that are difficult if not impossible to detect during
syntax analysis, such as type errors. Optimization, which improves programs (usually in their
intermediate code version) by making them smaller or faster or both, is often an optional part of
compilation. In fact, some compilers are incapable of doing any significant optimization.
The code generator translates the optimized intermediate code version of the program into an
equivalent machine language program. The symbol table serves as a database for the compilation
process. The primary contents of the symbol table are the type and attribute information of each user-
defined name in the program. This information is placed in the symbol table by the lexical analyzer and
syntax analyzer and is used by the semantic analyzer and the code generator.
Although the machine language generated by a compiler can be executed directly on the hardware, it
must nearly always be run along with some other code. Most user programs also require programs from
the operating system. Among the most common of these are programs for input and output. Before the
machine language programs produced by a compiler can be executed, the required programs from the
operating system must be found and linked to the user program. The process of collecting system
programs and linking them to user programs is called linking and loading, or sometimes just linking. It is
accomplished by a systems program called a linker.
Variables: a variable is a name that is associated with a memory location used to store a value.
Attributes of a variable:
Name: the identifier by which the variable is declared and referenced
Memory address: the storage location for a particular variable
Value: the value that is associated with the variable
Type: the type of a variable determines the range of values the variable can have and the set of
operations that are defined for values of the type.
Lifetime: the period of time during which the variable is bound to an address (starting when it is
created and ending when it is destroyed)
Scope: the program segment within which the variable can be accessed (for example, local or global)
Binding:
The process of associating attributes with variables during compilation and execution of a program.
There are 2 types of binding:
Static binding: also called early binding; binds the attributes at compile time, when the program is
translated into machine language
Dynamic binding: binds attributes at run time
Data Types
Constants
A constant stores only one value, and that value cannot be changed during program execution. The
advantage is that when the value has to be changed in the source code, you don't have to edit the entire
program; you just have to edit the declaration statement.
Examples :
C++ : const int MAX = 100;
Java : final int MAX = 100;
Pascal : const MAX = 100;
}
cout<<"last trial : "<<trial ;
In the code above, the loop may fail to terminate because of representational error. Representational
error is an error caused by the imprecise representation of a decimal number in binary form. Repeatedly
adding 1.1 does not produce exactly 11, so a loop that waits for the total to equal 11 never terminates.
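The effect can be reproduced with a minimal sketch. This is not the original listing: the function names and the tolerance of 1e-9 are assumptions chosen for illustration, and the behaviour described assumes the IEEE 754 double arithmetic used by typical hardware.

```cpp
#include <cmath>

// Adds 1.1 ten times, as a loop that counts up in steps of 1.1 would.
double sumOfTenSteps() {
    double total = 0.0;
    for (int i = 0; i < 10; ++i) {
        total += 1.1;  // 1.1 has no exact binary representation
    }
    return total;
}

// Compares two doubles within a tolerance instead of testing exact equality.
bool nearlyEqual(double a, double b, double tolerance = 1e-9) {
    return std::fabs(a - b) < tolerance;
}
```

Under these assumptions sumOfTenSteps() == 11.0 is false, while nearlyEqual(sumOfTenSteps(), 11.0) is true, which is why loop conditions on floating-point values are usually written with <, <= or a tolerance rather than with == or !=.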
double x = 100000.00;
double y = 0.000001234;
double z ;
z = x + y;
cout<<" Z : "<<z;
In the example above, the output is 100000, so the contribution of y is lost (cout prints only six
significant digits by default). More generally, when a very large number is processed together with a
much smaller number in limited precision, the larger number cancels out the smaller number.
This error is called cancellation error.
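The same loss can happen in the stored value itself, not only in the printed output. The sketch below assumes IEEE 754 doubles, which carry roughly 15 to 17 significant decimal digits; the function name and the sample values are hypothetical.

```cpp
// Returns true when adding `small` to `big` leaves `big` unchanged,
// that is, when the smaller operand is lost completely.
bool isAbsorbed(double big, double small) {
    double sum = big + small;
    return sum == big;
}
```

With these assumptions isAbsorbed(1e17, 1.0) is true, because 1 is smaller than the gap between neighbouring doubles near 1e17, whereas isAbsorbed(100.0, 1.0) is false.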
Multiplying 2 very small numbers may produce a result so small that it is represented as zero; this is
called arithmetic underflow. Multiplying 2 very large numbers may produce a result too large to be
represented; this is called arithmetic overflow.
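Both cases can be observed with a short sketch, assuming IEEE 754 doubles, where an overflowing product becomes infinity rather than raising an error. The helper names are hypothetical.

```cpp
#include <cmath>

// Multiplies two doubles; the result may underflow to 0 or overflow to infinity.
double product(double a, double b) {
    return a * b;
}

// True when two non-zero operands produce a product that is stored as zero.
bool underflowsToZero(double a, double b) {
    return a != 0.0 && b != 0.0 && product(a, b) == 0.0;
}

// True when two finite operands produce a product that is stored as infinity.
bool overflowsToInfinity(double a, double b) {
    return std::isfinite(a) && std::isfinite(b) && std::isinf(product(a, b));
}
```

For example, underflowsToZero(1e-200, 1e-200) and overflowsToInfinity(1e200, 1e200) are both true under these assumptions, since 1e-400 and 1e400 lie outside the range of a double.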
Boolean : the simplest data type, has only 2 values, true or false. Booleans are often used to represent
switches or flags in a program. In some programming languages Boolean variables are represented as the
binary numbers 1 ( true ) and 0 ( false ).
Character: character data are stored as numeric codes. The most commonly used coding was ASCII.
Structured data types: arrays, strings, enumeration, pointers
User defined ordinal types
An ordinal type is one in which the range of possible values can be easily associated with the set of
positive integers. 2 types of ordinal types : enumeration and subrange
Strings: a string consists of a sequence of characters.
Enumeration types: an enumeration type is one whose possible values are named constants, each of which
is associated with an integer
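A small C++ sketch of an enumeration type follows. The Day type and the two helper functions are hypothetical names introduced for this example.

```cpp
// Hypothetical enumeration; each enumerator is associated with an integer, starting at 0.
enum class Day { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday };

// Returns the integer associated with an enumeration value (Monday is 0).
int ordinal(Day d) {
    return static_cast<int>(d);
}

// Returns the day that follows `d`, wrapping from Sunday back to Monday.
Day nextDay(Day d) {
    return static_cast<Day>((ordinal(d) + 1) % 7);
}
```

Here ordinal(Day::Sunday) is 6 and nextDay(Day::Sunday) is Day::Monday.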
Subrange : a subrange type is a contiguous subsequence of an ordinal type. Subrange types were
introduced by Pascal and are included in Ada
Pascal :
type
Letter = 'A' .. 'Z';
DaysInMonth = 1..31;
Array : an array is a group of contiguous memory locations that all have the same name and the same
type. Each element is identified by its position in the array.
Record type: first introduced by COBOL in the early 1960s, a record is a collection of related items.
Unlike an array, the individual components of a record can contain data of different types.
Record data types are used for file and database applications.
In C and C++, records are supported with the struct data type. In Pascal, the record data type is
used.
COBOL :
01 EMPLOYEE-RECORD.
   02 EMPLOYEE-NAME.
      05 FIRST PICTURE IS X(20).
      05 MIDDLE PICTURE IS X(10).
      05 LAST PICTURE IS X(20).
   02 HOURLY-RATE PICTURE IS 99V99.
C++ :
struct Student
{ int id;
float cgpa; };
Student stu1;
Student array1 [20];
Examples (C++) :
x = 10;
y = (x + b) / (k * p);
A relational operator is an operator that compares the values of its two operands.
A relational expression has two operands and one relational operator.
Examples : a>b a<b a >= b a <= b
Type checking
Type checking is the process of identifying errors in a program based on explicitly or implicitly stated
type information. It is an effective technique for catching inconsistencies in programs. For example, it
can catch errors such as parameters passed to a procedure that do not match the signature of the procedure.
Type conversion (casting)
Implicit type casting:
double
float
long
int
short
byte
double t = 7;
(the int value 7 is implicitly converted to double: you can go up the hierarchy, but not down)
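The hierarchy above uses Java's primitive type names. A rough C++ analogue is sketched below with hypothetical helper names. Note one difference: C++ compilers accept some implicit narrowing conversions (usually with a warning), so the explicit cast in narrow is a convention in this sketch rather than a language requirement.

```cpp
// Implicit widening: an int value converts to double without a cast.
double widen(int value) {
    double result = value;  // going "up" the hierarchy
    return result;
}

// Explicit narrowing: the cast documents that the fractional part is discarded.
int narrow(double value) {
    return static_cast<int>(value);
}
```

Here widen(7) yields 7.0, and narrow(7.9) yields 7 because conversion to int truncates toward zero.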
A short-circuit evaluation of an expression is one in which the result is determined without evaluating all
of the operands and/or operators. This is to speed up the operation.
For example, the value of the arithmetic expression (13 * a) * (b / 13 - 1) is independent of the value of
(b / 13 - 1) if a is 0, because 0 * x = 0 for any x. So, when a is 0, there is no need to evaluate (b / 13 - 1).
Example 1 : 0 * ( y + x / 2 ) -> multiplication by 0, values of x and y are irrelevant. Evaluation need
not take place.
Example 2 : ( X and Y ) -> if X is false then the value of Y is irrelevant
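In C++ the Boolean operators && and || are short-circuited, as in Example 2, while arithmetic operators such as * are not. The sketch below uses a hypothetical counter to show that the right-hand operand of && is skipped when the left-hand operand is false, which is what makes the out-of-range index safe.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times the right-hand operand is actually evaluated.
int rightOperandEvaluations = 0;

bool isPositive(int value) {
    ++rightOperandEvaluations;
    return value > 0;
}

// Safe lookup: `&&` evaluates values[index] only when the index is in range.
bool positiveAt(const std::vector<int>& values, std::size_t index) {
    return index < values.size() && isPositive(values[index]);
}
```

Calling positiveAt with an index past the end of the vector returns false without touching the vector and without incrementing the counter.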
Statement-Level Control Structures
A control structure is a control statement and the collection of statements whose execution it controls.
There are three types of control structures in programs: sequence, selection and repetition.
Examples of selection statements:
C++ : if statement, switch statement
Pascal : if statement, case statement
Java : if statement, switch statement
Counter-controlled loops
The loop is controlled by a control variable, and the number of repetitions is known in advance.
Sentinel-controlled loops
The loop continues to execute as long as the program has not read the sentinel value.
Flag-controlled loops
A flag-controlled loop uses a Boolean variable to control the loop.
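The three kinds of loop can be sketched in C++ as follows. The function names and the sentinel value -1 are assumptions for this example, and the input is taken from a vector rather than read from the keyboard so that the code is self-contained.

```cpp
#include <cstddef>
#include <vector>

// Counter-controlled: the number of repetitions (n) is known before the loop starts.
int sumFirstN(int n) {
    int sum = 0;
    for (int i = 1; i <= n; ++i) {
        sum += i;
    }
    return sum;
}

// Sentinel-controlled: keeps reading values until the sentinel is read.
int sumUntilSentinel(const std::vector<int>& input) {
    const int SENTINEL = -1;  // hypothetical end-of-data marker
    int sum = 0;
    std::size_t i = 0;
    while (i < input.size() && input[i] != SENTINEL) {
        sum += input[i];
        ++i;
    }
    return sum;
}

// Flag-controlled: a Boolean variable decides whether the loop continues.
int indexOfFirstNegative(const std::vector<int>& values) {
    bool found = false;
    std::size_t i = 0;
    while (!found && i < values.size()) {
        if (values[i] < 0) {
            found = true;
        } else {
            ++i;
        }
    }
    return found ? static_cast<int>(i) : -1;
}
```

For instance, sumFirstN(4) is 10, and sumUntilSentinel applied to the values 3, 4, -1, 100 is 7 because the loop stops at the sentinel.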
Subprograms
Subprograms, the fundamental building blocks of programs, are variously called subroutines, functions,
procedures and methods. A subprogram is a sequence of code that performs a specific task. A
function should compute and return a value and should not have any observable side effects, while
procedures may produce side effects by acting on either non-local variables or on parameters that
allow the transfer of data to the caller. The term subroutine is used in assembly language, and the term
method comes from the object-oriented paradigm.
The subprogram which calls other subprogram is called the Caller, while the subprogram that is called by
a subprogram is known as the Callee.
Definition: A subprogram is either a procedure or a function. A procedure specifies a sequence of
actions and is invoked by a procedure call statement; a function returns a value and is invoked as part
of an expression.
Advantages of using subprograms :
Subprograms help reduce the amount of redundancy in a program
Subprograms facilitate top-down design
( Top-down design is the tackling of a program from the most general to the most specific)
Subprogram definition
Subprogram declaration or prototype: defines the name and type of the subprogram and its
parameter list; it does not include the subprogram body.
Header: declares the name, type and parameters (if any) of the subprogram.
Call: an explicit request to execute a particular subprogram.
Parameter profile (signature): describes the number of parameters, the parameter order and
the parameter types.
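The terms above can be located in a short C++ sketch. The average and classAverage functions and their values are hypothetical.

```cpp
// Declaration (prototype): name, return type and parameter list, but no body.
double average(int total, int count);

// A call: an explicit request to execute the subprogram.
double classAverage() {
    return average(170, 4);
}

// Definition: the header followed by the body.
// Parameter profile (signature): two parameters of type int, in the order (total, count).
double average(int total, int count) {
    if (count == 0) {
        return 0.0;  // avoids division by zero in this sketch
    }
    return static_cast<double>(total) / count;
}
```

Because the prototype appears first, classAverage can call average before the compiler has seen its body.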
Variables
A local variable is a variable that is given local scope. This variable is accessible only from the
function or block in which it is declared.
A global variable is a variable that is accessible in every scope. A global variable can be modified
from anywhere. The use of global variables makes software harder to read and understand.
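A minimal C++ sketch of the difference, with hypothetical names:

```cpp
int callCount = 0;  // global variable: accessible and modifiable in every function

int square(int n) {
    int result = n * n;  // local variable: accessible only inside square()
    ++callCount;         // side effect on the global, invisible in the signature
    return result;
}

void resetCallCount() {
    callCount = 0;  // any other function can also change the global
}
```

Nothing in the header of square reveals that it modifies callCount, which illustrates why heavy use of global variables makes programs harder to read.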
Parameters
Parameter profile (signature): describes the number of parameters, the type of the parameters and
the order of the parameters.
Actual parameters: those in the subprogram call
Formal parameters: those in the subprogram header
The correspondence (binding) between actual and formal parameters is done by position,
meaning that the first actual parameter is bound to the first formal parameter and so forth.
There are 2 types of formal parameters: value parameters and reference parameters.
A value parameter receives the value of the corresponding actual parameter.
A reference parameter receives the memory address of the corresponding actual parameter.
Subprograms can define their own local variables.
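The difference between value and reference parameters can be sketched in C++, where a reference parameter is marked with &. The function names are hypothetical.

```cpp
// Value parameter: `n` receives a copy of the actual parameter.
void incrementByValue(int n) {
    ++n;  // changes only the local copy
}

// Reference parameter: `n` refers to the caller's variable itself.
void incrementByReference(int& n) {
    ++n;  // changes the actual parameter
}
```

If a holds 5, then after incrementByValue(a) it still holds 5, and after incrementByReference(a) it holds 6.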
i. Pass-by-Value ( IN-MODE )
When a parameter is passed by value, the value of the actual parameter is used to
initialize the corresponding formal parameters. The value of the actual parameter must
be copied to the corresponding formal parameter. Any changes made to the formal
parameter will not change the value of the actual parameter. Pass-by-value is used by
default in C, C++, Pascal, Java
ii. Pass-by-Result ( OUT-MODE )
Pass-by-result is an implementation model for out-mode parameters. When a parameter
is passed by result, no value is transmitted to the subprogram. The corresponding formal
parameter acts as a local variable, but just before control is transferred back to the caller,
its value is transmitted back to the caller.
iii. Pass-by-Value-Result ( INOUT-MODE )
Pass-by-value-result is an implementation model for inout-mode parameters in which
actual values are copied. It is in effect a combination of pass-by-value and pass-by-result.
The value of the actual parameter is used to initialize the corresponding formal
parameter. At subprogram termination, the value of the formal parameter is transmitted
back to the caller.
iv. Pass-by-Reference ( INOUT-MODE )
Pass-by-reference is a second implementation model for inout-mode parameters. Rather
than copying data values back and forth, however, as in pass-by-value-result, the pass-
by-reference method transmits an access path, usually an address, to the called
subprogram. Thus, the called subprogram is allowed to access the actual parameter.
In effect, the actual and formal parameters refer to the same memory location.
During program execution, changes made to the formal parameter permanently change
the value of the actual parameter.
v. Pass-by-name
In pass-by-name, the name of the actual parameter is passed rather than its value. The name is
textually substituted for the formal parameter throughout the body of the subprogram.
Pass-by-name was introduced in Algol 60 and is also found in Simula 67, but most imperative
languages do not support this feature.
vi. Passing subprograms as parameters
Functions and procedures may be passed as arguments to other subprograms. In the
parameter declarations for a subprogram that receives a function or procedure
argument, you must also declare the parameters for the function or procedure being
passed. For example,
procedure proc(function f(x:real):real; i,j:integer);
starts off a declaration for a procedure named proc. The parameters for this procedure
are a function and two integers. The function takes one real parameter and returns a
real value.
When a subprogram is passed as an argument to another subprogram, the parameter
lists must match exactly in both caller and callee. It is not enough for the parameters to
be compatible.
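A C++ counterpart of the Pascal header above can be sketched with a function pointer parameter. The body is an assumption, since the Pascal fragment does not say what proc does, and the names sumOver, squared and half are hypothetical.

```cpp
// `f` must take one double and return a double, mirroring function f(x:real):real.
// Applies f to every integer from i to j and returns the sum of the results.
double sumOver(double (*f)(double), int i, int j) {
    double total = 0.0;
    for (int k = i; k <= j; ++k) {
        total += f(static_cast<double>(k));
    }
    return total;
}

double squared(double x) {
    return x * x;
}

double half(double x) {
    return x / 2.0;
}
```

A call such as sumOver(squared, 1, 3) computes 1 + 4 + 9, and sumOver(half, 2, 4) computes 1 + 1.5 + 2. The compiler rejects an argument whose type does not match the declared function pointer type, such as a function that takes two parameters.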
1. Process abstraction
2. Data abstraction
1. A type definition that allows program units to declare variables of the type but hides the
representation of objects of the type.
2. A set of operations for manipulating objects of the type.
In Java, an abstract data type is both declared and defined in a single syntactic unit, the class.
Definitions are hidden from clients by making them private. Rather than having private and public
clauses in its class definitions, Java attaches access modifiers to individual method and variable definitions.
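For comparison, C++ does group members under private and public clauses. The sketch below shows a hypothetical IntStack type that satisfies the two properties listed above: clients can declare variables of the type, its representation is hidden, and a set of operations is provided for manipulating its objects.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical abstract data type: clients can declare IntStack variables
// but cannot see or modify the representation directly.
class IntStack {
public:
    // The operations for manipulating objects of the type.
    void push(int value) { items.push_back(value); }

    int pop() {
        if (items.empty()) {
            throw std::out_of_range("pop on empty stack");
        }
        int top = items.back();
        items.pop_back();
        return top;
    }

    bool isEmpty() const { return items.empty(); }

private:
    std::vector<int> items;  // hidden representation
};
```

Because items is private, the vector could later be replaced by another representation without changing any client code.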
Inheritance
Inheritance offers a solution to both the modification problem posed by data type reuse and the
program organization problem. It provides a framework for the definition of hierarchies of related classes
that can reflect the descendant relationships in the problem space.
Dynamic Binding
One purpose of dynamic binding is to allow software systems to be more easily extended during both
development and maintenance. In Java, all method calls are dynamically bound unless the called method
has been defined as final, in which case it cannot be overridden and all bindings are static. Static binding
is also used if the method is static or private, both of which disallow overriding.
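Inheritance and the two kinds of binding can be sketched in C++. Unlike Java, C++ binds method calls statically by default and uses dynamic binding only for methods declared virtual. The Shape and Circle classes are hypothetical.

```cpp
#include <string>

class Shape {
public:
    virtual ~Shape() = default;
    // `virtual` requests dynamic binding for this method.
    virtual std::string name() const { return "shape"; }
    // Non-virtual: bound statically, using the declared type of the expression.
    std::string category() const { return "shape"; }
};

class Circle : public Shape {  // Circle inherits from Shape
public:
    std::string name() const override { return "circle"; }
    std::string category() const { return "circle"; }  // hides, does not override
};

// The call to name() is resolved at run time from the object's actual class;
// the call to category() is resolved at compile time from the type Shape.
std::string describe(const Shape& s) {
    return s.name() + "/" + s.category();
}
```

Passing a Circle to describe returns "circle/shape": the virtual method is dynamically bound to Circle's version, while the non-virtual one is statically bound to Shape's.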