Cit 430
Diversity of Languages
Programmers tend to use the languages they are familiar with for a new project,
even when those languages are not suitable.
If they were familiar with other languages and their features, they
would be in a better position to make informed language choices.
3. Increased Ability to Learn new Languages
Computer programming is a young discipline with design
methodologies, software development tools and programming
languages still in continuous evolution.
Learning a new programming language can be lengthy and
difficult.
Once a thorough understanding of the fundamental concepts of
languages is acquired, it becomes easier to see how concepts are
incorporated into the design of the language being learned.
4. Better Understanding of Implementation Significance
An understanding of how the concepts of programming languages are
implemented leads to the ability to use a language more intelligently,
as it was designed to be used.
For example, a programmer who knows nothing about how
recursion is implemented often does not know that a recursive
algorithm can be far slower than an equivalent iterative algorithm.
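As a rough illustration of this point, the sketch below counts the calls made by a naive recursive Fibonacci function and compares it with an equivalent loop. The function names are illustrative and not part of these notes, and the actual run-time cost depends on the language implementation.

```python
def fib_recursive(n, counter):
    """Naive recursion: each call above the base case makes two further calls."""
    counter[0] += 1
    if n < 2:
        return n
    return fib_recursive(n - 1, counter) + fib_recursive(n - 2, counter)


def fib_iterative(n):
    """Equivalent loop: n steps and no additional stack frames."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


calls = [0]
result = fib_recursive(20, calls)
print(result, fib_iterative(20), calls[0])
```

Both functions return the same value, but the recursive version needs thousands of calls where the loop needs twenty iterations.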
5. Increased Ability to Design new Languages
6. Overall Advancement of Computing
Many believed that ALGOL 60 would displace FORTRAN in the
sixties because of its elegance and much better control statements,
among other things.
But programmers and software development managers did not
clearly understand its conceptual design.
If those who choose languages are better informed, perhaps
better languages would more quickly squeeze out poorer ones.
Language Categories
Programming Domains
1. Scientific Applications
Have simple data structures but large numbers of floating-point
arithmetic computations.
Common data structures are:
i. Arrays and matrices
Common control structures are:
i. Loops and selections
Examples: FORTRAN and ALGOL
2. Business Applications
Language for business application software
Example: COBOL
3. Artificial Intelligence
Characterized by the use of symbolic rather than numeric
computations
Examples: Lisp, Prolog
4. Systems Programming
Systems software consists of the operating system and programming
support tools.
i. The languages must have fast execution and features that
allow software interfaces to external devices
Examples: PL/1, PL/S, BLISS, Extended ALGOL, BCPL, Coral 66,
Jovial, Java and XPL.
The most widely used are: C and C++
5. Special-Purpose languages
They are domain-specific languages
Examples:
i. String Manipulation: COMIT, SNOBOL, SNOBOL 4
ii. List Processing Languages: IPL-V, Lisp
iii. Simulation Languages
Used for simulating systems, e.g. simulating a traffic
system to predict how a proposed model will perform
as traffic densities increase and to show where
bottlenecks could occur.
Examples: GPSS (General-Purpose Simulation System)
and Simula 67
iv. Scripting Languages
Intended for communicating with, or plugging together,
useful components written in other languages
such as Java, C++ and HTML.
They are typically interpreted rather than compiled.
Examples: awk, Perl, Python, Tcl, JavaScript, PHP and
ASP.
A. Readability: The ease with which programs can be read and understood.
B. Writability: A measure of how easily a language can be used to create
programs for a chosen problem.
C. Reliability: A program is reliable if it performs to its specifications under
all conditions.
D. Cost: Financial and computing resources.
i. Overall Simplicity
Size of Basic Types
o A language that has a large number of basic components is
more difficult to learn than one with a small number of basic
components.
Feature Multiplicity: Having more than one way to accomplish a
particular task can confuse some programmers, e.g.
count = count + 1, count += 1, count++ and ++count in C++
and C all have the same effect when used as standalone
statements, although the last two yield different values when
used inside a larger expression.
Overloading
o Overloading is a useful feature, but can reduce program
readability.
ii. Orthogonality
This means that a relatively small set of primitive constructs can be
combined in a relatively small number of ways to build control and data
structures.
It also means that every possible combination of primitives is legal
and meaningful.
The idea is that an orthogonal language will be simple.
Extreme use of orthogonality can lead to unnecessary complexity
e.g when conditional statements are used at the left side of
assignment statements as in ALGOL 68.
iii. Control Statements
The availability of control structures aids the readability of a
program.
Indiscriminate use of ‘goto’ reduces program readability.
iv. Data Types and Structures
Availability of facilities for defining data types and data structures
aids readability.
A language that uses a numeric type in place of a Boolean type is less
readable.
v. Syntax Considerations
Identifier Forms: Restricting identifiers to very short lengths
reduces readability
Special Words: Allowing special words to be used as variable names
reduces readability.
vi. Forms and Meaning
The appearance of a statement should at least partially indicate its
purpose to aid readability.
i. Type Checking
Checking for type errors at compile time is preferable to checking at run
time, because errors are found earlier, when the required repairs are less
expensive to make.
ii. Exception Handling
A language that offers facility to intercept run-time errors and take
corrective measures and then continue greatly aids reliability.
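A minimal sketch of this idea in Python: a run-time error is intercepted, a corrective value is substituted, and the program carries on with the remaining work. The function name and the choice of None as the corrective value are assumptions made for the example.

```python
def safe_divide(numerator, denominator):
    """Return the quotient, or None when the division cannot be carried out."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Corrective measure: signal the problem through the return value
        # so that the caller can continue.
        return None


# The second division fails, but the loop still processes the third one.
results = [safe_divide(10, d) for d in (2, 0, 5)]
print(results)
```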
iii. Aliasing
Aliasing means having two or more distinct referring methods or
names for the same memory cell.
Aliasing is a dangerous feature in a programming language.
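The danger can be seen in a small Python sketch, in which two names refer to the same list object (the closest Python analogue of a shared memory cell). The variable names are illustrative only.

```python
scores = [70, 85, 90]
alias = scores            # a second name for the same list object
alias[0] = 0              # the change is visible through both names

independent = list(scores)  # a copy, not an alias
independent[1] = 0          # does not affect scores

print(scores, alias is scores, independent is scores)
```

A reader who sees only the assignment through alias has no local clue that scores has changed as well, which is why aliasing makes programs harder to verify.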
iv. Readability and Writability
The easier a program is to write, the more likely it is to be correct,
which in turn affects reliability.
Readability affects reliability in both the writing and maintenance
phases of the life cycle.
Influences on Language Design
1. Computer architecture
The imperative languages were designed around the von Neumann
architecture, which was prevalent at that time.
Data and instructions are stored in the same memory and have to
be piped to the CPU for execution.
Results of operations have to be piped back to the memory cell
named by the left-hand side of the assignment operation.
Variables of imperative languages model the memory cells.
Assignment statements are based on the piping operations.
2. Programming design methodologies
The process-oriented and data-oriented program designs all
brought about languages to support them.
Language description must take care of the diversity of people who must
understand it.
Most new programming languages are subjected to a period of scrutiny by
potential users before their designs are completed.
Language study is divided into
Syntax and
Semantics
Syntax: The form of a language’s expressions, statements and program
units.
Semantics: The meaning of the expressions, statements and program
units
Example: while (<expr>)
{
<statements>
}
The above is the syntax of a while loop in C++.
The meaning is that the statements are executed repeatedly as long as the
expression is true.
Formal descriptions of the syntax of programming languages, for
simplicity's sake, however, often do not include the descriptions of the
lowest-level syntactic units called lexemes.
This is normally given by a lexical specification different from syntactic
description.
The lexemes of a programming language include its
Identifiers
Literals
Operators
Special words
The semantics should follow directly from syntax in a well-designed
language. That is, the form of a statement should suggest strongly the
meaning of the statement.
Token: A token of a language is a category of its lexemes.
Example: Consider the C++ statement: index = 2 * count + 17;
Lexemes    Tokens
index      Identifier
=          Equal_sign
2          Int_literal
*          Mult_op
count      Identifier
+          Plus_op
17         Int_literal
;          Semicolon
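The lexeme-to-token mapping in the table can be reproduced with a small pattern-based scanner. This is a sketch only: it handles just the token categories listed above, the token names are written with underscores so that they are valid group names, and TOKEN_SPEC, MASTER and tokenize are names invented for the example.

```python
import re

# Token categories and the patterns their lexemes must match.
TOKEN_SPEC = [
    ("Int_literal", r"\d+"),
    ("Identifier", r"[A-Za-z_]\w*"),
    ("Equal_sign", r"="),
    ("Mult_op", r"\*"),
    ("Plus_op", r"\+"),
    ("Semicolon", r";"),
    ("Skip", r"\s+"),          # white space separates lexemes and is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (lexeme, token) pairs; raise ValueError on an unknown character."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r}")
        if match.lastgroup != "Skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(lexeme, token)
```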
Generally, a language can be defined formally in two ways:
Recognition
Generation
Recognition
A recognition device that is capable of reading strings of characters from
the alphabet of the language is constructed.
The device will indicate whether an input string belongs to the language
or not.
Because most useful languages are, for all practical purposes, infinite, this
seems to be a lengthy and ineffective process.
The syntax analyzer (parser) of a compiler is a recognizer.
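A recognizer can be sketched in a few lines. The example below assumes a very small language, identifiers made of a lowercase letter followed by lowercase letters or digits, and simply answers yes or no for each input string. The function name and the restriction to lowercase letters are assumptions made for the illustration.

```python
import string

LETTERS = set(string.ascii_lowercase)
DIGITS = set(string.digits)


def is_identifier(text):
    """Accept a lowercase letter followed by any mix of lowercase letters and digits."""
    if not text or text[0] not in LETTERS:
        return False
    return all(ch in LETTERS or ch in DIGITS for ch in text[1:])


for candidate in ("ch1", "1ch", "x", ""):
    print(repr(candidate), is_identifier(candidate))
```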
Generation
A generator is a device that can be used to generate the sentences of a
language.
It can be thought of as a button that, when pressed, produces a sentence
of the language.
Because the particular sentence produced by it when pressed is
unpredictable, it seems to be a device with limited usefulness as a
language descriptor.
Some generators can, however, read and understand sentences of a
language.
BNF is a metalanguage.
A metalanguage is a language used to describe other languages.
BNF is a generative device for defining languages.
The sentences of the language are generated through a sequence of
application of rules, beginning with a special nonterminal of the grammar
called the start symbol.
Example
<digit>::=0|1|2|3|4|5|6|7|8|9
<letter>::=a|b|c|d|…|x|y|z
<identifier>::=<letter>|<identifier><digit>|<identifier><letter>
In a leftmost derivation, at each stage the leftmost nonterminal is replaced
by the right part of one of its defining production rules.
The sequence of terminals and nonterminals produced at each stage of a
derivation is called a sentential form.
The final sentential form, with no nonterminals, is called a sentence.
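The sentential forms of a leftmost derivation can be listed mechanically. The sketch below does this for the identifier grammar given above, assuming the input is already a valid identifier (a letter followed by letters or digits). The function name leftmost_derivation is invented for the example.

```python
def leftmost_derivation(identifier):
    """Return the sentential forms of a leftmost derivation of `identifier`.

    Rules used:
    <identifier> ::= <letter> | <identifier><digit> | <identifier><letter>
    """
    # Phase 1: keep expanding the leftmost <identifier> until there is one
    # symbol for every character of the target sentence.
    symbols = ["<identifier>"]
    forms = [list(symbols)]
    for ch in reversed(identifier[1:]):
        kind = "<digit>" if ch.isdigit() else "<letter>"
        symbols = ["<identifier>", kind] + symbols[1:]
        forms.append(list(symbols))
    symbols = ["<letter>"] + symbols[1:]
    forms.append(list(symbols))
    # Phase 2: replace the leftmost nonterminal by its terminal, one at a time.
    for i, ch in enumerate(identifier):
        symbols = symbols[:i] + [ch] + symbols[i + 1:]
        forms.append(list(symbols))
    return ["".join(form) for form in forms]


for form in leftmost_derivation("ch1"):
    print(form)
```

Every line printed except the last still contains a nonterminal and is therefore a sentential form; the last line, ch1, is the sentence.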
The structure of a derivation is best shown by a derivation tree.
The tree for ‘ch1’ is shown below:
[Figure: derivation tree for ‘ch1’]
Both approaches are used in syntax analysis phase of compilers.
Recursion in Grammar
A grammar defining a programming language can be left-recursive or
right-recursive.
Example
<identifier>::=<identifier><letter>
The above is left-recursive because the nonterminal being defined, that
is, <identifier>, is the leftmost symbol in the right part.
Context-Free BNF
A BNF grammar is context-free if the left part always consists of a single
nonterminal.
This means that the left part can always be replaced by one of its
alternative right parts, irrespective of the context in which the left part
appears.
Ambiguity
A grammar that generates a sentence for which there are two or more
distinct derivation trees is said to be ambiguous.
Example of Ambiguity
Consider the definition:
<statement> ::= <conditional statement> | begin <statement> end
<conditional statement> ::= if <condition> then <statement>
                          | if <condition> then <statement> else <statement>
Two different derivation trees exist for the conditional statement:
if <condition> then if <condition> then <statement> else <statement>
The problem is knowing to which then the else belongs.
The derivation trees are shown in a and b below:
Note: A grammar is always ambiguous if it is both left- and right-recursive
with respect to the same nonterminal.
Example
<exp>::=<exp> + <term> + <term> will be written as exp -> exp ‘+’ term
1. Operational Semantics
2. Axiomatic Semantics
3. Denotational Semantics
Operational Semantics
The idea is to describe the meaning of a program by executing its
statements on a machine, whether real or simulated.
The changes that occur in the machine’s state when it executes a given
statement define the meaning of that statement.
Example
Let the state of a machine be the values of all its registers and memory
locations including condition codes and status registers.
If one simply records the state of the computer, executes the instruction for
which meaning is sought, and then examines the machine’s new state, the
semantics of that instruction are clear.
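The record-execute-compare procedure can be sketched on a simulated machine. The machine below is hypothetical: its state is a dictionary of three registers and one condition code, and it understands only two made-up instructions, add and sub.

```python
def execute(instruction, state):
    """Return the new machine state after executing one instruction."""
    new_state = dict(state)              # keep the old state for comparison
    op, target, a, b = instruction
    if op == "add":
        new_state[target] = state[a] + state[b]
    elif op == "sub":
        new_state[target] = state[a] - state[b]
    else:
        raise ValueError(f"unknown operation {op!r}")
    new_state["zero_flag"] = new_state[target] == 0   # a condition code
    return new_state


before = {"r1": 7, "r2": 5, "r3": 0, "zero_flag": False}
after = execute(("sub", "r3", "r1", "r2"), before)

# The difference between the two states is the meaning of the instruction.
changed = {name: (before[name], after[name])
           for name in before if before[name] != after[name]}
print(changed)
```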
Axiomatic Semantics
It is based on symbolic logic and developed in conjunction with a method to
prove the correctness of programs.
Uses rules of inference to deduce the effect of executing a construct.
The meaning of a statement, or group of statements S is described in terms
of:
i. The condition P (the pre-condition) that is assumed or asserted to be
true before S is executed, and
ii. The condition Q (the post-condition) which can be deduced to be true
after execution of S.
This is usually written as:
{P} S {Q}
Example
Consider the assignment statement x:= x + 1
If the condition x ≥ 0 is true before execution of the statement, then x > 0
will be true after execution of the statement.
This is written as:
{x >= 0} x:=x + 1 {x > 0}
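A triple of this form can be checked on sample states, which gives some confidence in it but is testing rather than proof. In the sketch below the function holds and the range of test values are assumptions made for the example. The second call shows a precondition that is too weak to guarantee the postcondition.

```python
def holds(precondition, statement, postcondition, test_values):
    """Check {P} S {Q} on sample values of x; this is testing, not a proof."""
    for x in test_values:
        if precondition(x) and not postcondition(statement(x)):
            return False
    return True


# {x >= 0} x := x + 1 {x > 0}
triple_ok = holds(lambda x: x >= 0, lambda x: x + 1, lambda x: x > 0, range(-5, 100))

# {x >= -1} x := x + 1 {x > 0} fails for x = -1, because the result is 0.
too_weak = holds(lambda x: x >= -1, lambda x: x + 1, lambda x: x > 0, range(-5, 100))

print(triple_ok, too_weak)
```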
Denotational Semantics
It is the most rigorous widely known method for describing the semantics of
a program.
It is based on recursive function theory.
The fundamental concept of denotational semantics is to define for each
language entity, both a mathematical object and a function that maps
instances of the entity onto instances of the mathematical object.
Let the domain of semantic values of the objects be N, the set of
non-negative decimal integer values.
It is these objects that we want to associate with binary numbers.
Let the semantic function be Mbin.
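One way to realize such a semantic function is sketched below. The mapping rules are the usual ones for binary numerals (the digit 0 denotes 0, the digit 1 denotes 1, and a numeral followed by a digit denotes twice the numeral's value plus the digit's value); they are assumed here because the notes do not list them, and the name m_bin stands in for Mbin.

```python
def m_bin(numeral):
    """Map a binary numeral (a string of 0s and 1s) to a non-negative integer."""
    if numeral == "0":
        return 0
    if numeral == "1":
        return 1
    if numeral == "" or numeral[-1] not in "01":
        raise ValueError("not a binary numeral")
    # m_bin(<bin_num> d) = 2 * m_bin(<bin_num>) + m_bin(d)
    return 2 * m_bin(numeral[:-1]) + m_bin(numeral[-1])


for numeral in ("0", "1", "110", "1011"):
    print(numeral, m_bin(numeral))
```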
The type of a data item determines its allowed values together with the
set of operations that can be used to manipulate these values.
Data items have a value and a type and may be held in what are called
variables.
Variables have a name, various attributes and refer to an area of computer
store.
There is a distinction between the following:
o The name of a variable : Identifier
o Where the variable is stored: Reference or Address
o The value stored.
1. Identifier
An exception is Perl, where an identifier carries a prefix that shows whether it is
a scalar ($), an array (@) or a hash (%) variable.
Spaces
Spaces are not normally allowed within identifiers, but languages such as
Ada, C and C++ allow the underscore to be used as a separator.
Although Java allows the use of underscore, the convention is to use
internal capitals instead, e.g. leftLink instead of left_link.
Length
Case Sensitivity
Binding: The time when different language features are associated with, or
bound to one another.
Name-Declaration Binding/Scope
Java, C, C++ and Ada allow us to determine scope by looking at the
program text alone i.e at compile time. This is called Static Scope.
Consider the Java class declaration:
class Example
{
    private double x, y;
    public void op1()
    {
        int y, z;
        y = 34;
        z = 27;
    } // op1
} // Example
y and z declared inside op1 are local variables. They are not visible outside
op1.
x and y declared under Example are class-level variables (fields). They are
visible throughout the class, including in op1 and op2.
The use of y in op1 and op2 refers to the local variables, which hide the
field y. In Java the hidden field can still be reached as this.y; Ada similarly
allows a y declared under Example to be accessed inside op1 and op2 by the
name Example.y.
A language that employs static binding is said to be statically typed.
When there are no loopholes in the type checking, the language is said to
be strongly typed e.g Ada and Algol 68.
Advantages of Static Typing
Static typing leads to programs which are:
Reliable: Because errors are detected at compile-time.
Understandable: Because the connection, or binding between the use
of an identifier and its declaration can be determined from the program
text.
Efficient: Because type checks do not have to be made at run time.
Dynamic Scope: The binding between the use of an identifier and its declaration
depends on the order of execution, and so is delayed until run time.
Declaration-Reference Binding/Scope
1. On Block Entry
The binding takes place when the procedure or method is called. When control is
returned from the call, the memory is de-allocated.
Data members of a class are allocated when the object is created and
de-allocated when the object expires.
2. Static Variables
Binding the declaration of a local variable to a different store location (i.e
reference) each time a procedure or method is entered has the
consequence that a local variable does not retain the value it had when
the method was last executed.
To handle this problem, Algol 60, C, C++ and Delphi allow the declaration
of static local variables.
In object-oriented languages, there is no need for static local variables, as
information that needs to be held from one method call to the next can
be declared at the class level as a class variable.
For example, only one instance of counter variable will be created in the
following:
class Example
{
private static int counter=0;
…
}
Reference-Value Binding
The binding of a variable to its value involves three bindings:
i. The binding of the variable’s name to its declaration (name-
declaration binding) .
ii. The binding of its declaration to a store location (declaration-
reference binding).
iii. The binding of the store location to a value (reference-value
binding).
Dereferencing: The process of finding the value, given the reference.
Constants
3. Type Definitions
Kinds of Types
Scalar Types
Numeric Data Types: Integer, floating point, fixed point and complex.
Logical Type: Called Boolean
Character Type
Scalar types can be split into two categories:
i. Discrete/Ordinal Type: Where each value (except the minimum and the
maximum values) has a successor and a predecessor.
ii. Others: Like floating-point types for which this is not the case.
Built-in Types: These are types that are immediately available to the
programmer.
Specifying Type Information
The arithmetic operators (+,-,*,/) are defined for numeric data types.
However, the effect of these operators depends on the context of use.
Example
2 + 3 ---Integer addition
When the effect of an operator depends on the type of its operands, the
operator is said to be overloaded.
C, C++ and Java use a single overloaded division operator.
o E.g 7/2 gives 3----Integer division
o 7.0/2.0 gives 3.5----Floating-point division
Other operators available for numeric types are:
o Exponentiation
o Relational operator e.g <, <=, >, >=
o Equality e.g =, ==, !=, <>, .EQ., and .NE.
Integer
Floating-Point Numbers
As the exponent increases, the range increases and the precision decreases
Floating-point literals can be written with or without an exponent, e.g. 3.75,
2.5E+7 and 0.17E-2 are all valid in C, C++ and Java.
5. Logical Types
Languages also support logical types
In some languages, the name given to logical types is Boolean.
In C, a false logical expression yields 0 and a true one yields 1, although
any non-zero value is treated as true.
C++ (bool) and Java (boolean) have a Boolean type.
6. Character Types
Operations on Character
7. Enumeration Type
Equality
In C and C++, enumeration types cannot be input and output and
enumeration literals must be identifiers.
Garbage
Structured Data
In object-oriented languages, records are replaced by classes.
Features of Array
i. Component Type: All elements are homogeneous
ii. Access to Components: Through their positions
iii. Efficiency: Access at run time is efficient, but not as efficient as access to a
simple variable, because the index must be computed.
iv. Use:
a. Used in conjunction with loops.
b. In solution of scientific and engineering problems
c. If the position of the required component has to be computed during
run time, then arrays are an appropriate data structure.
Attributes of an Array
Example: C++
int array[4]; // An array of four integers, accessible via the subscripts 0 to 3.
In Java, an array is an object and is created using the new operator, e.g.
int[] array = new int[4]; // Java
To create two-dimensional arrays in C, C++ and Java, we write
o double a[10][16]; // C or C++
o double [][]a=new double[10][16]; // Java
To access the element of array a at row i, column j, we write a[i][j]
To create a nested list in Python, we write
o a = [[1, 2, 3], [4, 5, 6]]
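It is possible to write:
double total = 0.0;
b = array; // b is assumed to be a double* and array an array of 16 doubles
for (int i = 0; i < 16; i++) total += *b++;
which can be more efficient than indexing, because no subscript has to be computed.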
1. This is the binding of the name of an array variable to its required amount of
storage.
2. Three ways:
1. Static Arrays (Compile-time binding)
The size is fixed when the array is declared and cannot change
during run time.
Advantages:
- Simple to compile
- Fast at run time
Disadvantages:
- Inflexible
- The array must be declared at its maximum size, which can lead to wasted storage.
Array Parameter
When a function or procedure is called, the types of the actual and formal
parameters must match.
Two approaches are employed:
1. Structural Equivalence: This means that the actual and formal
parameters have the same structure.
2. Name Equivalence: This assumes that two variables have the same type
only if their types have the same name.
Operations on Complete Arrays or Slices of Arrays
Associative Arrays
An associative array is an array in which the index (called the key) can be a
value of any type, instead of the computable scalar type used by an ordinary array.
The indices of non-associative arrays never get stored, but the user-defined
keys must be stored in the structure for associative arrays. Each element of
an associative array is a pair of entities, a key and a value.
Example: Perl
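The same idea can be sketched with a Python dict, Python's built-in associative array. This is an illustrative stand-in rather than the Perl example itself, and the names and salary figures are made up.

```python
# Keys are strings rather than computed integer positions.
salaries = {"Gary": 75000, "Perry": 57000}

salaries["Mary"] = 55750      # add a new key/value pair
salaries["Perry"] = 58850     # update the value stored under an existing key
del salaries["Gary"]          # remove a pair by its key

for name, amount in salaries.items():
    print(name, amount)
```

Each element is a key/value pair, and the keys themselves are stored in the structure, unlike the indices of an ordinary array.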
Its definition specifies its name and the types of the various fields of the
record.
A record groups logically related data as one structured variable.
The individual items in a record are accessed by name (using dot notations).
In C and C++, a record is called a struct and the components are called
members.
Type Date can be defined as :
struct Date
{
int day;
int month;
int year;
};
Variant Records
A variant record has a fixed part, with fields as in normal record followed by
a variant part, in which there are alternative fields.
Specific actions are taken based on the contents of the variant part of the
record, e.g. a record may have a variant part that depends on whether a person is
an undergraduate or a postgraduate. A statement can then be written to print
the advisor's name for an undergraduate, or the course and supervisor
for a postgraduate.
Features of Record
Strings
Operations on String
i. String comparison
ii. Selection of substring
iii. Searching for substring
iv. Moving of strings
v. Replacement of substrings with a string
vi. Appending one string to the end of the other
Strings can be implemented as:
i. Fixed length
ii. Variable length
C represents strings as arrays of characters, whose length can vary up to a
defined maximum.
A C-style string consists of an array of characters terminated by the
null character '\0'. Such strings have properties over and above those of an
ordinary array of characters, and there is a whole library of functions for
dealing with strings represented in this form. In C++, its header file is <cstring>.
In C++, there are two types of strings, C-style strings and C++-style strings.
A C++-style string is a class data type. The objects of a C++-style string are
instances of the C++ string class.
There is a library of C++ string functions as well, available by including the
<string> header file.
In Java, a string is an object. Many string operations in Java make it appear
that String is a primitive type like int, double or char.
In C# and Java, the data type String is treated as a reference type. Instances of
String are treated as (immutable) objects in both languages, but support for
string literals provides a specialized means of constructing them. C# also
allows verbatim strings for quotation without escape sequences, which also
allow newlines.
In Python, string is an object with the following operations:
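A minimal sketch of some operations on Python's built-in str type, covering comparison, substring selection, searching, replacement and appending. The sample text is arbitrary.

```python
text = "programming language"

is_before = "ada" < "java"                        # comparison (lexicographic)
piece = text[0:7]                                 # selection of a substring by slicing
position = text.find("language")                  # searching; -1 when not found
replaced = text.replace("language", "paradigm")   # returns a new string
joined = text + "s"                               # appending one string to another

print(is_before, piece, position, replaced, joined)
```

Python strings are immutable, so replace and + build new string objects and leave text unchanged.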
Sets
Files
Files are large collections of data that are kept on secondary storage devices.
Most modern languages like C, C++, Java, C# and Python allow for the
creation of files.
Class
Abstraction
An abstraction is a view or representation of an entity that includes only the
most significant attributes.
It allows a programmer to collect instances of entities into groups in which
their common attributes are not considered (abstracted away). Only the
distinguishing ones are considered.
It is a weapon against the complexity of programming; it simplifies the
programming process.
Types of Abstraction
1. Process abstraction
2. Data abstraction
Process Abstraction
All subprograms are process abstractions because they provide a way for
a program to specify that some process is to be done, without providing
the details of how it is to be done (at least in the calling program).
For example, the call sortList(list, size) is an abstraction because the
details of the implementation of the subprogram are not specified.
Data Abstraction
An abstract data type is an enclosure that includes only the data
representation of one specific data type and the subprograms that
provide the operations for that type.
Program units that use an abstract data type declare variables of that
type, called objects.
2. User-Defined Abstract Data Type
A facility for defining ADTs in a language must provide a syntactic unit that
encloses the type definition and the subprogram definitions of the
abstraction operations.
It must be possible to make the type name and subprogram headers
visible to clients (programs that use ADTs) of the abstraction.
Few, if any, general built-in operations should be provided for objects of
ADTs, other than those provided with the type definition.
Some operations required by ADTs, even though not universal, must be
provided by the designer of the type, e.g. iterators, accessors, constructors
and destructors.
Should the language support parameterized ADTs? A parameterized ADT
allows for storing elements of any scalar type.
Encapsulation
Information Hiding
A C++ class can contain both hidden and visible entities (hidden from or
visible to clients of the class)
Entities to be hidden are placed in a private clause.
Visible ones are written in a public clause.
When a protected clause is used, the members of a derived
class can access the protected members, whereas they cannot access private
members.
Constructors are used to initialize the data members of newly created
objects.
They are called when an object of the class type is created.
They have the same name as the class whose objects they initialize.
A destructor can also be included in a class and is called when the lifetime
of an instance of the class ends.
All heap-dynamic objects live until explicitly de-allocated with the delete
operator.
The name of a destructor is the class’s name, preceded by a tilde (~).
Constructors and destructors do not have return types.
They can be explicitly called.
Java support for ADTs is similar to that of C++, but there are a few
important differences:
o All user-defined types in Java are classes (Java does not include
structs).
o All objects are allocated from the heap and accessed through
reference variables.
o Subprograms (methods) in Java can be only in classes; Java ADTs
are both declared and defined in a single syntactic unit.
o Definitions are hidden from clients by making them private.
o Rather than having private and public clauses in its class definitions,
in Java access modifiers are attached to individual method and variable
definitions.
Involves the use of records and classes in conjunction with pointer variables.
Components of a record or a class can be specified to be a pointer type that
allows data structures in which individual records are linked to others.
List
A list is a sequence of zero or more objects of some given type that may
allow the deletion of its objects and the insertion of an object and so may
vary in length.
A list can be implemented using array or as a linked list.
The code below implements the above linked list and prints out the contents.
#include <iostream>
using namespace std;

class Node
{
private:
    char data;   // the character stored in this node
    Node* next;  // points to the next node
public:
    Node(char c, Node* p)
    {
        data = c;
        next = p;
    } // constructor
    friend class List; // the class that makes use of class Node
}; // Node

class List
{
private:
    Node* head; // points to the first node of the list
public:
    List()
    {
        head = nullptr;
    } // constructor
    ~List()
    {
        while (head != nullptr) // release every node
        {
            Node* temp = head;
            head = head->next;
            delete temp;
        }
    } // destructor
    void add(char c)
    {
        head = new Node(c, head); // allocate a new node and make it the head
    } // add
    void inList()
    {
        Node* temp = head;
        while (temp != nullptr)
        {
            cout << temp->data << "\t\n";
            temp = temp->next;
        }
    } // inList
}; // List

int main(void)
{
    List aList;
    aList.add('a');
    aList.add('b');
    aList.add('c');
    aList.inList(); // prints c, b, a because each node is added at the head
    return 0;
}
Trees
A tree is a data structure that consists of a node called its root together with
zero or more subtrees each of which is a tree.
Levels of a Tree
The tree root is at level one, its children are at level 2; their children at level 3
and so on.
Degree of a Tree
The degree of a node is the number of children it has.
The degree of a leaf is 0.
The degree of a tree is the maximum of the degree of its nodes. (i.e. the
degree of the node with the highest degree).
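These definitions can be checked on a small example. The sketch below uses a hypothetical tree in which A has children B, C and D, and B has children E and F. The class and function names are invented for the illustration.

```python
class TreeNode:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []


def degree_of_node(node):
    """Number of children of a node; 0 for a leaf."""
    return len(node.children)


def degree_of_tree(node):
    """Maximum degree over all nodes of the tree."""
    return max([degree_of_node(node)] + [degree_of_tree(c) for c in node.children])


def deepest_level(node, level=1):
    """Deepest level reached, with the root at level 1."""
    return max([level] + [deepest_level(c, level + 1) for c in node.children])


leaf_e = TreeNode("E")
tree = TreeNode("A", [TreeNode("B", [leaf_e, TreeNode("F")]), TreeNode("C"), TreeNode("D")])
print(degree_of_node(tree), degree_of_node(leaf_e), degree_of_tree(tree), deepest_level(tree))
```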
Binary Tree
Each element in a binary tree has exactly two subtrees (one or both of these subtrees may
be empty). Each element in a tree can have any number of subtrees.
The subtrees of each element in a binary tree are ordered, that is, we
distinguish between the left and the right subtrees. The subtrees in a tree
are unordered.
Operations on ADT Binary Tree
Operations:
i. isEmpty(): return true if empty, return false otherwise
ii. root(): return the root element
iii. makeTree(root, left, right): create a binary tree with root as the root
element, left as the left subtree and right as the right subtree
iv. removeleftsubtree(): remove the left subtree and return it
v. removerightsubtree(): remove the right subtree and return it
vi. Preorder: preorder traversal of binary tree
vii. Inorder: inorder traversal of binary tree
viii. Postorder: postorder traversal of binary tree
ix. Levelorder: level-order traversal of binary tree
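The four traversals can be sketched as follows. The node class, the function names and the sample tree (the expression a * b + c) are assumptions made for the example; each function returns the elements in visiting order.

```python
from collections import deque


class BinaryNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def preorder(node):
    """Root, then left subtree, then right subtree."""
    return [] if node is None else [node.value] + preorder(node.left) + preorder(node.right)


def inorder(node):
    """Left subtree, then root, then right subtree."""
    return [] if node is None else inorder(node.left) + [node.value] + inorder(node.right)


def postorder(node):
    """Left subtree, then right subtree, then root."""
    return [] if node is None else postorder(node.left) + postorder(node.right) + [node.value]


def levelorder(root):
    """Level by level from the root, left to right, using a queue."""
    order = []
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        order.append(node.value)
        queue.extend(child for child in (node.left, node.right) if child is not None)
    return order


# + at the root, * (with leaves a and b) on the left, c on the right.
tree = BinaryNode("+", BinaryNode("*", BinaryNode("a"), BinaryNode("b")), BinaryNode("c"))
print(preorder(tree), inorder(tree), postorder(tree), levelorder(tree))
```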
Expressions
!               highest
*, /, %
+, -
<, >, <=, >=
==, !=
&&
||              lowest
iii. If the operators in ii have the same precedence, apply them in order
from left to right.
Fortran, Ada, C, C++, C# and Python have similar rules.
Operators in expressions could be binary (dyadic) or unary (monadic).
Boolean Expressions
In a Boolean expression, the value returned must either be true or false.
They often involve relational operators, as in
o a + b> 0
o x <= y
A Boolean expression can also contain variables of type Boolean and Boolean
operators.
The common Boolean operators are not, and and or in Pascal and Ada and !,
&& and || in Java, C and C++.
Ada, Java, C and C++ also include exclusive or (xor in Ada and ^ in Java, C and
C++).
Algol 60 has two extra Boolean operators for implies and equivalence.
Mixed-Mode Expressions
Statements
Statements are commands which perform actions and change the state.
Typical statements are:
o Assignment statement
o Conditional statement
o Iterative statement
o Procedure and method calls
Assignment Statement
where el is the expression on the left-hand side that gives a reference as its
result.
er is the expression on the right-hand side that gives a value as its result.
Assignment Operators
Consider a = a + expression;
C, C++, Java, Algol 68 and Modula-2 provide short-hand forms such as:
o a += expression
o others are -=, *= and /=
C, C++ and Java have:
o ++ increment operator
o -- decrement operator
Consider
o a = 1; b = ++a; // sets a to 1, increments a by 1 and then assigns
its new value to b (a and b are both 2)
o a = 1; b = a++; // sets a to 1, then assigns a to b before incrementing a (b is 1, a is 2)
Languages like Algol 68, C, C++ and Java allow multiple assignment statements
like the one below:
o a = b = c = expression
o This kind of expression is evaluated from right to left.
Compound Statements
Selection
Types:
1. Two-way
2. N-way
General form:
if control_expression
then clause
else clause
Control Expression
Clause Form
In most contemporary languages, the then and else clauses appear as either
single or compound statements.
o Pascal, C, C++ and Java make the interpretation that the else is
associated with the innermost if; when the other interpretation is
required, a compound statement must be used.
In cases when we have nested ifs, it is usually better for nested if statement
to follow the else.
The multiple selection construct allows the selection of one of any number
of statements or statement group.
Example
The general form of C multiple construct, switch, which is also part of C++ and
Java is as below:
switch(expression)
{
case constant_expression_1: statement_1;
…
case constant_expression_n: statement_n;
default: statement_n+1;
}
Iterative Statements
While Statement
A variant of the while loop allows the statement to be executed at least once
by placing the test at the end, as in:
do statement-sequence while (C);
The Pascal form, repeat statement-sequence until C, loops until the condition
becomes true.
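The effect of moving the test to the end can be seen by counting how often the body runs. This is a minimal Java sketch with hypothetical names; Pascal's repeat ... until C would correspond to do ... while (!C) in this notation.

```java
// Illustrative sketch comparing pre-test and post-test loops; names are hypothetical.
class LoopTestPosition {
    // Pre-test loop: the body may run zero times.
    static int whileRuns(int start, int limit) {
        int runs = 0;
        int i = start;
        while (i < limit) {
            runs++;
            i++;
        }
        return runs;
    }

    // Post-test loop: the body always runs at least once.
    static int doWhileRuns(int start, int limit) {
        int runs = 0;
        int i = start;
        do {
            runs++;
            i++;
        } while (i < limit);
        return runs;
    }

    public static void main(String[] args) {
        // The condition 5 < 3 is false from the start.
        System.out.println(whileRuns(5, 3));
        System.out.println(doWhileRuns(5, 3));
    }
}
```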
for Statement
C, C++ and Java have a very flexible for statement with the general form:
for (e1; e2; e3) S
where each of the expressions e1, e2 and e3 is optional.
Provided S contains no continue statement, the form is equivalent to:
e1;
while (e2)
{
S;
e3;
}
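The equivalence between the for form and the while form can be sketched in Java as follows. The class and method names are invented for the example; the comments mark which part of the while version plays the role of e1, e2, S and e3.

```java
// Illustrative sketch of the for/while equivalence; names are hypothetical.
class ForAsWhile {
    // Sums 0 + 1 + ... + (n - 1) with a for statement.
    static int sumWithFor(int n) {
        int sum = 0;
        for (int j = 0; j < n; j++) {
            sum += j;
        }
        return sum;
    }

    // The same computation written in the equivalent while form.
    static int sumWithWhile(int n) {
        int sum = 0;
        int j = 0;            // e1: initialisation
        while (j < n) {       // e2: loop test
            sum += j;         // S:  loop body
            j++;              // e3: step
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumWithFor(5));
        System.out.println(sumWithWhile(5));
    }
}
```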
In Ada, the scope of the control variable is limited to the for statement; it
is not available outside the loop.
C++ and Java allow the control variable to be declared within the for loop, as in:
for (int j = 0; j < n; j++) S
Alternatively, the variable j can be declared in the block where the for loop
appears.
Pascal, Ada, C and C++ allow values of an enumeration type to be used as the
control variable, as below (C syntax):
enum Month {Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec};
for (j = Jan; j <= Dec; j++) S
goto Statement
Exception Handling
try
{
…
if ( a[i] < 0)…
…
}catch (ArrayIndexOutOfBoundsException e) { …}
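A complete, runnable version of this pattern might look like the Java sketch below. The class, the method and the fallback parameter are assumptions added for illustration; only the try/catch shape and the exception type come from the fragment above.

```java
// Illustrative sketch of try/catch; the class and method are hypothetical.
class ExceptionDemo {
    // Returns a[i], or the given fallback when i is outside the array bounds.
    static int elementOrDefault(int[] a, int i, int fallback) {
        try {
            return a[i];      // throws ArrayIndexOutOfBoundsException if i is out of range
        } catch (ArrayIndexOutOfBoundsException e) {
            return fallback;  // the handler runs instead of terminating the program
        }
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(elementOrDefault(data, 1, -1));
        System.out.println(elementOrDefault(data, 5, -1));
    }
}
```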
PHASES OF COMPILER
Lexical Analysis:
• The first phase of the compiler works as a text scanner.
• This phase scans the source code as a stream of characters and
converts it into meaningful lexemes, each of which corresponds to a
symbol in the programming language, e.g., a variable name, keyword or
number.
Syntax Analysis
• This phase takes the list of tokens produced by the lexical analysis and
arranges these in a tree-structure (called the syntax tree) that reflects
the structure of the program.
• This phase is often called parsing.
LEXICAL ANALYSIS
Lexeme   Token
*        mult_op
count    identifier
+        plus_op
17       int_literal
;        semicolon
• In C++, the variable declaration int value = 100; contains the tokens
int (keyword), value (identifier), = (operator), 100 (constant) and ;
(symbol).
• In a programming language, keywords, constants, identifiers, strings,
numbers, operators and punctuation symbols can be considered as
tokens.
• The rules that decide whether a lexeme is a valid token are defined by
grammar rules, by means of a pattern/specification.
• Lexers can be constructed by hand, but are normally constructed
by lexer generators, which transform human-readable
patterns/specifications of tokens and white-space into efficient programs.
• For lexical analysis, specifications are traditionally written using regular
expressions: an algebraic notation for describing sets of strings.
• The generated lexers are in a class of extremely simple programs called
finite automata.
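To make the idea of token patterns concrete, here is a minimal hand-written Java sketch built on java.util.regex rather than on a lexer generator. The class name, the small keyword list and the token-class names are assumptions chosen to match the int value = 100; example above; a real lexer would cover many more token classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative regex-based lexer; the token classes and keyword list are assumptions.
class TinyLexer {
    private static final String[] KINDS =
        {"keyword", "identifier", "constant", "operator", "symbol"};

    // One named group per token class; earlier alternatives are tried first,
    // so "int" is a keyword while "integer" falls through to identifier.
    private static final Pattern TOKEN = Pattern.compile(
          "(?<keyword>(?:int|float|if|else|while)\\b)"
        + "|(?<identifier>[A-Za-z_][A-Za-z0-9_]*)"
        + "|(?<constant>[0-9]+)"
        + "|(?<operator>[=+\\-*/])"
        + "|(?<symbol>;)");

    // Splits the source text into "lexeme (token class)" strings.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            if (Character.isWhitespace(source.charAt(pos))) {
                pos++;                       // white-space separates tokens
                continue;
            }
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at position " + pos);
            }
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    tokens.add(m.group(kind) + " (" + kind + ")");
                    break;
                }
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("int value = 100;")) {
            System.out.println(token);
        }
    }
}
```

The sketch leaves the construction of the underlying automaton to the regular-expression engine; a lexer generator would instead compile all patterns into a single finite automaton ahead of time.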
SPECIFICATIONS OF TOKENS
• Let us see how language theory defines the following terms:
Alphabets
• An alphabet is any finite set of symbols; {0,1} is the binary alphabet.
• {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the hexadecimal alphabet.
• {a-z, A-Z} is the alphabet of English letters.
Strings
• Any finite sequence of symbols from an alphabet is called a string.
• The length of a string is the total number of symbol occurrences in it,
e.g., the length of the string tutor is 5 and is denoted by |tutor| = 5.
• A string having no symbols, i.e. a string of zero length, is known as the
empty string and is denoted by ε (epsilon).
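These definitions translate directly into code. In the Java sketch below, an alphabet is represented as a String holding its symbols; that representation, and the class and method names, are assumptions made for the example.

```java
// Illustrative sketch of alphabets and strings; names and representation are hypothetical.
class AlphabetDemo {
    // True if every symbol of s belongs to the alphabet.
    // The empty string is a string over every alphabet.
    static boolean isStringOver(String alphabet, String s) {
        for (int i = 0; i < s.length(); i++) {
            if (alphabet.indexOf(s.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String binary = "01";
        System.out.println(isStringOver(binary, "10101"));
        System.out.println(isStringOver(binary, "102"));
        System.out.println("tutor".length());   // |tutor|
        System.out.println("".length());        // length of the empty string
    }
}
```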
Special Symbols
• A typical high-level language contains the following symbols:
Assignment =
Preprocessor #
SYNTAX ANALYSIS
Context-free grammars (CFGs) are a superset of regular grammars: every regular
grammar is also context-free.
Context-Free Grammar
The strings are derived from the start symbol by repeatedly replacing a
non-terminal (initially the start symbol) by the right side of a production
for that non-terminal.
Example
The string 10101 is formed over the alphabet Σ = { 0, 1 }.
Syntax Analyzers
A syntax analyzer or parser takes the input from a lexical analyzer in the
form of token streams.
The parser analyzes the source code (token stream) against the
production rules to detect any errors in the code. The output of this
phase is a parse tree.
The parser accomplishes two tasks: parsing the code while looking for
errors, and generating a parse tree as the output of the phase.
Parsers are expected to parse the whole code even if some errors exist in
the program.
Derivation
A derivation is a sequence of production-rule applications that produces the
input string from the start symbol.
During parsing, we make two decisions for each sentential form of the input:
i. Deciding the non-terminal which is to be replaced.
ii. Deciding the production rule, by which, the non-terminal will be replaced.
To decide which non-terminal is to be replaced with a production rule, we have two
options:
i. Left-most derivation
ii. Right-most derivation
Left-most Derivation
In a left-most derivation, the sentential form is scanned from left to right and
the left-most non-terminal is always the one replaced by a production rule.
The sentential form derived by the left-most derivation is called the left-sentential
form.
Right-most Derivation
If we scan the sentential form from right to left, always replacing the right-most
non-terminal with a production rule, it is known as a right-most derivation.
The sentential form derived from the right-most derivation is called the
right-sentential form.
Parse Tree
In a parse tree:
All leaf nodes are terminals.
All interior nodes are non-terminals.
In-order traversal gives the original input string.
The deepest sub-tree is traversed first, so the operator in that sub-tree takes
precedence over the operators in its parent nodes.
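The parse-tree properties listed above can be sketched with a small tree class in Java. Everything in the block is an assumption for illustration: the class name, the use of the empty string to stand for an epsilon leaf, and the example tree, which is the parse tree of aaa under a grammar with the rules S -> aS and S -> epsilon.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative parse-tree node; names and representation are hypothetical.
class ParseNode {
    static final String EPSILON = "";   // the empty string stands in for an epsilon leaf

    final String symbol;
    final List<ParseNode> children;

    ParseNode(String symbol, ParseNode... children) {
        this.symbol = symbol;
        this.children = Arrays.asList(children);
    }

    // Leaves carry terminals (or epsilon); interior nodes carry non-terminals.
    boolean isLeaf() {
        return children.isEmpty();
    }

    // Reads the leaves from left to right, which reproduces the input string.
    String leavesText() {
        if (isLeaf()) {
            return symbol;
        }
        StringBuilder text = new StringBuilder();
        for (ParseNode child : children) {
            text.append(child.leavesText());
        }
        return text.toString();
    }

    // Parse tree of "aaa": S(a, S(a, S(a, S(epsilon)))).
    static ParseNode treeForAaa() {
        ParseNode s4 = new ParseNode("S", new ParseNode(EPSILON));
        ParseNode s3 = new ParseNode("S", new ParseNode("a"), s4);
        ParseNode s2 = new ParseNode("S", new ParseNode("a"), s3);
        return new ParseNode("S", new ParseNode("a"), s2);
    }

    public static void main(String[] args) {
        System.out.println(treeForAaa().leavesText());
    }
}
```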
A leftmost derivation derives the input string by expanding the leftmost
non-terminal at each step.
Example
S -> aS|ϵ
To generate “aaa”
S -> aS (Using S -> aS)
-> aaS (Using S -> aS)
-> aaaS (Using S -> aS)
-> aaaϵ (Using S -> ϵ)
-> aaa
Leftmost Tree
Rightmost Derivation
The process of deriving the input string by expanding the rightmost non-terminal
at each step.
Example
S -> aSS|aS|ϵ
To generate “aaa”
S -> aSS
-> aSaS (Using S -> aS)
-> aSaaS (Using S -> aS)
-> aSaaϵ (Using S -> ϵ)
-> aaa (Using S -> ϵ)
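Both derivations of aaa can be replayed mechanically. The Java sketch below is a simplified illustration rather than a parser: it assumes the only non-terminal is S, takes the production bodies to apply in order, and writes the empty string for an epsilon body. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative derivation replay; assumes a single non-terminal 'S'.
class Derivation {
    // Applies each production body in turn to the leftmost or rightmost S
    // and returns every sentential form, starting with the start symbol.
    static List<String> derive(String[] bodies, boolean leftmost) {
        List<String> forms = new ArrayList<>();
        String form = "S";
        forms.add(form);
        for (String body : bodies) {
            int at = leftmost ? form.indexOf('S') : form.lastIndexOf('S');
            if (at < 0) {
                throw new IllegalStateException("no non-terminal left in " + form);
            }
            form = form.substring(0, at) + body + form.substring(at + 1);
            forms.add(form);
        }
        return forms;
    }

    public static void main(String[] args) {
        // Leftmost derivation with S -> aS | epsilon.
        System.out.println(derive(new String[] {"aS", "aS", "aS", ""}, true));
        // Rightmost derivation with S -> aSS | aS | epsilon.
        System.out.println(derive(new String[] {"aSS", "aS", "aS", "", ""}, false));
    }
}
```

Because an epsilon body is the empty string, the step written as aaaϵ in the notes shows up here directly as aaa.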
Rightmost Tree
References