CSC405 Org. of Prog Lang
Course Content: Language definition structure. Data types and structures, review of
basic types including lists and trees, control structure and data flow, run-time
consideration, interpretative languages, lexical analysis and parsing.
Lesson 1:
Language definition and structure.
A programming language is a notation for writing programs, which are specifications
of a computation or algorithm. It is a formal language, which comprises a set of
instructions used to produce various kinds of output. Programming languages are
used in computer programming to create programs that implement
specific algorithms. It is a language intended to be used by a person to express a
process by which a computer can solve a problem. A programming language is an
artificial language designed to express computations that can be performed by a
machine, particularly a computer. Programming languages can be used to create
programs that control the behavior of a machine and/or to express algorithms
precisely.
A programming language is usually split into the two components of syntax (form) and
semantics (meaning). Some languages are defined by a specification document (for
example, the C programming language is specified by an ISO Standard), while other
languages, such as Perl, have a dominant implementation that is used as a
reference.
A language must help us write good programs. A good program is easy to read, easy
to understand, and easy to modify.
A language can belong to any of four programming language paradigms.
1. Imperative
This is designed around the Von Neumann architecture. Computation is performed
through statements that change a program’s state. Central features are variables,
assignment statements and iteration, sequence of commands, explicit state update
via assignment. The imperative paradigm most closely resembles the actual machine
itself, so the programmer is much closer to the machine; because of such closeness,
the imperative paradigm was the only one efficient enough for widespread use until
recently. Examples of such languages are Fortran, Algol, Pascal, C/C++, Java, Perl,
JavaScript, and Visual Basic .NET.
2. Functional
Here, the main means of making computations is by applying functions to parameters.
The Functional Programming paradigm views all subprograms as functions in the
mathematical sense: informally, they take in arguments and return a single solution.
The solution returned is based entirely on the input, and the time at which a function
is called has no relevance. Examples are LISP, Scheme, ML, Haskell. It may also
include OO (Object Oriented) concepts.
3. Logic
This is Rule-based (rules are specified in no particular order). Computations here are
made through a logical inference process. The Logical Paradigm takes a declarative
approach to problem-solving. Various logical assertions about a situation are made,
establishing all known facts. Then queries are made. The role of the computer
becomes maintaining data and logical deduction.
Examples are PROLOG and CLIPS. This may also include OO concepts.
control structures, and a natural syntax to facilitate the users to code their problem
easily and efficiently.
Abstraction:- Abstraction means the ability to define and then use complicated
structures or operations in ways that allow many of the details to be ignored. The
degree of abstraction allowed by a programming language directly affects its
writability. Object-oriented languages support a high degree of abstraction. Hence, writing
programs in an object-oriented language is much easier. Object-oriented languages also
support reusability of program segments due to this feature.
Efficiency :- Programs written in a good programming language are efficiently
translated into machine code, are efficiently executed, and occupy as little space in
memory as possible. That is, a good programming language is supported by a
good language translator that gives due consideration to space and time efficiency.
Structured:- Structured means that the language should have the necessary features to
allow its users to write their programs based on the concepts of structured
programming. Moreover, this property forces a programmer to look at a
problem in a logical way, so that fewer errors are created while writing a program for
the problem.
Compactness :- In a good programming language, programmers should be able to
express intended operations concisely. A verbose language is generally not liked by
programmers, because they need to write too much.
Locality:- A good programming language should be such that, while writing a
program, a programmer can concentrate almost solely on the part of the program around the
statement currently being worked with.
History and future of programming languages
The first programming languages were developed in the 1950s (low-level languages)
Assembly and machine language
— tedious to write programs
— difficult to understand and debug
Continuously developed and improved: e.g. the object concept (1980), libraries and
scripting (1990), system independence (late 1990s).
The abstraction of languages increased continuously from a very low level (e.g.
assembly language) to a very high level (e.g. XML).
Higher level language
— language closer to problem description
— programs portable
— exchange software
— programs easier to understand
• Once a thorough understanding of the fundamental concepts of
languages is acquired, it becomes easier to see how concepts are
incorporated into the design of the language being learned.
Syntax
A programming language's surface form is known as its syntax. The syntax of a
language describes the possible combinations of symbols that form a syntactically
correct program. The meaning given to a combination of symbols is handled by
semantics (either formal or hard-coded in a reference implementation).
Semantics
The term semantics refers to the meaning of languages, as opposed to their form
(syntax).
Static semantics
The static semantics defines restrictions on the structure of valid texts that are hard or
impossible to express in standard syntactic formalisms. For compiled languages,
static semantics essentially include those semantic rules that can be checked at
compile time. Examples include checking that every identifier is declared before it is
used (in languages that require such declarations) or that the labels on the arms of a
case statement are distinct. Many important restrictions of this type, like checking that
identifiers are used in the appropriate context (e.g. not adding an integer to a function
name), or that subroutine calls have the appropriate number and type of arguments,
can be enforced by defining them as rules in a logic called a type system. Other forms
of static analyses like data flow analysis may also be part of static semantics. Newer
programming languages like Java and C# have definite assignment analysis, a form
of data flow analysis, as part of their static semantics.
Dynamic semantics
Once data has been specified, the machine must be instructed to perform operations
on the data. For example, the semantics may define the strategy by which
expressions are evaluated to values, or the manner in which control structures
conditionally execute statements. The dynamic semantics (also known as execution
semantics) of a language defines how and when the various constructs of a language
should produce a program behaviour. There are many ways of defining execution
semantics. Natural language is often used to specify the execution semantics of
languages commonly used in practice. A significant amount of academic research
went into formal semantics of programming languages, which allow execution
semantics to be specified in a formal manner. Results from this field of research have
seen limited application to programming language design and implementation outside
academia.
Syntactic ambiguity
Syntactic ambiguity is a property of sentences which may be reasonably interpreted in
more than one way, or reasonably interpreted to mean more than one thing.
Ambiguity may or may not involve one word having two parts of speech or
homonyms.
Syntactic ambiguity arises not from the range of meanings of single words, but from
the relationship between the words and clauses of a sentence, and the sentence
structure implied thereby. When a reader can reasonably interpret the same sentence
as having more than one possible structure, the text is equivocal and meets the
definition of syntactic ambiguity.
Operator Precedence
When several operations occur in an expression, each part is evaluated and resolved
in a predetermined order called operator precedence. Parentheses can be used to
override the order of precedence and force some parts of an expression to be
evaluated before other parts. Operations within parentheses are always performed
before those outside. Within parentheses, however, normal operator precedence is
maintained.
When expressions contain operators from more than one category, arithmetic
operators are evaluated first, comparison operators are evaluated next, and logical
operators are evaluated last. Comparison operators all have equal precedence; that
is, they are evaluated in the left-to-right order in which they appear. Arithmetic and
logical operators are evaluated in the following order of precedence:
The string concatenation operator (&) is not an arithmetic operator, but in precedence
it does fall after all arithmetic operators and before all comparison operators. The Is
operator is an object reference comparison operator. It does not compare objects or
their values; it checks only to determine if two object references refer to the same
object.
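As a rough illustration of these rules, the sketch below uses Python only as a convenient notation (the passage above describes a different language, and all variable names are made up for the example). It shows arithmetic binding tighter than comparison, comparison binding tighter than logical operators, parentheses overriding the default order, and an identity test that compares references rather than values.

```python
# Arithmetic is evaluated before comparison, and comparison before logical operators.
result = 2 + 3 * 4 > 10 and not 1 + 1 == 3
# The same expression with every grouping written out explicitly.
explicit = ((2 + (3 * 4)) > 10) and (not ((1 + 1) == 3))

# Parentheses override the default order.
default_order = 2 + 3 * 4      # multiplication first
forced_order = (2 + 3) * 4     # addition first

# Identity: `is` checks whether two references name the same object,
# while `==` compares the values.
first = [1, 2, 3]
second = [1, 2, 3]   # equal contents, but a separate object
same = first         # another reference to the same object
```

Here `first == second` holds because the contents match, while `first is second` does not, since the two names refer to different objects.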
Parsing
In linguistics, parsing is the process of analyzing a text, made of a sequence of tokens
(for example, words), to determine its grammatical structure with respect to a given
(more or less) formal grammar. Parsing can also be used as a linguistic term,
especially in reference to how phrases are divided up in garden path sentences.
Parser
In computing, a parser is one of the components in an interpreter or compiler, which
checks for correct syntax and builds a data structure (often some kind of parse tree,
abstract syntax tree or other hierarchical structure) implicit in the input tokens. The
parser often uses a separate lexical analyser to create tokens from the sequence of
input characters. Parsers may be programmed by hand or may be
(semi-)automatically generated (in some programming languages) by a tool.
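The division of labour between a lexical analyser and a parser can be sketched as follows. This is a minimal hand-written example in Python, assuming a toy grammar of integer arithmetic expressions; the names `tokenize` and `parse` and the nested-tuple tree shape are choices made for this illustration only.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Lexical analysis: turn a character string into a list of tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(tokens):
    """Syntax analysis: check the token sequence and build a tree of nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():            # factor -> NUMBER | '(' expr ')'
        tok = peek()
        if tok is not None and tok.isdigit():
            return int(advance())
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():              # term -> factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():              # expr -> term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

For the input `2 + 3 * 4` the tokens are `2`, `+`, `3`, `*`, `4`, and the resulting tree places the multiplication below the addition, so the hierarchical structure implicit in the tokens is made explicit.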
Overview of process
Types of parser
The task of the parser is essentially to determine if and how the input can be derived
from the start symbol of the grammar. This can be done in essentially two ways:
Bottom-up parsing - A parser can start with the input and attempt to rewrite it to
the start symbol. Intuitively, the parser attempts to locate the most basic elements,
then the elements containing these, and so on. LR parsers are examples of
bottom-up parsers. Another term used for this type of parser is Shift-Reduce
parsing.
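The shift-reduce idea can be sketched for a deliberately tiny grammar, assumed here to be E -> E + n | n. The Python function below is a hand-coded illustration, not a table-driven LR parser: it shifts tokens onto a stack and rewrites the top of the stack whenever it matches the right-hand side of a rule, until only the start symbol remains.

```python
def shift_reduce(tokens):
    """Recognise the toy grammar  E -> E + n | n  by shifting and reducing."""
    stack, remaining, steps = [], list(tokens), []
    while True:
        if stack[-3:] == ["E", "+", "n"]:
            stack[-3:] = ["E"]                 # reduce by E -> E + n
            steps.append("reduce E -> E + n")
        elif stack == ["n"]:
            stack[:] = ["E"]                   # reduce by E -> n
            steps.append("reduce E -> n")
        elif remaining:
            stack.append(remaining.pop(0))     # shift the next input token
            steps.append("shift")
        else:
            break
    # The input is accepted only if it was rewritten to the start symbol.
    return stack == ["E"], steps
```

Calling `shift_reduce(["n", "+", "n"])` accepts the input, while `shift_reduce(["n", "+"])` rejects it because the stack cannot be reduced to the start symbol.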
Variables
A variable is a symbolic name given to some known or unknown quantity or information, for the
purpose of allowing the name to be used independently of the information it
represents. A variable name in computer source code is usually associated with a
data storage location and thus also its contents.
Compilers have to replace variables' symbolic names with the actual locations of the
data. While the variable name, type, and location generally remain fixed, the data
stored in the location may get altered during program execution.
Naming conventions
Unlike their mathematical counterparts, programming variables and constants
commonly take multiple-character names, e.g. COST or total. Single-character names
are most commonly used only for auxiliary variables; for instance, i, j, k for array index
variables.
Some naming conventions are enforced at the language level as part of the language
syntax and involve the format of valid identifiers. In almost all languages, variable
names cannot start with a digit (0-9) and cannot contain whitespace characters.
Whether, which, and when punctuation marks are permitted in variable names varies
from language to language; many languages only permit the underscore (_) in
variable names and forbid all other punctuation. In some programming languages,
specific (often punctuation) characters (known as sigils) are prefixed or appended to
variable identifiers to indicate the variable's type.
Binding
Binding describes how a variable is created and used (or "bound") by and within the
given program, and, possibly, by other programs, as well.
Scope
The scope of a variable describes where in a program's text, the variable may be
used, while the extent (or lifetime) describes when in a program's execution a variable
has a (meaningful) value. Scope is a lexical aspect of a variable. Most languages
define a specific scope for each variable (as well as any other named entity), which
may differ within a given program. The scope of a variable is the portion of the
program code for which the variable's name has meaning and for which the variable
is said to be "visible". Scope is of two types: static and dynamic.
• Static Scope: The static scope of a variable is the most immediately enclosing
block, excluding any enclosed blocks where the variable has been re-declared.
The static scope of a variable in a program can be determined by simply
studying the text of the program. Static scope is not affected by the order in
which procedures are called during the execution of the program.
• Dynamic Scope: The dynamic scope of a variable extends to all the procedures
called thereafter during program execution, until the first procedure to be called
that re-declares the variable.
Referencing
The referencing environment is the collection of variables that can be used. In a
statically scoped language, one can only reference the variables in the static referencing
environment. A function in a statically scoped language does have dynamic ancestors
(i.e. its callers), but cannot reference any variables declared in those ancestors.
Lesson four
Names, Bindings, Type Checking, and Scopes
Introduction
• Imperative languages are abstractions of von Neumann architecture
– Memory: stores both instructions and data
– Processor: provides operations for modifying the contents of memory
• Variables characterized by attributes
– Type: to design, must consider scope, lifetime, type checking, initialization, and
type compatibility
Names
Design issues for names:
• Maximum length?
• Are connector characters allowed?
• Are names case sensitive?
• Are special words reserved words or keywords?
Name Forms
A name is a string of characters used to identify some entity in a program.
If too short, they cannot be connotative.
Language examples:
• FORTRAN I: maximum 6
• COBOL: maximum 30
• FORTRAN 90 and ANSI C: maximum 31
• Ada and Java: no limit, and all are significant
• C++: no limit, but implementers often impose a length limitation because they
do not want the symbol table in which identifiers are stored during compilation
to be too large and also to simplify the maintenance of that table.
Names in most programming languages have the same form: a letter followed by a
string consisting of letters, digits, and (_).
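This common name form can be written as a regular expression. The Python sketch below is only an illustration: the helper `is_valid_name` and its optional `max_length` parameter are hypothetical, and the pattern is restricted to ASCII letters, which is an assumption rather than a rule of any particular language.

```python
import re

# A letter followed by any number of letters, digits, or underscores (ASCII only here).
NAME_FORM = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_valid_name(text, max_length=None):
    """Return True if text has the common name form and respects an optional length limit."""
    if max_length is not None and len(text) > max_length:
        return False
    return NAME_FORM.fullmatch(text) is not None
```

With `max_length=6`, as in the FORTRAN I limit listed above, `TOTAL` is accepted and `SUBTOTAL` is rejected; a name such as `2fast` is rejected regardless of length because it starts with a digit.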
Although the _ was widely used in the 70s and 80s, that practice is now far less
popular.
C-based languages (C, C++, Java, and C#) replaced the _ by the “camel” notation, as in
myStack.
Prior to Fortran 90, the following two names are equivalent:
• Case sensitivity
– Disadvantage: readability (names that look alike are different)
• worse in C++ and Java because predefined names are mixed case (e.g.
IndexOutOfBoundsException)
• In C, however, names conventionally use lowercase exclusively.
– C, C++, and Java names are case sensitive: rose, Rose, and ROSE are distinct names.
“What about Readability”
Special words
• An aid to readability; used to delimit or separate statement clauses
• A keyword is a word that is special only in certain contexts.
• Ex: Fortran
• Disadvantage: poor readability. Compilers and users must recognize the difference.
• A reserved word is a special word that cannot be used as a user-defined name.
• As a language design choice, reserved words are better than keywords.
• Ex: In Fortran, one could have the statements
Integer Real // keyword “Integer” and variable “Real”
Real Integer // keyword “Real” and variable “Integer”
Variables
• A variable is an abstraction of a memory cell(s).
• Variables can be characterized as a sextuple of attributes:
o Name
o Address
o Value
o Type
o Lifetime
o Scope
Name
- Not all variables have names: Anonymous, heap-dynamic variables
Address
• The memory address with which it is associated
• A variable name may have different addresses at different places and at different times
during execution, e.g. a local variable sum declared in sub1.
• The address of a variable is sometimes called its l-value because that is what is required
when a variable appears on the left side of an assignment statement.
Aliases
• If two variable names can be used to access the same memory location, they are
called aliases
• Aliases are created via pointers, reference variables, and C and C++ unions.
• Aliases are harmful to readability (program readers must remember all of them)
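A small Python sketch of aliasing follows. Python has no pointers or unions, so this only illustrates the general idea through object references; the variable names are invented for the example.

```python
scores = [70, 85, 90]
alias = scores               # no copy is made: both names refer to the same list object
independent = list(scores)   # a separate object with the same contents

alias[0] = 0                 # a change made through one name...
# ...is visible through the other name, but not through the independent copy.
```

After the assignment, `scores[0]` is also 0 even though `scores` never appears on the left side of an assignment, which is why a reader must keep every alias in mind.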
Type
• Determines the range of values of variables and the set of operations that are defined
for values of that type; in the case of floating point, type also determines the precision.
• For example, the int type in Java specifies a value range of -2147483648 to
2147483647, and arithmetic operations for addition, subtraction, multiplication,
division, and modulus.
Value
• The value of a variable is the contents of the memory cell or cells associated with the
variable.
• Abstract memory cell - the physical cell or collection of cells associated with a variable.
• A variable’s value is sometimes called its r-value because that is what is required when
a variable appears on the right side of an assignment statement.
• A binding is static if it first occurs before run time and remains unchanged throughout
program execution.
• A binding is dynamic if it first occurs during execution or can change during
execution of the program.
Type Bindings
• If static, the type may be specified by either an explicit or an implicit declaration.
Variable Declarations
– Disadvantage of implicit declarations: reliability suffers because they prevent the compilation process
from detecting some typographical and programming errors.
– In Fortran, vars that are accidentally left undeclared are given default types and
unexpected attributes, which could cause subtle errors that are difficult to
diagnose.
• Less trouble with Perl: a name that begins with $ is a scalar, a name that begins with @
is an array, and a name that begins with % is a hash structure.
– In this scenario, the names @apple and %apple are unrelated.
• In C and C++, one must distinguish between declarations and definitions.
– Declarations specify types and other attributes but do not cause allocation of
storage. A declaration provides the type of a var defined external to a function that is used in
the function.
– Definitions specify attributes and cause storage allocation.
– Type error detection by the compiler is difficult because any variable can
be assigned a value of any type.
• Incorrect types of right sides of assignments are not detected as errors; rather, the
type of the left side is simply changed to the incorrect type.
• Ex:
i, x -> Integer
y -> floating-point array
i = x -> what the user meant to type
i = y -> what the user typed instead
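The situation in this example can be reproduced in Python, which is used here simply as one dynamically typed language; the values are invented. The mistaken assignment is accepted silently, and the type bound to `i` changes.

```python
x = 5                 # x is currently bound to an int
y = [1.5, 2.5, 3.5]   # y is currently bound to a list of floats

i = x                 # what the user meant to type: i is bound to an int
type_after_intended = type(i)

i = y                 # what the user typed instead: accepted without complaint
type_after_typo = type(i)   # i is now bound to a list; no error was reported
```

Any error shows up only later, when `i` is used in an operation that a list does not support.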
Type Inference (ML, Miranda, and Haskell)
• Rather than by assignment statement, types are determined from the context of the
reference.
• Ex:
fun circumf(r) = 3.14159 * r * r;
The argument and functional value are inferred to be real.
fun times10(x) = 10 * x;
The argument and functional value are inferred to be int.
Storage Bindings & Lifetime
– Allocation - getting a cell from some pool of available cells.
– Deallocation - putting a cell back into the pool.
– The lifetime of a variable is the time during which it is bound to a particular
memory cell. So the lifetime of a var begins when it is bound to a specific cell
and ends when it is unbound from that cell.
– Categories of variables by lifetimes: static, stack-dynamic, explicit heap-
dynamic, and implicit heap-dynamic
Static Variables:
– bound to memory cells before execution begins and remains bound to the same
memory cell throughout execution.
– e.g. all FORTRAN 77 variables, C static variables.
– Advantages:
• Efficiency: (direct addressing): All addressing of static vars can be
direct. No run-time overhead is incurred for allocating and deallocating
vars.
• History-sensitive: have vars retain their values between separate
executions of the subprogram.
– Disadvantage:
• Storage cannot be shared among variables.
• Ex: if two large arrays are used by two subprograms, which are never
active at the same time, they cannot share the same storage for their
arrays.
Stack-dynamic Variables:
– Stack-dynamic variables are those whose storage bindings are created when their declaration statements are
elaborated, but whose types are statically bound.
– Elaboration of such a declaration refers to the storage allocation and binding
process indicated by the declaration, which takes place when execution reaches
the code to which the declaration is attached.
– Ex:
• The variable declarations that appear at the beginning of a Java method
are elaborated when the method is invoked and the variables defined by
those declarations are deallocated when the method completes its
execution.
– Stack-dynamic variables are allocated from the run-time stack.
– If scalar, all attributes except address are statically bound.
– Ex:
• Local variables in C subprograms and Java methods.
– Advantages:
• Allows recursion: each active copy of the recursive subprogram has its
own version of the local variables.
• In the absence of recursion it conserves storage b/c all subprograms share
the same memory space for their locals.
– Disadvantages:
• Overhead of allocation and deallocation.
• Subprograms cannot be history sensitive.
• Inefficient references: indirect addressing is required b/c the place in the
stack where a particular var will reside can only be determined during
execution.
– In Java, C++, and C#, variables defined in methods are by default stack-
dynamic.
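Two of the properties listed above, support for recursion and the lack of history sensitivity, can be observed in a short Python sketch. Python keeps locals in per-call frames rather than literally on a hardware stack, so this is an analogy only, and the function names are made up.

```python
def factorial(n, trace):
    local_copy = n                      # a fresh local variable for each activation
    trace.append(("enter", local_copy))
    result = 1 if n <= 1 else n * factorial(n - 1, trace)
    trace.append(("exit", local_copy))  # unchanged by the deeper activations
    return result

def counter_without_history():
    count = 0        # re-created on every call, so nothing is remembered between calls
    count += 1
    return count

trace = []
value = factorial(3, trace)
```

The trace shows each activation leaving with the same `local_copy` it entered with, and `counter_without_history()` returns 1 on every call because its local does not survive between calls.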
int *intnode;
…
intnode = new int; // allocates an int cell
…
delete intnode; // deallocates the cell to which intnode points
Type Checking
• Type checking is the activity of ensuring that the operands of an operator are of
compatible types.
• A compatible type is one that is either legal for the operator, or is allowed under language
rules to be implicitly converted, by compiler-generated code, to a legal type.
• This automatic conversion is called a coercion.
• Ex: if an int var and a float var are added in Java, the value of the int var is coerced to float
and a floating-point addition is performed.
• A type error is the application of an operator to an operand of an inappropriate type.
• Ex: in early versions of C, if an int value was passed to a function that expected a float value, a type error
would occur (compilers did not check the types of parameters)
• If all type bindings are static, nearly all type checking can be static.
• If type bindings are dynamic, type checking must be dynamic and done at run-time.
Strong Typing
• A programming language is strongly typed if type errors are always detected. It requires
that the types of all operands can be determined, either at compile time or run time.
• Advantage of strong typing: allows the detection of the misuses of variables that result in
type errors.
• Java and C# are strongly typed. Types can be explicitly cast, which could result in a type
error. However, there are no implicit ways type errors can go undetected.
• The coercion rules of a language have an important effect on the value of type checking.
• Coercion results in a loss of part of the reason of strong typing – error detection.
• Ex:
int a, b; float d;
a + d; // the programmer meant a + b, however
– The compiler would not detect this error. Var a would be coerced to float.
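The same trade-off can be seen in Python, used here only as an illustration with invented values. Its type checking is dynamic rather than done by a compiler, but the effect of coercion on error detection is the same: mixing an int with a float is coerced silently, whereas a combination with no coercion rule is reported as a type error.

```python
a = 2
b = 3
d = 0.5

intended = a + b      # int + int stays int
mistyped = a + d      # int + float: a is implicitly converted, so no error is reported

try:
    "total: " + a     # str + int: no coercion is defined, so a type error is raised
    detected = False
except TypeError:
    detected = True
```

The mistyped expression produces a float result without any warning, so the slip goes unnoticed; only the operation without a coercion rule is caught.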
Scope
– The scope of a var is the range of statements in which the var is visible.
– A var is visible in a statement if it can be referenced in that statement.
– A var is local in a program unit or block if it is declared there.
– Non-local vars of a program unit or block are those that are visible within the program unit
or block but are not declared there.
Static Scope
– Binding names to non-local vars is called static scoping.
– There are two categories of static scoped languages:
those in which subprograms can be nested, and
those in which subprograms can’t be nested.
– Ada and JavaScript allow nested subprograms, but the C-based languages do not.
– When a compiler for static-scoped language finds a reference to a var, the attributes of the
var are determined by finding the statement in which it was declared.
– Ex: Suppose a reference is made to a var x in subprogram Sub1. The correct declaration is
found by first searching the declarations of subprogram Sub1.
– If no declaration is found for the var there, the search continues in the declarations of the
subprogram that declared subprogram Sub1, which is called its static parent.
– If a declaration of x is not found there, the search continues to the next larger enclosing
unit (the unit that declared Sub1’s parent), and so forth, until a declaration for x is found or
the largest unit’s declarations have been searched without success. In that case, an undeclared var
error has been detected.
– The static parent of subprogram Sub1, and its static parent, and so forth up to and including
the main program, are called the static ancestors of Sub1.
– Ex: Ada procedure:
procedure Big is
  X : Integer;
  procedure Sub1 is
  begin -- of Sub1
    … X …
  end; -- of Sub1
  procedure Sub2 is
    X : Integer;
  begin -- of Sub2
    … X …
  end; -- of Sub2
begin -- of Big
  …
end; -- of Big
– Under static scoping, the reference to the var X in Sub1 is to the X declared in the
procedure Big.
– This is true b/c the search for X begins in the procedure in which the reference occurs,
Sub1, but no declaration for X is found there.
– The search thus continues in the static parent of Sub1, Big, where the declaration of X is
found.
– Ex: Skeletal C#
– The reference to count in the while loop is to that loop’s local count. The count of sub is
hidden from the code inside the while loop.
– A declaration for a var effectively hides any declaration of a var with the same name in a
larger enclosing scope.
– C++ and Ada allow access to these "hidden" variables
In Ada: Main.X
In C++: class_name::name
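Static scoping and hiding can be tried out with nested functions in Python, which is also statically scoped. The sketch below mirrors the shape of the Big, Sub1, and Sub2 example; the lowercase function names and the string values are made up for illustration.

```python
def big():
    x = "x declared in big"

    def sub1():
        return x                  # no local x: resolved in the static parent, big

    def sub2():
        x = "x declared in sub2"  # hides big's x inside sub2 only
        return x, sub1()          # sub1 still sees big's x, even when called from sub2

    return sub1(), sub2()

from_sub1, (from_sub2, sub1_called_from_sub2) = big()
```

The reference in `sub1` resolves to the declaration in `big` no matter who calls it, because the search follows the textual nesting rather than the call chain.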
Blocks
– Allows a section of code to have its own local vars whose scope is minimized.
– Such vars are stack dynamic, so they have their storage allocated when the section is
entered and deallocated when the section is exited.
– From ALGOL 60:
– Ex:
C and C++:
for (...) {
  int index;
  ...
}
Ada:
declare
  LCL : FLOAT;
begin
  ...
end
Dynamic Scope
The scope of variables in APL, SNOBOL4, and the early versions of LISP is dynamic.
Based on calling sequences of program units, not their textual layout (temporal versus
spatial) and thus the scope is determined at run time.
References to variables are connected to declarations by searching back through the chain
of subprogram calls that forced execution to this point.
Ex:
procedure Big is
  X : Integer;
  procedure Sub1 is
  begin -- of Sub1
    … X …
  end; -- of Sub1
  procedure Sub2 is
    X : Integer;
  begin -- of Sub2
    … X …
  end; -- of Sub2
begin -- of Big
  …
end; -- of Big
• Big calls Sub1
o The dynamic parent of Sub1 is Big. The reference is to the X in Big.
• Big calls Sub2 and Sub2 calls Sub1
o The search proceeds from the local procedure, Sub1, to its caller, Sub2, where a
declaration of X is found.
Note that if static scoping was used, in either calling sequence the reference to X in Sub1
would be to Big’s X.
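The two lookup rules for this example can be simulated in a few lines of Python. This is a sketch under simplifying assumptions: the declarations and the textual nesting of Big, Sub1, and Sub2 are encoded by hand in two dictionaries, and the function names are hypothetical.

```python
DECLARATIONS = {"Big": {"X"}, "Sub1": set(), "Sub2": {"X"}}   # names declared per unit
STATIC_PARENT = {"Big": None, "Sub1": "Big", "Sub2": "Big"}    # textual nesting

def resolve_dynamic(name, call_chain):
    """Search the active units from the most recent call back to the oldest."""
    for unit in reversed(call_chain):
        if name in DECLARATIONS[unit]:
            return unit
    raise NameError(name)

def resolve_static(name, unit):
    """Search the unit, then its static parent, and so on outward."""
    while unit is not None:
        if name in DECLARATIONS[unit]:
            return unit
        unit = STATIC_PARENT[unit]
    raise NameError(name)
```

With the call chain Big, Sub2, Sub1, `resolve_dynamic` finds the X of Sub2, whereas `resolve_static("X", "Sub1")` always finds the X of Big.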
Scope and Lifetime
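Ex:
void printheader()
{
  …
} /* end of printheader */
void compute()
{
  int sum;
  …
  printheader();
} /* end of compute */
The scope of sum is contained within compute and does not extend to the body of
printheader, but the lifetime of sum extends over the time during which printheader executes.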
Referencing environment
It is the collection of all names that are visible in the statement.
• In a static-scoped language, it is the local variables plus all of the visible variables in
all of the enclosing scopes.
• The referencing environment of a statement is needed while that statement is being
compiled, so code and data structures can be created to allow references to non-local
vars in both static and dynamic scoped languages.
• A subprogram is active if its execution has begun but has not yet terminated.
• In a dynamic-scoped language, the referencing environment is the local variables plus
all visible variables in all active subprograms.
• Ex, Ada, static-scoped language
procedure Example is
  A, B : Integer;
  …
  procedure Sub1 is
    X, Y : Integer;
  begin -- of Sub1
    … -> 1
  end; -- of Sub1
  procedure Sub2 is
    X : Integer;
    …
    procedure Sub3 is
      X : Integer;
    begin -- of Sub3
      … -> 2
    end; -- of Sub3
  begin -- of Sub2
    … -> 3
  end; -- of Sub2
begin
  … -> 4
end; -- of Example
void sub1( )
{
  int a, b;
  … 1
} /* end of sub1 */
void sub2( )
{
  int b, c;
  … 2
  sub1( );
} /* end of sub2 */
void main ( )
{
  int c, d;
  … 3
  sub2( );
} /* end of main */
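For the sub1, sub2, and main skeleton, the referencing environment at each numbered point can be computed mechanically if dynamic scoping is assumed and the only calls are main calling sub2 and sub2 calling sub1. The Python sketch below encodes the local declarations by hand; the function name `referencing_environment` is invented for the illustration.

```python
LOCALS = {"main": ["c", "d"], "sub2": ["b", "c"], "sub1": ["a", "b"]}

def referencing_environment(active_chain):
    """Map each visible name to the unit whose declaration it refers to.

    active_chain lists the active subprograms from oldest to most recent call.
    """
    visible = {}
    for unit in reversed(active_chain):      # most recent activation first
        for name in LOCALS[unit]:
            visible.setdefault(name, unit)   # a nearer declaration hides farther ones
    return visible

point1 = referencing_environment(["main", "sub2", "sub1"])
point2 = referencing_environment(["main", "sub2"])
point3 = referencing_environment(["main"])
```

At point 1 the visible variables are a and b of sub1, c of sub2, and d of main; the c of main and the b of sub2 are hidden by nearer declarations.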
Variable Initialization
• The binding of a variable to a value at the time it is bound to storage is called
initialization.
• Initialization is often done on the declaration statement.
• Ex, Java
int sum = 0;
Data Type
Introduction
• A data type defines a collection of data objects and a set of predefined operations on
those objects.
• Computer programs produce results by manipulating data.
• ALGOL 68 provided a few basic types and a few flexible structure-defining operators that
allow a programmer to design a data structure for each need.
• A descriptor is the collection of the attributes of a variable.
• In an implementation a descriptor is a collection of memory cells that store variable
attributes.
• If the attributes are static, descriptors are required only at compile time.
• They are built by the compiler, usually as a part of the symbol table, and are used during
compilation.
• For dynamic attributes, part or all of the descriptor must be maintained during execution.
• Descriptors are used for type checking and by allocation and deallocation operations.
Numeric Types
1. Integer
– Almost always an exact reflection of the hardware, so the mapping is trivial.
– There may be as many as eight different integer types in a language.
– Java has four: byte, short, int, and long.
– Integer types are supported by the hardware.
2. Floating-point
– Model real numbers, but only as approximations for most real values.
– On most computers, floating-point numbers are stored in binary, which exacerbates the
problem.
– Another problem is the loss of accuracy through arithmetic operations.
– Languages for scientific use support at least two floating-point types; sometimes more
(e.g. float, and double.)
– The collection of values that can be represented by a floating-point type is defined in
terms of precision and range.
– Precision is the accuracy of the fractional part of a value, measured as the number of
bits. Figure below shows single and double precision.
– Range is the range of fractions and exponents.
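The approximation and accumulated-error problems are easy to observe. The Python sketch below assumes the usual case in which a Python float is an IEEE 754 double-precision value; the variable names are illustrative.

```python
import math
import sys

total = 0.1 + 0.2
exactly_equal = (total == 0.3)            # False: 0.1, 0.2 and 0.3 have no exact binary form
close_enough = math.isclose(total, 0.3)   # comparison with a tolerance instead

# Loss of accuracy through repeated arithmetic
accumulated = sum([0.1] * 10)             # mathematically 1.0

precision_bits = sys.float_info.mant_dig  # bits of precision in a Python float
largest = sys.float_info.max              # upper end of the representable range
```

Comparing with a tolerance, as `math.isclose` does, is the usual way to cope with the fact that most decimal fractions are stored only approximately.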
3. Decimal
– Most larger computers that are designed to support business applications have hardware
support for decimal data types.
– Decimal types store a fixed number of decimal digits, with the decimal point at a fixed
position in the value.
– These are the primary data types for business data processing and are therefore essential
to COBOL.
– Advantage: accuracy of decimal values.
– Disadvantages: limited range since no exponents are allowed, and its representation
wastes memory.
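The accuracy advantage can be illustrated with Python's standard `decimal` module. This is a software type with configurable precision, not the fixed-position hardware decimal described above, so treat it only as an analogy; the prices are invented.

```python
from decimal import Decimal

binary_sum = 0.10 + 0.20                          # binary floating point
decimal_sum = Decimal("0.10") + Decimal("0.20")   # decimal digits stored exactly

price = Decimal("19.99")
total = price * 3          # exact decimal arithmetic
```

The decimal sum equals `Decimal("0.30")` exactly, while the binary floating-point sum differs slightly from 0.30, which is why decimal types are preferred for monetary values.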
Boolean Types
– Introduced by ALGOL 60.
– They are used to represent switches and flags in programs.
– The use of Booleans enhances readability.
– One popular exception is C89, in which numeric expressions are used as conditionals.
In such expressions, all operands with nonzero values are considered true, and zero is
considered false.
Character Types
– Char types are stored as numeric codings (ASCII / Unicode).
– Traditionally, the most commonly used coding was the 8-bit code ASCII (American
Standard Code for Information Interchange).
– A 16-bit character set named Unicode has been developed as an alternative.
– Java was the first widely used language to use the Unicode character set. Since then, it
has found its way into JavaScript and C#.
– In C, for example, after the declaration char *str = "apples"; str is a char pointer set
to point at the string of characters, apples0, where 0 is the null char.
String Typical Operations:
– Assignment
– Comparison (=, >, etc.)
– Catenation
– Substring reference
– Pattern matching
– Some of the most commonly used library functions for character strings in C and
C++ are
o strcpy: copies strings
o strcat: catenates one given string onto another
o strcmp: lexicographically compares two strings (by the order of their character codes)
o strlen: returns the number of characters, not counting the null
– In Java, strings are supported by the String class; they are objects rather than values
of a primitive type.
String Length Options
– Static Length String: The length can be static and set when the string is created. This
is the choice for the immutable objects of Java’s String class as well as similar classes
in the C++ standard class library and the .NET class library available to C#.
– Limited Dynamic Length Strings: allow strings to have varying length up to a
declared and fixed maximum set by the variable’s definition, as exemplified by the
strings in C.
– Dynamic Length Strings: allow strings to have varying length with no maximum. This
requires the overhead of dynamic storage allocation and deallocation but provides flexibility.
Ex: Perl and JavaScript.
Evaluation
– Aid to writability.
– As a primitive type with static length, they are inexpensive to provide--why not have
them?
– Dynamic length is nice, but is it worth the expense?
– Limited dynamic length Strings - may need a run-time descriptor for length to
store both the fixed maximum length and the current length (but not in C and C++
because the end of a string is marked with the null character).
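A run-time descriptor for a limited dynamic length string can be sketched as a small record holding the fixed maximum and the current length. The Python class below is hypothetical (its name and fields are assumptions made for illustration) and only mimics what a language implementation would keep internally.

```python
class LimitedDynamicString:
    """Hypothetical limited dynamic length string with a run-time descriptor."""

    def __init__(self, max_length, text=""):
        self.max_length = max_length   # fixed maximum, set by the definition
        self.current_length = 0        # changes whenever a new value is assigned
        self.chars = ""                # the stored characters
        self.assign(text)

    def assign(self, text):
        # The descriptor is consulted on every assignment.
        if len(text) > self.max_length:
            raise ValueError("string exceeds the declared maximum length")
        self.chars = text
        self.current_length = len(text)


name = LimitedDynamicString(max_length=10, text="apples")
name.assign("kiwi")   # the length varies, but never beyond max_length
```

A C-style implementation avoids storing current_length by scanning for the null character instead, at the cost of a linear-time length computation.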
Figure: Run-time descriptor for limited dynamic strings
– An ordinal type is one in which the range of possible values can be easily associated
with the set of positive integers.
– Examples of primitive ordinal types in Java
– integer
– char
– boolean
– In some languages, users can define two kinds of ordinal types: enumeration and
subrange.
Enumeration Types
– All possible values, which are named constants, are provided in the definition
– C++ example
– The enumeration constants are typically implicitly assigned the integer values, 0, 1, …,
but can be explicitly assigned any integer literal.
– Early versions of Java did not include an enumeration type, presumably because
enumerations can be represented as data classes; an enum type was added in Java 5.0.
Subrange Types
– An ordered contiguous subsequence of an ordinal type
– Example: 12..18 is a subrange of integer type
– Ada’s design
type Days is (mon, tue, wed, thu, fri, sat, sun);
subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;
Day1: Days;
Day2: Weekdays;
Day2 := Day1;  -- legal only if Day1 is in mon..fri; otherwise Constraint_Error is raised
Array Types
An array is an aggregate of homogeneous data elements in which an individual element is
identified by its position in the aggregate, relative to the first element.
A reference to an array element in a program often includes one or more non-constant
subscripts.
Such references require a run-time calculation to determine the memory location being
referenced.
• Ex: Ada
In Ada, a reference of the form B(I) could be either an array element or a function call.
Because ( ) are used for both subprogram parameters and array subscripts in Ada, this
results in reduced readability.
Subscript Bindings and Array Categories
• The binding of subscript type to an array variable is usually static, but the subscript
value ranges are sometimes dynamically bound.
• In C-based languages, the lower bound of all index ranges is fixed at 0; in Fortran 95, it
defaults to 1.
1. A static array is one in which the subscript ranges are statically bound and storage
allocation is static (done before run time).
• Advantage: efficiency (no dynamic allocation or deallocation is required).
• Ex:
Arrays declared in C and C++ functions with the static modifier are static.
2. A fixed stack-dynamic array is one in which the subscript ranges are statically bound, but
the allocation is done at elaboration time during execution.
• Advantage: space efficiency. A large array in one subprogram can use the same space
as a large array in a different subprogram, as long as both subprograms are not active at
the same time.
• Ex:
Arrays declared in C and C++ functions without the static modifier are fixed
stack-dynamic arrays.
3. A stack-dynamic array is one in which the subscript ranges are dynamically bound, and
the storage allocation is dynamic “during execution.” Once bound they remain fixed
during the lifetime of the variable.
• Advantages: Flexibility. The size of the array is not known until the array is about to be
used.
• Ex:
Ada arrays can be stack dynamic:
Get (List_Len);
declare
   List : array (1..List_Len) of Integer;
begin
   . . .
end;
The user inputs the number of desired elements for array List. The elements are then
dynamically allocated when execution reaches the declare block. When execution
reaches the end of the block, the array is deallocated.
4. A fixed heap-dynamic array is one in which the subscript ranges are dynamically bound,
and the storage allocation is dynamic, but they are both fixed after storage is allocated.
• The bindings are done when the user program requests them, rather than at elaboration
time, and the storage is allocated on the heap, rather than the stack.
• Ex:
C and C++ also provide fixed heap-dynamic arrays. The functions malloc and free
are used in C. The operations new and delete are used in C++.
In Java, all arrays are fixed heap-dynamic arrays. Once created, they keep the same
subscript ranges and storage.
5. A heap-dynamic array is one in which the subscript ranges are dynamically bound, and
the storage allocation is dynamic, and can change any number of times during the array’s
lifetime.
• Advantages: Flexibility. Arrays can grow and shrink during program execution as the
need for space changes.
• Ex:
C# provides heap-dynamic arrays using an array class ArrayList.
Array Initialization
Usually just a list of values that are put in the array in the order in which the array elements
are stored in memory.
Fortran uses the DATA statement, or puts the values in / ... / on the declaration.
C and C++ put the values in braces and let the compiler count them.
When a char array is initialized with a string literal, the compiler implicitly appends the
null character, so a seven-character literal produces an array of 8 elements.
In Java, similar syntax is used to define and initialize an array of references to String objects:
String[] names = {"Bob", "Jake", "Debbie"};
Implementation of Arrays
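An access function maps subscript expressions to an address in the array.
Access function for single-dimensioned arrays:
address(list[k]) = address(list[lower_bound]) + ((k - lower_bound) * element_size)
Multidimensional arrays are stored in either row major or column major order. Consider
the matrix
3 4 7
6 2 5
1 3 8
o If the matrix were stored in row major order, it would have the following order in memory.
3, 4, 7, 6, 2, 5, 1, 3, 8
o If the example matrix above were stored in column major, it would have the
following order in memory.
3, 6, 1, 4, 2, 3, 7, 5, 8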
o In all cases, sequential access to matrix elements will be faster if they are accessed
in the order in which they are stored, because that will minimize the paging.
(Paging is the movement of blocks of information between disk and main memory.
The objective of paging is to keep the frequently needed parts of the program in
memory and the rest on disk.)
Locating an Element in a Multi-dimensioned Array (row major)
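The access functions and the two storage orders can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function and parameter names are invented here, and the 3-by-3 matrix is the one implied by the row major and column major sequences listed above.

```python
def address_1d(base, k, lower_bound, element_size):
    """Address of list[k] for a single-dimensioned array."""
    return base + (k - lower_bound) * element_size


def address_row_major(base, i, j, row_lb, col_lb, n_cols, element_size):
    """Address of a[i][j] when rows are stored one after another."""
    return base + ((i - row_lb) * n_cols + (j - col_lb)) * element_size


matrix = [[3, 4, 7],
          [6, 2, 5],
          [1, 3, 8]]

# Row major: all of row 0, then row 1, then row 2.
row_major = [value for row in matrix for value in row]

# Column major: all of column 0, then column 1, then column 2.
column_major = [matrix[i][j] for j in range(3) for i in range(3)]

# With base 0 and element size 1, the "address" is an index into row_major.
# Lower bounds of 1 are assumed, so (2, 1) means second row, first column.
offset = address_row_major(0, 2, 1, 1, 1, 3, 1)
element = row_major[offset]   # 6
```

Skipping (i - row_lb) complete rows and then (j - col_lb) elements is what the row major formula computes.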
Associative Arrays
An associative array is an unordered collection of data elements that are indexed by an
equal number of values called keys.
So each element of an associative array is in fact a pair of entities, a key and a value.
Associative arrays are supported by the standard class libraries of Java and C++, and are
built into Perl.
Example: In Perl, associative arrays are often called hashes. Names begin with %; literals
are delimited by parentheses
%temps = ("Mon" => 77, "Tue" => 79, "Wed" => 65);
o Subscripting is done using braces and keys
$temps{"Wed"} = 83;
o Elements can be removed with delete
delete $temps{"Tue"};
o The entire hash can be emptied by assigning it the empty list
%temps = ();
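For comparison, the same sequence of operations can be sketched with a Python dictionary, which is that language's built-in associative array. The variable names mirror the Perl example; the snapshot variable is added here only so the intermediate state can be inspected.

```python
# Create the associative array from key/value pairs.
temps = {"Mon": 77, "Tue": 79, "Wed": 65}

# Subscripting uses the key in place of an integer index.
temps["Wed"] = 83

# Remove a single element.
del temps["Tue"]
snapshot = dict(temps)   # copy of the state before emptying

# Empty the whole structure.
temps.clear()
```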
Record Types
01 EMP-REC.
02 EMP-NAME.
05 FIRST PIC X(20).
05 MID PIC X(10).
05 LAST PIC X(20).
02 HOURLY-RATE PIC 99V99.
• Ex: Ada
type Emp_Rec_Type is record
   First: String (1..20);
   Mid: String (1..10);
   Last: String (1..20);
   Hourly_Rate: Float;
end record;
Emp_Rec: Emp_Rec_Type;
References to Records
Most languages use dot notation
Emp_Rec.Name
Fully qualified references must include all record names
Elliptical references allow leaving out record names as long as the reference is
unambiguous, for example in COBOL
FIRST, FIRST OF EMP-NAME, and FIRST OF EMP-REC are elliptical references to the
employee's first name
Operations on Records
Assignment is very common if the types are identical
Ada allows record comparison
Ada records can be initialized with aggregate literals
COBOL provides MOVE CORRESPONDING
o Copies a field of the source record to the corresponding field in the target record
Union Types
A union is a type whose variables are allowed to store different type values at different times
during execution.
      Leftside, Rightside: Integer;
      Angle: Float;
   when Rectangle => Side1, Side2: Integer;
end case;
end record;
Pointer Operations
A pointer type usually includes two fundamental pointer operations, assignment and
dereferencing.
Assignment sets a pointer var’s value to some useful address.
Dereferencing takes a reference through one level of indirection.
In C++, dereferencing is explicitly specified with the (*) as a prefix unary operation.
If ptr is a pointer var with the value 7080, and the cell whose address is 7080 has the
value 206, then the assignment
j = *ptr
sets j to 206.
Figure: The assignment operation j = *ptr
Pointer Problems
1. Dangling pointers (dangerous)
– A pointer points to a heap-dynamic variable that has been deallocated.
– Dangling pointers are dangerous for the following reasons:
– The location being pointed to may have been allocated to some new heap-
dynamic var. If the new var is not the same type as the old one, type checks of
uses of the dangling pointer are invalid.
– Even if the new one is the same type, its new value will bear no relationship to
the old pointer’s dereferenced value.
– If the dangling pointer is used to change the heap-dynamic variable, the value of
the heap-dynamic variable will be destroyed.
– It is possible that the location now is being temporarily used by the storage
management system, possibly as a pointer in a chain of available blocks of
storage, thereby allowing a change to the location to cause the storage manager
to fail.
– The following sequence of operations creates a dangling pointer in many
languages:
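One commonly described sequence is: create a heap-dynamic variable and point p1 at it, copy p1 into p2, then deallocate the variable through p1. The sketch below simulates that sequence in Python with a toy heap. The ToyHeap class, its reuse policy, and the starting address 7080 are assumptions made purely for illustration; Python itself has no explicit deallocation.

```python
class ToyHeap:
    """Hypothetical heap that reuses a cell as soon as it has been freed."""

    def __init__(self):
        self.cells = {}            # address -> stored value
        self.free_list = []        # addresses available for reuse
        self.next_address = 7080

    def allocate(self, value):
        if self.free_list:
            address = self.free_list.pop()
        else:
            address = self.next_address
            self.next_address += 1
        self.cells[address] = value
        return address

    def deallocate(self, address):
        del self.cells[address]
        self.free_list.append(address)


heap = ToyHeap()
p1 = heap.allocate(206)          # 1. new heap-dynamic variable, p1 points at it
p2 = p1                          # 2. p2 is assigned p1's value
heap.deallocate(p1)              # 3. the variable is deallocated through p1
p1 = None                        #    p1 is cleared, but p2 is unchanged

p3 = heap.allocate("new data")   # the storage manager reuses the freed cell
dangling_value = heap.cells[p2]  # p2 now sees an unrelated value
```

After step 3, p2 is a dangling pointer: it still holds the old address, and once the cell is reused its dereferenced value bears no relationship to the original 206.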
Pointers in Ada
– Some dangling pointers are disallowed because dynamic objects can be automatically
de-allocated at the end of pointer's type scope
– The lost heap-dynamic variable problem is not eliminated by Ada
Pointers in Fortran 95
– Pointers point to heap and non-heap variables
– Implicit dereferencing
– Pointers can only point to variables that have the TARGET attribute
– The TARGET attribute is assigned in the declaration:
count = init;
int result = 0;
int &ref_result = result;
…
ref_result = 100;
– In Java, reference variables are extended from their C++ form to one that allows them
to replace pointers entirely.
– The fundamental difference between C++ pointers and Java references is that C++
pointers refer to memory addresses, whereas Java references refer to class instances.
– Because Java class instances are implicitly deallocated (there is no explicit deallocation
operator), there cannot be a dangling reference.
– C# includes both the references of Java and the pointers of C++. However, the use of
pointers is strongly discouraged. In fact, any method that uses pointers must include the
unsafe modifier.
– Pointers can point at any variable regardless of when it was allocated
Lesson Six:
Arithmetic Expressions
Their evaluation was one of the motivations for the development of the first programming
languages.
Most of the characteristics of arithmetic expressions in programming languages were
inherited from conventions that had evolved in math.
Arithmetic expressions consist of operators, operands, parentheses, and function calls.
The operators can be unary or binary. C-based languages include a ternary operator,
which has three operands (conditional expression).
The purpose of an arithmetic expression is to specify an arithmetic computation.
An implementation of such a computation must cause two actions:
o Fetching the operands from memory
o Executing the arithmetic operations on those operands.
Design issues for arithmetic expressions:
1. What are the operator precedence rules?
2. What are the operator associativity rules?
3. What is the order of operand evaluation?
4. Are there restrictions on operand evaluation side effects?
5. Does the language allow user-defined operator overloading?
6. What mode mixing is allowed in expressions?
A + (- B) * C // is legal
A + - B * C // is illegal
2. Associativity
The operator associativity rules for expression evaluation define the order in which
adjacent operators with the same precedence level are evaluated. An operator can be
either left or right associative.
Typical associativity rules:
o Left to right, except **, which is right to left
o Sometimes unary operators associate right to left (e.g., FORTRAN)
Ex: (Java)
a - b + c // left to right
Ex: (Fortran)
A ** B ** C // right to left
(A ** B) ** C // In Ada it must be parenthesized
APL is different; all operators have equal precedence and all operators associate right
to left.
Ex:
A × B + C // with A = 3, B = 4, C = 5, the result is 27
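These associativity rules can be checked with a short Python sketch. Python happens to follow the typical rules (left to right, except ** which is right to left), so the APL ordering is imitated with explicit parentheses. The variable names are illustrative.

```python
# Left associativity: a - b + c is grouped as (a - b) + c.
left_to_right = 10 - 4 + 3          # 9
other_grouping = 10 - (4 + 3)       # 3

# Right associativity of exponentiation: A ** B ** C is A ** (B ** C).
power_right = 2 ** 3 ** 2           # 2 ** 9 = 512
power_left = (2 ** 3) ** 2          # 8 ** 2 = 64

# APL rule: equal precedence, right to left, so A x B + C is A x (B + C).
a, b, c = 3, 4, 5
apl_result = a * (b + c)            # 27
conventional_result = a * b + c     # 17 under the usual precedence rules
```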
3. Parentheses
Programmers can alter the precedence and associativity rules by placing parentheses in
expressions.
A parenthesized part of an expression has precedence over its adjacent un-parenthesized
parts.
Ex:
(A + B) * C
4. Conditional Expressions
Sometimes if-then-else statements are used to perform a conditional expression
assignment.
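if (count == 0)
    average = 0;
else
    average = sum / count;
In C-based languages, the same assignment can be written with a conditional expression:
average = (count == 0) ? 0 : sum / count;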
Ex:
a + fun(a)
If fun does not have the side effect of changing a, then the order of evaluation of the
two operands, a and fun(a), has no effect on the value of the expression.
However, if fun changes a, there is an effect.
Ex:
Consider the following situation: fun returns the value of its argument divided by 2
and changes its parameter to have the value 20, and:
a = 10;
b = a + fun(a);
If the value of a is fetched first (in the expression evaluation process), its value is
10 and the value of the expression is 15.
But if the second operand is evaluated first, then the value of the first operand is 20 and
the value of the expression is 25.
The following C program illustrates the same problem.
int a = 5;
int fun1() {
    a = 17;   /* side effect on the global a */
    return 3;
}
void fun2() {
    a = a + fun1();   /* C: a may be 8 or 20; Java: a = 8 */
}
int main(void) {
    fun2();
    return 0;
}
The value computed for a in fun2 depends on the order of evaluation of the
operands in the expression a + fun1(). The value of a will be either 8 or 20.
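The effect of operand evaluation order can be reproduced in Python, which evaluates operands left to right. The other order is obtained by swapping the operands. In this sketch the function fun and the dictionary box are hypothetical stand-ins: box plays the role of a two-way parameter, because Python cannot rebind a caller's integer variable.

```python
def fun(box):
    """Return the boxed value divided by 2 and, as a side effect, set it to 20."""
    result = box["a"] // 2
    box["a"] = 20
    return result


# First operand fetched before the call: 10 + 5.
box = {"a": 10}
fetched_first = box["a"] + fun(box)     # 15

# Call evaluated before the other operand is fetched: 5 + 20.
box = {"a": 10}
called_first = fun(box) + box["a"]      # 25
```

The two results, 15 and 25, are the two values discussed for b = a + fun(a).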
Two possible solutions:
1. Write the language definition to disallow functional side effects
o No two-way parameters in functions.
o No non-local references in functions.
o Advantage: it works!
o Disadvantage: Programmers want the flexibility of two-way parameters
(what about C?) and non-local references.
2. Write the language definition to demand that operand evaluation order be
fixed
o Disadvantage: limits some compiler optimizations
Java guarantees that operands are evaluated in left-to-right order, eliminating this
problem. For the program above, Java gives a = 8, whereas a C implementation that
calls fun1 before fetching a gives a = 20.
Overloaded Operators
The use of an operator for more than one purpose is operator overloading.
Some are common (e.g., + for int and float).
Java uses + for addition and for string catenation.
Some are potential trouble (e.g., & in C and C++)
x = &y // as a unary operator, & yields the address of y;
// as a binary operator, it is bitwise logical AND
Coercion in Expressions
• A mixed-mode expression is one that has operands of different types.
• A coercion is an implicit type conversion.
• The disadvantage of coercions:
They decrease the type error detection ability of the compiler.
• In most languages, all numeric types are coerced in expressions, using widening
conversions.
• Languages are not in agreement on the issue of coercions in arithmetic expressions.
• Those against a broad range of coercions are concerned with the reliability problems
that can result from such coercions, because they eliminate the benefits of type
checking.
• Those who would rather include a wide range of coercions are more concerned with the
loss in flexibility that results from restrictions.
• The issue is whether programmers should be concerned with this category of errors or
whether the compiler should detect them.
• Java method Ex:
void mymethod() {
    int a, b, c;
    float d;
    …
    a = b * d;
    …
}
Java:
Errors in Expressions
• Caused by:
– Inherent limitations of arithmetic e.g. division by zero
– Limitations of computer arithmetic e.g. overflow or underflow
• Floating-point overflow and underflow, and division by zero are examples of run-time
errors, which are sometimes called exceptions.
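How such run-time errors surface depends on the language. As a hedged illustration, the Python sketch below shows division by zero and floating-point overflow raised as exceptions, while underflow silently produces zero. The helper names safe_divide and safe_power are invented for this example.

```python
def safe_divide(x, y):
    """Return x / y, or None if the division is undefined."""
    try:
        return x / y
    except ZeroDivisionError:      # inherent limitation of arithmetic
        return None


def safe_power(base, exponent):
    """Return base ** exponent, or None if the result cannot be represented."""
    try:
        return base ** exponent
    except OverflowError:          # limitation of computer arithmetic
        return None


division_result = safe_divide(1.0, 0.0)      # None: division by zero was raised
overflow_result = safe_power(2.0, 10000)     # None: too large for a double
underflow_result = 1e-200 * 1e-200           # 0.0: underflow raises no exception here
```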
Relational and Boolean Expressions
• A relational operator: an operator that compares the values of its two operands.
• Relational Expressions: two operands and one relational operator.
• The value of a relational expression is Boolean, unless Boolean is not a type included in
the language.
– Use relational operators and operands of various types.
– Operator symbols used vary somewhat among languages (!=, /=, .NE., <>, #)
• The syntax of the relational operators available in some common languages is as
follows:
Operation                Ada    C-Based Languages    Fortran 95
Equal                    =      ==                   .EQ. or ==
Not equal                /=     !=                   .NE. or /=
Greater than             >      >                    .GT. or >
Less than                <      <                    .LT. or <
Greater than or equal    >=     >=                   .GE. or >=
Less than or equal       <=     <=                   .LE. or <=
Boolean Expressions
• Operands are Boolean and the result is Boolean.
• Versions of C prior to C99 have no Boolean type; they use the int type, with 0 for false
and nonzero for true.
• One odd characteristic of C’s expressions:
a < b < c is a legal expression, but the result is not what you might expect.
• The leftmost operator is evaluated first because the relational operators of C are left
associative, producing either 0 or 1.
• Then this result is compared with the variable c. There is never a comparison between b
and c.
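The C behavior can be imitated step by step. The sketch below is written in Python, whose own a < b < c is a chained comparison and therefore does not behave like C, so the C rule is spelled out explicitly in a hypothetical helper named c_style_chain.

```python
def c_style_chain(a, b, c):
    """Evaluate a < b < c the way C does: (a < b) yields 0 or 1, then compare with c."""
    first = 1 if a < b else 0
    return 1 if first < c else 0


# 5 < 4 is 0, and 0 < 3 is 1, so C reports "true" although 5 < 4 < 3 is false.
c_result = c_style_chain(5, 4, 3)

# The intended mathematical meaning compares a with b and b with c.
intended_result = (5 < 4) and (4 < 3)
```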
Short Circuit Evaluation
Ex: in the expression (a >= 0) && (b < 10), the result is false whenever a < 0, regardless
of the value of b.
So when a < 0, there is no need to evaluate b, the constant 10, the second relational
expression, or the && operation.
Unlike the case of arithmetic expressions, this shortcut can be easily discovered during
execution.
Short-circuit evaluation exposes the potential problem of side effects in expressions.
Conditional Targets
Ex:
a = a + b
The syntax of compound assignment operators is the catenation of the desired binary
operator to the = operator.
sum += value; ⇔ sum = sum + value;
sum = count ++; ⇔ sum = count; count = count + 1;
Assignment as an Expression
This design treats the assignment operator much like any other binary operator, except that
it has the side effect of changing its left operand.
Ex:
while ((ch = getchar()) != EOF) { ... }
The assignment must be parenthesized because the precedence of the assignment
operator is lower than that of the relational operators.
Disadvantage: Another kind of expression side effect which leads to expressions that are
difficult to read and understand. For example
a = b + (c = d / b++) - 1
There is a loss of error detection in the C design of the assignment operation that frequently
leads to program errors.
if (x = y) …
instead of
if (x == y) …
Mixed-Mode Assignment
In FORTRAN, C, and C++, any numeric value can be assigned to any numeric scalar
variable; whatever conversion is necessary is done.
In Pascal, integers can be assigned to reals, but reals cannot be assigned to integers (the
programmer must specify whether the conversion from real to integer is truncated or
rounded.)
In Java, only widening assignment coercions are done.
In Ada, there is no assignment coercion.
In all languages that allow mixed-mode assignment, the coercion takes place only after the
right-side expression has been evaluated. For example, consider the following code:
int a, b;
float c;
…
c = a / b;
Because c is float, the values of a and b could be coerced to float before the division, which
could produce a different value for c than if the coercion were delayed (for example, if a
were 2 and b were 3).
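The difference between the two coercion points can be worked out for a = 2 and b = 3. In this Python sketch, integer division (//) stands in for C's int division and float() stands in for the coercion. The variable names are illustrative.

```python
a, b = 2, 3

# Coercion delayed until after the right side is evaluated:
# the integer division gives 0, which is then converted to 0.0.
coerced_after = float(a // b)

# Coercion applied to the operands before the division:
# 2.0 / 3.0 gives approximately 0.667.
coerced_before = float(a) / float(b)
```

Languages that coerce only after evaluating the right side, as described above, assign 0.0 to c.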
Overloaded operators
In some programming languages an operator may be ad-hoc polymorphic, that is,
have definitions for more than one kind of data, (such as in Java where the + operator
is used both for the addition of numbers and for the concatenation of strings). Such
an operator is said to be overloaded. In languages that support operator overloading
by the programmer but have a limited set of operators, operator overloading is often
used to define customized uses for operators.
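Both built-in and user-defined overloading can be sketched in Python, where a class overloads + by defining the special method __add__. The Vector2 class below is a hypothetical example type, not something taken from the notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector2:
    """Hypothetical 2-D vector used to illustrate a user-defined + operator."""
    x: float
    y: float

    def __add__(self, other):
        # Called for the expression: self + other
        return Vector2(self.x + other.x, self.y + other.y)


number_sum = 1 + 2                            # built-in: numeric addition
text = "ab" + "cd"                            # built-in: string catenation
vector_sum = Vector2(1, 2) + Vector2(3, 4)    # user-defined: component-wise sum
```

The same symbol selects three different operations, chosen by the types of its operands.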
Short-Circuit Evaluation
Short-circuit evaluation, minimal evaluation, or McCarthy evaluation denotes the
semantics of some Boolean operators in some programming languages in which the
second argument is only executed or evaluated if the first argument does not suffice
to determine the value of the expression: when the first argument of the AND function
evaluates to false, the overall value must be false; and when the first argument of the
OR function evaluates to true, the overall value must be true. In some programming
languages (Lisp), the usual Boolean operators are short-circuit. In others (Java,
Ada), both short-circuit and standard Boolean operators are available.
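Python's and/or operators are short-circuit, which makes the behavior easy to observe. In this sketch the helper check records which operands were actually evaluated, and find shows a common use in which the first operand guards the second. Both function names are assumptions made for illustration.

```python
calls = []


def check(label, result):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return result


# AND: the first operand is false, so the second is never evaluated.
and_value = check("first", False) and check("second", True)

# OR: the first operand is true, so the second is never evaluated.
or_value = check("third", True) or check("fourth", False)


def find(values, key):
    """Return the index of key in values, or len(values) if it is absent."""
    index = 0
    # The subscript is evaluated only when index is known to be in range.
    while index < len(values) and values[index] != key:
        index += 1
    return index
```

Without short-circuit evaluation, the loop in find would attempt an out-of-range subscript whenever the key is absent.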