
CSC405 Org. of Prog Lang

The document outlines the course CSC 405 at Kaduna State University, focusing on the organization of programming languages, including their definitions, structures, paradigms, and evaluation criteria. It discusses the importance of programming languages in expressing algorithms and the characteristics that make a language effective, such as simplicity, clarity, and efficiency. Additionally, it covers the history of programming languages, syntax and semantics, and the significance of understanding programming language concepts for software development.

Uploaded by Osuwa Maryam
Copyright © All Rights Reserved

INSTITUTION: Kaduna State University, Kaduna

DEPARTMENT: Computer Science


COURSE CODE: CSC 405
COURSE TITLE: Organization of Programming Languages

Course Content: Language definition and structure; data types and structures, including
a review of basic types such as lists and trees; control structures and data flow;
run-time considerations; interpretive languages; lexical analysis and parsing.

Lesson 1:
Language definition and structure.
A programming language is a notation for writing programs, which are specifications
of a computation or algorithm. It is a formal language comprising a set of
instructions that produce various kinds of output, and it is used in computer
programming to create programs that implement specific algorithms. In other words, it
is an artificial language intended to be used by a person to express a process by
which a machine, particularly a computer, can solve a problem. Programming languages
can be used to create programs that control the behavior of a machine and/or to
express algorithms precisely.

A programming language is usually split into the two components of syntax (form) and
semantics (meaning). Some languages are defined by a specification document (for
example, the C programming language is specified by an ISO Standard), while other
languages, such as Perl, have a dominant implementation that is used as a
reference.

A language must help us write good programs. A good program is easy to read, easy to
understand, and easy to modify.

Languages can be grouped into four main programming paradigms:
1. Imperative
This paradigm is designed around the von Neumann architecture. Computation is
performed through statements that change a program’s state. Its central features are
variables, assignment statements, iteration, sequences of commands, and explicit state
updates via assignment. The imperative paradigm most closely resembles the actual
machine, so the programmer works much closer to the hardware; because of this
closeness, the imperative paradigm was, until relatively recently, the only one
efficient enough for widespread use. Examples of such languages are Fortran, ALGOL,
Pascal, C/C++, Java, Perl, JavaScript, and Visual Basic .NET.
2. Functional
Here, the main means of computation is applying functions to parameters. The
functional programming paradigm views all subprograms as functions in the
mathematical sense: informally, they take in arguments and return a single result.
The result is based entirely on the input, and the time at which a function is
called has no relevance. Examples are LISP, Scheme, ML, and Haskell. Functional
languages may also include OO (object-oriented) concepts.
3. Logic
This paradigm is rule-based (rules are specified in no particular order), and
computations are made through a process of logical inference. The logic paradigm takes
a declarative approach to problem-solving: various logical assertions about a
situation are made, establishing all known facts, and then queries are made. The role
of the computer becomes maintaining data and performing logical deduction.
Examples are PROLOG and CLIPS. Logic languages may also include OO concepts.

4. Object-Oriented Programming (OOP) is a paradigm in which real-world objects are each
viewed as separate entities having their own state, which is modified only by
procedures built into the object, called methods. Because objects operate
independently, they are encapsulated into modules that contain both local environments
and methods. Communication with an object is done by message passing. Objects are
organized into classes, from which they inherit methods and equivalent variables.

Each language has strengths and weaknesses depending on the task to be solved.
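To make the contrast between paradigms concrete, here is a minimal sketch in Python, which supports several of these styles without being a purely functional or purely object-oriented language. The same task, summing a list, is written imperatively and functionally, followed by a small class hierarchy; all function and class names are hypothetical. The logic paradigm is not shown, since it needs a rule engine such as Prolog's.

```python
from functools import reduce

# Imperative style: explicit state, updated by assignment inside a loop.
def imperative_sum(numbers):
    total = 0
    for n in numbers:
        total = total + n
    return total

# Functional style: no variable is reassigned; the result depends only on the input.
def functional_sum(numbers):
    if not numbers:
        return 0
    return numbers[0] + functional_sum(numbers[1:])   # recursion instead of a loop

# The same idea with a higher-order function.
def reduce_sum(numbers):
    return reduce(lambda acc, n: acc + n, numbers, 0)

# Object-oriented style: state lives inside objects and changes only through methods.
class Account:
    def __init__(self, owner, balance=0):
        self.owner = owner
        self._balance = balance            # "private" only by convention in Python

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def balance(self):
        return self._balance

class SavingsAccount(Account):
    # Inherits deposit() and balance() from Account and adds one method.
    def __init__(self, owner, balance=0, rate_percent=10):
        super().__init__(owner, balance)
        self.rate_percent = rate_percent

    def add_interest(self):
        interest = self._balance * self.rate_percent // 100
        if interest > 0:
            self.deposit(interest)

print(imperative_sum([1, 2, 3, 4]), functional_sum([1, 2, 3, 4]), reduce_sum([1, 2, 3, 4]))
# prints 10 10 10

acct = SavingsAccount("Amina", 100)
acct.deposit(50)         # sending the "deposit" message to the object
acct.add_interest()      # 10% of 150 is 15
print(acct.balance())    # prints 165
```

Note that Python enforces encapsulation only by convention; languages such as Java and C++ can make the balance field truly private.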

What makes a good language?


• Clarity, simplicity, and unity of language concepts
• Clarity of program syntax
• Naturalness for application
• Support for abstraction
• Ease of program verification
• Programming environment
• Portability of programs
• Cost of use:
— execution
— translation
— creation, testing, and use
— maintenance

Several characteristics believed to be important for making a programming language
good are:
Simplicity: A good programming language must be simple and easy to learn and use.
It should provide a programmer with a clear, simple, and unified set of concepts that
can be easily grasped. The overall simplicity of a programming language strongly
affects the readability of the programs written in that language, and programs that
are easier to read and understand are also easier to maintain. A simple language is
also easier to implement, whether through a compiler or an interpreter. However, the
power needed for the language should not be sacrificed for simplicity.
Naturalness: A good language should be natural for the application area for which it
has been designed. That is, it should provide appropriate operators, data structures,
control structures, and a natural syntax that help users code their problems easily
and efficiently.
Abstraction: Abstraction means the ability to define and then use complicated
structures or operations in ways that allow many of the details to be ignored. The
degree of abstraction allowed by a programming language directly affects its
writability. Object-oriented languages support a high degree of abstraction, which
generally makes writing programs in them easier. Because of this feature,
object-oriented languages also support reusability of program segments.
Efficiency: Programs written in a good programming language are efficiently
translated into machine code, are efficiently executed, and occupy as little memory
as possible. That is, a good programming language is supported by a good language
translator that gives due consideration to space and time efficiency.
Structured: The language should have the features necessary to allow its users to
write their programs based on the concepts of structured programming. Structured
programming also forces a programmer to look at a problem in a logical way, so that
fewer errors are made while writing a program for the problem.
Compactness: In a good programming language, programmers should be able to express
intended operations concisely. Programmers generally dislike verbose languages
because they have to write too much.
Locality: A good programming language should allow a programmer, while writing a
program, to concentrate almost solely on the part of the program around the
statement currently being worked on.
History and future of programming languages
The first programming languages (low-level languages) were developed in the 1950s:
Assembly and machine language
— tedious to write programs
— difficult to understand and debug
Languages have been continuously developed and improved, e.g. the object concept
(1980s), libraries and scripting (1990s), and system independence (late 1990s).
The level of abstraction of languages has increased continuously, from a very low
level (e.g. assembly language) to a very high level (e.g. XML).
Higher-level languages:
— language closer to problem description
— programs portable
— exchange software
— programs easier to understand

Reasons for Studying Concepts of Programming Languages


• Increased ability to express ideas.
• It is believed that the depth at which we think is influenced by the
expressive power of the language in which we communicate our
thoughts. It is difficult for people to conceptualize structures they can’t
describe, verbally or in writing.
• The language in which programmers develop software places limits on the kinds of
control structures, data structures, and abstractions they can use.
• Awareness of a wider variety of programming language features can reduce such
limitations in software development.
• Can language constructs be simulated in other languages that do not
support those constructs directly?

• Improved background for choosing appropriate languages


• Many programmers, when given a choice of languages for a new project,
continue to use the language with which they are most familiar, even if it
is poorly suited to the new project.
• If these programmers were familiar with other available languages, they
would be in a better position to make informed language choices.

• Greater ability to learn new languages

• Programming languages are still in a state of continuous evolution, which
means continuous learning is essential.
• Programmers who understand the concepts of OO programming will have an
easier time learning Java.

• Once a thorough understanding of the fundamental concepts of
languages is acquired, it becomes easier to see how concepts are
incorporated into the design of the language being learned.

• Understand significance of implementation


• Understanding of implementation issues leads to an understanding of
why languages are designed the way they are.
• This in turn leads to the ability to use a language more intelligently, as it
was designed to be used.

• Ability to design new languages


• The more languages you know, the better you understand programming language
concepts.

• Overall advancement of computing


• In some cases, a language became widely used, at least in part, because
those in positions to choose languages were not sufficiently familiar with
programming language concepts.
• Many believe that ALGOL 60 was a better language than Fortran;
however, Fortran was more widely used. This is often attributed to the fact
that programmers and managers did not understand the conceptual design
of ALGOL 60.
• Do you think IBM has something to do with it?

Criteria for Language Evaluation and Comparison


1. Expressivity: Expressivity means the ability of a language to clearly reflect the
meaning intended by the algorithm designer (the programmer). Thus an “expressive”
language permits an utterance to be compactly stated, and encourages the use of
statement forms associated with structured programming (usually “while” loops
and “if-then-else” statements).
2. Well-definedness: By “well-defined”, we mean that the language’s syntax and
semantics are free of ambiguity, internally consistent, and complete. Thus the
implementer of a well-defined language should have, within its definition, a complete
specification of all the language’s expressive forms and their meanings. By the same
token, the programmer should be able to predict exactly the behavior of each
expression before it is actually executed.
3. Data types and structures: By “data types and structures”, we mean the ability of a
language to support a variety of data values (integers, reals, strings, pointers,
etc.) and non-elementary collections of these.
4. Modularity: Modularity has two aspects: the language’s support for subprogramming
and the language’s extensibility in the sense of allowing programmer-defined
operators and data types. By subprogramming, we mean the ability to define
independent procedures and functions (subprograms) that communicate with the
invoking program via parameters or global variables.
5. Input-output facilities: In evaluating a language’s “input-output facilities”, we
are looking at its support for sequential, indexed, and random access files, as well
as its support for database and information retrieval functions.
6. Portability: A language that has “portability” is one that is implemented on a
variety of computers; that is, its design is relatively “machine-independent”.
Languages that are well-defined tend to be more portable than others.
7. Efficiency: An “efficient” language is one that permits fast compilation and
execution on the machines where it is implemented. Traditionally, FORTRAN and COBOL
have been relatively efficient languages in their respective application areas.
8. Pedagogy: Some languages have better “pedagogy” than others. That is, they are
intrinsically easier to teach and to learn, they have better textbooks, they are
implemented in a better program development environment, and they are widely known
and used by the best programmers in an application area.
9. Generality: Generality means that a language is useful in a wide range of
programming applications. For instance, APL has been used in mathematical applications
involving matrix algebra and in business applications as well.
Lesson 2:
A brief history of programming languages, with emphasis on differences in syntax and
on the advantages and disadvantages of LISP, ALGOL, C, C++, Java, Python, and
scripting languages for the Web such as PHP. This lesson should be given to the
students as an assignment.
Lesson 3:
Elements
All programming languages have some primitive building blocks for the description of
data and the processes or transformations applied to them (like the addition of two
numbers or the selection of an item from a collection). These primitives are defined by
syntactic and semantic rules which describe their structure and meaning respectively.

SYNTAX
A programming language's surface form is known as its syntax. The syntax of a
language describes the possible combinations of symbols that form a syntactically
correct program. The meaning given to a combination of symbols is handled by
semantics (either formal or hard-coded in a reference implementation).

Programming language syntax is usually defined using a combination of regular
expressions (for lexical structure) and Backus–Naur Form (for grammatical structure).
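As a minimal sketch of the lexical side, assuming a small hypothetical token set for an expression language, Python's standard re module can turn a list of regular expressions into a tokenizer. The BNF rules in the comments are likewise illustrative and not taken from any particular language definition.

```python
import re

# Each token class is described by a regular expression (lexical structure).
# The token names and patterns are illustrative assumptions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP",     r"[+\-*/()=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Split source text into (kind, lexeme) pairs, rejecting unknown characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":          # whitespace separates tokens but is dropped
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

# A matching BNF fragment for the grammatical structure might be:
#   <assign> ::= IDENT "=" <expr>
#   <expr>   ::= <term> | <expr> ("+" | "-") <term>
print(tokenize("total = price * 2"))
# prints [('IDENT', 'total'), ('OP', '='), ('IDENT', 'price'), ('OP', '*'), ('NUMBER', '2')]
```

The regular expressions decide what counts as a single token; deciding whether the token sequence forms a valid statement is the job of the grammar.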

Semantics
The term semantics refers to the meaning of languages, as opposed to their form
(syntax).

Static semantics
The static semantics defines restrictions on the structure of valid texts that are hard or
impossible to express in standard syntactic formalisms. For compiled languages,
static semantics essentially include those semantic rules that can be checked at
compile time. Examples include checking that every identifier is declared before it is
used (in languages that require such declarations) or that the labels on the arms of a
case statement are distinct. Many important restrictions of this type, like checking that
identifiers are used in the appropriate context (e.g. not adding an integer to a function
name), or that subroutine calls have the appropriate number and type of arguments,
can be enforced by defining them as rules in a logic called a type system. Other forms
of static analyses like data flow analysis may also be part of static semantics. Newer
programming languages like Java and C# have definite assignment analysis, a form
of data flow analysis, as part of their static semantics.
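A check of this kind needs only the program text, which is why it can run at compile time. The sketch below is a hypothetical Python checker for the "declared before use" rule, run over an already-parsed toy program; the statement format and the function name are assumptions made for illustration.

```python
def check_declared_before_use(program):
    """Report identifiers that are used before they are declared.

    `program` is a hypothetical, already-parsed list of statements:
        ("declare", name)              e.g.  int x;
        ("assign", target, [names])    e.g.  x = y + z;
    Only the program text is inspected, so the check can run at compile time.
    """
    declared = set()
    errors = []
    for line, stmt in enumerate(program, start=1):
        if stmt[0] == "declare":
            declared.add(stmt[1])
        elif stmt[0] == "assign":
            _, target, sources = stmt
            for name in [target, *sources]:
                if name not in declared:
                    errors.append(f"line {line}: '{name}' used before declaration")
    return errors


program = [
    ("declare", "x"),
    ("assign", "x", ["y"]),    # y is used here but declared only on line 3
    ("declare", "y"),
    ("assign", "y", ["x"]),
]
print(check_declared_before_use(program))
# prints ["line 2: 'y' used before declaration"]
```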

Dynamic semantics
Once data has been specified, the machine must be instructed to perform operations
on the data. For example, the semantics may define the strategy by which
expressions are evaluated to values, or the manner in which control structures
conditionally execute statements. The dynamic semantics (also known as execution
semantics) of a language defines how and when the various constructs of a language
should produce program behaviour. There are many ways of defining execution
semantics. Natural language is often used to specify the execution semantics of
languages commonly used in practice. A significant amount of academic research
went into formal semantics of programming languages, which allow execution
semantics to be specified in a formal manner. Results from this field of research have
seen limited application to programming language design and implementation outside
academia.

Syntactic ambiguity
Syntactic ambiguity is a property of sentences which may be reasonably interpreted in
more than one way, or reasonably interpreted to mean more than one thing.
Ambiguity may or may not involve one word having two parts of speech or
homonyms.

Syntactic ambiguity arises not from the range of meanings of single words, but from
the relationship between the words and clauses of a sentence, and the sentence
structure implied thereby. When a reader can reasonably interpret the same sentence
as having more than one possible structure, the text is equivocal and meets the
definition of syntactic ambiguity.
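Programming-language grammars can be ambiguous in the same way. With a hypothetical grammar <expr> ::= <expr> - <expr> | number, the string 8 - 4 - 2 has two parse trees, (8 - 4) - 2 and 8 - (4 - 2), and they evaluate to different values. The Python sketch below collects the value of every parse tree; the function name and token format are assumptions.

```python
def all_values(tokens):
    """Values of every parse tree of `tokens` under the ambiguous grammar
         <expr> ::= <expr> "-" <expr> | number
    `tokens` alternates numbers and "-" signs, e.g. (8, "-", 4, "-", 2)."""
    if len(tokens) == 1:
        return {tokens[0]}
    results = set()
    for split in range(1, len(tokens), 2):      # try each "-" as the root of the tree
        for left in all_values(tokens[:split]):
            for right in all_values(tokens[split + 1:]):
                results.add(left - right)
    return results

print(all_values((8, "-", 4, "-", 2)))   # prints {2, 6}: (8 - 4) - 2 and 8 - (4 - 2)
```

Language designers remove such ambiguity either by rewriting the grammar or by adding precedence and associativity rules.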

Operator Precedence
When several operations occur in an expression, each part is evaluated and resolved
in a predetermined order called operator precedence. Parentheses can be used to
override the order of precedence and force some parts of an expression to be
evaluated before other parts. Operations within parentheses are always performed
before those outside. Within parentheses, however, normal operator precedence is
maintained.

When expressions contain operators from more than one category, arithmetic
operators are evaluated first, comparison operators are evaluated next, and logical
operators are evaluated last. Comparison operators all have equal precedence; that
is, they are evaluated in the left-to-right order in which they appear. Arithmetic and
logical operators are evaluated in the following order of precedence:

Arithmetic                           Comparison                       Logical
Exponentiation (^)                   Equality (=)                     Not
Negation (-)                         Inequality (<>)                  And
Multiplication and division (*, /)   Less than (<)                    Or
Integer division (\)                 Greater than (>)                 Xor
Modulus arithmetic (Mod)             Less than or equal to (<=)       Eqv
Addition and subtraction (+, -)      Greater than or equal to (>=)    Imp
String concatenation (&)             Is

When multiplication and division occur together in an expression, each operation is
evaluated as it occurs from left to right. Likewise, when addition and subtraction occur
together in an expression, each operation is evaluated in order of appearance from
left to right.

The string concatenation operator (&) is not an arithmetic operator, but in precedence
it does fall after all arithmetic operators and before all comparison operators. The Is
operator is an object reference comparison operator. It does not compare objects or
their values; it only checks whether two object references refer to the same
object.
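The same principles can be seen in Python, used here only as an illustration. Python's own precedence table differs from the one above in places (for example, it writes exponentiation as **, integer division as //, and modulus as %), and its is operator, like the Is operator described above, compares object references rather than values.

```python
print(2 + 3 * 4)      # prints 14: multiplication before addition
print((2 + 3) * 4)    # prints 20: parentheses override precedence
print(-2 ** 2)        # prints -4: exponentiation before negation
print(10 - 4 - 3)     # prints 3: left to right, i.e. (10 - 4) - 3
print(100 / 10 / 5)   # prints 2.0: left to right, i.e. (100 / 10) / 5

# "is" compares object references, not values
a = [1, 2]
b = [1, 2]
c = a
print(a == b)         # prints True: equal values
print(a is b)         # prints False: two distinct list objects
print(c is a)         # prints True: both names refer to the same object
```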
Parsing
In linguistics, parsing is the process of analyzing a text, made of a sequence of tokens
(for example, words), to determine its grammatical structure with respect to a given
(more or less) formal grammar. Parsing can also be used as a linguistic term,
especially in reference to how phrases are divided up in garden path sentences.

Parser
In computing, a parser is one of the components in an interpreter or compiler, which
checks for correct syntax and builds a data structure (often some kind of parse tree,
abstract syntax tree or other hierarchical structure) implicit in the input tokens. The
parser often uses a separate lexical analyser to create tokens from the sequence of
input characters. Parsers may be programmed by hand or may be
(semi-)automatically generated (in some programming languages) by a tool.

Overview of process

Types of parser

The task of the parser is essentially to determine if and how the input can be derived
from the start symbol of the grammar. This can be done in essentially two ways:

• Top-down parsing: Top-down parsing can be viewed as an attempt to find
leftmost derivations of an input stream by searching for parse trees using a
top-down expansion of the given formal grammar rules. Tokens are consumed from
left to right. Inclusive choice is used to accommodate ambiguity by expanding all
alternative right-hand sides of grammar rules. Examples include the recursive
descent parser, the LL parser (Left-to-right, Leftmost derivation), and so on.

• Bottom-up parsing: A parser can start with the input and attempt to rewrite it to
the start symbol. Intuitively, the parser attempts to locate the most basic elements,
then the elements containing these, and so on. LR parsers are examples of
bottom-up parsers. Another term used for this type of parser is shift-reduce
parsing.
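A recursive descent parser is the simplest top-down parser to write by hand: each nonterminal of the grammar becomes one function that consumes tokens from left to right. The sketch below assumes a small hypothetical arithmetic grammar (shown in the docstring) and returns an abstract syntax tree of nested tuples; it is an illustration, not a production parser.

```python
import re

def tokenize(text):
    """Minimal lexical analyser: integers, operators and parentheses.
    Any other non-space character becomes its own token, so the parser
    can report it as a syntax error."""
    return re.findall(r"\d+|[+\-*/()]|\S", text)


class Parser:
    """Recursive descent (top-down) parser for the hypothetical grammar
         <expr>   ::= <term>   { ("+" | "-") <term> }
         <term>   ::= <factor> { ("*" | "/") <factor> }
         <factor> ::= NUMBER | "(" <expr> ")"
    Each nonterminal is one method; tokens are consumed left to right.
    The result is an abstract syntax tree built from nested tuples."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        """Return the current token without consuming it (None at end of input)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        """Consume the given token or raise a syntax error."""
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            node = (op, node, self.term())   # the loop makes "-" left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.peek()
            self.pos += 1
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        self.expect("(")
        node = self.expr()
        self.expect(")")
        return node


print(Parser("2 + 3 * (4 - 1)").parse())
# prints ('+', 2, ('*', 3, ('-', 4, 1)))
print(Parser("8 - 4 - 2").parse())
# prints ('-', ('-', 8, 4), 2)
```

Writing a rule such as <expr> ::= <expr> + <term> directly would make expr() call itself forever without consuming input, so the left-recursive rule is expressed as a loop instead; the loop also gives the operators their left-to-right associativity. The grammar's layering of <expr>, <term> and <factor> is what encodes operator precedence.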

Variables
A variable is a symbolic name given to some known or unknown quantity or information, for the
purpose of allowing the name to be used independently of the information it
represents. A variable name in computer source code is usually associated with a
data storage location and thus also its contents.

Compilers have to replace variables' symbolic names with the actual locations of the
data. While the variable name, type, and location generally remain fixed, the data
stored in the location may get altered during program execution.

Naming conventions
Unlike their mathematical counterparts, programming variables and constants
commonly take multiple-character names, e.g. COST or total. Single-character names
are most commonly used only for auxiliary variables; for instance, i, j, k for array index
variables.

Some naming conventions are enforced at the language level as part of the language
syntax and involve the format of valid identifiers. In almost all languages, variable
names cannot start with a digit (0-9) and cannot contain whitespace characters.
Whether, which, and when punctuation marks are permitted in variable names varies
from language to language; many languages only permit the underscore (_) in
variable names and forbid all other punctuation. In some programming languages,
specific (often punctuation) characters (known as sigils) are prefixed or appended to
variable identifiers to indicate the variable's type.
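As an illustration, the common rule described above (no leading digit, no whitespace, underscore as the only punctuation) can be written as a regular expression. This is an assumed, C-like rule; individual languages differ (Java and JavaScript, for instance, also accept $, and Python's own str.isidentifier() accepts non-ASCII letters).

```python
import re

# Assumed C-like rule: a letter or underscore first, then letters, digits
# or underscores; no whitespace and no other punctuation.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_valid_identifier(name):
    return IDENTIFIER.fullmatch(name) is not None

for name in ["total", "COST", "_tmp", "i", "2x", "my var", "cost$"]:
    print(f"{name!r}: {is_valid_identifier(name)}")
# "2x", "my var" and "cost$" are rejected; the others are accepted
```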

Case-sensitivity of variable names also varies between languages, and some languages
require the use of a certain case in naming certain entities; most modern languages
are case-sensitive, while some older languages are not. Some languages reserve certain
forms of variable names for their own internal use; in many languages, names beginning
with two underscores ("__") often fall under this category.

Binding
Binding describes how a variable is created and used (or "bound") by and within the
given program, and, possibly, by other programs, as well.

There are two types of binding: dynamic and static.

• Dynamic binding: A binding is dynamic if it first occurs during execution or can
change during execution of the program. For methods, dynamic binding (also known as
dynamic dispatch) is the process of mapping a message to a specific sequence of code
(method) at run time. This supports cases where the appropriate method cannot be
determined at compile time.
• Static binding: A binding is static if it first occurs before run time and remains
unchanged throughout program execution.
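Dynamic dispatch can be seen in a short Python sketch (the class names are hypothetical): the call s.area() is bound to a particular method only at run time, according to the class of the object that s refers to. By contrast, in C++ a call to a non-virtual member function is bound statically at compile time.

```python
class Shape:
    def area(self):
        raise NotImplementedError

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side

class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

def total_area(shapes):
    # The message "area" is mapped to Square.area or Rectangle.area
    # only at run time, depending on the class of each object.
    return sum(s.area() for s in shapes)

print(total_area([Square(3), Rectangle(2, 5)]))   # prints 19
```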

Scope
The scope of a variable describes where in a program's text the variable may be
used, while the extent (or lifetime) describes when in a program's execution a variable
has a (meaningful) value. Scope is a lexical aspect of a variable. Most languages
define a specific scope for each variable (as well as any other named entity), which
may differ within a given program. The scope of a variable is the portion of the
program code for which the variable's name has meaning and for which the variable
is said to be "visible". Scope is also of two types: static and dynamic.

• Static Scope: The static scope of a variable is the most immediately enclosing
block, excluding any enclosed blocks where the variable has been re-declared.
The static scope of a variable in a program can be determined by simply
studying the text of the program. Static scope is not affected by the order in
which procedures are called during the execution of the program.
• Dynamic Scope: The dynamic scope of a variable extends to all the procedures
called thereafter during program execution, until the first procedure to be called
that re-declares the variable.
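The difference between static and dynamic scope shows up when a procedure that uses a nonlocal variable is called from another procedure that re-declares it. Python uses static scope, so the first half of this sketch shows that behaviour directly; Python has no dynamic scope, so the second half simulates it with an explicit stack of frames. The helper names (call_stack, lookup, f_dynamic, g_dynamic) are hypothetical.

```python
# Static scope: Python resolves x in f() from the program text.
x = "global"

def f():
    return x                  # refers to the x in the enclosing (global) block

def g():
    x = "local to g"          # re-declares x, but only inside g
    return f()

print(g())                    # prints global  (static scope)

# Dynamic scope, simulated with an explicit stack of activation frames.
call_stack = [{"x": "global"}]

def lookup(name):
    # Search the most recently called procedure first.
    for frame in reversed(call_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)

def f_dynamic():
    call_stack.append({})     # f declares nothing
    try:
        return lookup("x")
    finally:
        call_stack.pop()

def g_dynamic():
    call_stack.append({"x": "local to g"})
    try:
        return f_dynamic()
    finally:
        call_stack.pop()

print(g_dynamic())            # prints local to g  (dynamic scope)
```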

Referencing
The referencing environment of a statement is the collection of all variables that are
visible in that statement and can therefore be referenced there. In a statically scoped
language, one can only reference the variables in the static referencing environment. A
function in a statically scoped language does have dynamic ancestors (i.e. its callers),
but cannot reference variables declared in them unless a caller is also one of its static
ancestors.

Lesson 4:
Names, Bindings, Type Checking, and Scopes

Introduction
• Imperative languages are abstractions of the von Neumann architecture:
– Memory: stores both instructions and data
– Processor: provides operations for modifying the contents of memory
• Variables are characterized by attributes
– To design a type, one must consider scope, lifetime, type checking, initialization,
and type compatibility

Names
Design issues for names:
• Maximum length?
• Are connector characters allowed?

• Are names case sensitive?
• Are special words reserved words or keywords?

Name Forms
• A name is a string of characters used to identify some entity in a program.
• If names are too short, they cannot be connotative.
• Language examples:
– FORTRAN I: maximum 6
– COBOL: maximum 30
– FORTRAN 90 and ANSI C: maximum 31
– Ada and Java: no limit, and all characters are significant
– C++: no limit, but implementers often impose a length limitation because they
do not want the symbol table in which identifiers are stored during compilation
to be too large, and to simplify the maintenance of that table.
• Names in most programming languages have the same form: a letter followed by a
string consisting of letters, digits, and underscores (_).
• Although the underscore was widely used in the 1970s and 1980s, that practice is
now far less popular.
• In C-based languages (C, C++, Java, and C#), the underscore has largely been
replaced by “camel” notation, as in myStack.
• Prior to Fortran 90, the following two names were equivalent:

Sum Of Salaries // names could have embedded spaces,
SumOfSalaries   // which were ignored

• Case sensitivity
– Disadvantage: readability (names that look alike are different)
• This is worse in C++ and Java because predefined names are mixed case (e.g.
IndexOutOfBoundsException).
• In C, however, the problem is largely avoided by the convention of using only
lowercase for names.
– C, C++, and Java names are case sensitive: rose, Rose, and ROSE are distinct names.
“What about readability?”

Special words
• An aid to readability; used to delimit or separate statement clauses
• A keyword is a word that is special only in certain contexts.
• Ex: Fortran

Real Apple // Real is a data type followed by a name, therefore Real is a keyword
Real = 3.4 // Real is a variable name

• Disadvantage: poor readability. Compilers and users must recognize the difference.
• A reserved word is a special word that cannot be used as a user-defined name.
• As a language design choice, reserved words are better than keywords.
• Ex: In Fortran, one could have the statements
Integer Real // keyword “Integer” and variable “Real”
Real Integer // keyword “Real” and variable “Integer”
Variables
• A variable is an abstraction of a memory cell(s).
• Variables can be characterized as a sextuple of attributes:
o Name
o Address
o Value
o Type
o Lifetime
o Scope

Name
- Not all variables have names (e.g., anonymous heap-dynamic variables).

Address
• The memory address with which a variable is associated
• The same variable name may be associated with different addresses at different
places in a program, for example a variable named sum declared in both sub1 and sub2.
• A variable may have different addresses at different times during execution. If a
subprogram has a local variable that is allocated from the run-time stack when the
subprogram is called, different calls may result in that variable having different
addresses, for example sum in sub1.

• The address of a variable is sometimes called its l-value because that is what is required
when a variable appears in the left side of an assignment statement.
Aliases
• If two variable names can be used to access the same memory location, they are
called aliases
• Aliases can be created via pointers, reference variables, and C and C++ unions.
• Aliases are harmful to readability (program readers must remember all of them)
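In Python, every variable holds a reference to an object, so plain assignment between names creates an alias. This short sketch shows why aliases make code harder to follow: a change made through one name silently affects the other.

```python
a = [1, 2, 3]
b = a              # b is an alias: both names refer to the same list object
b.append(4)        # a change made through one name...
print(a)           # ...is visible through the other: prints [1, 2, 3, 4]
print(a is b)      # prints True

c = list(a)        # an explicit copy is a separate object, not an alias
c.append(5)
print(a)           # prints [1, 2, 3, 4]
```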
Type
• Determines the range of values of variables and the set of operations that are defined
for values of that type; in the case of floating point, type also determines the precision.
• For example, the int type in Java specifies a value range of -2147483648 to
2147483647, and arithmetic operations for addition, subtraction, multiplication,
division, and modulus.
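Python's own int type is unbounded, so the sketch below only simulates Java's 32-bit int range. The helper to_java_int is hypothetical: it keeps the low 32 bits of a value and reinterprets them in two's complement, mirroring how Java int addition silently wraps around on overflow.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def to_java_int(n):
    """Reduce an unbounded Python int to Java's 32-bit two's-complement range."""
    n &= 0xFFFFFFFF                              # keep only the low 32 bits
    return n - (1 << 32) if n >= (1 << 31) else n

print(INT_MIN, INT_MAX)           # prints -2147483648 2147483647
print(to_java_int(INT_MAX + 1))   # prints -2147483648 (wraps around, as Java int addition does)
```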
Value
• The value of a variable is the contents of the memory cell or cells associated with the
variable.
• Abstract memory cell - the physical cell or collection of cells associated with a variable.

• A variable’s value is sometimes called its r-value because that is what is required when
a variable appears in the right side of an assignment statement.

The Concept of Binding


• The l-value of a variable is its address.
• The r-value of a variable is its value.
• A binding is an association, such as between an attribute and an entity, or
between an operation and a symbol.
• Binding time is the time at which a binding takes place.
• Possible binding times:
o Language design time: bind operator symbols to operations.
▪ For example, the asterisk symbol (*) is bound to the multiplication operation.
o Language implementation time: a data type such as int in C is bound to a range of
possible values.
o Compile time: bind a variable to a particular data type.
o Load time: bind a variable to a memory cell (e.g., C static variables).
o Run time: bind a nonstatic local variable to a memory cell.

Binding of Attributes to Variables

• A binding is static if it first occurs before run time and remains unchanged throughout
program execution.
• A binding is dynamic if it first occurs during execution or can change during
execution of the program.

Type Bindings
• If static, the type may be specified by either an explicit or an implicit declaration.

Variable Declarations

• An explicit declaration is a program statement used for declaring the types of
variables.
• An implicit declaration is a default mechanism for specifying types of variables
(the first appearance of the variable in the program).
• Both explicit and implicit declarations create static bindings to types.
• FORTRAN, PL/I, BASIC, and Perl provide implicit declarations.
• EX:
– In Fortran, an identifier that appears in a program but is not explicitly declared
is implicitly declared according to the following convention: if it begins with
I, J, K, L, M, or N (or their lowercase versions), it is implicitly declared to be of
Integer type; otherwise, it is implicitly declared to be of Real type.
– Advantage: writability.

17
– Disadvantage: reliability suffers because they prevent the compilation process
from detecting some typographical and programming errors.
– In Fortran, variables that are accidentally left undeclared are given default types and unexpected attributes, which could cause subtle errors that are difficult to diagnose.
• Less trouble with Perl: a name that begins with $ is a scalar, a name that begins with @ is an array, and a name that begins with % is a hash structure.
– In this scenario, the names @apple and %apple are unrelated.
• In C and C++, one must distinguish between declarations and definitions.
– Declarations specify types and other attributes but do not cause allocation of storage. For example, a declaration provides the type of a variable that is defined outside a function but used in the function.
– Definitions specify attributes and cause storage allocation.
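The Fortran naming convention above is simple enough to state as a function. The sketch below is a hypothetical C helper (fortran_implicit_type is not part of any Fortran tool); it returns 'I' for a name that would be implicitly Integer and 'R' for a name that would be implicitly Real.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper illustrating Fortran's implicit typing rule:
 * returns 'I' (Integer) if the name begins with I, J, K, L, M, or N
 * in either case, and 'R' (Real) otherwise. */
char fortran_implicit_type(const char *name)
{
    /* Only the first character of the identifier matters. */
    int first = toupper((unsigned char)name[0]);

    /* strchr would also match the terminating '\0', so rule that out first. */
    if (first != '\0' && strchr("IJKLMN", first) != NULL)
        return 'I';   /* implicitly Integer */
    return 'R';       /* implicitly Real */
}
```

For instance, a counter accidentally left undeclared as COUNT would be implicitly Real, which is the kind of subtle error described above.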

Dynamic Type Binding (JavaScript and PHP)


• Specified through an assignment statement.
• Ex, JavaScript:

list = [2, 4.33, 6, 8]; // list is a single-dimensioned array
list = 47;              // list is now a scalar variable

– Advantage: flexibility (generic program units).
– Disadvantages:
– High cost (dynamic type checking and interpretation).
• Languages with dynamic type binding are usually implemented using pure interpreters rather than compilers.
• Pure interpretation typically takes at least ten times as long as executing equivalent machine code.

– Type error detection by the compiler is difficult because any variable can
be assigned a value of any type.
• Incorrect types of right sides of assignments are not detected as errors; rather, the
type of the left side is simply changed to the incorrect type.
• Ex:

i, x    // integer variables
y       // floating-point array
i = x;  // what the user meant to type
i = y;  // what the user typed instead

• No error is detected by the compiler or the run-time system; i is simply changed to a floating-point array type, so the result is erroneous. In a language with static type binding, the compiler would detect the error and the program would not get to execution.

Type Inference (ML, Miranda, and Haskell)
• Rather than by assignment statements, types are determined from the context of the reference.
• Ex:
fun circumf(r) = 3.14159 * r * r;
The argument and functional value are inferred to be real.
fun times10(x) = 10 * x;
The argument and functional value are inferred to be int.
Storage Bindings & Lifetime
– Allocation - getting a cell from some pool of available cells.
– Deallocation - putting a cell back into the pool.
– The lifetime of a variable is the time during which it is bound to a particular
memory cell. So the lifetime of a var begins when it is bound to a specific cell
and ends when it is unbound from that cell.
– Categories of variables by lifetimes: static, stack-dynamic, explicit heap-
dynamic, and implicit heap-dynamic

Static Variables:
– Bound to memory cells before execution begins and remain bound to the same memory cells throughout execution.
– e.g., all FORTRAN 77 variables and C static variables.
– Advantages:
• Efficiency (direct addressing): all addressing of static variables can be direct, and no run-time overhead is incurred for allocating and deallocating them.
• History sensitivity: they retain their values between separate executions of the subprogram.
– Disadvantage:
• Storage cannot be shared among variables.
• Ex: if two large arrays are used by two subprograms, which are never
active at the same time, they cannot share the same storage for their
arrays.
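A minimal C sketch of a static variable: the local counter below is declared static, so it is bound to one memory cell before execution begins and keeps its value between calls (history sensitivity). The function name next_ticket is hypothetical.

```c
/* Returns 1, 2, 3, ... on successive calls. */
int next_ticket(void)
{
    /* Static: bound to one memory cell for the whole run, initialized once. */
    static int counter = 0;
    counter++;          /* the updated value survives until the next call */
    return counter;
}
```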

Stack-dynamic Variables:
– Variables whose storage bindings are created when their declaration statements are elaborated, but whose types are statically bound.
– Elaboration of such a declaration refers to the storage allocation and binding
process indicated by the declaration, which takes place when execution reaches
the code to which the declaration is attached.
– Ex:
• The variable declarations that appear at the beginning of a Java method
are elaborated when the method is invoked and the variables defined by

those declarations are deallocated when the method completes its
execution.
– Stack-dynamic variables are allocated from the run-time stack.
– If scalar, all attributes except address are statically bound.
– Ex:
• Local variables in C subprograms and Java methods.
– Advantages:
• Allows recursion: each active copy of the recursive subprogram has its own version of the local variables.
• In the absence of recursion, it conserves storage because all subprograms share the same memory space for their locals.
– Disadvantages:
• Overhead of allocation and deallocation.
• Subprograms cannot be history sensitive.
• Inefficient references (indirect addressing) are required because the place in the stack where a particular variable will reside can only be determined during execution.
– In Java, C++, and C#, variables defined in methods are by default stack-
dynamic.
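To see why stack-dynamic locals allow recursion, consider this C sketch (the function is illustrative, not taken from the notes): every call to factorial gets its own copy of result on the run-time stack, so the recursive calls do not overwrite one another.

```c
/* Computes n! recursively; each activation has its own `result`. */
unsigned long factorial(unsigned int n)
{
    unsigned long result;      /* stack-dynamic: allocated when the call begins */
    if (n <= 1)
        result = 1;
    else
        result = n * factorial(n - 1);  /* the inner call uses a separate `result` */
    return result;             /* this activation's storage is released on return */
}
```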

Explicit Heap-dynamic Variables:


– Nameless memory cells that are allocated and deallocated by explicit run-time instructions, specified by the programmer, which take effect during execution.
– These variables, which are allocated from and deallocated to the heap, can only be referenced through pointers or reference variables.
– The heap is a collection of storage cells whose organization is highly disorganized because of the unpredictability of its use.
– e.g., dynamic objects in C++ (via new and delete):

int *intnode;       // create a pointer
intnode = new int;  // allocates an int cell
delete intnode;     // deallocates the cell to which intnode points

– An explicit heap-dynamic variable of int type is created by the new operator.
– This variable can be referenced through the pointer, intnode.
– The variable is deallocated by the delete operator.
– In Java, all data except the primitive scalars are objects.
– Java objects are explicitly heap-dynamic and are accessed through reference
variables.
– Java uses implicit garbage collection.
– Explicit heap-dynamic vars are used for dynamic structures, such as linked lists
and trees that need to grow and shrink during execution.
– Advantage:
– Provides for dynamic storage management.
– Disadvantage:
– Inefficient (cost of allocation and deallocation) and unreliable (difficulty of using pointer and reference variables correctly).
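The same explicit heap-dynamic idea in C, where malloc and free take the place of C++ new and delete. This is a minimal sketch of a linked list, one of the dynamic structures mentioned above; the type and function names are hypothetical.

```c
#include <stdlib.h>

/* Each node is a nameless heap-dynamic variable reachable only through a pointer. */
struct node {
    int value;
    struct node *next;
};

/* Explicit allocation: adds a node at the front. Returns 0 on success, -1 on failure. */
int push_front(struct node **head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = *head;
    *head = n;
    return 0;
}

/* Counts the nodes by following the pointers. */
size_t list_length(const struct node *head)
{
    size_t len = 0;
    for (; head != NULL; head = head->next)
        len++;
    return len;
}

/* Explicit deallocation: every node must be freed by the programmer. */
void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;  /* save the link before freeing */
        free(head);
        head = next;
    }
}
```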

Implicit Heap-dynamic Variables:


– Bound to heap storage only when they are assigned values. Allocation and deallocation are caused by assignment statements.
– All their attributes are bound every time they are assigned.
– e.g. all variables in APL; all strings and arrays in Perl and JavaScript.
– Advantage:
– Flexibility allowing generic code to be written.
– Disadvantages:
• Inefficient, because all attributes are dynamic (bound at run time).
• Loss of error detection by the compiler.

Type Checking

• Type checking is the activity of ensuring that the operands of an operator are of
compatible types.
• A compatible type is one that is either legal for the operator, or is allowed under language
rules to be implicitly converted, by compiler-generated code, to a legal type.
• This automatic conversion is called a coercion.
• Ex: if an int variable and a float variable are added in Java, the value of the int variable is coerced to float and a floating-point addition is performed.
• A type error is the application of an operator to an operand of an inappropriate type.
• Ex: in the original version of C, if an int value was passed to a function that expected a float value, a type error would occur, because compilers did not check the types of parameters.
• If all type bindings are static, nearly all type checking can be static.
• If type bindings are dynamic, type checking must be dynamic and done at run time.
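C applies coercion rules similar to Java's in mixed-mode expressions. In this minimal sketch (the function names are illustrative), an int operand is coerced to a floating-point type whenever the other operand is floating point, while an int/int division stays an integer operation.

```c
/* Both operands are int: integer division, no coercion. */
int int_divide(int a, int b)
{
    return a / b;
}

/* a is coerced to double, then floating-point division is performed. */
double mixed_divide(int a, double b)
{
    return a / b;
}

/* a is coerced to float, then floating-point addition is performed. */
float mixed_add(int a, float d)
{
    return a + d;
}
```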

Strong Typing
• A programming language is strongly typed if type errors are always detected. It requires
that the types of all operands can be determined, either at compile time or run time.
• Advantage of strong typing: allows the detection of the misuses of variables that result in
type errors.
• Java and C# are strongly typed. Types can be explicitly cast, which could result in a type error; however, there are no implicit ways for type errors to go undetected.
• The coercion rules of a language have an important effect on the value of type checking.
• Coercion defeats part of the purpose of strong typing: error detection.
• Ex:
int a, b;
float d;
a + d;  // the programmer meant a + b
– The compiler would not detect this error; a would simply be coerced to float.
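The a + d example compiles as written in C. In this sketch (hypothetical function names), both the intended and the mistyped versions are accepted by the compiler, yet they return different results, because a is silently coerced to float in the mistyped one.

```c
/* What the programmer meant: a + b. */
float sum_meant(int a, int b, float d)
{
    (void)d;            /* d is not used in the intended expression */
    return a + b;
}

/* What was typed: a + d. It compiles without complaint. */
float sum_typed(int a, int b, float d)
{
    (void)b;            /* b was accidentally left out */
    return a + d;       /* a is coerced to float */
}
```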

Scope
– The scope of a var is the range of statements in which the var is visible.
– A var is visible in a statement if it can be referenced in that statement.
– A variable is local in a program unit or block if it is declared there.
– The non-local variables of a program unit or block are those that are visible within the program unit or block but are not declared there.

Static Scope
– Static scoping binds names to non-local variables based on the program text, before execution.
– There are two categories of static-scoped languages:
o Those in which subprograms can be nested.
o Those in which subprograms cannot be nested.
– Ada and JavaScript allow nested subprograms, but the C-based languages do not.
– When a compiler for a static-scoped language finds a reference to a variable, the attributes of the variable are determined by finding the statement in which it was declared.
– Ex: Suppose a reference is made to a variable x in subprogram Sub1. The correct declaration is found by first searching the declarations of subprogram Sub1.
– If no declaration is found for the variable there, the search continues in the declarations of the subprogram that declared subprogram Sub1, which is called its static parent.
– If a declaration of x is not found there, the search continues to the next larger enclosing unit (the unit that declared Sub1’s parent), and so forth, until a declaration for x is found or the largest unit’s declarations have been searched without success, in which case an undeclared-variable error has been detected.
– The static parent of subprogram Sub1, its static parent, and so forth up to and including the main program, are called the static ancestors of Sub1.
– Ex: Ada procedure:

procedure Big is
   X : Integer;
   procedure Sub1 is
   begin -- of Sub1
      …X…
   end; -- of Sub1
   procedure Sub2 is
      X : Integer;
   begin -- of Sub2
      …X…
   end; -- of Sub2
begin -- of Big
   …
end; -- of Big
– Under static scoping, the reference to the var X in Sub1 is to the X declared in the
procedure Big.
– This is true because the search for X begins in the procedure in which the reference occurs, Sub1, but no declaration for X is found there.
– The search thus continues in the static parent of Sub1, Big, where the declaration of X is
found.
– Ex: skeletal C function:

void sub() {
   int count;
   …
   while (…) {
      int count;
      count++;
   }
}

– The reference to count in the while loop is to that loop’s local count. The count of sub is hidden from the code inside the while loop.
– A declaration for a variable effectively hides any declaration of a variable with the same name in a larger enclosing scope.
– This kind of hiding is legal in C and C++, but Java and C# do not allow a local variable in a nested block to reuse the name of a local variable in an enclosing block.
– C++ and Ada allow access to some of these "hidden" variables:
o In Ada: Main.X
o In C++: class_name::name
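The static-scope search described above (look in the current unit, then its static parent, then that unit's static parent, and so on) can be sketched as a walk up a chain of scope records. This is a simplified, hypothetical model of a compiler's symbol-table lookup, not a description of any particular compiler; struct scope and resolve_static are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical compile-time scope record: the names declared in one
 * program unit plus a link to its static parent (the enclosing unit). */
struct scope {
    const char *unit;                 /* e.g., "Big", "Sub1" */
    const char *const *names;         /* NULL-terminated list of declared names */
    const struct scope *static_parent;
};

/* Searches the unit, then its static ancestors, outward.
 * Returns the declaring unit, or NULL for an undeclared variable. */
const char *resolve_static(const struct scope *s, const char *name)
{
    for (; s != NULL; s = s->static_parent)
        for (const char *const *n = s->names; *n != NULL; n++)
            if (strcmp(*n, name) == 0)
                return s->unit;
    return NULL;
}
```

Modeling the Ada example (Big declares X; Sub1 and Sub2 are nested in Big; Sub2 also declares X), resolve_static finds Big's X for a reference in Sub1 and Sub2's own X for a reference in Sub2, regardless of who calls whom.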

Blocks
– Allows a section of code to have its own local vars whose scope is minimized.
– Such vars are stack dynamic, so they have their storage allocated when the section is
entered and deallocated when the section is exited.
– Blocks were introduced in ALGOL 60.
– Ex:
C and C++:
for (...) {
   int index;
   ...
}

Ada:
declare LCL : FLOAT;
begin
   ...
end;
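Blocks and hiding can both be seen in one small C function. In this sketch (the function name is hypothetical), the count declared inside the loop body is a new stack-dynamic variable on each iteration and hides the outer count, which is never modified. The loop-scoped declaration of i requires C99 or later.

```c
/* Returns the outer count; stores the last value of the inner count in *inner_last. */
int block_demo(int *inner_last)
{
    int count = 10;                 /* count of the function body */
    for (int i = 0; i < 3; i++) {
        int count = i * 100;        /* block-local: hides the outer count */
        count++;                    /* refers to the block's count */
        *inner_last = count;
    }                               /* block-local count is deallocated here */
    return count;                   /* the outer count is still 10 */
}
```

Calling block_demo(&last) returns 10 and leaves last equal to 201 (the inner count from the final iteration).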
Dynamic Scope
• The scope of variables in APL, SNOBOL4, and the early versions of LISP is dynamic.
• Dynamic scoping is based on the calling sequences of program units, not their textual layout (temporal versus spatial), and thus the scope is determined at run time.
• References to variables are connected to declarations by searching back through the chain of subprogram calls that forced execution to this point.
• Ex:

procedure Big is
   X : Integer;
   procedure Sub1 is
   begin -- of Sub1
      …X…
   end; -- of Sub1
   procedure Sub2 is
      X : Integer;
   begin -- of Sub2
      …X…
   end; -- of Sub2
begin -- of Big
   …
end; -- of Big

• If Big calls Sub1:
o The dynamic parent of Sub1 is Big, so the reference is to the X in Big.
• If Big calls Sub2 and Sub2 calls Sub1:
o The search proceeds from the local procedure, Sub1, to its caller, Sub2, where a declaration of X is found.
• Note that if static scoping were used, in either calling sequence the reference to X in Sub1 would be to Big’s X.
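Dynamic scoping can be modeled with a similar search, except that it follows the chain of active calls rather than the textual nesting. The sketch below is a hypothetical model: each struct activation records the names its subprogram declares and a link to its caller (its dynamic parent).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical run-time activation record for one active subprogram. */
struct activation {
    const char *unit;
    const char *const *names;          /* NULL-terminated list of declared names */
    const struct activation *caller;   /* dynamic parent */
};

/* Searches the current activation, then its caller, then the caller's
 * caller, back along the chain of calls. Returns the declaring unit or NULL. */
const char *resolve_dynamic(const struct activation *a, const char *name)
{
    for (; a != NULL; a = a->caller)
        for (const char *const *n = a->names; *n != NULL; n++)
            if (strcmp(*n, name) == 0)
                return a->unit;
    return NULL;
}
```

For the two calling sequences above, the same reference to X in Sub1 resolves to Big's X when Big calls Sub1 directly, and to Sub2's X when the calls go through Sub2.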
Scope and Lifetime
• Ex:
void printheader() {
   …
} /* end of printheader */
void compute() {
   int sum;
   …
   printheader();
} /* end of compute */

• The scope of sum is contained within compute.
• The lifetime of sum extends over the time during which printheader executes.
• Whatever storage location sum is bound to before the call to printheader, that binding will continue during and after the execution of printheader.

Referencing environment
• The referencing environment of a statement is the collection of all names that are visible in the statement.
• In a static-scoped language, it is the local variables plus all of the visible variables in all of the enclosing scopes.
• The referencing environment of a statement is needed while that statement is being compiled, so code and data structures can be created to allow references to non-local variables in both static- and dynamic-scoped languages.
• A subprogram is active if its execution has begun but has not yet terminated.

• In a dynamic-scoped language, the referencing environment is the local variables plus
all visible variables in all active subprograms.
• Ex, Ada (a static-scoped language):

procedure Example is
   A, B : Integer;
   procedure Sub1 is
      X, Y : Integer;
   begin -- of Sub1
      …   -- point 1
   end; -- of Sub1
   procedure Sub2 is
      X : Integer;
      procedure Sub3 is
         X : Integer;
      begin -- of Sub3
         …   -- point 2
      end; -- of Sub3
   begin -- of Sub2
      …   -- point 3
   end; -- of Sub2
begin -- of Example
   …   -- point 4
end; -- of Example

• The referencing environments of the indicated program points are as follows:

Point   Referencing Environment
1       X and Y of Sub1, A and B of Example
2       X of Sub3 (X of Sub2 is hidden), A and B of Example
3       X of Sub2, A and B of Example
4       A and B of Example

• Ex, dynamic-scoped language:
• Consider the following program; assume that the only function calls are the following: main calls sub2, which calls sub1.

void sub1() {
   int a, b;
   …   /* point 1 */
} /* end of sub1 */
void sub2() {
   int b, c;
   …   /* point 2 */
   sub1();
} /* end of sub2 */
void main() {
   int c, d;
   …   /* point 3 */
   sub2();
} /* end of main */

• The referencing environments of the indicated program points are as follows:

Point   Referencing Environment
1       a and b of sub1, c of sub2, d of main (c of main and b of sub2 are hidden)
2       b and c of sub2, d of main (c of main is hidden)
3       c and d of main
Named Constants
• A named constant is a variable that is bound to a value only at the time it is bound to storage; its value cannot be changed by assignment or by an input statement.
• Ex, Java:
final int LEN = 100;

• Advantages: readability and modifiability.

Variable Initialization
• The binding of a variable to a value at the time it is bound to storage is called
initialization.
• Initialization is often done on the declaration statement.
• Ex, Java
int sum = 0;

Lesson Five: Organization of Programming Language

Data Type

Introduction

• A data type defines a collection of data objects and a set of predefined operations on
those objects.
• Computer programs produce results by manipulating data.

• ALGOL 68 provided a few basic types and a few flexible structure-defining operators that
allow a programmer to design a data structure for each need.
• A descriptor is the collection of the attributes of a variable.
• In an implementation a descriptor is a collection of memory cells that store variable
attributes.
• If the attributes are static, descriptors are required only at compile time.
• They are built by the compiler, usually as a part of the symbol table, and are used during
compilation.
• For dynamic attributes, part or all of the descriptor must be maintained during execution.
• Descriptors are used for type checking and by allocation and deallocation operations.

Primitive Data Types


• Those not defined in terms of other data types are called primitive data types.
• The primitive data types of a language, along with one or more type constructors, provide structured types.

Numeric Types
1. Integer
– Almost always an exact reflection of the hardware, so the mapping is trivial.
– There may be as many as eight different integer types in a language.
– Java has four: byte, short, int, and long.
– Integer types are supported by the hardware.

2. Floating-point
– Model real numbers, but only as approximations for most real values.
– On most computers, floating-point numbers are stored in binary, which exacerbates the
problem.
– Another problem is the loss of accuracy through arithmetic operations.
– Languages for scientific use support at least two floating-point types; sometimes more
(e.g. float, and double.)
– The collection of values that can be represented by a floating-point type is defined in
terms of precision and range.
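– Precision is the accuracy of the fractional part of a value, measured as the number of bits. The figure below shows single and double precision. In the IEEE 754 formats used by most computers, single precision has 1 sign bit, 8 exponent bits, and 23 fraction bits; double precision has 1 sign bit, 11 exponent bits, and 52 fraction bits.
– Range is the range of fractions and exponents.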
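The approximation problem shows up even for simple decimal constants. In this C sketch (the function names are illustrative), 0.1 + 0.2 does not compare equal to 0.3, because none of the three values is exactly representable in binary. FLT_MANT_DIG and DBL_MANT_DIG from <float.h> report precision in bits; on IEEE 754 hardware they are 24 and 53 (the stored fraction bits plus an implicit leading 1 bit).

```c
#include <float.h>

/* Returns 1 if the binary approximations of 0.1 and 0.2 sum exactly to
 * the binary approximation of 0.3; on binary hardware this returns 0. */
int sum_is_exact(void)
{
    double a = 0.1, b = 0.2;
    return a + b == 0.3;
}

/* Precision in bits (significand digits, base 2). */
int float_precision_bits(void)  { return FLT_MANT_DIG; }
int double_precision_bits(void) { return DBL_MANT_DIG; }
```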

3. Decimal

– Most larger computers that are designed to support business applications have hardware
support for decimal data types.
– Decimal types store a fixed number of decimal digits, with the decimal point at a fixed
position in the value.
– These are the primary data types for business data processing and are therefore essential
to COBOL.
– Advantage: accuracy of decimal values.
– Disadvantages: limited range since no exponents are allowed, and its representation
wastes memory.

Boolean Types
– Introduced by ALGOL 60.
– They are used to represent switches and flags in programs.
– The use of Booleans enhances readability.
– One popular exception is C89, in which numeric expressions are used as conditionals.
In such expressions, all operands with nonzero values are considered true, and zero is
considered false.

Character Types
– Char types are stored as numeric codings (ASCII / Unicode).
– Traditionally, the most commonly used coding was the 8-bit code ASCII (American
Standard Code for Information Interchange).
– A 16-bit character set named Unicode has been developed as an alternative.
– Java was the first widely used language to use the Unicode character set. Since then, it
has found its way into JavaScript and C#.

Character String Types


– A character string type is one in which values are sequences of characters.
– Important Design Issues:
1. Is it a primitive type or just a special kind of array?
2. Is the length of objects static or dynamic?
– C and C++ use char arrays to store char strings and provide a collection of string
operations through a standard library whose header is string.h.
– How is the length of a char string determined?
– The end of the string is marked with the null character, which is represented with 0.
– Ex:

char *str = "apples"; // char pointer points at the string

a p p l e s \0

– In this example, str is a char pointer set to point at the string of characters apples\0, where \0 is the null character.
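A minimal sketch of how the length of a C string is found: scan until the null character. The name my_strlen is hypothetical; it behaves like the standard strlen.

```c
#include <stddef.h>

/* Counts characters up to, but not including, the terminating null character. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* the null character (value 0) marks the end */
        n++;
    return n;
}
```

For "apples", my_strlen returns 6, while sizeof "apples" is 7, because the array holding the literal includes the null character.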

Typical String Operations:
– Assignment
– Comparison (=, >, etc.)
– Catenation
– Substring reference
– Pattern matching
– Some of the most commonly used library functions for character strings in C and C++ are:
o strcpy: copies strings
o strcat: catenates one given string onto another
o strcmp: lexicographically compares two strings (by the order of their character codes)
o strlen: returns the number of characters, not counting the null character
– In Java, strings are supported by the String class rather than as a primitive type.
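The four library functions listed above can be combined as follows. This sketch assumes the caller's buffer has room for at least 10 characters, since strcpy and strcat do no bounds checking; build_and_measure is a hypothetical name.

```c
#include <string.h>

/* Builds "apple pie" in buf and returns its length.
 * Assumption: buf points to at least 10 writable chars. */
size_t build_and_measure(char *buf)
{
    strcpy(buf, "apple");    /* assignment: buf = "apple"     */
    strcat(buf, " pie");     /* catenation: buf = "apple pie" */
    return strlen(buf);      /* 9: the null character is not counted */
}
```

strcmp returns a negative value, zero, or a positive value; for example, strcmp("apple", "apricot") is negative because 'p' precedes 'r' in the character code order.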
String Length Options
– Static Length String: The length can be static and set when the string is created. This
is the choice for the immutable objects of Java’s String class as well as similar classes
in the C++ standard class library and the .NET class library available to C#.
– Limited Dynamic Length Strings: allow strings to have varying length up to a
declared and fixed maximum set by the variable’s definition, as exemplified by the
strings in C.
– Dynamic Length Strings: allow strings to have varying length with no maximum. This requires the overhead of dynamic storage allocation and deallocation but provides flexibility. Ex: Perl and JavaScript.
Evaluation
– Aid to writability.
– As a primitive type with static length, they are inexpensive to provide--why not have
them?
– Dynamic length is nice, but is it worth the expense?

Implementation of Character String Types


– Static length - compile-time descriptor has three fields:
1. Name of the type
2. Type’s length
3. Address of first char

Compile-time descriptor for static strings

– Limited dynamic length Strings - may need a run-time descriptor for length to
store both the fixed maximum length and the current length (but not in C and C++
because the end of a string is marked with the null character).

Run-time descriptor for limited dynamic strings

– Dynamic length strings:
– Need a simpler run-time descriptor, because only the current length needs to be stored.
– Allocation/deallocation is the biggest implementation problem: the storage to which the string is bound must grow and shrink dynamically.
– There are two approaches to supporting allocation and deallocation:
1. Strings can be stored in a linked list. Drawback: string operations become complex and require pointer chasing.
2. Strings can be stored in adjacent cells. When a string grows, a new area of memory is found and the old part is moved to this area. Allocation and deallocation are slower, but using adjacent cells results in faster string operations and requires less storage.
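The adjacent-cells approach (approach 2) can be sketched in C with realloc, which finds a larger area and moves the old contents when a string outgrows its storage. The struct dynstr type and its functions are hypothetical; the descriptor here also stores an allocated capacity, and doubling that capacity on each move is a design choice of this sketch that keeps the number of moves small.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical dynamic-length string kept in adjacent cells. */
struct dynstr {
    char  *data;   /* adjacent cells, always null-terminated once allocated */
    size_t len;    /* current length */
    size_t cap;    /* cells allocated, not counting the null */
};

/* Appends text, moving the string to a larger area when needed.
 * Returns 0 on success, -1 if memory is exhausted. */
int dynstr_append(struct dynstr *s, const char *text)
{
    size_t add = strlen(text);
    if (s->data == NULL || s->len + add > s->cap) {
        size_t new_cap = (s->len + add) * 2;      /* grow geometrically */
        char *p = realloc(s->data, new_cap + 1);  /* +1 for the null; moves old contents */
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = new_cap;
    }
    memcpy(s->data + s->len, text, add + 1);      /* copies the null too */
    s->len += add;
    return 0;
}

/* Releases the storage and resets the descriptor. */
void dynstr_free(struct dynstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```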
User-Defined Ordinal Types

– An ordinal type is one in which the range of possible values can be easily associated
with the set of positive integers
– Examples of primitive ordinal types in Java
– integer
– char
– boolean
– In some languages, users can define two kinds of ordinal types: enumeration and
subrange.

Enumeration Types
– All possible values, which are named constants, are provided in the definition
– C++ example

enum colors {red, blue, green, yellow, black};
colors myColor = blue, yourColor = red;
myColor = static_cast<colors>(myColor + 1); // assigns green to myColor; standard C++ does not allow myColor++ on an enum

– The enumeration constants are typically implicitly assigned the integer values 0, 1, …, but can be explicitly assigned any integer literal.
– Java did not include an enumeration type before Java 5.0, presumably because enumerations can be represented as data classes. For example,

class colors {
   public final int red = 0;
   public final int blue = 1;
}

– Java 5.0 and later versions do provide enumeration types (enum).
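In C, enumeration constants behave as described above: implicit values start at 0, explicit values may be given, and a constant without an explicit value gets one more than the previous constant. Unlike C++, C allows ++ on an enum variable because enumerations are integer types. The second enum and next_color are hypothetical.

```c
enum colors { red, blue, green, yellow, black };               /* 0, 1, 2, 3, 4 */
enum http_status { status_ok = 200, status_created,            /* 200, 201 */
                   status_not_found = 404 };

/* Returns the color after c; C performs no range check. */
enum colors next_color(enum colors c)
{
    c++;          /* legal in C: the enum variable is an integer */
    return c;
}
```

Nothing stops next_color(black) from producing 5, which is not a valid color, so the programmer has to guard against running off the end of the enumeration.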

Subrange Types
– An ordered contiguous subsequence of an ordinal type
– Example: 12..18 is a subrange of integer type
– Ada’s design:
type Days is (mon, tue, wed, thu, fri, sat, sun);
subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;

Day1 : Days;
Day2 : Weekdays;
Day2 := Day1;  -- legal, but raises Constraint_Error at run time if Day1 is sat or sun

Array Types
• An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate, relative to the first element.
• A reference to an array element in a program often includes one or more non-constant subscripts.
• Such references require a run-time calculation to determine the memory location being referenced.
referenced.

Arrays and Indexes


• Indexing is a mapping from indices to elements.
• The mapping can be shown as:

map(array_name, index_value_list) → an element

• Ex: Ada

Sum := Sum + B(I);

Because ( ) are used for both subprogram parameters and array subscripts in Ada, this
results in reduced readability.

• C-based languages use [ ] to delimit array indices.


• Two distinct types are involved in an array type:
o The element type, and
o The type of the subscripts.
• The type of the subscript is often a sub-range of integers.
• Ada allows other types as subscripts, such as Boolean, char, and enumeration.
• Among contemporary languages, C, C++, Perl, and Fortran don’t specify range checking
of subscripts, but Java, and C# do.

Subscript Bindings and Array Categories
• The binding of subscript type to an array variable is usually static, but the subscript
value ranges are sometimes dynamically bound.
• In C-based languages, the lower bound of all index ranges is fixed at 0; in Fortran 95, it defaults to 1.
1. A static array is one in which the subscript ranges are statically bound and storage
allocation is static (done before run time).
• Advantages: efficiency (no allocation or deallocation at run time).
• Ex:
Arrays declared in C and C++ functions with the static modifier are static.

2. A fixed stack-dynamic array is one in which the subscript ranges are statically bound, but
the allocation is done at elaboration time during execution.
• Advantages: space efficiency. A large array in one subprogram can use the same space as a large array in a different subprogram, as long as the two subprograms are never active at the same time.
• Ex:
Arrays declared in C and C++ functions without the static modifier are fixed stack-dynamic arrays.

3. A stack-dynamic array is one in which the subscript ranges are dynamically bound, and
the storage allocation is dynamic “during execution.” Once bound they remain fixed
during the lifetime of the variable.
• Advantages: flexibility. The size of the array need not be known until the array is about to be used.
• Ex:
Ada arrays can be stack dynamic:

Get (List_Len);
declare
   List : array (1..List_Len) of Integer;
begin
   . . .
end;

The user inputs the number of desired elements for array List. The elements are then
dynamically allocated when execution reaches the declare block. When execution
reaches the end of the block, the array is deallocated.

4. A fixed heap-dynamic array is one in which the subscript ranges are dynamically bound,
and the storage allocation is dynamic, but they are both fixed after storage is allocated.
• The bindings are done when the user program requests them, rather than at elaboration time, and the storage is allocated on the heap, rather than the stack.
• Ex:
C and C++ also provide fixed heap-dynamic arrays. The functions malloc and free are used in C; the operators new and delete are used in C++.

In Java all arrays are fixed heap dynamic arrays. Once created, they keep the same
subscript ranges and storage.

5. A heap-dynamic array is one in which the subscript ranges are dynamically bound, and
the storage allocation is dynamic, and can change any number of times during the array’s
lifetime.
• Advantages: Flexibility. Arrays can grow and shrink during program execution as the
need for space changes.
• Ex:
C# provides heap-dynamic arrays through the ArrayList class.

ArrayList intList = new ArrayList();

Elements are added to this object with the Add method, as in

intList.Add(nextOne);

Perl and JavaScript also support heap-dynamic arrays.
For example, in Perl we could create an array of five numbers with

@list = (1, 2, 4, 7, 10);

Later, the array could be lengthened with the push function, as in

push(@list, 13, 17);

Now the array’s value is (1, 2, 4, 7, 10, 13, 17).
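Three of the categories above can be shown in C. This is a minimal sketch with hypothetical function names and an assumed compile-time length ARRAY_LEN: a static local array (category 1), an ordinary local array (category 2, fixed stack-dynamic), and a calloc-allocated array (category 4, fixed heap-dynamic). C has no built-in heap-dynamic array type (category 5); realloc is the usual way to grow one by hand.

```c
#include <stdlib.h>

#define ARRAY_LEN 5   /* compile-time subscript range (an assumption of the sketch) */

/* 1. Static array: the same storage on every call, bound before run time. */
int *static_array(void)
{
    static int a[ARRAY_LEN];    /* zero-initialized once */
    return a;                   /* safe to return: it outlives the call */
}

/* 2. Fixed stack-dynamic array: range fixed at compile time,
 *    storage allocated on the stack each time the function is entered. */
int stack_array_sum(void)
{
    int a[ARRAY_LEN];
    int sum = 0;
    for (int i = 0; i < ARRAY_LEN; i++) {
        a[i] = i + 1;
        sum += a[i];
    }
    return sum;                 /* a is deallocated on return */
}

/* 4. Fixed heap-dynamic array: size chosen at run time, storage from the heap,
 *    fixed once allocated. Returns NULL on failure; the caller must free() it. */
int *heap_array(size_t len)
{
    return calloc(len, sizeof(int));   /* zero-initialized */
}
```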

Array Initialization
• Usually just a list of values that are put in the array in the order in which the array elements are stored in memory.
• Fortran uses the DATA statement, or puts the values in / ... / on the declaration.

Integer List (3)
Data List /0, 5, 5/   ! List is initialized to the values 0, 5, 5

• C and C++: put the values in braces and let the compiler count them.

int stuff [] = {2, 4, 6, 8};

• The compiler sets the length of the array.
• What if the programmer mistakenly left a value out of the list?
• Character strings in C and C++ are implemented as arrays of char.

char name [] = "Freddie"; // how many elements in array name?

• The array will have 8 elements, because the null character is implicitly included by the compiler.
• In Java, an array of references to String objects can be defined and initialized as follows:

String[] names = {"Bob", "Jake", "Debbie"};

• In Ada, positions for the values can be specified:

List : array (1..5) of Integer := (1, 3, 5, 7, 9);
Bunch : array (1..5) of Integer := (1 => 3, 3 => 4, others => 0);

Note: the value of Bunch is (3, 0, 4, 0, 0).
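The compiler-counted lengths can be checked with sizeof. In this sketch, ELEMENT_COUNT is a hypothetical helper macro (a common C idiom, not a standard one) that divides an array's total size by the size of one element; it only works on true arrays, not on pointers.

```c
/* The compiler counts the initializers to fix each array's length. */
int stuff[] = {2, 4, 6, 8};   /* 4 elements */
char name[] = "Freddie";      /* 8 elements: 7 letters plus the null character */

/* Number of elements = total size of the array / size of one element. */
#define ELEMENT_COUNT(arr) (sizeof(arr) / sizeof((arr)[0]))
```

If a value is accidentally left out of the list for stuff, the compiler simply creates a shorter array, so ELEMENT_COUNT is the reliable way to get the length instead of hard-coding 4.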

Implementation of Arrays
 Access function maps subscript expressions to an address in the array  Access
function for single-dimensioned arrays:

address(list[k]) = address (list[lower_bound]) + ((k-lower_bound) * element_size)

 Accessing Multi-dimensioned Arrays  Two common ways:


o Row major order (by rows) – used in most languages o column major order (by
columns) – used in Fortran
 For example, if the matrix had the values
347
625
138
o it would be stored in row major order as:

3, 4, 7, 6, 2, 5, 1, 3, 8

o If the example matrix above were stored in column major, it would have the
following order in memory.

3, 6, 1, 4, 2, 3, 7, 5, 8

o In all cases, sequential access to matrix elements will be faster if they are accessed
in the order in which they are stored, because that will minimize the paging.
(Paging is the movement of blocks of information between disk and main memory.
The objective of paging is to keep the frequently needed parts of the program in
memory and the rest on disk.)
 Locating an Element in a Multi-dimensioned Array (row major)

Location (a[i,j]) = address of a [1,1 ] + ( (i - 1) * n + (j - 1) ) * element_size
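The row-major formula above, together with its column-major counterpart, can be written as two small C functions. This sketch uses 1-based subscripts as in the formula and returns an offset in elements; multiplying by element_size and adding the address of a[1,1] gives the address. The function names and the parameter m (number of rows) are assumptions of the sketch.

```c
#include <stddef.h>

/* Row major: skip (i - 1) full rows of n elements, then (j - 1) elements. */
size_t row_major_offset(size_t i, size_t j, size_t n)
{
    return (i - 1) * n + (j - 1);
}

/* Column major: skip (j - 1) full columns of m elements, then (i - 1) elements. */
size_t column_major_offset(size_t i, size_t j, size_t m)
{
    return (j - 1) * m + (i - 1);
}
```

Applied to the example matrix, the element in row 2, column 3 (the value 5) is at offset 5 in the row-major sequence and at offset 7 in the column-major sequence.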

Associative Arrays
• An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys.
• So each element of an associative array is in fact a pair of entities, a key and a value.
• Associative arrays are supported directly by Perl and by the standard class libraries of Java and C++.
• Example: In Perl, associative arrays are often called hashes. Names begin with %; literals are delimited by parentheses.

%temps = ("Mon" => 77, "Tue" => 79, "Wed" => 65);

o Subscripting is done using braces and keys:
$temps{"Wed"} = 83;
o Elements can be removed with delete:
delete $temps{"Tue"};
o The entire hash can be emptied by assigning the empty literal:
%temps = ();
Record Types

• A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by names.
• In C, C++, and C#, records are supported with the struct data type. In C++, structures are a minor variation on classes.
• Definition of records in COBOL:
o COBOL uses level numbers to show nested records; other languages use recursive definitions.

01 EMP-REC.
   02 EMP-NAME.
      05 FIRST PIC X(20).
      05 MID   PIC X(10).
      05 LAST  PIC X(20).
   02 HOURLY-RATE PIC 99V99.

• Definition of records in Ada:
o Record structures are indicated in an orthogonal way.

type Emp_Rec_Type is record
   First       : String (1..20);
   Mid         : String (1..10);
   Last        : String (1..20);
   Hourly_Rate : Float;
end record;
Emp_Rec : Emp_Rec_Type;

References to Records
• Most languages use dot notation:

Emp_Rec.First

• Fully qualified references must include all record names.
• Elliptical references allow leaving out record names as long as the reference is unambiguous; for example, in COBOL,

FIRST, FIRST OF EMP-NAME, and FIRST OF EMP-REC are elliptical references to the employee’s first name.

Operations on Records
• Assignment is very common if the types are identical.
• Ada allows record comparison.
• Ada records can be initialized with aggregate literals.
• COBOL provides MOVE CORRESPONDING:
o It copies each field of the source record to the field with the same name in the target record.
Union Types
• A union is a type whose variables are allowed to store different type values at different times during execution.

Discriminated vs. Free Unions

• Fortran, C, and C++ provide union constructs in which there is no language support for type checking; the union in these languages is called a free union.
• Type checking of unions requires that each union include a type indicator, called a discriminant; such a union is called a discriminated union.
• Discriminated unions are supported by Ada.

Ada Union Types


type Shape is (Circle, Triangle, Rectangle);
type Colors is (Red, Green, Blue);
type Figure (Form : Shape) is
   record
      Filled : Boolean;
      Color  : Colors;
      case Form is
         when Circle =>
            Diameter : Float;
         when Triangle =>
            Leftside, Rightside : Integer;
            Angle : Float;
         when Rectangle =>
            Side1, Side2 : Integer;
      end case;
   end record;

Ada Union Type Illustrated: A discriminated union of three shape variables

Evaluation of Unions  Potentially


unsafe construct
ƒ Java and C# do not support unions
ƒ Reflective of growing concerns for safety in programming language
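C provides only free unions, but a discriminated union can be emulated by pairing a union with a tag field. This hypothetical sketch mirrors part of the Ada Figure type above (only Circle and Rectangle); unlike Ada, which checks the discriminant automatically, C never checks the tag, so every access must test it explicitly.

```c
/* The tag plays the role of Ada's discriminant. */
enum shape_kind { CIRCLE, RECTANGLE };

struct figure {
    enum shape_kind form;                     /* the discriminant */
    union {
        float diameter;                       /* valid only when form == CIRCLE */
        struct { int side1, side2; } rect;    /* valid only when form == RECTANGLE */
    } u;
};

/* Reads only the union member that the tag says is current. */
double figure_area(const struct figure *fig)
{
    switch (fig->form) {
    case CIRCLE: {
        double r = fig->u.diameter / 2.0;
        return 3.14159 * r * r;
    }
    case RECTANGLE:
        return (double)fig->u.rect.side1 * fig->u.rect.side2;
    }
    return 0.0;   /* not reached for valid tags */
}
```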
Pointers
• A pointer type is one in which the variables have a range of values that consists of memory addresses and a special value, nil.
• The value nil is not a valid address and is used to indicate that a pointer cannot currently be used to reference any memory cell.

Pointer Operations
• A pointer type usually includes two fundamental pointer operations: assignment and dereferencing.
• Assignment sets a pointer variable’s value to some useful address.
• Dereferencing takes a reference through one level of indirection.
• In C++, dereferencing is explicitly specified with the asterisk (*) as a prefix unary operator.
• If ptr is a pointer variable with the value 7080, and the cell whose address is 7080 has the value 206, then the assignment

j = *ptr

sets j to 206.

The assignment operation j = *ptr
Pointer Problems
1. Dangling pointers (dangerous)
– A dangling pointer is a pointer that points to a heap-dynamic variable that has been deallocated.
– Dangling pointers are dangerous for the following reasons:
– The location being pointed to may have been allocated to some new heap-dynamic variable. If the new variable is not the same type as the old one, type checks of uses of the dangling pointer are invalid.
– Even if the new variable is the same type, its new value will bear no relationship to the old pointer's dereferenced value.
– If the dangling pointer is used to change the heap-dynamic variable, the value of the new heap-dynamic variable will be destroyed.
– It is possible that the location is now being temporarily used by the storage management system, possibly as a pointer in a chain of available blocks of storage, thereby allowing a change to the location to cause the storage manager to fail.
– The following sequence of operations creates a dangling pointer in many languages:
a. Pointer p1 is set to point at a new heap-dynamic variable.
b. A second pointer, p2, is set to the value of p1.
c. The heap-dynamic variable pointed to by p1 is explicitly deallocated (setting p1 to nil), but p2 is not changed by the operation. p2 is now a dangling pointer.

2. Lost Heap-Dynamic Variables (wasteful)
– A lost heap-dynamic variable is one that is no longer referenced by any program pointer, so it is no longer accessible by the user program.
– Such variables are often called garbage because they are not useful for their original purpose, and they also cannot be reallocated for some new use by the program.
– Creating lost heap-dynamic variables:
a. Pointer p1 is set to point to a newly created heap-dynamic variable.
b. p1 is later set to point to another newly created heap-dynamic variable.
c. The first heap-dynamic variable is now inaccessible, or lost.
– The process of losing heap-dynamic variables is called memory leakage.

Pointers in Ada
– Some dangling pointers are disallowed because dynamic objects can be automatically deallocated at the end of the pointer type's scope
– The lost heap-dynamic variable problem is not eliminated by Ada

Pointers in Fortran 95
– Pointers point to heap and non-heap variables
– Implicit dereferencing
– Pointers can only point to variables that have the TARGET attribute
– The TARGET attribute is assigned in the declaration:

INTEGER, TARGET :: NODE

Pointers in C and C++
– Extremely flexible but must be used with care.
– Pointers can point at any variable regardless of when it was allocated
– Used for dynamic storage management and addressing
– Pointer arithmetic, which is possible in C and C++, makes their pointers more interesting than those of other programming languages.
– Unlike the pointers of Ada, which can only point into the heap, C and C++ pointers can
point at virtually any variable anywhere in memory.
– Explicit dereferencing and address-of operators
– In C and C++, the asterisk (*) denotes the dereferencing operation, and the ampersand
(&) denotes the operator for producing the address of a variable. For example, in the
code
int *ptr;
int count, init;

ptr = &init;
count = *ptr;

– the two assignment statements are equivalent to the single assignment

count = init;

– Example: Pointer arithmetic in C and C++

int list[10];
int *ptr;
ptr = list;

*(ptr + 5) is equivalent to list[5] and ptr[5]
*(ptr + i) is equivalent to list[i] and ptr[i]

– The domain type need not be fixed (void *)
– void * can point to any type and can be type checked (it cannot be dereferenced without first being cast to another pointer type)
Reference Types
– C++ includes a special kind of pointer type called a reference type that is used primarily for formal parameters in function definitions.
– A C++ reference type variable is a constant pointer that is always implicitly
dereferenced.
– Because a C++ reference type variable is a constant, it must be initialized with the
address of some variable in its definition, and after initialization a reference type
variable can never be set to reference any other variable.
– Reference type variables are specified in definitions by preceding their names with ampersands (&). For example,

int result = 0;
int &ref_result = result;

ref_result = 100;

In this code segment, result and ref_result are aliases, so the last assignment also sets result to 100.

– In Java, reference variables are extended from their C++ form to one that allows them to replace pointers entirely.
– The fundamental difference between C++ pointers and Java references is that C++
pointers refer to memory addresses, whereas Java references refer to class instances.
– Because Java class instances are implicitly deallocated (there is no explicit deallocation
operator), there cannot be a dangling reference.
– C# includes both the references of Java and the pointers of C++. However, the use of
pointers is strongly discouraged. In fact, any method that uses pointers must include the
unsafe modifier.

Lesson Six:

Expressions and Assignment Statements


Introduction

• Expressions are the fundamental means of specifying computations in a programming language.
• To understand expression evaluation, one needs to be familiar with the orders of operator and operand evaluation.
• The essence of imperative languages is the dominant role of assignment statements.

Arithmetic Expressions
• Their evaluation was one of the motivations for the development of the first programming languages.
• Most of the characteristics of arithmetic expressions in programming languages were inherited from conventions that had evolved in mathematics.
• Arithmetic expressions consist of operators, operands, parentheses, and function calls.
• The operators can be unary or binary. C-based languages also include a ternary operator, which has three operands (the conditional expression).
• The purpose of an arithmetic expression is to specify an arithmetic computation.
• An implementation of such a computation must cause two actions:
o Fetching the operands, usually from memory
o Executing the arithmetic operations on those operands
• Design issues for arithmetic expressions:
1. What are the operator precedence rules?
2. What are the operator associativity rules?
3. What is the order of operand evaluation?
4. Are there restrictions on operand evaluation side effects?
5. Does the language allow user-defined operator overloading?
6. What mode mixing is allowed in expressions?

Operator Evaluation Order

1. Precedence
• The operator precedence rules for expression evaluation define the order in which "adjacent" operators of different precedence levels are evaluated ("adjacent" means they are separated by at most one operand).
• Typical precedence levels (highest first):
1. parentheses
2. unary operators
3. ** (if the language supports it)
4. *, /
5. +, -
• Many languages also include unary versions of addition and subtraction.
• Unary addition (+) is called the identity operator because it usually has no associated operation and thus has no effect on its operand.
• In Java, unary plus actually does have an effect when its operand is short or byte: an implicit conversion of short and byte operands to int type takes place.
• Unary minus operator (-), for example:

A + (- B) * C    // legal
A + - B * C      // illegal in some languages (e.g., Ada), although legal in C and Java

2. Associativity
• The operator associativity rules for expression evaluation define the order in which adjacent operators with the same precedence level are evaluated. An operator can be either left or right associative.
• Typical associativity rules:
o Left to right, except **, which is right to left
o Sometimes unary operators associate right to left (e.g., in FORTRAN)
• Example (Java):

a - b + c        // left to right: (a - b) + c

• Example (Fortran):

A ** B ** C      // right to left: A ** (B ** C)

(A ** B) ** C    // in Ada, ** is nonassociative, so such an expression must be parenthesized

Language             Associativity Rule
FORTRAN              Left: * / + -
                     Right: **
C-based languages    Left: * / % binary + binary -
                     Right: ++ -- (prefix), unary -, unary +
Ada                  Left: all except **
                     Nonassociative: **

• APL is different: all operators have equal precedence and all operators associate right to left.
• Example:

A × B + C    // with A = 3, B = 4, C = 5, the value is 3 × (4 + 5) = 27

• Precedence and associativity rules can be overridden with parentheses.

3. Parentheses
• Programmers can alter the precedence and associativity rules by placing parentheses in expressions.
• A parenthesized part of an expression has precedence over its adjacent unparenthesized parts.
• Example:

(A + B) * C

4. Conditional Expressions
• Sometimes if-then-else statements are used to perform a conditional expression assignment:

if (count == 0)
   average = 0;
else
   average = sum / count;

• In the C-based languages, this can be specified more conveniently in an assignment statement using a conditional expression. Note that ?: is a ternary operator (it has three operands):

expression_1 ? expression_2 : expression_3

• Example:

average = (count == 0) ? 0 : sum / count;


Operand evaluation order
• The process:
1. Variables: just fetch the value from memory.
2. Constants: sometimes a fetch from memory; sometimes the constant is in the machine language instruction.
3. Parenthesized expressions: evaluate all operands and operators first.
• Side Effects
• A side effect of a function, called a functional side effect, occurs when the function changes either one of its parameters or a global variable.
• Example:

a + fun(a)

• If fun does not have the side effect of changing a, then the order of evaluation of the two operands, a and fun(a), has no effect on the value of the expression.
• However, if fun changes a, there is an effect.
• Example: suppose fun returns the value of its argument divided by 2 and changes its parameter to have the value 20, and:

a = 10;
b = a + fun(a);

• If the value of a is fetched first (in the expression evaluation process), its value is 10 and the value of the expression is 15.
• But if the second operand is evaluated first, then the value of the first operand is 20 and the value of the expression is 25.
• The following C program illustrates the same problem.

int a = 5;

int fun1() {
   a = 17;
   return 3;
}

void fun2() {
   a = a + fun1();   // C: 8 or 20, depending on the compiler; Java: always 8
}

int main(void) {
   fun2();
   return 0;
}

• The value computed for a in fun2 depends on the order of evaluation of the operands in the expression a + fun1(). The value of a will be either 8 or 20.
• Two possible solutions:
1. Write the language definition to disallow functional side effects
o No two-way parameters in functions
o No nonlocal references in functions
o Advantage: it works!
o Disadvantage: programmers want the flexibility of two-way parameters (what about C?) and nonlocal references.
2. Write the language definition to demand that operand evaluation order be fixed
o Disadvantage: limits some compiler optimizations

Java guarantees that operands are evaluated in left-to-right order, eliminating this problem (in the program above, Java always sets a to 8).
Overloaded Operators
• The use of an operator for more than one purpose is called operator overloading.
• Some overloading is common (e.g., + for int and float).
• Java uses + for addition and for string catenation.
• Some overloading is potential trouble (e.g., & in C and C++):

x = &y    // as a binary operator, & is bitwise logical AND;
          // as a unary operator, it yields the address of y

– Here, & causes the address of y to be placed in x.
– Using the same symbol for two completely unrelated operations costs some readability.
– The simple keying error of leaving out the first operand for a bitwise AND operation can go undetected by the compiler and is difficult to diagnose.
– Such problems can be avoided by introducing new symbols (e.g., Pascal's div for integer division and / for floating-point division).
Type Conversions
• A narrowing conversion is one that converts an object to a type that cannot include all
of the values of the original type e.g., double to float.
• A widening conversion is one in which an object is converted to a type that can include
at least approximations to all of the values of the original type e.g., int to float.

Coercion in Expressions
• A mixed-mode expression is one that has operands of different types.
• A coercion is an implicit type conversion.
• The disadvantage of coercions:
o They decrease the type error detection ability of the compiler
• In most languages, all numeric types are coerced in expressions, using widening conversions.
• Languages are not in agreement on the issue of coercions in arithmetic expressions.
• Those against a broad range of coercions are concerned with the reliability problems that can result from such coercions, because they eliminate the benefits of type checking.
• Those who would rather include a wide range of coercions are more concerned with the loss in flexibility that results from restrictions.
• The issue is whether programmers should be concerned with this category of errors or whether the compiler should detect them.
• Java method example:

void mymethod() {
   int b, c;
   float a, d;
   …
   a = b * d;
   …
}

• Assume that the second operand was supposed to be c instead of d.
• Because mixed-mode expressions are legal in Java, the compiler would not detect this as an error; it simply coerces b to float. (Here a is declared float; if a were int, Java would reject assigning the float result to it without an explicit cast.)
Explicit Type Conversions
• Often called casts in C-based languages.
• Examples:
Ada:

FLOAT(INDEX)    -- INDEX is of INTEGER type

Java:

(int) speed     /* speed is of float type */

Errors in Expressions
• Caused by:
– Inherent limitations of arithmetic e.g. division by zero
– Limitations of computer arithmetic e.g. overflow or underflow
• Floating-point overflow and underflow, and division by zero are examples of run-time
errors, which are sometimes called exceptions.
Relational and Boolean Expressions
• A relational operator is an operator that compares the values of its two operands.
• A relational expression has two operands and one relational operator.
• The value of a relational expression is Boolean, unless Boolean is not a type included in the language (as in C before C99, where the result is an int, 0 or 1).
– Relational operators can be used with operands of various types.
– Operator symbols used vary somewhat among languages (!=, /=, .NE., <>, #).
• The syntax of the relational operators available in some common languages is as follows:

Operation               Ada    C-Based Languages    Fortran 95
Equal                   =      ==                   .EQ. or ==
Not equal               /=     !=                   .NE. or /=
Greater than            >      >                    .GT. or >
Less than               <      <                    .LT. or <
Greater than or equal   >=     >=                   .GE. or >=
Less than or equal      <=     <=                   .LE. or <=

Boolean Expressions
• Operands are Boolean and the result is Boolean.

Fortran (77 and 90)    C     Ada
.AND.                  &&    and
.OR.                   ||    or
.NOT.                  !     not

• Versions of C prior to C99 have no Boolean type; they use int, with 0 for false and nonzero for true.
• One odd characteristic of C's expressions: a < b < c is a legal expression, but the result is not what you might expect.
• The leftmost operator is evaluated first because the relational operators of C are left associative, producing either 0 or 1.
• Then this result is compared with the variable c. There is never a comparison between b and c.
Short Circuit Evaluation

• A short-circuit evaluation of an expression is one in which the result is determined without evaluating all of the operands and/or operators.
• Example:

(13 * a) * (b / 13 - 1)    // independent of the value of (b / 13 - 1) if a = 0, because 0 * x = 0

• So when a = 0, there is no need to evaluate (b / 13 - 1) or perform the second multiplication.
• However, this shortcut is not easily detected during execution, so it is never taken.
• The value of the Boolean expression

(a >= 0) && (b < 10)       // independent of the second expression if a < 0, because (F && x) is false for all values of x

• So when a < 0, there is no need to evaluate b, the constant 10, the second relational expression, or the && operation.
• Unlike the case of arithmetic expressions, this shortcut can be easily discovered during execution.
• Short-circuit evaluation exposes the potential problem of side effects in expressions:

(a > b) || (b++ / 3)       // b is changed only when a <= b

• If the programmer assumed b would change every time this expression is evaluated during execution, the program will fail.
• C, C++, and Java use short-circuit evaluation for the usual Boolean operators (&& and ||), but also provide bitwise Boolean operators that are not short circuit (& and |).
Assignment Statements
Simple Assignments
• The C-based languages use == as the equality relational operator to avoid confusion with their assignment operator.
• The operator symbol for assignment:
1. = in FORTRAN, BASIC, PL/I, C, C++, Java
2. := in ALGOL, Pascal, Ada

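Conditional Targets
• Some languages (e.g., Perl, and C++ when both alternatives are variables of the same type) allow a conditional expression as the target of an assignment. Example:

(flag ? count1 : count2) = 0;

is equivalent to

if (flag)
   count1 = 0;
else
   count2 = 0;

• The parentheses matter: in C++, flag ? count1 : count2 = 0 would be parsed as flag ? count1 : (count2 = 0).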
Compound Assignment Operators
• A compound assignment operator is a shorthand method of specifying a commonly needed form of assignment.
• The form of assignment that can be abbreviated with this technique has the destination variable also appearing as the first operand in the expression on the right side, as in

a = a + b

• The syntax of these assignment operators is the catenation of the desired binary operator to the = operator:

sum += value;    ⇔    sum = sum + value;

Unary Assignment Operators
• C-based languages include two special unary operators that are actually abbreviated assignments.
• They combine increment and decrement operations with assignment.
• The operators ++ and -- can be used either in expressions or to form stand-alone single-operator assignment statements. They can appear as prefix operators:

sum = ++count;    ⇔    count = count + 1; sum = count;

• If the same operator is used as a postfix operator:

sum = count++;    ⇔    sum = count; count = count + 1;

Assignment as an Expression

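• This design treats the assignment operator much like any other binary operator, except that it has the side effect of changing its left operand.
• Example:

while ((ch = getchar()) != EOF) {…}    // why ( ) around the assignment?

• The assignment must be parenthesized because the precedence of the assignment operator is lower than that of the relational operators; without the parentheses, ch would be assigned the result of the comparison getchar() != EOF.
• Disadvantage: another kind of expression side effect, which leads to expressions that are difficult to read and understand. For example,

a = b + (c = d / b++) - 1

is intended to denote the instructions

Assign b to temp
Assign b + 1 to b
Assign d / temp to c
Assign b + c to temp
Assign temp - 1 to a

(In C and C++ this particular expression actually has undefined behavior, because b is read and also modified by b++ with no sequencing between them, which further shows how hard such expressions are to reason about.)
• There is a loss of error detection in the C design of the assignment operation that frequently leads to program errors, for example writing

if (x = y) …

instead of

if (x == y) …

The first form assigns y to x and tests the assigned value, and the compiler accepts it.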

Mixed-Mode Assignment
• In FORTRAN, C, and C++, any numeric value can be assigned to any numeric scalar variable; whatever conversion is necessary is done.
• In Pascal, integers can be assigned to reals, but reals cannot be assigned to integers (the programmer must specify whether the conversion from real to integer is truncated or rounded).
• In Java, only widening assignment coercions are done.
• In Ada, there is no assignment coercion.
• In all languages that allow mixed-mode assignment, the coercion takes place only after the right-side expression has been evaluated. For example, consider the following code:

int a, b;
float c;

c = a / b;

Because c is float, the values of a and b could in principle be coerced to float before the division, which would produce a different value for c than delaying the coercion (for example, if a were 2 and b were 3). Since the coercion is delayed, a / b is computed as an integer division (2 / 3 yields 0), and c receives 0.0 rather than about 0.667.

Overloaded operators
In some programming languages an operator may be ad hoc polymorphic, that is, it may have definitions for more than one kind of data (as in Java, where the + operator is used both for the addition of numbers and for the concatenation of strings). Such an operator is said to be overloaded. In languages that support operator overloading by the programmer but have a limited set of operators, operator overloading is often used to define customized uses for operators.

Short-Circuit Evaluation
Short-circuit evaluation, minimal evaluation, or McCarthy evaluation denotes the semantics of some Boolean operators in some programming languages in which the second argument is evaluated only if the first argument does not suffice to determine the value of the expression: when the first argument of the AND function evaluates to false, the overall value must be false; and when the first argument of the OR function evaluates to true, the overall value must be true. In some programming languages (e.g., Lisp), the usual Boolean operators are short-circuit. In others (e.g., Java, Ada), both short-circuit and standard Boolean operators are available.
