Slide 1-1: Abstract Data Types
In this chapter we will look at the notion of abstract data types, which may be
regarded as an essential constituent of object-oriented modeling. In particular,
we will study the notion of data abstraction from a foundational perspective,
that is based on a mathematical description of types. We start this chapter
by discussing the notion of types as constraints. Then, we look at the (first
order) algebraic specification of abstract data types, and we explore the trade-
offs between the traditional implementation of abstract data types by employing
Second order lambda calculus has been used to model information hiding and
the polymorphism supported by inheritance and templates. In the next chapter
we will study this approach in more detail.
In both approaches, the meaning of a type is (ultimately) a set of elements
satisfying certain restrictions. However, in a more abstract fashion, we may regard
a type as specifying a constraint. The better we specify the constraint, the
more tightly the corresponding set of elements will be defined (and hence the
smaller the set). A natural consequence of the idea of types as constraints is
to characterize types by means of logical formulas. This is the approach taken
by type theories based on constructive logic, in which the notion of formulas as
types plays an important role. Although we will not study type theories based
on constructive logic explicitly, our point of view is essentially to regard types
as constraints, ranging from purely syntactical constraints (as expressed in a
signature) to semantic constraints (as may be expressed in contracts).
From the perspective of types as constraints, a typing system may contribute
to a language framework guiding a system designer’s conceptualization and
supporting the verification (based on the formal properties of the types employed)
of the consistency of the descriptive information provided by the program. Such
an approach is to be preferred (both from a pragmatic and theoretical point of
view) to an ad hoc approach employing special annotations and support
mechanisms, since these may become quite complicated and easily lead to unexpected
interactions.
Formal models There is a wide variety of formal models available in the
literature. These include algebraic models (to characterize the meaning of abstract
data types), models based on the lambda-calculus and its extensions (which
are primarily used for a type theoretical analysis of object-oriented language
constructs), algebraic process calculi (which may be used to characterize the
behavior of concurrent objects), operational and denotational semantic models
(to capture structural and behavioral properties of programs), and various
specification languages based on first or higher-order logics (which may be used to
specify the desired behavior of collections of objects).
We will limit ourselves to studying algebraic models capturing the properties
of abstract data types and objects (section 1.2.4), type calculi based on typed
extensions of the lambda calculus capturing the various flavors of polymorphism
and subtyping (sections ??–??), and an operational semantic model characterizing
the behavior of objects sending messages (section ??).
Both the algebraic and type theoretical models are primarily intended to
clarify the means we have to express the desired behavior of objects and the
restrictions that must be adhered to when defining objects and their relations.
The operational characterization of object behavior, on the other hand, is intended
to give a more precise characterization of the notion of state and state changes
underlying the verification of object behavior by means of assertion logics.
Despite the many models introduced, there are still numerous approaches
not covered here. One approach worth mentioning is the work based on the
pi-calculus. The pi-calculus is an extension of algebraic process calculi that allows
for communication via named channels. Moreover, the pi-calculus allows for a
notion of migration and the creation and renaming of channels. A semantics of
object-based languages based on the pi-calculus is given in [Walker90]. However,
this semantics does not cover inheritance or subtyping. A higher-order object-
oriented programming language based on the pi-calculus is presented in [PRT93].
Another approach of interest, also based on process calculi, is the object
calculus (OC) described in [Nier93]. OC allows for modeling the operational
semantics of concurrent objects. It merges the notions of agents, as used in
process calculi, with the notion of functions, as present in the lambda calculus.
For alternative models the reader may look in the comp.theory newsgroup to
which information concerning formal calculi for OOP is posted by Tom Mens of
the Free University, Brussels.
In general, the more expressive the type system the better the support that
the compiler may offer. In this respect, associating constructors with types may
help in relieving the programmer from dealing with simple but necessary tasks
such as the initialization of complex structures. Objects, in contrast to modules or
packages, allow for the automatic (compiler supported) initializations of instances
of (abstract) data types, providing the programmer with relief from an error-prone
routine.
Another area in which a type system may make the life of a programmer easier
concerns the association of operations with objects. A polymorphic type system
is needed to understand the automatic dispatching for virtual functions and the
opportunity of overloading functions, which are useful mechanisms to control the
complexity of a program, provided they are well understood.
Reuse and understanding are promoted by allowing inheritance and refinement
of description components. (As remarked earlier, inheritance and refinement may
be regarded as the essential contribution of object-oriented programming to the
practice of software development.) It goes without saying that such reuse needs
a firm semantical basis in order to achieve the goal of reliable and maintainable
software.
Another important issue for which a powerful type system can provide support
is the separation of specification and implementation. Naturally, we expect our
type system to support type-safe separate compilation. But in addition, we may
think of allowing multiple implementations of a single (abstract type)
specification. Explicit typing may then be of help in choosing the right binding when the
program is actually executed. For instance in a parallel environment, behavior
may be realized in a number of ways that differ in the degree to which they affect
locality of access and how they affect, for example, load balancing. With an eye
to the future, these are problems that may be solved with a good type system
(and accompanying compiler).
One of the desiderata for a type system for OOP, laid down in [DT88], is
the separation of a behavioral hierarchy (specifying the behavior of a type in an
abstract sense) and an implementation hierarchy (specifying the actual realization
of that behavior). Separation is needed to accommodate the need for multiple
realizations and to resolve the tension between subtyping and inheritance (a
adt bool is
functions
true : bool
false : bool
and, or : bool * bool -> bool
not : bool -> bool
axioms
[B1] and(true,x) = x
[B2] and(false,x) = false
[B3] not(true) = false
[B4] not(false) = true
[B5] or(x,y) = not(and(not(x),not(y)))
end
In this specification two constants are introduced (the zero-ary functions true
and false), as well as three functions (and, or and not). The or function is
defined by employing not and and, according to a well-known logical law. These
functions may all be considered to be (strictly) related to the type bool. Equations
are used to specify the desired characteristics of elements of type bool. Obviously,
Abstract data types may be considered as modules specifying the values and
functions belonging to the type. In [Dahl92], a type T is characterized as a
tuple specifying the set of elements constituting the type T and the collection
of functions related to the type T. Since constants may be regarded as zero-ary
functions (having no arguments), we will speak of a signature Σ or Σ_T defining
a particular type T. Also, in accord with common parlance, we will speak of the
sorts s ∈ Σ, which are the sorts (or types) occurring in the declaration of the
functions in Σ. See slide 1-6.
• f : s_1 × ... × s_n → s
Functions – for T
• constants (C_T) – c : → T
• producers (P_T) – g : s_1 × ... × s_n → T
• observers (O_T) – f : T → s_i
Type – generators
• Σ_T = P_T ∪ O_T, C_T ⊂ P_T, P_T ∩ O_T = ∅
A signature specifies the names and (function) profiles of the constants and
functions of a data type. In general, the profile of a function is specified as
• f : s_1 × ... × s_n → s
where s_i (i = 1..n) are the sorts defining the domain (that is the types of the
arguments) of the function f, and s is the sort defining the codomain (or result
type) of f. In the case n = 0 the function f may be regarded as a constant. More
generally, when s_1, ..., s_n are all unrelated to the type T being defined, we may
regard f as a relative constant. Relative constants are values that are assumed to
be defined in the context where the specification is being employed.
The functions related to a data type T may be discriminated according to
their role in defining T. We distinguish between producers g ∈ P_T, that have
the type T under definition as their result type, and observers f ∈ O_T, that
have T as their argument type and deliver a result of a type different from T. In
other words, producer functions define how elements of T may be constructed. (In
the literature one often speaks of constructors, but we avoid this term because
it already has a precisely defined meaning in the object-oriented programming
language C++.) In contrast, observer functions do not produce values of T, but
give instead information on some particular aspect of T.
The signature Σ_T of a type T is uniquely defined by the union of producer
functions P_T and observer functions O_T. Constants of type T are regarded as
a subset of the producer functions P_T defining T. Further, we require that the
collection of producers is disjoint from the collection of observers for T, that is
P_T ∩ O_T = ∅.
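The classification of functions into constants, producers and observers can be made concrete with a small sketch. The names Profile and classify are ours and purely illustrative; the signature used is that of the stack specification given later in this chapter.

```python
from typing import NamedTuple

class Profile(NamedTuple):
    """Profile f : s_1 x ... x s_n -> s of a function in a signature."""
    name: str
    domain: tuple    # argument sorts s_1, ..., s_n
    codomain: str    # result sort s

def classify(sig, T):
    """Split a signature for type T into constants, producers and observers."""
    producers = {f.name for f in sig if f.codomain == T}
    observers = {f.name for f in sig if T in f.domain and f.codomain != T}
    constants = {f.name for f in sig if f.codomain == T and not f.domain}
    return constants, producers, observers

# Signature of the stack example (hypothetical encoding of its profiles).
stack_sig = [
    Profile("new", (), "stack"),
    Profile("push", ("element", "stack"), "stack"),
    Profile("pop", ("stack",), "stack"),
    Profile("empty", ("stack",), "boolean"),
    Profile("top", ("stack",), "element"),
]

C, P, O = classify(stack_sig, "stack")
print(sorted(C), sorted(P), sorted(O))
# ['new'] ['new', 'pop', 'push'] ['empty', 'top']
```

For this signature the constants form a proper subset of the producers, and producers and observers are disjoint, as required.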
Generators The producer functions actually defining the values of a data type
T are called the generator basis of T, or generators of T. The generators of T may
be used to enumerate the elements of T, resulting in the collection of T values
that is called the generator universe in [Dahl92]. See slide 1-7.
• generator basis – G_T = {g ∈ P_T}
• generator universe – GU_T = {v_1, v_2, ...}
Examples
• G_Bool = {t, f}, GU_Bool = {t, f}
• G_Nat = {0, S}, GU_Nat = {0, S 0, SS 0, ...}
• G_SetA = {∅, add}, GU_SetA = {∅, add(∅, a), ...}
the value domain of Bool, the generator universe GU_Bool consists only of the
values t and f.
As another example, consider the data type Nat (representing the natural
numbers) with generator basis G_Nat = {0, S}, consisting of the constant 0 and
the successor function S : Nat → Nat (that delivers the successor of its argument).
The set of terms that may be constructed from G_Nat is GU_Nat = {0, S 0, SS 0, ...},
which uniquely corresponds to the natural numbers {0, 1, 2, ...}. (More precisely,
the natural numbers are isomorphic with GU_Nat.)
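As an illustration, the generator universe of Nat may be enumerated mechanically. The sketch below represents terms as strings and assumes that counting the occurrences of S is an adequate reading of the correspondence with the natural numbers; the function names are ours.

```python
from itertools import islice

def generator_universe_nat():
    """Yield the terms 0, S 0, S S 0, ... built from the basis {0, S}."""
    term = "0"
    while True:
        yield term
        term = "S " + term    # apply the successor generator S

def to_int(term):
    """Map a term of the universe to the number it denotes (count the S's)."""
    return term.count("S")

first = list(islice(generator_universe_nat(), 4))
print(first)                        # ['0', 'S 0', 'S S 0', 'S S S 0']
print([to_int(t) for t in first])   # [0, 1, 2, 3]
```

Each natural number is hit by exactly one term, which is what makes this generator basis one-to-one.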
In contrast, given a type A with elements a, b, ..., the generators of SetA result
in a universe that contains terms such as add(∅, a) and add(add(∅, a), a) which
we would like to identify, based on our conception of a set as containing only
one exemplar of a particular value. To effect this we need additional equations
imposing constraints expressing what we consider as the desired shape (or normal
form) of the values contained in the universe of T. However, before we look at
how to extend a signature Σ defining T with equations defining the (behavioral)
properties of T we will look at another example illustrating how the choice of a
generator basis may affect the structure of the value domain of a data type.
In the example presented in slide 1-8, the profiles are given of the functions
that may occur in the signature specifying sequences. (The notation _ is used to
indicate parameter positions.)
Sequences 1-8
Seq
• ε : seq T (empty)
• _ ▷ _ : seq T × T → seq T (right append)
• _ ◁ _ : T × seq T → seq T (left append)
• _ · _ : seq T × seq T → seq T (concatenation)
• ⟨_⟩ : T → seq T (lifting)
• ⟨_, ..., _⟩ : T^n → seq T (multiple arguments)
require our specification to be first-order and finite, infinite generator bases (such
as G''') must be disallowed, even if they result in a one-to-one correspondence.
See [Dahl92] for further details.
The specification of the signature of a type (which lists the syntactic constraints to
which a specification must comply) is in general not sufficient to characterize the
properties of the values of the type. In addition, we need to impose semantic
constraints (in the form of equations) to define the meaning of the observer
functions and (very importantly) to identify the elements of the type domain
that are considered equivalent (based on the intuitions one has of that particular
type).
• x = x (reflexivity)
• x = y ⇒ y = x (symmetry)
• x = y ∧ y = z ⇒ x = z (transitivity)
• x = y ⇒ f(..., x, ...) = f(..., y, ...) (substitutivity)
functions
0 : Nat
S : Nat -> Nat
mul : Nat * Nat -> Nat
plus : Nat * Nat -> Nat
axioms
[1] plus(x,0) = x
[2] plus(x,Sy) = S(plus(x,y))
[3] mul(x,0) = 0
[4] mul(x,Sy) = plus(mul(x,y),x)
end
1-11
mul(plus(S 0,S 0),S 0) -[2]->
mul(S(plus(S 0,0)),S 0) -[1]->
mul(SS 0,S 0) -[4]->
plus(mul(SS 0,0),SS 0) -[3]->
plus(0,SS 0) -[2*]-> SS 0
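The rewriting steps above can be mimicked by a small term rewriter. The following is a sketch under our own assumptions: terms are encoded as nested tuples, and arguments are normalized before an axiom is applied (an innermost strategy), which need not be the only admissible order of rewriting.

```python
ZERO = ("0",)

def S(t): return ("S", t)
def plus(a, b): return ("plus", a, b)
def mul(a, b): return ("mul", a, b)

def normalize(t):
    """Rewrite a closed Nat term to normal form using axioms [1]-[4]."""
    op = t[0]
    if op == "0":
        return t
    if op == "S":
        return S(normalize(t[1]))
    x, y = normalize(t[1]), normalize(t[2])   # y is now 0 or of the form S y'
    if op == "plus":
        if y == ZERO:
            return x                              # [1] plus(x,0) = x
        return S(normalize(plus(x, y[1])))        # [2] plus(x,Sy) = S(plus(x,y))
    if op == "mul":
        if y == ZERO:
            return ZERO                           # [3] mul(x,0) = 0
        return normalize(plus(mul(x, y[1]), x))   # [4] mul(x,Sy) = plus(mul(x,y),x)
    raise ValueError(f"unknown function symbol: {op}")

one = S(ZERO)
result = normalize(mul(plus(one, one), one))
print(result)   # ('S', ('S', ('0',)))
```

The result is the encoding of SS 0, in agreement with the derivation: the functions plus and mul have been eliminated and only generators remain.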
Axioms
[S1] add(add(s, x), y) = add(add(s, y), x) commutativity
[S2] add(add(s, x), x) = add(s, x) idempotence
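To see what the axioms [S1] and [S2] buy us, we may compute a normal form for terms built from ∅ and add. The sketch below is one possible choice of representative, not a prescribed one: it assumes that the elements can be sorted, and it uses the hypothetical names EMPTY, elements and normal_form.

```python
EMPTY = ("empty",)

def add(s, x):
    return ("add", s, x)

def elements(term):
    """Collect the elements inserted by a term built from EMPTY and add."""
    found = []
    while term != EMPTY:
        _, term, x = term
        found.append(x)
    return found

def normal_form(term):
    """Pick one representative per equivalence class under [S1] and [S2]."""
    result = EMPTY
    # set() removes duplicates ([S2]); sorted() fixes the order ([S1]).
    for x in sorted(set(elements(term))):
        result = add(result, x)
    return result

print(normal_form(add(add(EMPTY, "a"), "a")))   # ('add', ('empty',), 'a')
print(normal_form(add(add(EMPTY, "b"), "a")) == normal_form(add(add(EMPTY, "a"), "b")))   # True
```

Terms that the axioms identify receive the same normal form, so equality of sets reduces to syntactic equality of representatives.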
In the case of sets we have the problem that we do not start with a one-to-one
generator base as we had with the natural numbers. Instead, we have a many-
• {∅}
• {add(∅, a), add(add(∅, a), a), ...}
• ...
• {add(add(∅, a), b), add(add(∅, b), a), ...}
• Σ-algebra – A = ({A_s | s ∈ S}, Σ)
• interpretation – eval : T_Σ → A
• adequacy – A ⊨ t_1 = t_2 ⇐⇒ E ⊢ t_1 = t_2
Booleans 1-15
Natural numbers
• N = (ℕ, {++, +, ⋆})
• eval_N : T_Nat → N = {S ↦ ++, mul ↦ ⋆, plus ↦ +}
The structure B given above is simply a boolean algebra, with the operators
¬, ∧ and ∨. The functions not, and and or naturally map to their semantic
counterparts. In addition, we assume that the constants true and false map to
the elements tt and ff.
As another example, look at the structure N and the interpretation evalN ,
which maps the functions S, mul and plus specified in Nat in a natural way.
However, since we have also given equations for Nat (specifying how to eliminate
the functions mul and plus) we must take precautions such that the requirement
• Σ_E-algebra – M = (T_Σ/∼, Σ/∼)
Properties
• no junk – ∀ a : T_Σ/∼ ∃ t • eval_M(t) = a
• no confusion – M ⊨ t_1 = t_2 ⇐⇒ E ⊢ t_1 = t_2
The starting point for the construction of an initial model for a given
specification with signature Σ is to construct a term algebra T_Σ with the terms that may
be generated from the signature Σ as elements. The next step is then to factor the
universe of generated terms into equivalence classes, such that two terms belong
to the same class if they can be proven equivalent with respect to the equational
theory of the specification. We will denote the representative of the equivalence
class to which a term t belongs by [t]. Hence t_1 = t_2 (in the model) iff [t_1] = [t_2].
So assume that we have constructed a structure M = (T_Σ/∼, Σ). Finally, we
must define an interpretation, say eval_M : T_Σ → M, that maps closed
terms to the appropriate elements of the term model (namely the representatives of the
equivalence class of that term). Hence, the interpretation of a function f in the
structure M is such that
f_M([t_1], ..., [t_n]) = [f(t_1, ..., t_n)]
Example Consider the specification of Bool as given before. For this specification
we have given the structure B and the interpretation evalB which defines an initial
model for Bool. (Check this!)
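Part of this check can be carried out mechanically. The sketch below assumes that the structure B may be represented by Python's booleans; it verifies that the axioms [B1] to [B5] hold in B and that true and false denote distinct elements. It does not replace the full argument for initiality, although here every element of the carrier is denoted by a constant, so there is no junk.

```python
from itertools import product

# Interpretation of the function symbols of bool onto Python's booleans.
B = {
    "true": True,
    "false": False,
    "and": lambda x, y: x and y,
    "or": lambda x, y: x or y,
    "not": lambda x: not x,
}

def axioms_hold(m):
    """Check [B1]-[B5] for every assignment of carrier values to x and y."""
    carrier = (m["true"], m["false"])
    for x, y in product(carrier, repeat=2):
        if m["and"](m["true"], x) != x:              # [B1]
            return False
        if m["and"](m["false"], x) != m["false"]:    # [B2]
            return False
        lhs = m["or"](x, y)                          # [B5]
        rhs = m["not"](m["and"](m["not"](x), m["not"](y)))
        if lhs != rhs:
            return False
    return (m["not"](m["true"]) == m["false"]        # [B3]
            and m["not"](m["false"]) == m["true"])   # [B4]

print(axioms_hold(B))            # True
print(B["true"] != B["false"])   # True
```

A structure in which not is interpreted as the identity, for instance, fails the check, since it violates [B3].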
functions
new : stack;
push : element * stack -> stack;
empty : stack -> boolean;
pop : stack -> stack;
top : stack -> element;
axioms
empty( new ) = true
empty( push(x,s) ) = false
top( push(x,s) ) = x
pop( push(x,s) ) = s
preconditions
pre: pop( s : stack ) = not empty(s)
pre: top( s : stack ) = not empty(s)
end
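A direct, applicative reading of this specification may be sketched as follows. The representation of a stack as a tuple is our own choice, and raising a ValueError when a precondition is violated is an assumption, since the specification only states the preconditions and not what happens when they fail.

```python
NEW = ()                        # new : stack

def push(x, s):                 # push : element * stack -> stack
    return (x,) + s

def empty(s):                   # empty : stack -> boolean
    return s == NEW

def pop(s):                     # pop : stack -> stack
    if empty(s):
        raise ValueError("pre: pop(s) requires not empty(s)")
    return s[1:]

def top(s):                     # top : stack -> element
    if empty(s):
        raise ValueError("pre: top(s) requires not empty(s)")
    return s[0]

s = push(2, push(1, NEW))
print(empty(NEW), empty(s))     # True False
print(top(s), pop(s))           # 2 (1,)
```

The four axioms hold by construction: push never yields the empty tuple, and top and pop simply undo the effect of the last push.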
the object itself (as in the case of a stack) is the object account, as specified in
slide 1-19. The example is taken from [Goguen].
object account is
functions
bal : account -> money
methods
credit : account * money -> account
debit : account * money -> account
error
overdraw : money -> money
axioms
bal(new(A)) = 0
bal(credit(A,M)) = bal(A) + M
bal(debit(A,M)) = bal(A) - M if bal(A) >= M
error-axioms
bal(debit(A,M)) = overdraw(M) if bal(A) < M
end
An account object has one attribute function (called bal) that delivers the
amount of money that is (still) in the account. In addition, there are two method
functions, credit and debit that may respectively be used to add or withdraw
money from the account. Finally, there is one special error function, overdraw,
that is used to define the result of bal when there is not enough money left to
grant a debit request. Error axioms are needed whenever the proper axioms are
stated conditionally, that is, contain an if expression. The conditional parts of the
axioms, including the error axioms, must cover all possible cases.
Now, first look at the form of the axioms. The axioms are specified as
fn(method (Object, Args)) = expr
where fn specifies an attribute function (bal in the case of account) and method
a method (either new, which is used to create new accounts, credit or debit). By
convention, we assume that method (Object, . . .) = Object, that is that a method
function returns its first argument. Applying a method thus results in redefining
the value of the function fn. For example, invoking the method credit(acc, 10)
for the account acc results in modifying the function bal to deliver the value
bal (acc) + 10 instead of simply bal (acc). In the example above, the axioms define
the meaning of the function bal with respect to the possible method applications.
It is not difficult to see that these operations are of a non-applicative nature, non-
applicative in the sense that each time a method is invoked the actual definition
of bal is changed. The change is necessary because, in contrast to, for example,
the functions employed in a boolean algebra, the actual value of the account may
change in time in a completely arbitrary way. A first order framework of
(multi-sorted) algebras is not sufficiently strong to define the meaning of such changes.
What we need may be characterized as a multiple world semantics, where each
world corresponds to a possible state of the account. As an alternative semantics
we will also discuss the interpretation of an object as an abstract machine, which
resembles an (initial) algebra with hidden sorts.
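The non-applicative reading of the account specification corresponds closely to an object with mutable state. The sketch below is an illustration only: the class name Account is ours, and modeling the error function overdraw as an exception is an assumption, since the specification merely introduces overdraw(M) as a value of sort money.

```python
class Overdraw(Exception):
    """Stands in for the error function overdraw : money -> money."""

class Account:
    def __init__(self):
        self._bal = 0                # bal(new(A)) = 0

    def bal(self):                   # attribute function: only consults the state
        return self._bal

    def credit(self, m):             # bal(credit(A,M)) = bal(A) + M
        self._bal += m
        return self                  # a method returns its first argument

    def debit(self, m):
        if self._bal < m:            # error axiom: bal(A) < M
            raise Overdraw(m)
        self._bal -= m               # bal(debit(A,M)) = bal(A) - M if bal(A) >= M
        return self

acc = Account()
print(acc.credit(10).debit(3).bal())   # 7
```

Each call of credit or debit changes what bal delivers for the same object, which is precisely the change of state that a purely applicative algebra cannot express.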
The first rule (attribute) describes how attribute functions are evaluated.
Whenever a function f with arguments t_1, ..., t_n evaluates to a value (or
expression) v, then the term f(t_1, ..., t_n) may be replaced by v without affecting the
database D. (We have simplified the treatment by omitting all aspects having to do
with matching and substitutions, since such details are not needed to understand
the process of symbolic evaluation in a multiple world context.) The next rule
(method) describes the result of evaluating a method. We assume that invoking
the method changes the database D into D'. Recall that, by convention, a method
returns its first argument. Finally, the last rule (composition) describes how we
may glue all this together.
No doubt, the reader needs an example to get a picture of how this machinery
actually works.
In slide 1-21, we have specified a simple object ctr with an attribute function
value (delivering the value of the counter) and a method function incr (that may
be used to increment the value of the counter).
<n(incr(incr(new(C)))), { C }> -[new]->
<n(incr(incr(C))), { C[n:=0] }> -[incr]->
<n(incr(C)), { C[n:=1] }> -[incr]->
<n(C), { C[n:=2] }> -[n]->
<2, { C[n:=2] }>
The end result of the evaluation depicted in slide 1-22 is the value 2 and a
context (or database) in which the value of the counter C is (also) 2. The database
is modified in each step in which the method incr is applied. When the attribute
function value is evaluated the database remains unchanged, since it is merely
consulted.
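The evaluation of a term together with a database can be sketched as a small interpreter. The encoding is our own assumption: terms are nested tuples, a bare string names an object, the database is a dictionary mapping object names to the value of n, and the initial database is taken to be empty.

```python
def evaluate(term, db):
    """Evaluate a ctr term and return the pair (result, database)."""
    if isinstance(term, str):
        return term, db                        # an object name denotes itself
    op, arg = term
    obj, db = evaluate(arg, db)                # composition: argument first
    if op == "new":
        return obj, {**db, obj: 0}             # method: C[n:=0]
    if op == "incr":
        return obj, {**db, obj: db[obj] + 1}   # method: returns its first argument
    if op == "n":
        return db[obj], db                     # attribute: database only consulted
    raise ValueError(f"unknown operation: {op}")

term = ("n", ("incr", ("incr", ("new", "C"))))
print(evaluate(term, {}))   # (2, {'C': 2})
```

Every method application yields a new dictionary rather than updating the old one, so each intermediate database corresponds to one of the possible worlds passed through during evaluation.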
Recall that an initial algebra semantics defines a model in which the elements
are equivalence classes representing the abstract values of the data type. In effect,
initial models are defined only up to isomorphism (that is, structural equivalence
with similar models). In essence, the framework of initial algebra semantics allows
us to abstract from the particular representation of a data type, when assigning
meaning to a specification. From this perspective it does not matter, for example,
whether integers are represented in binary or decimal notation.
The notion of abstract machines generalizes the notion of initial algebras in
that it loosens the requirement of (structural) isomorphism, to allow for what
we may call behavioral equivalence. The idea underlying the notion of behavioral
equivalence is to make a distinction between visible sorts and hidden sorts and
to look only at the visible sorts to determine whether two algebras A and B
are behaviorally equivalent. According to [Goguen], two algebras A and B are
behaviorally equivalent if and only if the result of evaluating any expression of a
visible sort in A is the same as the result of evaluating that expression in B.
Now, an abstract machine (in the sense of Goguen and Meseguer, 1986) is
simply the equivalence class of behaviorally equivalent algebras, or in other words
the maximally abstract characterization of the visible behavior of an abstract data
type with (hidden) states.
The notion of abstract machines is of particular relevance as a formal
framework to characterize the (implementation) refinement relation between objects.
For example, it is easy to determine that the behavior of a stack implemented
as a list is equivalent to the behavior of a stack implemented by a pointer array,
whereas these objects are clearly not equivalent from a structural point of view.
Moreover, the behavior of both conforms (in an abstract sense) with the behavior
specified in an algebraic way. Together, the notions of abstract machine and
behavioral equivalence provide a formalization of the notion of information hiding
in an algebraic setting. In the chapters that follow we will look at alternative
formalisms to explain information hiding, polymorphism and behavioral refinement.
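As an illustration of behavioral equivalence, consider the two stack realizations mentioned above. The sketch is hypothetical in its details (class names, the fixed capacity, the helper observe), and comparing the outcomes of a single program gives evidence of equivalence, not a proof.

```python
class ListStack:
    """Stack represented as a linked list of (element, rest) pairs."""
    def __init__(self):
        self._cell = None
    def push(self, x):
        self._cell = (x, self._cell)
    def pop(self):
        self._cell = self._cell[1]
    def top(self):
        return self._cell[0]
    def empty(self):
        return self._cell is None

class ArrayStack:
    """Stack represented as a fixed array plus a pointer to the next free slot."""
    def __init__(self, capacity=16):
        self._slots = [None] * capacity
        self._next = 0
    def push(self, x):
        self._slots[self._next] = x
        self._next += 1
    def pop(self):
        self._next -= 1      # the stale slot remains: the hidden state differs
    def top(self):
        return self._slots[self._next - 1]
    def empty(self):
        return self._next == 0

def observe(stack, program):
    """Run a list of operations, recording only results of visible sort."""
    seen = []
    for op, *args in program:
        result = getattr(stack, op)(*args)
        if op in ("top", "empty"):    # observers deliver elements and booleans
            seen.append(result)
    return seen

program = [("push", 1), ("push", 2), ("top",), ("pop",),
           ("top",), ("pop",), ("empty",)]
print(observe(ListStack(), program))                                    # [2, 1, True]
print(observe(ListStack(), program) == observe(ArrayStack(), program))  # True
```

The two objects differ structurally (after the pops, the array still holds the old elements), yet no expression of visible sort can tell them apart on this program.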