Relational Databases and Beyond
M F Worboys
network database management systems. The underlying model for these systems is navigational, that is, connections between records are made by navigating explicit relationships between them. These relationships were ‘hard-wired’ into the database, thus limiting the degree to which such databases could be extended or distributed to other groups of users.

The acknowledged founder of relational database technology is Ted Codd, who in a pioneering paper (Codd 1970) set out the framework of the relational model. The 1970s saw the advent of relatively easy-to-use relational database languages such as the Structured Query Language, SQL, originally called the Structured English Query Language, SEQUEL (Chamberlin and Boyce 1974), and Query Language, QUEL (Held et al 1975), as well as prototype relational systems such as IBM’s System R (Astrahan et al 1976) and the University of California at Berkeley’s Interactive Graphics and Retrieval System, INGRES (Stonebraker et al 1976).

From the latter part of the 1970s, shortcomings of the relational model became apparent for particular applications, including GIS. Codd himself provided extensions to incorporate more semantics (Codd 1979). Object-oriented notions were introduced from programming languages into databases, culminating in prototype object-oriented database systems such as O2 (Deux 1990) and ORION (Kim et al 1990). Today, object-oriented systems are well established in the marketplace, as are object-oriented extensions of relational systems, which may be where the future really is. Early developments in object-relational systems are described in Haas et al (1990) and Stonebraker (1986). SQL has developed into the international standard SQL-92, and SQL3 is being developed.

1.3 Data models
The data model provides a collection of constructs for describing and structuring applications in the database. Its purpose is to provide a common, computationally meaningful medium for use by system developers and users. For developers, the data model provides a means to represent the application domain in terms that may be translated into a design and implementation of the system. For users, it provides a description of the structure of the system, independent of specific items of data or details of the particular implementation.

A clear distinction should be made between data models upon which database systems are built, for example the relational model, and data models whose primary role is to represent the meaning of the application domains as closely as possible (so-called semantic data models, of which entity-relationship modelling is an example: see also Martin, Chapter 6; Raper, Chapter 5). A semantic data model may be used to develop applications for a database system designed around another model. The prototypical example of this is the use of the entity-relationship model to develop relational database applications. The three currently most important data modelling approaches are record-based, object-based and object-relational.

1.4 Human database interaction
Humans need to interact with database systems to perform the following broad types of task:
1 Data definition: description of the conceptual and logical organisation of the database, the database schema;
2 Storage definition: description of the physical structure of the database, for example file location and indexing methods;
3 Database administration: daily operation of the database;
4 Data manipulation: insertion, modification, retrieval, and deletion of data from the database.

The first three of these tasks are most likely to be performed by the database professional. The fourth will be required by a variety of user types, who possess a range of skills and experience and have varying requirements in terms of frequency and flexibility of access.

User interfaces are designed to be flexible enough to handle this variety of usage. Standard methods for making interfaces more natural to users include menus, forms, and graphics (windows, icons, mice: see Egenhofer and Kuhn, Chapter 28; Martin 1996). Natural language would be an appropriate means of communication between human and database, but successful interfaces based on natural language have not yet been achieved. For spatial data, the graphical user interface (GUI) is of course highly appropriate. Specialised query languages for database interaction have also been devised.

1.5 Database management
The software system driving a database is called the database management system (DBMS). Figure 1
shows schematically the place of some of these components in the processing of an interactive query. It also shows the processing of an application program that contains some database access commands embedded within a host general-purpose programming language. The DBMS has a query compiler that will parse and analyse a query and, if all is correct, generate execution code that is passed to the runtime database processor. Along the way, the compiler may call the query optimiser to optimise the code so that retrieval performance is improved. If the database language expression had been embedded in a general-purpose computer language such as C++, then an earlier precompiler stage would be needed. To retrieve the required data from the database, mappings must be made between the high-level objects in the query language statement and the physical location of the data on the storage device. These mappings are made using the system catalogue. Access to DBMS data is handled by the stored data manager, which calls the operating system for control of physical access to storage devices.

[Figure: Interactive query; Application program; Precompiler; Host language compiler; Query compiler; Run-time database processor; System catalogue; Stored data manager; Concurrency control/backup/recovery units; Stored data]
Fig 1. DBMS components used to process user queries.

The logical atom of interaction with a database is the transaction, broadly classified as create, modify (update), and delete. Transactions are either executed in their entirety (committed) or not at all (rolled back to the previous commit). The sequence of operations contained in transactions is maintained in a system log or journal, hence the ability of the DBMS to roll back. When a ‘commit’ is reached, all changes since the last commit point are made permanent in the database. Thus, a transaction may be thought of as a unit of recovery. The DBMS seeks to maintain the so-called ACID properties of transactions: Atomicity (all-or-nothing), Consistency (of the database), Isolation (having no side-effects and unforeseen effects on other concurrent transactions), and Durability (ability to survive even after a system crash).

2 RECORD-BASED DATA MODELS: RELATIONAL DATABASES

2.1 Introduction to the relational model
A record-based model structures the database as a collection of files of fixed-format records. The records in a file are all of the same record type, containing a fixed set of fields (attributes). The early network and hierarchical database systems, mentioned earlier, conform to the record-based data model. However, they proved to be too closely linked to physical implementation details, and they have been largely superseded by the relational model.

A relational database is a collection of tabular relations, each having a set of attributes. The data in a relation are structured as a set of rows. A row, or tuple, consists of a list of values, one for each attribute. An attribute has associated with it a domain, from which its values are drawn. Most current systems require that values are atomic – for example, they cannot be decomposed as lists of further values – so a single cell in a relation cannot contain a set, list or array of values. This limits the possibilities of the pure relational model for GIS.

A distinction is made between a relation schema and a relation. A relation schema does not include the data but gives the structure of the relation: its attributes, their corresponding domains, and any constraints on the data. A relation includes the data. The relation schema is usually declared when the database is set up and then remains relatively
unaltered during the lifespan of the system. A relation, however, will typically be changing frequently as data are inserted, modified and deleted. A database schema is a set of relation schemata and a relational database is a set of

Table 1 Tuples from the Country relation.
Table 2 Tuples from the City relation.
Table 3 Tuples from the Country relation after a project operation.
Table 4 Tuples from the City relation after a restrict operation.

Country tuples
Austria         8     32
Germany        81    138
Italy          58    116
France         58    210
Switzerland     7     16

City tuples
Vienna     Austria        1500
Berlin     Germany        3400
Rome       Italy          2800
Paris      France         2100
Bern       Switzerland     100
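Tables 3 and 4 show the results of a project and a restrict operation. The sketch below illustrates these operations with Python's built-in sqlite3 module standing in for the DBMS. That choice is an assumption, since the text names no particular system, and SQLite, for example, has no CREATE DOMAIN. The sketch separates the relation schemata, declared once and recorded in SQLite's catalogue table sqlite_master, from the relations, which change as tuples are inserted, modified and deleted. It then applies a project, a restrict and a Country-City join. Its last function shows the all-or-nothing behaviour of a transaction described in section 1.5.

All helper names are hypothetical. The City data reuse the City tuples printed above, assuming their population figures are in thousands. The Country relation is reduced to Name and Capital for illustration.

```python
import sqlite3

# Hypothetical City tuples, assuming the printed population figures are in thousands.
CITIES = [
    ("Vienna", "Austria", 1_500_000),
    ("Berlin", "Germany", 3_400_000),
    ("Rome", "Italy", 2_800_000),
    ("Paris", "France", 2_100_000),
    ("Bern", "Switzerland", 100_000),
]
# Hypothetical Country tuples: only the Name and Capital attributes are modelled.
COUNTRIES = [
    ("Austria", "Vienna"),
    ("Germany", "Berlin"),
    ("Italy", "Rome"),
    ("France", "Paris"),
    ("Switzerland", "Bern"),
]


def build_database() -> sqlite3.Connection:
    """Declare the relation schemata once, then populate the relations."""
    conn = sqlite3.connect(":memory:")
    # Relation schemata. SQLite has no CREATE DOMAIN, so built-in types stand in.
    conn.execute(
        "CREATE TABLE City (Name TEXT PRIMARY KEY, Country TEXT, Population INTEGER)"
    )
    conn.execute("CREATE TABLE Country (Name TEXT PRIMARY KEY, Capital TEXT)")
    # Relations: the tuples themselves, inserted inside one transaction.
    with conn:
        conn.executemany("INSERT INTO City VALUES (?, ?, ?)", CITIES)
        conn.executemany("INSERT INTO Country VALUES (?, ?)", COUNTRIES)
    return conn


def schema_names(conn: sqlite3.Connection) -> list[str]:
    """List the relation schemata recorded in SQLite's catalogue table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def project_country_names(conn: sqlite3.Connection) -> set[str]:
    """Project Country onto Name; DISTINCT gives set semantics."""
    rows = conn.execute("SELECT DISTINCT Name FROM Country")
    return {name for (name,) in rows}


def restrict_large_cities(conn: sqlite3.Connection) -> set[tuple]:
    """Restrict City to tuples whose Population exceeds two million."""
    return set(conn.execute("SELECT * FROM City WHERE Population > 2000000"))


def countries_with_small_capitals(conn: sqlite3.Connection) -> set[str]:
    """Join Country and City on the capital's name, then restrict on population."""
    rows = conn.execute(
        "SELECT Country.Name FROM Country, City "
        "WHERE Country.Capital = City.Name AND City.Population < 2000000"
    )
    return {name for (name,) in rows}


def failed_update_is_rolled_back(conn: sqlite3.Connection) -> int:
    """Atomicity: a transaction that fails part-way leaves the relation unchanged."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE City SET Population = 0 WHERE Name = 'Bern'")
            # Duplicate primary key: this statement fails, aborting the transaction.
            conn.execute("INSERT INTO City VALUES ('Vienna', 'Austria', 1)")
    except sqlite3.IntegrityError:
        pass
    (population,) = conn.execute(
        "SELECT Population FROM City WHERE Name = 'Bern'"
    ).fetchone()
    return population


if __name__ == "__main__":
    db = build_database()
    print(schema_names(db))                   # ['City', 'Country']
    print(sorted(restrict_large_cities(db)))
    print(countries_with_small_capitals(db))
    print(failed_update_is_rolled_back(db))   # 100000
```

The schema is written once, while every later call only reads or changes tuples. The rolled-back UPDATE leaves Bern's population as it was, because the failing INSERT belonged to the same transaction.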
means of direct interaction with the database, or may be embedded in a general-purpose programming language. The most recent SQL standard is SQL-92 (also called SQL2: ISO 1992). There is a large effort to move forward to SQL3.

2.2.1 Schema definition using SQL
The data definition language component of SQL allows the creation, alteration, and deletion of relation schemata. A relation schema is usually altered only rarely once the database is operational. A relation schema provides a set of attributes, each with its associated data domain. SQL allows the definition of a domain by means of a CREATE DOMAIN expression.

A relation schema is created by a CREATE TABLE command as a set of attributes, each associated with a domain, with additional properties relating to keys and integrity constraints. For example, the relation schema City may be created by the command:

CREATE TABLE City
  (Name PlaceName,
   Country PlaceName,
   Population Population,
   PRIMARY KEY (Name))

This statement begins by naming the relation schema (called a table in SQL) as City. The attributes are then defined by giving each its name and associated domain (assuming that we have already created domains PlaceName and Population). The primary key, which serves to identify a tuple uniquely, is next given as the attribute Name. There are also SQL commands to alter a relation schema by changing attributes or integrity constraints, and to delete a relation schema.

2.2.2 Data manipulation using SQL
Having defined the schemata and inserted data into the relations, the next step is to retrieve data. A simple example of SQL data retrieval resulting in the relation in Table 4 is:

SELECT *
FROM City
WHERE Population > 2000000

The SELECT clause indicates the attributes to be retrieved from the City relation (* indicates all attributes), while the WHERE clause provides the restrict condition. Relational joins are effected by allowing more than one relation (or even the same relation called twice with different names) in the FROM clause. For example, to find names of countries whose capitals have a population of less than two million people, use the expression:

SELECT Country.Name
FROM Country, City
WHERE Country.Capital = City.Name
AND City.Population < 2000000

In this case, the first part of the WHERE clause provides the join condition. It specifies that tuples from the two tables are to be combined only when the value of the attribute Capital in Country equals the value of the attribute Name in City. Attributes are qualified by prefixing the relation name in case of any ambiguity.

Most of the features of SQL have been omitted from this very brief summary. The documentation on the SQL2 standard is about 600 pages in length. The reader is referred to Date (1995) for a good survey of the relational model and SQL2.

2.3 Relational technology for geographical information
There are essentially two ways of managing spatial data with relational technology. The integrated approach puts all the data (spatial and non-spatial) in the relational database. The hybrid approach separates the spatial from the non-spatial data. The benefits of using an integrated architecture are considerable. It allows a uniform treatment of all data by the DBMS, and thus does not consign the spatial data to a less sheltered existence outside the database, where integrity, concurrency, and security may not be so rigorously enforced. In theory, the integrated approach is perfectly possible: for example, Roessel (1987) provides a relational model of configurations of nodes, arcs, and polygons. However, in practice the pure relational geospatial model has not up to now been widely adopted because of unacceptable performance (Healey 1991). Essentially, problems arise because of:
1 slow retrieval due to multiple joins required of spatial data in relations;
2 inappropriate indexes and access methods, which are provided primarily for 1-dimensional data types by general-purpose relational systems;
[Figure: entity City with attributes Name and Population; relationships labelled ‘Capital of’ and ‘has city’]

models allow much more flexibility in declaring indexes for different types of data. For the third problem, the limitations of SQL have been apparent for some time in a number of fields (for example, CAD/CAM, GIS, multimedia databases, office information systems, and text databases). SQL3, currently being developed as a standard, promises much in this respect.
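The sketch below illustrates the integrated approach and the first problem listed above, slow retrieval caused by multiple joins. It uses Python's sqlite3 module and an invented Point/Segment/Polygon schema. This is a hypothetical decomposition, not Roessel's actual model.

Because each cell must hold an atomic value, a polygon's boundary cannot be stored in one cell as a list of coordinates. It is instead spread over three relations and reassembled with joins. The relation Point appears twice under different names: b for a segment's begin point and e for its end point.

```python
import sqlite3

# Hypothetical integrated schema: geometry held in ordinary relations.
SCHEMA = """
CREATE TABLE Point   (PointId INTEGER PRIMARY KEY, X REAL, Y REAL);
CREATE TABLE Polygon (PolygonId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Segment (PolygonId INTEGER REFERENCES Polygon,
                      Seq INTEGER,
                      BeginPoint INTEGER REFERENCES Point,
                      EndPoint INTEGER REFERENCES Point,
                      PRIMARY KEY (PolygonId, Seq));
"""


def load_rectangle(conn: sqlite3.Connection) -> None:
    """Store a hypothetical 4 x 3 rectangle as points, segments and a polygon."""
    conn.executescript(SCHEMA)
    corners = [(1, 0.0, 0.0), (2, 4.0, 0.0), (3, 4.0, 3.0), (4, 0.0, 3.0)]
    conn.executemany("INSERT INTO Point VALUES (?, ?, ?)", corners)
    conn.execute("INSERT INTO Polygon VALUES (1, 'Parcel A')")
    # Segment i runs from corner i to the next corner, closing back to corner 1.
    segments = [(1, i, i, i % 4 + 1) for i in range(1, 5)]
    conn.executemany("INSERT INTO Segment VALUES (?, ?, ?, ?)", segments)
    conn.commit()


def boundary(conn: sqlite3.Connection, name: str) -> list[tuple]:
    """Reassemble a polygon's boundary: three joins, with Point used twice."""
    query = """
        SELECT b.X, b.Y, e.X, e.Y
        FROM Polygon AS g
        JOIN Segment AS s ON s.PolygonId = g.PolygonId
        JOIN Point   AS b ON b.PointId = s.BeginPoint
        JOIN Point   AS e ON e.PointId = s.EndPoint
        WHERE g.Name = ?
        ORDER BY s.Seq
    """
    return conn.execute(query, (name,)).fetchall()


def area(edges: list[tuple]) -> float:
    """Shoelace formula over the reassembled boundary segments."""
    twice_area = sum(x1 * y2 - x2 * y1 for x1, y1, x2, y2 in edges)
    return abs(twice_area) / 2


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    load_rectangle(db)
    print(area(boundary(db, "Parcel A")))  # 12.0
```

Even for a four-sided polygon, three joins are needed before any geometry can be computed. Repeated over many polygons, and without a spatial index, that overhead is the kind of cost the text attributes to the pure relational approach.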
[Figure: class Country with instance variables Population: Integer; Name: String; Capital city: City; Extent: Polygon]
Fig 3. Part of the class descriptions for Country and Polygon.

Fig 4. State, methods, and messages of an object.

request for an operation. The request invokes the operation that defines some service to be performed.’ The code associated with a collection of data in an object provides a set of methods that can be performed upon it. As well as executing methods on its own data, an object may, as part of one of its methods, send a message to another object, causing that object to execute a method in response. This highly active environment is another feature that distinguishes OO from ERA, which is essentially a collection of passive data. An object has both state, being the values of the instance variables within it, and behaviour, being the potential for acting upon objects (including itself). Objects with the same types of instance variables and methods are said to be in the same object class. Figure 3 shows some instance variables and methods associated with classes Country and Polygon, and the manner in which the class Polygon is referenced as an instance variable by the class Country. Figure 4 shows in schematic form an object encapsulating state and methods, receiving a message from another object
and executing methods which result in two output messages. Figure 5 shows the interaction of several objects in response to a message to one of them.

With encapsulation, the internal workings of an object are hidden from users and other objects, which can communicate with it only through a set of predefined message types that the object can understand and handle. To take an example from the real world, I usually do not care about the state of my car under the bonnet (internal state of object class Car) provided that when I put my foot on the accelerator (send message) the car’s speed increases (change in the internal state leading to a change in the observable properties of the object). From the viewpoint external to the object, it is usually only its observable properties that are of interest.

As an illustration of some of these constructs, Figure 6 shows the object class Country (as an abstract object class, represented as a triangle) with three of its instance variables: Name, Population, and Area. Variables Name and Population reference the printable object class Character String (represented as an oval), and Area references the abstract class Polygon. The class Polygon has the instance variable Boundary, referencing an association of class Segment (the association class is shown in the figure as a star and circle). Each segment has a Begin Point and an End Point. Each Point has a Position, which is an aggregation (shown as a cross and circle) of the printable classes X-coordinate and Y-coordinate.

3.3 Object-oriented database management systems
[Figure: classes Country, Polygon, Segment, Point, Character string, X-coordinate and Y-coordinate, linked by Name, Population, Area, Boundary, Begin, End and Position]
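The sketch below shows the class structure described for Figure 6 as ordinary in-memory Python classes. It is a hypothetical illustration, not an object-oriented DBMS, and all class and method names are assumptions. The chain runs from Country to Polygon to Segment to Point, where each Point is an aggregation of an x and a y coordinate. Polygon encapsulates its boundary and answers an area message. Country handles a population_density message by sending an area message to its Polygon. Following Figure 3, the Polygon-valued instance variable is named extent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Position: an aggregation of an X-coordinate and a Y-coordinate."""
    x: float
    y: float


@dataclass(frozen=True)
class Segment:
    """A boundary segment with a Begin and an End point."""
    begin: Point
    end: Point


class Polygon:
    """Encapsulates its boundary; other objects only send it messages."""

    def __init__(self, boundary: list[Segment]) -> None:
        self._boundary = list(boundary)  # internal state, hidden by convention

    def area(self) -> float:
        """Respond to an 'area' message using the shoelace formula."""
        twice_area = sum(
            s.begin.x * s.end.y - s.end.x * s.begin.y for s in self._boundary
        )
        return abs(twice_area) / 2


class Country:
    """Holds Name and Population, plus an extent referencing a Polygon object."""

    def __init__(self, name: str, population: int, extent: Polygon) -> None:
        self.name = name
        self.population = population
        self._extent = extent

    def population_density(self) -> float:
        """Handling this message sends an 'area' message to the Polygon."""
        return self.population / self._extent.area()


if __name__ == "__main__":
    corners = [Point(0, 0), Point(4, 0), Point(4, 3), Point(0, 3)]
    edges = [Segment(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    examplia = Country("Examplia", 1200, Polygon(edges))  # hypothetical country
    print(examplia.population_density())  # 100.0
```

Code outside Polygon never reads its boundary directly. Like the driver pressing the accelerator in the car analogy, it only sends the area message. Storing such objects persistently is the task of an object-oriented DBMS and is not shown here.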