18.1.1 Centralized Systems
A modern, general-purpose computer system consists of one to a few CPUs
and a number of device controllers that are connected through a common bus that
provides access to shared memory (Figure 18.1). The CPUs have local cache
memories that store local copies of parts of the memory, to speed up access to data.
drive, an audio device, or a video display). Each device controller is in charge of a specific type of device (for example, a disk
drive, an audio device, or a video display). The CPUs and the device controllers
can execute concurrently, competing for memory access. Cache memory reduces
the contention for memory access, since it reduces the number of times that the
CPU needs to access the shared memory.
We distinguish two ways in which computers are used: as single-user systems and as multiuser systems. Personal computers and workstations fall into the first category. A typical single-user system is a desktop unit used by one person at a time, usually with a single CPU and one or two hard disks. A typical multiuser system, on the other hand, has more disks and more memory, may have multiple CPUs, and runs a multiuser operating system. It serves a large number of users who are connected to the system via terminals. Database systems designed for single users usually do not provide many of the facilities that a multiuser database provides. In particular, they may not support concurrency control, which is not required when only a single user can generate updates.
Provisions for crash recovery in such systems are either absent or primitive; for example, they may consist of simply making a backup of the database before any update. Many such systems do not support SQL, providing instead a simpler query language, such as a variant of QBE. In contrast, database systems designed for multiuser systems support the full transactional features that we have studied earlier.
Although general-purpose computer systems today have multiple processors,
they have coarse-granularity parallelism, with only a few processors (about two
to four, typically), all sharing the main memory. Databases running on such
machines usually do not attempt to partition a single query among the processors;
instead, they run each query on a single processor, allowing multiple queries to run
concurrently. Thus, such systems support a higher throughput; that is, they allow a
greater number of transactions to run per second, although individual transactions
do not run any faster.
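To make the distinction concrete, here is a minimal Python sketch of this inter-query parallelism: each query still runs entirely on one worker, but several queries run at once, so throughput rises while no individual query finishes any sooner. The names run_query and QUERIES are illustrative, not taken from any particular database system.

```python
from concurrent.futures import ThreadPoolExecutor
import time

QUERIES = ["SELECT ... FROM r1", "SELECT ... FROM r2",
           "SELECT ... FROM r3", "SELECT ... FROM r4"]

def run_query(sql: str) -> str:
    # Stand-in for executing one whole query on a single processor.
    time.sleep(0.1)
    return f"result of {sql!r}"

# Two workers model a coarse-granularity machine with two processors:
# no query is partitioned, but two queries execute at the same time,
# so total throughput roughly doubles.
with ThreadPoolExecutor(max_workers=2) as pool:
    for result in pool.map(run_query, QUERIES):
        print(result)
```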
Databases designed for single-processor machines already provide multitasking, allowing multiple processes to run on the same processor in a time-shared manner, giving users the appearance of multiple processes running in parallel. Thus, coarse-granularity parallel machines logically appear identical to single-processor machines, and database systems designed for time-shared machines can be adapted to run on them with relative ease.
In contrast, machines with fine-granularity parallelism have a large number
of processors, and database systems running on such machines attempt to
parallelize single tasks (queries, for example) submitted by users.
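The contrast with the preceding sketch is intra-query parallelism: a single aggregate query is split across workers, each scanning one fragment of the relation, with the partial results combined at the end. The partitioning scheme and all names below are illustrative assumptions, not a real system's interface.

```python
from concurrent.futures import ProcessPoolExecutor

def scan_partition(rows):
    # Each worker evaluates the aggregate over its own fragment only.
    return sum(rows)

if __name__ == "__main__":
    table = list(range(1_000_000))          # toy relation: one numeric column
    n_workers = 4
    # Partition the relation, one fragment per processor.
    fragments = [table[i::n_workers] for i in range(n_workers)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(scan_partition, fragments))
    # Combine the partial aggregates to answer the single query.
    print("SELECT SUM(x) FROM table:", sum(partials))
```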
1. Single Lock-Manager Approach
In this approach, the system maintains a single lock manager that resides at one chosen site, and all lock and unlock requests are made at that site. The lock manager determines whether the lock can be granted immediately. If it can, the lock manager sends a message to that effect to the site at which the lock request was initiated. Otherwise, the request is delayed until it can be granted, at which time a message is sent to the initiating site. The transaction can read the data item from any one of the sites at which a replica of the data item resides. In the case of a write, all the sites where a replica of the data item resides must be involved in the writing.
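A minimal sketch of this read-one/write-all rule, assuming a toy in-memory replica table; the names sites and replica_of are hypothetical, introduced only for illustration.

```python
# Toy in-memory storage per site; Q is replicated at sites S1 and S3.
sites = {s: {} for s in ("S1", "S2", "S3")}
replica_of = {"Q": ["S1", "S3"]}

def read(item):
    # A read may be served by any single site holding a replica.
    any_site = replica_of[item][0]
    return sites[any_site].get(item)

def write(item, value):
    # A write must involve every site holding a replica.
    for site in replica_of[item]:
        sites[site][item] = value

write("Q", 42)
print(read("Q"))   # -> 42
```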
2. Distributed Lock Manager
Here the lock-manager function is distributed: each site maintains a local lock manager that administers the lock and unlock requests for those data items stored at that site. When a transaction wishes to lock data item Q, which is not replicated and resides at site Si, a message is sent to the lock manager at site Si requesting a lock (in a particular lock mode). If data item Q is locked in an incompatible mode, then the request is delayed until it can be granted. Once it has determined that the lock request can be granted, the lock manager sends a message back to the initiator indicating that it has granted the lock request.
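The following sketch models the distributed lock-manager approach under the usual shared/exclusive compatibility rules. The class and function names are illustrative, and a real system would queue delayed requests and deliver grant messages asynchronously rather than return a string.

```python
from collections import defaultdict

class LockManager:
    """Administers locks only for the items stored at its own site."""

    def __init__(self):
        self.locks = defaultdict(list)        # item -> list of (txn, mode)

    def _compatible(self, item, mode):
        held = self.locks[item]
        # Shared locks coexist; exclusive is incompatible with anything held.
        return not held or (mode == "S" and all(m == "S" for _, m in held))

    def request(self, txn, item, mode):
        if self._compatible(item, mode):
            self.locks[item].append((txn, mode))
            return "granted"                  # message back to the initiator
        return "delayed"                      # held until it can be granted

site_of = {"Q": "S2"}                         # Q is unreplicated, stored at S2
managers = {"S1": LockManager(), "S2": LockManager()}

def request_lock(txn, item, mode):
    # The request is routed to the lock manager at the site storing the item.
    return managers[site_of[item]].request(txn, item, mode)

print(request_lock("T1", "Q", "S"))           # granted
print(request_lock("T2", "Q", "X"))           # delayed: incompatible with T1
```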
There are several alternative ways of dealing with replication of data items.
3. Majority Protocol
The majority protocol works as follows: if data item Q is replicated at n different sites, then a lock-request message must be sent to more than one-half of the n sites at which Q is stored. Each lock manager determines whether the lock can be granted immediately (as far as it is concerned). As before, the response is delayed until the request can be granted. The transaction does not operate on Q until it has successfully obtained locks on a majority of the replicas of Q.
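A minimal sketch of the majority rule, assuming each site's local lock manager is modeled by a boolean grant function; all names here are illustrative.

```python
def majority_lock(item, txn, replica_sites, try_local_lock):
    granted = sum(1 for s in replica_sites if try_local_lock(s, item, txn))
    # The transaction may operate on the item only after locking a majority.
    return granted > len(replica_sites) // 2

# Toy local managers: site S3 refuses because another transaction holds Q.
refusing = {"S3"}
grant = lambda site, item, txn: site not in refusing
print(majority_lock("Q", "T1", ["S1", "S2", "S3"], grant))   # True (2 of 3)
```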
4. Biased Protocol
The biased protocol is another approach to handling replication. It differs from the majority protocol in that requests for shared locks are given more favorable treatment than requests for exclusive locks.
Shared locks. When a transaction needs a shared lock on data item Q, it simply requests the lock from the lock manager at one site that contains a replica of Q.
Exclusive locks. When a transaction needs an exclusive lock on data item Q, it requests a lock on Q from the lock manager at all sites that contain a replica of Q.
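The asymmetry can be sketched directly, again with an illustrative per-site grant function: a shared lock involves only one replica site, while an exclusive lock must be granted at every replica site.

```python
def shared_lock(item, txn, replica_sites, try_local_lock):
    # One replica site suffices for a shared lock.
    return try_local_lock(replica_sites[0], item, txn, "S")

def exclusive_lock(item, txn, replica_sites, try_local_lock):
    # Every replica site must grant an exclusive lock.
    return all(try_local_lock(s, item, txn, "X") for s in replica_sites)

grant = lambda site, item, txn, mode: True    # toy: every site grants
print(shared_lock("Q", "T1", ["S1", "S2"], grant))     # True, one message
print(exclusive_lock("Q", "T1", ["S1", "S2"], grant))  # True, two messages
```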
19.5.2 Timestamping
The principal idea behind the timestamping scheme is that each transaction is given a unique timestamp that the system uses in deciding the serialization order. Our first task, then, in generalizing the centralized scheme to a distributed environment is to develop a scheme for generating unique timestamps.
There are two primary methods for generating unique timestamps, one
centralized and one distributed. In the centralized scheme, a single site distributes
the timestamps. The site can use a logical counter or its own local clock for this
purpose. In the distributed scheme, each site generates a unique local timestamp by
using either a logical counter or the local clock. We obtain the unique global
timestamp by concatenating the unique local timestamp with the site identifier,
which also must be unique (Figure 19.2). The order of concatenation is important!
We use the site identifier in the least significant position to ensure that the global
timestamps generated in one site are not always greater than those generated in
another site. Compare this technique for generating unique timestamps with the
one that we presented in Section 19.2.3 for generating unique names.
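A minimal sketch of this construction, assuming a fixed-width site identifier packed into the least significant bits (SITE_BITS is an assumed width, not a value from the text):

```python
SITE_BITS = 16                        # assumed width of the site identifier

def global_timestamp(local_ts: int, site_id: int) -> int:
    # Local timestamp in the high-order bits, site id in the low-order bits,
    # so comparisons are decided by the local timestamp first.
    return (local_ts << SITE_BITS) | site_id

# A slow site's next timestamp still outranks a fast site's earlier one:
print(global_timestamp(1, 7) < global_timestamp(2, 2))    # True
```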
We may still have a problem if one site generates local timestamps at a rate
faster than that of the other sites. In such a case, the fast site’s logical counter will
be larger than that of other sites. Therefore, all timestamps generated by the fast
site will be larger than those generated by other sites. What we need is a
mechanism to ensure that local timestamps are generated fairly across the system.
We define within each site Si a logical clock (LCi), which generates the unique
local timestamp. The logical clock can be implemented as a counter that is
incremented after a new local timestamp is generated. To ensure that the various
logical clocks are synchronized, we require that a site Si advance its logical clock
whenever a transaction Ti with timestamp <x,y> visits that site and x is greater
than the current value of LCi. In this case, site Si advances its logical clock to the
value x + 1.
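A minimal sketch of this rule (the class name LogicalClock is illustrative): the counter is incremented after each local timestamp is generated, and the timestamp component x of a visiting transaction pushes the clock forward to x + 1.

```python
class LogicalClock:
    def __init__(self):
        self.value = 1

    def next_local_timestamp(self) -> int:
        ts = self.value
        self.value += 1               # incremented after generating a timestamp
        return ts

    def observe(self, x: int) -> None:
        # Rule from the text: advance LCi to x + 1 when a visiting
        # transaction's timestamp component x exceeds the current LCi.
        if x > self.value:
            self.value = x + 1

lc = LogicalClock()
lc.next_local_timestamp()             # uses 1; LC becomes 2
lc.observe(10)                        # fast site observed: LC jumps to 11
print(lc.next_local_timestamp())      # 11: beyond anything the fast site used
```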