
18.1.1 Centralized Systems

A modern, general-purpose computer system consists of one to a few CPUs
and a number of device controllers that are connected through a common bus that
provides access to shared memory (Figure 18.1). The CPUs have local cache
memories that store local copies of parts of the memory, to speed up access to data.
Each device controller is in charge of a specific type of device (for example, a disk
drive, an audio device, or a video display). The CPUs and the device controllers
can execute concurrently, competing for memory access. Cache memory reduces
the contention for memory access, since it reduces the number of times that the
CPU needs to access the shared memory.
We distinguish two ways in which computers are used: as single-user
systems
and as multiuser systems. Personal computers and workstations fall into the first
category.
A typical single-user system is a desktop unit used by one person at a time, usually
with only one CPU and one or two hard disks. A typical multiuser system, on the other hand, has more disks
and more memory, may have multiple CPUs and has a multiuser operating system.
It serves a large number of users who are connected to the system via
terminals. Database systems designed for use by single users usually do not
provide many of the facilities that a multiuser database provides. In particular, they
may not support concurrency control, which is not required when only a single
user can generate updates.
Provisions for crash-recovery in such systems are either absent or primitive–
for example, they may consist of simply making a backup of the database before
any update. Many such systems do not support SQL, and provide a simpler query
language, such as a variant of QBE. In contrast, database systems designed for
multiuser systems support the full transactional features that we have studied
earlier.
Although general-purpose computer systems today have multiple processors,
they have coarse-granularity parallelism, with only a few processors (about two
to four, typically), all sharing the main memory. Databases running on such
machines usually do not attempt to partition a single query among the processors;
instead, they run each query on a single processor, allowing multiple queries to run
concurrently. Thus, such systems support a higher throughput; that is, they allow a
greater number of transactions to run per second, although individual transactions
do not run any faster.
Databases designed for single-processor machines already provide
multitasking, allowing multiple processes to run on the same processor in a time-
shared manner, giving a view to the user of multiple processes running in parallel.
Thus, coarse granularity parallel machines logically appear to be identical to
single-processor machines, and database systems designed for time-shared
machines can be easily adapted to run on them.
In contrast, machines with fine-granularity parallelism have a large number
of processors, and database systems running on such machines attempt to
parallelize single tasks (queries, for example) submitted by users.

19.5.1 Locking Protocols


The various locking protocols can be used in a distributed environment. The
only change that needs to be incorporated is in the way the lock manager deals
with replicated data. We present several possible schemes that are applicable to an
environment where data can be replicated in several sites. We shall assume the
existence of the shared and exclusive lock modes.
1. Single Lock-Manager Approach

In the single lock-manager approach, the system maintains a single lock
manager that resides in a single chosen site—say Si. All lock and unlock requests
are made at site Si. When a transaction needs to lock a data item, it sends a lock
request to Si.

The lock manager determines whether the lock can be granted immediately.
If the lock can be granted, the lock manager sends a message to that effect to the
site at which the lock request was initiated. Otherwise, the request is delayed until
it can be granted, at which time a message is sent to the site at which the lock
request was initiated. The transaction can read the data item from any one of the
sites at which a replica of the data item resides. In the case of a write, all the sites
where a replica of the data item resides must be involved in the writing.
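
To make the message flow concrete, here is a minimal Python sketch of a single lock manager, assuming shared ("S") and exclusive ("X") lock modes; the class and method names are illustrative, and the protocol's request/grant messages are reduced to return values.

from collections import defaultdict

class SingleLockManager:
    """Central lock manager residing at the chosen site Si (illustrative)."""

    def __init__(self):
        self.locks = {}                       # item -> {"mode": "S"/"X", "holders": set of txn ids}
        self.waiting = defaultdict(list)      # item -> queued (txn, mode) requests

    def request_lock(self, txn, item, mode):
        """Return True if the lock is granted immediately, else queue the request."""
        entry = self.locks.get(item)
        if entry is None:
            self.locks[item] = {"mode": mode, "holders": {txn}}
            return True                       # "granted" message goes back to the requesting site
        if entry["mode"] == "S" and mode == "S":
            entry["holders"].add(txn)
            return True                       # shared locks are compatible with each other
        self.waiting[item].append((txn, mode))
        return False                          # delayed until the lock can be granted

    def release_lock(self, txn, item):
        """A single unlock message releases the lock; waiters would then be re-examined."""
        entry = self.locks.get(item)
        if entry:
            entry["holders"].discard(txn)
            if not entry["holders"]:
                del self.locks[item]
                # a real manager would now try to grant requests queued in self.waiting[item]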

The scheme has these advantages:


Simple implementation. This scheme requires two messages for handling
lock requests, and one message for handling unlock requests.
Simple deadlock handling. Since all lock and unlock requests are made at one
site, the deadlock-handling algorithms can be applied directly to this environment.

The disadvantages of the scheme are:


Bottleneck. The site Si becomes a bottleneck, since all requests must be processed
there.
Vulnerability. If the site Si fails, the concurrency controller is lost. Either
processing must stop, or a recovery scheme must be used so that a backup site can
take over lock management from Si.

2. Distributed Lock Manager


A compromise between the advantages and disadvantages can be achieved
through the distributed lock-manager approach, in which the lock-manager
function is distributed over several sites.

Each site maintains a local lock manager whose function is to administer the
lock and unlock requests for those data items that are stored in that site. When a
transaction wishes to lock data item Q, which is not replicated and resides at site
Si, a message is sent to the lock manager at site Si requesting a lock (in a particular
lock mode). If data item Q is locked in an incompatible mode, then the request is
delayed until it can be granted. Once it has determined that the lock request can be
granted, the lock manager sends a message back to the initiator indicating that it
has granted the lock request.
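
For the non-replicated case just described, a rough sketch could look like the following, assuming each site runs a local lock manager and a catalog maps every data item to the site that stores it; all names here are illustrative.

class LocalLockManager:
    """Administers lock requests for the data items stored at one site."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.table = {}                   # item -> (mode, set of holding transactions)

    def try_lock(self, txn, item, mode):
        held = self.table.get(item)
        if held is None:
            self.table[item] = (mode, {txn})
            return "granted"
        held_mode, holders = held
        if held_mode == "S" and mode == "S":
            holders.add(txn)
            return "granted"
        return "delayed"                  # incompatible mode: wait until it can be granted

def lock_nonreplicated(managers, location_of, txn, item, mode):
    """Route the lock request to the lock manager at the site storing `item`."""
    site = location_of[item]              # assumed catalog: data item -> site id
    return managers[site].try_lock(txn, item, mode)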

There are several alternative ways of dealing with replication of data items.

The distributed lock manager scheme has the advantage of simple
implementation, and reduces the degree to which the coordinator is a bottleneck. It
has a reasonably low overhead, requiring two message transfers for handling lock
requests, and one message transfer for handling unlock requests. However,
deadlock handling is more complex, since the lock and unlock requests are no
longer made at a single site:

There may be intersite deadlocks even when there is no deadlock within a
single site.

3. Majority Protocol
The majority protocol works this way: If data item Q is replicated in n
different sites, then a lock-request message must be sent to more than one-half of
the n sites in which Q is stored. Each lock manager determines whether the lock
can be granted immediately (as far as it is concerned). As before, the response is
delayed until the request can be granted. The transaction does not operate on Q
until it has successfully obtained a lock on a majority of the replicas of Q.
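
A minimal sketch of this rule, assuming the replica sites expose the try_lock method and site_id attribute from the previous sketch; requesting the replicas in a fixed, predetermined order anticipates the deadlock-avoidance refinement discussed below.

def majority_lock(replica_sites, txn, item, mode):
    """Grant the logical lock on Q once more than half of its replica sites agree."""
    n = len(replica_sites)
    needed = n // 2 + 1                            # more than one-half of the n sites
    granted = 0
    # Locking replicas in the same predetermined order at every site avoids the
    # deadlock where T1 holds S1, S3 while T2 holds S2, S4 (see below).
    for site in sorted(replica_sites, key=lambda s: s.site_id):
        if site.try_lock(txn, item, mode) == "granted":
            granted += 1
        if granted >= needed:
            return True                            # the transaction may now operate on Q
    return False                                   # otherwise keep waiting on the delayed sites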

This scheme deals with replicated data in a decentralized manner, thus
avoiding the drawbacks of central control. However, it suffers from these
disadvantages:

Implementation. The majority protocol is more complicated to implement
than are the previous schemes. It requires 2(n/2 + 1) messages for handling
lock requests, and (n/2 + 1) messages for handling unlock requests.

Deadlock handling. In addition to the problem of global deadlocks due to
the use of a distributed lock-manager approach, it is possible for a deadlock
to occur even if only one data item is being locked. As an illustration, consider
a system with four sites and full replication. Suppose that transactions T1 and
T2 wish to lock data item Q in exclusive mode. Transaction T1 may succeed
in locking Q at sites S1 and S3, while transaction T2 may succeed in locking
Q at sites S2 and S4. Each then must wait to acquire the third lock; hence, a
deadlock has occurred. Luckily, we can avoid such deadlocks with relative
ease, by requiring all sites to request locks on the replicas of a data item in the
same predetermined order.

4. Biased Protocol
The biased protocol is another approach to handling replication. The
difference from the majority protocol is that requests for shared locks are given
more favorable treatment than requests for exclusive locks.

Shared locks. When a transaction needs to lock data item Q, it simply requests
a lock on Q from the lock manager at one site that contains a replica of Q.

Exclusive locks. When a transaction needs to lock data item Q, it requests a lock
on Q from the lock manager at all sites that contain a replica of Q.

As before, the response to the request is delayed until it can be granted.
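
An illustrative sketch of the two cases, reusing the replica-site interface assumed earlier (this is not the textbook's code, only a summary of the rule):

def biased_lock(replica_sites, txn, item, mode):
    if mode == "S":
        # Shared lock: a single replica site suffices, which keeps reads cheap.
        return replica_sites[0].try_lock(txn, item, "S") == "granted"
    # Exclusive lock: every site holding a replica of Q must grant the lock.
    return all(site.try_lock(txn, item, "X") == "granted"
               for site in replica_sites)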


The biased scheme has the advantage of imposing less overhead on read operations
than does the majority protocol. The saving is especially significant in the common
case in which reads are far more frequent than writes.
However, the additional overhead on writes is a disadvantage. Furthermore, the
biased protocol shares the majority protocol’s disadvantage of complexity in
handling deadlock.

19.5.2 Timestamping
The principal idea behind the timestamping scheme is that each transaction
is given a unique timestamp that the system uses in deciding the serialization
order. Our first task, then, in generalizing the centralized scheme to a distributed
scheme is to develop a scheme for generating unique timestamps. Then, the
various protocols can be applied directly to the nonreplicated environment.

There are two primary methods for generating unique timestamps, one
centralized and one distributed. In the centralized scheme, a single site distributes
the timestamps. The site can use a logical counter or its own local clock for this
purpose. In the distributed scheme, each site generates a unique local timestamp by
using either a logical counter or the local clock. We obtain the unique global
timestamp by concatenating the unique local timestamp with the site identifier,
which also must be unique (Figure 19.2). The order of concatenation is important!
We use the site identifier in the least significant position to ensure that the global
timestamps generated in one site are not always greater than those generated in
another site. Compare this technique for generating unique timestamps with the
one that we presented in Section 19.2.3 for generating unique names.
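
The construction can be sketched as follows; the bit widths are arbitrary assumptions, chosen only to show why the site identifier belongs in the least significant position.

def global_timestamp(local_ts, site_id, site_bits=16):
    """Concatenate <local timestamp, site id>, with the site id in the low-order bits."""
    return (local_ts << site_bits) | site_id

# Ordering is dominated by the local-timestamp part, so timestamps generated
# at one site are not always greater than those generated at another site.
assert global_timestamp(5, site_id=3) < global_timestamp(6, site_id=1)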

We may still have a problem if one site generates local timestamps at a rate
faster than that of the other sites. In such a case, the fast site’s logical counter will
be larger than that of other sites. Therefore, all timestamps generated by the fast
site will be larger than those generated by other sites. What we need is a
mechanism to ensure that local timestamps are generated fairly across the system.
We define within each site Si a logical clock (LCi), which generates the unique
local timestamp. The logical clock can be implemented as a counter that is
incremented after a new local timestamp is generated. To ensure that the various
logical clocks are synchronized, we require that a site Si advance its logical clock
whenever a transaction Ti with timestamp <x,y> visits that site and x is greater
than the current value of LCi. In this case, site Si advances its logical clock to the
value x + 1.
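
The rule can be sketched as a small class (the names are illustrative, not from the text):

class LogicalClock:
    """Logical clock LCi kept at site Si."""

    def __init__(self):
        self.value = 0                          # current value of LCi

    def next_timestamp(self, site_id):
        self.value += 1                         # incremented for each new local timestamp
        return (self.value, site_id)            # <x, y> with the site id least significant

    def observe(self, x):
        """Advance LCi when a visiting transaction carries timestamp <x, y> with x > LCi."""
        if x > self.value:
            self.value = x + 1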

If the system clock is used to generate timestamps, then timestamps will be
assigned fairly, provided that no site has a system clock that runs fast or slow.
Since clocks may not be perfectly accurate, a technique similar to that for logical
clocks must be used to ensure that no clock gets far ahead of or behind another
clock.

20.6.1 Pipelined Parallelism


Pipelining forms an important source of economy of computation for
database query processing. Recall that, in pipelining, the output tuples of one
operation, A, are consumed by a second operation, B, even before the first
operation has produced the entire set of tuples in its output. The major advantage
of pipelined execution in a sequential evaluation is that we can carry out a
sequence of such operations without writing any of the intermediate results to disk.
Parallel systems use pipelining primarily for the same reason that sequential
systems do. However, pipelines are a source of parallelism as well, in the same
way that instruction pipelines are a source of parallelism in hardware design. It is
possible to run operations A and B simultaneously on different processors, so that
B consumes tuples in parallel with A producing them. This form of parallelism is
called pipelined parallelism.
Consider a join of four relations:
r1 ⋈ r2 ⋈ r3 ⋈ r4
We can set up a pipeline that allows the three joins to be computed in parallel.
Suppose processor P1 is assigned the computation of temp1 ← r1 ⋈ r2, and P2 is
assigned the computation of r3 ⋈ temp1. As P1 computes tuples in r1 ⋈ r2, it
makes these tuples available to processor P2. Thus, P2 has available to it some of
the tuples in r1 ⋈ r2 before P1 has finished its computation. P2 can use those
tuples that are available to begin computation of temp1 ⋈ r3, even before r1 ⋈ r2
is fully computed by P1. Likewise, as P2 computes tuples in (r1 ⋈ r2) ⋈ r3, it
makes these tuples available to P3, which computes the join of these tuples with
r4.
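
To make the streaming behavior concrete, here is a toy Python sketch (not from the text): generators model the way each stage consumes tuples before its producer has finished, and the relations and join attribute are hypothetical. A real parallel system would place each stage on its own processor, for example as threads or processes connected by queues.

def nested_loop_join(left_stream, right_relation, attr):
    """Join a streaming left input with a stored right relation, emitting tuples eagerly."""
    for l in left_stream:                        # consume left tuples as they arrive
        for r in right_relation:
            if l[attr] == r[attr]:
                yield {**l, **r}                 # pass the result downstream immediately

def pipelined_join(r1, r2, r3, r4, attr):
    stage1 = nested_loop_join(iter(r1), r2, attr)    # P1: r1 ⋈ r2
    stage2 = nested_loop_join(stage1, r3, attr)      # P2: (r1 ⋈ r2) ⋈ r3
    return nested_loop_join(stage2, r4, attr)        # P3: joined with r4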
Pipelined parallelism is useful with a small number of processors, but does
not scale up well. First, pipeline chains generally do not attain sufficient length to
provide a high degree of parallelism. Second, it is not possible to pipeline
relational operators that do not produce output until all inputs have been accessed,
such as the set-difference operation. Third, only marginal speedup is obtained for
the frequent cases in which one operator’s execution cost is much higher than are
those of the others.
All things considered, when the degree of parallelism is high, pipelining is a
less important source of parallelism than partitioning. The real reason for using
pipelining is that pipelined executions can avoid writing intermediate results to
disk.
