
DDBMS

The document discusses Distributed Database Management Systems (DDBMS), highlighting the shift from centralized to distributed databases that enhance reliability, availability, and performance. It outlines the complexities of system design, including data fragmentation, replication, and the need for effective concurrency control and recovery mechanisms. Key components such as server, client, and communication software are detailed, along with techniques for ensuring global consistency and serializability across distributed transactions.

Uploaded by

mwendikimaiga21

DDBMS

• In a centralised database system, all system
components reside at a single computer site.
The components include the data, the DBMS
software and the associated secondary
storage devices such as disks for on-line
database storage and tapes for backup. A
centralised database can be accessed
remotely via terminals connected to the site.
• In recent years there has been a rapid trend
towards the distribution of computer systems
over multiple sites that are interconnected via
a communication network.
• A distributed database is a collection of data
that belongs logically to the same system but
is physically spread over the sites of a
computer network. Advantages of a
distributed database system include:
• Distributed nature of database applications: some
companies have locations at different sites and
must serve both users local to a site and global
users, e.g. headquarters.
• Increased reliability and availability: reliability
refers to the probability that a system is up at a
particular moment, whereas availability refers to
the probability that the system is continuously
available during a time interval. When one site
fails, the others can continue operating.
• Allowing data sharing while maintaining some
measure of local control
• Improved performance
Distribution leads to increased complexity in the
system design and implementation. To achieve
the potential advantages above the following
additional functions have to be provided:
• The ability to access remote sites and transmit
queries and data among the various sites via a
communication network
• The ability to keep track of the data distribution
and replication in the DBMS catalog
• The ability to devise execution strategies for
queries and transactions that access data from
more than one site
• The ability to decide on which copy of a
replicated data item to access
• The ability to maintain the consistency of copies
of a replicated data item
• The ability to recover from individual site crashes
and from new types of failures such as failure of a
communication link
In a typical DDBMS it is common to divide the
software modules into three levels, namely:
(a) The server software which is responsible for
local data management at a site much like a
centralised DBMS
(b) The client software which is responsible for
most of the distribution functions. It accesses
data distribution information from the DDBMS
catalog and processes all requests that require
access to more than one site
(c) The communication software (sometimes in
conjunction with a distributed OS) provides the
communication primitives that are used by
the client to transmit commands and data
among the various sites.
• The client is responsible for generating a
distributed execution plan for queries and
transactions. It ensures the consistency of
replicated copies of a data item by
employing distributed concurrency control
techniques, and it performs global recovery
when certain sites fail.
• It also hides the details of data distribution
from the user: the user can write global
queries and transactions as though the
database were centralised, without specifying
the sites at which the data referenced in the
query resides. This property is called
distribution transparency.
Techniques used in distributed database design
include:
(a) Data fragmentation:
This is where decisions must be made regarding
which site should be used to store which
portions of the database. Before the decision
is made on how to distribute the data, the
logical units of the database that are to be
distributed are determined.
Fragmentation can be done in the following ways:
(i) Horizontal fragmentation, where a horizontal
fragment of a relation is a subset of the tuples
in that relation, e.g. we may store the database
information relating to each department at the
computer site of that department. For the
relation employee we define three horizontal
fragments by specifying a condition on an
attribute, i.e. (DNO=4), (DNO=5) and (DNO=1).
(ii) Vertical fragmentation, on the other hand,
keeps only certain attributes of the relation,
e.g. we may fragment the employee relation
into two: the first fragment includes personal
information and the second fragment includes
work-related information.
(iii) Mixed fragmentation intermixes
vertical and horizontal fragmentation.
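
The three kinds of fragmentation can be illustrated with a minimal Python sketch. The sample rows and the attribute names SSN, NAME, ADDRESS and SALARY are assumptions made for illustration; only the employee relation and the DNO conditions come from the slides. Repeating the key (SSN) in both vertical fragments is also an assumption, made so that the fragments could be joined back together.

```python
# Hypothetical employee tuples; attribute names are illustrative assumptions.
EMPLOYEE = [
    {"SSN": "111", "NAME": "Ann", "ADDRESS": "12 Elm St", "SALARY": 4000, "DNO": 4},
    {"SSN": "222", "NAME": "Ben", "ADDRESS": "9 Oak Ave", "SALARY": 5200, "DNO": 5},
    {"SSN": "333", "NAME": "Cat", "ADDRESS": "3 Pine Rd", "SALARY": 6100, "DNO": 1},
    {"SSN": "444", "NAME": "Dan", "ADDRESS": "7 Ash Ln", "SALARY": 4500, "DNO": 5},
]


def horizontal_fragment(relation, condition):
    """Keep only the tuples (rows) that satisfy the condition."""
    return [row for row in relation if condition(row)]


def vertical_fragment(relation, attributes):
    """Keep only the listed attributes (columns) of every tuple."""
    return [{a: row[a] for a in attributes} for row in relation]


# (i) Horizontal: one fragment per department, selected by a condition on DNO.
by_dno = {
    d: horizontal_fragment(EMPLOYEE, lambda row, d=d: row["DNO"] == d)
    for d in (4, 5, 1)
}

# (ii) Vertical: personal information versus work-related information.
personal = vertical_fragment(EMPLOYEE, ["SSN", "NAME", "ADDRESS"])
work = vertical_fragment(EMPLOYEE, ["SSN", "SALARY", "DNO"])

# (iii) Mixed: a vertical fragment taken from a horizontal fragment.
work_dno5 = vertical_fragment(by_dno[5], ["SSN", "SALARY", "DNO"])
```

Each horizontal fragment could then be stored at the site of its department, which is the allocation the employee example suggests.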
(b) Data Replication and Allocation
If a fragment is stored at more than one site, it is
said to be replicated. Full replication is the most
extreme case, where the whole database is
replicated at every site in the distributed system.
No replication is the other extreme, where each
fragment is stored at exactly one site. Between
these two extremes we have a wide spectrum of
partial replication of the data.
Each fragment or each copy of a fragment must
be assigned to a particular site in the
distributed system. This process is called data
allocation or data distribution.
The choice of sites and the degree of replication
depend on the performance and availability
goals of the system and on the types and
frequencies of transactions submitted at
each site.
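
As a rough illustration of replication and allocation, the sketch below records which sites hold a copy of each fragment and classifies the resulting scheme. The fragment names, site names and the `allocation` table are hypothetical; a real DDBMS would keep this information in its catalog.

```python
# Hypothetical set of sites in the distributed system.
ALL_SITES = {"site1", "site2", "site3"}

# Hypothetical data allocation: fragment name -> sites holding a copy.
allocation = {
    "EMP_DNO4": {"site1"},
    "EMP_DNO5": {"site2", "site3"},
    "EMP_DNO1": {"site1", "site2", "site3"},
}


def is_replicated(alloc, fragment):
    """A fragment is replicated if it is stored at more than one site."""
    return len(alloc[fragment]) > 1


def replication_scheme(alloc, sites):
    """Classify an allocation as full, no, or partial replication."""
    if all(copies == sites for copies in alloc.values()):
        return "full replication"
    if all(len(copies) == 1 for copies in alloc.values()):
        return "no replication"
    return "partial replication"
```

With the table above, `replication_scheme(allocation, ALL_SITES)` reports partial replication, because some fragments have one copy and others have several.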
• DDBMSs differ from one another in several
ways, depending on:
(i) The degree of homogeneity: if all servers use
identical software and all clients use identical
software, the DDBMS is called homogeneous;
otherwise it is heterogeneous.
• In a heterogeneous system, one server may be
a relational DBMS, another a network DBMS,
and another an object-oriented or hierarchical
DBMS. In such a case it is necessary to have a
canonical system language and to include
language translators in the client to translate
sub-queries from the canonical language to
the language of each server.
(ii) The degree of distribution transparency:
In a DDBMS the cost of communication among
sites is considered a major factor in
distributed query optimization. A key
technique in distributed query processing is
the semi-join operation, which aims at
reducing the number of tuples in a relation
before transferring it to another site.
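
A minimal sketch of the semi-join idea follows. It assumes an employee relation at site 1 and a department relation at site 2, joined on DNO; the sample data and the helper names `project`, `semi_join` and `join` are hypothetical. Instead of shipping the whole employee relation to site 2, only the join-column values travel to site 1, and only the matching employee tuples travel back.

```python
# Hypothetical relations held at two different sites.
EMPLOYEE_AT_SITE1 = [
    {"NAME": "Ann", "DNO": 4},
    {"NAME": "Ben", "DNO": 5},
    {"NAME": "Cat", "DNO": 1},
    {"NAME": "Dan", "DNO": 5},
]
DEPARTMENT_AT_SITE2 = [
    {"DNO": 5, "DNAME": "Research"},
]


def project(relation, attribute):
    """Distinct values of one attribute (the join column)."""
    return {row[attribute] for row in relation}


def semi_join(relation, attribute, values):
    """Tuples of relation whose join attribute matches a received value."""
    return [row for row in relation if row[attribute] in values]


def join(left, right, attribute):
    """Equi-join of two relations on a shared attribute."""
    return [{**l, **r} for l in left for r in right if l[attribute] == r[attribute]]


# Step 1: site 2 sends only the join-column values to site 1.
dnos = project(DEPARTMENT_AT_SITE2, "DNO")
# Step 2: site 1 sends back only the employee tuples that will join.
reduced = semi_join(EMPLOYEE_AT_SITE1, "DNO", dnos)
# Step 3: site 2 completes the join locally on the reduced relation.
result = join(reduced, DEPARTMENT_AT_SITE2, "DNO")
```

Here two of the four employee tuples are transferred instead of all four, and the final result is the same as joining the full relations.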
• DDBMSs that support transparency employ query
decomposition, which breaks up a query into sub-
queries that can be executed at individual sites.
When a referenced item is replicated, the DDBMS
also chooses the particular replica to use, a
process called materializing a replica.
• For a vertical fragmentation the attribute list is
kept in the catalog, and for a horizontal
fragmentation a condition is kept for each fragment.
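
The sketch below shows, under simplifying assumptions, how a catalog that keeps a condition per horizontal fragment can drive query decomposition and replica selection. The `CATALOG` layout, the fragment and site names, and the policy of preferring a given site and otherwise taking the first listed copy are all hypothetical.

```python
# Hypothetical catalog: each horizontal fragment of employee stores its
# defining condition (a DNO value) and the sites holding a replica.
CATALOG = {
    "EMP_DNO4": {"dno": 4, "sites": ["site1"]},
    "EMP_DNO5": {"dno": 5, "sites": ["site2", "site3"]},
    "EMP_DNO1": {"dno": 1, "sites": ["site3"]},
}


def decompose(dno=None, preferred_site=None):
    """Plan sub-queries for a query on employee, optionally with DNO = dno.

    Returns a list of (fragment, site) pairs, one sub-query per fragment
    whose stored condition is compatible with the query predicate.
    """
    plan = []
    for fragment, entry in CATALOG.items():
        if dno is not None and entry["dno"] != dno:
            continue  # the fragment condition rules this fragment out
        sites = entry["sites"]
        # Materialize a replica: assumed policy, not a prescribed one.
        site = preferred_site if preferred_site in sites else sites[0]
        plan.append((fragment, site))
    return plan
```

A query restricted to DNO = 5 is sent to a single site, while a query with no condition produces one sub-query per fragment.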
Concurrency Control and Recovery for
Distributed Databases
In concurrency control and recovery for
distributed databases, the following issues are
specifically addressed:
• Distributed commit (two-phase commit)
• Distributed deadlock
• Failure of communication
• Failure of individual sites
• Dealing with multiple copies of data items
• A distributed transaction accesses data stored at more
than one location. Each transaction is divided into a
number of sub transactions, one for each site that has
to be accessed. A sub transaction is represented as an
agent.
• Consider a transaction T that prints out the names of
all staff, using the fragmentation schema with fragments
S1, S2, S21, S22 and S23. The user does not need to know
that the data is fragmented: database accesses are based
on the global schema, so the user does not need to
specify fragment names or data locations.
• Three sub transactions TS3, TS5 and TS7
represent the agents at sites 3, 5 and 7
respectively. Each sub transaction prints out
the names of staff at that site.
• The transaction manager co-ordinates
transactions on behalf of application
programs and communicates with the scheduler,
which is responsible for implementing a
particular strategy for concurrency control.
• If a failure occurs during a transaction, the
recovery manager ensures that the database is
restored to the state it was in before the
start of the transaction. It also restores the
database to a consistent state following a
system failure.
• The buffer manager is responsible for the
transfer of data between disk storage and
main memory.
• In a distributed DBMS, these modules still
exist in each local DBMS.
• In addition there is also a global transaction
manager or transaction co-ordinator at each
site, to co-ordinate the execution of both the
global and local transactions initiated at the
site.
• Inter-site communication is through the data
communication component.
The procedure to execute a global transaction
initiated at site S1 is as follows:
• The transaction co-ordinator TC1 at site S1
divides the transaction into a number of sub-
transactions using information held in the
global system catalog.
• The data communication component at site S1
sends the sub transactions to the appropriate
sites say S2 and S3.
• The transaction co-ordinators at sites S2 and S3
co-ordinate these sub transactions. The results of
the sub transactions are communicated back to
TC1 via the data communication component.
• Communication between different local
databases and different processors (at the same
site or at different sites) takes place through
message passing in a communication network.
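
The procedure above can be sketched as a small single-process simulation. The catalog contents, the data item names and the function names are hypothetical, and an ordinary function call stands in for the messages that the data communication component would send between sites.

```python
from collections import defaultdict

# Hypothetical global system catalog: data item -> site that stores it.
GLOBAL_CATALOG = {"staff_a": "S1", "staff_b": "S2", "staff_c": "S3"}

# Hypothetical local databases, one per site.
LOCAL_DATA = {
    "S1": {"staff_a": ["Ann"]},
    "S2": {"staff_b": ["Ben", "Cat"]},
    "S3": {"staff_c": ["Dan"]},
}


def split_into_subtransactions(items):
    """Co-ordinator step: group the accessed items by the site holding them."""
    subtransactions = defaultdict(list)
    for item in items:
        subtransactions[GLOBAL_CATALOG[item]].append(item)
    return dict(subtransactions)


def run_subtransaction(site, items):
    """Local step: the co-ordinator at a site reads the requested items."""
    return [name for item in items for name in LOCAL_DATA[site][item]]


def run_global_transaction(items):
    """Split the transaction, dispatch sub transactions, collect results."""
    results = {}
    for site, sub_items in split_into_subtransactions(items).items():
        # Stand-in for a message sent through the communication component.
        results[site] = run_subtransaction(site, sub_items)
    return results
```

Calling `run_global_transaction(["staff_a", "staff_b", "staff_c"])` returns the staff names grouped by site, which mirrors the results being communicated back to the initiating co-ordinator.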

The key issues in concurrency control in a
distributed DBMS are:
(a) The degree of distribution of database
hardware/software and control determines
the complexity of a distributed system. The
degree of cooperation among the different
processors determines the inter-computer
message rate and the complexity of control.
(b) The distributed scheduler has to ensure the
consistency of the different local databases, in
which replication and multi-version data
objects may be used. At the same time it
needs to ensure the global consistency of the
whole collection of databases. Thus the
distributed scheduler is essentially a scheduler
of schedulers.
(c) In a distributed system no one site may hold all
the global information needed for a global
consistency check. Hence one has to obtain
information about actions at different sites and
then combine it to obtain the global information
on consistency. We should therefore have a central
coordinator or assign the coordination job to one
site. Communication costs between sites, and issues
related to communication delays and failures, must
therefore be considered.
(d) The conflict graph (precedence graph),
lock, timestamp and certifier techniques
depend on the fundamental notion of a total
ordering of events in time. These techniques
must be extended in order to achieve a
distributed schedule.
• In a distributed schedule each transaction
performs actions at several sites 1, 2, 3, ..., S.
The sequence of actions performed by a single
transaction at any one site is called a sub
transaction. The sequence of actions
performed by different transactions on a
database at any one site is called a local
schedule.
• Thus when many transactions are performing
their sub transactions at many sites, we have a
schedule of many local schedules. Hence to
ensure the serializability of a distributed
schedule it is necessary that each local
schedule be serializable though this may not
be a sufficient condition.
• In order to achieve a multisite global
serialization we need to combine the local
information obtained from different sites at a
co-ordinating site and look for multisite or
global consistency. Since it is clear that such a
compilation of information is communication
intensive, one should look for ways and means
of minimizing the communication overhead.
• Each of the techniques used at a
centralized site can be extended to suit a
distributed environment. However, they differ
in their communication and computational
complexities.
Distributed Serializability
• The concept of serializability can be extended
for the distributed environment to cater for
data distribution. If the schedule of
transaction execution at each site is
serializable, then the global schedule (union of
all local schedules) is also serializable provided
local serialization orders are identical.
• This requires that all sub-transactions appear
in the same order in the equivalent serial
schedule at all sites.
• Thus if the sub transaction of $T_i$ at site $S_1$ is
denoted as $T_i^1$, it must be ensured that if
$T_i^1 < T_j^1$ then $T_i^x < T_j^x$
for all sites $S_x$ at which $T_i$ and $T_j$ have sub
transactions.
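
The condition that sub-transactions appear in the same order at all sites can be checked with a short sketch. It assumes each site reports the order of transactions in its equivalent serial schedule; the function name `orders_agree` and the site and transaction labels are hypothetical. The check is pairwise only, so an ordering cycle that runs through three or more transactions would need a graph-based test instead.

```python
from itertools import combinations


def orders_agree(local_orders):
    """Check that no two sites order the same pair of transactions differently.

    local_orders maps a site to the list of transaction ids in the order of
    that site's equivalent serial schedule.
    """
    seen = set()  # pairs (a, b) meaning "a precedes b at some site"
    for order in local_orders.values():
        # combinations keeps list order, so a precedes b at this site.
        for a, b in combinations(order, 2):
            if (b, a) in seen:
                return False  # another site serialized b before a
            seen.add((a, b))
    return True
```

For example, `orders_agree({"S1": ["T1", "T2"], "S2": ["T2", "T1"]})` is false: each local schedule is serial on its own, yet the two sites disagree on the order of T1 and T2, so the global schedule is not serializable.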
• The solutions to concurrency control in a
distributed environment are based on the two
main approaches of locking and timestamping.
• Given a set of transactions to be executed
concurrently:
(a) Locking guarantees that the concurrent
execution is equivalent to some
(unpredictable) serial execution of those
transactions.
(b) Timestamping guarantees that the
concurrent execution is equivalent to a
specific serial execution of those transactions,
corresponding to the order of the timestamps.
Global Serializability Conditions
(a) At each site the local schedule is serializable
(b) At each site the serialization order of
transactions dictated by every other site is
not violated. That is, for each pair of
conflicting actions among transactions $T_i$ and $T_j$,
an action of $T_i$ precedes an action of $T_j$ in any
local schedule if and only if $T_i$ precedes $T_j$ in
the total ordering of all transactions at all
sites.
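
One way to test condition (b) is to merge the precedence edges reported by every site into a single graph and look for a total ordering. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The input format, in which each site lists pairs of conflicting transactions in the order they acted, and the function name `global_serial_order` are assumptions made for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def global_serial_order(local_precedences):
    """Return a total order of transactions consistent with every site.

    local_precedences maps a site to a list of (first, second) pairs, each
    meaning that a conflicting action of `first` precedes one of `second`
    in that site's local schedule. Returns None if no total order exists.
    """
    sorter = TopologicalSorter()
    for edges in local_precedences.values():
        for first, second in edges:
            # add(node, predecessor): `first` must come before `second`.
            sorter.add(second, first)
    try:
        return list(sorter.static_order())
    except CycleError:
        return None  # sites disagree, so no global serial order exists
```

If two sites order the same pair of transactions in opposite directions, the merged graph contains a cycle and the function returns `None`, which corresponds to a schedule that is not globally serializable.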
• Example: the example below describes a
single version distributed schedule of two
transactions 1 and 2 at two sites 1 and 2 on
data objects X and Y
• The pairs in the matrix are concurrent at different
sites 1 and 2
• For example 2yW1 1yR1 2xW1
• This means 1 and 2 cannot be ordered either as
1,2 or as 2,1 to bring in total order.
• Now consider the distributed schedule with the
local schedules
