
LOYOLA-ICAM COLLEGE OF ENGINEERING AND TECHNOLOGY

LOYOLA COLLEGE CAMPUS, NUNGUMBAKKAM, CH – 34


CS2029 ADVANCED DATABASE TECHNOLOGY
UNIT II
DISTRIBUTED DATABASES
TOPIC 1: PARALLEL DATABASES

Parallel DBMS
A DBMS running across multiple processors and disks designed to execute operations in parallel, whenever possible, to improve performance.

● Based on the premise that single-processor systems can no longer meet requirements for cost-effective scalability, reliability, and performance.
● Parallel DBMSs link multiple, smaller machines to achieve the same throughput as a single, larger machine, with greater scalability and reliability.
● To provide multiple processors with common access to a single database, a parallel DBMS must provide for shared resource management.
● Which resources are shared, and how those shared resources are implemented, directly affects the performance and scalability of the system, which, in turn, determines its appropriateness for a given application/environment.
● Main architectures for parallel DBMSs are:
o Shared memory,
o Shared disk,
o Shared nothing.

SHARED MEMORY ARCHITECTURE
● Shared memory is a tightly coupled architecture in which multiple processors within a single system share system memory.
● It is also known as symmetric multiprocessing (SMP); this approach has become popular on platforms ranging from personal workstations that support a few microprocessors in parallel, to large RISC (Reduced Instruction Set Computer)-based machines, all the way up to the largest mainframes.
● Advantage: This architecture provides high-speed data access for a limited number of processors.
● Disadvantage: It is not scalable beyond about 64 processors, at which point the interconnection network becomes a bottleneck.


SHARED DISK ARCHITECTURE
● Shared disk is a loosely coupled architecture optimized for applications that are inherently centralized and require high availability and performance.
● Each processor can access all disks directly, but each has its own private memory.
● Shared disk systems are sometimes referred to as clusters.
● Advantage: Like the shared nothing architecture, the shared disk architecture eliminates the shared memory performance bottleneck; unlike the shared nothing architecture, however, it does so without introducing the overhead associated with physically partitioned data.

SHARED NOTHING ARCHITECTURE
● Shared nothing, often known as massively parallel processing (MPP), is a multiple processor architecture in which each processor is part of a complete system, with its own memory and disk storage.
● The database is partitioned among all the disks on each system associated with the database, and data is transparently available to users on all systems.
● This architecture is more scalable than shared memory and can easily support a large number of processors.
● Disadvantage: Performance is optimal only when requested data is stored locally.

DIFFERENCE BETWEEN PARALLEL DBMS AND DISTRIBUTED DBMS
1. Parallel DBMS: A DBMS running across multiple processors and disks that is designed to execute operations in parallel, whenever possible, in order to improve performance.
   Distributed DBMS: The software system that permits the management of the distributed database and makes the distribution transparent to users.
2. Parallel DBMS: The distribution of data in a parallel DBMS is based solely on performance considerations.
   Distributed DBMS: The shared nothing definition sometimes includes distributed DBMSs.
3. Parallel DBMS: The nodes of a parallel DBMS are typically within the same computer or within the same site.
   Distributed DBMS: The nodes of a DDBMS are typically geographically distributed, separately administered, and have a slower interconnection network.

USE OF PARALLEL DBMS:
● Parallel technology is typically used for very large databases, possibly of the order of terabytes (10^12 bytes), or systems that have to process thousands of transactions per second. These systems need access to large volumes of data and must provide timely responses to queries.
● A parallel DBMS can use the underlying architecture to improve the performance of complex query execution using parallel scan, join, and sort techniques that allow multiple processor nodes automatically to share the processing workload.

TOPIC 2: INTER AND INTRA QUERY PARALLELISM
Interquery Parallelism
● In interquery parallelism, different queries or transactions execute in parallel with one another.
● Interquery parallelism is the easiest form of parallelism to support in a database system, particularly in a shared-memory parallel system.
● Advantage: Transaction throughput can be increased by this form of parallelism.
● Disadvantage: The response times of individual transactions are no faster than they would be if the transactions were run in isolation.
● Use: The primary use of interquery parallelism is to scale up a transaction-processing system to support a larger number of transactions per second.
● Database systems designed for single-processor systems can be used with few or no changes on a shared-memory parallel architecture, since even sequential database systems support concurrent processing.
● Transactions that would have operated in a time-shared concurrent manner on a sequential machine operate in parallel in the shared-memory parallel architecture.
● Supporting interquery parallelism is more complicated in a shared-disk or shared-nothing architecture.
o Processors have to perform some tasks, such as locking and logging, in a coordinated fashion, and that requires that they pass messages to each other.
CACHE COHERENCE PROBLEM
● A parallel database system must also ensure that two processors do not update the same data independently at the same time.
● When a processor accesses or updates data, the database system must ensure that the processor has the latest version of the data in its buffer pool. The problem of ensuring that the version is the latest is known as the cache-coherency problem.
● Various protocols are available to guarantee cache coherency; often, cache-coherency protocols are integrated with concurrency-control protocols so that their overhead is reduced.
● A cache coherence protocol for a shared-disk system is as follows:
o Before any read or write access to a page, a transaction locks the page in shared or exclusive mode, as appropriate. Immediately after the transaction obtains either a shared or exclusive lock on a page, it also reads the most recent copy of the page from the shared disk.
o Before a transaction releases an exclusive lock on a page, it flushes the page to the shared disk; then, it releases the lock.
● This protocol ensures that, when a transaction sets a shared or exclusive lock on a page, it gets the correct copy of the page.
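The protocol above operates on pages inside the DBMS and is not something an application programs directly. As a loose SQL-level analogue of its "lock first, then access the current copy" discipline, the following sketch uses PostgreSQL-style explicit row locking; the account table and its columns are illustrative assumptions, not part of these notes:

BEGIN;
-- Acquire an exclusive lock on the row before any write access
-- (the analogue of locking the page in exclusive mode).
SELECT balance FROM account WHERE accountNo = 'A-101' FOR UPDATE;
UPDATE account SET balance = balance - 100 WHERE accountNo = 'A-101';
-- On COMMIT the change is made durable before the lock is released,
-- the analogue of flushing the page to the shared disk.
COMMIT;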
Complex Protocols
● More complex protocols avoid the repeated reading and writing to disk required by the preceding protocol.
o Such protocols do not write pages to disk when exclusive locks are released.
o When a shared or exclusive lock is obtained, if the most recent version of a page is in the buffer pool of some processor, the page is obtained from there.
o The protocols have to be designed to handle concurrent requests.
Cache Coherence Protocol for Shared Nothing Architecture
● Each page has a home processor Pi, and is stored on disk Di.
● When other processors want to read or write the page, they send requests to the home processor Pi of the page, since they cannot directly communicate with the disk.
● The other actions are the same as in the shared-disk protocols.
Intraquery Parallelism
● Intraquery parallelism refers to the execution of a single query in parallel on multiple processors and disks.
● Use: Using intraquery parallelism is important for speeding up long-running queries.
● Interquery parallelism does not help in this task, since each query is run sequentially.
Example:
● To illustrate the parallel evaluation of a query, consider a query that requires a relation to be sorted.
● Suppose that the relation has been partitioned across multiple disks by range partitioning on some attribute, and the sort is requested on the partitioning attribute.
● The sort operation can be implemented by sorting each partition in parallel, then concatenating the sorted partitions to get the final sorted relation (a SQL sketch follows the summary below).
● We can parallelize a query by parallelizing individual operations.
● There is another source of parallelism in evaluating a query: the operator tree for a query can contain multiple operations.
● We can parallelize the evaluation of the operator tree by evaluating in parallel some of the operations that do not depend on one another.
● We may be able to pipeline the output of one operation to another operation.
● The two operations can be executed in parallel on separate processors, one generating output that is consumed by the other, even as it is generated.
● In summary, the execution of a single query can be parallelized in two ways:
• Intraoperation parallelism. We can speed up processing of a query by parallelizing the execution of each individual operation, such as sort, select, project, and join.
• Interoperation parallelism. We can speed up processing of a query by executing in parallel the different operations in a query expression.
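The partitioned parallel sort described above can be sketched in SQL using declarative range partitioning. The syntax below is PostgreSQL's, and the staff table, its columns, and the partition bounds are our own illustrative assumptions:

-- Range-partition the relation on the attribute to be sorted.
CREATE TABLE staff (
    staffNo INT,
    salary  INT
) PARTITION BY RANGE (salary);

CREATE TABLE staff_low  PARTITION OF staff FOR VALUES FROM (0) TO (30000);
CREATE TABLE staff_high PARTITION OF staff FOR VALUES FROM (30000) TO (100000);

-- Because the sort key is the partitioning attribute, each partition can be
-- sorted independently (potentially on a different processor) and the sorted
-- partitions concatenated in partition order to give the final sorted relation.
SELECT * FROM staff ORDER BY salary;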
● The two forms of parallelism are complementary, and can be used simultaneously on a query.
● Since the number of operations in a typical query is small, compared to the number of tuples processed by each operation, the first form of parallelism can scale better with increasing parallelism.
● With the relatively small number of processors in typical parallel systems today, both forms of parallelism are important.
Intraoperation Parallelism
● Since relational operations work on relations containing large sets of tuples, we can parallelize the operations by executing them in parallel on different subsets of the relations.
● Since the number of tuples in a relation can be large, the degree of parallelism is potentially enormous.
● Thus, intraoperation parallelism is natural in a database system.
Interoperation Parallelism
● There are two forms of interoperation parallelism: pipelined parallelism, and independent parallelism.
Pipelined Parallelism
● Pipelining forms an important source of economy of computation for database query processing.
● In pipelining, the output tuples of one operation, A, are consumed by a second operation, B, even before the first operation has produced the entire set of tuples in its output.
● Advantage: The major advantage of pipelined execution in a sequential evaluation is that we can carry out a sequence of such operations without writing any of the intermediate results to disk.
● It is possible to run operations A and B simultaneously on different processors, so that B consumes tuples in parallel with A producing them. This form of parallelism is called pipelined parallelism.
● Example: [Figure not reproduced: a pipeline of operations executed across processors P1, P2, and P3.]
Disadvantage of Pipelined Parallelism:
● Pipelined parallelism is useful with a small number of processors, but does not scale up well.
● Pipeline chains generally do not attain sufficient length to provide a high degree of parallelism.
● It is not possible to pipeline relational operators that do not
produce output until all inputs have been accessed, such as the
set-difference operation.
● Only marginal speedup is obtained for the frequent cases in which
one operator’s execution cost is much higher than are those of the
others.
● When the degree of parallelism is high, pipelining is a less
important source of parallelism than partitioning.
Advantage of Pipelined Parallelism:
● The real reason for using pipelining is that pipelined executions
can avoid writing intermediate results to disk.
Independent Parallelism
● Operations in a query expression that do not depend on one another can be executed in parallel. This form of parallelism is called independent parallelism.
● Example: in an expression such as (r1 ⋈ r2) ⋈ (r3 ⋈ r4), the joins r1 ⋈ r2 and r3 ⋈ r4 do not depend on one another and can be computed in parallel.
● Disadvantage: Independent parallelism does not provide a high degree of parallelism, and is less useful in a highly parallel system.
● Advantage: It is useful with a lower degree of parallelism.

TOPIC 3: DISTRIBUTED DATABASE FEATURES:
Distributed Database Technology:
● Mode of working has moved from centralized to decentralized.
Applications:
● Rapid developments in network and data communication technology, epitomized by the Internet, mobile and wireless computing, intelligent devices, and grid computing.
Introduction:
● The shareability of the data and the efficiency of data access should be improved by the development of a distributed database system that reflects this organizational structure, makes the data in all units accessible, and stores data proximate to the location where it is most frequently used.
● Distributed DBMSs should help resolve the islands of information problem.
Concepts:
Distributed Database
A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.
Distributed DBMS
Software system that permits the management of the distributed database and makes the distribution transparent to users.
● Each fragment is stored on one or more computers under the control of a separate DBMS, with the computers connected by a communications network.
● Each site is capable of independently processing user requests that require access to local data; that is, each site has some degree of local autonomy.
● It is also capable of processing data stored on other computers in the network.
● Users access the distributed database via applications, which are classified as those that do not require data from other sites (local applications) and those that do require data from other sites (global applications).
● We require a DDBMS to have at least one global application.
A DDBMS therefore has the following characteristics:
● a collection of logically related shared data;
● the data is split into a number of fragments;
● fragments may be replicated;
● fragments/replicas are allocated to sites;
● the sites are linked by a communications network;
● the data at each site is under the control of a DBMS;
● the DBMS at each site can handle local applications, autonomously;
● each DBMS participates in at least one global application.
[Figure not reproduced: Distributed DBMS]
● It is not necessary for every site in the system to have its own local database.
● From the definition of the DDBMS, the system is expected to make the distribution transparent (invisible) to the user. Thus, the fact that a distributed database is split into fragments that can be stored on different computers and perhaps replicated should be hidden from the user.
● The objective of transparency is to make the distributed system appear like a centralized system.
● This is sometimes referred to as the fundamental principle of distributed DBMSs.
● Advantage of transparency in DDBMS: This requirement provides significant functionality for the end-user.
● Disadvantage of transparency in DDBMS: It creates many additional problems that have to be handled by the DDBMS.
Distributed Processing
A centralized database that can be accessed over a computer network.
Difference between Distributed DBMS and Distributed Processing
1. Distributed DBMS: Software system that permits the management of the distributed database and makes the distribution transparent to users.
   Distributed Processing: A centralized database that can be accessed over a computer network.
2. Distributed DBMS: The key point in the definition of a distributed DBMS is that the system consists of data that is physically distributed across a number of sites in the network.
   Distributed Processing: If the data is centralized, even though other users may be accessing it over the network, it is distributed processing, not a distributed DBMS.

Advantages of DDBMSs
● Reflects organizational structure
● Improved shareability and local autonomy
● Improved availability
● Improved reliability
● Improved performance
● Economics
● Modular growth
Disadvantages of DDBMSs
● Complexity
● Cost
● Security
● Integrity control more difficult
● Lack of standards
● Lack of experience
● Database design more complex

Types of DDBMS
● Homogeneous DDBMS
● Heterogeneous DDBMS
Homogeneous DDBMS
● All sites use the same DBMS product.
Advantage:
● Homogeneous systems are much easier to design and manage.
● This approach provides incremental growth, making the addition of a new site to the DDBMS easy, and allows increased performance by exploiting the parallel processing capability of multiple sites.
Heterogeneous DDBMS
● Sites may run different DBMS products, with possibly different underlying data models.
● So the system may be composed of relational, network, hierarchical, and object-oriented DBMSs.
● Heterogeneous systems usually result when individual sites have implemented their own databases and integration is considered at a later stage.
● In a heterogeneous system, translations are required to allow communication between different DBMSs.
● To provide DBMS transparency, users must be able to make requests in the language of the DBMS at their local site.
● The system then has the task of locating the data and performing any necessary translation.
● Data may be required from another site that may have:
▪ different hardware;
▪ different DBMS products;
▪ different hardware and different DBMS products.
Different hardware
● If the hardware is different but the DBMS products are the same, the translation is straightforward, involving the change of codes and word lengths.
Different DBMS products
● If the DBMS products are different, the translation is complicated, involving the mapping of data structures in one data model to the equivalent data structures in another data model.
● For example, relations in the relational data model are mapped to records and sets in the network model.
● It is also necessary to translate the query language used.
Different hardware and different DBMS products
● If both the hardware and software are different, then both these types of translation are required.
● This makes the processing extremely complex.
Gateway:
● The typical solution used by some relational systems that are part of a heterogeneous DDBMS is to use gateways.
● A gateway converts the language and model of each different DBMS into the language and model of the relational system.
Disadvantages of Gateways:
● It may not support transaction management.
● The gateway approach is concerned only with the problem of translating a query expressed in one language into an equivalent expression in another language.
● It does not address the issues of homogenizing the structural and representational differences between different schemas.
Open Database Access and Interoperability
● The Open Group formed a Working Group to provide specifications that will create a database infrastructure environment where there is:
o a common SQL API that allows client applications to be written that do not need to know the vendor of the DBMS they are accessing;
o a common database protocol that enables a DBMS from one vendor to communicate directly with a DBMS from another vendor without the need for a gateway;
o a common network protocol that allows communications between different DBMSs.
● The most ambitious goal is to find a way to enable a transaction to span DBMSs from different vendors without use of a gateway.
● The Group has now evolved into the DBIOP Consortium and is working on version 3 of the DRDA (Distributed Relational Database Architecture) standard.
Multidatabase System (MDBS)
DDBMS in which each site maintains complete autonomy.
● A DBMS that resides transparently on top of existing database and file systems and presents a single database to its users.
● Allows users to access and share data without requiring physical database integration.
● Two types of MDBS: unfederated MDBS (no local users) and federated MDBS.
Unfederated MDBS (UFMDBS):
● An MDBS where there are no local users.
Federated MDBS (FMDBS):
● A federated system is a cross between a distributed DBMS and a centralized DBMS.
● It is a distributed system for global users and a centralized system for local users.
Functions of MDBS:
● An MDBS maintains only the global schema against which users issue queries and updates; the local DBMSs themselves maintain all user data. The global schema is constructed by integrating the schemas of the local databases.
● The MDBS first translates the global queries and updates into queries and updates on the appropriate local DBMSs.
● It then merges the local results and generates the final global result for the user.
● The MDBS coordinates the commit and abort operations for global transactions by the local DBMSs that processed them, to maintain consistency of data within the local databases.
● An MDBS controls multiple gateways and manages local databases through these gateways.
Overview of Networking
● Network - interconnected collection of autonomous computers, capable of exchanging information.
o Local Area Network (LAN) intended for connecting computers at the same site.
o Wide Area Network (WAN) used when computers or LANs need to be connected over long distances.
o WANs are relatively slow and less reliable than LANs. A DDBMS using a LAN provides a much faster response time than one using a WAN.
Functions of a DDBMS
● Expect a DDBMS to have at least the functionality of a DBMS.
● Also to have the following functionality:
● Extended communication services.
● Extended Data Dictionary.
● Distributed query processing.
● Extended concurrency control.
● Extended recovery services.
Date’s 12 Rules for a DDBMS
0. Fundamental Principle: To the user, a distributed system should look exactly like a nondistributed system.
1. Local Autonomy
2. No Reliance on a Central Site
3. Continuous Operation
4. Location Independence
5. Fragmentation Independence
6. Replication Independence
7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence
● The last four rules are ideals.

Transparencies in a DDBMS
● The definition of a DDBMS states that the system should make the distribution transparent to the user.
● Transparency hides implementation details from the user.
● We can identify four main types of transparency in a DDBMS:
o distribution transparency;
o transaction transparency;
o performance transparency;
o DBMS transparency.
Distribution Transparency
● Distribution transparency allows the user to perceive the database as a single, logical entity.
● If a DDBMS exhibits distribution transparency, then the user does not need to know the data is fragmented (fragmentation transparency) or the location of data items (location transparency).
● If the user needs to know that the data is fragmented and the location of fragments, then we call this local mapping transparency.
Transaction Transparency
● Transaction transparency in a DDBMS environment ensures that all distributed transactions maintain the distributed database’s integrity and consistency.
● A distributed transaction accesses data stored at more than one location.
● Each transaction is divided into a number of subtransactions, one for each site that has to be accessed; a subtransaction is represented by an agent.
Performance Transparency
● Performance transparency requires a DDBMS to perform as if it were a centralized DBMS.
● In a distributed environment, the system should not suffer any performance degradation due to the distributed architecture, for example the presence of the network.
● Performance transparency also requires the DDBMS to determine the most cost-effective strategy to execute a request.
DBMS Transparency
● DBMS transparency hides the knowledge that the local DBMSs may be different, and is therefore only applicable to heterogeneous DDBMSs.
● It is one of the most difficult transparencies to provide as a generalization.
Distribution Transparency
● Distribution transparency allows the user to perceive the database as a single, logical entity.
● If a DDBMS exhibits distribution transparency, the user does not need to know:
o data is fragmented (fragmentation transparency),
o location of data items (location transparency),
o otherwise we call this local mapping transparency.
● With replication transparency, the user is unaware of replication of fragments.
Naming Transparency
● Each item in a DDB must have a unique name.
● The DDBMS must ensure that no two sites create a database object with the same name.
● One solution is to create a central name server. However, this results in:
o loss of some local autonomy;
o the central site may become a bottleneck;
o low availability; if the central site fails, the remaining sites cannot create any new objects.
● Alternative solution - prefix the object with the identifier of the site that created it.
● For example, Branch created at site S1 might be named S1.BRANCH.
● Also need to identify each fragment and its copies.
● Thus, copy 2 of fragment 3 of Branch created at site S1 might be referred to as S1.BRANCH.F3.C2.
● However, this results in loss of distribution transparency.
● An approach that resolves these problems uses aliases for each database object.
● Thus, S1.BRANCH.F3.C2 might be known as LocalBranch by the user at site S1.
● The DDBMS has the task of mapping an alias to the appropriate database object.
Transaction Transparency
● Ensures that all distributed transactions maintain the distributed database’s integrity and consistency.
● A distributed transaction accesses data stored at more than one location.
● Each transaction is divided into a number of subtransactions, one for each site that has to be accessed.
● The DDBMS must ensure the indivisibility of both the global transaction and each of the subtransactions.
Example - Distributed Transaction
● T prints out the names of all staff, using the schema defined above as S1, S2, S21, S22, and S23. Define three subtransactions TS3, TS5, and TS7 to represent agents at sites 3, 5, and 7.
Concurrency Transparency
● All transactions must execute independently and be logically consistent with the results obtained if the transactions were executed one at a time, in some arbitrary serial order.
● Replication makes concurrency more complex.
● Same fundamental principles as for a centralized DBMS.
● The DDBMS must ensure both global and local transactions do not interfere with each other.
● Similarly, the DDBMS must ensure consistency of all subtransactions of a global transaction.
Classification of Transactions
● In IBM’s Distributed Relational Database Architecture (DRDA), there are four types of transactions:
o Remote request
o Remote unit of work
o Distributed unit of work
o Distributed request.
● If a copy of a replicated data item is updated, the update must be propagated to all copies.
● Could propagate changes as part of the original transaction, making it an atomic operation.
● However, if one site holding a copy is not reachable, then the transaction is delayed until the site is reachable.
● Could limit update propagation to only those sites currently available. Remaining sites are updated when they become available again.
● Could allow updates to copies to happen asynchronously, sometime after the original update. The delay in regaining consistency may range from a few seconds to several hours.

Failure Transparency
● The DDBMS must ensure atomicity and durability of the global transaction.
● Means ensuring that subtransactions of the global transaction either all commit or all abort.
● Thus, the DDBMS must synchronize the global transaction to ensure that all subtransactions have completed successfully before recording a final COMMIT for the global transaction.
● Must do this in the presence of site and network failures.

Performance Transparency
● The DDBMS must perform as if it were a centralized DBMS.
o The DDBMS should not suffer any performance degradation due to the distributed architecture.
o The DDBMS should determine the most cost-effective strategy to execute a request.
● The Distributed Query Processor (DQP) maps a data request into an ordered sequence of operations on local databases.
● Must consider fragmentation, replication, and allocation schemas.
● Distributed Query Processing has to decide:
o which fragment to access;
o which copy of a fragment to use;
o which location to use.
● Distributed Query Processing produces an execution strategy optimized with respect to some cost function.
● Typically, costs associated with a distributed request include:
o I/O cost;
o CPU cost;
o communication cost.
Performance Transparency – Example
Property(propNo, city) 10000 records in London
Client(clientNo, maxPrice) 100000 records in Glasgow
Viewing(propNo, clientNo) 1000000 records in London
SELECT p.propNo
FROM Property p INNER JOIN
(Client c INNER JOIN Viewing v ON c.clientNo = v.clientNo)
ON p.propNo = v.propNo
WHERE p.city=‘Aberdeen’ AND c.maxPrice > 200000;
Assume:
● Each tuple in each relation is 100 characters long.
● 10 renters with maximum price greater than £200,000.
● 100,000 viewings for properties in Aberdeen.
● Computation time negligible compared to communication time.
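As a rough illustration of why the execution strategy matters (the arithmetic here is ours, derived only from the assumptions above): shipping the entire Client relation from Glasgow to London transfers about 100,000 × 100 = 10,000,000 characters, whereas first restricting Client at Glasgow to the 10 qualifying renters and shipping only those tuples transfers about 10 × 100 = 1,000 characters. A cost-based distributed query processor would be expected to choose the second strategy.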
TOPIC 4: Distributed DBMS Architecture

Reference Architecture for DDBMS
● Due to diversity, there is no accepted architecture equivalent to the ANSI/SPARC 3-level architecture.
● A reference architecture consists of:
o Set of global external schemas.
o Global conceptual schema (GCS).
o Fragmentation schema and allocation schema.
o Set of schemas for each local DBMS conforming to
3-level ANSI/SPARC.
● Some levels may be missing, depending on levels of transparency
supported.
The edges in the accompanying figure (not reproduced here) represent mappings between the different schemas.
Global conceptual schema
● The global conceptual schema is a logical description of the whole
database, as if it were not distributed.
● This level corresponds to the conceptual level of the ANSI-SPARC
architecture and contains definitions of entities, relationships,
constraints, security, and integrity information.
● It provides physical data independence from the distributed
environment. The global external schemas provide logical data
independence.
Fragmentation and allocation schemas
● The fragmentation schema is a description of how the data is to be
logically partitioned.
● The allocation schema is a description of where the data is to be
located, taking account of any replication.
Local schemas
● Each local DBMS has its own set of schemas.
● The local conceptual and local internal schemas correspond to the
equivalent levels of the ANSI-SPARC architecture.
● The local mapping schema maps fragments in the allocation
schema into external objects in the local database.
● It is DBMS independent and is the basis for supporting
heterogeneous DBMSs.

Reference Architecture for a Federated MDBS


● Federated systems differ from DDBMSs in the level of local
autonomy provided.
● FMDBS is tightly coupled; that is, it has a global conceptual schema (GCS).
● In DDBMS, GCS is union of all local conceptual schemas.
● In FMDBS, GCS is subset of local conceptual schemas (LCS),
consisting of data that each local system agrees to share.
● GCS of tightly coupled system involves integration of either parts
of LCSs or local external schemas.
● FMDBS with no GCS is called loosely coupled.


Components Architecture of a DDBMS
● Independent of the reference architecture, we can identify a component architecture for a DDBMS consisting of four major components:
o local DBMS (LDBMS) component;
o data communications (DC) component;
o global system catalog (GSC);
o distributed DBMS (DDBMS) component.
Local DBMS Component:
o The LDBMS component is a standard DBMS, responsible for controlling the local data at each site that has a database.
o It has its own local system catalog that stores information about the data held at that site.
o In a homogeneous system, the LDBMS component is the same product, replicated at each site.
o In a heterogeneous system, there would be at least two sites with different DBMS products and/or platforms.
Data Communication Component:
o The DC component is the software that enables all sites to communicate with each other.
o The DC component contains information about the sites and the links.
Global System Catalog:
o The GSC has the same functionality as the system catalog of a centralized system.
o The GSC holds information specific to the distributed nature of the system, such as the fragmentation, replication, and allocation schemas.
o It can itself be managed as a distributed database and so it can be fragmented and distributed, fully replicated, or centralized, like any other relation.
o A fully replicated GSC compromises site autonomy, as every modification to the GSC has to be communicated to all other sites.
o A centralized GSC also compromises site autonomy and is vulnerable to failure of the central site.
R*:
o The approach taken in the distributed system R* overcomes these failings.
o In R* there is a local catalog at each site that contains the metadata relating to the data stored at that site.
o For relations created at some site (the birth-site), it is the responsibility of that site’s local catalog to record the definition of each fragment, and each replica of each fragment, and to record where each fragment or replica is located.
o Whenever a fragment or replica is moved to a different location, the local catalog at the corresponding relation’s birth-site must be updated.
o Thus, to locate a fragment or replica of a relation, the catalog at the relation’s birth-site must be accessed. The birth-site of each global relation is recorded in each local GSC.
Distributed DBMS component
o The DDBMS component is the controlling unit of the entire system.

TOPIC 5: FRAGMENTATION

Distributed Database Design
● Three key issues:
o Fragmentation,
o Allocation,
o Replication.
● Fragmentation
o A relation may be divided into a number of sub-relations, which are then distributed.
● Allocation
o Each fragment is stored at the site with “optimal” distribution.
● Replication
o A copy of a fragment may be maintained at several sites.
● Definition and allocation of fragments carried out strategically to achieve:
o Locality of Reference.
o Improved Reliability and Availability.
o Improved Performance.
o Balanced Storage Capacities and Costs.
o Minimal Communication Costs.
● Involves analyzing the most important applications, based on quantitative/qualitative information.
● The quantitative information may include:
o the frequency with which a transaction is run;
o the site from which a transaction is run;
o the performance criteria for transactions.
● The qualitative information may include information about the transactions that are executed, such as:
o the relations, attributes, and tuples accessed;
o the type of access (read or write);
o the predicates of read operations.

Objectives for allocation and definition of fragments
Locality of reference
● Where possible, data should be stored close to where it is used.
● If a fragment is used at several sites, it may be advantageous to store copies of the fragment at these sites.
Improved reliability and availability
● Reliability and availability are improved by replication: there is another copy of the fragment available at another site in the event of one site failing.
Acceptable performance
● Bad allocation may result in bottlenecks occurring, that is, a site may become inundated with requests from other sites, perhaps causing a significant degradation in performance.
● Alternatively, bad allocation may result in underutilization of resources.
Balanced storage capacities and costs
● Consideration should be given to the availability and cost of storage at each site so that cheap mass storage can be used, where possible.
● This must be balanced against locality of reference.
Minimal communication costs
● Consideration should be given to the cost of remote requests.
● Retrieval costs are minimized when locality of reference is maximized or when each site has its own copy of the data.
● However, when replicated data is updated, the update has to be performed at all sites holding a duplicate copy, thereby increasing communication costs.
Data Allocation
● Four alternative strategies regarding placement of data:
o Centralized,
o Partitioned (or Fragmented),
o Complete Replication,
o Selective Replication.
Centralized:
● This strategy consists of a single database and DBMS stored at one site with users distributed across the network.
● Locality of reference is at its lowest as all sites, except the central site, have to use the network for all data accesses.
● Communication costs are high.
● Reliability and availability are low, as a failure of the central site results in the loss of the entire database system.
Fragmented (or partitioned):
● This strategy partitions the database into disjoint fragments, with each fragment assigned to one site.
● If data items are located at the site where they are used most frequently, locality of reference is high.
● As there is no replication, storage costs are low; similarly, reliability and availability are low, although they are higher than in the centralized case as the failure of a site results in the loss of only that site’s data.
● Performance should be good and communications costs low if the distribution is designed properly.
Complete replication:
● This strategy consists of maintaining a complete copy of the database at each site.
● Therefore, locality of reference, reliability and availability, and performance are maximized.
● Storage costs and communication costs for updates are the most expensive. To overcome some of these problems, snapshots are sometimes used.
● A snapshot is a copy of the data at a given time.
● The copies are updated periodically, for example, hourly or weekly, so they may not be always up to date.
● Snapshots are also sometimes used to implement views in a distributed database to improve the time it takes to perform a database operation on a view.
Selective replication:
● This strategy is a combination of fragmentation, replication, and centralization.
● Some data items are fragmented to achieve high locality of reference and others, which are used at many sites and are not frequently updated, are replicated; otherwise, the data items are centralized.
● The objective of this strategy is to have all the advantages of the other approaches but none of the disadvantages. This is the most commonly used strategy because of its flexibility.

Comparison of Strategies for Data Distribution
[Table not reproduced.]
Reasons for fragmenting a relation:
● Usage
o Applications work with views rather than entire relations.
● Efficiency
o Data is stored close to where it is most frequently used.
o Data that is not needed by local applications is not stored.
● Parallelism
o With fragments as the unit of distribution, a transaction can be divided into several subqueries that operate on fragments.
● Security
o Data not required by local applications is not stored and so is not available to unauthorized users.
● Disadvantages
o Performance
▪ The performance of global applications that require data from several fragments located at different sites may be slower.
o Integrity
▪ Integrity control may be more difficult if data and functional dependencies are fragmented and located at different sites.

Correctness of Fragmentation
There are three rules that must be followed during fragmentation:
(1) Completeness: If a relation instance R is decomposed into fragments R1, R2, ..., Rn, each data item that can be found in R must appear in at least one fragment. This rule is necessary to ensure that there is no loss of data during fragmentation.
(2) Reconstruction: It must be possible to define a relational operation that will reconstruct the relation R from the fragments. This rule ensures that functional dependencies are preserved.
(3) Disjointness: If a data item di appears in fragment Ri, then it should not appear in any other fragment. Vertical fragmentation is the exception to this rule, where primary key attributes must be repeated to allow reconstruction. This rule ensures minimal data redundancy.
● In the case of horizontal fragmentation, a data item is a tuple.
● For vertical fragmentation, a data item is an attribute.
Types of fragmentation
● There are two main types of fragmentation: horizontal and vertical.
● Horizontal fragments are subsets of tuples and vertical fragments are subsets of attributes.
● There are also two other types of fragmentation: mixed, and derived (a type of horizontal fragmentation).
● The other possibility is no fragmentation:
o If a relation is small and not updated frequently, it may be better not to fragment the relation.

Horizontal Fragmentation
Horizontal fragment: Consists of a subset of the tuples of a relation.
● Horizontal fragmentation groups together the tuples in a relation that are collectively used by the important transactions.
● A horizontal fragment is produced by specifying a predicate that performs a restriction on the tuples in the relation.
● It is defined using the Selection operation of the relational algebra.
● The Selection operation groups together tuples that have some common property; for example, the tuples are all used by the same application or at the same site.
● Given a relation R, a horizontal fragment is defined as:
σp(R)
where p is a predicate based on one or more attributes of the relation.
● For example:
P1 = σ type=‘House’(PropertyForRent)
P2 = σ type=‘Flat’(PropertyForRent)
● This produces two fragments (P1 and P2), one consisting of those tuples where the value of the type attribute is ‘House’ and the other consisting of those tuples where the value of the type attribute is ‘Flat’.
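A sketch of how P1 and P2 might be realized in SQL, here as views over PropertyForRent (in a real DDBMS each fragment would typically be a base table stored at its own site):

-- Horizontal fragment P1: tuples with type = 'House'
CREATE VIEW P1 AS
SELECT * FROM PropertyForRent WHERE type = 'House';

-- Horizontal fragment P2: tuples with type = 'Flat'
CREATE VIEW P2 AS
SELECT * FROM PropertyForRent WHERE type = 'Flat';

-- Reconstruction: the union of the fragments recovers the relation
-- (UNION ALL suffices because the fragments are disjoint).
SELECT * FROM P1
UNION ALL
SELECT * FROM P2;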
● The fragmentation schema satisfies the correctness rules:
o Completeness: Each tuple in the relation appears in either fragment P1 or P2.
o Reconstruction: The PropertyForRent relation can be reconstructed from the fragments using the Union operation, thus: P1 ∪ P2 = PropertyForRent
o Disjointness: The fragments are disjoint; there can be no property type that is both ‘House’ and ‘Flat’.
● The predicates may be simple, involving single attributes, or complex, involving multiple attributes.
● The predicates for each attribute may be single-valued or multi-valued.
● The fragmentation strategy involves finding a set of minimal (that is, complete and relevant) predicates that can be used as the basis for the fragmentation schema.
● A set of predicates is complete if and only if any two tuples in the same fragment are referenced with the same probability by any transaction.
● A predicate is relevant if there is at least one transaction that accesses the resulting fragments differently.
● For example, if the only requirement is to select tuples from PropertyForRent based on the property type, the set {type = ‘House’, type = ‘Flat’} is complete, whereas the set {type = ‘House’} is not complete.
● On the other hand, with this requirement the predicate (city = ‘Aberdeen’) would not be relevant.

Vertical Fragmentation
Vertical fragment: Consists of a subset of the attributes of a relation.
● Vertical fragmentation groups together the attributes in a relation that are used jointly by the important transactions.
● A vertical fragment is defined using the Projection operation of the relational algebra.
● Given a relation R, a vertical fragment is defined as:
∏a1, ..., an(R)
where a1, ..., an are attributes of the relation R.
● Vertical fragments are determined by establishing the affinity of one attribute to another.
● For example:
S1 = ∏staffNo, position, sex, DOB, salary(Staff)
S2 = ∏staffNo, fName, lName, branchNo(Staff)
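A corresponding SQL sketch of the vertical fragments (again as views; attribute names follow the Staff example above):

-- Vertical fragment S1: the primary key plus payroll-related attributes
CREATE VIEW S1 AS
SELECT staffNo, position, sex, DOB, salary FROM Staff;

-- Vertical fragment S2: the primary key plus name and branch attributes
CREATE VIEW S2 AS
SELECT staffNo, fName, lName, branchNo FROM Staff;

-- Reconstruction: a join on the shared primary key staffNo
SELECT s1.staffNo, s1.position, s1.sex, s1.DOB, s1.salary,
       s2.fName, s2.lName, s2.branchNo
FROM S1 s1 JOIN S2 s2 ON s1.staffNo = s2.staffNo;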
● This produces two fragments (S1 and S2).
● Both fragments contain the primary key, staffNo, to enable the original relation to be reconstructed.
Advantages of Vertical Fragmentation:
● The fragments can be stored at the sites that need them.
● The performance is improved as the fragment is smaller than the original base relation.
This fragmentation schema satisfies the correctness rules:
o Completeness
▪ Each attribute in the Staff relation appears in either fragment S1 or S2.
o Reconstruction
▪ The Staff relation can be reconstructed from the fragments using the Natural join operation, thus: S1 ⋈ S2 = Staff
o Disjointness
▪ The fragments are disjoint except for the primary key, which is necessary for reconstruction.
● Vertical fragments are determined by establishing the affinity of one attribute to another.
● One way to do this is to create a matrix that shows the number of accesses that refer to each attribute pair.
● For example, a transaction that accesses attributes a1, a2, and a4 of relation R with attributes (a1, a2, a3, a4) can be represented by the following matrix:

        a1   a2   a3   a4
   a1         1    0    1
   a2              0    1
   a3                   0
   a4
● The matrix is triangular; the diagonal does not need to be filled in as the lower half is a mirror image of the upper half.
● The 1s represent an access involving the corresponding attribute pair, and are eventually replaced by numbers representing the transaction frequency.
● A matrix is produced for each transaction and an overall matrix is produced showing the sum of all accesses for each attribute pair.
● Pairs with high affinity should appear in the same vertical fragment; pairs with low affinity may be separated.
● Working with single attributes and all major transactions may be a lengthy calculation.
● Therefore, if it is known that some attributes are related, it may be prudent to work with groups of attributes instead.
● This approach is known as splitting.
o It produces a set of non-overlapping fragments, which ensures compliance with the disjointness rule.
o The non-overlapping characteristic applies only to attributes that are not part of the primary key.
o Primary key fields appear in every fragment and so can be omitted from the analysis.

Mixed Fragmentation
o For some applications horizontal or vertical fragmentation of a database schema by itself is insufficient to adequately distribute the data.
o Instead, mixed or hybrid fragmentation is required.
Mixed fragment:
o Consists of a horizontal fragment that is subsequently vertically fragmented, or a vertical fragment that is then horizontally fragmented.
● A mixed fragment is defined using the Selection and Projection operations of the relational algebra.
● Given a relation R, a mixed fragment is defined as:
σp(∏a1, ..., an(R))
or
∏a1, ..., an(σp(R))
where p is a predicate based on one or more attributes of R and a1, ..., an are attributes of R.
Example - Mixed Fragmentation
S1 = ∏staffNo, position, sex, DOB, salary(Staff)
S2 = ∏staffNo, fName, lName, branchNo(Staff)
S21 = σ branchNo=‘B003’(S2)
S22 = σ branchNo=‘B005’(S2)
S23 = σ branchNo=‘B007’(S2)

● This produces three fragments (S21, S22, and S23), one consisting
of those tuples where the branch number is B003 (S21), one
consisting of those tuples where the branch number is B005 (S22),
and the other consisting of those tuples where the branch number is
B007 (S23)
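In SQL, the mixed fragments can be sketched by applying horizontal restrictions on top of the vertical fragment S2 (view definitions as in the vertical fragmentation sketch above):

-- Mixed fragments: horizontal restrictions of the vertical fragment S2
CREATE VIEW S21 AS SELECT * FROM S2 WHERE branchNo = 'B003';
CREATE VIEW S22 AS SELECT * FROM S2 WHERE branchNo = 'B005';
CREATE VIEW S23 AS SELECT * FROM S2 WHERE branchNo = 'B007';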

● The fragmentation schema satisfies the correctness rules:
o Completeness
▪ Each attribute in the Staff relation appears in either fragment S1 or S2; each (part) tuple appears in fragment S1 and either fragment S21, S22, or S23.
o Reconstruction
▪ The Staff relation can be reconstructed from the fragments using the Union and Natural join operations, thus:
Staff = S1 ⋈ (S21 ∪ S22 ∪ S23)
o Disjointness
▪ The fragments are disjoint; there can be no staff member who works in more than one branch, and S1 and S2 are disjoint except for the necessary duplication of the primary key.

Derived Horizontal Fragmentation
● Some applications may involve a join of two or more relations.
● If the relations are stored at different locations, there may be a significant overhead in processing the join.
● To avoid this overhead, it may be more appropriate to ensure that the relations, or fragments of relations, are at the same location. This can be achieved using derived horizontal fragmentation.
Derived fragment:
● A horizontal fragment that is based on the horizontal fragmentation of a parent relation.
● We use the term child to refer to the relation that contains the foreign key and parent to the relation containing the targeted primary key.
● Derived fragmentation is defined using the Semijoin operation of the relational algebra.
● Given a child relation R and parent S, the derived fragmentation of R is defined as:
Ri = R ⋉f Si, 1 ≤ i ≤ w
▪ where w is the number of horizontal fragments defined on S and f is the join attribute.
Example - Derived Horizontal Fragmentation
● We may have an application that joins the Staff and PropertyForRent relations together. For this example, we assume that Staff is horizontally fragmented according to the branch number, so that data relating to the branch is stored locally:
o S3 = σ branchNo=‘B003’(Staff)
o S4 = σ branchNo=‘B005’(Staff)
o S5 = σ branchNo=‘B007’(Staff)
● We also assume that property PG4 is currently managed by SG14.
● It would be useful to store property data using the same fragmentation strategy.
● This is achieved using derived fragmentation to horizontally fragment the PropertyForRent relation according to branch number:
o P3 = PropertyForRent ⋉staffNo S3
o P4 = PropertyForRent ⋉staffNo S4
o P5 = PropertyForRent ⋉staffNo S5
● This produces three fragments (P3, P4, and P5), one consisting of those properties managed by staff at branch number B003 (P3), one consisting of those properties managed by staff at branch B005 (P4), and the other consisting of those properties managed by staff at branch B007 (P5).
● This fragmentation schema satisfies the correctness rules.
● If a relation contains more than one foreign key, it will be necessary to select one of the referenced relations as the parent.
● The choice can be based on the fragmentation used most frequently or the fragmentation with better join characteristics, that is, the join involving smaller fragments or the join that can be performed in parallel to a greater degree.
No fragmentation
● A final strategy is not to fragment a relation.
● For example, the Branch relation contains only a small number of tuples and is not updated very frequently.
● Rather than trying to horizontally fragment the relation on, for example, branch number, it would be more sensible to leave the relation whole and simply replicate the Branch relation at each site.
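In SQL, a semijoin can be written with EXISTS, so the derived fragments can be sketched as views (fragment names as above; PropertyForRent is assumed to carry a staffNo foreign key, as in the running example):

-- Derived horizontal fragment P3: properties managed by staff in fragment S3
CREATE VIEW P3 AS
SELECT p.*
FROM PropertyForRent p
WHERE EXISTS (SELECT 1 FROM S3 s WHERE s.staffNo = p.staffNo);

-- P4 and P5 are defined analogously over S4 and S5.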



TOPIC 7: DISTRIBUTED QUERY PROCESSING


● Distributed query optimization is more complex due to the distribution of the data.
● Figure shows how the distributed query is processed and optimized as a number of separate layers consisting of: query decomposition, data localization, global optimization, and local optimization.
Query decomposition
● This layer takes a query expressed on the global relations and performs a partial optimization.
● The output is some form of relational algebra tree based on global relations.
Data localization
● This layer takes into account how the data has been distributed.
● A further iteration of optimization is performed by replacing the global relations at the leaves of the relational algebra tree with their reconstruction algorithms (sometimes called data localization programs), that is, the relational algebra operations that reconstruct the global relations from the constituent fragments.
Global optimization
● This layer takes account of statistical information to find a near-optimal execution plan.
● The output from this layer is an execution strategy based on fragments with communication primitives added to send parts of the query to the local DBMSs to be executed there and to receive the results.
Local optimization
● Whereas the first three layers are run at the control site (typically the site that launched the query), this particular layer is run at each of the local sites involved in the query.
● Each local DBMS will perform its own local optimization.
Data Localization
● The objective of this layer is to take a query expressed as some form of relational algebra tree and to take account of data distribution to perform further optimization using heuristic rules.
● To do this, we replace the global relations at the leaves of the tree with their reconstruction algorithms, that is, the relational algebra operations that reconstruct the global relations from the constituent fragments.
● For horizontal fragmentation, the reconstruction algorithm is the Union operation.
● For vertical fragmentation, it is the Join operation.
● The relational algebra tree formed by applying the reconstruction algorithms is sometimes known as the generic relational algebra tree.
● We use reduction techniques to generate a simpler and optimized query.
● The particular reduction technique we employ is dependent on the type of fragmentation involved.
● We consider reduction techniques for the following types of fragmentation:
   o primary horizontal fragmentation;
   o vertical fragmentation;
   o derived horizontal fragmentation.
Reduction for Primary Horizontal Fragmentation
● If the selection predicate contradicts the definition of the fragment, this produces an empty intermediate relation and the operations can be eliminated.
● For a join, commute the join with the union.
● Then examine each individual join to determine whether there are any useless joins that can be eliminated from the result.
● A useless join exists if the fragment predicates do not overlap.
Example Reduction for Primary Horizontal Fragmentation
SELECT *
FROM Branch b, PropertyForRent p
WHERE b.branchNo = p.branchNo AND p.type = ‘Flat’;
P1: σbranchNo=‘B003’ ∧ type=‘House’(PropertyForRent)
P2: σbranchNo=‘B003’ ∧ type=‘Flat’(PropertyForRent)
P3: σbranchNo!=‘B003’(PropertyForRent)
B1: σbranchNo=‘B003’(Branch)
B2: σbranchNo!=‘B003’(Branch)
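● In this example the selection type = ‘Flat’ contradicts P1 (defined with type = ‘House’), so P1 is eliminated; after commuting the join with the union, only B1 ⋈ P2 and B2 ⋈ σtype=‘Flat’(P3) remain, because the branchNo predicates of the other fragment pairs do not overlap. A small Python sketch of this pruning (hypothetical helper names; each fragment is modelled by its defining predicates):

# Each fragment is described by its defining predicates; "!" marks inequality.
fragments_p = {
    "P1": {"branchNo": "B003", "type": "House"},
    "P2": {"branchNo": "B003", "type": "Flat"},
    "P3": {"branchNo": "!B003", "type": None},   # type unconstrained
}
fragments_b = {"B1": {"branchNo": "B003"}, "B2": {"branchNo": "!B003"}}

def overlap(v1, v2):
    # do two (in)equality predicates on the same attribute overlap?
    if v1 is None or v2 is None:
        return True                    # unconstrained attribute
    neg1, neg2 = v1.startswith("!"), v2.startswith("!")
    a, b = v1.lstrip("!"), v2.lstrip("!")
    if not neg1 and not neg2:
        return a == b
    if neg1 and neg2:
        return True                    # two exclusions can overlap
    return a != b                      # '= a' overlaps '!= b' unless a == b

# Selection p.type = 'Flat' eliminates fragments whose type predicate contradicts it
selected = {n: p for n, p in fragments_p.items() if overlap(p["type"], "Flat")}

# After commuting join with union, keep only joins whose branchNo predicates overlap
useful = [(bn, pn) for bn, bp in fragments_b.items()
          for pn, pp in selected.items()
          if overlap(bp["branchNo"], pp["branchNo"])]
print(useful)   # [('B1', 'P2'), ('B2', 'P3')]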

Reduction for Vertical Fragmentation
● Reduction for vertical fragmentation involves removing those vertical fragments that have no attributes in common with the projection attributes, except the key of the relation.
Example Reduction for Vertical Fragmentation
SELECT fName, lName
FROM Staff;
S1: ΠstaffNo, position, sex, DOB, salary(Staff)
S2: ΠstaffNo, fName, lName, branchNo(Staff)
Reduction for Derived Fragmentation
● Use the transformation rule that allows the join and union to be commuted.
● Use the knowledge that the fragmentation for one relation is based on the other; in commuting, some of the partial joins should be redundant.
Example Reduction for Derived Fragmentation
SELECT *
FROM Branch b, Client c
WHERE b.branchNo = c.branchNo AND b.branchNo = ‘B003’;
B1 = σbranchNo=‘B003’(Branch)
B2 = σbranchNo!=‘B003’(Branch)
Distributed Joins
● The Join is one of the most expensive relational algebra operations.
One approach used in distributed query optimization is to replace
Joins by combinations of Semijoins.
● The Semijoin operation has the important property of reducing the
size of the operand relation.
● When the main cost component is communication time, the
Semijoin operation is particularly useful for improving the
processing of distributed joins by reducing the amount of data
transferred between sites.

Example:
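● As an illustration of the Semijoin strategy (a sketch only, with hypothetical names; R1 is assumed to reside at site 1 and R2 at site 2, joined on attribute a):

def semijoin_strategy(r1_site1, r2_site2, attr):
    # Step 1 (site 1 -> site 2): transfer only the projection of R1 on the
    # join attribute, not the whole relation.
    proj = {t[attr] for t in r1_site1}
    # Step 2 (at site 2): reduce R2 to the tuples that will participate
    # in the join (R2 SEMIJOIN R1).
    r2_reduced = [t for t in r2_site2 if t[attr] in proj]
    # Step 3 (site 2 -> site 1): transfer the reduced relation and complete
    # the join locally at site 1.
    return [dict(t1, **t2) for t1 in r1_site1 for t2 in r2_reduced
            if t1[attr] == t2[attr]]

r1 = [{"a": 1, "x": "p"}, {"a": 2, "x": "q"}]
r2 = [{"a": 2, "y": "u"}, {"a": 3, "y": "v"}]
print(semijoin_strategy(r1, r2, "a"))   # [{'a': 2, 'x': 'q', 'y': 'u'}]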
● The use of Semijoins is beneficial if there are only a few tuples of R1 that participate in the join of R1 and R2.
● The join approach is better if most tuples of R1 participate in the join, because the Semijoin approach requires an additional transfer of a projection on the join attribute.
Global Optimization
● The objective of this layer is to take the reduced query plan from the data localization layer and find a near-optimal execution strategy.
● In a distributed environment, the speed of the network has to be considered when comparing strategies.
● If the known topology is that of a WAN, all costs other than network costs could be ignored.
● A LAN is typically much faster than a WAN, but still slower than disk access.
● The cost model could be based on total cost (time), as in a centralized DBMS, or on response time. The latter uses the parallelism inherent in a DDBMS.
● The distributed query optimization algorithms:
   o R* algorithm;
   o SDD-1 algorithm;
   o AHY;
   o Distributed Ingres.
Global Optimization – R*
● R* uses a cost model based on total cost and static query optimization.
● Like the centralized System R optimizer, the algorithm is based on an exhaustive search of all join orderings, join methods (nested loop or sort-merge join), and various access paths for each relation.
● When a Join is required involving relations at different sites, R* selects the sites to perform the Join and the method of transferring data between sites.
● For a Join of R and S with R at site 1 and S at site 2, there are three candidate sites:
   o site 1, where R is located;
   o site 2, where S is located;
   o some other site (e.g., the site of relation T, which is to be joined with the join of R and S).
● In R*, there are two methods for transferring data:
   o Ship the whole relation;
   o Fetch tuples as needed.
● The first method incurs a larger data transfer but fewer messages than the second.
● R* considers only the following methods:
o Nested loop, ship whole outer relation to site of inner.
o Sort-merge, ship whole inner relation to site of outer.
o Nested loop, fetch tuples of inner relation as needed for
each tuple of outer relation.
o Sort-merge, fetch tuples of inner relation as needed for
each tuple of outer relation.
o Ship both relations to third site.
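● The trade-off between the two transfer methods can be sketched with a simplified, assumed cost model (a fixed message set-up cost plus a per-tuple transfer cost; the figures are illustrative, not R*'s actual cost formulas):

MSG_COST, TUPLE_COST = 1.0, 0.01

def ship_whole(card_inner):
    # one message carrying the entire inner relation
    return MSG_COST + card_inner * TUPLE_COST

def fetch_as_needed(card_outer, matches_per_tuple):
    # one request/response exchange per outer tuple
    return card_outer * (2 * MSG_COST + matches_per_tuple * TUPLE_COST)

print(ship_whole(10_000))        # 101.0 - large transfer, few messages
print(fetch_as_needed(100, 5))   # 205.0 - small transfer, many messages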
Global Optimization – SDD-1
● Based on an earlier method known as “hill climbing”, a greedy
algorithm that starts with an initial feasible solution which is then
iteratively improved.
● Modified to make use of Semijoin to reduce cardinality of join
operands.
● Like R*, SDD-1 optimizer minimizes total cost, although unlike
R* it ignores local processing costs and concentrates on
communication message size.
● Like R*, query processing timing used is static.
● Based on concept of “beneficial Semijoins”.
● Communication cost of Semijoin is simply cost of transferring join
attribute of first operand to site of second operand.
● “Benefit” of Semijoin is taken as cost of transferring irrelevant
tuples of first operand, which Semijoin avoids.
SDD-1 Algorithm proceeds as follows:
● Phase 1 – Initialization: Perform all local reductions using
Selection and Projection. Execute Semijoins within same site
to reduce sizes of relations. Generate set of all beneficial
Semijoins across sites (Semijoin is beneficial if its cost is less
than its benefit).
● Phase 2 – Selection of beneficial Semijoins: Iteratively select
most beneficial Semijoin from set generated and add it to
execution strategy. After each iteration, update database
statistics to reflect incorporation of the Semijoin and update
the set with new beneficial Semijoins.
● Phase 3 – Assembly site selection: Select, among all sites, site
to which transmission of all relations incurs a minimum cost.
Choose site containing largest amount of data after reduction
phase so that sum of the amount of data transferred from other
sites will be minimum.
● Phase 4 – Postoptimization: Discard useless Semijoins; e.g. if
R resides in assembly site and R is due to be reduced by
Semijoin, but is not used to reduce other relations after
Semijoin, then since R need not be moved to another site
during assembly phase, Semijoin on R is useless and can be
discarded.
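● A toy Python sketch of the greedy selection in Phase 2 (the costs and benefits are invented for illustration; SDD-1 proper would update the database statistics and recompute the candidate set after each choice):

semijoins = [
    # (name, cost = transferring the join-attribute projection,
    #  benefit = transferring the irrelevant tuples the Semijoin avoids)
    ("SJ1", 100, 400),
    ("SJ2", 300, 250),   # not beneficial: cost exceeds benefit
    ("SJ3", 50, 120),
]

strategy = []
candidates = [sj for sj in semijoins if sj[2] > sj[1]]    # beneficial only
while candidates:
    best = max(candidates, key=lambda sj: sj[2] - sj[1])  # most beneficial
    strategy.append(best[0])
    candidates.remove(best)
print(strategy)   # ['SJ1', 'SJ3']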
Topic: Distributed Transaction Processing
● The objectives of distributed transaction processing are the same as those of centralized systems, although more complex because the DDBMS must also ensure the atomicity of the global transaction and each component subtransaction.
● The transaction manager coordinates transactions on behalf of application programs, communicating with the scheduler.
● The scheduler is the module responsible for implementing a particular strategy for concurrency control. The objective of the scheduler is to maximize concurrency without allowing concurrently executing transactions to interfere with one another and thereby compromise the consistency of the database.
● In the event of a failure occurring during the transaction, the recovery manager ensures that the database is restored to the state it was in before the start of the transaction, and therefore a consistent state. The recovery manager is also responsible for restoring the database to a consistent state following a system failure.
● The buffer manager is responsible for the efficient transfer of data between disk storage and main memory.
● In a distributed DBMS, these modules still exist in each local DBMS.
● In addition, there is also a global transaction manager or transaction coordinator at each site to coordinate the execution of both the global and local transactions initiated at that site.
● Inter-site communication is still through the data communications component (transaction managers at different sites do not communicate directly with each other).
PROCEDURE TO EXECUTE A GLOBAL TRANSACTION INITIATED AT SITE S1:
(i) The transaction coordinator (TC1) at site S1 divides the transaction into a number of subtransactions using information held in the global system catalog.
(ii) The data communications component at site S1 sends the subtransactions to the appropriate sites, S2 and S3, say.
(iii) The transaction coordinators at sites S2 and S3 manage these subtransactions.
(iv) The results of subtransactions are communicated back to TC1 via the data communications components.
Transaction Processing (Referred from Silberschatz)
Distributed Transactions
⮚ Access to the various data items in a distributed system is usually accomplished through transactions, which must preserve the ACID properties.
⮚ There are two types of transaction that we need to consider.
⮚ The local transactions are those that access and update data in only one local database.
⮚ The global transactions are those that access and update data in several local databases.
⮚ Ensuring the ACID properties of the local transactions can be done easily.
⮚ For global transactions, this task is much more complicated, since several sites may be participating in their execution; the failure of one of these sites, or the failure of a communication link connecting these sites, may result in erroneous computations.
SYSTEM STRUCTURE:
Local Transaction Manager:
⮚ Each site has its own local transaction manager.
⮚ Its function is to ensure the ACID properties of those transactions that execute at that site.
⮚ The various transaction managers cooperate to execute global transactions.
⮚ Consider an abstract model of a transaction system, in which each site contains two subsystems:
   o The transaction manager:
      ▪ It manages the execution of those transactions (or subtransactions) that access data stored in a local site.
      ▪ Each such transaction may be either a local transaction (that is, a transaction that executes at only that site) or
      ▪ part of a global transaction (that is, a transaction that executes at several sites).
   o The transaction coordinator coordinates the execution of the various transactions (both local and global) initiated at that site.
⮚ Each transaction manager is responsible for:
   • Maintaining a log for recovery purposes
   • Participating in an appropriate concurrency-control scheme to coordinate the concurrent execution of the transactions executing at that site
⮚ A transaction coordinator, as its name implies, is responsible for coordinating the execution of all the transactions initiated at that site.
⮚ For each such transaction, the coordinator is responsible for:
   • Starting the execution of the transaction
   • Breaking the transaction into a number of subtransactions and distributing these subtransactions to the appropriate sites for execution
   • Coordinating the termination of the transaction, which may result in the transaction being committed at all sites or aborted at all sites
System Failure Modes
● A distributed system may suffer from the same types of failure that a centralized system can.
● There are additional types of failure with which we need to deal in a distributed environment.
● The basic failure types are:
   • Failure of a site
   • Loss of messages
   • Failure of a communication link
   • Network partition
● The loss or corruption of messages is always a possibility in a distributed system.
● The system uses transmission-control protocols, such as TCP/IP, to handle such errors.
● If two sites A and B are not directly connected, messages from one to the other must be routed through a sequence of communication links.
● If a communication link fails, messages that would have been transmitted across the link must be rerouted.
● It is possible to find another route through the network, so that the
messages are able to reach their destination.
● In other cases, a failure may result in there being no connection
between some pairs of sites.
● A system is partitioned if it has been split into two (or more)
subsystems, called partitions that lack any connection between
them.

Topic: Distributed Concurrency Control


Objectives
Distributed Serializability
Locking Protocols
🡪Centralized 2PL
🡪Primary Copy 2PL
🡪Distributed 2PL
🡪Majority Locking
🡪Timestamp Protocols

Objectives
All concurrency control mechanisms must ensure that:
(I) the consistency of data items is preserved, and
(II) each atomic action is completed in a finite time.
A good concurrency control mechanism for distributed DBMSs should:
● be resilient to site and communication failure;
● permit parallelism to satisfy performance requirements;
● incur modest computational and storage overhead;
● perform satisfactorily in a network environment that has significant communication delay;
● place few constraints on the structure of atomic actions.
The problems in a distributed environment when multiple users access data concurrently are:
(i) problems of lost update,
(ii) uncommitted dependency,
(iii) inconsistent analysis, and
(iv) the multiple-copy consistency problem.
● The multiple-copy consistency problem occurs when a data item is replicated in different locations.
● To maintain consistency of the global database, when a replicated data item is updated at one site all other copies of the data item must also be updated.
● If a copy is not updated, the database becomes inconsistent.
Distributed Serializability:
● If the schedule of transaction execution at each site is serializable, then the global schedule (the union of all local schedules) is also serializable provided local serialization orders are identical.
● This requires that all subtransactions appear in the same order in the equivalent serial schedule at all sites.
● The solutions to concurrency control in a distributed environment are based on the two main approaches of locking and timestamping.
Given a set of transactions to be executed concurrently, then:
● locking guarantees that the concurrent execution is equivalent to some (unpredictable) serial execution of those transactions;
● timestamping guarantees that the concurrent execution is equivalent to a specific serial execution of those transactions, corresponding to the order of the timestamps.
Locking Protocols
The protocols based on two-phase locking (2PL) that can be employed to ensure serializability for distributed DBMSs are:
(i) centralized 2PL,
(ii) primary copy 2PL,
(iii) distributed 2PL,
(iv) and majority locking.
Centralized 2PL
⮚ With the centralized 2PL protocol there is a single site that maintains all locking information.
⮚ There is only one scheduler, or lock manager, for the whole of the distributed DBMS that can grant and release locks.
⮚ The centralized 2PL protocol for a global transaction initiated at site S1 works as follows:
(1) The transaction coordinator at site S1 divides the transaction into a number of subtransactions, using information held in the global system catalog.
   o The coordinator has responsibility for ensuring that consistency is maintained.
   o If the transaction involves an update of a data item that is replicated, the coordinator must ensure that all copies of the data item are updated.
   o The coordinator requests exclusive locks on all copies before updating each copy and releasing the locks.
   o The coordinator can elect to use any copy of the data item for reads, generally the copy at its site, if one exists.
(2) The local transaction managers involved in the global transaction request and release locks from the centralized lock manager using the normal rules for two-phase locking.
   The centralized lock manager checks that a request for a lock on a data item is compatible with the locks that currently exist. If it is, the lock manager sends a message back to the originating site acknowledging that the lock has been granted. Otherwise, it puts the request in a queue until the lock can be granted.
● A variation of this scheme is for the transaction coordinator to make all locking requests on behalf of the local transaction managers.
● The lock manager then interacts only with the transaction coordinator and not with the individual local transaction managers.
Advantage:
● Implementation is relatively straightforward.
● Deadlock detection is no more difficult than that of a centralized DBMS, because one lock manager maintains all lock information.
● Communication costs are relatively low.
   o For example, a global update operation that has agents (subtransactions) at n sites may require a minimum of 2n + 3 messages with a centralized lock manager:
      o 1 lock request;
      o 1 lock grant message;
      o n update messages;
      o n acknowledgements;
      o 1 unlock request.
Disadvantage:
● The disadvantages with centralization in a distributed DBMS are bottlenecks and lower reliability.
● As all lock requests go to one central site, that site may become a bottleneck.
● The system may also be less reliable since the failure of the central site would cause major system failures.
Primary copy 2PL
● This protocol attempts to overcome the disadvantages of centralized 2PL by distributing the lock managers to a number of sites.
● Each lock manager is then responsible for managing the locks for a set of data items.
● For each replicated data item, one copy is chosen as the primary copy; the other copies are called slave copies.
● The choice of which site to choose as the primary site is flexible, and the site that is chosen to manage the locks for a primary copy need not hold the primary copy of that item.
● The protocol is a straightforward extension of centralized 2PL.
● The main difference is that when an item is to be updated, the transaction coordinator must determine where the primary copy is, in order to send the lock requests to the appropriate lock manager.
● It is only necessary to exclusively lock the primary copy of the data item that is to be updated.
● Once the primary copy has been updated, the change can be propagated to the slave copies.
● This propagation should be carried out as soon as possible to prevent other transactions’ reading out-of-date values.
● It is not strictly necessary to carry out the updates as an atomic operation.
● This protocol guarantees only that the primary copy is current.
● Use: This approach can be used when data is selectively replicated, updates are infrequent, and sites do not always need the very latest version of data.
● The disadvantages of this approach are that deadlock handling is more complex owing to multiple lock managers, and that there is still a degree of centralization in the system: lock requests for a specific primary copy can be handled only by one site.
● This latter disadvantage can be partially overcome by nominating backup sites to hold locking information.
● Advantage: This approach has lower communication costs and better performance than centralized 2PL since there is less remote locking.
Distributed 2PL
● This protocol again attempts to overcome the disadvantages of centralized 2PL, this time by distributing the lock managers to every site.
● Each lock manager is then responsible for managing the locks for the data at that site.
● If the data is not replicated, this protocol is equivalent to primary copy 2PL. Otherwise, distributed 2PL implements a Read-One-Write-All (ROWA) replica control protocol.
● This means that any copy of a replicated item can be used for a read operation, but all copies must be exclusively locked before an item can be updated.
● This scheme deals with locks in a decentralized manner, thus avoiding the drawbacks of centralized control.
● The disadvantages of this approach are that deadlock handling is more complex owing to multiple lock managers and that communication costs are higher than primary copy 2PL, as all items must be locked before update.
● A global update operation that has agents at n sites may require a minimum of 5n messages with this protocol:
   o n lock request messages;
   o n lock grant messages;
   o n update messages;
   o n acknowledgements;
   o n unlock requests.
● This could be reduced to 4n messages if the unlock requests are omitted and handled by the final commit operation.
● Distributed 2PL is used in System R*.
Majority locking
● This protocol is an extension of distributed 2PL to overcome having to lock all copies of a replicated item before an update.
● The system maintains a lock manager at each site to manage the locks for all data at that site.
● When a transaction wishes to read or write a data item that is replicated at n sites, it must send a lock request to more than half of the n sites where the item is stored.
● The transaction cannot proceed until it obtains locks on a majority of the copies.
● If the transaction does not receive a majority within a certain timeout period, it cancels its request and informs all sites of the cancellation.
● If it receives a majority, it informs all sites that it has the lock.
● Any number of transactions can simultaneously hold a shared lock on a majority of the copies; however, only one transaction can hold an exclusive lock on a majority of the copies.
● Advantage: This scheme avoids the drawbacks of centralized control.
● The disadvantages are that the protocol is more complicated, deadlock detection is more complex, and locking requires at least [(n + 1)/2] messages for lock requests and [(n + 1)/2] messages
for unlock requests. This technique works but is overly strong in the case of shared locks: correctness requires only that a single copy of a data item be locked, namely the item that is read, but this technique requests locks on a majority of copies.
Timestamp Protocols
● The objective of timestamping is to order transactions globally in such a way that older transactions (transactions with smaller timestamps) get priority in the event of conflict.
● In a distributed environment, we still need to generate unique timestamps both locally and globally.
● Using the system clock or an incremental event counter at each site would be unsuitable.
● Clocks at different sites would not be synchronized; equally well, if an event counter were used, it would be possible for different sites to generate the same value for the counter.
● The general approach in distributed DBMSs is to use the concatenation of the local timestamp with a unique site identifier, <local timestamp, site identifier>.
● The site identifier is placed in the least significant position to ensure that events can be ordered according to their occurrence as opposed to their location.
● To prevent a busy site generating larger timestamps than slower sites, sites synchronize their timestamps.
● Each site includes its timestamp in inter-site messages.
● On receiving a message, a site compares its timestamp with the timestamp in the message and, if its timestamp is smaller, sets it to some value greater than the message timestamp.
● For example, if site 1 with current timestamp <10, 1> sends a message to site 2 with current timestamp <15, 2>, then site 2 would not change its timestamp.
● On the other hand, if the current timestamp at site 2 is <5, 2> then site 2 would change its timestamp to <11, 2>.
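● A short Python sketch of this scheme (a simplification in the spirit of Lamport clocks; the pair <counter, site id> is compared with the site identifier in the least significant position):

class SiteClock:
    def __init__(self, site_id, counter=0):
        self.site_id, self.counter = site_id, counter

    def timestamp(self):
        # generate a unique global timestamp <local counter, site id>
        self.counter += 1
        return (self.counter, self.site_id)

    def on_message(self, msg_counter):
        # synchronize: move past the sender's counter if it is ahead
        if self.counter < msg_counter:
            self.counter = msg_counter + 1

site1 = SiteClock(1, counter=10)
site2 = SiteClock(2, counter=5)
site2.on_message(site1.counter)          # message carries timestamp <10, 1>
print((site2.counter, site2.site_id))    # (11, 2), as in the example above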
TOPIC – I RECOVERY COMMIT PROTOCOLS
Failure in a Distributed Environment
How Failures Affect Recovery
🡪Distributed Recovery Protocols
Two Phase Commit
🡪Phase I
🡪Phase II
🡪Termination Protocol for 2PC
🡺 Coordinator
🡺 Participant
Recovery Protocols for 2PC
Coordinator Failure
Participant Failure
Election Protocols
Communication Topologies for 2PC
Three Phase Commit
Termination Protocol for 3PC
🡺 Coordinator
🡺 Participant
Recovery Protocols for 3PC
Termination Protocol following the election of new coordinator
Network Partitioning
Identifying Updates
Maintaining Integrity
Pessimistic Protocol
Optimistic Protocol

Failures in a Distributed Environment
● There are four types of failure that are particular to distributed DBMSs:
   o the loss of a message;
   o the failure of a communication link;
   o the failure of a site;
   o network partitioning.
● The loss of messages, or improperly ordered messages, is the responsibility of the underlying computer network protocol.
● A DDBMS is highly dependent on the ability of all sites in the network to communicate reliably with one another.
● Although network technology has improved significantly and current networks are much more reliable, communication failures can still occur.
● Communication failures can result in the network becoming split into two or more partitions, where sites within the same partition
can communicate with one another, but not with sites in other partitions.
● In some cases it is difficult to distinguish whether a communication link or a site has failed.
● For example, suppose that site S1 cannot communicate with site S2
within a fixed (timeout) period.
● It could be that:
o site S2 has crashed or the network has gone down;
o the communication link has failed;
o the network is partitioned;
o site S2 is currently very busy and has not had time to
respond to the message.
● Choosing the correct value for the timeout, which will allow S1 to
conclude that it cannot communicate with site S2, is difficult.

● Figure shows an example of network partitioning where, following the failure of the link connecting sites S1 → S2, sites (S1, S4, S5) are partitioned from sites (S2, S3).

How Failures Affect Recovery
● As with local recovery, distributed recovery aims to maintain the atomicity and durability of distributed transactions.
● To ensure the atomicity of the global transaction, the DDBMS must ensure that subtransactions of the global transaction either all commit or all abort.
● If the DDBMS detects that a site has failed or become inaccessible, it needs to carry out the following steps:
   o Abort any transactions that are affected by the failure.
   o Flag the site as failed, to prevent any other site from trying to use it.
   o Check periodically to see whether the site has recovered or, alternatively, wait for the failed site to broadcast that it has recovered.
   o On restart, the failed site must initiate a recovery procedure to abort any partial transactions that were active at the time of the failure.
   o After local recovery, the failed site must update its copy of the database to make it consistent with the rest of the system.
● If a network partition occurs as in the above example, the DDBMS must ensure that if agents of the same global transaction are active in different partitions, then it must not be possible for site S1, and other sites in the same partition, to decide to commit the global transaction, while site S2, and other sites in its partition, decide to abort it. This would violate global transaction atomicity.
Distributed recovery protocols
● Recovery in a DDBMS is complicated by the fact that atomicity is required for both the local subtransactions and for the global transactions.
● The recovery techniques guarantee the atomicity of subtransactions, but the DDBMS needs to ensure the atomicity of the global transaction.
● This involves modifying the commit and abort processing so that a global transaction does not commit or abort until all its subtransactions have successfully committed or aborted.
● In addition, the modified protocol should cater for both site and communication failures to ensure that the failure of one site does not affect processing at another site.
● Operational sites should not be left blocked.
● Protocols that obey this are referred to as non-blocking protocols.
● We consider two common commit protocols suitable for distributed DBMSs: two-phase commit (2PC) and three-phase commit (3PC), a non-blocking protocol.
● Every global transaction has one site that acts as coordinator (or transaction manager) for that transaction, which is generally the site at which the transaction was initiated.
● Sites at which the global transaction has agents are called participants (or resource managers).
● The coordinator knows the identity of all participants, and each participant knows the identity of the coordinator but not necessarily of the other participants.
Two-Phase Commit (2PC)
● 2PC operates in two phases: a voting phase and a decision phase.
● The basic idea is that the coordinator asks all participants whether they are prepared to commit the transaction.
● If one participant votes to abort, or fails to respond within a timeout period, then the coordinator instructs all participants to abort the transaction.
● If all vote to commit, then the coordinator instructs all participants to commit the transaction.
● The global decision must be adopted by all participants. If a participant votes to abort, then it is free to abort the transaction immediately; in fact, any site is free to abort a transaction at any time up until it votes to commit. This type of abort is known as a unilateral abort.
● If a participant votes to commit, then it must wait for the coordinator to broadcast either the global commit or global abort message.
● This protocol assumes that each site has its own local log, and can therefore rollback or commit the transaction reliably.
● Two-phase commit involves processes waiting for messages from other sites.
● To avoid processes being blocked unnecessarily, a system of timeouts is used.
● The procedure for the coordinator at commit is as follows:
Phase 1
   o Write a begin_commit record to the log file and force-write it to stable storage. Send a PREPARE message to all participants. Wait for participants to respond within a timeout period.
Phase 2
   o If a participant returns an ABORT vote, write an abort record to the log file and force-write it to stable storage. Send a GLOBAL_ABORT message to all participants. Wait for participants to acknowledge within a timeout period.
   o If a participant returns a READY_COMMIT vote, update the list of participants who have responded. If all participants have voted COMMIT, write a commit record to the log file and force-write it to stable storage. Send a GLOBAL_COMMIT message to all participants. Wait for participants to acknowledge within a timeout period.
   o Once all acknowledgements have been received, write an end_transaction message to the log file. If a site does not acknowledge, resend the global decision until an acknowledgement is received.
● The coordinator must wait until it has received the votes from all participants.
● If a site fails to vote, then the coordinator assumes a default vote of ABORT and broadcasts a GLOBAL_ABORT message to all participants.
● The procedure for a participant at commit is as follows:
   o When the participant receives a PREPARE message, then either:
      ▪ write a ready_commit record to the log file, force-write all log records for the transaction to stable storage, and send a READY_COMMIT message to the coordinator,
      or
      ▪ write an abort record to the log file, force-write it to stable storage, send an ABORT message to the coordinator, and unilaterally abort the transaction.
      Wait for the coordinator to respond within a timeout period.
   o If the participant receives a GLOBAL_ABORT message, write an abort record to the log file and force-write it to stable storage. Abort the transaction and, on completion, send an acknowledgement to the coordinator.
   o If the participant receives a GLOBAL_COMMIT message, write a commit record to the log file and force-write it to stable storage. Commit the transaction, releasing any locks it holds, and on completion send an acknowledgement to the coordinator.
● If a participant fails to receive a vote instruction from the coordinator, it simply times out and aborts.
● Therefore, a participant could already have aborted and performed local abort processing before voting.
● The processing for the case when participants vote COMMIT and ABORT is shown in Figure.
● The participant has to wait for either the GLOBAL_COMMIT or GLOBAL_ABORT instruction from the coordinator.
● If the participant fails to receive the instruction from the coordinator, or the coordinator fails to receive a response from a participant, then it assumes that the site has failed and a termination protocol must be invoked.
● Only operational sites follow the termination protocol; sites that
have failed follow the recovery protocol on restart.
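● The overall flow of the two phases can be sketched in Python as follows (a minimal simulation with hypothetical names; real 2PC exchanges messages between sites and force-writes each log record to stable storage):

class Participant:
    def __init__(self, vote):
        self.vote, self.log = vote, []
    def prepare(self):
        # vote READY_COMMIT (after logging ready_commit) or ABORT
        self.log.append("ready_commit" if self.vote == "READY_COMMIT" else "abort")
        return self.vote
    def global_commit(self):
        self.log.append("commit")
    def global_abort(self):
        if self.log[-1] != "abort":      # unilateral abort already logged
            self.log.append("abort")

def two_phase_commit(coordinator_log, participants):
    coordinator_log.append("begin_commit")            # phase 1: voting
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except TimeoutError:
            votes.append("ABORT")                     # no reply => default ABORT
    if all(v == "READY_COMMIT" for v in votes):       # phase 2: decision
        coordinator_log.append("commit")
        for p in participants:
            p.global_commit()
    else:
        coordinator_log.append("abort")
        for p in participants:
            p.global_abort()
    coordinator_log.append("end_transaction")

log = []
two_phase_commit(log, [Participant("READY_COMMIT"), Participant("ABORT")])
print(log)   # ['begin_commit', 'abort', 'end_transaction']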
Termination protocols for 2PC
● A termination protocol is invoked whenever a coordinator or
participant fails to receive an expected message and times out.
● The action to be taken depends on whether the coordinator or
participant has timed out and on when the timeout occurred.
Coordinator
● The coordinator can be in one of four states during the commit
process: INITIAL, WAITING, DECIDED, and COMPLETED, as
shown in the state transition diagram in Figure (a),

but can time out only in the middle two states. The actions to be taken
are as follows:
o Timeout in the WAITING state The coordinator is waiting
for all participants to acknowledge whether they wish to
commit or abort the transaction. In this case, the
coordinator cannot commit the transaction because it has
not received all votes. However, it can decide to globally
abort the transaction.
o Timeout in the DECIDED state The coordinator is
waiting for all participants to acknowledge whether they
have successfully aborted or committed the transaction. In
this case, the coordinator simply sends the global decision
again to sites that have not acknowledged.
Participant
● The simplest termination protocol is to leave the participant
process blocked until communication with the coordinator is
re-established.
● The participant can then be informed of the global decision and
resume processing accordingly.
● There are other actions that may be taken to improve
performance.
● A participant can be in one of four states during the commit
process: INITIAL, PREPARED, ABORTED, and COMMITTED,
as shown in Figure (b).
● However, a participant may time out only in the first two states as follows:
o Timeout in the INITIAL state The participant is waiting
for a PREPARE message from the coordinator, which
implies that the coordinator must have failed while in the
INITIAL state. In this case, the participant can
unilaterally abort the transaction. If it subsequently
receives a PREPARE message, it can either ignore it, in
which case the coordinator times out and aborts the global transaction, or it can send an ABORT message to the coordinator.
   o Timeout in the PREPARED state The participant is waiting for an instruction to globally commit or abort the transaction. The participant must have voted to commit the transaction, so it cannot change its vote and abort the transaction. Equally well, it cannot go ahead and commit the transaction, as the global decision may be to abort. Without further information, the participant is blocked. However, the participant could contact each of the other participants attempting to find one that knows the decision. This is known as the cooperative termination protocol. A straightforward way of telling the participants who the other participants are is for the coordinator to append a list of participants to the vote instruction.
● Although the cooperative termination protocol reduces the likelihood of blocking, blocking is still possible and the blocked process will just have to keep on trying to unblock as failures are repaired.
● If it is only the coordinator that has failed and all participants detect this as a result of executing the termination protocol, then they can elect a new coordinator and resolve the block.
Recovery protocols for 2PC
● We now consider the action to be taken by a failed site on recovery.
● The action on restart again depends on what stage the coordinator or participant had reached at the time of failure.
Coordinator failure
● We consider three different stages for failure of the coordinator:
   o Failure in INITIAL state The coordinator has not yet started the commit procedure. Recovery in this case starts the commit procedure.
   o Failure in WAITING state The coordinator has sent the PREPARE message and although it has not received all responses, it has not received an abort response. In this case, recovery restarts the commit procedure.
   o Failure in DECIDED state The coordinator has instructed the participants to globally abort or commit the transaction. On restart, if the coordinator has received all acknowledgements, it can complete successfully. Otherwise, it has to initiate the termination protocol.
Participant failure
● The objective of the recovery protocol for a participant is to ensure that a participant process on restart performs the same action as all other participants, and that this restart can be performed independently (that is, without the need to consult either the coordinator or the other participants).
● We consider three different stages for failure of a participant:
   o Failure in INITIAL state The participant has not yet voted on the transaction. Therefore, on recovery it can unilaterally abort the transaction, as it would have been impossible for the coordinator to have reached a global commit decision without this participant’s vote.
   o Failure in PREPARED state The participant has sent its vote to the coordinator. In this case, recovery is via the termination protocol.
   o Failure in ABORTED/COMMITTED states The participant has completed the transaction. Therefore, on restart, no further action is necessary.
Election protocols
● If the participants detect the failure of the coordinator (by timing out) they can elect a new site to act as coordinator.
● One election protocol is for the sites to have an agreed linear ordering.
● We assume that site Si has order i in the sequence, the lowest being the coordinator, and that each site knows the identification and ordering of the other sites in the system, some of which may also have failed.
● One election protocol asks each operational participant to send a message to the sites with a greater identification number.
● Thus, site Si would send a message to sites Si+1, Si+2, . . . , Sn in that order.
● If a site Sk receives a message from a lower-numbered participant, then Sk knows that it is not to be the new coordinator and stops sending messages.
● This protocol is relatively efficient and most participants stop sending messages quite quickly.
● Eventually, each participant will know whether there is an operational participant with a lower number.
● If there is not, the site becomes the new coordinator.
● If the newly elected coordinator also times out during this process, the election protocol is invoked again.
● After a failed site recovers, it immediately starts the election protocol.
● If there are no operational sites with a lower number, the site forces all higher-numbered sites to let it become the new coordinator, regardless of whether there is a new coordinator or not.
Communication topologies for 2PC
● There are several different ways of exchanging messages, or communication topologies, that can be employed to implement 2PC:
   o Centralized 2PC
      ▪ In centralized 2PC all communication is funneled through the coordinator, as shown in Figure (a).

● A number of improvements to the centralized 2PC protocol have been proposed that attempt to improve its overall performance, either by reducing the number of messages that need to be exchanged, or by speeding up the decision-making process.
● These improvements depend upon adopting different ways of exchanging messages.
Linear 2PC
● In linear 2PC, participants can communicate with each other, as shown in Figure.
● In linear 2PC, sites are ordered 1, 2, . . . , n, where site 1 is the coordinator and the remaining sites are the participants. The 2PC protocol is implemented by a forward chain of communication from coordinator to participant n for the voting phase and a backward chain of communication from participant n to the coordinator for the decision phase.
● In the voting phase, the coordinator passes the vote instruction to site 2, which votes and then passes its vote to site 3.
● Site 3 then combines its vote with that of site 2 and transmits the combined vote to site 4, and so on.
● When the nth participant adds its vote, the global decision is obtained and this is passed backwards to participants n − 1, n − 2, etc. and eventually back to the coordinator.
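● A one-function sketch of this forward voting chain (illustrative; each list element stands for the vote contributed by the next site in the chain):

def linear_2pc(votes):
    combined = "COMMIT"
    for v in votes:                      # forward chain: site 2 .. site n
        if v != "COMMIT":
            combined = "ABORT"           # one ABORT vote decides the outcome
    return combined                      # backward chain carries the decision

print(linear_2pc(["COMMIT", "COMMIT", "ABORT"]))   # ABORT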
● Although linear 2PC incurs fewer messages than centralized 2PC, the linear sequencing does not allow any parallelism.
● Linear 2PC can be improved if the voting process adopts the forward linear chaining of messages, while the decision process adopts the centralized topology, so that site n can broadcast the global decision to all participants in parallel.
Distributed 2PC
● A third proposal, known as distributed 2PC, uses a distributed topology, as shown in Figure (c).
● The coordinator sends the PREPARE message to all participants which, in turn, send their decision to all other sites.
● Each participant waits for messages from the other sites before deciding whether to commit or abort the transaction.
● This in effect eliminates the need for the decision phase of the 2PC protocol, since the participants can reach a decision consistently, but independently.
Three-Phase Commit (3PC)
● 2PC is not a non-blocking protocol, since it is possible for sites to become blocked in certain circumstances.
● For example, a process that times out after voting commit but before receiving the global instruction from the coordinator, is blocked if it can communicate only with sites that are similarly unaware of the global decision.
● The probability of blocking occurring in practice is sufficiently rare that most existing systems use 2PC.
● An alternative non-blocking protocol, called the three-phase commit (3PC) protocol, has been proposed.
● Three-phase commit is non-blocking for site failures, except in the event of the failure of all sites.
● Communication failures can, however, result in different sites reaching different decisions, thereby violating the atomicity of global transactions.
● The protocol requires that:
   o no network partitioning should occur;
   o at least one site must always be available;
   o at most K sites can fail simultaneously (system is classified as K-resilient).
● The basic idea of 3PC is to remove the uncertainty period for
participants that have voted COMMIT and are waiting for the
global abort or global commit from the coordinator.
● Three-phase commit introduces a third phase, called pre-commit,
between voting and the global decision.
● On receiving all votes from the participants, the coordinator sends
a global PRE-COMMIT message.
● A participant who receives the global pre-commit knows that all
other participants have voted COMMIT and that, in time, the
participant itself will definitely commit, unless it fails.
● Each participant acknowledges receipt of the PRE-COMMIT
message and, once the coordinator has received all
acknowledgements, it issues the global commit.
● An ABORT vote from a participant is handled in exactly the same
way as in 2PC.
● The new state transition diagrams for coordinator and participant
are shown in Figure
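● The pre-commit round can be sketched as follows (a minimal simulation with hypothetical method names; timeout and failure handling are omitted):

class Participant3PC:
    def prepare(self):       return "COMMIT"   # phase 1 vote
    def pre_commit(self):    return True       # acknowledge PRE-COMMIT
    def global_commit(self): pass
    def global_abort(self):  pass

def three_phase_commit(participants):
    votes = [p.prepare() for p in participants]        # phase 1: voting
    if not all(v == "COMMIT" for v in votes):
        for p in participants:
            p.global_abort()
        return "ABORTED"
    # phase 2: pre-commit - every participant now knows the decision
    acks = [p.pre_commit() for p in participants]
    if all(acks):
        for p in participants:                         # phase 3: global commit
            p.global_commit()
        return "COMMITTED"

print(three_phase_commit([Participant3PC(), Participant3PC()]))  # COMMITTED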
● Both the coordinator and participant still have periods of waiting, but the important feature is that all operational processes have been informed of a global decision to commit by the PRE-COMMIT message prior to the first process committing, and can therefore act independently in the event of failure.
● If the coordinator does fail, the operational sites can communicate with each other and determine whether the transaction should be committed or aborted without waiting for the coordinator to recover.
● If none of the operational sites have received a PRE-COMMIT message, they will abort the transaction.
● The processing when all participants vote COMMIT is shown in Figure.
Termination protocols for 3PC
● As with 2PC, the action to be taken depends on what state the coordinator or participant was in when the timeout occurred.
Coordinator
● The coordinator can be in one of five states during the commit process as shown in Figure but can timeout in only three states.
The actions to be taken are as follows:
   o Timeout in the WAITING state This is the same as in 2PC. The coordinator is waiting for all participants to acknowledge whether they wish to commit or abort the transaction, so it can decide to globally abort the transaction.
   o Timeout in the PRE-COMMITTED state The participants have been sent the PRE-COMMIT message, so participants will be in either the PRE-COMMIT or READY states. In this case, the coordinator can complete the transaction by writing the commit record to the log file and sending the GLOBAL-COMMIT message to the participants.
   o Timeout in the DECIDED state This is the same as in 2PC. The coordinator is waiting for all participants to acknowledge whether they have successfully aborted or committed the transaction, so it can simply send the global decision to all sites that have not acknowledged.
Participant
● The participant can be in one of five states during the commit process as shown in Figure
but can timeout in only three states.
The actions to be taken are as follows:
   o Timeout in the INITIAL state This is the same as in 2PC. The participant is waiting for the PREPARE message, so can unilaterally abort the transaction.
   o Timeout in the PREPARED state The participant has sent its vote to the coordinator and is waiting for the PRE-COMMIT or ABORT message. In this case, the participant will follow an election protocol to elect a new coordinator for the transaction and terminate.
   o Timeout in the PRE-COMMITTED state The participant has sent the acknowledgement to the PRE-COMMIT message and is waiting for the COMMIT message. Again, the participant will follow an election protocol to elect a new coordinator for the transaction and terminate as we discuss below.
Recovery protocols for 3PC
● As with 2PC, the action on restart depends on what state the coordinator or participant had reached at the time of the failure.
Coordinator failure
● We consider four different states for failure of the coordinator:
   o Failure in the INITIAL state The coordinator has not yet started the commit procedure. Recovery in this case starts the commit procedure.
   o Failure in the WAITING state The participants may have elected a new coordinator and terminated the transaction. On restart, the coordinator should contact other sites to determine the fate of the transaction.
   o Failure in the PRE-COMMITTED state Again, the participants may have elected a new coordinator and terminated the transaction. On restart, the coordinator
should contact other sites to determine the fate of the transaction.
   o Failure in the DECIDED state The coordinator has instructed the participants to globally abort or commit the transaction. On restart, if the coordinator has received all acknowledgements, it can complete successfully. Otherwise, it has to initiate the termination protocol.
Participant
● We consider four different states for failure of a participant:
   o Failure in the INITIAL state The participant has not yet voted on the transaction. Therefore, on recovery, it can unilaterally abort the transaction.
   o Failure in the PREPARED state The participant has sent its vote to the coordinator. In this case, the participant should contact other sites to determine the fate of the transaction.
   o Failure in the PRE-COMMITTED state The participant should contact other sites to determine the fate of the transaction.
   o Failure in the ABORTED/COMMITTED states The participant has completed the transaction. Therefore, on restart no further action is necessary.
Termination protocol following the election of new coordinator
● The election protocol discussed for 2PC can be used by participants to elect a new coordinator following a timeout. The newly elected coordinator will send a STATE-REQ message to all participants involved in the election in an attempt to determine how best to continue with the transaction.
● The new coordinator can use the following rules:
   ▪ If some participant has aborted, then the global decision is abort.
   ▪ If some participant has committed the transaction, then the global decision is commit.
   ▪ If all participants that reply are uncertain, then the decision is abort.
   ▪ If some participant can commit the transaction (is in the PRE-COMMIT state), then the global decision is commit. To prevent blocking, the new coordinator will first send the PRE-COMMIT message and, once participants have acknowledged, send the GLOBAL-COMMIT message.
Network Partitioning
● When a network partition occurs, maintaining the consistency of the database may be more difficult, depending on whether data is replicated or not. If data is not replicated, we can allow a transaction to proceed if it does not require any data from a site outside the partition in which it is initiated.
● Otherwise, the transaction must wait until the sites to which it needs access are available again.
● If data is replicated, the procedure is much more complicated.
● We consider two examples of anomalies that may arise with replicated data in a partitioned network, based on a simple bank account relation containing a customer balance.
Identifying updates
● Successfully completed update operations by users in different partitions can be difficult to observe, as illustrated in the figure.
● In partition P1, a transaction has withdrawn £10 from an account (with balance balx) and in partition P2, two transactions have each withdrawn £5 from the same account. Assuming at the start both partitions have £100 in balx, then on completion they both have £90 in balx.
● When the partitions recover, it is not sufficient to check the value in balx and assume that the fields are consistent if the values are the same. In this case, the value after executing all three transactions should be £80.
Maintaining integrity
● Successfully completed update operations by users in different partitions can easily violate integrity constraints, as illustrated in the figure.
● Assume that a bank places a constraint on a customer account (with balance balx) that it cannot go below £0. In partition P1, a transaction has withdrawn £60 from the account and in partition P2, a transaction has withdrawn £50 from the same account.
● Assuming at the start both partitions have £100 in balx, then on completion one has £40 in balx and the other has £50. Importantly, neither has violated the integrity constraint.
● However, when the partitions recover and the transactions are both fully implemented, the balance of the account will be –£10, and the integrity constraint will have been violated.
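Both anomalies can be reproduced with a few lines of arithmetic. A toy sketch in Python (purely illustrative; the partitions and balances mirror the two examples above):

    # Sketch: replicated account balance balx diverging across partitions.
    start = 100

    # Identifying updates: P1 withdraws 10; P2 withdraws 5 twice.
    p1 = start - 10          # 90
    p2 = start - 5 - 5       # 90
    correct = start - 10 - 5 - 5
    # Values match (90 == 90), yet equal values hide lost updates:
    assert p1 == p2 and correct == 80

    # Maintaining integrity: constraint balx >= 0 holds in each partition.
    q1 = start - 60          # 40, constraint satisfied locally
    q2 = start - 50          # 50, constraint satisfied locally
    merged = start - 60 - 50 # -10, constraint violated after recovery
    assert q1 >= 0 and q2 >= 0 and merged < 0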
● Processing in a partitioned network involves a tradeoff in availability and correctness.
● Absolute correctness is most easily provided if no processing of replicated data is allowed during partitioning.
● On the other hand, availability is maximized if no restrictions are placed on the processing of replicated data during partitioning.
● In general, it is not possible to design a non-blocking atomic commit protocol for arbitrarily partitioned networks.
● Since recovery and concurrency control are so closely related, the recovery techniques that will be used following network partitioning will depend on the particular concurrency control strategy being used.
● Methods are classified as either pessimistic or optimistic.
Pessimistic protocols
● Pessimistic protocols choose consistency of the database over availability and would therefore not allow transactions to execute in a partition if there is no guarantee that consistency can be maintained.
● The protocol uses a pessimistic concurrency control algorithm such as primary copy 2PL or majority locking.
● Recovery using this approach is much more straightforward, since updates would have been confined to a single, distinguished partition.
● Recovery of the network involves simply propagating all the updates to every other site.
Optimistic protocols
● Optimistic protocols, on the other hand, choose availability of the database at the expense of consistency, and use an optimistic approach to concurrency control, in which updates are allowed to proceed independently in the various partitions.
● Therefore, inconsistencies are likely when sites recover.
● To determine whether inconsistencies exist, precedence graphs can be used to keep track of dependencies among data. Precedence graphs are similar to wait-for graphs and show which transactions have read and written which data items.
● While the network is partitioned, updates proceed without restriction and precedence graphs are maintained by each partition.
● When the network has recovered, the precedence graphs for all partitions are combined.
● Inconsistencies are indicated if there is a cycle in the graph. The resolution of inconsistencies depends upon the semantics of the transactions, and thus it is generally not possible for the recovery manager to re-establish consistency without user intervention.
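A minimal sketch of the cycle check on a combined precedence graph (illustrative Python; the adjacency-dict representation is an assumption for this sketch, not a recovery-manager API from these notes):

    # Sketch: detect inconsistency (a cycle) in a combined precedence graph.
    # Nodes are transactions; an edge T1 -> T2 means T2 depends on data
    # read/written by T1. Iterative depth-first search with colouring.
    def has_cycle(graph):
        WHITE, GREY, BLACK = 0, 1, 2
        colour = {node: WHITE for node in graph}
        for start in graph:
            if colour[start] != WHITE:
                continue
            colour[start] = GREY
            stack = [(start, iter(graph[start]))]
            while stack:
                node, children = stack[-1]
                child = next(children, None)
                if child is None:
                    colour[node] = BLACK      # fully explored
                    stack.pop()
                elif colour.get(child, WHITE) == GREY:
                    return True               # back edge => cycle => inconsistency
                elif colour.get(child, WHITE) == WHITE:
                    colour[child] = GREY
                    stack.append((child, iter(graph.get(child, []))))
        return False

    # Example: precedence graphs of two partitions merged into one dict.
    combined = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}   # cyclic
    print(has_cycle(combined))   # True -> user intervention needed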
Parallel Databases
⮚ Parallel machines are becoming quite common and affordable
o Prices of microprocessors, memory and disks have dropped sharply
o Recent desktop computers feature multiple processors and this trend is projected to accelerate
⮚ Databases are growing increasingly large
o large volumes of transaction data are collected and stored for later analysis.
o multimedia objects like images are increasingly stored in databases
⮚ Large-scale parallel database systems increasingly used for:
o storing large volumes of data
o processing time-consuming decision-support queries
o providing high throughput for transaction processing
Parallelism in Databases
⮚ Data can be partitioned across multiple disks for parallel I/O.
⮚ Individual relational operations (e.g., sort, join, aggregation) can be executed in parallel
o data can be partitioned and each processor can work independently on its own partition.
⮚ Queries are expressed in high level language (SQL, translated to relational algebra)
o makes parallelization easier.
⮚ Different queries can be run in parallel with each other. Concurrency control takes care of conflicts.
⮚ Thus, databases naturally lend themselves to parallelism.
I/O Parallelism
⮚ Reduce the time required to retrieve relations from disk by partitioning the relations on multiple disks.
⮚ Horizontal partitioning – tuples of a relation are divided among many disks such that each tuple resides on one disk.
⮚ Partitioning techniques (number of disks = n):
Round-robin:
⮚ Send the ith tuple inserted in the relation to disk i mod n.
Hash partitioning:
o Choose one or more attributes as the partitioning attributes.
o Choose hash function h with range 0…n - 1.
o Let i denote the result of hash function h applied to the partitioning attribute value of a tuple. Send the tuple to disk i.
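A minimal sketch of the round-robin and hash placement rules (illustrative Python; Python's built-in hash stands in for the chosen hash function h):

    # Sketch: choosing a disk for each tuple under two techniques.
    N_DISKS = 4   # n

    def round_robin_disk(i, n=N_DISKS):
        """The ith tuple inserted (i = 0, 1, 2, ...) goes to disk i mod n."""
        return i % n

    def hash_disk(value, n=N_DISKS):
        """h(partitioning attribute value) with range 0..n-1 picks the disk."""
        return hash(value) % n

    # Example: partition account tuples on acct_no.
    accounts = [(101, 500), (102, 750), (103, 900)]
    for i, (acct_no, bal) in enumerate(accounts):
        print(acct_no, "-> round-robin:", round_robin_disk(i),
              " hash:", hash_disk(acct_no))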
⮚ Range partitioning:
o Choose an attribute as the partitioning attribute.
o A partitioning vector [v0, v1, ..., vn-2] is chosen.
o Let v be the partitioning attribute value of a tuple. Tuples such that vi ≤ v < vi+1 go to disk i + 1. Tuples with v < v0 go to disk 0 and tuples with v ≥ vn-2 go to disk n-1.
E.g., with a partitioning vector [5,11], a tuple with partitioning attribute value of 2 will go to disk 0, a tuple with value 8 will go to disk 1, while a tuple with value 20 will go to disk 2.
Comparison of Partitioning Techniques
⮚ Evaluate how well partitioning techniques support the following types of data access:
1. Scanning the entire relation.
2. Locating a tuple associatively – point queries.
▪ E.g., r.A = 25.
3. Locating all tuples such that the value of a given attribute lies within a specified range – range queries.
▪ E.g., 10 ≤ r.A < 25.
Round-robin:
⮚ Advantages
o Best suited for sequential scan of entire relation on each query.
o All disks have almost an equal number of tuples; retrieval work is thus well balanced between disks.
⮚ Range queries are difficult to process
o No clustering – tuples are scattered across all disks.
Hash partitioning:
⮚ Good for sequential access
o Assuming the hash function is good, and the partitioning attributes form a key, tuples will be equally distributed between disks.
o Retrieval work is then well balanced between disks.
⮚ Good for point queries on partitioning attribute
o Can look up a single disk, leaving others available for answering other queries.
o Index on partitioning attribute can be local to disk, making lookup and update more efficient.
⮚ No clustering, so difficult to answer range queries
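A small sketch of the range-partitioning rule above, reproducing the [5, 11] example (illustrative Python; bisect implements the vector search and the boundary conditions v < v0, vi ≤ v < vi+1, and v ≥ vn-2):

    import bisect

    # Sketch: route a tuple to a disk using a range-partitioning vector.
    def range_disk(v, vector):
        return bisect.bisect_right(vector, v)

    vector = [5, 11]                       # n = 3 disks: 0, 1, 2
    for v in (2, 8, 20):
        print("value", v, "-> disk", range_disk(v, vector))
    # value 2 -> disk 0, value 8 -> disk 1, value 20 -> disk 2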
Range partitioning:
⮚ Provides data clustering by partitioning attribute value.
⮚ Good for sequential access
⮚ Good for point queries on partitioning attribute: only one disk needs to be accessed.
⮚ For range queries on partitioning attribute, one to a few disks may need to be accessed
o Remaining disks are available for other queries.
o Good if result tuples are from one to a few blocks.
o If many blocks are to be fetched, they are still fetched from one to a few disks, and potential parallelism in disk access is wasted
▪ Example of execution skew.
Partitioning a Relation across Disks
⮚ If a relation contains only a few tuples which will fit into a single disk block, then assign the relation to a single disk.
⮚ Large relations are preferably partitioned across all the available disks.
⮚ If a relation consists of m disk blocks and there are n disks available in the system, then the relation should be allocated min(m,n) disks.
Handling of Skew
⮚ The distribution of tuples to disks may be skewed — that is, some disks have many tuples, while others may have fewer tuples.
⮚ Types of skew:
o Attribute-value skew.
▪ Some values appear in the partitioning attributes of many tuples; all the tuples with the same value for the partitioning attribute end up in the same partition.
▪ Can occur with range-partitioning and hash-partitioning.
o Partition skew.
▪ With range-partitioning, a badly chosen partition vector may assign too many tuples to some partitions and too few to others.
▪ Less likely with hash-partitioning if a good hash-function is chosen.
Handling Skew in Range-Partitioning
⮚ To create a balanced partitioning vector (assuming partitioning attribute forms a key of the relation):
o Sort the relation on the partitioning attribute.
o Construct the partition vector by scanning the relation in
sorted order as follows.
▪ After every 1/nth of the relation has been read,
the value of the partitioning attribute of the next
tuple is added to the partition vector.
o n denotes the number of partitions to be constructed.
o Duplicate entries or imbalances can result if duplicates are
present in partitioning attributes.
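A minimal sketch of this construction (illustrative Python; assumes the relation is given as a list of tuples and the partitioning attribute is extracted by key):

    # Sketch: build a balanced range-partitioning vector from a sorted
    # relation. After every 1/nth of the tuples, record the value of the
    # next tuple as a cut point; n-1 cut points give n partitions.
    def build_partition_vector(relation, key, n):
        sorted_rel = sorted(relation, key=key)
        step = len(sorted_rel) // n
        return [key(sorted_rel[i * step]) for i in range(1, n)]

    # Example: 12 tuples of (acct_no,), 3 partitions -> 2 cut points.
    rel = [(v,) for v in (3, 7, 8, 12, 15, 18, 21, 25, 30, 33, 40, 47)]
    print(build_partition_vector(rel, key=lambda t: t[0], n=3))  # [15, 30]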
⮚ Alternative technique based on histograms used in practice
Handling Skew using Histograms
⮚ Balanced partitioning vector can be constructed from histogram in a relatively straightforward fashion
o Assume uniform distribution within each range of the histogram.
⮚ Histogram can be constructed by scanning relation, or sampling (blocks containing) tuples of the relation
Handling Skew Using Virtual Processor Partitioning
⮚ Skew in range partitioning can be handled elegantly using virtual processor partitioning:
o Create a large number of partitions (say 10 to 20 times the number of processors)
o Assign virtual processors to partitions either in round-robin fashion or based on estimated cost of processing each virtual partition
⮚ Basic idea:
o If any normal partition would have been skewed, it is very likely the skew is spread over a number of virtual partitions
o Skewed virtual partitions get spread across a number of processors, so work gets distributed evenly!
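A short sketch of the assignment step (illustrative Python; partition sizes stand in for the estimated processing cost of each virtual partition):

    # Sketch: map many virtual partitions onto a few real processors.
    N_PROCS = 4

    def assign_round_robin(n_virtual, n_procs=N_PROCS):
        """Virtual partition v goes to processor v mod n_procs."""
        return {v: v % n_procs for v in range(n_virtual)}

    def assign_by_cost(sizes, n_procs=N_PROCS):
        """Greedy: largest virtual partition to the least-loaded processor."""
        load = [0] * n_procs
        placement = {}
        for v, size in sorted(enumerate(sizes), key=lambda x: -x[1]):
            p = load.index(min(load))
            placement[v] = p
            load[p] += size
        return placement

    # 40 virtual partitions (10x the processors), one of them skewed:
    sizes = [100] * 39 + [1500]
    print(assign_by_cost(sizes))   # the skewed partition occupies one
                                   # processor; the rest share the others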
Interquery Parallelism
⮚ Queries/transactions execute in parallel with one another.
⮚ Increases transaction throughput; used primarily to scale up a transaction processing system to support a larger number of transactions per second.
⮚ Easiest form of parallelism to support, particularly in a shared-memory parallel database, because even sequential database systems support concurrent processing.
⮚ More complicated to implement on shared-disk or shared-nothing architectures
o Locking and logging must be coordinated by passing messages between processors.
o Data in a local buffer may have been updated at another processor.
o Cache-coherency has to be maintained — reads and writes of data in buffer must find latest version of data.
Cache Coherency Protocol
⮚ Example of a cache coherency protocol for shared-disk systems (a code sketch follows at the end of this topic):
o Before reading/writing to a page, the page must be locked in shared/exclusive mode.
o On locking a page, the page must be read from disk.
o Before unlocking a page, the page must be written to disk if it was modified.
⮚ More complex protocols with fewer disk reads/writes exist.
⮚ Cache coherency protocols for shared-nothing systems are similar. Each database page is assigned a home processor. Requests to fetch the page or write it to disk are sent to the home processor.
Intraquery Parallelism
⮚ Execution of a single query in parallel on multiple processors/disks; important for speeding up long-running queries.
⮚ Two complementary forms of intraquery parallelism:
o Intraoperation Parallelism – parallelize the execution of each individual operation in the query.
o Interoperation Parallelism – execute the different operations in a query expression in parallel.
⮚ The first form scales better with increasing parallelism, because the number of tuples processed by each operation is typically more than the number of operations in a query.
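Returning to the shared-disk cache-coherency steps listed above: a minimal sketch in Python (the lock table, "disk" dictionary, and Page class are toy stand-ins, not a real buffer-manager API; a single mutex stands in for the shared/exclusive page lock):

    # Sketch of the simple shared-disk cache-coherency protocol above.
    import threading
    from contextlib import contextmanager

    locks = {}   # page_id -> lock (toy global lock table)
    disk = {}    # page_id -> page contents

    class Page:
        def __init__(self, data):
            self.data, self.dirty = data, False

    @contextmanager
    def coherent_page(page_id):
        lock = locks.setdefault(page_id, threading.Lock())
        lock.acquire()                    # 1. lock before reading/writing
        page = Page(disk.get(page_id))    # 2. read from disk on locking
        try:
            yield page
        finally:
            if page.dirty:                # 3. write back before unlocking
                disk[page_id] = page.data
            lock.release()

    # Usage: any processor that locks page 42 sees its latest version.
    with coherent_page(42) as p:
        p.data, p.dirty = "new contents", True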
Design of Parallel Systems
Some issues in the design of parallel systems:
⮚ Parallel loading of data from external sources is needed in order to handle large volumes of incoming data.
⮚ Resilience to failure of some processors or disks.
o Probability of some disk or processor failing is higher in a
parallel system.
o Operation (perhaps with degraded performance) should be
possible in spite of failure.
o Redundancy achieved by storing extra copy of every data
item at another processor.
⮚ On-line reorganization of data and schema changes must be
supported.
o For example, index construction on terabyte databases can
take hours or days even on a parallel system.
▪ Need to allow other processing
(insertions/deletions/updates) to be performed on
relation even as index is being constructed.
o Basic idea: index construction tracks changes and "catches up" on changes at the end.
⮚ Also need support for on-line repartitioning and schema changes
(executed concurrently with other processing).
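A toy sketch of the catch-up idea above for on-line index construction (illustrative Python; a real system would use logging and latching, which these notes do not cover):

    # Sketch: build an index while updates continue, then catch up.
    # side_log holds (op, key, rid) records for updates that arrived
    # during the initial scan.
    def build_index_online(relation, side_log):
        index = {}
        # Phase 1: scan the existing tuples; concurrent writers keep
        # appending their changes to side_log in the meantime.
        for key, rid in relation:
            index[key] = rid
        # Phase 2: catch up on the changes made during the scan.
        for op, key, rid in side_log:
            if op == "insert":
                index[key] = rid
            elif op == "delete":
                index.pop(key, None)
        return index

    # Example: one insert and one delete arrived during the build.
    rel = [(10, "r1"), (20, "r2")]
    log = [("insert", 30, "r3"), ("delete", 10, None)]
    print(build_index_online(rel, log))   # {20: 'r2', 30: 'r3'}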