
Distributed Databases

By
Sudarshan

MCA Sem V 12/10/2007


Distributed Database Design

• Three key issues:


– Fragmentation
• Relation may be divided into a number of sub-
relations, which are then distributed.

– Allocation
• Each fragment is stored at the site with “optimal”
distribution.

– Replication.
• Copy of fragment may be maintained at several
sites.
Data Allocation

• An allocation schema describes the


allocation of fragments to sites of the
DDBs.
• It is a mapping that specifies, for each
fragment, the site at which it is stored.
• If a fragment is stored at more than one
site it is said to be replicated.
Distributed Catalog Management

• Must keep track of how data is


distributed across sites.
• Must be able to name each replica of
each fragment. To preserve local
autonomy:
– <local-name, birth-site>
• Site Catalog: Describes all objects
(fragments, replicas) at a site + Keeps
track of replicas of relations created at
this site.
– To find a relation, look up its birth-site
catalog.
Data Replication

• Fully replicated : each fragment at


each site
• Partially replicated : each fragment
at some of the sites
• Types of replication
– Synchronous Replication
– Asynchronous Replication

• Rule of thumb:

– If reads are much more frequent than updates,
replication is advantageous,

– otherwise replication may cause more problems
than it solves.


Synchronous Replication

• All copies of a modified relation


(fragment) must be updated before the
modifying transaction commits.
– Data distribution is made transparent to
users.
• 2 techniques for synchronous replication
– Voting
– Read-any Write-all
Asynchronous Replication

• Allows modifying transaction to commit


before all copies have been changed (and
readers nonetheless look at just one copy).
• Copies of a modified relation are only
periodically updated; different copies may get
out of synch in the meantime.
– Users must be aware of data distribution.
– Current products follow this approach.
• 2 techniques for asynchronous replication
– Primary Site Replication
– Peer-to-Peer Replication
• Difference lies in how many copies are
“updatable” or “master copies”.
Techniques for Synchronous Replication

• Voting: Transactions must write a majority of


copies to modify an object; must read enough
copies to be sure of seeing at least one most
recent copy.
– E.g., 10 copies; 7 written for update; 4 copies read.
– Each copy has version number.
– Not attractive usually because reads are common.
• Read-any Write-all: Writes are slower and
reads are faster, relative to Voting.
– Most common approach to synchronous
replication.
• Choice of technique determines which locks to
set.
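
A minimal sketch (not from the slides) of the quorum arithmetic behind Voting, assuming N copies with per-copy version numbers: any write quorum W with W > N/2 and read quorum R with R + W > N guarantees that every read set contains at least one most recent copy.

# Hypothetical quorum check for the Voting technique (illustrative only).
def quorums_are_valid(n_copies, write_quorum, read_quorum):
    # A write must reach a majority, and read + write quorums must overlap,
    # so every read set contains at least one copy with the latest version.
    return write_quorum > n_copies // 2 and read_quorum + write_quorum > n_copies

def latest_copy(copies_read):
    # copies_read: list of (version_number, value); pick the newest version.
    return max(copies_read, key=lambda c: c[0])

assert quorums_are_valid(10, 7, 4)        # the 10-copy example from the slide
assert not quorums_are_valid(10, 7, 3)    # 7 + 3 = 10 does not guarantee overlap
print(latest_copy([(3, "old"), (5, "new"), (4, "mid")]))   # -> (5, 'new')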
Primary Site Replication

• Exactly one copy of a relation is designated


the primary or master copy. Replicas at other
sites cannot be directly updated.
– The primary copy is published.
– Other sites subscribe to (fragments of) this
relation; these are secondary copies.

• Main issue: How are changes to the primary


copy propagated to the secondary copies?
– Done in two steps.
• First, capture changes made by committed transactions;
• Then apply these changes.
Implementing the Capture Step

• Log-Based Capture: The log (kept for


recovery) is used to generate a Change
Data Table (CDT).
– If this is done when the log tail is written to
disk, it must somehow remove changes due
to subsequently aborted transactions.
• Procedural Capture: A procedure that is
automatically invoked (trigger) does the
capture; typically, just takes a snapshot.
• Log-Based Capture is better (cheaper,
faster) but relies on proprietary log
details.
Implementing the Apply Step

• The Apply process at the secondary site


periodically obtains (a snapshot or) changes
to the CDT table from the primary site, and
updates the copy.
– Period can be timer-based or user/application
defined.
• Replica can be a view over the modified
relation!
– If so, the replication consists of incrementally
updating the materialized view as the relation
changes.

• Log-Based Capture plus continuous Apply


minimizes delay in propagating changes.
• Procedural Capture plus application-driven
Apply is the most flexible way to process changes.
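
A minimal, hypothetical sketch of an application-driven Apply step, using an in-memory SQLite table as the secondary copy; the change-data rows, table schema, and function names are invented for illustration and do not reflect any particular product.

# Illustrative Apply loop for primary-site replication (sketch only).
import sqlite3

def apply_changes(secondary, cdt_rows):
    # cdt_rows: iterable of (op, sid, sname) captured at the primary site.
    for op, sid, sname in cdt_rows:
        if op == "INSERT":
            secondary.execute("INSERT INTO Sailors(sid, sname) VALUES (?, ?)", (sid, sname))
        elif op == "DELETE":
            secondary.execute("DELETE FROM Sailors WHERE sid = ?", (sid,))
    secondary.commit()

secondary = sqlite3.connect(":memory:")
secondary.execute("CREATE TABLE Sailors(sid INTEGER PRIMARY KEY, sname TEXT)")
# One timer-based round of Apply with a hand-made change batch:
apply_changes(secondary, [("INSERT", 1, "Dustin"), ("INSERT", 2, "Lubber")])
print(secondary.execute("SELECT * FROM Sailors").fetchall())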
Peer-to-Peer Replication

• More than one of the copies of an object


can be a master in this approach.
• Changes to a master copy must be
propagated to other copies somehow.
• If two master copies are changed in a
conflicting manner, this must be
resolved. (e.g., Site 1: Joe’s age
changed to 35; Site 2: to 36)
• Best used when conflicts do not arise:
– E.g., Each master site owns a disjoint
fragment.
– E.g., Updating rights owned by exactly one master at a time.
Distributed Query Processing
Sailors (sid : int, sname : str, rating : int, age : int)
Reserves (sid : int, bid : int, day : date, rname : string)

SELECT AVG(S.age)
FROM Sailors S
WHERE S.rating > 3 AND S.rating < 7
• Horizontally Fragmented: Tuples with rating < 5 at
Mumbai, >= 5 at Delhi.
– Must compute SUM(age), COUNT(age) at both sites.
– If WHERE contained just S.rating>6, just one site.
• Vertically Fragmented: sid and rating at Mumbai,
sname and age at Delhi, tid at both.
– Must reconstruct relation by join on tid, then evaluate the
query.
• Replicated: Sailors copies at both sites.
– Choice of site based on local costs, shipping costs.
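
A small illustrative sketch (data invented) of why each site returns SUM(age) and COUNT(age) rather than AVG(age): the partial aggregates from the Mumbai and Delhi fragments combine into the global average.

# Illustrative only: combine partial aggregates from horizontally fragmented Sailors.
mumbai_ages = [22.0, 31.5]        # tuples with rating 4 (rating < 5 stored at Mumbai)
delhi_ages = [25.0, 40.0, 33.0]   # tuples with rating 5 or 6 (rating >= 5 at Delhi)

# Each site evaluates the WHERE clause locally and returns (SUM(age), COUNT(age)).
partials = [(sum(mumbai_ages), len(mumbai_ages)), (sum(delhi_ages), len(delhi_ages))]

total_sum = sum(s for s, _ in partials)
total_cnt = sum(c for _, c in partials)
print(total_sum / total_cnt)      # global AVG(S.age) for 3 < rating < 7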
Distributed Query Optimization

• Cost-based approach; consider all plans,


pick cheapest; similar to centralized
optimization.
– Difference 1: Communication costs must be
considered.
– Difference 2: Local site autonomy must be
respected.
– Difference 3: New distributed join methods.
• Query site constructs global plan, with
suggested local plans describing
processing at each site.
Distributed Query Processing – Example

• This example query retrieves, for each
employee, the employee name and the name
of the employee’s department; it returns
10,000 records (every employee belongs to
one department).
• Each record will be 40 bytes long
(FNAME + LNAME + DNAME = 15 + 15
+ 10 = 40).
• Thus the result set will be 400,000
bytes.
• Assume cost to transfer query text
between nodes can be safely ignored.
Distributed Query Processing – Example

• Three alternatives:
– Copy all EMPLOYEE and DEPARTMENT records to
node 3. Perform the join and display the results.
Total Cost = 1,000,000 + 3,500 = 1,003,500 bytes
– Copy all EMPLOYEE records (1,000,000 bytes) from
node 1 to node 2. Perform the join, then ship the
results (400,000 bytes) to node 3.
Total cost = 1,000,000 + 400,000 = 1,400,000
bytes
– Copy all DEPARTMENT records (3,500) from node 2
to node 1. Perform the join. Ship the results from
node 1 to node 3 (400,000).
Total cost = 3,500 + 400,000 = 403,500 bytes
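
The three totals can be checked with a few lines of arithmetic, using the sizes implied by the slide (EMPLOYEE 1,000,000 bytes at node 1, DEPARTMENT 3,500 bytes at node 2, join result 400,000 bytes, answer needed at node 3):

# Bytes transferred for each strategy (query result is needed at node 3).
EMP, DEPT, RESULT = 1_000_000, 3_500, 400_000

ship_both_to_3 = EMP + DEPT      # join at node 3
ship_emp_to_2  = EMP + RESULT    # join at node 2, ship result to node 3
ship_dept_to_1 = DEPT + RESULT   # join at node 1, ship result to node 3

print(ship_both_to_3, ship_emp_to_2, ship_dept_to_1)   # 1003500 1400000 403500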
Distributed Query Processing – Example

• For each department, retrieve the department
name and the name of the department manager:
∏ FNAME,LNAME,DNAME (DEPARTMENT ⋈ MGRSSN=SSN EMPLOYEE)
• The result contains 100 tuples (one per department), each 40 bytes long.

• Transfer both the relations to site 3 and perform join


Total size = 1,000,000 + 3500 = 1,003,500 bytes
• Transfer EMPLOYEE relation to site 2 and perform join. Send the
results to site 3.
Total size = 1,000,000 + 4000 = 1,004,000 bytes
• Transfer DEPARTMENT relation to site 1 and perform join. Send
the results to site 3.
Total size = 3500 + 4000 = 7500 bytes
Distributed Query Processing – Example

• Taking the same example:


– Copy just the FNAME, LNAME and DNO
columns from Site 1 to Site 3 (cost = 34
bytes times 10,000 records = 340,000
bytes)
– Copy just the DNUMBER and DNAME
columns from site 2 to site 3 (cost = 14
bytes times 100 records = 1,400 bytes)
– Perform the join at site 3 and display the
results.
Total cost = 341,400
Semi-Join

• The semijoin of r1 with r2 is denoted by:

r1 ⋉ r2
• The idea is to reduce the number of tuples in
a relation before transferring it to another
site.
• Send the joining column of one relation (say
r1) to the site where the other relation (say r2)
is located, and perform a join with r2.
• Then the join attribute, along with the other
required attributes, is projected and sent back
to the original site.
• A join operation is performed at this site.
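
A minimal in-memory sketch of the semijoin steps just listed, using the Sailors/Reserves schema from the example that follows; the tuples and the simulated “shipping” are invented for illustration.

# Illustrative semijoin: reduce Reserves by Sailors before shipping it.
sailors  = [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5)]              # at London
reserves = [(22, 101, "2007-12-10", "r1"), (95, 103, "2007-12-11", "r2")]  # at Paris

# Step 1: project Sailors onto the join column (sid) and "ship" it to Paris.
sailor_sids = {s[0] for s in sailors}

# Step 2: at Paris, keep only Reserves tuples that join -- the reduction of
# Reserves wrt Sailors -- and ship this (smaller) set back to London.
reduced_reserves = [r for r in reserves if r[0] in sailor_sids]

# Step 3: at London, join Sailors with the reduced Reserves.
result = [(s, r) for s in sailors for r in reduced_reserves if s[0] == r[0]]
print(result)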
Semi-Join Example

London Site: Sailors (sid: int, sname: str, rating:


int, age: int)
Paris Site: Reserves (sid: int, bid: int, day: date,
rname: string)

• At London, project Sailors onto join columns


and ship this to Paris.
• At Paris, join Sailors projection with Reserves.
– Result is called reduction of Reserves wrt Sailors.
• Ship reduction of Reserves to London.
• At London, join Sailors with reduction of
Reserves.
• Especially useful if there is a selection on
Sailors, and the answer is desired at London.
Semi-Join Example
• Project the join attribute of DEPARTMENT at site 2,
and transfer it to site 1.
F = ∏ DNUMBER (DEPARTMENT)
Size = 4 * 100 = 400 bytes
• Join the transferred file with the EMPLOYEE relation at site
1, and transfer the required attributes from the
resulting file to site 2.
R = ∏ DNO,FNAME,LNAME (F ⋈ DNUMBER=DNO EMPLOYEE)
Size = 34 * 10,000 = 340,000 bytes
• Execute a join of the transferred file R with DEPARTMENT,
and ship the result to site 3.
Size = 400,000 bytes

Total Size = 400+340,000+400,000 = 740,400 bytes


Semi-Join Example

• Project the join attribute of DEPARTMENT at site 2,
and transfer it to site 1.
F = ∏ MGRSSN (DEPARTMENT)
Size = 9 * 100 = 900 bytes
• Join the transferred file with the EMPLOYEE relation at site
1, and transfer the required attributes from the
resulting file to site 2.
R = ∏ MGRSSN,FNAME,LNAME (F ⋈ MGRSSN=SSN EMPLOYEE)
Size = 39 * 100 = 3,900 bytes
• Execute a join of the transferred file R with DEPARTMENT,
and ship the result to site 3.
Size = 4,000 bytes

Total Size = 900+3,900+4,000 = 8,800 bytes


Joins - Fetch as Needed

• Perform a page-oriented Nested Loops Join in
London with Sailors as the outer: for each
Sailors page, fetch all Reserves pages from Paris.
• Alternatively, cache all the fetched pages.
• Fetch as Needed:
– Cost: 500 D + 500 * 1000 (D+S)
– D is cost to read/write page; S is cost to ship page.
– If query was not submitted at London, must add
cost of shipping result to query site.
• Can also do Indexed Nested Loop Join at
London, fetching matching Reserves tuples to
London as needed.
Joins - Ship to One Site

• Transfer Reserves to London.


– Cost: 1000 S + 4500 D

• Transfer Sailors to Paris.


– Cost: 500 S + 4500 D

• If result size is very large, may be better


to ship both relations to result site and
then join them!
Bloomjoins

• At London, compute a bit-vector of some size


k:
– Hash join column values into range 0 to k-1.
– If some tuple hashes to i, set bit i to 1 (i from 0 to
k-1).
– Ship bit-vector to Paris.
• At Paris, hash each tuple of Reserves
similarly, and discard tuples that hash to 0 in
Sailors bit-vector.
– Result is called reduction of Reserves wrt Sailors.
• Ship bit-vector reduced Reserves to London.
• At London, join Sailors with reduced Reserves.
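
A small sketch of the Bloomjoin bit-vector step; the vector size k, the use of Python’s built-in hash, and the sample sid values are arbitrary choices for illustration.

# Illustrative Bloomjoin: build a k-bit vector over Sailors.sid at London.
k = 16
sailor_sids  = [22, 31, 58]            # join column values at London
reserve_sids = [22, 96, 58, 71]        # join column values at Paris

bits = [0] * k
for sid in sailor_sids:
    bits[hash(sid) % k] = 1            # set bit i if some sid hashes to i

# At Paris: discard Reserves tuples whose sid hashes to a 0 bit.
reduced = [sid for sid in reserve_sids if bits[hash(sid) % k] == 1]
print(reduced)   # -> [22, 58] here; false positives are possible, false negatives are not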
Distributed Transactions

• Distributed Concurrency Control:


– How can locks for objects stored across
several sites be managed?
– How can deadlocks be detected in a
distributed database?
• Distributed Recovery:
– Transaction atomicity must be ensured.
Distributed Transactions
Concurrency Control and Recovery

• Dealing with multiple copies of the data


items
• Failure of individual sites
• Failure of communication links
• Distributed commit
• Distributed deadlock
Distributed Locking

• How do we manage locks for objects across


many sites?
– Centralized: One site does all locking.
• Vulnerable to single site failure.
– Primary Copy: All locking for an object done at
the primary copy site for this object.
• Reading requires access to locking site as well as site
where the object is stored.
– Fully Distributed: Locking for a copy done at site
where the copy is stored.
• Locks at all sites while writing an object.
• Obtaining and releasing of locks is determined
by the concurrency control protocol.
Deadlock Handling

Consider the following two transactions and history, with
item X and transaction T1 at site 1, and item Y and
transaction T2 at site 2:

T1: write(X); write(Y)
T2: write(Y); write(X)

At site 1: T1 obtains an X-lock on X and writes X; T2 then waits for an X-lock on X.
At site 2: T2 obtains an X-lock on Y and writes Y; T1 then waits for an X-lock on Y.

Result: deadlock which cannot be detected locally at either site


Local and Global Wait-For Graphs

[Figure: local wait-for graphs at each site and the corresponding global wait-for graph]
Distributed Deadlock – Solution

• Three solutions:
– Centralized (send all local graphs to one
site);
– Hierarchical (organize sites into a hierarchy
and send local graphs to parent in the
hierarchy);
– Timeout (abort transaction if it waits too
long).
Centralized Approach
• A global wait-for graph is constructed and
maintained in a single site; the deadlock-
detection coordinator
– Real graph: Real, but unknown, state of the
system.
– Constructed graph: Approximation generated by
the controller during the execution of its algorithm.
• The global wait-for graph can be constructed
when:
– a new edge is inserted in or removed from one of
the local wait-for graphs.
– a number of changes have occurred in a local
wait-for graph.
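
A minimal sketch of what the deadlock-detection coordinator does with the merged information: union the local wait-for edges into one constructed global graph and search it for a cycle. The graph contents and function names here are hypothetical.

# Illustrative centralized deadlock detection on a global wait-for graph.
def has_cycle(edges):
    # edges: dict mapping a waiting transaction to the transactions it waits for.
    visiting, done = set(), set()
    def dfs(t):
        if t in visiting:
            return True                # back edge => cycle => deadlock
        if t in done:
            return False
        visiting.add(t)
        if any(dfs(u) for u in edges.get(t, [])):
            return True
        visiting.discard(t)
        done.add(t)
        return False
    return any(dfs(t) for t in list(edges))

site1 = {"T2": ["T1"]}                 # local wait-for graph at site S1
site2 = {"T1": ["T2"]}                 # local wait-for graph at site S2
global_graph = {}
for local in (site1, site2):           # coordinator merges the local graphs
    for t, waits in local.items():
        global_graph.setdefault(t, []).extend(waits)
print(has_cycle(global_graph))         # True: T1 -> T2 -> T1 spans both sites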
Example Wait-For Graph for False Cycles
Initial state: [figure of the local wait-for graphs and the coordinator's constructed global graph omitted]
False Cycles (Cont.)
• Suppose that starting from the state shown in
figure,
1. T2 releases resources at S1
• resulting in a remove T1 → T2 message from
the Transaction Manager at site S1 to the coordinator

2. And then T2 requests a resource held by T3 at


site S2
• resulting in a message insert T2 → T3 from S2 to the
coordinator
• Suppose further that the insert message reaches the
coordinator before the remove message (this can happen
due to network delays)
• The coordinator would then find a false cycle
T1 → T2 → T3 → T1
Unnecessary Rollbacks

• Unnecessary rollbacks may result when


deadlock has indeed occurred and a
victim has been picked, and meanwhile
one of the transactions was aborted for
reasons unrelated to the deadlock.
• Unnecessary rollbacks can result from
false cycles in the global wait-for graph;
however, likelihood of false cycles is
low.
Distributed Recovery

• Two new issues:


– New kinds of failure, e.g., links and remote
sites.
– If “sub-transactions” of a transaction
execute at different sites, all or none must
commit. Need a commit protocol to achieve
this.
• A log is maintained at each site, as in a
centralized DBMS, and commit protocol
actions are additionally logged.
Coordinator Selection

• Backup coordinators
– site which maintains enough information locally to
assume the role of coordinator if the actual
coordinator fails
– executes the same algorithms and maintains the
same internal state information as the actual
coordinator
– allows fast recovery from coordinator failure but
involves overhead during normal processing.
• Election algorithms
– used to elect a new coordinator in case of failures
– Example: Bully Algorithm - applicable to systems
where every site can send a message to every
other site.
Bully Algorithm

• If site Si sends a request that is not answered by the


coordinator within a time interval T, assume that the
coordinator has failed; Si then tries to elect itself as the
new coordinator.
• Si sends an election message to every site with a
higher identification number, Si then waits for any of
these processes to answer within T.
• If no response within T, assume that all sites with
number greater than i have failed, Si elects itself the
new coordinator.
• If an answer is received, Si begins time interval T’,
waiting to receive a message that a site with a higher
identification number has been elected.
Bully Algorithm

• If no message is sent within T’, assume the


site with a higher number has failed; Si
restarts the algorithm.
• After a failed site recovers, it immediately
begins execution of the same algorithm.
• If there are no active sites with higher
numbers, the recovered site forces all
processes with lower numbers to let it
become the coordinator site, even if there is a
currently active coordinator with a lower
number.
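
A toy, single-process sketch of the Bully election rule with messages and the timeouts T and T’ abstracted away; the set of alive sites is simply passed in, so this only illustrates the “highest responding number wins” idea.

# Illustrative Bully election: among the sites that respond, the one with
# the highest identification number becomes the new coordinator.
def elect_coordinator(initiator, alive_sites):
    # Sites with a higher number than the initiator are challenged first.
    higher = [s for s in alive_sites if s > initiator]
    if not higher:
        return initiator          # no higher-numbered site answered within T
    return max(higher)            # a higher-numbered site takes over the election

print(elect_coordinator(2, {1, 2, 3, 5}))   # -> 5
print(elect_coordinator(4, {1, 2, 4}))      # all higher-numbered sites failed -> 4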
Distributed Concurrency Control

• Idea is to designate a particular copy of each


data item as a distinguished copy.
• The locks for this data item are associated
with the distinguished copy and all the locking
and unlocking requests are sent to the site
that contains the copy.
• Methods for concurrency control
– Primary Site Technique
– Primary Site with Backup Site
– Primary Copy Technique
– Voting
Distributed Concurrency Control

• Primary site technique


– A single site is designated to be coordinator site for
all database items
– All locks are kept at this site.
– All requests are sent to this site.
• Advantages
– Simple extension of centralized approach
• Disadvantages
– Performance bottleneck
– Failure of primary site
Distributed Concurrency Control

• Primary site with backup site


– Overcomes the second disadvantage of
primary site technique
– All locking information maintained at the
primary as well as backup site.
– In case of failure of the primary site, the
backup site takes control and becomes the
new primary site.
– It then chooses another site as the new
backup site and copies the lock information
to it.
Distributed Concurrency Control

• Primary copy technique


– Attempts to distribute load of lock
coordination by having distinguished copies
of different data items stored at different
sites.
– Failure of a site affects transactions that
access locks on that particular site.
– Other transactions can continue to run.
– Can use the method of backup to increase
availability and reliability.
Distributed Concurrency Control

• Based on voting
– To lock a data item:
• Send a message to all nodes that maintain a
replica of this item.
• If a node can safely lock the item, then vote
"Yes", otherwise, vote "No".
• If a majority of participating nodes vote "Yes"
then the lock is granted.
• Send the results of the vote back out to all
participating sites.
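
A minimal sketch of the majority rule used by voting-based locking; vote collection from the replica sites is simulated by a plain list of booleans.

# Illustrative voting-based lock request: grant iff a majority of replica
# sites vote "Yes".
def lock_granted(votes):
    # votes: list of booleans, one per node holding a replica of the item.
    yes = sum(1 for v in votes if v)
    return yes > len(votes) // 2

print(lock_granted([True, True, False]))         # 2 of 3 -> granted
print(lock_granted([True, False, False, True]))  # 2 of 4 is not a majority -> denied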
Normal Execution and Commit Protocols

• Commit protocols are used to ensure


atomicity across sites
– a transaction which executes at multiple sites must
either be committed at all the sites, or aborted at
all the sites.
– not acceptable to have a transaction committed at
one site and aborted at another
• The two-phase commit (2PC) protocol is
widely used
• The three-phase commit (3PC) protocol is
more complicated and more expensive, but
avoids some drawbacks of two-phase commit
protocol. This protocol is not used in practice.
Two-Phase Commit (2PC)

• Site at which transaction originates is


coordinator; other sites at which it executes
are subordinates.
• When a transaction wants to commit:
– Coordinator sends prepare msg to each
subordinate.
– Subordinate force-writes an abort or prepare log
record and then sends a no or yes msg to
coordinator.
– If coordinator gets unanimous yes votes, force-
writes a commit log record and sends commit
msg to all subs. Else, force-writes abort log rec,
and sends abort msg.
– Subordinates force-write an abort/commit log rec
based on the msg they get, then send an ack msg to
the coordinator.
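
A condensed, illustrative sketch of the coordinator’s side of 2PC under the happy path (no failures, no timeouts); force-writes are stood in for by list appends and messaging by a callback, both hypothetical.

# Illustrative 2PC coordinator (no networking, no failures): collect votes,
# force-write the decision, then tell the subordinates.
def two_phase_commit(subordinate_votes, coordinator_log, send):
    # Phase 1: "prepare" has been sent; subordinate_votes maps site -> "yes"/"no".
    if all(v == "yes" for v in subordinate_votes.values()):
        coordinator_log.append("commit")          # force-write commit log record
        decision = "commit"
    else:
        coordinator_log.append("abort")           # force-write abort log record
        decision = "abort"
    for site in subordinate_votes:                # Phase 2: send decision to subs
        send(site, decision)
    return decision

log, sent = [], []
d = two_phase_commit({"S1": "yes", "S2": "yes"}, log, lambda s, m: sent.append((s, m)))
print(d, log, sent)    # commit ['commit'] [('S1', 'commit'), ('S2', 'commit')]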
Two-Phase Commit (2PC)

• Two rounds of communication: first,


voting; then, termination. Both initiated by
coordinator.
• Any site can decide to abort an
transaction.
• Every message reflects a decision by the
sender; to ensure that this decision
survives failures, it is first recorded in the
local log.
• All commit protocol log records for a
transaction contain Transaction_id and
Coordinator_id. The coordinator’s abort/commit
record also includes the ids of all subordinates.
Handling of Failures - Site Failure

When site Sk recovers, it examines its log to determine
the fate of
transactions active at the time of the failure.
• Log contains <commit T> record: site executes redo
(T)
• Log contains <abort T> record: site executes undo
(T)
• Log contains <ready T> record: site must consult Ci
to determine the fate of T.
– If T committed, redo (T)
– If T aborted, undo (T)
• The log contains no control records concerning T: this
implies that Sk failed before responding to the
prepare T message from Ci.
– Since the failure of Sk precludes the sending of such a
response, Ci must have aborted T, so Sk executes undo (T).
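
A small sketch of the restart decision a participant site makes from its own log, following the cases above; the log encoding and the coordinator lookup are simulated for illustration.

# Illustrative restart logic for a participant site Sk after a failure.
def recovery_action(log_records, ask_coordinator):
    # log_records: e.g. ["<ready T>", "<commit T>"]; ask_coordinator() returns
    # the coordinator's decision ("commit" or "abort") for in-doubt transactions.
    if "<commit T>" in log_records:
        return "redo(T)"
    if "<abort T>" in log_records:
        return "undo(T)"
    if "<ready T>" in log_records:
        return "redo(T)" if ask_coordinator() == "commit" else "undo(T)"
    # No control records: Sk failed before voting, so T cannot have committed.
    return "undo(T)"

print(recovery_action(["<ready T>", "<commit T>"], lambda: "commit"))  # redo(T)
print(recovery_action(["<ready T>"], lambda: "abort"))                 # undo(T)
print(recovery_action([], lambda: "commit"))                           # undo(T)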
Handling of Failures- Coordinator Failure

• If coordinator fails while the commit protocol for T is


executing then participating sites must decide on T’s
fate:
★ If an active site contains a <commit T> record in its log,
then T must be committed.
★ If an active site contains an <abort T> record in its log, then
T must be aborted.
★ If some active participating site does not contain a <ready
T> record in its log, then the failed coordinator Ci cannot
have decided to commit T. Can therefore abort T.
★ If none of the above cases holds, then all active sites must
have a <ready T> record in their logs, but no additional
control records (such as <abort T> or <commit T>). In this
case active sites must wait for Ci to recover, to find decision.
• Blocking problem : active sites may have to wait for
failed coordinator to recover.
Handling of Failures - Network Partition

• If the coordinator and all its participants


remain in one partition, the failure has no
effect on the commit protocol.
• If the coordinator and its participants belong
to several partitions:
– Sites that are not in the partition containing the
coordinator think the coordinator has failed, and
execute the protocol to deal with failure of the
coordinator.
• No harm results, but sites may still have to wait for
decision from coordinator.
• The coordinator and the sites that are in the same
partition as the coordinator think that the
sites in the other partition have failed, and
follow the usual commit protocol.
• Again, no harm results
Recovery and Concurrency Control

• In-doubt transactions have a <ready T>, but


neither a
<commit T>, nor an <abort T> log record.
• The recovering site must determine the commit-abort
status of such transactions by contacting other sites;
this can be slow and can potentially block recovery.
• Recovery algorithms can note lock information in the
log.
– Instead of <ready T>, write out <ready T, L>, where L = list of
locks held by T when the log record is written (read locks can be
omitted).
– For every in-doubt transaction T, all the locks noted in the
<ready T, L> log record are reacquired.
• After lock reacquisition, transaction processing can
resume; the commit or rollback of in-doubt
transactions is performed concurrently with the
execution of new transactions.
Restart after a Failure

• If we have a commit or abort log record for


transaction T, but not an end record, must
redo/undo T.
– If this site is the coordinator for T, keep sending
commit/abort msgs to subs until acks received.
• If we have a prepare log record for transaction
T, but not commit/abort, this site is a
subordinate for T.
– Repeatedly contact the coordinator to find status
of T, then write commit/abort log record;
redo/undo T; and write end log record.
• If we don’t have even a prepare log record for
T, unilaterally abort and undo T.
– This site may be coordinator! If so, subs may send
msgs.
Observations on 2PC

• Ack msgs used to let coordinator know


when it can “forget” a transaction;
until it receives all acks, it must keep T
in the transaction Table.
• If coordinator fails after sending prepare
msgs but before writing commit/abort
log records, when it comes back up it
aborts the transaction .
• If a sub-transaction does no updates, its
commit or abort status is irrelevant.
2PC with Presumed Abort

• When coordinator aborts T, it undoes T and


removes it from the transaction Table
immediately.
– Doesn’t wait for acks; “presumes abort” if
transaction not in transaction Table. Names of subs
not recorded in abort log rec.
• Subordinates do not send acks on abort.
• If a sub-transaction does not do updates, it
responds to the prepare msg with reader instead
of yes/no.
• Coordinator subsequently ignores readers.
• If all sub-transactions are readers, the 2nd phase is
not needed.
Three Phase Commit (3PC)
• Assumptions:
– No network partitioning
– At any point, at least one site must be up.
– At most K sites (participants as well as
coordinator) can fail
• Phase 1: Obtaining Preliminary
Decision: Identical to 2PC Phase 1.
– Every site is ready to commit if instructed
to do so
Three-Phase Commit (3PC)

• Phase 2 of 2PC is split into 2 phases, Phase 2 and


Phase 3 of 3PC
– In phase 2 coordinator makes a decision as in 2PC (called
the pre-commit decision) and records it in multiple (at
least K) sites
– In phase 3, coordinator sends commit/abort message to all
participating sites,
• Under 3PC, knowledge of pre-commit decision
can be used to commit despite coordinator failure
– Avoids blocking problem as long as < K sites fail
• Drawbacks:
– higher overheads
– assumptions may not be satisfied in practice
