Unit I: Distributed Databases
Centralized vs. distributed database:
o Centralized database: if the central system fails, the entire system is halted.
o Distributed database: if one site fails, the system continues to work with the other sites.
Data Fragmentation
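What is fragmentation?
Fragmentation divides a relation r into several fragments r1, r2, …, rn that together contain enough information to reconstruct the original relation r.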
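Types of Fragmentation
1. Horizontal Fragmentation
Each tuple of r is assigned to one or more fragments, so each fragment holds a subset of the tuples of r. For example, an account relation can be split into:
Fragment 1: r1 = account1
Fragment 2: r2 = account2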
To reconstruct the relation r, take the union of all fragments:
$r = r_1 \cup r_2 \cup \cdots \cup r_n$
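The horizontal split and its reconstruction by union can be sketched in a few lines. This is a minimal illustration, assuming a toy account relation stored as a set of (account_number, branch_name, balance) tuples; the branch names and predicates are hypothetical.

```python
# Minimal sketch of horizontal fragmentation on a toy "account" relation.
# Tuples are (account_number, branch_name, balance); all values are hypothetical.

account = {
    ("A-101", "Hillside", 500),
    ("A-215", "Valleyview", 700),
    ("A-305", "Hillside", 350),
}

def select(relation, predicate):
    """Relational selection: keep only the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

# Each fragment is a selection on the full relation (one per site).
r1 = select(account, lambda t: t[1] == "Hillside")    # stored at site 1
r2 = select(account, lambda t: t[1] == "Valleyview")  # stored at site 2

# Reconstruction: r = r1 U r2
reconstructed = r1 | r2
print(reconstructed == account)  # True
```

The union is lossless here only because the two predicates together cover every tuple; a tuple matching neither predicate would be lost.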
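2. Vertical Fragmentation
The schema R of relation r is split into several smaller schemas, $R = R_1 \cup R_2 \cup \cdots \cup R_n$, and each fragment is the projection $r_i = \Pi_{R_i}(r)$.
To reconstruct r, take the natural join of all fragments:
$r = r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n$
To make this reconstruction possible, include in each Ri:
o the primary key of R, or any superkey of R; or
o a special attribute called a tuple-id (the logical or physical address of the tuple can be used as the tuple-id).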
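Reconstruction through a tuple-id can be sketched as follows. This is a minimal illustration, assuming a toy relation with schema (account_number, branch_name, balance) and using the row position as a hypothetical tuple-id; the function name is also hypothetical.

```python
# Minimal sketch of vertical fragmentation with a tuple-id.
# Schema R = (account_number, branch_name, balance); values are hypothetical.

account = [
    ("A-101", "Hillside", 500),
    ("A-215", "Valleyview", 700),
]

# Attach a tuple-id (here simply the row position) to every tuple.
with_tid = [(tid,) + row for tid, row in enumerate(account)]

# R1 = (tuple_id, account_number, branch_name), R2 = (tuple_id, balance)
r1 = {(tid, acc, branch) for tid, acc, branch, _ in with_tid}
r2 = {(tid, bal) for tid, _, _, bal in with_tid}

def join_on_tid(left, right):
    """Natural join of two fragments on their shared first column (tuple_id)."""
    right_by_tid = {t[0]: t[1:] for t in right}
    return {l + right_by_tid[l[0]] for l in left if l[0] in right_by_tid}

joined = join_on_tid(r1, r2)
# Drop the tuple-id column to recover the original tuples.
reconstructed = {t[1:] for t in joined}
print(reconstructed == set(account))  # True
```

Because every fragment carries the same tuple-id, the join matches each piece back to exactly one original tuple.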
Advantages of Fragmentation
o Horizontal:
allows parallel processing on fragments of a relation
allows a relation to be split so that tuples are located where they
are most frequently accessed
o Vertical:
allows tuples to be split so that each part of the tuple is stored
where it is most frequently accessed
tuple-id attribute allows efficient joining of vertical fragments
allows parallel processing on a relation
3. Hybrid Fragmentation
Hybrid (mixed) fragmentation is achieved by performing horizontal and vertical partitioning together: each fragment is a group of rows and columns of a relation.
Fragmentation1:
SELECT Emp_Name FROM Employee WHERE Emp_Age < 40
Fragmentation2:
SELECT Emp_Id FROM Employee WHERE Emp_Address = 'Pune' AND Salary < 14000
Example (vertical fragmentation of an Account relation):
Fragmentation1:
SELECT Acc_No FROM Account
Fragmentation2:
SELECT Balance FROM Account
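A hybrid fragment combines a row condition (WHERE) with a column list, and this can be tried directly with Python's built-in sqlite3 module. This is a minimal sketch, assuming a hypothetical Employee table whose column names follow the examples above; the rows are made up for illustration.

```python
import sqlite3

# Hypothetical Employee table; column names follow the examples in the notes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Emp_Id INTEGER PRIMARY KEY, Emp_Name TEXT, "
    "Emp_Age INTEGER, Emp_Address TEXT, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Asha", 35, "Pune", 12000),
        (2, "Ravi", 45, "Pune", 13000),
        (3, "Meera", 29, "Mumbai", 20000),
    ],
)

# Hybrid fragment = horizontal condition (WHERE) + vertical projection (columns).
frag1 = conn.execute(
    "SELECT Emp_Id, Emp_Name FROM Employee WHERE Emp_Age < 40 ORDER BY Emp_Id"
).fetchall()
frag2 = conn.execute(
    "SELECT Emp_Id FROM Employee "
    "WHERE Emp_Address = 'Pune' AND Salary < 14000 ORDER BY Emp_Id"
).fetchall()

print(frag1)  # [(1, 'Asha'), (3, 'Meera')]
print(frag2)  # [(1,), (2,)]
conn.close()
```

The sketch keeps Emp_Id in both fragments so that, as with vertical fragmentation, the pieces can later be joined back on the primary key.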
Transparency:
o Data transparency: the degree to which a system user may remain unaware
of the details of how and where the data items are stored in a
distributed system.
o Data Transparency can take several forms:
Fragmentation transparency
Replication transparency
Location transparency
–Fragmentation Transparency:
•Users are not required to know how a relation has been fragmented.
–Replication Transparency:
•Users see each data object as logically unique, even though, for performance or availability reasons, the same data may be
replicated at different sites. Users need not be concerned with which replicas exist or where they
have been placed.
–Location Transparency:
•Users aren’t required to know the physical location of data. The distributed database
should be able to find any data item as long as its identifier is supplied by the user
transaction.
DATA ITEMS AND NAMING
Data items in distributed databases are relations, fragments and replicas.
These data items must have unique names: in a distributed database
environment we must ensure that two sites do not use the same name for
distinct data items.
One solution to this problem is to register all names with a central name server.
The name server helps ensure that the same name does not get used for different data
items.
–This approach however has several drawbacks:
•First the name server may become a performance bottleneck when data items are
located by their names, resulting in poor performance.
•Second, if the name server crashes, it may not be possible for any site in the
distributed system to continue to run.
A second approach requires each site to prefix its own site identifier to
any name that it generates. This ensures that no two sites generate the
same name.
–This solution, however, fails to achieve location transparency, since site
identifiers are attached to names.
–To address this problem, the database system can create a set of alternative names
or aliases, for data items. A user may hence refer to data items by simple names that
are translated by the system to complete names.
–Moreover, users will be unaffected if the database administrator decides to move a data
item from one site to another.
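Site-prefixed names combined with an alias table can be sketched as below. This is a minimal illustration, assuming hypothetical site identifiers (S1, S2, S3), item names, and a plain dictionary standing in for the system catalog that maps simple names to complete names.

```python
# Minimal sketch: site-prefixed names plus an alias table.
# Site ids, item names, and the dict-based catalog are hypothetical.

class Site:
    def __init__(self, site_id):
        self.site_id = site_id

    def new_name(self, local_name):
        """Generate a globally unique name by prefixing this site's identifier."""
        return f"{self.site_id}.{local_name}"

s1, s2, s3 = Site("S1"), Site("S2"), Site("S3")

# Two sites can both create an item called "emp" without a clash.
name_a = s1.new_name("emp")  # "S1.emp"
name_b = s2.new_name("emp")  # "S2.emp"

# Alias table: simple names seen by users -> complete system names.
aliases = {"emp": name_a, "emp_archive": name_b}

def resolve(simple_name):
    """Translate a user-visible alias into the complete, site-prefixed name."""
    return aliases[simple_name]

# The DBA moves the item behind "emp" to site S3: only the alias entry
# changes, so user queries that say "emp" keep working.
aliases["emp"] = s3.new_name("emp")
print(resolve("emp"))  # S3.emp
```

Users only ever write the simple name, so the site prefix (and thus the location) stays hidden behind the alias lookup.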
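Distributed Transactions
A transaction consists of operations such as read, write and update; in a distributed system its concurrency and atomicity must be maintained across sites.
o Distributed transactions must preserve the ACID properties.
o There are two types of transactions:
Local transactions access and update data only at the site where they are initiated.
Global transactions access and update data at several sites, or at a site other than the one where they are initiated.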
SYSTEM STRUCTURE
o Transaction manager: preserves the ACID properties; responsibilities: maintaining the log and concurrency control.
o Transaction coordinator: manages and coordinates the various transactions; responsibilities: starting transactions, distributing subtransactions, and terminating transactions.
When data is distributed, a single transaction may execute on several nodes; such transactions are called
distributed transactions.
–Ensuring the ACID properties of local transactions is straightforward; however,
achieving the ACID properties for global transactions is complicated, because failures of
sites and communication links are common in a distributed environment.
SYSTEM STRUCTURE
A transaction may access data at several sites, and each site contains two subsystems: a transaction manager and a transaction coordinator.
•Transaction Manager:
Each site has its own local transaction manager, whose function is to ensure the ACID
properties of those transactions that execute at that site.
The various transaction managers cooperate with each other to manage global
transactions.
Each site has a local transaction manager responsible for:
o Maintaining a log for recovery purposes
o Participating in coordinating the concurrent execution of the transactions
executing at that site.
Each site has a transaction coordinator, which is responsible for:
o Starting the execution of transactions that originate at the site.
o Distributing subtransactions to the appropriate sites for execution.
o Coordinating the termination of each transaction that originates at the
site, which may result in the transaction being committed at all sites or
aborted at all sites, as shown in the figure below.
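•In a local database system, to commit a transaction the transaction manager has
only to convey the decision to commit to the recovery manager. In a distributed system, the coordinator Ci must instead run a commit protocol such as two-phase commit (2PC) so that all sites reach the same decision. If a participating site Sk fails during the protocol: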
–If the site fails before responding with a ready T message to Ci, the coordinator
assumes that it responded with an abort T message.
–If the site fails after the coordinator has received the ready T message from the site,
the coordinator executes the rest of commit protocol in the normal fashion, ignoring
the failure of the site.
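The coordinator's side of these two cases can be sketched as a vote-collection step of two-phase commit. This is a minimal illustration, assuming a hypothetical encoding in which each site's reply is "ready", "abort", or None for a site that failed before sending <ready T>; the function name is also hypothetical.

```python
# Minimal sketch: coordinator Ci decides the fate of T after collecting votes.
# Each reply is "ready", "abort", or None (hypothetical encoding for a site
# that failed before sending <ready T>).

def coordinator_decision(replies):
    """Return "commit" only if every participating site replied <ready T>."""
    for reply in replies.values():
        if reply != "ready":
            # An explicit abort, or a site that failed before responding
            # (treated as if it had sent <abort T>).
            return "abort"
    # All sites are ready. A site that fails after this point does not change
    # the decision; it learns the outcome from its log/Ci when it recovers.
    return "commit"


print(coordinator_decision({"S1": "ready", "S2": "ready"}))  # commit
print(coordinator_decision({"S1": "ready", "S2": None}))     # abort
```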
•When site Sk recovers, it examines its log to determine the fate of transactions that were active
at the time of the failure.
•Log contains <commit T> record: site executes redo (T)
•Log contains <abort T> record: site executes undo (T)
•Log contains <ready T> record: site must consult Ci to determine the fate of T.
–If Ci is up, it notifies Sk regarding whether T committed or aborted.
•If T committed, redo (T)
•If T aborted, undo (T)
–If Ci is down, Sk must try to find the fate of T from other sites. It does so by
sending a query status T message to all sites in the system. On receiving such a message,
each site consults its log to determine whether T executed there and, if so, notifies Sk
of the outcome.
–If no site has information regarding T, Sk must wait until some site recovers and
conveys the outcome.
•The log contains no control records concerning T
–implies that Sk failed before responding to the prepare T message from Ci
–Sk must execute undo (T)
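The recovery rules above form a small decision procedure over the site's log. This is a minimal sketch, assuming the log is a list of control-record strings such as "<ready T1>", and that ask_coordinator and ask_other_sites are hypothetical callables returning "commit", "abort", or None when the outcome is unknown or unreachable.

```python
# Minimal sketch of the decision a recovering site Sk makes for transaction T.
# The log format and the two query callables are hypothetical.

def recover(log, T, ask_coordinator, ask_other_sites):
    if f"<commit {T}>" in log:
        return "redo"
    if f"<abort {T}>" in log:
        return "undo"
    if f"<ready {T}>" in log:
        outcome = ask_coordinator(T)
        if outcome is None:
            # Ci is down: send "query status T" to the other sites.
            outcome = ask_other_sites(T)
        if outcome == "commit":
            return "redo"
        if outcome == "abort":
            return "undo"
        return "wait"  # no site knows the outcome yet
    # No control records for T: Sk failed before replying to <prepare T>.
    return "undo"


unknown = lambda T: None
print(recover(["<ready T1>"], "T1", unknown, lambda T: "commit"))  # redo
```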
•If the coordinator fails while the commit protocol for T is executing, then the participating
sites must decide T's fate:
1.If an active site contains a <commit T> record in its log, then T must be committed.
2.If an active site contains an <abort T> record in its log, then T must be aborted.
3.If some active participating site does not contain a <ready T> record in its log, then
the failed coordinator Ci cannot have decided to commit T.
•The active sites can therefore abort T.
4.If none of the above cases holds, then all active sites must have a <ready T> record
in their logs, but no additional control records (such as <abort T> or <commit T>).
•In this case the active sites must wait for Ci to recover to learn the decision.
•Blocking problem: active sites may have to wait for the failed coordinator to recover.
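The four rules can be sketched as a single function over the logs of the active sites. This is a minimal illustration, assuming each active site's log is a list of control-record strings such as "<ready T1>"; the function name and return values are hypothetical.

```python
# Minimal sketch: active participants deciding T's fate when coordinator Ci
# has failed. active_logs holds one list of control records per active site.

def decide_without_coordinator(active_logs, T):
    # Rule 1: some active site has <commit T> -> T must be committed.
    if any(f"<commit {T}>" in log for log in active_logs):
        return "commit"
    # Rule 2: some active site has <abort T> -> T must be aborted.
    if any(f"<abort {T}>" in log for log in active_logs):
        return "abort"
    # Rule 3: some active site lacks <ready T> -> Ci cannot have decided
    # to commit, so T can be aborted.
    if any(f"<ready {T}>" not in log for log in active_logs):
        return "abort"
    # Rule 4: every active site is <ready T> only -> block until Ci recovers.
    return "block"


print(decide_without_coordinator([["<ready T1>"], ["<ready T1>"]], "T1"))  # block
```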
NETWORK PARTITION
•In a distributed environment, network failures are likely, and they can lead
to a situation known as a network partition.
•When a network partitions, two possibilities exist:
–The coordinator and all its participants remain in one partition, so the failure has no
effect on the commit protocol.
–The coordinator and its participants belong to several partitions.
•Sites that are not in the partition containing the coordinator think the coordinator has
failed, and execute the protocol to deal with failure of the coordinator.
•No harm results, but sites may still have to wait for decision from coordinator.
•The coordinator and the sites in the same partition as the coordinator think that
the sites in the other partition have failed, and follow the usual commit protocol.
•Again, no harm results.