Rohini College of Engineering & Technology: Cs3492-Database Management Systems
Data Replication
If relation r is replicated, a copy of relation r is stored in two or more sites. In the most
extreme case, we have full replication, in which a copy is stored in every site in the system.
There are a number of advantages and disadvantages to replication.
Availability. If one of the sites containing relation r fails, then the relation r can be found
in another site. Thus, the system can continue to process queries involving r, despite the
failure of one site.
Increased parallelism. When the majority of accesses to the relation r only read the relation,
several sites can process queries involving r in parallel. The more replicas of r there are, the
greater the chance that the needed data will be found in the site where the transaction is
executing. Hence, data replication minimizes movement of data between sites.
Increased overhead on update. The system must keep all replicas of r consistent, so an update
to r has to be propagated to every site that holds a replica.
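The availability argument can be illustrated with a small sketch. It is only a toy model under stated assumptions: the `Site` class, the `SiteDown` exception, the `read_replicated` helper, and the sample account tuples are all hypothetical names introduced for illustration, not part of any real DBMS.

```python
class SiteDown(Exception):
    """Raised when a (simulated) site cannot be reached."""


class Site:
    """Hypothetical site that stores copies of relations in memory."""

    def __init__(self, name, relations):
        self.name = name
        self.relations = relations  # relation name -> list of tuples
        self.up = True

    def read(self, relation):
        if not self.up:
            raise SiteDown(self.name)
        return self.relations[relation]


def read_replicated(relation, replicas):
    """Return (site name, tuples) from the first reachable replica."""
    for site in replicas:
        try:
            return site.name, site.read(relation)
        except SiteDown:
            continue  # this replica is unavailable, try the next one
    raise SiteDown(f"no replica of {relation!r} is reachable")


# Illustrative sample data; relation r is replicated at sites S1 and S2.
account = [("A-305", "Hillside", 500), ("A-177", "Valleyview", 205)]
s1 = Site("S1", {"account": list(account)})
s2 = Site("S2", {"account": list(account)})

s1.up = False  # simulate the failure of one site
served_by, rows = read_replicated("account", [s1, s2])
print(served_by, len(rows))
```

Because the read falls through to the next replica, the query is still answered after S1 fails. The sketch covers reads only; keeping the copies consistent under updates is the part that makes replication expensive.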
Data Fragmentation
There are two different schemes for fragmenting a relation: horizontal fragmentation and
vertical fragmentation.
Horizontal fragmentation splits the relation by assigning each tuple of r to one or more
fragments.
Vertical fragmentation splits the relation by decomposing the scheme R of relation r.
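A minimal sketch of vertical fragmentation follows. The text does not say how a vertically fragmented relation is put back together; the sketch assumes the usual approach in which every fragment keeps the key attribute (here `account_number`) and the relation is rebuilt with a natural join on that key. The helper names `project` and `natural_join` and the sample tuples are hypothetical.

```python
def project(relation, attributes):
    """Keep only the listed attributes of every tuple."""
    return [{a: t[a] for a in attributes} for t in relation]


def natural_join(left, right, key):
    """Join two fragments on a shared key attribute (assumed unique)."""
    index = {t[key]: t for t in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


# Illustrative sample data.
account = [
    {"account_number": "A-305", "branch_name": "Hillside", "balance": 500},
    {"account_number": "A-177", "branch_name": "Valleyview", "balance": 205},
]

# Decompose the scheme: both fragments retain the key attribute.
frag1 = project(account, ["account_number", "branch_name"])
frag2 = project(account, ["account_number", "balance"])

rebuilt = natural_join(frag1, frag2, "account_number")
print(rebuilt == account)
```

If the fragments did not share a key, the join could not match the halves of each tuple, which is why the key is repeated in every fragment in this sketch.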
In horizontal fragmentation, a relation r is partitioned into a number of subsets, r1,
r2,...,rn. Each tuple of relation r must belong to at least one of the fragments, so that the original
relation can be reconstructed, if needed.
$account_1 = \sigma_{branch\_name = "Hillside"}(account)$
$account_2 = \sigma_{branch\_name = "Valleyview"}(account)$
Horizontal fragmentation is usually used to keep tuples at the sites where they are used the most,
to minimize data transfer.
In general, a horizontal fragment can be defined as a selection on the global relation r.
That is, we use a predicate Pi to construct fragment ri:
$r_i = \sigma_{P_i}(r)$
We reconstruct the relation r by taking the union of all fragments; that is:
$r = r_1 \cup r_2 \cup \cdots \cup r_n$
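The selection and union steps can be sketched in a few lines. This is an illustration under assumptions, not a DBMS implementation: relations are modelled as Python lists of dictionaries, and the helpers `select` and `union` as well as the sample tuples are hypothetical.

```python
def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def union(*fragments):
    """Set union of fragments; a tuple present in several fragments appears once."""
    seen = set()
    result = []
    for fragment in fragments:
        for t in fragment:
            key = tuple(sorted(t.items()))
            if key not in seen:
                seen.add(key)
                result.append(t)
    return result


# Illustrative sample data.
account = [
    {"account_number": "A-305", "branch_name": "Hillside", "balance": 500},
    {"account_number": "A-226", "branch_name": "Hillside", "balance": 336},
    {"account_number": "A-177", "branch_name": "Valleyview", "balance": 205},
]

# One fragment per branch, each defined by a selection predicate.
account1 = select(account, lambda t: t["branch_name"] == "Hillside")
account2 = select(account, lambda t: t["branch_name"] == "Valleyview")

reconstructed = union(account1, account2)
print(len(account1), len(account2), len(reconstructed))
```

Reconstruction only works if the predicates together cover every tuple of the relation. A tuple that satisfies none of them would be lost.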
Transparency
The user of a distributed database system should not be required to know where the data
are physically located nor how the data can be accessed at the specific local site. This
characteristic, called data transparency, can take several forms:
Fragmentation transparency. Users are not required to know how a relation has been
fragmented.
Replication transparency. Users view each data object as logically unique. The distributed
system may replicate an object to increase either system performance or data availability. Users
do not have to be concerned with what data objects have been replicated, or where replicas
have been placed.
Location transparency. Users are not required to know the physical location of the data. The
distributed database system should be able to find any data as long as the data identifier is
supplied by the user transaction.
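One way to picture location transparency is a catalog that maps each data identifier to the sites holding it. The sketch below assumes such a catalog exists; `catalog`, `site_storage`, and `fetch` are hypothetical names, and choosing the first listed replica is just the simplest possible policy.

```python
# Hypothetical catalog: data identifier -> sites that hold a copy.
catalog = {
    "account1": ["S1"],
    "account2": ["S2", "S3"],  # replicated at two sites
}

# Hypothetical per-site storage.
site_storage = {
    "S1": {"account1": [("A-305", "Hillside", 500)]},
    "S2": {"account2": [("A-177", "Valleyview", 205)]},
    "S3": {"account2": [("A-177", "Valleyview", 205)]},
}


def fetch(identifier):
    """Resolve an identifier through the catalog and return its tuples."""
    sites = catalog.get(identifier)
    if not sites:
        raise KeyError(f"unknown data identifier: {identifier}")
    site = sites[0]  # simplest policy: use the first listed replica
    return site_storage[site][identifier]


# The caller supplies only the identifier, never a site name.
print(fetch("account2"))
```

The caller never names a site, so data can be moved or replicated by updating the catalog without changing user transactions.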
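For a transaction that accesses data at several sites, the transaction coordinator at the site
where the transaction originates is responsible for:
• Breaking the transaction into a number of subtransactions and distributing these
subtransactions to the appropriate sites for execution.
• Coordinating the termination of the transaction, which may result in the transaction being
committed at all sites or aborted at all sites.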
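The two responsibilities above can be sketched as a simplified vote-then-decide loop, in the spirit of a two-phase commit. The text here does not name a specific protocol, so this is an assumption; the `Participant` class and `run_transaction` function are hypothetical, and logging, timeouts, and failure handling are deliberately left out.

```python
class Participant:
    """Hypothetical site that executes a subtransaction and then votes."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"
        self.work = []

    def execute(self, operations):
        self.work = list(operations)
        self.state = "ready" if self.can_commit else "failed"
        return self.can_commit  # the site's vote

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.work = []
        self.state = "aborted"


def run_transaction(operations, sites):
    """operations: list of (site name, operation); sites: name -> Participant."""
    # 1. Break the transaction into subtransactions, one per site.
    subtransactions = {}
    for site_name, op in operations:
        subtransactions.setdefault(site_name, []).append(op)

    # 2. Distribute the subtransactions and collect each site's vote.
    involved = [sites[name] for name in subtransactions]
    votes = [site.execute(subtransactions[site.name]) for site in involved]

    # 3. Coordinate termination: commit everywhere or abort everywhere.
    decision = "commit" if all(votes) else "abort"
    for site in involved:
        if decision == "commit":
            site.commit()
        else:
            site.abort()
    return decision


sites = {"S1": Participant("S1"), "S2": Participant("S2", can_commit=False)}
transfer = [("S1", "debit A-305 by 100"), ("S2", "credit A-177 by 100")]
print(run_transaction(transfer, sites))
```

A single negative vote forces an abort at every involved site, so no site can commit while another aborts.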
System Failure Modes
• Failure of a site.
• Loss of messages.
• Failure of a communication link.
• Network partition.