2006
DISTRIBUTED DATABASE
(SEMESTER—7)
ii) Which of the following computing models is used by a distributed database system?
a) Mainframe computing model
b) Disconnected, personal computing model
c) Client/server computing model
d) None of these. c
iii) Which of the following refers to the operation of copying and maintaining database objects in multiple databases
belonging to a distributed system?
a) Backup b) Recovery
c) Replication d) None of these. c
iv) Which of the following refers to the fact that the command used to perform a task is independent of both the
location of the data and the location of the system where the command is issued?
a) Naming transparency
b) Location transparency
c) Fragmentation transparency
d) All of these. b
v) Which of the following is the probability that the system is up and running at a certain point in time?
a) Reliability b) Availability
c) Maintainability d) None of these. b
vii) Which of the following is the recovery management technique used in the entire distributed system?
a) Deferred update b) Immediate update
c) Two-phase commit d) None of these. c
x) Which of the following strategies is designed to ensure that either all the databases are updated or none of them
are?
a) Two-phase commit b) Two-phase locking
c) Two-phase update d) None of these. a
Group—B
(Short Answer Questions)
Ans. Query processing and optimization are much more difficult in a distributed environment than in a
centralized one because:
• A large number of parameters affect the performance of distributed queries.
• Relations involved in a distributed query may be fragmented and/or replicated.
• With many sites to access, query response time may become very high.
It is quite evident that the performance of a DBMS is critically dependent upon
the ability of the query optimization algorithm to derive efficient query processing
strategies. DDBS query optimization algorithms therefore attempt to reduce the quantity of data
transferred. Minimizing the quantity of data transferred is a desirable optimization criterion,
since the more data transported across telecommunication networks, the more time and
expense is required (a toy cost comparison is sketched below). Distributed query optimization raises several problems relating to: the cost model,
the large set of candidate queries, the trade-off between optimization cost and execution cost, and
the optimization/reoptimization interval.
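As a toy illustration (not part of the original answer; all sizes here are invented), the following sketch compares the bytes shipped when a whole relation is transferred for a join against a semijoin-based strategy that ships only the join keys and the matching tuples:

# Toy transfer-cost comparison for a distributed join (invented sizes).
size_R = 10_000        # tuples of relation R stored at site 1
size_S = 1_000         # tuples of relation S stored at site 2
match_fraction = 0.1   # fraction of R that actually joins with S
key_size, tuple_size = 4, 100  # bytes per join key / per full tuple

ship_whole_R = size_R * tuple_size
# Semijoin: send the join keys of S to site 1, then ship only matching R tuples.
semijoin = size_S * key_size + size_R * match_fraction * tuple_size

print(f"ship R entirely : {ship_whole_R:>9,} bytes")   # 1,000,000 bytes
print(f"semijoin-reduced: {semijoin:>9,.0f} bytes")    # 104,000 bytes

With these numbers the semijoin strategy moves roughly a tenth of the data, which is why minimizing transfer volume dominates distributed cost models.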
Ans. The client/server model has several benefits:
• It is possible to repair, replace, upgrade, or even relocate a server while its clients remain
both unaware and unaffected.
• Since data storage is centralized, updates to the data are far easier to administer.
• Many mature client/server technologies are already available, designed to ensure
security, user-friendliness of the interface, and ease of use.
• It functions with multiple clients with different capabilities.
7.
a. Simplify the following query using idempotency rules:-
SELECT ENO
FROM ASG
WHERE (NOT (TITLE = "programmer")
AND (TITLE = "programmer"
OR TITLE = "elect.engg")
AND NOT (TITLE = "elect.engg"))
OR ENAME = "J.Das"
ANS:-
Let P1 be <TITLE = "programmer">,
P2 be <TITLE = "elect.engg">,
P3 be <ENAME = "J.Das">.
Then the query qualification becomes (¬P1 ∧ (P1 ∨ P2) ∧ ¬P2) ∨ P3.
Its disjunctive normal form is (¬P1 ∧ ((P1 ∧ ¬P2) ∨ (P2 ∧ ¬P2))) ∨ P3,
obtained by applying the distributivity rule P1 ∧ (P2 ∨ P3) ⇔ (P1 ∧ P2) ∨ (P1 ∧ P3).
Then apply the associativity rule:-
P1 ∧ (P2 ∧ P3) ⇔ (P1 ∧ P2) ∧ P3
The reduced query is then
(¬P1 ∧ P1 ∧ ¬P2) ∨ (¬P1 ∧ P2 ∧ ¬P2) ∨ P3
⇒ (false ∧ ¬P2) ∨ (¬P1 ∧ false) ∨ P3
⇒ (false ∨ false) ∨ P3
⇒ P3
Therefore the simplified query reduces to P3, i.e., ENAME = "J.Das".
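The simplification can be sanity-checked mechanically. A minimal sketch using sympy's boolean algebra (the script itself is not part of the original answer):

# Verify the simplification with sympy's boolean simplifier.
from sympy import symbols
from sympy.logic.boolalg import Or, And, Not, simplify_logic

P1, P2, P3 = symbols('P1 P2 P3')  # TITLE="programmer", TITLE="elect.engg", ENAME="J.Das"

qualification = Or(And(Not(P1), Or(P1, P2), Not(P2)), P3)
print(simplify_logic(qualification))  # prints: P3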
b) Describe the two-phase commit protocol. What are the demerits of this protocol?
Ans:
We list below all the test cases for the two-phase commit protocol and the one-phase
commit protocol. To execute and test these cases, the participating sites only need to be run
once; the CoordinateSite is then run nine times, once for each test case. A simulation
function written in the participating site takes care of the different test cases automatically:
in that function we change the state of the transaction to precommit or abort, and set the flag
for deferred constraint checks to pass or fail. Based on the requirements of each test case,
different values are set for these variables, and the protocol then shows the corresponding
behaviour. Correctness can be verified from the logs and the debug messages left for
demonstration.
1. The transaction executes successfully, reaches the precommit state, and votes YES. No
delay takes place, so no site fails. The output is a PREPARE T and a COMMIT record in
the coordinator site's log file, and READY and COMMIT messages in the logs of the
participating sites.
2. The transaction executes successfully, reaches the precommit state, and votes YES. During
precommit a delay occurs at a participating site, so that site fails. The output
is a PREPARE T and an ABORT record in the coordinator site's log file, and READY and
ABORT messages in the logs of the participating sites.
3. The transaction executes successfully, reaches the precommit state, and votes YES. During
commit a delay occurs, so a participating site fails. The output is a PREPARE T and a
COMMIT record in the coordinator site's log file, and READY and COMMIT messages in
the logs of the participating sites, except for the sites where the commit was delayed.
4. The transaction executes successfully and reaches the precommit state, but the deferred
constraints fail and it votes NO. No delay takes place, so no site fails. The output is a
PREPARE T record in the coordinator site's log file, and READY and ABORT messages
in the logs of the participating sites.
5. The transaction executes successfully and reaches the precommit state. The deferred
constraints fail and the site votes NO. During precommit a delay occurs at a participating
site, so that site fails. The output is a PREPARE T record in the coordinator site's log file,
and READY and ABORT messages in the logs of the participating sites.
6. The transaction executes successfully and reaches the precommit state. The deferred
constraints fail and the site votes NO. During abort a delay occurs, so a participating site
fails. The output is a PREPARE T record in the coordinator site's log file, and READY and
ABORT messages in the logs of the participating sites, except for the sites where the abort
was delayed.
7. The transaction fails and goes to the abort state; the sites vote NO. No delay takes place,
so no site fails. The output is a PREPARE T record in the coordinator site's log file, and
READY and ABORT messages in the logs of the participating sites.
8. The transaction fails and goes to the abort state; the sites vote NO. During precommit a
delay occurs at a participating site, so that site fails. The output is a PREPARE T record in
the coordinator site's log file, and READY and ABORT messages in the logs of the
participating sites.
9. The transaction fails and goes to the abort state and votes NO. During abort a delay occurs,
so a participating site fails. The output is a PREPARE T record in the coordinator site's log
file, and READY and ABORT messages in the logs of the participating sites.
2PC is the most widely used commit protocol in distributed systems. It proceeds in two phases:
Voting, to ensure that all sites are ready to commit;
Decision, to ensure a uniform commit or abort at all sites.
A minimal sketch of this decision logic appears after the lists below.
1. Advantages
• It ensures atomicity even in the presence of deferred constraints.
• It allows each site to recover using its own log.
• Since it takes place in two phases, it can handle network failures and disconnections and
still assure atomicity in their presence.
2. Disadvantages
• It involves a great deal of message complexity.
• It has greater communication overhead compared with simple optimistic protocols.
• Site nodes block if the coordinator fails.
• It requires multiple forced log writes, which increase latency.
• Its performance is again a trade-off, especially for short-lived transactions such as Internet
applications.
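The following is a minimal, single-process sketch of the 2PC decision logic described above, with in-memory "participants" standing in for real network sites; the class and function names are illustrative and not taken from the simulation mentioned earlier:

class Participant:
    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit  # result of local execution / deferred checks
        self.log = []

    def vote(self):
        # Phase 1 (voting): write READY and vote YES only if locally committable.
        if self.can_commit:
            self.log.append("READY")
            return "YES"
        self.log.append("ABORT")
        return "NO"

    def decide(self, decision):
        # Phase 2 (decision): obey the coordinator's global decision.
        self.log.append(decision)

def run_two_phase_commit(participants):
    coordinator_log = ["PREPARE T"]            # coordinator force-writes PREPARE
    votes = [p.vote() for p in participants]   # collect votes from all sites
    decision = "COMMIT" if all(v == "YES" for v in votes) else "ABORT"
    coordinator_log.append(decision)           # force-write the global decision
    for p in participants:
        if "ABORT" not in p.log:               # sites that voted NO already aborted
            p.decide(decision)
    return coordinator_log

# Test case 1 above: all sites vote YES, so everyone commits.
sites = [Participant("S1", True), Participant("S2", True)]
print(run_two_phase_commit(sites))  # ['PREPARE T', 'COMMIT']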
8.
a) Describe how deadlock is detected in a distributed system.
b) What is a false deadlock?
c) What is the difference between reliability and availability?
Ans
a)
An algorithm for detecting deadlocks in a distributed system was proposed by Chandy, Misra, and
Haas in 1983. It allows processes to request multiple resources at once (this speeds up the
growing phase). Some processes may wait for resources, either local or remote. Cross-machine arcs
make looking for cycles (detecting deadlock) hard.
The algorithm works this way: when a process has to wait for a resource, a probe message is sent
to the process holding the resource. The probe message contains three components: the process that
just blocked, the process sending the message, and the destination. Initially, the first two
components will be the same. When a process receives the probe, if it is itself waiting on a
resource, it updates the sender and destination fields of the message and forwards it to the resource
holder; if it is waiting on multiple resources, a message is sent to each process holding one of them.
This continues as long as processes are waiting for resources. If the originator gets a message
and sees its own process number in the blocked field, it knows that a cycle exists and hence a
deadlock. In this case some process (transaction) will have to die. The sender may
choose to commit suicide, or a ring election algorithm may be used to determine an alternate victim.
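A minimal sketch of this probe propagation, assuming the wait-for relationships are available locally as a simple dictionary (the names and data here are invented for illustration):

# wait_for[p] = list of processes that p is currently waiting on
wait_for = {1: [2], 2: [3], 3: [1], 4: []}

def detect_deadlock(initiator, wait_for):
    """Return True if a probe started by `initiator` comes back to it."""
    # Each probe is (blocked, sender, receiver); message passing is simulated
    # with a simple work queue instead of a real network.
    queue = [(initiator, initiator, dest) for dest in wait_for[initiator]]
    seen = set()
    while queue:
        blocked, sender, receiver = queue.pop()
        if receiver == blocked:
            return True                      # probe returned: cycle -> deadlock
        if (sender, receiver) in seen:
            continue                         # don't forward the same edge twice
        seen.add((sender, receiver))
        for nxt in wait_for[receiver]:       # forward the probe to every holder
            queue.append((blocked, receiver, nxt))
    return False

print(detect_deadlock(1, wait_for))  # True: 1 -> 2 -> 3 -> 1
print(detect_deadlock(4, wait_for))  # False: process 4 waits on nothing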
b)
A false deadlock is one that the detector reports even though no deadlock actually exists, typically
because a message announcing the release of a resource arrives after the messages that led the
detector to suspect a cycle. One possible way to prevent false deadlocks is to use Lamport's
algorithm to provide global timing for the distributed system. When the coordinator gets a
message that leads it to suspect a deadlock, it sends everybody a message saying: "I just received a
message with timestamp T which leads to deadlock. If anyone has a message for me with an earlier
timestamp, please send it immediately." When every machine has replied, positively or negatively,
the coordinator can see whether the deadlock has really occurred or not.
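A minimal sketch of the Lamport clock bookkeeping this scheme relies on (illustrative names only; this is not the full coordinator logic):

class LamportClock:
    def __init__(self):
        self.time = 0
    def tick(self):                    # local event: advance the clock
        self.time += 1
        return self.time
    def update(self, received_ts):     # on message receipt: jump past the sender
        self.time = max(self.time, received_ts) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t_release = a.tick()          # site A releases a resource at logical time 1
t_wait = b.update(t_release)  # site B's later wait message is ordered after it
print(t_release < t_wait)     # True: the release happens-before the suspect wait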
c)
Reliability is the probability that a system has been up continuously throughout a given time
interval, whereas availability, as stated earlier, is the probability that the system is capable of
conducting its required function at the point in time it is called upon, given that it is not failed or
undergoing a repair action. Availability is therefore a function not only of reliability but also of
maintainability, where an increase in maintainability implies a decrease in the time it takes to
perform maintenance actions.
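As a standard illustration (not from the original answer), steady-state availability can be written in terms of the mean time between failures, a reliability measure, and the mean time to repair, a maintainability measure:

A = MTBF / (MTBF + MTTR)

Higher reliability (a larger MTBF) and better maintainability (a smaller MTTR) both push availability toward 1.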
9. Short notes:-
a. Primary copy locking.
b. Fragmentation transparency.
c. Network partitioning.
d. Pessimistic approach of concurrency control algorithm.
e. Checkpoints.
Ans:-
a. Primary copy locking:-
Primary copy locking is one type of pessimistic concurrency control algorithm. In primary copy locking, one of the
copies (if there are multiple copies) of each lock unit is designated as the primary copy, and it is this copy that has to
be locked for the purpose of accessing that particular unit.
For example, if lock unit x is replicated at sites 1, 2, and 3, one of these sites (say, 1) is selected as the
primary site for x. All transactions desiring access to x obtain their lock at site 1 before they can access a copy of x
(a minimal sketch follows). If the database is not replicated (i.e., there is only one copy of each lock unit), the primary
copy locking mechanism distributes the lock management responsibility among a number of sites.
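A minimal sketch of this scheme, assuming an in-memory lock table per site (all names here are illustrative):

primary_site = {"x": 1, "y": 2}       # lock unit -> id of its primary site
lock_table = {1: {}, 2: {}}           # per-site lock tables: item -> holder

def acquire(txn, item):
    site = primary_site[item]          # locks are always requested at the primary
    holder = lock_table[site].get(item)
    if holder is None or holder == txn:
        lock_table[site][item] = txn
        return True
    return False                       # conflicting lock: wait or restart

print(acquire("T1", "x"))  # True : T1 locks x at its primary, site 1
print(acquire("T2", "x"))  # False: T2 must wait; x's primary copy is locked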
b. Fragmentation transparency:-
The final form of transparency that needs to be addressed within the context of a distributed database system is
fragmentation transparency, the highest degree of transparency. It consists of the fact that the user or the application
programmer works on global relations.
It is commonly desirable to divide each database relation into smaller fragments and treat each fragment as a
separate database object (i.e., another relation). This is commonly done for reasons of performance, availability,
and reliability. Furthermore, fragmentation can reduce the negative effects of replication: each replica is not the full
relation but only a subset of it; thus less space is required and fewer data items need to be managed.
c. Network partitioning:-
Network partitions are due to communication line failures and may cause the loss of messages, depending on the
implementation of the communication subnet. A partitioning is called a simple partitioning if the network is divided
into only two components; otherwise, it is called a multiple partitioning.
The termination protocols for network partitioning address the termination of the transactions that were active
in each partition at the time of partitioning. If one could develop non-blocking protocols to terminate these transactions,
it would be possible for the sites in each partition to reach a termination decision (for a given transaction) consistent
with the sites in the other partitions, which would imply that the sites in each partition could continue executing
transactions despite the partitioning. Unfortunately, it is not in general possible to find non-blocking termination
protocols in the presence of network partitions. This means that if network partitioning occurs, we cannot continue
normal operations in all partitions, which limits the availability of the entire distributed database system. However, it
is possible to design non-blocking atomic commit protocols that are resilient to simple partitions; if multiple partitions
occur, it is not possible to design such protocols.
d. Pessimistic approach of concurrency control algorithm:-
Pessimistic algorithms synchronize the concurrent execution of transactions early in their execution life cycle. They
are classified into locking-based algorithms (centralized, primary copy, or distributed), timestamp-ordering algorithms
(basic, multiversion, or conservative) & hybrid algorithms:
1. Locking:-
In the locking-based approach, the synchronization of transactions is achieved by employing physical or logical locks on
some portion of the database.
i. Centralized:-
One of the sites in the network is designated as the primary site, where the lock table for the entire database is stored, & it
is charged with the responsibility of granting locks to transactions.
ii. Primary copy:-
One of the copies of each lock unit is designated as the primary copy & it is this copy that has to be locked for the
purpose of accessing that particular unit.
iii. Distributed:-
The lock management duty is shared by all the sites of the network. The execution of a transaction involves the
participation & coordination of schedulers at more than one site. Each lock scheduler is responsible for the lock units
local to that site.
2. Timestamp ordering:-
i. Basic:-
The coordinating transaction manager assigns a timestamp to each transaction, determines the sites where each data
item is stored, & sends the relevant operations to those sites (a minimal sketch of the basic rule follows the hybrid note
below).
ii. Multiversion:-
An update doesn't modify the database; each write operation creates a new version of the data item, & each version is
marked with the timestamp of the transaction that creates it.
iii. Conservative:-
The operations of each transaction are buffered until an ordering can be established so that rejections aren't possible,
& they are executed in that order.
3. Hybrid:-
In some locking-based algorithms, timestamps are also used. This is done primarily to improve efficiency & the level
of concurrency. These algorithms are called hybrid.
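A minimal sketch of the basic timestamp-ordering rule for a single data item (single-process bookkeeping; all names are illustrative only):

class DataItem:
    def __init__(self):
        self.read_ts = 0   # largest timestamp of any transaction that read it
        self.write_ts = 0  # largest timestamp of any transaction that wrote it

def try_read(item, ts):
    # Reject if a younger transaction (larger timestamp) already wrote the item.
    if ts < item.write_ts:
        return False                     # restart the transaction with a new timestamp
    item.read_ts = max(item.read_ts, ts)
    return True

def try_write(item, ts):
    # Reject if a younger transaction already read or wrote the item.
    if ts < item.read_ts or ts < item.write_ts:
        return False
    item.write_ts = ts
    return True

x = DataItem()
print(try_write(x, 5))  # True : first write, write_ts becomes 5
print(try_read(x, 3))   # False: transaction 3 is older than the last writer (5)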
e. Checkpoints:-
In most local reliability protocols, executing the recovery actions requires searching the entire log. This is a
significant overhead, because the LRM is trying to find all the transactions that need to be undone & redone. The
overhead can be reduced if it is possible to build a wall which signifies that the database at that point is up-to-date &
consistent. In that case, the redo has to start from that point on & the undo only has to go back to that point. This
process of building the wall is called checkpointing.
Checkpointing is achieved in three steps:-
1. Write a begin_checkpoint record into the log.
2. Collect the checkpoint data into stable storage.
3. Write an end_checkpoint record into the log.
The 1st & 3rd steps enforce the atomicity of the checkpointing operation. If a system failure occurs during checkpointing,
the recovery process will not find an end_checkpoint record & will consider the checkpoint not completed.
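A minimal sketch of how a completed checkpoint bounds the recovery scan; the log record format here is invented purely for illustration:

# A toy log: recovery only needs to scan back to the last *completed* checkpoint.
log = [
    ("begin", "T1"), ("write", "T1", "x"),
    ("begin_checkpoint",), ("active", ["T1"]), ("end_checkpoint",),
    ("write", "T1", "y"), ("begin", "T2"), ("commit", "T1"),
]

def find_recovery_start(log):
    """Scan backwards for the last checkpoint that has an end_checkpoint record."""
    last_end = None
    for i in range(len(log) - 1, -1, -1):
        if log[i][0] == "end_checkpoint" and last_end is None:
            last_end = i
        if log[i][0] == "begin_checkpoint" and last_end is not None:
            return i   # redo starts here; earlier records need no redo
    return 0           # no complete checkpoint: scan the entire log

print(find_recovery_start(log))  # 2: index of the begin_checkpoint record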
Operation:
1. The user requests data from the local host.
2. The request goes out over the network to submit the request for data or service to a remote host.
3. The remote host processes the request and sends the data or the results back to the local host.
4. The local host hands the reply to the client, which is unaware that the request was executed by multiple servers
(a minimal sketch of this request/reply flow follows).
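A minimal local sketch of this request/reply flow using Python sockets; the host, port, and "query" are invented for illustration:

import socket, threading

# Bind and listen before starting the client, so the connect cannot race ahead.
srv = socket.socket()
srv.bind(("127.0.0.1", 50007))
srv.listen(1)

def remote_host():
    conn, _ = srv.accept()
    request = conn.recv(1024).decode()                 # 2. request arrives over the network
    conn.sendall(("result for: " + request).encode())  # 3. process and send the reply back
    conn.close()
    srv.close()

threading.Thread(target=remote_host).start()

cli = socket.socket()
cli.connect(("127.0.0.1", 50007))
cli.sendall(b"SELECT * FROM EMP")   # 1. local host submits the request
print(cli.recv(1024).decode())      # 4. reply is handed back to the client
cli.close()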
Benefits:
1. Placement of data closer to its source.
2. Automatic movement of data to where it is most needed.
3. Placement of data closer to the users (through replication).
4. Higher data availability through data replication.
5. Higher fault tolerance through elimination of a single point of failure.
6. Potentially more efficient data access (higher throughput and greater potential for parallelism).
7. Better scalability with respect to the application and users' needs.
In computer networking and databases, the three-phase commit protocol (3PC) is a distributed
algorithm which lets all nodes in a distributed system agree to commit a transaction. Unlike the two-phase
commit protocol (2PC), however, 3PC is non-blocking. Specifically, 3PC places an upper bound on the
amount of time required before a transaction either commits or aborts. This property ensures that if a given
transaction is attempting to commit via 3PC and holds some resource locks, it will release the locks after
the timeout. The basic observation is that in 2PC, while one site is in the "prepared to commit" state,
another may be in either the "commit" or the "abort" state. From this analysis, 3PC is designed to avoid
such states, and it is thus resilient to such failures.
Protocol description:
In describing the protocol, we use terminology similar to that used in the two-phase commit protocol. Thus
we have a single coordinator site leading the transaction and a set of one or more cohorts being directed
by the coordinator.
Coordinator:
1. The coordinator receives a transaction request. If there is a failure at this point, the coordinator aborts the
transaction (i.e. upon recovery, it will consider the transaction aborted). Otherwise, the coordinator sends
a canCommit? message to the cohorts and moves to the waiting state.
2. If there is a failure, timeout, or if the coordinator receives a No message in the waiting state, the coordinator
aborts the transaction and sends an abort message to all cohorts. Otherwise the coordinator will
receive Yes messages from all cohorts within the time window, so it sends preCommit messages to all cohorts
and moves to the prepared state.
3. If the coordinator fails in the prepared state, it will move to the commit state. However if the coordinator
times out while waiting for an acknowledgement from a cohort, it will abort the transaction. In the case where
all acknowledgements are received, the coordinator moves to the commit state as well.
Cohort:
1. The cohort receives a canCommit? message from the coordinator. If the cohort agrees it sends
a Yes message to the coordinator and moves to the prepared state. Otherwise it sends
a No message and aborts. If there is a failure, it moves to the abort state.
2. In the prepared state, if the cohort receives an abort message from the coordinator, fails, or times
out waiting for a commit, it aborts. If the cohort receives a preCommit message, it sends
an ACK message back and commits.
Disadvantages:
The main disadvantage of this algorithm is that it cannot recover in the event the network is segmented in
any manner. The original 3PC algorithm assumes a fail-stop model, where processes fail by crashing and
crashes can be accurately detected, and it does not work with network partitions or asynchronous
communication.
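A minimal sketch of the coordinator's state transitions described above, with failures and timeouts modeled as simple flags (illustrative only, not a full implementation):

def coordinator_3pc(votes, fail_in_prepared=False, ack_timeout=False):
    # Steps 1-2: the coordinator is in the waiting state after sending canCommit?.
    if not votes or "No" in votes:
        return "abort"                 # any No vote (or a timeout) -> global abort
    # All Yes: send preCommit to every cohort and enter the prepared state.
    if fail_in_prepared:
        return "commit"                # step 3: coordinator failure in prepared -> commit
    if ack_timeout:
        return "abort"                 # timed out waiting for a cohort's ACK -> abort
    return "commit"                    # all ACKs received -> global commit

print(coordinator_3pc(["Yes", "Yes"]))             # commit
print(coordinator_3pc(["Yes", "No"]))              # abort
print(coordinator_3pc(["Yes"], ack_timeout=True))  # abort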
11. Write the advantages of Distributed DBMS over Centralized DBMS. Give an example of a Bank application,
accessing a database which is distributed over the branches of the bank, in which the relevant predicates for data
distribution are not in the text of the application program. Discuss different levels of distribution transparency.
2. Interconnection of existing databases: Distributed databases are the natural solution when several
databases already exist in the organization and the necessity of performing global application
arises. In this case, the distributed database is created bottom-up from the preexisting local
databases. This process may require a certain degree of local restructuring; however, the effort
which is required by this restructuring is much less than that needed for the creation of a completely
new centralized database.
4. Reduced communication overhead: In a geographically distributed database, the fact that many
applications are local clearly reduces the communication overhead with respect to a centralized
database. Therefore, the maximization of the locality of applications is one of the primary objectives
in distributed database design.