ADBMS Chapter No. 3

The document outlines the curriculum for a course on Advanced Database Management Systems, focusing on Distributed Databases. It covers various topics including types of distributed databases, data storage methods, transaction management, and commit protocols. Key concepts such as homogeneous and heterogeneous databases, replication, fragmentation, and concurrency control are also discussed, along with learning objectives and exam questions related to the subject.

Uploaded by

aaradhnavishwas

MCA Distributed Databases

NOTES:

1. Subject Code: IT-34    Subject Name: Advanced Database Management System

2. Learning Objectives of the Course ADBMS:

To know about different database handling techniques. To gain an awareness of the basic issues in object-oriented data models, learn about Web-DBMS integration technology and XML for Internet database applications, and become familiar with data-warehousing and data-mining techniques and other advanced topics.

3. Unit Name : Distributed Databases

4. Contents of the Unit

3.1 Introduction
3.2 Homogeneous and Heterogeneous Databases
3.3 Distributed data storage
3.4 Distributed transactions
3.5 Commit protocols
3.6 Concurrency control
3.7 Availability
3.8 Cloud based databases
3.9 Directory systems

Learning Objectives of the Unit: To study how commit protocols work in a distributed environment.

5. Key Definitions, Key Words in the definitions

Distributed Database: A distributed database is a collection of multiple logically interrelated databases distributed over a network.

6. Key Concepts

Homogeneous & Heterogeneous Databases, Commit Protocol, Concurrency Control, Cloud Based Databases, Directory Database

Prof. Khandagale S.P. UNIT NO. 3



7. Questions Asked in the University Exam

Q1) Explain various concurrency control approaches in DDBMS. (Nov. 2009 6M, Apr. 2010 6M, Nov. 2012 10M, Apr. 2013 6M)

Q2) Explain the lock manager in distributed databases. (Nov. 2009 6M)

Q3) Explain deadlock handling in DDBMS. (Nov. 2010 6M)

Q4) Compare, with examples, heterogeneous and homogeneous databases. (Apr. 2010 6M, Nov. 2010 6M, Apr. 2013 6M)

Q5) Explain 2PC in detail. (Apr. 2011 10M, Nov. 2011 10M)

Questions For Practice:

Q1) Write short notes on:
1. Cloud Based Databases
2. Directory System

Learning Resources :

Reference Book :

1. Data Mining: Concepts and Techniques, Jiawei Han & Micheline Kamber, Second Edition (Elsevier)

2. Database System Concepts, Abraham Silberschatz, Henry Korth, S. Sudarshan, 6th Edition (McGraw Hill International)

3. Database Systems: Design, Implementation and Management, Rob, Coronel, 4th Edition (Thomson Learning Press)

4. Database Management Systems, Raghu Ramakrishnan, Johannes Gehrke, Second Edition (McGraw Hill International)

Reference Links:
http://www.oracle.com/technetwork/articles/sql/11g-dw-olap-100058.html
http://www.cs.ccsu.edu/~markov/ccsu_courses/DataMining-3.html

Distributed Databases

A distributed database is a collection of multiple logically interrelated databases distributed over a network. A distributed database system consists of loosely coupled sites that share no physical component. The database systems that run at each site are independent of each other, and transactions may access data at one or more sites. Data in a distributed database system is stored across several sites, and each site is typically managed by a DBMS that runs independently of the other sites. A DDBMS presents a single large logical database that is divided into a number of fragments. Applications handled by a DDBMS are often divided into two categories: local applications and global applications.
Users should be able to ask queries without specifying where the referenced relations, or copies or fragments of the relations, are located.
Users should be able to write transactions that access and update data at several sites just as they would write transactions over purely local data. In particular, the effect of a transaction across sites should continue to be atomic.

Types of Distributed Databases:


1. Homogeneous Distributed Databases:
If data is distributed but all servers run the same DBMS software, the system is known as a homogeneous distributed database.
Designing a homogeneous system is simple, and adding a new site is easy.
The sites are aware of each other's existence and cooperate in exchanging information about transactions.
2. Heterogeneous Distributed Databases:
If different sites run under the control of different DBMSs, essentially autonomously, and are connected somehow to enable access to data from multiple sites, the system is known as a heterogeneous distributed database.
Different sites may use different schemas and different DBMS software.
The sites may not be aware of one another, and they may provide only limited facilities for cooperation in transaction processing.
The different schemas are a major problem for query processing, and the different software is an obstacle to processing transactions that span sites.
Manipulating information located in a heterogeneous distributed database requires an additional software layer on top of the existing database systems.
Such a system is therefore also known as a multi-database system. In this type of system each site maintains complete autonomy.
Types of Heterogeneous Databases
– Federated (Single Schema): Each site may run a different database system, but data access is managed through a single conceptual schema. This implies that the degree of local autonomy is minimal. Each site must adhere to a centralized access policy.


– Multidatabase (No Schema): There is no single conceptual global schema. For data access, a schema is constructed dynamically as needed by the application software.
Distributed Database Architecture:
There are three distributed database architectures:
Client-Server System Architecture:
A client-server system has one or more client processes and one or more server processes, and a client process can send a query to any one server process.
Clients are responsible for user-interface issues, and servers manage data and execute transactions.
Thus a client process could run on a personal computer and send queries to a server running on a mainframe.
This architecture has been popular for several reasons.
First, it is relatively simple to implement because of the clean separation of functionality and because the server is centralized.
Second, when a query needs data from several servers, the client is responsible for breaking it into subqueries, having them executed at the appropriate sites, and merging the results.
Third, users can run a GUI that they are familiar with, rather than the user interface on the server.
Collaborating Server System Architecture:
Here we have a collection of database servers. When a server receives a query that requires access to data at other servers:
It generates appropriate subqueries.
These subqueries are executed by the other servers.
The results are put together to form the final result.
Middleware System Architecture:
It is designed to allow a single query to span multiple servers without requiring all database servers to be capable of managing such multi-site execution strategies.
Only one database server needs to be capable of managing queries and transactions spanning multiple servers; the remaining servers need to handle only local queries and transactions.
This special server acts as a layer of software that coordinates the execution of queries and transactions across one or more independent database servers; such software is often called middleware.
It is capable of executing joins and other relational operations on data obtained from the other servers but typically does not itself maintain any data.

Distributed Data Storage:


Consider a relation ‘r’ that is to be stored in the database. There are two approaches to storing this relation in a distributed database.


1. Replication: The system maintains several identical copies (replicas) of the relation and stores each replica at a different site.
The alternative to replication is to store only one copy of relation ‘r’.
2. Fragmentation: The system partitions the relation into several fragments and stores each fragment at a different site.
Fragmentation and replication can be combined: a relation can be partitioned into several fragments, and there may be several replicas of each fragment.
Data Replication:
If a relation ‘r’ is replicated, a copy of ‘r’ is stored at two or more sites. In the most extreme case we have full replication, in which a copy is stored at every site in the system.
Advantages of data replication:
Availability: If one of the sites containing relation ‘r’ fails, ‘r’ can be found at another site. Thus the system can continue to process queries involving ‘r’ despite the failure of one site.
Increased Parallelism:
When the majority of accesses to relation ‘r’ only read the relation, several sites can process queries involving ‘r’ in parallel.
The more replicas of ‘r’ there are, the greater the chance that the needed data will be found at the site where the transaction is executing.
Hence data replication minimizes movement of data between sites.
Disadvantage of data replication:
Increased overhead on update:
The system must ensure that all replicas of a relation ‘r’ are consistent; otherwise erroneous computations may result.
Thus whenever ‘r’ is updated, the update must be propagated to all sites containing replicas.
The result is increased overhead.
For example, in a banking system where account information is replicated at various sites, it is necessary to ensure that the balance in a particular account agrees at all sites.
Replication increases the availability of data to read-only transactions.
But controlling concurrent updates by several transactions to replicated data is more complex than in centralized systems.
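The trade-off described above (cheap reads from any live replica, expensive writes to every replica) can be sketched in a few lines of Python. This is only an illustrative toy model: the class name, the site names and the in-memory dictionaries are assumptions made for the example, and it does not model concurrent updates or writes to failed sites.

```python
class ReplicatedRelation:
    """Toy model of relation r with one full copy per site (hypothetical names)."""

    def __init__(self, sites):
        self.replicas = {site: {} for site in sites}
        self.failed = set()

    def write(self, account, balance):
        # Every replica must receive the update, so the cost grows with
        # the number of replicas. Writes to failed sites are not modelled.
        for copy in self.replicas.values():
            copy[account] = balance
        return len(self.replicas)  # number of per-site updates performed

    def read(self, account, local_site):
        # Prefer the local replica; fall back to any other live site.
        candidates = [local_site] + [s for s in self.replicas if s != local_site]
        for site in candidates:
            if site not in self.failed:
                return site, self.replicas[site][account]
        raise RuntimeError("no live replica of r")


r = ReplicatedRelation(["Pune", "Mumbai", "Nagpur"])
updates = r.write("A-101", 500)      # one update per replica
r.failed.add("Pune")                 # the local site goes down
served_by, balance = r.read("A-101", local_site="Pune")
```

Here a single logical update costs three physical updates, while the read still succeeds after the Pune site fails because another replica answers it.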
Data Fragmentation:
If relation ‘r’ is fragmented, ‘r’ is divided into a number of fragments r1, r2, …, rn.
There are two different schemes for fragmenting a relation: horizontal fragmentation and vertical fragmentation.
We can illustrate fragmentation using the account relation, with schema:
Account-schema = (account-number, branch-name, balance).
Horizontal Fragmentation:


In horizontal fragmentation a relation ‘r’ is partitioned into a number of subsets r1, r2, …, rn. Each tuple of relation ‘r’ must belong to at least one of the fragments, so that the original relation can be reconstructed if needed.
Ex: Tuples of accounts are grouped by branch. If the bank has two branches, Pune and Mumbai, then there are two different fragments:
Account1 = σ branch-name = “Pune” (account)
Account2 = σ branch-name = “Mumbai” (account)
The original relation is reconstructed by taking the union of the fragments: account = Account1 ∪ Account2.
Horizontal fragmentation is usually used to keep tuples at the sites where they are used the most, to minimize data transfer.
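As a minimal sketch of the Pune/Mumbai example, the Python below fragments a small account relation by branch and rebuilds it by union. The sample tuples and the helper names (`select_branch`, `by_number`) are made up for illustration.

```python
account = [
    {"account_number": "A-101", "branch_name": "Pune", "balance": 500},
    {"account_number": "A-215", "branch_name": "Mumbai", "balance": 700},
    {"account_number": "A-305", "branch_name": "Pune", "balance": 350},
]


def select_branch(relation, branch):
    """Selection on branch-name: keep only the tuples of one branch."""
    return [t for t in relation if t["branch_name"] == branch]


def by_number(t):
    return t["account_number"]


account1 = select_branch(account, "Pune")     # fragment stored at the Pune site
account2 = select_branch(account, "Mumbai")   # fragment stored at the Mumbai site

# Union of the fragments gives back the original relation (order aside).
reconstructed = sorted(account1 + account2, key=by_number)
```

Because every tuple has exactly one branch name, each tuple lands in exactly one fragment and the union loses nothing.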
Vertical fragmentation:
In vertical fragmentation, relation ‘r’ is split into several subsets of its attributes, i.e. the columns of the relation are fragmented.
Fragmentation should be done in such a way that we can reconstruct relation r from the fragments by taking their natural join:
r = r1 ⋈ r2 ⋈ r3 ⋈ … ⋈ rn
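The reconstruction by natural join can be illustrated with a small Python sketch. It assumes, as is usual but not stated in these notes, that every vertical fragment repeats the key attribute (here account-number) so that the join can match rows; the sample data and function names are hypothetical.

```python
account = [
    {"account_number": "A-101", "branch_name": "Pune", "balance": 500},
    {"account_number": "A-215", "branch_name": "Mumbai", "balance": 700},
]


def project(relation, attributes):
    """Projection: keep only the listed columns of every tuple."""
    return [{a: t[a] for a in attributes} for t in relation]


def natural_join(r1, r2):
    """Join tuples that agree on all attributes the two fragments share."""
    joined = []
    for t1 in r1:
        for t2 in r2:
            common = set(t1) & set(t2)
            if all(t1[a] == t2[a] for a in common):
                joined.append({**t1, **t2})
    return joined


# Both fragments keep the key so that the join is lossless (assumption).
r1 = project(account, ["account_number", "branch_name"])
r2 = project(account, ["account_number", "balance"])
rebuilt = natural_join(r1, r2)
```

If the key were left out of one fragment, the two fragments would share no attribute and the join would degenerate into a cross product, so the original relation could not be recovered.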
Advantages of fragmentation:
Efficiency - data is stored close to where it is most frequently used.
Parallelism - a transaction can be divided into several subqueries that operate on fragments, increasing the degree of concurrency.
Security - data is more secure because it is stored only where it is needed.
Disadvantages of fragmentation:
Performance - may be slower when an application needs data from fragments stored at different sites.
Integrity - more difficult to enforce when related data is spread over several fragments.
Data Transparency:
The user of a database system should not be required to know either where the data is physically located or how the data can be accessed at a specific local site.
This characteristic is called data transparency. It is of 3 types:
1. Fragmentation Transparency:
Users are not required to know how a relation has been fragmented.
2. Replication Transparency:
Users do not have to be concerned with which data objects have been replicated, or where the replicas have been placed.
3. Location Transparency:
Users are not required to know the physical location of the data.
The distributed database system should be able to find any data as long as the data identifier is supplied by the user transaction.
Distributed Transactions:


A transaction is a collection of operations that form a single logical unit of work.
To ensure data integrity, the DBMS maintains the following transaction properties:
Atomicity:
Either all operations of the transaction are reflected in the database, or none are. This is also known as ‘all or nothing’.
Consistency:
Execution of a transaction in isolation (i.e. with no other transaction executing concurrently) preserves the consistency of the database.
Isolation:
Each transaction is unaware of other transactions executing concurrently in the system.
Durability:
After a transaction completes successfully, the changes it has made to the database persist, even if the system fails.
Types of Transactions:
Local Transaction:
One that accesses data only at the site where the transaction was initiated.
Global Transaction:
One that accesses data at a site different from the one at which the transaction was initiated, or accesses data at several different sites.
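The local/global distinction depends only on which sites a transaction touches relative to where it started. A minimal Python sketch, with a hypothetical function name and site labels:

```python
def classify_transaction(initiating_site, accessed_sites):
    """Return "local" if all accessed data is at the initiating site, else "global"."""
    return "local" if set(accessed_sites) <= {initiating_site} else "global"


kinds = [
    classify_transaction("S1", ["S1"]),        # data only at the home site
    classify_transaction("S1", ["S2"]),        # data at one other site
    classify_transaction("S1", ["S1", "S3"]),  # data at several sites
]
```

Only the first case can be handled by a single transaction manager; the other two need cooperation between sites.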

Distributed SQL Statements:


Distributed query:
A distributed query retrieves information from 2 or more nodes.
For Ex: The following query accesses data from the local database as well as the remote branch database:
SELECT e.Emp_name, d.dep_name
FROM Pune.Employee e, Pune.department@branch.Inl.net d
WHERE e.dept_no = d.dept_no;
Distributed update:
These statements update data on 2 or more nodes.
In an Oracle database environment, a distributed update is possible using a PL/SQL subprogram unit, such as a procedure or trigger, that includes 2 or more remote updates that access data on different nodes.
For Ex:
The following PL/SQL program unit updates tables on the local database and the remote sales database:
BEGIN
  UPDATE [email protected]
  SET Manager = 'Mathews Leaon'
  WHERE dept_no = 10;
  UPDATE Pune.employee
  SET dept_no = 15
  WHERE dept_no = 10;
END;
COMMIT;
Distributed Transaction:
A transaction that includes one or more statements that, individually or as a group, update data on 2 or more distinct nodes of a distributed database.
Commit:
A COMMIT statement makes the changes made during a transaction to a particular database (or set of databases) permanent and visible to other users.
Abort:
When a transaction fails to execute, it is said to be aborted or failed.
Roll Back:
A ROLLBACK statement cancels the modifications made during the current transaction.
Transaction Manager (TM): Each site has a local transaction manager.
The TM is responsible for initiating transactions. If a transaction is local, only one TM is needed. For a global transaction, the TMs of the various sites cooperate with each other.
The TM is responsible for:
- Maintaining a log for recovery purposes
- Participating in concurrency control
Transaction Coordinator / Recovery Manager:
- Starts the execution of a transaction.
- Breaks the transaction into a number of subtransactions and distributes them to the appropriate sites.
- Coordinates termination of the transaction by performing 2PC.
System Failure: It may happen for the following reasons:
- Software errors
- Hardware errors
- Disk crashes
- Failure of a site
- Failure of a communication link
Commit Protocols:
Atomicity states that database modifications must follow an “all or nothing” rule.
A transaction that executes at multiple sites must either be committed at all the sites or aborted at all the sites.
Commit protocols are of two types:
- Two-phase commit protocol
- Three-phase commit protocol


Two Phase Commit protocol (2PC):

Two-phase commit is a transaction protocol designed for the complications that arise with distributed resource managers. In the steps below, T is the transaction being committed and Ci is its coordinator.
The commit process proceeds as follows:
Phase 1 or Prepare Phase:

Step 1 The coordinator asks all participants to prepare to commit transaction T.
- Ci adds the record <prepare T> to the log and forces the log to stable storage (a log is a file which maintains a record of all changes to the database).
- Ci sends <prepare T> messages to all sites where T executed.

Step 2 Upon receiving the message, the transaction manager at each site determines whether it can commit the transaction.
If not:
Add a record <no T> to the log and send an <abort T> message to Ci.
If the transaction can be committed, then:
1) Add the record <ready T> to the log.
2) Force all records for T to stable storage.
3) Send a <ready T> message to Ci.
Phase 2 or Execute Phase:

Step 1 T can be committed if Ci received a <ready T> message from all the participating sites; otherwise T must be aborted.

Step 2 The coordinator adds a decision record, <commit T> or <abort T>, to the log and forces the record onto stable storage. Once the record is in stable storage, the decision cannot be revoked (even if failures occur).

Step 3 The coordinator sends a message to each participant informing it of the decision (commit or abort).

Step 4 Participants take the appropriate action locally.
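The two phases can be traced with a small single-process simulation in Python. This is a sketch under simplifying assumptions: there is no real messaging or stable storage, no failures occur, each participant's vote is given up front, and the function name and log representation are invented for the example.

```python
def two_phase_commit(votes):
    """Simulate 2PC for transaction T.

    votes maps a participant site name to True (it can commit) or False.
    Returns the decision plus the coordinator's and participants' logs.
    """
    # Phase 1, step 1: coordinator logs <prepare T> and contacts every site.
    coordinator_log = ["<prepare T>"]
    site_logs, replies = {}, {}

    # Phase 1, step 2: each site logs its vote and replies to the coordinator.
    for site, can_commit in votes.items():
        if can_commit:
            site_logs[site] = ["<ready T>"]
            replies[site] = "ready"
        else:
            site_logs[site] = ["<no T>"]
            replies[site] = "abort"

    # Phase 2, steps 1 and 2: commit only if every site replied ready.
    decision = "commit" if all(r == "ready" for r in replies.values()) else "abort"
    coordinator_log.append(f"<{decision} T>")

    # Phase 2, steps 3 and 4: participants learn the decision and record it.
    for log in site_logs.values():
        log.append(f"<{decision} T>")
    return decision, coordinator_log, site_logs


ok = two_phase_commit({"S1": True, "S2": True})
failed = two_phase_commit({"S1": True, "S2": False})
```

A single negative vote is enough to abort the transaction at every site, which is how the protocol keeps the outcome "all or nothing".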


Handling of failures in 2PC

- Failure of a participating site
- Failure of the coordinator
- Network failure
Handling of Failures - Site Failure
When site Sk recovers, it examines its log to determine the fate of the transactions that were active at the time of the failure.
- Log contains a <commit T> record: the site executes redo(T).
- Log contains an <abort T> record: the site executes undo(T).
- Log contains a <ready T> record: the site must consult Ci to determine the fate of T.
  If T committed, redo(T).
  If T aborted, undo(T).
- Log contains no control records concerning T: this implies that Sk failed before responding to the <prepare T> message from Ci. Since Ci cannot have received a <ready T> message from Sk, it must abort T, so Sk must execute undo(T).
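The recovery rules for a participating site amount to a lookup on its own log. A minimal Python sketch, in which the log is a plain list of record strings and `ask_coordinator` is a hypothetical callback standing in for the message exchange with Ci:

```python
def recovery_action(site_log, ask_coordinator):
    """Decide what a recovering site does for transaction T."""
    if "<commit T>" in site_log:
        return "redo"
    if "<abort T>" in site_log:
        return "undo"
    if "<ready T>" in site_log:
        # In-doubt: the site voted ready but never learned the outcome.
        return "redo" if ask_coordinator() == "commit" else "undo"
    # No control record: the site never voted, so T cannot have committed.
    return "undo"


actions = [
    recovery_action(["<ready T>", "<commit T>"], lambda: "abort"),
    recovery_action(["<ready T>"], lambda: "commit"),
    recovery_action(["<ready T>"], lambda: "abort"),
    recovery_action([], lambda: "commit"),
]
```

Only the third branch needs to contact another site, which is why a site whose last record for T is <ready T> may be held up during recovery.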

Handling of Failures - Coordinator Failure

If the coordinator fails while the commit protocol for T is executing, then the participating sites must decide on T’s fate:
1. If an active site contains a <commit T> record in its log, then T must be committed.
2. If an active site contains an <abort T> record in its log, then T must be aborted.
3. If some active participating site does not contain a <ready T> record in its log, then the failed coordinator Ci cannot have decided to commit T. The sites can therefore abort T.
4. If none of the above cases holds, then all active sites must have a <ready T> record in their logs, but no additional control records (such as <abort T> or <commit T>). In this case the active sites must wait for Ci to recover to find out the decision.
Blocking problem: active sites may have to wait for the failed coordinator to recover.
Handling of Failures - Network Partition
If the coordinator and all its participants remain in one partition, the failure has no effect on the commit protocol.
If the coordinator and its participants belong to several partitions:
- Sites that are not in the partition containing the coordinator think the coordinator has failed, and execute the protocol to deal with failure of the coordinator. No harm results, but the sites may still have to wait for the decision from the coordinator.
- The coordinator and the sites that are in the same partition as the coordinator think that the sites in the other partitions have failed, and follow the usual commit protocol. Again, no harm results.
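The four coordinator-failure rules listed in this section can be written as a single decision function over the logs of the sites that are still active. This Python sketch uses an assumed representation (a dictionary from site name to a list of log records) and a hypothetical function name:

```python
def decide_after_coordinator_failure(active_site_logs):
    """Apply the four rules to the logs of the active participating sites."""
    logs = list(active_site_logs.values())
    if any("<commit T>" in log for log in logs):
        return "commit"                      # rule 1
    if any("<abort T>" in log for log in logs):
        return "abort"                       # rule 2
    if any("<ready T>" not in log for log in logs):
        return "abort"                       # rule 3: Ci cannot have committed
    return "blocked"                         # rule 4: wait for Ci to recover


outcomes = [
    decide_after_coordinator_failure({"S1": ["<ready T>", "<commit T>"], "S2": ["<ready T>"]}),
    decide_after_coordinator_failure({"S1": ["<ready T>"], "S2": []}),
    decide_after_coordinator_failure({"S1": ["<ready T>"], "S2": ["<ready T>"]}),
]
```

The last case is the blocking problem: every active site has voted ready, so none of them can rule out that the coordinator already decided either way.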
Recovery and Concurrency Control
In-doubt transactions have a <ready T> log record, but neither a <commit T> nor an <abort T> log record.
The recovering site must determine the commit-abort status of such transactions by contacting other sites; this can slow and potentially block recovery.
To avoid this, recovery algorithms can note lock information in the log:
- Instead of <ready T>, write out <ready T, L>, where L is the list of locks held by T when the log record is written (read locks can be omitted).
- For every in-doubt transaction T, all the locks noted in the <ready T, L> log record are reacquired.
After lock reacquisition, transaction processing can resume; the commit or rollback of in-doubt transactions is performed concurrently with the execution of new transactions.
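Lock reacquisition from <ready T, L> records can be sketched as a scan over the log. The tuple-based log format and the function name below are assumptions made for the example, and only write locks are tracked, since read locks can be omitted from L.

```python
def reacquire_in_doubt_locks(log):
    """Rebuild the lock table for in-doubt transactions after a restart.

    log is a list of tuples: ("ready", txn, locks), ("commit", txn)
    or ("abort", txn).
    """
    ready, resolved = {}, set()
    for record in log:
        if record[0] == "ready":
            ready[record[1]] = record[2]
        else:
            resolved.add(record[1])

    lock_table = {}
    for txn, locks in ready.items():
        if txn not in resolved:          # ready but no commit/abort: in doubt
            for item in locks:
                lock_table[item] = txn   # the item stays locked by txn
    return lock_table


lock_table = reacquire_in_doubt_locks([
    ("ready", "T1", ["A", "B"]),
    ("commit", "T1"),
    ("ready", "T2", ["C"]),
])
```

New transactions can then run against every item except C, while the fate of T2 is resolved in the background.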

Three Phase Commit protocol (3PC):

It is an extension of 2PC.
It avoids the blocking problem under certain assumptions.
It assumes that network partitions do not occur and that no more than k sites fail, where k is a predetermined number.
3PC avoids blocking by adding a third phase to 2PC.
In this phase multiple sites are involved in the decision to commit.
Before the transaction is committed, at least k other sites are made aware that the coordinator intends to commit it; the commit decision is not recorded directly in persistent storage first.
If the transaction coordinator (TC) fails, the participating sites select a new TC.
The new TC checks the status of the protocol from the other sites.
If the previous TC had decided to commit, at least one of the k sites it informed will still be up, and the new TC commits the transaction; otherwise it aborts the transaction.
Drawback: A network partition can look the same as the failure of more than k sites, which leads to the blocking problem.
The protocol must be implemented with care, or it will result in inconsistencies.
For these reasons the 3PC protocol is rarely used.

Concurrency Control in Distributed Databases

In a distributed system we assume that the commit protocol ensures global transaction atomicity. We also assume that updates are applied at all sites where a data item is replicated. Locking protocols provide concurrency control.
Some such approaches are:
1) Single lock manager approach
2) Distributed lock manager approach
Single-Lock-Manager Approach
The system maintains a single lock manager that resides at a single chosen site, say Si.
When a transaction needs to lock a data item, it sends a lock request to Si, and the lock manager determines whether the lock can be granted immediately.
- If yes, the lock manager sends a message to the site that initiated the request.
- If no, the request is delayed until it can be granted, at which time a message is sent to the initiating site.
The transaction can read the data item from any one of the sites at which a replica of the data item resides.
Writes must be performed on all replicas of a data item.
Advantages of the scheme:
- Simple implementation: requires only two messages for handling a lock request and one message for handling an unlock request.
- Simple deadlock handling: since all lock and unlock requests are handled at one site, deadlock handling is easy.
Disadvantages of the scheme:
- Bottleneck: the lock manager site becomes a bottleneck, since all requests must be processed there.
- Vulnerability: the system is vulnerable to failure of the lock manager site. If site Si fails, concurrency control is lost.
Distributed Lock Manager
In this approach, the locking functionality is implemented by lock managers at each site.
- Lock managers control access to local data items.
- Special protocols may be used for replicas.
Advantage: work is distributed and can be made robust to failures.
Disadvantage: deadlock detection is more complicated.
- Lock managers must cooperate for deadlock detection.
There are several variants of this approach:
- Primary copy
- Majority protocol
- Biased protocol
- Quorum consensus
Primary Copy
Choose one replica of a data item to be the primary copy.
- The site containing that replica is called the primary site for that data item.
- Different data items can have different primary sites.
When a transaction needs to lock a data item Q, it requests a lock at the primary site of Q.
- It implicitly (without requesting them) gets a lock on all replicas of the data item.

Concurrency Control in Distributed Database


Concurrency control schemes deal with the handling of data accessed by concurrent transactions. Various locking protocols are used for handling concurrent transactions in centralized database systems, and the schemes used in distributed databases are largely the same. The only major difference is the way the lock manager must deal with replicated data. The following topics discuss several schemes that are applicable to an environment where data can be replicated at several sites. We shall assume the existence of the shared and exclusive lock modes.
Locking protocols
1. Single lock manager approach
2. Distributed lock manager approach
a) Primary Copy protocol
b) Majority protocol
c) Biased protocol
d) Quorum Consensus protocol

Some assumptions:
- Our distributed database system consists of n sites (servers/computers in different locations).
- Data are replicated in two or more sites.


Single Lock Manager - Concurrency Control in Distributed Database

In this approach, the distributed database system, which consists of several sites, maintains a single lock manager at a chosen site. Figure 1 shows distributed sites S1, S2, …, S6 with site S3 chosen as the lock-manager site.

Figure 1 - Single Lock Manager Approach

The technique works as follows:

- When a transaction requests a lock on some data item, the request must be forwarded to the chosen lock-manager site. This is done by the transaction manager of the site where the request is initiated.
- The lock manager at the lock-manager site decides whether to grant the lock request immediately, based on the usual procedure. [That is, if a lock is already held on the requested data item by some other transaction in an incompatible mode, the lock cannot be granted. If the data item is free or is locked in a compatible mode, the lock manager grants the lock.]
- If the lock request is granted, the transaction can read from any site where a replica is available.
- On successful completion of the transaction, the transaction manager of the initiating site releases the lock by sending an unlock request to the lock-manager site.
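The grant rule used by the central lock manager (shared locks are compatible with each other, exclusive locks with nothing) can be sketched as follows. This is a simplified illustration with invented class and method names: a request that cannot be granted simply returns False, whereas the scheme described above would queue it until the lock becomes available.

```python
class SingleLockManager:
    """Lock table kept at the one chosen lock-manager site."""

    def __init__(self):
        self.locks = {}   # data item -> (mode, set of holding transactions)

    def request(self, txn, item, mode):
        """mode is "S" (shared) or "X" (exclusive). Returns True if granted."""
        if item not in self.locks:
            self.locks[item] = (mode, {txn})      # item is free
            return True
        held_mode, holders = self.locks[item]
        if mode == "S" and held_mode == "S":      # only S/S is compatible
            holders.add(txn)
            return True
        return False                              # incompatible: not granted

    def unlock(self, txn, item):
        _, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]


manager = SingleLockManager()
grants = [
    manager.request("T1", "D", "S"),
    manager.request("T2", "D", "S"),
    manager.request("T3", "D", "X"),   # refused while readers hold D
]
manager.unlock("T1", "D")
manager.unlock("T2", "D")
retry = manager.request("T3", "D", "X")
```

Because every request passes through this one table, the manager always sees the complete locking state, which is what makes deadlock handling simple and also what makes the site a bottleneck.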
Example – Transaction handling in Single Lock-Manager Approach:


Figure 2 - Steps of Transaction handling in Single Lock Manager approach

Let us assume that transaction T1 is initiated at site S5, as shown in Figure 2 (Step 1). Also assume that the requested data item D is replicated at sites S1, S2, and S6. The steps are numbered in Figure 2. According to the discussion above, the technique works as follows:
Step 2 - The transaction manager of the initiator site S5 sends the request to lock data item D to the lock-manager site S3.
The lock manager at site S3 checks the availability of data item D.
Step 3 - If the requested item is not locked by any other transaction, the lock-manager site responds with a lock grant message to the initiator site S5.
Step 4 - As the next step, the initiator site S5 can use data item D from any of the sites S1, S2, and S6 to complete transaction T1.
Step 5 - After successful completion of transaction T1, the transaction manager of S5 releases the lock by sending an unlock request to the lock-manager site S3.
Advantages:
- Locking can be handled easily. We need two messages for a lock (one for the request, the other for the grant) and one message for an unlock request. Also, this method is simple because it resembles a centralized database.
- Deadlocks can be handled easily, because a single lock manager is responsible for handling all lock requests.
Disadvantages:
- The lock-manager site becomes the bottleneck, as it is the only site that handles all the lock requests generated at all the sites in the system.
- Highly vulnerable to a single point of failure. If the lock-manager site fails, we lose concurrency control.

Distributed Lock Manager - Concurrency Control in Distributed Database

In this approach, the function of the lock manager is distributed over several sites. [Every DBMS server (site) has all the components, such as the Transaction Manager, Lock-Manager, Storage Manager, etc.] In the distributed lock-manager approach, every site owns the data that is stored locally. This is true for a table that is fragmented into n fragments and stored at n sites. In this case, every fragment is distinct from every other fragment and is completely owned by the site at which it is stored. For those fragments, the local lock manager is responsible for handling lock and unlock requests generated by the same site or by other sites. If the data stored at a site is replicated at other sites, then a site cannot own the data completely. In that case, a site cannot handle a lock request for a locally stored data item on its own, as it can for fragmented data. Doing so would lead to inconsistency, because there are multiple copies stored at several sites. This case is handled using protocols that are specifically designed for handling lock requests on replicated data.
The protocols are:
- Primary Copy Protocol
- Majority Based Protocol
- Biased Protocol
- Quorum Consensus Protocol

Advantages:
- Simple implementation for data that is fragmented. It can be handled as in the single lock-manager approach.
- For replicated data, the work can again be distributed over several sites using one of the protocols listed above.
- The lock-manager site is not a bottleneck, because the work of the lock manager is distributed over several sites.
Disadvantages:
- Handling of deadlock is difficult, because a transaction T1 that has acquired a lock on a data item Q at site S1 may be waiting for a lock on another data item R at site S2. Whether this wait is genuine or a deadlock has occurred is not easily identifiable.
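To see why no single site can tell a genuine wait from a deadlock, it helps to look at wait-for graphs. The Python sketch below uses a global wait-for graph, a standard detection technique that these notes do not describe; the graph contents and function names are assumptions made for the example. Each site only knows its own waits, and the cycle appears only when the local graphs are merged.

```python
def has_cycle(wait_for):
    """wait_for maps a transaction to the set of transactions it waits for."""
    visiting, done = set(), set()

    def visit(txn):
        if txn in done:
            return False
        if txn in visiting:
            return True                      # came back to a txn on the path
        visiting.add(txn)
        if any(visit(nxt) for nxt in wait_for.get(txn, ())):
            return True
        visiting.discard(txn)
        done.add(txn)
        return False

    return any(visit(txn) for txn in list(wait_for))


def merge(*local_graphs):
    """Union of the local wait-for graphs of several sites."""
    merged = {}
    for graph in local_graphs:
        for txn, waits in graph.items():
            merged.setdefault(txn, set()).update(waits)
    return merged


site_s1 = {"T2": {"T1"}}   # at S1, T2 waits for Q, which T1 has locked
site_s2 = {"T1": {"T2"}}   # at S2, T1 waits for R, which T2 has locked

local_deadlocks = [has_cycle(site_s1), has_cycle(site_s2)]
global_deadlock = has_cycle(merge(site_s1, site_s2))
```

Neither local graph contains a cycle, yet the merged graph does, so the lock managers have to exchange information before the deadlock can be detected.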

Primary Copy Protocol - Distributed Lock Manager Approach
Primary Copy Protocol:
Assume that we have a data item Q that is replicated at several sites, and we choose one (and only one) of the replicas of Q as the Primary Copy. The site that stores the Primary Copy is designated as the Primary Site. Any lock request generated for data item Q at any site must be routed to the Primary Site. The Primary Site’s lock manager is responsible for handling lock requests for Q, even though there are other sites with the same data item and their own local lock managers.
We can choose different sites as lock-manager sites for different data items.
How does Primary Copy protocol work?
Figure 1 shows the Primary Copy protocol implementation.
In the figure:

- Q, R, and S are different data items that are replicated.
- Q is replicated at sites S1, S2, S3 and S5. Site S3 is designated as the Primary Site for Q.
- R is replicated at sites S1, S2, S3, and S4. Site S6 is designated as the Primary Site for R.
- S is replicated at sites S1, S2, S4, S5, and S6. Site S1 is designated as the Primary Site for S.

Then, the concurrency control mechanism works as follows:

Step 1: Transaction T1 is initiated at site S5 and requests a lock on data item Q. Even though
the data item is available locally at site S5, the lock-manager of S5 cannot grant the lock,
because in our example site S3 is designated as the Primary Site for Q. Hence, the request
must be routed to site S3 by the transaction manager of S5.
Step 2: S5 sends a lock request message for Q to S3.

Step 3: If the lock on Q can be granted, S3 grants the lock and sends a message to S5.
On receiving the lock grant, S5 executes transaction T1 on the copy of Q available locally.
(If there were no local copy, the transaction would have to be executed at another site where
Q is available.)
Step 4: On successful completion of the transaction, S5 sends an unlock message to the Primary
Site S3.

Note: If transaction T1 writes the data item Q, the changes must be forwarded to all the
sites where Q is replicated. If the transaction only reads Q, nothing needs to be forwarded.
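
The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the notes' own implementation: the names `LockManager`, `PRIMARY_SITE`, `request_lock` and `release_lock` are assumptions, only exclusive locks are modelled, and a request that cannot be granted is answered with "deny" instead of being queued.

```python
from dataclasses import dataclass, field


@dataclass
class LockManager:
    """Lock table of one site: data item -> transaction holding the lock."""
    site: str
    held: dict = field(default_factory=dict)

    def lock(self, item, txn):
        if item in self.held:          # already locked by another transaction
            return False
        self.held[item] = txn
        return True

    def unlock(self, item, txn):
        if self.held.get(item) == txn:
            del self.held[item]


# Primary sites taken from the example: S3 for Q, S1 for S.
PRIMARY_SITE = {"Q": "S3", "S": "S1"}
MANAGERS = {s: LockManager(s) for s in ("S1", "S2", "S3", "S4", "S5", "S6")}


def request_lock(item, txn, origin_site):
    """Steps 1-3: route the request to the primary site, whatever the origin."""
    primary = PRIMARY_SITE[item]
    granted = MANAGERS[primary].lock(item, txn)
    messages = [(origin_site, primary, "lock request"),
                (primary, origin_site, "grant" if granted else "deny")]
    return granted, messages


def release_lock(item, txn, origin_site):
    """Step 4: send the unlock message to the primary site."""
    primary = PRIMARY_SITE[item]
    MANAGERS[primary].unlock(item, txn)
    return [(origin_site, primary, "unlock")]
```

Calling `request_lock("Q", "T1", "S5")` followed by `release_lock("Q", "T1", "S5")` exchanges three messages in total, and the lock table of S5 is never touched even though S5 holds a copy of Q.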
Advantages:
Concurrency control on replicated data is handled just as for unreplicated data, so the
implementation is simple.
Only 3 messages are needed to handle lock and unlock requests (1 lock request, 1 grant, and 1
unlock message), for both read and write.
Disadvantages:
Possible single point of failure. If the Primary Site of a data item, say Q, fails, Q becomes
inaccessible even though other sites hold copies of Q.

Majority Based Protocol - Distributed Lock Manager Concurrency Control


Majority Based Protocol:
Assume that a data item Q is replicated at several sites. The Majority Based protocol works
as follows: a transaction which needs to lock data item Q has to request and obtain the lock
on Q at half+one of the sites at which Q is replicated (i.e., a majority of those sites).
The lock-managers of all the sites at which Q is replicated handle lock and unlock requests
locally and independently.
Irrespective of the lock type (read or write, i.e., Shared or Exclusive), we need to lock
half+one sites.

Example:
Let us assume that Q is replicated in 6 sites. Then, we need to lock Q in 4 sites (half+one =
6/2 + 1 = 4). When transaction T1 sends the lock request message to those 4 sites, the
lock-managers of those sites grant the locks based on the usual lock procedure.
How does Majority Based protocol work?
Implementation:
Figure 1 shows the implementation of the Majority Based protocol.
In the figure,
Q, R, and S are the different data items.
Q is replicated in sites S1, S2, S3 and S6.
R is replicated in sites S1, S2, S3, and S4.
S is replicated at sites S1, S2, S4, S5, and S6.

Figure 1 - Majority Based Protocol Implementation


Let us assume that transaction T1 needs data item Q to be locked (in either read or write mode).
Step 1: Transaction T1 is initiated at site S5 and requests a lock on data item Q. Q is
available at S1, S2, S3 and S6. According to the protocol, T1 has to lock Q at half+one sites,
i.e., in our example, at any 3 out of the 4 sites. Assume that we have chosen sites S1, S2, and S3.
Step 2: S5 requests S1, S2 and S3 for a lock on Q. The lock request is represented in purple color.
Step 3: If the lock on Q can be granted, S1, S2, and S3 grant the lock and send a message to S5.
On receiving the lock grants, S5 executes transaction T1 (on the data item Q taken from any
one of the locked sites). The lock grant is represented in green color.
Step 4: On successful completion of the transaction, S5 sends an unlock message to all the
sites S1, S2, and S3. The unlock message is represented in blue color.

Note: If transaction T1 writes the data item Q, the changes must be forwarded to all the
sites where Q is replicated. If the transaction only reads Q, nothing needs to be forwarded.
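
As a rough sketch of the half+one rule, the Python below computes how many replicas must be locked and picks that many free sites. The function names and the `busy_sites` parameter (replicas already locked by another transaction) are assumptions made for illustration; a real lock-manager would queue the request instead of giving up.

```python
def majority(n):
    """Smallest number of replicas that is more than half of n (half + one)."""
    return n // 2 + 1


def try_lock_majority(replica_sites, busy_sites):
    """Return the sites to lock, or None if a majority is not currently free."""
    needed = majority(len(replica_sites))
    free = [s for s in replica_sites if s not in busy_sites]
    if len(free) < needed:
        return None
    return free[:needed]
```

With Q replicated at S1, S2, S3 and S6, `majority(4)` is 3, matching Step 1 of the example; with 6 replicas it is 4.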
Advantages:

Replicated data handled in decentralized manner. Hence, no single point-of-failure problem.

Disadvantages:

Implementation is complex. We need to send (n/2 + 1) lock request messages, (n/2 + 1) lock
grant messages, and (n/2 + 1) unlock messages, irrespective of the lock requested (read or
write).
Both read and write involve the same level of complexity.
Deadlock can occur. (To handle this, we may need to impose an order in which the locks can be
requested.)

Points to note:
-A transaction can execute only after successfully acquiring locks on a majority of the replicas.
-Needs more messages, 2(n/2+1) lock messages and (n/2+1) unlock messages, compared to the
Primary Copy protocol (3 messages).
-Local lock-managers are responsible for granting or denying the locks on requested items.
-Not suitable for applications where read operations are frequent.
-When writing the data item, a transaction performs writes on all replicas.
-For unreplicated data, both read and write requests can be handled by sending the request to
the site at which the data item is available.

Biased Protocol - Distributed Lock Manager Concurrency Control


Biased Protocol
The Biased protocol is one of several protocols for concurrency control on replicated data in
a distributed database system. The others include the Primary Copy, Majority Based, and
Quorum Consensus protocols.
Protocol:
If a data item Q is replicated over n sites, then a read lock (Shared lock) request message must
be sent to any one of the sites in which Q is replicated, and a write lock (Exclusive lock)
request message must be sent to all the sites in which Q is replicated. The lock-managers of
all the sites in which Q is replicated handle lock and unlock requests locally and independently.
Example:
In the figures,
Q, R, and S are the different data items.
Q is replicated in sites S1, S2, S3 and S6.
R is replicated in sites S1, S2, S3, and S4.
S is replicated at sites S1, S2, S4, S5, and S6.
How does Biased protocol handle Shared and Exclusive locks?
Shared Lock (Read Lock)
Figure 1 shows Biased protocol implementation for handling read request (Shared lock).

Figure 1 - Biased Protocol for Read Lock

Let us assume that Transaction T1 needs data item Q.

Step 1: Transaction T1 is initiated at site S5 and requests a lock on data item Q. Q is available
at S1, S2, S3 and S6. According to the protocol, T1 has to lock Q at any one site at which Q is
replicated, i.e., in our example, at any 1 out of the 4 sites where Q is replicated.
Assume that we have chosen the site S3.
Step 2: S5 requests S3 for a shared lock on Q. The lock request is represented in purple color.
Step 3: If the lock on Q can be granted, S3 grants the lock and sends a message to S5.
On receiving the lock grant, S5 executes transaction T1 (reading is done at the locked site,
in our case S3).
Step 4: On successful completion of the transaction, S5 sends an unlock message to the site S3.

Exclusive Lock (Write Lock)


Figure 2 shows Biased protocol implementation for handling write request (Exclusive lock).

Figure 2 - Biased Protocol for Write Lock

Let us assume that transaction T1 needs data item Q. Q is available at S1, S2, S3 and S6.
Sites S4 and S5, which do not hold Q, are represented in red color.

Step 1: Transaction T1 is initiated at site S5 and requests a lock on data item Q. According to
the protocol, T1 has to lock Q at all the sites at which Q is replicated, i.e., in our example,
at all 4 sites where Q is replicated.
Step 2: S5 requests S1, S2, S3, and S6 for an exclusive lock on Q. The lock request is
represented in purple color.
Step 3: If the lock on Q can be granted at every site, all the sites respond with a lock grant
message to S5. (If one or more sites cannot grant the lock, T1 cannot continue.)
On receiving the lock grants, S5 executes transaction T1 (when writing the data item, the
transaction performs writes on all replicas).
Step 4: On successful completion of the transaction, S5 sends an unlock message to all the
sites S1, S2, S3, and S6.
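
The read and write cases above differ only in which replica sites receive the lock request. A minimal Python sketch, in which the function name `sites_to_lock`, the mode letters "S"/"X" and the optional `preferred` argument are illustrative assumptions:

```python
def sites_to_lock(mode, replica_sites, preferred=None):
    """Biased protocol: shared locks need one replica, exclusive locks need all."""
    if mode == "S":
        # Any single replica will do; use the preferred one if it holds the item.
        site = preferred if preferred in replica_sites else replica_sites[0]
        return [site]
    if mode == "X":
        return list(replica_sites)
    raise ValueError("mode must be 'S' (shared) or 'X' (exclusive)")
```

For Q replicated at S1, S2, S3 and S6, `sites_to_lock("S", [...], preferred="S3")` yields only S3, as in Figure 1, while `sites_to_lock("X", [...])` yields all four sites, as in Figure 2.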
Advantages:

Read operations can be handled faster than in the Majority Based protocol. If read
operations are performed frequently in an application, the biased approach can be suggested.
Disadvantages:
Additional overhead on write operations.
Implementation is complex. For a write operation we need to send n lock request messages, n
lock grant messages, and n unlock messages, where n is the number of sites at which the data
item is replicated.
Deadlock can occur, as discussed for the Majority Based protocol.
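
To compare the message cost of the three protocols discussed so far, the following Python sketch counts one request, one grant and one unlock message per site contacted. The function name and the protocol labels are assumptions for illustration; the site counts follow the protocol descriptions above (1 site for Primary Copy, half+one for Majority Based, 1 site for a Biased read and all n sites for a Biased write).

```python
def lock_messages(protocol, n, mode="X"):
    """Total request + grant + unlock messages to lock one item with n replicas."""
    if protocol == "primary":
        sites = 1
    elif protocol == "majority":
        sites = n // 2 + 1
    elif protocol == "biased":
        sites = 1 if mode == "S" else n
    else:
        raise ValueError("unknown protocol")
    return 3 * sites
```

For n = 4 this gives 3 messages for Primary Copy, 9 for Majority Based, 3 for a Biased read and 12 for a Biased write, which is why the Biased protocol suits read-heavy workloads.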

Quorum Consensus Protocol - Distributed Lock Manager Concurrency Control



Quorum Consensus Protocol


This is one of the distributed lock manager based concurrency control protocols in distributed
database systems. It works as follows:
1. The protocol assigns a weight to each site that holds a replica.
2. For any data item, the protocol assigns a read quorum Qr and a write quorum Qw. Here, Qr and
Qw are two integers (each is a sum of weights of some sites), chosen so that both of the
following conditions hold:
Qr + Qw > S – this rule avoids read-write conflicts (i.e., two transactions cannot read and
write the data item concurrently).
2 * Qw > S – this rule avoids write-write conflicts (i.e., two transactions cannot write the
data item concurrently).
Here, S is the total weight of all sites at which the data item is replicated.
How do we perform read and write on replicas?
A transaction that needs a data item for reading has to lock enough sites. That is, it has to
lock sites whose weights sum to >= Qr. A read quorum must always intersect with a write
quorum.
A transaction that needs a data item for writing has to lock enough sites. That is, it has to
lock sites whose weights sum to >= Qw.
Example:
(How does Quorum Consensus Protocol work?)
Let us assume a fully replicated distributed database with four sites S1, S2, S3, and S4.
1. According to the protocol, we need to assign a weight to every site. (This weight can be
chosen on many factors like the availability of the site, latency etc.). For simplicity, let us
assume the weight as 1 for all sites.
2. Let us choose the values for Qr and Qw as 2 and 3. Our total weight S is 4. And according to
the conditions, our Qr and Qw values are correct;
Qr + Qw > S => 2 + 3 > 4 True
2 * Qw > S => 2 * 3 > 4 True
3. Now, a transaction which needs a read lock on a data item has to lock 2 sites. A transaction
which needs a write lock on data item has to lock 3 sites.
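
The two quorum conditions and the "lock enough sites" rule can be checked mechanically. The Python below is a small sketch; the names `valid_quorums` and `has_quorum` and the dictionary of site weights are assumptions introduced for illustration.

```python
def valid_quorums(qr, qw, weights):
    """Check Qr + Qw > S and 2 * Qw > S, where S is the total site weight."""
    s = sum(weights.values())
    return qr + qw > s and 2 * qw > s


def has_quorum(locked_sites, quorum, weights):
    """True if the weights of the locked sites add up to at least the quorum."""
    return sum(weights[site] for site in locked_sites) >= quorum


# The example above: four sites, each with weight 1, so S = 4.
WEIGHTS = {"S1": 1, "S2": 1, "S3": 1, "S4": 1}
```

With these weights, `valid_quorums(2, 3, WEIGHTS)` holds, whereas a choice such as Qr = 2, Qw = 2 fails both conditions; locking S1 and S2 satisfies the read quorum of 2 but not the write quorum of 3.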

CASE 1
Read Quorum Qr = 2, Write Quorum Qw = 3, Site’s weight = 1, Total weight of sites S = 4

Lock         Discussion
Read Lock    1. A read request has to lock at least two replicas (2 sites in our example).
             2. Any two sites can be locked.
Write Lock   1. A write request has to lock at least three replicas (3 sites in our example).

Note that the read quorum intersects with the write quorum. That is, out of the 4 available
sites in our example, 3 sites have to be locked for a write and 2 sites for a read, so any read
set and any write set share at least one site. This ensures that no two transactions can read
and write the data item at the same time.
CASE 2
Read Quorum Qr = 1, Write Quorum Qw = 4, Site’s weight = 1, Total weight of sites S = 4

Lock         Discussion
Read Lock    1. A read lock requires one site.
Write Lock   2. A write lock requires 4 sites.

Note that, read requires any one site and write requires all the sites, which is actually the
implementation of Biased protocol. At the same time, if we make read and write quorum
with the same value 3, then it resembles the implementation of Majority based protocol.
That is the reason why Quorum Consensus protocol is mentioned as the generalization of the
above said techniques.

Points to note:

The Quorums must be chosen very carefully. That is, if read operations are frequent then we
would choose very small read quorum value so as to make read faster in available replicas and
so on.

The chosen read quorum value must intersect the write quorum value to avoid read-write
conflict.

Deadlock handling in distributed databases


Deadlock Handling in Distributed Database
A deadlock is a situation in which two or more transactions are each waiting for data items
that are held by, and must be released by, the other transactions.

Deadlock in Distributed Databases


Like centralized database systems, distributed database systems are also prone to deadlocks.
In distributed database systems, transactions have to be handled differently: every site that
is part of the distributed database has its own transaction-specific components (transaction
coordinators, transaction managers, lock managers, etc.). Above all, data might be owned by
many sites, or replicated at many sites. For these reasons, deadlock handling is a harder job
in a distributed database.
Deadlock can be handled in two ways:
1. Deadlock prevention – it deals with preventing the deadlock before it occurs. It is already
hard in a centralized database system, as it involves a larger number of rollbacks and slows
down the transactions. In a distributed database it causes more problems, because it rolls
back more transactions, and these run at more sites (not on a single server but possibly on
many servers).
2. Deadlock detection – it deals with detecting a deadlock once one has happened. In
centralized database systems, detection is easier compared to prevention, and it is handled
using wait-for graphs. In a distributed database, the main problem is where and how to
maintain the wait-for graphs.

Deadlock detection technique in distributed database


Deadlock detection in a centralized database system is handled using a wait-for graph. The
same can be used in a distributed database. That is, we can maintain a local wait-for graph at
every site. If the local wait-for graph of any site forms a cycle, then a deadlock has occurred.

On the other hand, the absence of cycles in all of the local wait-for graphs does not mean
that no deadlock has occurred. Let us discuss this point with the local wait-for graph
examples shown below.

Figure 1 - Local wait-for graphs of SITE 1 and 2

Figure 1 shows the lock request status for transactions T1, T2, T3 and T4 in a distributed
database system. In the local wait-for graph of SITE 1, transaction T2 is waiting for
transactions T1 and T3 to finish. In SITE 2, transaction T3 is waiting for T4, and T4 is
waiting for T2. From the local wait-for graphs of SITE 1 and SITE 2, it is clear that
transactions T2 and T3 are involved at both sites.
How might this happen? For example, transaction T2, which is initiated at SITE 2, may need
some data items held by transactions T1 and T3 at SITE 1. Hence, SITE 2 forwards the request to
SITE 1. If those transactions are still holding the items, SITE 1 inserts the edges T2 → T1 and
T2 → T3 in its local wait-for graph.
As another example, transaction T3, which is initiated at SITE 1, may need data items held by
transaction T4 at SITE 2. Hence, SITE 1 forwards the request to SITE 2. Based on the status of
T4, SITE 2 inserts an edge T3 → T4 in its local wait-for graph.
You can observe that neither of the local wait-for graphs of SITE 1 and SITE 2 contains a
cycle. If we merge these two local wait-for graphs into a single wait-for graph, we get the
graph given in Figure 2, below. From Figure 2, it is clear that the union of the two local
wait-for graphs forms a cycle (T2 → T3 → T4 → T2), which means a deadlock has occurred. This
merged wait-for graph is called the Global wait-for graph.
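
The observation that neither local graph has a cycle while their union does can be reproduced with a short Python sketch. The representation (a dictionary mapping each waiting transaction to the set of transactions it waits for) and the function names `merge` and `has_cycle` are assumptions for illustration; the edges are the ones described for SITE 1 and SITE 2.

```python
def merge(*local_graphs):
    """Union of local wait-for graphs: the global wait-for graph."""
    global_graph = {}
    for graph in local_graphs:
        for txn, waits_for in graph.items():
            global_graph.setdefault(txn, set()).update(waits_for)
    return global_graph


def has_cycle(graph):
    """Depth-first search; reaching a node that is still being visited is a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


SITE1 = {"T2": {"T1", "T3"}}             # T2 waits for T1 and T3
SITE2 = {"T3": {"T4"}, "T4": {"T2"}}     # T3 waits for T4, T4 waits for T2
```

Here `has_cycle(SITE1)` and `has_cycle(SITE2)` are both false, while `has_cycle(merge(SITE1, SITE2))` is true, which is exactly why a site looking only at its own graph cannot see this deadlock.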

Figure 2 - Global wait-for graph

The approach used to handle deadlock detection in a distributed database is the centralized
deadlock detection approach.

What is Deadlock?
In a Database Management System, deadlock is discussed as part of the Transaction Processing
component. Deadlock is a situation where two or more transactions are waiting for locks on
data items which are locked by other transactions in an incompatible mode.
Here, incompatible mode means one of the following:
A read lock request for a data item which is locked in write mode
A write lock request for a data item which is locked in read mode
A write lock request for a data item which is locked in write mode
Example:
Assume that two transactions T1 and T2 both need data items A and B to be locked. In the lock
acquiring process, suppose T1 locked A successfully and T2 locked B successfully. To complete
successfully, T1 also needs B to be locked and T2 also needs A. The problem is that T1 will
not release its lock on A, and T2 will not release its lock on B, until each has obtained the
other lock. This situation is called deadlock. If you observe carefully, the waits form a cycle
(T1 waits for T2, and T2 waits for T1), which is what leads to deadlock.

Figure 1 - (a) Deadlock occurrence with two transactions, (b) deadlock occurrence with three
transactions
Real time example of Deadlock situation:

Let us assume two bank transactions, namely T1 and T2 as follows;


T1 – a transaction which transfers money, say Rs. 5000, from account A to account B. T1 needs
to lock both A and B in Write mode (Exclusive lock). T1 is complete if and only if it
successfully updates the old balances of A and B with the new balances and commits the
transaction. T1 involves two update queries:
UPDATE ACCOUNT SET balance = balance - 5000 WHERE account = 'A';
UPDATE ACCOUNT SET balance = balance + 5000 WHERE account = 'B';

T2 – a transaction which updates all the accounts with yearly interest, say 5%. T2 needs to
lock all the accounts in Write mode (Exclusive lock). T2 is complete if and only if it
successfully updates the old balances of all the accounts with the new balances and commits
the transaction. T2 involves one update query:
UPDATE ACCOUNT SET balance = balance + (balance * 0.05);

Assume that the transactions are executed as follows;


T1 has started and acquired a Write lock on account A. T1 can now debit the amount to be
transferred from account A and save the new value of A. (T1 cannot commit at this stage,
because it still needs the lock on B.)
T2 has started at the same time and acquired Write locks on B and the other accounts. T2 can
now update all the accounts with their interest amounts except account A, because the lock on
A is held by T1. (Hence, T2 cannot commit at this stage either.)
At this stage, T1 is waiting for T2 to release B and T2 is waiting for T1 to release A, which
is a deadlock. This is shown in Figure 2.

Figure 2 - Deadlock situation example


What is the solution to handle deadlocks once they have occurred?
The only solution is to pick one of the transactions as the victim and roll it back.

Deadlock Handling Techniques in Database


What are the Deadlock handling techniques in database?
A set of transactions is considered to be in a deadlock state if the transactions are waiting
for one another to release data items that they need and that are held by the others. In a
deadlock state no transaction can proceed.
A deadlock can be handled by rolling back a transaction which is chosen as the victim.
Deadlock can be handled in the following ways:
Deadlock Prevention – this concept ensures that the system never enters a deadlock state. It
chooses the transaction which would probably cause the deadlock and rolls it back.
Deadlock Detection – this identifies the deadlock if any happened and recovers the system
from deadlock.
Deadlock Recovery – recovers the system from the deadlock state. It chooses the identified
transaction which caused the deadlock, and rolls it back.

Deadlock Prevention algorithms in Database


What are the deadlock prevention algorithms in database?
The deadlock prevention protocol prevents the system from entering a deadlock through
transaction rollbacks. It chooses rollback over waiting for the lock whenever the wait could
cause a deadlock. In this approach we have the following two deadlock prevention algorithms:
1. Wait-die
2. Wound-wait
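
The notes only name the two algorithms, so the Python below sketches their usual textbook rules as an aid to the reader. It assumes each transaction carries a timestamp assigned when it starts, with a smaller timestamp meaning an older transaction; the function names and the returned strings are illustrative, not part of any particular DBMS.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits; a younger requester dies (is rolled back)."""
    return "wait" if requester_ts < holder_ts else "rollback requester"


def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester wounds (rolls back) the holder; a younger one waits."""
    return "rollback holder" if requester_ts < holder_ts else "wait"
```

In both schemes the rolled-back transaction is always the younger one, so waits can only go in one direction with respect to age and a cycle of waits cannot form.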

Availability:

One of the goals in distributed databases is high availability, i.e., the database must
function almost all the time.
Since failures are more likely in large distributed systems, a distributed database must
continue functioning even when there are various types of failures.
The ability to continue functioning even during failures is referred to as robustness.
For a distributed system to be robust, it must detect failures, reconfigure the system so
that computation may continue, and recover when a processor or a link is repaired.

If a failure has occurred, the system must be reconfigured so that it can continue with normal
operation.

Reconfiguration:

-Abort all transactions that were active at a failed site.
 Making them wait could interfere with other transactions, since they may hold locks on
 other sites.
 However, in case only some replicas of a data item failed, it may be possible to continue
 transactions that had accessed data at a failed site (more on this later).
-If replicated data items were at the failed site, update the system catalog to remove them
 from the list of replicas.
 This should be reversed when the failed site recovers, but additional care needs to be taken
 to bring values up to date.
-If a failed site was a central server for some subsystem, an election must be held to
 determine the new server.
 E.g. name server, concurrency coordinator, global deadlock detector.

Since a network partition may not be distinguishable from a site failure, the following
situations must be avoided:
-Two or more central servers elected in distinct partitions
-More than one partition updates a replicated data item

Updates must be able to continue even if some sites are down.

Solution: majority based approach
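
One way to read the majority based solution is that a partition may elect a central server or update replicated data only if it contains more than half of all sites. At most one partition can satisfy this, which rules out both situations listed above. A minimal Python sketch, with the function name chosen purely for illustration:

```python
def holds_majority(partition_sites, all_sites):
    """True if the partition contains more than half of all sites in the system."""
    everyone = set(all_sites)
    reachable = set(partition_sites) & everyone
    return len(reachable) > len(everyone) / 2
```

With six sites split into {S1, S2, S3, S4} and {S5, S6}, only the first partition may continue; with an even 3/3 split neither may, which trades some availability for safety.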

Cloud Based Databases

A cloud database is a database accessible to clients from the cloud and delivered to users on
demand via the Internet from a cloud database provider's servers. Also referred to as
Database-as-a-Service (DBaaS), cloud databases can use cloud computing to achieve optimized
scaling, high availability, multi-tenancy and effective resource allocation.

There are two common deployment models: users can run databases on the cloud independently,
using a virtual machine image, or they can purchase access to a database service maintained
by a cloud database provider.

What is Cloud Database?


Cloud computing refers to the delivery of computing and storage capacity as a service
to a heterogeneous community of end-recipients. Cloud computing entrusts services
with a user’s data, software, and computation over a network.
Just as databases are required in traditional computing, they are also required in cloud
computing. A cloud database also referred to as Database-as-a-Service (DBaaS), is a
database that is accessible to end-recipients from the cloud and delivered to users on
demand from a cloud database provider’s servers via the Internet.

While a cloud database can be a traditional database such as a MySQL or SQL Server database
that has been adapted for cloud use, a native cloud database such as Xeround's MySQL Cloud
database tends to be better equipped to use cloud resources optimally and to guarantee
scalability as well as availability and stability.

Cloud databases can offer significant advantages over their traditional counterparts, including
increased accessibility, automatic failover and fast automated recovery from failures,
automated on-the-go scaling, minimal investment and maintenance of in-house hardware, and
potentially better performance. At the same time, cloud databases have their share of
potential drawbacks, including security and privacy issues as well as the potential loss of or
inability to access critical data in the event of a disaster or bankruptcy of the cloud database
service provider.

There are two primary methods to run a database on the cloud:

Virtual machine image - cloud platforms allow users to purchase virtual machine instances for
a limited time. It is possible to run a database on these virtual machines. Users can either
upload their own machine image with a database installed on it, or use ready-made machine
images that already include an optimized installation of a database. For example, Oracle
provides a ready-made machine image with an installation of Oracle Database 11g Enterprise
Edition on Amazon EC2 [1] and on Microsoft Azure [2].
Database as a service (DBaaS) - some cloud platforms offer options for using a database as a
service, without physically launching a virtual machine instance for the database. In this
configuration, application owners do not have to install and maintain the database on their
own. Instead, the database service provider takes responsibility for installing and
maintaining the database, and application owners pay according to their usage. For example,
Amazon Web Services provides three database services as part of its cloud offering: SimpleDB,
a NoSQL key-value store; Amazon Relational Database Service, an SQL-based database service
with a MySQL interface; and DynamoDB. Similarly, Microsoft offers the Azure SQL Database
service as part of its cloud offering [3].

Advantages of Cloud Database

Advantages of cloud databases over traditional databases include increased accessibility,
automatic failover and fast automated recovery from failures, automated on-the-go scaling,
minimal investment and maintenance of in-house hardware, and potentially better performance.

Disadvantages of Cloud Database

Disadvantages of cloud databases compared with traditional databases include potential
security and privacy issues as well as the potential loss of or inability to access critical
data in the event of a disaster or bankruptcy of the cloud database service provider.
There are two common deployment methods for running a database on the cloud –
virtual machine image and DBaaS. Virtual machine image involves users running
databases on the cloud independently by purchasing virtual machine instances for a
limited time and either uploading their own machine image with a database installed on
it, or using ready-made machine images, such as those provided by Oracle, that already
include an optimized installation of a database. The DBaaS deployment method involves
users purchasing access to a database service from a cloud database provider.
Let’s look at the benefits you can expect from utilizing cloud computing in general, and
a cloud database in particular.

Time-to-Market
As the pace of business rapidly increases, while at the same time internal IT resources
remain in short supply at most companies, business managers are discovering an array
of cloud solutions they can easily apply to their business operation without requiring
the steps to acquire, install and maintain software. They can simply sign up for the
solution and begin using it right away.

Economics
Cloud computing lowers technology costs in two ways. First by significantly reducing the
need for IT experts and staff. The other is by efficiencies gained through shared multi-
tenant cloud environments that eliminate purchasing hardware equipment and
software licenses. Additionally, many services are month-to-month without long term
contracts, allowing businesses to easily apply these technologies “just-in-time” and
drop them when no longer needed.

Scalability
When a technology is custom-built or brought in-house, the IT managers must build an
infrastructure that can withstand the highest point of usage or they risk their reputation
not being able to deliver at peak times. In contrast, cloud computing services typically allow
for on-demand scalability for peak times or sustained periods. IT no longer needs to over-
engineer solutions and infrastructure or sacrifice quality of service.

Empowerment
Cloud computing solutions typically have a web-based interface for users. They can be
accessed by employees, customers and partners no matter where they are. With a
cloud database, everyone gets to work with the same set of information and
spreadsheet chaos is a thing of the past.

Best Practices
To the extent that reputable service providers are utilized, customers can be assured
that best practices in terms of security, reliability, and monitoring are in place. The
grade of service offered by leading cloud computing vendors is expensive and difficult to
implement on your own.

Eliminate Code
In addition to all of the standard benefits of cloud computing, Caspio is designed for
business users to create their own web applications without coding and without
reliance on IT. The business users can conceive and drive their own requirements and
features; everything from database reports, web forms, process approvals, dashboards,
and even mobile apps.

Go Green
Last but not least, cloud computing is all about virtualization, multi-tenancy, and
shared resources that provide more service for the amount of energy expended when
compared to in-house, single tenant solutions.

Directory Systems

Typical kinds of directory information

-Employee information such as name, id, email, phone, office addr, ..
-Even personal information to be accessed from multiple places
 e.g. Web browser bookmarks

White pages

-Entries organized by name or identifier
 Meant for forward lookup to find more about an entry

Yellow pages

-Entries organized by properties
-For reverse lookup to find entries matching specific requirements

When directories are to be accessed across an organization

-Alternative 1: Web interface. Not great for programs
-Alternative 2: Specialized directory access protocols
 Coupled with specialized user interfaces

Several applications using attributes of the same entry

Question: Why not use database protocols like ODBC/JDBC?

Answer:

-Simplified protocols for a limited type of data access, evolved parallel to ODBC/JDBC
-Can be optimized to economically provide more applications with rapid access to directory
 data in large distributed environments (because directories are not intended to provide as
 many functions as general-purpose relational databases)
-Provide a nice hierarchical naming mechanism similar to file system directories
 Data can be partitioned amongst multiple servers for different parts of the hierarchy, yet
 give a single view to user
 E.g. different servers for Bell Labs Murray Hill and Bell Labs Bangalore
-Directories may use databases as storage mechanism

Directory Access Protocols

Most commonly used directory access protocol:

-LDAP (Lightweight Directory Access Protocol)
-Simplified from earlier X.500 protocol

LDAP

-LDAP Data Model
-Data Manipulation
-Distributed Directory Trees

Directory Information Tree (DIT)

Entries organized into a directory information tree (DIT) according to their DNs

-Leaf level usually represents specific objects
-Internal node entries represent objects such as organizational units, organizations or
 countries
-Children of a node inherit the DN of the parent, and add on RDNs
 E.g. internal node with DN c=USA
 Children nodes have DN starting with c=USA and further RDNs such as o or ou
 DN of an entry can be generated by traversing path from root
-Leaf level can be an alias pointing to another entry
 Entries can thus have more than one DN
 E.g. person in more than one organizational unit
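
How a DN (distinguished name) is assembled from RDNs (relative distinguished names) along the path to the root can be sketched as follows. The tree contents (`cn=Alice`, `ou=Research`, `o=ExampleCorp`) are hypothetical, the `PARENT` table assumes every RDN is unique in this toy tree, and the output uses the common LDAP string form in which the entry's own RDN is written first and the root (`c=USA`) last.

```python
# Hypothetical directory information tree: each entry's RDN -> its parent's RDN.
PARENT = {
    "cn=Alice": "ou=Research",
    "ou=Research": "o=ExampleCorp",
    "o=ExampleCorp": "c=USA",
    "c=USA": None,            # root of this small tree
}


def distinguished_name(rdn):
    """Collect RDNs from the entry up to the root and join them with commas."""
    parts = []
    while rdn is not None:
        parts.append(rdn)
        rdn = PARENT[rdn]
    return ",".join(parts)
```

An alias entry would simply be a second leaf, under a different parent, that points to the same object, which is how one person can end up with more than one DN.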
