Adt Unit I
Reliability: In a distributed database system, if one system fails or stops working for
some time, another system can complete the task.
Availability: In a distributed database system, availability can be achieved even if a server
fails, because another system is available to serve the client request.
Performance: Performance can be improved by distributing the database over different locations,
so that the data is available at every location and is easy to maintain.
A client-server architecture has a number of clients and a few servers connected in a
network.
A client sends a query to one of the servers. The earliest available server processes it and
replies.
A client-server architecture is simple to implement and execute because of its centralized
server system.
3. Middleware architecture.
Middleware architectures are designed in such a way that a single query is executed on
multiple servers.
This system needs only one server that is capable of managing queries and
transactions from multiple servers.
Middleware architecture uses local servers to handle local queries and transactions.
The software used to execute queries and transactions across one or more
independent database servers is called middleware.
What is fragmentation?
Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the
table are called fragments. Fragmentation can be of three types: horizontal, vertical, and
hybrid (combination of horizontal and vertical). Horizontal fragmentation can further be
classified into two techniques: primary horizontal fragmentation and derived horizontal
fragmentation.
Fragmentation should be done in such a way that the original table can be reconstructed from
the fragments whenever required. This requirement is called “reconstructiveness.”
The process of dividing the database into multiple smaller parts is called
fragmentation.
These fragments may be stored at different locations.
The data fragmentation process should be carried out in such a way that the
reconstruction of the original database from the fragments is possible.
Advantages of Fragmentation
Since data is stored close to the site of usage, efficiency of the database system is
increased.
Local query optimization techniques are sufficient for most queries since data is
locally available.
Since irrelevant data is not available at the sites, security and privacy of the database
system can be maintained.
Disadvantages of Fragmentation
When data from different fragments are required, the access speeds may be very low.
In case of recursive fragmentations, the job of reconstruction will need expensive
techniques.
Lack of back-up copies of data in different sites may render the database ineffective
in case of failure of a site.
Example:
Account (Acc_No, Balance, Branch_Name, Type).
In this example, suppose the Branch_Name column of the table holds the values Pune, Baroda, and Delhi.
Example:
For the above table we can define any simple condition such as Branch_Name = 'Pune',
Branch_Name = 'Delhi', or Balance < 50000.
Fragmentation1:
SELECT * FROM Account WHERE Branch_Name = 'Pune' AND Balance < 50000
Fragmentation2:
SELECT * FROM Account WHERE Branch_Name = 'Delhi' AND Balance < 50000
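The idea of horizontal fragmentation and its reconstruction can be tried out with a small sketch. The following is only an illustration using Python's built-in sqlite3 module: the sample rows, the fragment table names (Account_Pune and so on), and the choice to fragment on Branch_Name alone (so that the fragments are complete and disjoint) are assumptions made for the example, not part of the notes.

```python
import sqlite3

# Hypothetical sample rows for Account(Acc_No, Balance, Branch_Name, Type).
ROWS = [
    (1, 12000, "Pune", "Savings"),
    (2, 45000, "Delhi", "Current"),
    (3, 30000, "Baroda", "Savings"),
    (4, 8000, "Pune", "Current"),
]
SCHEMA = "(Acc_No INTEGER PRIMARY KEY, Balance INTEGER, Branch_Name TEXT, Type TEXT)"

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE Account {SCHEMA}")
conn.executemany("INSERT INTO Account VALUES (?, ?, ?, ?)", ROWS)

# One horizontal fragment per branch: every fragment keeps whole rows.
branches = ["Pune", "Baroda", "Delhi"]
for branch in branches:
    conn.execute(f"CREATE TABLE Account_{branch} {SCHEMA}")
    conn.execute(
        f"INSERT INTO Account_{branch} SELECT * FROM Account WHERE Branch_Name = ?",
        (branch,),
    )

# Reconstruction: the union of all fragments gives back the original table.
union_sql = " UNION ALL ".join(f"SELECT * FROM Account_{b}" for b in branches)
reconstructed = sorted(conn.execute(union_sql).fetchall())
original = sorted(conn.execute("SELECT * FROM Account").fetchall())
pune_count = conn.execute("SELECT COUNT(*) FROM Account_Pune").fetchone()[0]
```

Because each row satisfies exactly one branch condition, the union of the fragments contains every row once, which is what reconstructiveness requires.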
2) Derived horizontal fragmentation
Derived horizontal fragmentation partitions a relation according to the fragments
already defined on a primary relation to which it is related.
Fragmentation1:
SELECT * FROM Account WHERE Branch_Name = 'Baroda' AND Balance < 50000
Fragmentation2:
SELECT * FROM Account WHERE Branch_Name = 'Delhi' AND Balance < 50000
2. Vertical Fragmentation
Example:
Fragmentation1:
SELECT Acc_No FROM Account
Fragmentation2:
SELECT Balance FROM Account
Complete vertical fragmentation
Complete vertical fragmentation generates a set of vertical fragments that together
include all the attributes of the original relation.
Reconstruction of a vertically fragmented relation is performed by applying a full outer
join operation to the fragments.
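Reconstruction of vertical fragments can be sketched in the same way. This example assumes that every fragment also carries the key Acc_No, so that rows can be matched up again; the fragment names Account_V1 and Account_V2 and the sample rows are hypothetical. An ordinary join on the key is used here, which gives the same result as a full outer join whenever every key value appears in both fragments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Acc_No INTEGER PRIMARY KEY, Balance INTEGER, "
    "Branch_Name TEXT, Type TEXT)"
)
conn.executemany(
    "INSERT INTO Account VALUES (?, ?, ?, ?)",
    [(1, 12000, "Pune", "Savings"), (2, 45000, "Delhi", "Current")],
)

# Each vertical fragment keeps the key Acc_No plus some of the other columns.
conn.execute("CREATE TABLE Account_V1 AS SELECT Acc_No, Balance FROM Account")
conn.execute(
    "CREATE TABLE Account_V2 AS SELECT Acc_No, Branch_Name, Type FROM Account"
)

# Reconstruction: join the fragments on the shared key.
rebuilt = conn.execute(
    "SELECT v1.Acc_No, v1.Balance, v2.Branch_Name, v2.Type "
    "FROM Account_V1 AS v1 JOIN Account_V2 AS v2 ON v1.Acc_No = v2.Acc_No "
    "ORDER BY v1.Acc_No"
).fetchall()
original = conn.execute("SELECT * FROM Account ORDER BY Acc_No").fetchall()
```

Without a shared key in each fragment there would be no reliable way to tell which Balance belongs to which account.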
3) Hybrid Fragmentation
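Assume a relation Employee (Emp_Id, Emp_Name, Emp_Age, Emp_Address, Salary); the relation
name is an assumption made for this example.
Fragmentation1:
SELECT Emp_Name FROM Employee WHERE Emp_Age < 40
Fragmentation2:
SELECT Emp_Id FROM Employee WHERE Emp_Address = 'Pune' AND Salary < 14000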
Data replication is the process of storing separate copies of the data at two or more sites
(different computers or servers) to improve the availability of data.
It is a popular fault tolerance technique of distributed databases.
Advantages of Data Replication
Reliability − In case of failure of any site, the database system continues to work
since a copy is available at another site(s).
Reduction in Network Load − Since local copies of data are available, query
processing can be done with reduced network usage, particularly during prime hours.
Data updating can be done at non-prime hours.
Quicker Response − Availability of local copies of data ensures quick query
processing and consequently quick response time.
Simpler Transactions − Transactions require fewer joins of tables located at
different sites and minimal coordination across the network. Thus, they become
simpler in nature.
Disadvantages of Data Replication
Increased Storage Requirements − Maintaining multiple copies of data is associated
with increased storage costs. The storage space required is in multiples of the storage
required for a centralized system.
Increased Cost and Complexity of Data Updating − Each time a data item is
updated, the update needs to be reflected in all the copies of the data at the different
sites. This requires complex synchronization techniques and protocols.
Undesirable Application – Database coupling − If complex update mechanisms are
not used, removing data inconsistency requires complex co-ordination at application
level. This results in undesirable application – database coupling.
1. Synchronous Replication:
In synchronous replication, the replica is modified immediately whenever a change is
made to the relation, so there is no difference between the original data and the replica.
2. Asynchronous replication:
In asynchronous replication, the replica is modified only after the commit has been issued on
the database, so the replica may lag behind the original for a short time.
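The difference between the two modes can be illustrated with a toy model. This is a minimal sketch, not how any particular DBMS implements replication: the class ReplicatedStore, its pending queue, and the propagate method are hypothetical names invented for the example.

```python
from collections import deque


class ReplicatedStore:
    """Toy primary/replica pair; 'synchronous' selects the replication mode."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # changes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # Synchronous: the replica is updated as part of the same write.
            self.replica[key] = value
        else:
            # Asynchronous: the change is queued and shipped later.
            self.pending.append((key, value))

    def propagate(self):
        """Apply queued changes to the replica (runs some time after commit)."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


sync_store = ReplicatedStore(synchronous=True)
sync_store.write("balance", 100)

async_store = ReplicatedStore(synchronous=False)
async_store.write("balance", 100)
lag_before = dict(async_store.replica)  # replica has not seen the write yet
async_store.propagate()
```

In the synchronous store the replica matches the primary right after the write; in the asynchronous store it matches only once propagate has run.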
Replication Schemes
1. Full Replication
In this design alternative, one copy of all the database tables is stored at each site. Since
each site has its own copy of the entire database, queries are very fast and require negligible
communication cost. On the other hand, the massive redundancy in data makes update
operations very costly. Hence, this is suitable for systems where a large number of queries
has to be handled whereas the number of database updates is low.
In full replication scheme, the database is available to almost every location or user in
communication network.
2. No Replication
In this design alternative, different tables are placed at different sites. Data is placed so that it
is at a close proximity to the site where it is used most. It is most suitable for database
systems where the percentage of queries needed to join information in tables placed at
different sites is low. If an appropriate distribution strategy is adopted, then this design
alternative helps to reduce the communication cost during data processing.
3. Partial replication
Copies of tables or portions of tables are stored at different sites. The distribution of the
tables is done in accordance with the frequency of access. This takes into consideration the fact
that the frequency of accessing the tables varies considerably from site to site. The number of
copies of the tables (or portions) depends on how frequently the access queries execute and
on the sites that generate the access queries.
Partial replication means only some fragments are replicated from the database.
Various factors which are considered while processing a query are as follows:
Costs of Data transfer
This is a very important factor while processing queries. The intermediate data is
transferred to another location for processing, and the final result is sent to the
location where the query was issued.
The cost of data transfer is lower if the locations are connected via a high-performance
communication channel and higher if they are connected via a slow one.
The DDBMS query optimization algorithms are used to minimize the cost of data
transfer.
Any transaction must maintain the ACID properties, viz. Atomicity, Consistency, Isolation,
and Durability.
Atomicity − This property states that a transaction is an atomic unit of
processing, that is, either it is performed in its entirety or not performed at all.
No partial update should exist.
Consistency − A transaction should take the database from one consistent state
to another consistent state. It should not adversely affect any data item in the
database.
Isolation − A transaction should be executed as if it is the only one in the
system. There should not be any interference from the other concurrent
transactions that are simultaneously running.
Durability − If a committed transaction brings about a change, that change
should be durable in the database and not lost in case of any failure.
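Atomicity in particular is easy to observe on a single database. The sketch below uses Python's sqlite3 module; the two-column Account table, the CHECK constraint that forbids negative balances, and the transfer function are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Acc_No INTEGER PRIMARY KEY, "
    "Balance INTEGER CHECK (Balance >= 0))"
)
conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE Account SET Balance = Balance + ? WHERE Acc_No = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE Acc_No = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # both updates succeed and are committed
failed = transfer(conn, 1, 2, 500)  # debit violates the CHECK, credit is undone
balances = dict(conn.execute("SELECT Acc_No, Balance FROM Account"))
```

In the failed transfer the credit to account 2 has already been executed when the debit is rejected, yet the rollback removes it again, so no partial update remains.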
Distributed Transactions
For example:
Suppose location A sends a message to location B and expects a response from B, but B is
unable to receive it. Several problems can arise in this situation.
COMMIT PROTOCOL
In a local database system, for committing a transaction, the transaction manager has to only
convey the decision to commit to the recovery manager. However, in a distributed system,
the transaction manager should convey the decision to commit to all the servers in the
various sites where the transaction is being executed and uniformly enforce the decision.
When processing is complete at each site, it reaches the partially committed transaction state
and waits for all other transactions to reach their partially committed states. When it receives
the message that all the sites are ready to commit, it starts to commit. In a distributed system,
either all sites commit or none of them does.
The different distributed commit protocols are −
One-phase commit
Two-phase commit
Three-phase commit
Distributed One-phase Commit
Distributed one-phase commit is the simplest commit protocol. Let us consider that there is a
controlling site and a number of slave sites where the transaction is being executed. The
steps in distributed commit are −
After each slave has locally completed its transaction, it sends a “DONE” message to
the controlling site.
The slaves wait for “Commit” or “Abort” message from the controlling site. This
waiting time is called window of vulnerability.
When the controlling site receives “DONE” message from each slave, it makes a
decision to commit or abort. This is called the commit point. Then, it sends this
message to all the slaves.
On receiving this message, a slave either commits or aborts and then sends an
acknowledgement message to the controlling site.
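The four steps above can be sketched as a small simulation. Message passing is replaced by plain method calls, and the names Slave, finish_local_work, receive, and one_phase_commit are hypothetical; the commit argument simply stands in for whatever decision the controlling site makes.

```python
class Slave:
    """A site executing part of the transaction."""

    def __init__(self, name):
        self.name = name
        self.state = "working"

    def finish_local_work(self):
        # Step 1: report DONE, then wait for the controlling site's decision.
        self.state = "waiting"  # the window of vulnerability starts here
        return "DONE"

    def receive(self, decision):
        # Step 4: apply the decision and acknowledge it.
        self.state = "committed" if decision == "Commit" else "aborted"
        return "ACK"


def one_phase_commit(slaves, commit=True):
    """Run the steps once; 'commit' is the controlling site's decision."""
    done_messages = [s.finish_local_work() for s in slaves]
    if any(m != "DONE" for m in done_messages):
        commit = False
    decision = "Commit" if commit else "Abort"  # the commit point
    acks = [s.receive(decision) for s in slaves]
    return decision, acks


sites = [Slave("A"), Slave("B")]
decision, acks = one_phase_commit(sites, commit=True)
states = [s.state for s in sites]
```

Note that a slave has no say in the outcome after it has sent “DONE”; it can only wait, which is why the waiting period is called the window of vulnerability.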
Commit request:
In the commit-request phase, the coordinator attempts to prepare all cohorts and take the necessary
steps to commit or terminate the transaction.
Commit phase:
The commit phase is based on the votes of the cohorts, from which the coordinator decides to
commit or terminate the transaction.
The steps performed in the two phases are as follows −
Phase 1: Prepare Phase
After each slave has locally completed its transaction, it sends a “DONE”
message to the controlling site. When the controlling site has received
“DONE” message from all slaves, it sends a “Prepare” message to the slaves.
The slaves vote on whether they still want to commit or not. If a slave wants to
commit, it sends a “Ready” message.
A slave that does not want to commit sends a “Not Ready” message. This may
happen when the slave has conflicting concurrent transactions or there is a
timeout.
Phase 2: Commit/Abort Phase
After the controlling site has received “Ready” message from all the slaves −
o The controlling site sends a “Global Commit” message to the
slaves.
o The slaves apply the transaction and send a “Commit ACK”
message to the controlling site.
o When the controlling site receives “Commit ACK” message
from all the slaves, it considers the transaction as committed.
After the controlling site has received the first “Not Ready” message from any
slave −
o The controlling site sends a “Global Abort” message to the
slaves.
o The slaves abort the transaction and send an “Abort ACK”
message to the controlling site.
o When the controlling site receives “Abort ACK” message from
all the slaves, it considers the transaction as aborted.
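The two phases can be sketched as follows. This is a simplified, single-process illustration: network messages are replaced by method calls, failures and timeouts are not modelled, and the can_commit flag is a hypothetical stand-in for a slave's local decision. Unlike the description above, the sketch collects all votes before deciding rather than aborting on the first “Not Ready”; the outcome is the same.

```python
class Slave:
    """A participant site; 'can_commit' is a hypothetical flag for its vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "done"  # local work finished, "DONE" already sent

    def prepare(self):
        # Phase 1: vote on whether this site is still able to commit.
        return "Ready" if self.can_commit else "Not Ready"

    def global_commit(self):
        self.state = "committed"
        return "Commit ACK"

    def global_abort(self):
        self.state = "aborted"
        return "Abort ACK"


def two_phase_commit(slaves):
    # Phase 1: send "Prepare" to every slave and collect the votes.
    votes = [s.prepare() for s in slaves]
    # Phase 2: commit only if every vote is "Ready"; otherwise abort everywhere.
    if all(v == "Ready" for v in votes):
        acks = [s.global_commit() for s in slaves]
        outcome = "committed"
    else:
        acks = [s.global_abort() for s in slaves]
        outcome = "aborted"
    return outcome, acks


good = [Slave("A"), Slave("B")]
outcome_ok, acks_ok = two_phase_commit(good)

mixed = [Slave("A"), Slave("B", can_commit=False)]
outcome_bad, acks_bad = two_phase_commit(mixed)
mixed_states = [s.state for s in mixed]
```

In the second run a single “Not Ready” vote is enough to abort the transaction at every site, including the one that voted “Ready”.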
Some problems which occur while accessing the database are as follows:
4. Distributed commit
If a failure occurs at some location during the commit process of a transaction that is
accessing databases stored at multiple locations, the problem is called
distributed commit.
5. Distributed deadlock
Deadlock can occur at several locations because of recovery and concurrency problems
(multiple locations accessing the same system in the communication network).
A distinguished copy of the data can be protected in three different ways, by applying:
1) Lock based protocol
A lock is applied to avoid concurrency problems between two transactions: one transaction
locks a data item, and another transaction can access that item only when the lock is
released. The lock is applied for read or write operations. It is an important method for
preventing conflicts between concurrent transactions.
2) Shared lock system (Read lock)
A transaction can activate a shared lock on a data item to read its content. The lock is shared in such
a way that any other transaction can also activate a shared lock on the same data for reading
purposes.
3) Exclusive lock
A transaction can activate an exclusive lock on a data item to perform read and write operations. While
it is held, no other transaction can activate any kind of lock on that same data.
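The compatibility of shared and exclusive locks can be sketched with a toy lock table. The class LockTable and the mode codes "S" (shared) and "X" (exclusive) are assumptions made for the illustration; waiting queues, lock upgrades, and deadlock handling are deliberately left out.

```python
class LockTable:
    """Toy lock manager: shared (S) locks are compatible, exclusive (X) is not."""

    def __init__(self):
        self.locks = {}  # data item -> (mode, set of transaction ids)

    def acquire(self, txn, item, mode):
        """Return True if txn gets the lock, False if it would have to wait."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)  # readers share the lock
            return True
        return False  # any combination involving X conflicts

    def release(self, txn, item):
        held = self.locks.get(item)
        if held is None:
            return
        holders = held[1]
        holders.discard(txn)
        if not holders:
            del self.locks[item]  # last holder gone, the item is free again


table = LockTable()
results = [
    table.acquire("T1", "x", "S"),  # first reader
    table.acquire("T2", "x", "S"),  # second reader shares the lock
    table.acquire("T3", "x", "X"),  # writer must wait while readers hold it
]
table.release("T1", "x")
table.release("T2", "x")
after_release = table.acquire("T3", "x", "X")
reader_blocked = table.acquire("T1", "x", "S")
```

Two readers can hold the item at the same time, but the writer gets it only after both have released, and while the writer holds it even a reader is refused.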