Distributed Database
Distributed Database
1.1 Features
Databases in the collection are logically interrelated with each other. Often they represent a
single logical database.
Data is physically stored across multiple sites. Data in each site can be managed by a DBMS
independent of the other sites.
The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
1.2.1 Features
It is used to create, retrieve, update and delete distributed databases.
It synchronizes the database periodically and provides access mechanisms by the virtue of
which the distribution becomes transparent to the users.
It is used in application areas where large volumes of data are processed and accessed by
numerous users simultaneously.
Distributed Nature of Organizational Units − Most organizations in the current times are
subdivided into multiple units that are physically distributed over the globe. Each unit
requires its own set of local data. Thus, the overall database of the organization becomes
distributed.
Need for Sharing of Data − The multiple organizational units often need to communicate
with each other and share their data and resources. This demands common databases or
replicated databases that should be used in a synchronized manner.
Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and Online
Analytical Processing (OLAP) work upon diversified systems which may have common data.
Distributed database systems aid both these processing by providing synchronized data.
Database Recovery − One of the common techniques used in DDBMS is replication of data
across different sites. Replication of data automatically helps in data recovery if database in
any site is damaged. Users can access data from other sites while the damaged site is being
reconstructed. Thus, database failure may become almost inconspicuous to users.
Support for Multiple Application Software − Most organizations use a variety of application
software each with its specific database support. DDBMS provides a uniform functionality for
using the same data among different platforms.
Modular Development − If the system needs to be expanded to new locations or new units, in
centralized database systems, the action requires substantial efforts and disruption in the existing
functioning. However, in distributed databases, the work simply requires adding new computers and
local data to the new site and finally connecting them to the distributed system, with no interruption
in current functions.
More Reliable − In case of database failures, the total system of centralized databases comes to a
halt. However, in distributed systems, when a component fails, the functioning of the system
continues may be at a reduced performance. Hence DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be met from
local data itself, thus providing faster response. On the other hand, in centralized systems, all
queries have to pass through the central computer for processing, which increases the response
time.
Lower Communication Cost − In distributed database systems, if data is located locally where it is
mostly used, then the communication costs for data manipulation can be minimized. This is not
feasible in centralized systems.
Need for complex and expensive software − DDBMS demands complex and often expensive
software to provide data transparency and co-ordination across the several sites.
Data integrity − The need for updating data in multiple sites pose problems of data integrity.
The sites use identical DBMS or DBMS from the same vendor.
Each site is aware of all other sites and cooperates with other sites to process user requests.
Autonomous − Each database is independent that functions on its own. They are integrated
by a controlling application and use message passing to share data updates.
The system may be composed of a variety of DBMSs like relational, network, hierarchical or
object oriented.
Un-federated − The database systems employ a central coordinating module through which
the databases are accessed.
Distribution − It states the physical distribution of data across the different sites.
Autonomy − It indicates the distribution of control of the database system and the degree to
which each constituent DBMS can operate independently.
Multi-database View Level − Depicts multiple user views comprising of subsets of the
integrated distributed database.
Multi-database Internal Level − Depicts the data distribution across different sites and
multi-database to local data mapping.
Local database Conceptual Level − Depicts local data organization at each site.
Local database Internal Level − Depicts physical data organization at each site.
Fully replicated
Partially replicated
Fragmented
Mixed
1.7.4 Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or partitions, and
each fragment can be stored at different sites. This considers the fact that it seldom happens that all
data stored in a table is required at a given site. Moreover, fragmentation increases parallelism and
provides better disaster recovery. Here, there is only one copy of each fragment in the system, i.e.
no redundant data.
Vertical fragmentation
Horizontal fragmentation
Hybrid fragmentation
Mixed Distribution
This is a combination of fragmentation and partial replications. Here, the tables are initially
fragmented in any form (horizontal or vertical), and then these fragments are partially replicated
across the different sites according to the frequency of accessing the fragments.