Distributed Database

Uploaded by Jane hiram

1 Distributed Database

A distributed database is a collection of multiple interconnected databases that are physically
spread across various locations and communicate via a computer network.

1.1 Features
• Databases in the collection are logically interrelated with each other. Often they represent a
single logical database.

• Data is physically stored across multiple sites. Data at each site can be managed by a DBMS
independent of the other sites.

• The processors at the sites are connected via a network. They do not form a
multiprocessor configuration.

• A distributed database is not a loosely connected file system.

• A distributed database incorporates transaction processing, but it is not synonymous with a
transaction processing system.

1.2 Distributed Database Management System


A distributed database management system (DDBMS) is a software system that manages a
distributed database as if it were all stored in a single location.

1.2.1 Features
• It is used to create, retrieve, update and delete distributed databases.

• It synchronizes the database periodically and provides access mechanisms that make the
distribution transparent to the users.

• It ensures that data modified at any site is updated at every site that holds it.

• It is used in application areas where large volumes of data are processed and accessed by
numerous users simultaneously.

• It is designed for heterogeneous database platforms.

• It maintains confidentiality and data integrity of the databases.
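
As a minimal sketch of distribution transparency, the following Python fragment hides site
selection behind a single interface. The class names Site and TransparentDatabase, the site
names and the round-robin placement rule are assumptions made for illustration; a real DDBMS
would also handle replication, transactions and failures.

```python
class Site:
    """One site with its own local store (a stand-in for a local DBMS)."""
    def __init__(self, name):
        self.name = name
        self.rows = {}


class TransparentDatabase:
    """Hypothetical facade: callers read and write without naming a site."""
    def __init__(self, sites):
        self.sites = {s.name: s for s in sites}
        self.order = [s.name for s in sites]
        self.catalog = {}  # key -> name of the site that stores it

    def put(self, key, value):
        # Assumed placement rule: new keys are assigned to sites round-robin.
        if key not in self.catalog:
            self.catalog[key] = self.order[len(self.catalog) % len(self.order)]
        self.sites[self.catalog[key]].rows[key] = value

    def get(self, key):
        # The catalog lookup is what makes the location invisible to the caller.
        return self.sites[self.catalog[key]].rows[key]


sites = [Site("nairobi"), Site("london")]
db = TransparentDatabase(sites)
db.put("cust:1", "Asha")
db.put("cust:2", "Ben")
```

The caller uses put and get without naming a site; only the catalog records where each key
is stored.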

1.2.2 Factors Encouraging DDBMS


The following factors encourage moving over to DDBMS −

• Distributed Nature of Organizational Units − Most organizations today are subdivided into
multiple units that are physically distributed across the globe. Each unit requires its own
set of local data. Thus, the overall database of the organization becomes distributed.

• Need for Sharing of Data − The organizational units often need to communicate with each
other and share their data and resources. This demands common or replicated databases that
are kept synchronized.

• Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and Online
Analytical Processing (OLAP) run on different systems that may share common data.
Distributed database systems support both kinds of processing by providing synchronized data.

• Database Recovery − One of the common techniques used in DDBMS is replication of data
across different sites. Replication helps in data recovery if the database at any site is
damaged. Users can access data from other sites while the damaged site is being
reconstructed. Thus, a database failure may go almost unnoticed by users.

• Support for Multiple Application Software − Most organizations use a variety of application
software, each with its own database support. A DDBMS provides uniform functionality for
using the same data across different platforms.
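
The database recovery factor above can be sketched as follows. This is an illustration under
stated assumptions, not how any particular DDBMS works: the ReplicaSite class and the
read_with_failover and rebuild functions are hypothetical names, and each site is modelled as
an in-memory dictionary.

```python
class ReplicaSite:
    """A site holding a full copy of the data (hypothetical model)."""
    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.up = True

    def read(self, key):
        if not self.up:
            raise ConnectionError(f"site {self.name} is down")
        return self.data[key]


def read_with_failover(sites, key):
    """Try each replica in order and return (value, name of serving site)."""
    for site in sites:
        try:
            return site.read(key), site.name
        except ConnectionError:
            continue
    raise ConnectionError("no replica available")


def rebuild(damaged, healthy):
    """Reconstruct a damaged site by copying from a healthy replica."""
    damaged.data = dict(healthy.data)
    damaged.up = True


a = ReplicaSite("A", {"k": 1})
b = ReplicaSite("B", {"k": 1})
a.up = False          # simulate damage at site A
a.data = {}
value, served_by = read_with_failover([a, b], "k")  # served by site B
rebuild(a, b)         # site A is reconstructed from site B
```

While site A is down the read is served by site B, so the caller still gets its answer.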

1.3 Advantages of Distributed Databases


Following are the advantages of distributed databases over centralized databases.

Modular Development − In a centralized database system, expanding to new locations or new units
requires substantial effort and disrupts existing operations. In a distributed database, the work
simply requires adding new computers and local data at the new site and then connecting them to
the distributed system, with no interruption to current functions.

More Reliable − When a centralized database fails, the entire system comes to a halt. In a
distributed system, when a component fails the system continues to function, possibly at
reduced performance. Hence a DDBMS is more reliable.

Better Response − If data is distributed in an efficient manner, then user requests can be met from
local data itself, thus providing faster response. On the other hand, in centralized systems, all
queries have to pass through the central computer for processing, which increases the response
time.

Lower Communication Cost − In distributed database systems, if data is located locally where it is
mostly used, then the communication costs for data manipulation can be minimized. This is not
feasible in centralized systems.

1.4 Adversities of Distributed Databases


Following are some of the adversities associated with distributed databases.

• Need for complex and expensive software − DDBMS demands complex and often expensive
software to provide data transparency and co-ordination across the several sites.

• Processing overhead − Even simple operations may require a large number of
communications and additional calculations to provide uniformity in data across the sites.

• Data integrity − The need for updating data in multiple sites poses problems of data integrity.

• Overheads for improper data distribution − Responsiveness of queries is largely dependent
upon proper data distribution. Improper data distribution often leads to very slow response
to user requests.

1.5 Types of Distributed Databases
Distributed databases can be broadly classified into homogeneous and heterogeneous distributed
database environments, each with further sub-divisions, as shown in the following illustration.

1.5.1 Homogeneous Distributed Databases


In a homogeneous distributed database, all the sites use identical DBMS and operating systems. Its
properties are −

• The sites use very similar software.

• The sites use identical DBMS or DBMS from the same vendor.

• Each site is aware of all other sites and cooperates with other sites to process user requests.

• The database is accessed through a single interface as if it were a single database.

1.5.1.1 Types of Homogeneous Distributed Database


There are two types of homogeneous distributed databases −

• Autonomous − Each database is independent and functions on its own. The databases are
integrated by a controlling application and use message passing to share data updates.

• Non-autonomous − Data is distributed across the homogeneous nodes and a central or
master DBMS co-ordinates data updates across the sites.

1.5.2 Heterogeneous Distributed Databases


In a heterogeneous distributed database, different sites have different operating systems, DBMS
products and data models. Its properties are −

• Different sites use dissimilar schemas and software.

• The system may be composed of a variety of DBMSs like relational, network, hierarchical or
object oriented.

• Query processing is complex due to dissimilar schemas.

• Transaction processing is complex due to dissimilar software.

• A site may not be aware of other sites and so there is limited co-operation in processing user
requests.

1.5.2.1 Types of Heterogeneous Distributed Databases


• Federated − The heterogeneous database systems are independent in nature and integrated
together so that they function as a single database system.

• Un-federated − The database systems employ a central coordinating module through which
the databases are accessed.
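
To illustrate why dissimilar schemas complicate query processing, here is a small sketch of
integrating two sources into one global view. The two sources, their field names and the
mapping functions are all hypothetical; real systems need far richer schema mappings.

```python
# Two hypothetical local sources with dissimilar schemas.
relational_site = [{"emp_id": 1, "full_name": "Asha"}]
document_site = [{"_id": 2, "person": {"name": "Ben"}}]


def from_relational(row):
    """Map a row of the relational source to the assumed global schema."""
    return {"id": row["emp_id"], "name": row["full_name"]}


def from_document(doc):
    """Map a nested document to the same assumed global schema."""
    return {"id": doc["_id"], "name": doc["person"]["name"]}


def global_employees():
    """Answer a global query by translating each local result."""
    rows = [from_relational(r) for r in relational_site]
    rows += [from_document(d) for d in document_site]
    return rows
```

Each additional local schema needs its own mapping function, which is one reason query
processing becomes more complex as heterogeneity grows.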

1.6 Distributed DBMS Architectures


DDBMS architectures are generally developed depending on three parameters −

• Distribution − It states the physical distribution of data across the different sites.

• Autonomy − It indicates the distribution of control of the database system and the degree to
which each constituent DBMS can operate independently.

• Heterogeneity − It refers to the uniformity or dissimilarity of the data models, system
components and databases.

1.6.1 Architectural Models


Some of the common architectural models are −

• Client - Server Architecture for DDBMS

• Peer - to - Peer Architecture for DDBMS

• Multi - DBMS Architecture

1.6.1.1 Client - Server Architecture for DDBMS


This is a two-level architecture where the functionality is divided between servers and clients.
The server functions primarily encompass data management, query processing, optimization and
transaction management. Client functions mainly include the user interface, although clients also
perform some functions such as consistency checking and transaction management.

The two client - server architectures are −

• Single Server Multiple Client

• Multiple Server Multiple Client (shown in the following diagram)


1.6.1.2 Peer - to - Peer Architecture for DDBMS

In these systems, each peer acts both as a client and a server for imparting database services. The
peers share their resources with other peers and co-ordinate their activities.

This architecture generally has four levels of schemas −

• Global Conceptual Schema − Depicts the global logical view of data.

• Local Conceptual Schema − Depicts logical data organization at each site.

• Local Internal Schema − Depicts physical data organization at each site.

• External Schema − Depicts user view of data.


1.6.1.3 Multi - DBMS Architectures

This is an integrated database system formed by a collection of two or more autonomous database
systems.

Multi-DBMS can be expressed through six levels of schemas −

• Multi-database View Level − Depicts multiple user views comprising subsets of the
integrated distributed database.

• Multi-database Conceptual Level − Depicts the integrated multi-database that comprises
global logical multi-database structure definitions.

• Multi-database Internal Level − Depicts the data distribution across different sites and
multi-database to local data mapping.

• Local database View Level − Depicts public view of local data.

• Local database Conceptual Level − Depicts local data organization at each site.

• Local database Internal Level − Depicts physical data organization at each site.

There are two design alternatives for multi-DBMS −

• Model with multi-database conceptual level.

• Model without multi-database conceptual level.


1.7 Design Alternatives

The distribution design alternatives for the tables in a DDBMS are as follows −

• Non-replicated and non-fragmented

• Fully replicated

• Partially replicated

• Fragmented

• Mixed

1.7.1 Non-replicated & Non-fragmented


In this design alternative, different tables are placed at different sites. Data is placed in
close proximity to the site where it is used most. It is most suitable for database systems where the
percentage of queries that need to join information from tables placed at different sites is low. If
an appropriate distribution strategy is adopted, this design alternative helps to reduce the
communication cost during data processing.
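
One simple placement strategy, shown here only as a sketch, is to store each table at the site
that accesses it most. The access counts, table names and site names below are invented for
illustration, and the function names place_tables and remote_accesses are hypothetical.

```python
def place_tables(access_counts):
    """Map each table to the site with the highest access count.

    access_counts has the shape {table: {site: number_of_accesses}}.
    """
    return {table: max(counts, key=counts.get)
            for table, counts in access_counts.items()}


def remote_accesses(access_counts, placement):
    """Count accesses issued from a site other than the one storing the table."""
    return sum(n
               for table, counts in access_counts.items()
               for site, n in counts.items()
               if site != placement[table])


# Invented workload: site A mostly uses "orders", site B mostly uses "staff".
counts = {
    "orders": {"A": 90, "B": 10},
    "staff": {"A": 5, "B": 40},
}
placement = place_tables(counts)
everything_at_a = {"orders": "A", "staff": "A"}
```

With these numbers the locality-based placement leaves 15 remote accesses, compared with 50
when both tables are kept at site A.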

1.7.2 Fully Replicated


In this design alternative, one copy of all the database tables is stored at each site. Since each
site has its own copy of the entire database, queries are very fast and require negligible
communication cost. On the other hand, the massive redundancy in data incurs a high cost during
update operations. Hence, this is suitable for systems where a large number of queries must be
handled and the number of database updates is low.
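
The trade-off can be made concrete with a small sketch in which reads are served locally and
every write is sent to all copies. The FullyReplicated class is a hypothetical model; it counts
only messages to remote copies and ignores concurrency control, which a real system would need.

```python
class FullyReplicated:
    """Every site holds a full copy; reads are local, writes go everywhere."""
    def __init__(self, site_names):
        self.copies = {name: {} for name in site_names}
        self.messages = 0  # messages sent to remote sites

    def read(self, local_site, key):
        # Served from the local copy, so no network message is needed.
        return self.copies[local_site].get(key)

    def write(self, local_site, key, value):
        # Every copy must be updated; each remote copy costs one message.
        for name, copy in self.copies.items():
            copy[key] = value
            if name != local_site:
                self.messages += 1


db = FullyReplicated(["A", "B", "C"])
db.write("A", "x", 1)          # two remote messages, to B and C
value_at_c = db.read("C", "x")  # local read, no messages
```

With n sites each write costs n - 1 remote messages in this model, while reads cost none, which
is why the design favours read-heavy workloads.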

1.7.3 Partially Replicated


Copies of tables or portions of tables are stored at different sites. The tables are distributed
in accordance with the frequency of access. This takes into consideration the fact that the
frequency of accessing the tables varies considerably from site to site. The number of copies of
the tables (or portions) depends on how frequently the access queries execute and on the sites
that generate them.
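
A frequency-based replication decision might look like the sketch below. The threshold rule,
the function name choose_replica_sites and the access figures are assumptions for illustration
rather than a standard algorithm.

```python
def choose_replica_sites(access_freq, threshold):
    """Return the sites that should hold a copy of a table.

    access_freq maps site name to access frequency. Sites at or above the
    threshold get a copy; if none qualifies, the busiest site gets the only
    copy so that the table is stored somewhere.
    """
    chosen = [site for site, freq in access_freq.items() if freq >= threshold]
    if not chosen:
        chosen = [max(access_freq, key=access_freq.get)]
    return sorted(chosen)


busy_table = choose_replica_sites({"A": 120, "B": 3, "C": 45}, threshold=40)
quiet_table = choose_replica_sites({"A": 1, "B": 2}, threshold=10)
```

Here the frequently accessed table is copied to sites A and C, while the rarely accessed table
is stored once, at site B.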

1.7.4 Fragmented

In this design, a table is divided into two or more pieces referred to as fragments or partitions,
and each fragment can be stored at a different site. This reflects the fact that it seldom happens
that all the data stored in a table is required at a given site. Moreover, fragmentation increases
parallelism and provides better disaster recovery. Here, there is only one copy of each fragment
in the system, i.e. no redundant data.

The three fragmentation techniques are −

• Vertical fragmentation

• Horizontal fragmentation

• Hybrid fragmentation
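
The three techniques can be sketched on a small in-memory table. Horizontal fragmentation
splits a table by rows, vertical fragmentation splits it by columns while keeping the key in
every fragment, and hybrid fragmentation applies one after the other. The employees table and
the helper function names are invented for this example.

```python
employees = [
    {"id": 1, "name": "Asha", "city": "Nairobi", "salary": 5000},
    {"id": 2, "name": "Ben", "city": "London", "salary": 6200},
    {"id": 3, "name": "Chen", "city": "Nairobi", "salary": 5800},
]


def horizontal_fragments(rows, column):
    """Split rows into fragments according to the value of one column."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def vertical_fragment(rows, key, columns):
    """Keep only the chosen columns, plus the key needed to rejoin later."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


def rejoin_vertical(left, right, key):
    """Rebuild full rows by joining two vertical fragments on the key."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left]


by_city = horizontal_fragments(employees, "city")           # horizontal
public = vertical_fragment(employees, "id", ["name", "city"])  # vertical
payroll = vertical_fragment(employees, "id", ["salary"])       # vertical
# Hybrid: a vertical fragment taken from one horizontal fragment.
nairobi_payroll = vertical_fragment(by_city["Nairobi"], "id", ["salary"])
```

The original table can be recovered by taking the union of the horizontal fragments or by
joining the vertical fragments on id, which is why each vertical fragment keeps the key.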

1.7.5 Mixed Distribution

This is a combination of fragmentation and partial replication. Here, the tables are initially
fragmented in any form (horizontal or vertical), and then these fragments are partially replicated
across the different sites according to the frequency of accessing the fragments.
