0% found this document useful (0 votes)
12 views

Distributed Database System

Uploaded by

dhyanimishti12
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views

Distributed Database System

Uploaded by

dhyanimishti12
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

Distributed Database System

A distributed database is a database that is not limited to one computer system. It is like a database that
consists of two or more files located in different computers or sites either on the same network or on an
entirely different network. Instead of storing all of the data in one database, data is divided and stored at
different locations or sites which do not share any physical component.

A distributed database is a database system that spans multiple computers or nodes that are connected by a
network. Each node in a distributed database can store a portion of the data, and the entire database is made up
of the sum of the data stored on each node.

In a distributed database, data is stored and processed in a distributed manner, and the system ensures that the
data remains consistent and available to users despite network failures or other system errors. The primary
goal of a distributed database is to provide high availability, scalability, and performance to applications that
require access to large amounts of data.

Databases can be broadly classified into two types, namely Distributed and Centralized databases. The
question here is why do we even need a Distributed Database?. Let's assume for a moment that we have only
centralized databases.

o We will be inserting all the data into one single database. Making it too large so that it will take a lot
of time to query a single piece of record.
o Once a fault occurs, we no longer be able to serve user requests as we have only one database.
o No scaling is possible even if we wanted to and availability is also less which in turn affects the
throughput.

Distributed databases resolve various issues, such


 Availability,
 Fault tolerance,
 Throughput,
 latency,
 Scalability
That's why we need distributed databases.
Distributed Databases

 Though there are many distributed databases to choose from, some examples of distributed databases
include Apache Ignite, Apache Cassandra, Apache HBase, Amazon SimpleDB, Clusterpoint,
and FoundationDB.

Features of Distributed Databases

In general, distributed databases include the following features:


1. Location independency: Data is independently stored at multiple sites and managed by
independent Distributed database management systems (DDBMS).
2. Network linking: All distributed databases in a collection are linked by a network and communicate
with each other.
3. Distributed query processing: Distributed query processing is the procedure of answering queries
(which means mainly read operations on large data sets) in a distributed environment.
o Query processing involves the transformation of a high-level query (e.g., formulated in SQL)
into a query execution plan (consisting of lower-level query operators in some variation
of relational algebra) as well as the execution of this plan.
4. Hardware independent: The different sites where data is stored are hardware-independent. There is
no physical contact between these Distributed Database In Dbms which is accomplished often through
virtualization.
5. Distributed transaction management: Distributed Database In Dbms provides a consistent
distribution through commit protocols, distributed recovery methods, and distributed concurrency
control techniques in case of many transaction failures.

Types of Distributed Database

There are two types of distributed databases:

 Homogeneous distributed database.


 Heterogeneous distributed database.

Homogenous Distributed Database

 A Homogenous distributed database is a network of identical databases stored on multiple sites. All
databases stores data identically, the operating system, DDBMS and the data structures used – all are
same at all sites, making them easy to manage.
 Below is a diagram for the same,

Heterogeneous Distributed Database

 It is the opposite of a Homogenous distributed database. It uses different schemas, operating


systems, DDBMS, and different data models causing it difficult to manage.
 In the case of a Heterogeneous distributed database, a particular site can be completely unaware of
other sites. This causes limited cooperation in processing user requests; this is why translations are
required to establish communication between sites.
 Below is a diagram for the same,

Note: Heterogeneous DDMS have local users while Homogenous DDMS does not have local users

Distributed Data Storage

In this section we will talk about data stored at different sites in distributed database management system.

 There are two ways in which data can be stored at different sites. These are,
1. Replication.
2. Fragmentation.

Replication

 As the name suggests the system stores copies of data at different sites. If an entire database is
available on multiple sites, it is a fully redundant database.
 The advantage of data replication is that it increases availability of data on different sites. As the data
is available at different sites, queries can be processed parallelly.
 However, data replication has some disadvantages as well. Data needs to be constantly updated and
synchronized with other sites, if any site fails to achieve it then it will lead to inconsistencies in the
database. Availability of data is highly benefitted from Replication.
 Constant updation complicates concurrency control and it is also overhead for the servers.

Fragmentation

 In Fragmentation, the relations are fragmented, which means they are split into smaller parts. Each
of the fragments is stored on a different site, where it is required. In this, the data is not replicated,
and no copies are created. Consistency of data is highly benefitted from Fragmentation.
 The prerequisite for fragmentation is to make sure that the fragments can later be reconstructed into
the original relation without losing any data.
 Consistency is not a problem here as each site has a different piece of information.
 There are two types of fragmentation,
o Horizontal Fragmentation – Splitting by rows.
o Vertical fragmentation – Splitting by columns.

Horizontal Fragmentation (or Sharding)

 The relation schema is fragmented into group of rows, and each group is then assigned to one
fragment.

Vertical Fragmentation

 The relation schema is fragmented into group of columns, called smaller schemas. These smaller
schemas are then assigned to each fragment.
 Each fragment must contain a common candidate key to guarantee a lossless join.

Advantages of Distributed Database

1. Better Reliability: Distributed databases offers better reliability than centralized databases. When
database failure occurs in a centralized database, the system comes to a complete stop. But in the case
of distributed databases, the system functions even when a failure occurs, only performance-related
issues occur which are negotiable.
2. Modular Development: It implies that the system can be expanded by adding new computers and
local data to the new site and connecting them to the distributed system without interruption.
3. Lower Communication Cost: Locally storing data reduces communication costs for data
manipulation in distributed databases. In centralized databases, local storage is not possible.
4. Better Response Time: As the data is distributed efficiently in distributed databases, this provides a
better response time when user queries are met locally. While in the case of centralized databases, all
of the queries have to pass through the central machine which increases response time.

Disadvantages of Distributed Database

1. Costly Software: Maintaining a distributed database is costly because we need to ensure data
transparency, coordination across multiple sites which requires costly software.
2. Large Overhead: Many operations on multiple sites require complex and numerous
calculations, causing a lot of processing overhead.
3. Improper Data Distribution: If data is not properly distributed across different sites, then
responsiveness to user requests is affected. This in turn increases the response time.
Distributed Directory System

A distributed directory is directory environment in which data is partitioned across multiple directory servers.
To make the distributed directory appear as a single directory to client applications, one or more proxy servers
are provided which have knowledge of all the servers and the data they hold.

Proxy servers distribute incoming requests to the proper servers and gather the results to return a unified
response to the client. A set of backend servers hold their portions of the distributed directory. These backend
servers are basically standard AD (Active Directory), LDAP (Lightweight Directory Access Protocol) servers
with additional support for the proxy server to issue requests on behalf of user that might be defined in a
different server, or belong to groups that are defined on different servers.

You might also like