Distributed Databases
Distributed Database
Example:
A bank has branches all over India, and
its head office is in Delhi.
Assume the bank maintains local data at each
local branch and a copy of the data of all
branches at Delhi.
The data is thus distributed all over India.
This eases query processing both for the local
customers of a branch and for global
customers.
[Figure: the head office in Delhi connected to local branches in Mumbai, Chennai, Bangalore, and Agra.]
Distributed DBMS.
Software system that permits the
management of the distributed database and
makes the distribution transparent to users.
Characteristics of DDBMS
Advantages of DDBMS
1. Data sharing
If a number of different sites are connected
to each other, then a user at one site may be
able to access data that is available at another
site.
For example, in the distributed banking
system, it is possible for a user in one branch
to access data in another branch.
2. Local Autonomy
The primary advantage to accomplishing data
sharing by means of data distribution is that
each site is able to retain a degree of control
over data stored locally.
5. Modular Growth
New nodes (computers) can be added to the
network at any time without difficulty.
6. Speedup of Query Processing:
If a query involves data at several sites, it may be
possible to split the query into sub queries that
can be executed in parallel by several sites.
Such parallel computation allows for faster
processing of a user's query.
In those cases in which data is replicated, queries
may be directed by the system to the least
heavily loaded sites.
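The two ideas above can be illustrated with a minimal Python sketch. It is not a real DDBMS: the site names, the `SITES` data, and the helper functions `local_sum`, `global_total`, and `least_loaded` are all hypothetical, and threads stand in for remote sites. The global query "total balance across all sites" is split into one subquery per site, the subqueries run in parallel, and the partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-site data: each site stores the balances of its local accounts.
SITES = {
    "Mumbai": [500, 336, 62],
    "Chennai": [205, 10000],
    "Bangalore": [1123, 750],
}


def local_sum(site):
    """Subquery executed at one site: total balance of the locally stored accounts."""
    return sum(SITES[site])


def global_total():
    """Split the global query into one subquery per site and run them in parallel."""
    with ThreadPoolExecutor(max_workers=len(SITES)) as pool:
        partial_sums = list(pool.map(local_sum, SITES))
    # Combine the partial results returned by the sites.
    return sum(partial_sums)


def least_loaded(load_by_site):
    """For replicated data: pick the replica site with the smallest current load."""
    return min(load_by_site, key=load_by_site.get)


print(global_total())
print(least_loaded({"Mumbai": 12, "Delhi": 3, "Chennai": 7}))
```

In a real system the subqueries would be shipped over the network and the load figures would come from the system itself; here both are assumed inputs.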
Disadvantages of DDBMSs
Types of DDBMS
Homogeneous DDBMS
Heterogeneous DDBMS
Homogeneous Database
Identical DBMSs at all sites
Heterogeneous Distributed Database Systems
Non-identical DBMSs
Distributed Database
Design
Fragmentation
Relation may be divided into a number of subrelations, which are then distributed.
Allocation
Each fragment is stored at the site with "optimal" distribution.
Replication
Data Fragmentation
Types of fragmentation:
Horizontal
Vertical
Mixed
Horizontal fragmentation
Each tuple of r is assigned to one or more fragments.
We reconstruct the relation r by taking the union of all fragments; that is,
$r = r_1 \cup r_2 \cup \dots \cup r_n$
Example: relation account with the following schema:
Account = (account_number, branch_name, balance)
The account relation can be divided into several different
fragments, each of which consists of tuples of
accounts belonging to a particular branch. If the
banking system has only two branches, Hillside and
Valleyview, then there are two different fragments:
account_number   branch_name   balance
A-305            Hillside      500
A-226            Hillside      336
A-155            Hillside      62
$account_1 = \sigma_{branch\_name = \text{"Hillside"}}(account)$

account_number   branch_name   balance
A-177            Valleyview    205
A-402            Valleyview    10000
A-408            Valleyview    1123
A-639            Valleyview    750
$account_2 = \sigma_{branch\_name = \text{"Valleyview"}}(account)$
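Horizontal fragmentation and its reconstruction can be sketched in a few lines of Python. This is an illustration only: the `Account` tuple type and the `select_branch` helper are hypothetical names, and the rows are the seven accounts that appear in this deck's example tables. Each fragment is a selection on branch_name, and the union of the fragments gives back the original relation.

```python
from collections import namedtuple

# Schema from the example: Account = (account_number, branch_name, balance)
Account = namedtuple("Account", ["account_number", "branch_name", "balance"])

account = [
    Account("A-305", "Hillside", 500),
    Account("A-226", "Hillside", 336),
    Account("A-177", "Valleyview", 205),
    Account("A-402", "Valleyview", 10000),
    Account("A-155", "Hillside", 62),
    Account("A-408", "Valleyview", 1123),
    Account("A-639", "Valleyview", 750),
]


def select_branch(relation, branch):
    """Selection: keep only the tuples whose branch_name equals `branch`."""
    return [t for t in relation if t.branch_name == branch]


# One horizontal fragment per branch.
account1 = select_branch(account, "Hillside")
account2 = select_branch(account, "Valleyview")

# Reconstruction: the union of all fragments is the original relation.
reconstructed = set(account1) | set(account2)

print(len(account1), len(account2))
print(reconstructed == set(account))
```

Sets are used for the union because a relation is a set of tuples, so the order of the rows does not matter.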
Vertical fragmentation:
branch_name   customer_name   tuple_id
Hillside      Lowman          1
Hillside      Camp            2
Valleyview    Camp            3
Valleyview    Kahn            4
Hillside      Kahn            5
Valleyview    Kahn            6
Valleyview    Green           7
$deposit_1 = \Pi_{branch\_name,\ customer\_name,\ tuple\_id}(employee\_info)$

account_number   balance   tuple_id
A-305            500       1
A-226            336       2
A-177            205       3
A-402            10000     4
A-155            62        5
A-408            1123      6
A-639            750       7
$deposit_2 = \Pi_{account\_number,\ balance,\ tuple\_id}(employee\_info)$
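The vertical split and its reconstruction can be sketched as follows. This is a minimal illustration, not a DBMS implementation: the helper `join_on_tuple_id` is a hypothetical name, and the rows are taken from the example data in this deck. Each fragment is a projection that keeps tuple_id, and joining the fragments on tuple_id restores the original tuples.

```python
# Original relation, one row per tuple:
# (branch_name, customer_name, account_number, balance)
employee_info = [
    ("Hillside", "Lowman", "A-305", 500),
    ("Hillside", "Camp", "A-226", 336),
    ("Valleyview", "Camp", "A-177", 205),
    ("Valleyview", "Kahn", "A-402", 10000),
    ("Hillside", "Kahn", "A-155", 62),
    ("Valleyview", "Kahn", "A-408", 1123),
    ("Valleyview", "Green", "A-639", 750),
]

# Attach a tuple_id (1, 2, 3, ...) so the fragments can be joined again later.
with_ids = [(tid,) + row for tid, row in enumerate(employee_info, start=1)]

# Projection onto (branch_name, customer_name, tuple_id).
deposit1 = [(b, c, tid) for (tid, b, c, a, bal) in with_ids]
# Projection onto (account_number, balance, tuple_id).
deposit2 = [(a, bal, tid) for (tid, b, c, a, bal) in with_ids]


def join_on_tuple_id(frag1, frag2):
    """Join the two vertical fragments on tuple_id and drop the id again."""
    by_id = {tid: (a, bal) for (a, bal, tid) in frag2}
    return [(b, c) + by_id[tid] for (b, c, tid) in frag1]


reconstructed = join_on_tuple_id(deposit1, deposit2)
print(reconstructed == employee_info)
```

The dictionary lookup keyed on tuple_id is what makes the join cheap, which is the point of carrying the extra attribute in every vertical fragment.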
Mixed Fragmentation
Combination of horizontal and vertical
strategies.
Also called hybrid or nested fragmentation.
Consists of a horizontal fragment that is subsequently
vertically fragmented, or a vertical fragment
that is then horizontally fragmented.
Mixed fragmentation is defined using the
select and project operations of relational
algebra.
The original relation can be obtained by join
and union operations.
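A small Python sketch of one possible mixed fragmentation, under stated assumptions: the four sample rows, the choice to split only the Hillside fragment vertically, and all variable names are hypothetical. A select produces the horizontal fragments, a project splits one of them vertically, and the original relation comes back through a join followed by a union.

```python
# Sample relation: (tuple_id, branch_name, account_number, balance)
rows = [
    (1, "Hillside", "A-305", 500),
    (2, "Hillside", "A-226", 336),
    (3, "Valleyview", "A-177", 205),
    (4, "Valleyview", "A-402", 10000),
]

# Step 1 (select): horizontal fragments, one per branch.
hillside = [r for r in rows if r[1] == "Hillside"]
valleyview = [r for r in rows if r[1] == "Valleyview"]

# Step 2 (project): split the Hillside fragment vertically,
# keeping tuple_id in both parts so they can be joined again.
hillside_branch = [(tid, branch) for (tid, branch, acct, bal) in hillside]
hillside_money = [(tid, acct, bal) for (tid, branch, acct, bal) in hillside]

# Reconstruction, step 1 (join): rebuild the Hillside fragment on tuple_id.
money_by_id = {tid: (acct, bal) for (tid, acct, bal) in hillside_money}
hillside_rebuilt = [
    (tid, branch) + money_by_id[tid] for (tid, branch) in hillside_branch
]

# Reconstruction, step 2 (union): combine the horizontal fragments.
rebuilt = sorted(hillside_rebuilt + valleyview)
print(rebuilt == sorted(rows))
```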
Advantages of Fragmentation
Horizontal:
allows parallel processing on fragments of a
relation
allows a relation to be split so that tuples are
located where they are most frequently
accessed
Vertical:
allows tuples to be split so that each part of
the tuple is stored where it is most frequently
accessed
tuple-id attribute allows efficient joining of
vertical fragments
Disadvantages:
Data Replication
Full replication
Partial replication
No duplicate database fragments
Advantages of Replication
Availability: failure of a site containing relation
r does not result in unavailability of r if
replicas exist.
Parallelism: queries on r may be processed
by several nodes in parallel.
Reduced data transfer: relation r is
available locally at each site containing a
replica of r.
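The availability and reduced data transfer points can be sketched with a toy replica catalog. Everything here is an assumption made for illustration: the `replicas` and `site_up` dictionaries, the site names, and the `read_site` helper do not come from any particular DDBMS. A read is served locally when the local site holds a live replica, and otherwise falls back to another live replica.

```python
# Hypothetical catalog: the sites that hold a replica of each relation.
replicas = {"r": ["Delhi", "Mumbai", "Chennai"]}

# Hypothetical status table: Delhi has failed, the other sites are up.
site_up = {"Delhi": False, "Mumbai": True, "Chennai": True}


def read_site(relation, local_site):
    """Choose a site to read `relation` from.

    Prefer the local replica (no data transfer needed); if the local
    site is down or holds no replica, use any other live replica.
    """
    live = [s for s in replicas[relation] if site_up[s]]
    if not live:
        raise RuntimeError(f"no live replica of {relation}")
    return local_site if local_site in live else live[0]


print(read_site("r", "Mumbai"))  # local replica is live
print(read_site("r", "Delhi"))   # Delhi is down, so another replica is used
```

The relation stays readable as long as at least one site holding a replica is up; only when every replica site has failed does the read raise an error.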
Disadvantages of Replication
Data allocation
Complete Replication
Selective Replication
Components of a DDBMS