COIT20247
DATABASE DESIGN & DEVELOPMENT
Module 7 – Distributed databases
OBJECTIVES
Define what is meant by a distributed database
Describe how this differs from a decentralised
database
Describe the difference between homogeneous
and heterogeneous distributed databases
Describe location transparency and local
autonomy
Describe options for distributing: replication
and partitioning.
2
DISTRIBUTED DATABASE
So far we have mostly used Access databases
Access databases consist of a single file in a
single location
However, it is not always ideal to have all data
stored at the same physical location
A distributed database is a single logical
database that is spread physically across
computers in multiple locations that are
connected by a data communications network.
3
SINGLE LOGICAL DATABASE
“Physically spread” means that different
subsets of data are stored on different servers
in different locations.
A “single logical database” means that:
The database should appear the same as a single
local database to the user/program.
Any user or program that accesses the database
should be unable to tell that the data is distributed.
Users/programs should not have to “navigate”
(provide a file or network path) to find the data
4
ARCHITECTURE
A DBMS runs at each physical site
Each DBMS manages the data at that site.
Each site has a subset (or possibly a complete
set) of the data in the database
The next two slides illustrate how data might be
distributed
5
EXAMPLE
Notice the horizontal partitioning?
6
EXAMPLE
From: McFadden, F., Hoffer, J. & Prescott, M. 1998, Modern Database Management, 4th edn, Addison-Wesley, New Jersey.
7
HOMOGENEOUS VS HETEROGENEOUS
Note the DBMS at each site in the previous slides.
If each site uses the same DBMS (e.g. if each
site uses Oracle), it is known as a
homogeneous system
When the DBMSs are not all the same, it is
known as a heterogeneous system
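The distinction depends only on which DBMS product runs at each site. A minimal sketch, assuming a hypothetical catalogue that maps each site name to its DBMS product (the site names and the `classify` function are illustrative only):

```python
def classify(site_dbms: dict[str, str]) -> str:
    """Return 'homogeneous' if every site runs the same DBMS, else 'heterogeneous'.

    site_dbms is a hypothetical catalogue: site name -> DBMS product.
    """
    if not site_dbms:
        raise ValueError("a distributed database needs at least one site")
    products = set(site_dbms.values())  # distinct DBMS products in use
    return "homogeneous" if len(products) == 1 else "heterogeneous"


if __name__ == "__main__":
    print(classify({"Sydney": "Oracle", "Brisbane": "Oracle"}))      # homogeneous
    print(classify({"Sydney": "Oracle", "Brisbane": "SQL Server"}))  # heterogeneous
```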
8
DISTRIBUTED VS DECENTRALISED
“Distributed” is not the same as “decentralised”
Both types of databases are physically spread
In a distributed database, the users should not
be aware of the physical spread/location of the
data
In a decentralised database, the users typically
have to provide a navigation path to the data.
9
DISTRIBUTED VS DECENTRALISED
Distributed database:
Appears as one database to the user
Users should not normally be aware of the
location of any given data
Decentralised database:
Does not appear as one database to the user
Users have to navigate manually to data at
another site, so they must know where it is.
10
GOALS OF DISTRIBUTED DATABASE
Goals of a distributed database include:
Local autonomy
Location transparency
No reliance on central site
Continuous operation
Fragmentation independence
Replication independence
Optimised distributed query processing
11
GOALS OF DISTRIBUTED DATABASE
Goals continued:
Distributed transaction management
Hardware independence
Operating system independence
Network independence
DBMS independence
We will look at just these two:
Local autonomy
Location transparency
12
LOCATION TRANSPARENCY
Location transparency means that the user or
program need not know the location of the data
Any request for data is automatically forwarded
to the appropriate DBMS at the appropriate
site.
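The forwarding step can be sketched as a global directory that records which sites hold rows of which table. This is a minimal illustration, not how any particular DBMS implements it: `Site`, `DistributedCatalog` and the sample data are hypothetical, and it assumes each site holds a disjoint set of rows (horizontal partitioning), so merged answers contain no duplicates.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One physical site with its own local store: table name -> list of rows."""
    name: str
    tables: dict[str, list[dict]] = field(default_factory=dict)

    def select(self, table: str, predicate) -> list[dict]:
        return [row for row in self.tables.get(table, []) if predicate(row)]


class DistributedCatalog:
    """Hypothetical global directory recording which sites hold each table."""

    def __init__(self) -> None:
        self.locations: dict[str, list[Site]] = {}

    def register(self, table: str, site: Site) -> None:
        self.locations.setdefault(table, []).append(site)

    def select(self, table: str, predicate=lambda row: True) -> list[dict]:
        # The caller names only the table, never a site or path.
        # The request is forwarded to every site holding part of the table
        # and the answers are merged (assumes disjoint fragments).
        result: list[dict] = []
        for site in self.locations.get(table, []):
            result.extend(site.select(table, predicate))
        return result


if __name__ == "__main__":
    sydney = Site("Sydney", {"customer": [{"id": 1, "city": "Sydney"}]})
    perth = Site("Perth", {"customer": [{"id": 2, "city": "Perth"}]})
    db = DistributedCatalog()
    db.register("customer", sydney)
    db.register("customer", perth)
    print(db.select("customer", lambda r: r["id"] == 2))  # [{'id': 2, 'city': 'Perth'}]
```

The user's query never mentions Perth; only the catalogue knows where customer 2 is stored.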
13
LOCAL AUTONOMY
Local autonomy means that a DBMS should
continue to operate even if other nodes
have failed (obviously data from the failed node
may be unavailable).
Each site should have the capability to provide
local users access to local data, administer
security, log transactions, etc., even when any
central or coordinating site is unavailable.
This means there is no reliance on a central site.
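A minimal sketch of the behaviour local autonomy asks for, assuming hypothetical `Site` objects with an `up` flag standing in for node failure: each site answers local queries from its own data, and a global query returns whatever the reachable sites hold while naming the sites it could not reach.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical site: its local rows and whether it is currently reachable."""
    name: str
    rows: list[dict] = field(default_factory=list)
    up: bool = True

    def local_query(self, predicate) -> list[dict]:
        # Local users are served from local data; no other site is consulted.
        if not self.up:
            raise ConnectionError(f"site {self.name} is unavailable")
        return [r for r in self.rows if predicate(r)]


def global_query(sites: list[Site], predicate) -> tuple[list[dict], list[str]]:
    """Collect rows from every reachable site and report the sites that failed."""
    rows: list[dict] = []
    unavailable: list[str] = []
    for site in sites:
        try:
            rows.extend(site.local_query(predicate))
        except ConnectionError:
            unavailable.append(site.name)
    return rows, unavailable


if __name__ == "__main__":
    sydney = Site("Sydney", [{"id": 1, "balance": 100}])
    perth = Site("Perth", [{"id": 2, "balance": 250}], up=False)

    # Sydney keeps serving its own users even though Perth is down.
    print(sydney.local_query(lambda r: True))
    # A global request returns what it can and names the missing site.
    print(global_query([sydney, perth], lambda r: True))
```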
14
OPTIONS FOR DISTRIBUTING
Data can be distributed among nodes in a
number of ways:
Data replication
Horizontal partitioning
Vertical partitioning
Combinations of the above.
15
DATA REPLICATION
Data replication involves duplicating some
or all data at each site, e.g.:
16
DATA REPLICATION
Advantages include:
Faster local access
Greater autonomy
Disadvantages include:
Difficulty maintaining consistent copies of the
data
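The consistency problem can be shown with a deliberately naive sketch: an update is written to every reachable copy, so a copy at a failed site silently misses it. `Replica`, `replicated_update` and `consistent` are hypothetical names. Real systems avoid this divergence with protocols such as two-phase commit or by propagating changes later; neither is modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical copy of a table at one site: primary key -> row."""
    site: str
    data: dict[int, dict] = field(default_factory=dict)
    up: bool = True


def replicated_update(replicas: list[Replica], key: int, row: dict) -> list[str]:
    """Naively apply an update to every reachable copy; return the sites that missed it."""
    missed: list[str] = []
    for r in replicas:
        if r.up:
            r.data[key] = dict(row)  # each copy gets its own row object
        else:
            missed.append(r.site)
    return missed


def consistent(replicas: list[Replica]) -> bool:
    """True only if every copy holds identical data."""
    first = replicas[0].data
    return all(r.data == first for r in replicas[1:])


if __name__ == "__main__":
    a = Replica("Sydney", {7: {"id": 7, "price": 10}})
    b = Replica("Perth", {7: {"id": 7, "price": 10}})

    # Reads are served from the local copy (the faster local access advantage).
    print(a.data[7]["price"])  # 10

    b.up = False
    print(replicated_update([a, b], 7, {"id": 7, "price": 12}))  # ['Perth']
    print(consistent([a, b]))  # False: the copies now disagree
```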
17
PARTITIONING
Horizontal partitioning is where different rows
of a relation are distributed to different physical
locations (see, e.g., the diagram on slide 6).
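A minimal sketch of horizontal partitioning, assuming a hypothetical CUSTOMER relation whose `branch` attribute decides which site stores each row. Every fragment keeps all the columns but only some of the rows, and the complete relation is recovered as the union of the fragments.

```python
# Hypothetical CUSTOMER rows; "branch" decides which site stores each row.
customers = [
    {"id": 1, "name": "Ng", "branch": "Sydney"},
    {"id": 2, "name": "Cole", "branch": "Perth"},
    {"id": 3, "name": "Ali", "branch": "Sydney"},
]


def horizontal_partition(rows: list[dict], key: str) -> dict[str, list[dict]]:
    """Split rows into fragments by the value of `key`: all columns, some rows."""
    fragments: dict[str, list[dict]] = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments


def reconstruct(fragments: dict[str, list[dict]]) -> list[dict]:
    """The complete relation is the union of the horizontal fragments."""
    return [row for fragment in fragments.values() for row in fragment]


if __name__ == "__main__":
    frags = horizontal_partition(customers, "branch")
    print(frags["Perth"])  # [{'id': 2, 'name': 'Cole', 'branch': 'Perth'}]
    print(sorted(reconstruct(frags), key=lambda r: r["id"]) == customers)  # True
```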
Vertical partitioning is where different
attributes of a relation are distributed to
different physical locations.
We discussed this in the physical design lecture;
the concept is the same here.
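Vertical partitioning can be sketched the same way, assuming a hypothetical EMPLOYEE relation split into an HR fragment and a payroll fragment. Each fragment keeps a copy of the primary key so that complete rows can be rebuilt by joining the fragments on that key. The join here is simplified and assumes every key appears in every fragment.

```python
# Hypothetical EMPLOYEE rows; emp_id is the primary key.
employees = [
    {"emp_id": 1, "name": "Ng", "dept": "Sales", "salary": 70000},
    {"emp_id": 2, "name": "Cole", "dept": "IT", "salary": 82000},
]


def vertical_partition(rows: list[dict], key: str,
                       column_groups: list[list[str]]) -> list[list[dict]]:
    """Project each group of columns, plus the primary key, into its own fragment."""
    return [
        [{key: row[key], **{c: row[c] for c in group}} for row in rows]
        for group in column_groups
    ]


def reconstruct(fragments: list[list[dict]], key: str) -> list[dict]:
    """Rebuild complete rows by joining the fragments on the primary key."""
    joined: dict = {}
    for fragment in fragments:
        for row in fragment:
            joined.setdefault(row[key], {}).update(row)
    return list(joined.values())


if __name__ == "__main__":
    hr_site, payroll_site = vertical_partition(
        employees, "emp_id", [["name", "dept"], ["salary"]])
    print(payroll_site)  # [{'emp_id': 1, 'salary': 70000}, {'emp_id': 2, 'salary': 82000}]
    print(reconstruct([hr_site, payroll_site], "emp_id") == employees)  # True
```

Repeating the key in every fragment is what makes the join possible; without it the fragments could not be matched back up.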
18
PHYSICAL DISTRIBUTION
It is important to remember that such
partitioning or replication occurs at the physical
level, just like the partitioning described in the
physical design lecture.
The users should always see a complete table.
19
SUMMARY
A distributed database is one that appears as a
single local database to the user but is stored
across different physical locations.
A distributed database appears as a single
database to the user; a decentralised database
does not.
Two goals of a distributed database are local
autonomy and location transparency.
20
SUMMARY
A homogeneous distributed database uses the
same DBMS at each site
A heterogeneous distributed database does
not use the same DBMS at each site.
Options for distributing a database include:
Data replication
Horizontal partitioning
Vertical partitioning
Combinations of these.
21