Distributed Database: Database Database Management System Storage Devices CPU Computers Network

A distributed database consists of data stored across multiple computers or sites connected through a network. It allows for decentralized control of data and improved performance through parallel processing. However, maintaining consistency across distributed sites requires synchronization techniques like replication and duplication to propagate updates. While distributed databases offer benefits like increased reliability and scalability, they also introduce greater complexity for management, security and transaction processing compared to centralized databases.

Uploaded by

Inderjeet Bal
Copyright
© Attribution Non-Commercial (BY-NC)
Distributed database
A database that consists of two or more data files located at different sites on a computer network. Because the database
is distributed, different users can access it without interfering with one another. However, the DBMS must periodically
synchronize the scattered databases to make sure that they all have consistent data.

A distributed database is a database that is under the control of a central database management system (DBMS) in
which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the
same physical location, or may be dispersed over a network of interconnected computers.

Collections of data (e.g. in a database) can be distributed across multiple physical locations. A distributed database
can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks.
Replication and distribution of databases improve database performance at end-user worksites. [1]

To ensure that the distributed databases are up to date and current, there are two processes: replication and
duplication. Replication uses specialized software that looks for changes in the distributed databases. Once
the changes have been identified, the replication process makes all the databases look the same. Depending on the
size and number of the distributed databases, replication can be complex and can consume considerable time and
computer resources. Duplication, on the other hand, is less complicated: it identifies one database as a master
and then copies that database to the other sites. Duplication is normally done at a set time after hours, so that
each distributed location ends up with the same data. In the duplication process, changes are allowed only to the
master database, which ensures that local data will not be overwritten. Both processes can keep the data current
in all distributed locations.[2]
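
The difference between the two processes can be illustrated with a small sketch. It is not taken from any particular product: sites are modelled as plain Python dictionaries, each record carries a version number, and the function names `replicate` and `duplicate` are hypothetical.

```python
from copy import deepcopy


def replicate(sites):
    """Change-detecting replication: find the newest version of every
    record and make all sites look the same.
    Each record is stored as a (value, version) pair."""
    merged = {}
    for site in sites:
        for key, (value, version) in site.items():
            # Keep whichever copy carries the highest version number.
            if key not in merged or version > merged[key][1]:
                merged[key] = (value, version)
    for site in sites:
        site.clear()
        site.update(merged)


def duplicate(master, replicas):
    """Duplication: overwrite every replica with the master, wholesale."""
    for replica in replicas:
        replica.clear()
        replica.update(deepcopy(master))


# Replication merges changes made at either site.
site_a = {"cust1": ("Ann", 2), "cust2": ("Bob", 1)}
site_b = {"cust1": ("Anne", 1), "cust3": ("Cy", 1)}
replicate([site_a, site_b])

# Duplication discards anything that exists only at the branch.
master = {"cust1": ("Ann", 3)}
branch = {"cust1": ("stale", 1), "local_only": ("lost", 1)}
duplicate(master, [branch])
```

In this sketch the branch loses its `local_only` record when it is overwritten, which is why duplication only works cleanly if changes are restricted to the master.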

Besides distributed database replication and fragmentation, there are many other distributed database design
technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. How
these technologies are implemented depends on the needs of the business and the sensitivity/confidentiality of
the data to be stored in the database, and hence on the price the business is willing to pay to ensure data security,
consistency and integrity.

Basic architecture
A database user accesses the distributed database through:

Local applications
applications which do not require data from other sites.
Global applications
applications which do require data from other sites.

A distributed database does not share main memory or disks.


Important considerations
Care with a distributed database must be taken to ensure the following:
• The distribution is transparent — users must be able to interact with the system as if it were one logical
system. This applies to the system's performance, and methods of access among other things.
• Transactions are transparent — each transaction must maintain database integrity across multiple databases.
Transactions must also be divided into subtransactions, each subtransaction affecting one database system.
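
One common way to keep such a divided transaction all-or-nothing is a two-phase commit. The source does not name a protocol, so the sketch below is an assumption made for illustration: the `Site` class, its `fail_on_prepare` flag and the `run_global_transaction` function are all hypothetical, and each site's database is just a dictionary.

```python
class Site:
    """A hypothetical participant holding one local database (a dict)."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.data = {}
        self.pending = None
        self.fail_on_prepare = fail_on_prepare

    def prepare(self, updates):
        # Phase 1: stage the subtransaction and vote yes or no.
        if self.fail_on_prepare:
            return False
        self.pending = dict(updates)
        return True

    def commit(self):
        # Phase 2, all voted yes: make the staged changes permanent.
        self.data.update(self.pending)
        self.pending = None

    def rollback(self):
        # Phase 2, someone voted no: discard the staged changes.
        self.pending = None


def run_global_transaction(subtransactions):
    """subtransactions is a list of (site, updates) pairs, one per site."""
    votes = [site.prepare(updates) for site, updates in subtransactions]
    if all(votes):
        for site, _ in subtransactions:
            site.commit()
        return True
    for site, _ in subtransactions:
        site.rollback()
    return False


hq, sales = Site("hq"), Site("sales")
ok = run_global_transaction([(hq, {"acct": 90}), (sales, {"acct": 10})])

# A site that cannot prepare causes every subtransaction to be undone.
mfg = Site("mfg", fail_on_prepare=True)
failed = run_global_transaction([(hq, {"acct": 0}), (mfg, {"acct": 100})])
```

A real coordinator would also have to log its decisions and cope with sites that crash between the two phases; the sketch leaves that out.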

Advantages of distributed databases

• Management of distributed data with different levels of transparency.


• Increased reliability and availability.
• Easier expansion.
• Reflects organizational structure — database fragments are located in the departments they relate to.
• Local autonomy: a department can control its own data, since its staff are the ones most familiar with it.
• Protection of valuable data — if there were ever a catastrophic event such as a fire, all of the data would not
be in one place, but distributed in multiple locations.
• Improved performance — data is located near the site of greatest demand, and the database systems
themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on
one module of the database won't affect other modules of the database in a distributed database.)
• Economics — it costs less to create a network of smaller computers with the power of a single large
computer.
• Modularity — systems can be modified, added and removed from the distributed database without affecting
other modules (systems).
• Reliable transactions, due to replication of the database.
• Hardware, Operating System, Network, Fragmentation, DBMS, Replication and Location Independence.
• Continuous operation...
• Distributed Query processing.
• Distributed Transaction management.

A single site failure need not stop the rest of the system from operating. All transactions follow the A.C.I.D. properties:
atomicity, the transaction takes place as a whole or not at all; consistency, the transaction maps one consistent DB state
to another; isolation, each transaction sees a consistent DB; durability, the results of a transaction must survive system
failures. The merge replication method is used to consolidate the data between databases.

Disadvantages of distributed databases

• Complexity — extra work must be done by the DBAs to ensure that the distributed nature of the system is
transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one.
Extra database design work must also be done to account for the disconnected nature of the database — for
example, joins become prohibitively expensive when performed across multiple systems.
• Economics — increased complexity and a more extensive infrastructure means extra labour costs.
• Security — remote database fragments must be secured, and they are not centralized so the remote sites
must be secured as well. The infrastructure must also be secured (e.g., by encrypting the network links
between remote sites).
• Difficult to maintain integrity — in a distributed database, enforcing integrity over a network may require too
much of the network's resources to be feasible.
• Inexperience — distributed databases are difficult to work with, and as a young field there is not much readily
available experience on proper practice.
• Lack of standards — there are no tools or methodologies yet to help users convert a centralized DBMS into a
distributed DBMS.
• Database design more complex: besides the normal difficulties, the design of a distributed database has
to consider fragmentation of data, allocation of fragments to specific sites and data replication.
• Additional software is required.
• Operating System should support distributed environment.
• Concurrency control: this is a major issue. It is typically handled by locking and timestamping.
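
As an illustration of the timestamping approach mentioned in the last point, the following is a minimal sketch of basic timestamp ordering for a single data item. It assumes every transaction is given a unique, increasing timestamp when it starts; the class name `TimestampedItem` is hypothetical, and locking is not shown.

```python
class TimestampedItem:
    """Basic timestamp-ordering checks for one data item."""

    def __init__(self):
        self.read_ts = 0    # largest timestamp that has read the item
        self.write_ts = 0   # largest timestamp that has written the item
        self.value = None

    def read(self, ts):
        # Reject the read if a younger transaction already wrote the item.
        if ts < self.write_ts:
            return False, None
        self.read_ts = max(self.read_ts, ts)
        return True, self.value

    def write(self, ts, value):
        # Reject the write if a younger transaction already read or wrote it.
        if ts < self.read_ts or ts < self.write_ts:
            return False
        self.write_ts = ts
        self.value = value
        return True


item = TimestampedItem()
w1 = item.write(5, "a")   # accepted: nothing newer has touched the item
r1 = item.read(7)         # accepted: sees the value written at timestamp 5
w2 = item.write(6, "b")   # rejected: transaction 7 has already read the item
r2 = item.read(3)         # rejected: transaction 5 has already written it
```

A rejected operation would normally cause its transaction to be aborted and restarted with a new timestamp.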

Advantages and disadvantages


DDBMS has many advantages. Data is located near the greatest demand site, access is faster,
processing is faster due to several sites spreading out the work load, new sites can be added
quickly and easily, communication is improved, operating costs are reduced, it is user friendly,
there is less danger of a single-point failure, and it has process independence.

Several reasons why businesses and organizations move to distributed databases include
organizational and economic reasons, reliable and flexible interconnection of existing databases,
and future incremental growth. Companies believe that a decentralized, distributed
database approach will adapt more naturally to the structure of the organization. A distributed
database is a more suitable solution when several databases already exist in an organization. In
addition, global applications can be performed easily with a distributed
database. If an organization grows by adding new, relatively independent organizational units,
then the distributed database approach supports smooth incremental growth.

Data can physically reside nearest to where it is most often accessed, thus providing users with
local control of data that they interact with. This results in local autonomy of the data allowing
users to enforce locally the policies regarding access to their data.
One reason to consider a parallel architecture is to improve the reliability and availability of
the data in a scalable system. In a distributed system, with careful design, it is possible to
access some, or possibly all, of the data in a failure mode if there is sufficient data replication.

A DDBMS also has a few disadvantages. Management and control are complex, and there is less
security because data is held at so many different sites.

Distributed databases provide more flexible access, which increases the chance of security
violations since the database can be accessed from every site within the network. For
many applications, it is important to provide secure access. Present distributed database systems do not
provide adequate mechanisms to meet these objectives. Hence the solution requires a
DDBMS capable of handling multilevel data. Such a system is called a multilevel
security distributed database management system (MLS-DDBMS). An MLS-DDBMS
provides a verification service for users who wish to share data in the database at different
security levels. In an MLS-DDBMS, every data item in the database is associated with one of several
classifications or sensitivities.
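
The idea of attaching a classification to every data item can be sketched as follows. The level names, the `LEVELS` ordering and the rule that a user may read items at or below their own clearance are assumptions chosen for illustration; the source does not specify the access rule.

```python
# Hypothetical classification levels, ordered from least to most sensitive.
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2}


def visible_items(items, clearance):
    """Return the names of the items a user with this clearance may read.
    Assumed rule: an item is readable if its level is at or below the
    user's clearance."""
    limit = LEVELS[clearance]
    return [name for name, level in items if LEVELS[level] <= limit]


items = [
    ("cafeteria_menu", "unclassified"),
    ("salaries", "confidential"),
    ("merger_plan", "secret"),
]
for_staff = visible_items(items, "confidential")
for_public = visible_items(items, "unclassified")
```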

The ability to ensure the integrity of the database in the presence of unpredictable failures of
both hardware and software components is also an important feature of any distributed
database management system. The integrity of a database is concerned with its consistency,
correctness, validity, and accuracy. The integrity controls must be built into the structure of
software, databases, and involved personnel.

If there are multiple copies of the same data, then this duplicated data introduces additional
complexity in ensuring that all copies are updated for each update. Concurrency
control and recoverability consume much of the research effort in the area of distributed
database theory. Increased reliability and performance are the goal, not the status quo.

Advantages

It reflects the organizational structure.

Organizations may be distributed across a wide geographic area. It is natural for databases to be set up to reflect
this. Local areas will keep local information and this allows local users to quickly access the local database. A
headquarters may also wish to make global inquiries to local data at the local regions.

Improved shareability.

This allows users at one site to access data stored at other sites. Data is placed close to the users who normally
access it, which gives them local control and allows them to establish local policies regarding
data use. A global administrator is responsible for the entire system and should help at a local level to develop
and manage the DBMS.

Improved availability and reliability.

Data in a centralized DB is inaccessible if there is a problem with the DB. However, in a distributed system only the
local data at the failed site is inaccessible, and if replication is in force then all data may still be available at another site.

Improved performance.

Because data is kept local, local access is much quicker than access to a centralized DB.
Disadvantages

Complexity

A DDBMS is much more complex than a centralized DB. If there are conflicts in hardware and software in use then
this may cause performance issues and the cited advantages may become disadvantages.

Cost.

It is much more expensive to set up and maintain a DDBMS. More hardware is required, network maintenance
increases, communication costs increase, and there are additional labour costs.

Security.

It is much more difficult to maintain a secure network system across multiple locations. The network needs to be
made secure and access to replicated data needs to be maintained across multiple sites.

Distributed Databases
What is a DDBMS?

• A Distributed DB is a related collection of data just like a normal DB, but physically distributed over a network.

• The data is split into “fragments” on separate machines, each running a local DBMS.

• A Distributed DBMS is the software that manages distribution of data and processing in a fashion that is
invisible to users.

• Local DBMSs can access local data autonomously, or access remote data through the DDBMS and other
local DBMSs.

• Apps that only use local data are called “Local Applications”

• Apps that access remote data are called “Global Applications”

• To be a DDBMS, every local DBMS must participate in at least one Global App.

• Homogeneous DDBMSs use the same DBMS on the same platform on each site

• Heterogeneous DDBMSs use different DBMSs/Platforms and require gateways or other middle-ware to
convert queries/data models between sites

Advantages of a DDBMS:

• Realistically reflects decentralised organisational structures

• If well designed can strike an optimal balance between local speed and global access

• More failure-proof than a single DBMS


• Shared processing and I/O leads to better performance

• Scalability

Disadvantages of a DDBMS:

• Increased complexity and higher cost

• Networks compromise security

• Validity, consistency and integrity control becomes complicated

• It’s a new technology with few standards/best practices

Note:

• A DDBMS is not the same as “distributed processing” which is a centralised DB accessible through a network
(rather than the data itself being distributed)

• A DDBMS is not always the same as a “parallel DBMS” which is a single DBMS using multiple
processors/multiple disks

DDBMS Components:

• Global External Schema


User Views. Provides logical data independence

• Global Conceptual Schema


Logical description of entire DB, including entities and relationships, constraints, domains, security etc.
Provides physical data independence

• Fragmentation and Allocation Schema


Describes how the data is partitioned and where it is stored

• 3 Tier Schemas for each Local DBMS


As normal, but instead of external schema, the top level is a mapping schema designed to be used to
communicate with the frag/alloc schema above.

• Local DBMS (LDBMS) – normal DBMS controlling local data

• Data Communications (DC) – network software

• Global System Catalogue – same as a normal systems catalogue, plus frag/alloc information

• Distributed DBMS (DDBMS) – main functional unit – transaction management, backup/recovery etc.

When designing a DDBMS we seek to maximise:

• Locality of reference (letting local apps hit local data)

• Reliability and Accessibility (by strategically replicating the data)


• System Performance (by avoiding over/under utilisation of resources)

When designing a DDBMS we seek to minimise:

• Cost vs. Storage (affects replication strategy)

• Communication Costs (taking into account consistency maintenance)

Design Strategies with which to approach the problem:

• Centralised Data Storage – this is not a DDB at all. No replication so no additional storage costs.

• Fragmented Data Storage – no replication but data is split up and distributed. If done correctly, locality of
reference and performance are very high, and storage and communications costs are low. Reliability and
accessibility are only ok.


• Fully Replicated Storage – every site hosts a full copy of the DB. Very high storage costs. Read transactions get
improved performance and low comm. costs; write transactions get low performance and high comm. costs.

• Partially Replicated Storage – combination of above methods. It makes the most sense – if done right. You
have high locality, high reliability/access, good all-round performance, ok storage costs and low comm. costs.

Fragmentation:

• There are a few reasons why it makes sense to fragment data:

o By keeping data not required by local apps separate, we improve security

o By maximising locality of reference, we improve efficiency

o Since most apps only use subsets of relations, it makes sense to break the relations into subsets for
storage across the network.

o Done right, we open the door for parallel processing

• There are drawbacks, like increased integrity administration and performance hits for poorly fragmented data
sets

• For a fragmentation effort to be viable: it must be complete (all items in a relation must appear in at least one
fragment of the relation), functional dependencies must be preserved, and (other than primary keys)
fragments should be disjoint.

• Types of Fragmentation:

o Horizontal – break up relation into subsets of the tuples in that relation, based on a restriction on one
or more attributes. E.g. – we could break up a table with student info into one subset for undergrads
and one subset for postgrads.
o Vertical – breaking up a relation into subsets of attributes. E.g. – breaking up a hypothetical student
table into grade/course related columns and contact/personal related columns.

o Mixed – fragments the data multiple times, in different ways. We could do our postgrad/undergrad
split and then apply our grades/course split to each of the fragments.

o Derived – fragmenting a relation to correspond with the fragmentation of another relation upon which
the first relation is dependent in some way.
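
The horizontal and vertical cases can be tried out with a small sketch. It uses Python's built-in `sqlite3` module with a single in-memory database standing in for the separate sites, and the `student` table, its columns and the fragment names are hypothetical. The point is that a horizontal split is reassembled with a union, while a vertical split is reassembled with a join on the repeated primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, level TEXT, grade TEXT);
INSERT INTO student VALUES (1, 'Ann', 'undergrad', 'A');
INSERT INTO student VALUES (2, 'Bob', 'postgrad',  'B');
INSERT INTO student VALUES (3, 'Cy',  'undergrad', 'C');

-- Horizontal fragments: subsets of tuples, chosen by a restriction on level.
CREATE TABLE student_ug AS SELECT * FROM student WHERE level = 'undergrad';
CREATE TABLE student_pg AS SELECT * FROM student WHERE level = 'postgrad';

-- Vertical fragments: subsets of attributes, each repeating the primary key.
CREATE TABLE student_personal AS SELECT id, name, level FROM student;
CREATE TABLE student_grades   AS SELECT id, grade FROM student;
""")

original = conn.execute("SELECT * FROM student ORDER BY id").fetchall()

# Horizontal fragments are recombined with a union of their tuples.
horizontal = conn.execute(
    "SELECT * FROM student_ug UNION ALL SELECT * FROM student_pg ORDER BY 1"
).fetchall()

# Vertical fragments are recombined with a join on the shared key.
vertical = conn.execute(
    "SELECT p.id, p.name, p.level, g.grade "
    "FROM student_personal AS p JOIN student_grades AS g ON p.id = g.id "
    "ORDER BY p.id"
).fetchall()
```

Both reconstructions return the original rows, which is a quick check of the completeness and disjointness conditions listed above for this particular split.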

Transparency:

• Distribution Transparency allows users to ignore the physical fragmentation of data, to varying degrees:

o Frag. Transparency is high level transparency where a user could write “SELECT * FROM Student
WHERE year = 2” without needing to specify what fragment of the Student relation contains the data,
nor where that fragment is stored.

o Location Transparency – mid level transparency where a user would need to write “SELECT * FROM
S14 WHERE year = 2” where S14 is the relevant fragment of the Student relation, but still wouldn’t
need to say where the fragment is stored.

o Local Mapping Transparency – low level transparency where a user would need to write “SELECT *
FROM S14 AT SITE 7 WHERE year = 2” where S14 is the relevant fragment of the Student relation
and SITE 7 is where the fragment is physically located.

o Distribution transparency is supported by a database name server which aliases unique database
object identifiers with user friendly names.

o Transaction transparency ensures integrity and consistency vis-à-vis multi-site transactions,
concurrent users and DB failure.

o Local transactions and remote single site transactions are handled without additional difficulty, but
multi-site transactions must be broken into sub-transactions (one for each site) while preserving the independence,
atomicity and durability of a centralised DBMS.

o Performance Transparency simply means that a DDBMS performs at the same level as a normal DBMS.

o This puts a lot of burden on the distributed query processor, which decides what fragment to hit,
which copy (if replicated), and which location to use, as well as calculating I/O time, CPU time and
communication costs.

o DBMS Transparency means that a heterogeneous DDBMS will behave like a homogeneous DDBMS.
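
To make the levels of distribution transparency more concrete, the sketch below shows the kind of lookup a DDBMS would have to do on the user's behalf: map a query on the global Student relation to a fragment, then pick the cheapest site holding a copy. The catalogue contents, the cost figures and the function names are all hypothetical, and the `AT SITE` syntax is the illustrative notation used in the examples above rather than real SQL.

```python
# Hypothetical global catalogue: relation -> fragment -> selection value and
# the sites that hold a copy of that fragment.
CATALOGUE = {
    "Student": {
        "S13": {"year": 1, "sites": [3, 7]},
        "S14": {"year": 2, "sites": [7, 9]},
    }
}

# Hypothetical communication cost from the local site to each remote site.
SITE_COST = {3: 5, 7: 2, 9: 8}


def resolve(relation, year):
    """Return (fragment, site) for a query restricted to one year."""
    for fragment, info in CATALOGUE[relation].items():
        if info["year"] == year:
            # Among the replicas, choose the site that is cheapest to reach.
            site = min(info["sites"], key=lambda s: SITE_COST[s])
            return fragment, site
    raise LookupError(f"no fragment of {relation} holds year {year}")


def rewrite(relation, year):
    """Turn the fragmentation-transparent query into its local-mapping form."""
    fragment, site = resolve(relation, year)
    return f"SELECT * FROM {fragment} AT SITE {site} WHERE year = {year}"


local_mapping_query = rewrite("Student", 2)
```

With fragmentation transparency the user only writes the query against Student and the system performs both steps; with location transparency the user supplies the fragment name and the system only chooses the site.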

Replication:

• Advantages – better access and reliability, improved performance

• Disadvantages – storage and consistency

• Replication Options:
o Synchronous Updates – all copy updates are part of one transaction's commit phase. This brings a lot of admin
overhead, comm. costs and opportunity for failure, but is often necessary.

o Asynchronous Updates – periodic updates of all copies based on one master copy. This violates the idea
of data independence but can be useful in situations where the cost of synch updates is
unwarranted.

• What we expect from replication management:

o Copy data, either synch or asynch from one db to another

o Scalability and mappability (in heterogeneous environments)

o Replication of procedures, indexes, schema etc.

o tools for DBAs to manage replication

• Replicated data ownership models:

o Master/slave: publish and subscribe model – authoritative changes are made only to the master site
and published to the slaves asynchronously

o Workflow: like M/S, but Master status moves from site to site depending on the task at hand

o Update-Anywhere: Symmetric, shared write authority for all replicas.

o Synchronisation: we can synch up our replicas using regularly scheduled “snapshots” of the master
data, or database triggers (when X happens, do Y)
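
The contrast between synchronous and asynchronous updates under a single-master ownership model can be sketched in a few lines. This is an illustration under simplifying assumptions, not a description of any replication product: the `Master` class and its `publish` method are hypothetical, replicas are plain dictionaries, and failures are ignored.

```python
from collections import deque


class Master:
    """A hypothetical master copy that pushes its changes to replicas."""

    def __init__(self, replicas, synchronous):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.outbox = deque()  # changes made but not yet published

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Synchronous: every copy is updated as part of the same write.
            for replica in self.replicas:
                replica[key] = value
        else:
            # Asynchronous: remember the change and publish it later.
            self.outbox.append((key, value))

    def publish(self):
        """Asynchronous refresh, for example run on a schedule."""
        while self.outbox:
            key, value = self.outbox.popleft()
            for replica in self.replicas:
                replica[key] = value


sync_replica = {}
Master([sync_replica], synchronous=True).write("price", 10)

async_replica = {}
lazy_master = Master([async_replica], synchronous=False)
lazy_master.write("price", 10)
before_publish = dict(async_replica)  # replica is still stale at this point
lazy_master.publish()
```

The asynchronous replica is briefly out of date, which is the price paid for keeping the write itself cheap.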

Distributed Database Concepts


This chapter describes the basic concepts and terminology of Oracle Database distributed
database architecture. It contains the following topics:

• Distributed Database Architecture


• Database Links
• Distributed Database Administration
• Transaction Processing in a Distributed System
• Distributed Database Application Development
• Character Set Support for Distributed Environments

Distributed Database Architecture


A distributed database system allows applications to access data from local and remote
databases. In a homogenous distributed database system, each database is an Oracle
Database. In a heterogeneous distributed database system, at least one of the databases is not
an Oracle Database. Distributed databases use a client/server architecture to process
information requests.

This section contains the following topics:

• Homogenous Distributed Database Systems


• Heterogeneous Distributed Database Systems
• Client/Server Database Architecture

Homogenous Distributed Database Systems

A homogenous distributed database system is a network of two or more Oracle Databases that
reside on one or more machines. Figure 29-1 illustrates a distributed system that connects three
databases: hq, mfg, and sales. An application can simultaneously access or modify the data in
several databases in a single distributed environment. For example, a single query from a
Manufacturing client on local database mfg can retrieve joined data from the products table on
the local database and the dept table on the remote hq database.
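
The shape of such a query can be tried out locally with Python's built-in `sqlite3` module. This is only an analogy, not Oracle syntax: an attached in-memory SQLite database stands in for the remote hq database, the qualified name `hq.dept` plays the role that a database link would play in Oracle, and the column names and sample rows are hypothetical.

```python
import sqlite3

# The main connection stands in for the local mfg database.
mfg = sqlite3.connect(":memory:")
# An attached database stands in for the remote hq database.
mfg.execute("ATTACH DATABASE ':memory:' AS hq")

mfg.execute("CREATE TABLE products (product TEXT, deptno INTEGER)")
mfg.execute("CREATE TABLE hq.dept (deptno INTEGER, dname TEXT)")
mfg.executemany(
    "INSERT INTO products VALUES (?, ?)", [("widget", 10), ("gadget", 20)]
)
mfg.executemany(
    "INSERT INTO hq.dept VALUES (?, ?)", [(10, "ASSEMBLY"), (20, "PAINT")]
)

# One query joins the local products table with the dept table held in hq.
rows = mfg.execute(
    "SELECT p.product, d.dname "
    "FROM products AS p JOIN hq.dept AS d ON p.deptno = d.deptno "
    "ORDER BY p.product"
).fetchall()
```

In a real distributed system the second table would sit on another server and the join would involve network traffic, which this single-process sketch does not model.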

For a client application, the location and platform of the databases are transparent. You can also
create synonyms for remote objects in the distributed system so that users can access them with
the same syntax as local objects. For example, if you are connected to database mfg but want to
access data on database hq, creating a synonym on mfg for the remote dept table enables you to
issue this query:
SELECT * FROM dept;

In this way, a distributed system gives the appearance of native data access. Users on mfg do not
have to know that the data they access resides on remote databases.

Figure 29-1 Homogeneous Distributed Database



(Illustration admin046.gif: a homogeneous distributed database. The figure shows three databases,
HQ.ACME.COM, SALES.ACME.COM, and MFG.ACME.COM. Each database is connected to a number of client
systems at Headquarters and the Sales and Manufacturing divisions, respectively.)

An Oracle Database distributed database system can incorporate Oracle Databases of different
versions. All supported releases of Oracle Database can participate in a distributed database
system. Nevertheless, the applications that work with the distributed database must understand
the functionality that is available at each node in the system. A distributed database application
cannot expect an Oracle7 database to understand the SQL extensions that are only available
with Oracle Database.

Distributed Databases Versus Distributed Processing

The terms distributed database and distributed processing are closely related, yet have
distinct meanings. Their definitions are as follows:

• Distributed database

A set of databases in a distributed system that can appear to applications as a single data
source.

• Distributed processing
The operations that occur when an application distributes its tasks among different
computers in a network. For example, a database application typically distributes front-
end presentation tasks to client computers and allows a back-end database server to
manage shared access to a database. Consequently, a distributed database application
processing system is more commonly referred to as a client/server database application
system.

Distributed database systems employ a distributed processing architecture. For example, an
Oracle Database server acts as a client when it requests data that another Oracle Database server
manages.

Distributed Databases Versus Replicated Databases

The terms distributed database system and database replication are related, yet distinct. In
a pure (that is, not replicated) distributed database, the system manages a single copy of all data
and supporting database objects. Typically, distributed database applications use distributed
transactions to access both local and remote data and modify the global database in real-time.

The term replication refers to the operation of copying and maintaining database objects in
multiple databases belonging to a distributed system. While replication relies on distributed
database technology, database replication offers applications benefits that are not possible
within a pure distributed database environment.

Most commonly, replication is used to improve local database performance and protect the
availability of applications because alternate data access options exist. For example, an
application may normally access a local database rather than a remote server to minimize
network traffic and achieve maximum performance. Furthermore, the application can continue
to function if the local server experiences a failure, but other servers with replicated data remain
accessible.

Heterogeneous Distributed Database Systems

In a heterogeneous distributed database system, at least one of the databases is a non-Oracle
Database system. To the application, the heterogeneous distributed database system appears as a
single, local, Oracle Database. The local Oracle Database server hides the distribution and
heterogeneity of the data.

The Oracle Database server accesses the non-Oracle Database system using Oracle
Heterogeneous Services in conjunction with an agent. If you access the non-Oracle Database
data store using an Oracle Transparent Gateway, then the agent is a system-specific application.
For example, if you include a Sybase database in an Oracle Database distributed system, then
you need to obtain a Sybase-specific transparent gateway so that the Oracle Database in the
system can communicate with it.

Alternatively, you can use generic connectivity to access non-Oracle Database data stores so
long as the non-Oracle Database system supports the ODBC or OLE DB protocols.

Heterogeneous Services

Heterogeneous Services (HS) is an integrated component within the Oracle Database server and
the enabling technology for the current suite of Oracle Transparent Gateway products. HS
provides the common architecture and administration mechanisms for Oracle Database gateway
products and other heterogeneous access facilities. Also, it provides upwardly compatible
functionality for users of most of the earlier Oracle Transparent Gateway releases.

Transparent Gateway Agents

For each non-Oracle Database system that you access, Heterogeneous Services can use a
transparent gateway agent to interface with the specified non-Oracle Database system. The
agent is specific to the non-Oracle Database system, so each type of system requires a different
agent.

The transparent gateway agent facilitates communication between Oracle Database and non-
Oracle Database systems and uses the Heterogeneous Services component in the Oracle
Database server. The agent executes SQL and transactional requests at the non-Oracle Database
system on behalf of the Oracle Database server.

Generic Connectivity

Generic connectivity enables you to connect to non-Oracle Database data stores by using either
a Heterogeneous Services ODBC agent or a Heterogeneous Services OLE DB agent. Both are
included with your Oracle product as a standard feature. Any data source compatible with the
ODBC or OLE DB standards can be accessed using a generic connectivity agent.

The advantage of generic connectivity is that you may not need to purchase and
configure a separate system-specific agent. You use an ODBC or OLE DB driver that can
interface with the agent. However, some data access features are only available with transparent
gateway agents.
Client/Server Database Architecture

A database server is the Oracle software managing a database, and a client is an application that
requests information from a server. Each computer in a network is a node that can host one or
more databases. Each node in a distributed database system can act as a client, a server, or both,
depending on the situation.

In Figure 29-2, the host for the hq database is acting as a database server when a statement is
issued against its local data (for example, the second statement in each transaction issues a
statement against the local dept table), but is acting as a client when it issues a statement against
remote data (for example, the first statement in each transaction is issued against the remote
table emp in the sales database).

Figure 29-2 An Oracle Database Distributed Database System


A client can connect directly or indirectly to a database server. A direct connection occurs
when a client connects to a server and accesses information from a database contained on that
server. For example, if you connect to the hq database and access the dept table on this database
as in Figure 29-2, you can issue the following:
SELECT * FROM dept;
This query is direct because you are not accessing an object on a remote database.

In contrast, an indirect connection occurs when a client connects to a server and then accesses
information contained in a database on a different server. For example, if you connect to
the hq database but access the emp table on the remote sales database as in Figure 29-2, you can
issue the following:
SELECT * FROM emp@sales;

This query is indirect because the object you are accessing is not on the database to which you
are directly connected.
