DISTRIBUTED DATABASE SYSTEMS OVERVIEW
Presented By
Satrio Agung Wicaksono
Distributed Database Systems
Made up of database system and computer network
technologies
Database management moves control of data from
applications to centralized, controlled-access
systems (DBMSs)
Distributed Database Systems, Contd
Computer network technology emphasizes distributed
(non-central) control
Database systems seem to emphasize centralization
Network systems seem to emphasize distribution
Databases, however, are not really about centralizing the
management of data
Database management systems really integrate data and
supply a common access methodology to data
What is Distributed Processing?
It means many things to many people
Distributed function (single program distributed on multiple
processors)
Distributed computing (autonomous functions distributed on a
network)
Networks (independent of function)
Multiprocessors (multiple CPUs in the same computer)
etc.
In any computer, there is always some aspect of
distributed processing (e.g., CPU and I/O functions)
We need a better definition of distributed computing to
better understand distributed database computing
What is a Distributed Database System?
A collection of multiple, logically interrelated
databases distributed over a computer
network
A Distributed Database Management System
(DDBMS) is the software system that
manages distributed databases and makes
the distribution transparent to the user
Central Database on a Network
DDBS Environment
What a Distributed Database Architecture IS!
Distributed database system consists of autonomous
databases at distributed nodes
Advantages of Distributed Database Systems
Transparent Management of Distributed and Replicated
Data
Transparency hides many of the lower-level implementation issues
Users and applications do not have to understand and manage the
distribution
The DDBMS appears as a single DBMS
A single query to the DDBMS is correctly translated into potentially
many queries on multiple DBMSs
The effects of a single query on multiple databases are managed
consistently and automatically
The queries are semantically correct
The queries are executed in the right order
Data replication is handled properly and automatically
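The transparency described above can be illustrated with a minimal sketch. It assumes a hypothetical `employee` table that is split by city across two in-memory SQLite databases standing in for sites; the site names, the table, and the `global_query` helper are all invented for illustration and are not part of any real DDBMS.

```python
import sqlite3

def make_site(rows):
    """Create one in-memory database holding a fragment of `employee`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

# Hypothetical sites: each stores the employees of one city.
SITES = {
    "malang": make_site([(1, "Ani", "Malang"), (2, "Budi", "Malang")]),
    "jakarta": make_site([(3, "Citra", "Jakarta")]),
}

def global_query(sql, params=()):
    """Run the same query at every site and merge the partial results,
    so the caller sees what looks like a single table."""
    merged = []
    for conn in SITES.values():
        merged.extend(conn.execute(sql, params).fetchall())
    return sorted(merged)

result = global_query("SELECT id, name FROM employee WHERE id >= ?", (2,))
print(result)  # [(2, 'Budi'), (3, 'Citra')]
```

The caller issues one query and never names a site. A real DDBMS would also prune sites that cannot hold matching rows and would handle joins and replication, which this sketch ignores.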
Advantages of Distributed Database Systems
(contd...)
Reliability Through Distributed Transactions
Maintains database consistency across multiple transactions
Multiple applications and users may execute sets of all-or-nothing
queries
Applications and users do not step on each other
Each application appears to have the complete attention of
the database
Effects of other user transactions are not noticed
This would be near impossible without a DDBMS
Advantages of Distributed Database Systems
(contd...)
Improved Performance
Algorithms are tuned for distribution
Database design tuned for distribution and usage patterns
Based on internal DDBMS statistics, efficient query plans are
calculated
Efficient query algorithms and optimal database design are beyond
the capability of most users
Even if the users have these capabilities, they do not have access
to internal DDBMS statistics to make effective design and query
choices
Advantages of Distributed Database Systems
(contd...)
Easier System Expansion
The next database node fits into a pre-existing architecture
Much of the database integration software is already in place
The new database node is managed consistently within the context
of the other database nodes
Without a DDBMS, all applications needing data at the new node
would need to be modified and tuned to the specifics of that
database
Access patterns
Query semantics
Query integrity
Transaction management
Complications Introduced by DDBMS
Essentially, this is what the course is about
It is assumed that we know how various features are implemented in a
single DBMS
We need more tools to handle the distribution
Data replication
For reliability and efficiency
Choose the site to retrieve from
Update modifies all copies
Failed sites need to be brought up to date when they come back online
Synchronization of values at distributed sites
Software is inherently more complex
Distribution of control
Security
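The data replication points on this slide (choose a site to read from, update all copies, bring failed sites up to date) can be sketched as follows. This is a toy read-one/write-all scheme with a shared update log, assuming a single coordinator and no concurrent writers; the class and method names are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}     # key -> value stored at this site
        self.applied = 0   # number of log entries already applied

class ReplicatedStore:
    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}
        self.log = []      # ordered list of (key, value) updates

    def _catch_up(self, replica):
        """Apply every logged update the replica has not seen yet."""
        for key, value in self.log[replica.applied:]:
            replica.data[key] = value
        replica.applied = len(self.log)

    def write(self, key, value):
        """An update modifies all copies that are currently reachable."""
        self.log.append((key, value))
        for replica in self.replicas.values():
            if replica.up:
                self._catch_up(replica)

    def read(self, key):
        """Choose a site to retrieve from: here, the first one that is up."""
        for replica in self.replicas.values():
            if replica.up:
                return replica.data.get(key)
        raise RuntimeError("no replica available")

    def fail(self, name):
        self.replicas[name].up = False

    def recover(self, name):
        """A failed site is updated before it serves requests again."""
        replica = self.replicas[name]
        self._catch_up(replica)
        replica.up = True

store = ReplicatedStore(["A", "B", "C"])
store.write("x", 1)
store.fail("B")
store.write("x", 2)                       # B misses this update
stale = store.replicas["B"].data["x"]     # still the old value
store.recover("B")
fresh = store.replicas["B"].data["x"]     # caught up from the log
```

Real systems must also cope with network partitions, concurrent updates, and a coordinator that can itself fail, which is where much of the added software complexity comes from.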
Design Issues
Distributed Database Design
Distributed Directory Management
Distributed Query Processing
Distributed Concurrency Control
Distributed Deadlock Management
Reliability of Distributed DBMS
Replication
Heterogeneous databases
Distributed Database Design
The question that is being addressed is how the database
and the applications that run against it should be placed
across the sites.
There are two basic alternatives to placing data:
partitioned (or non-replicated)
Replicated
The two fundamental design issues:
fragmentation (splitting the database into fragments)
distribution (allocating the fragments to sites)
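A small sketch may help separate the two design issues. It assumes a hypothetical `EMPLOYEE` relation held as a list of dictionaries, fragments it horizontally by city and vertically by column, and then allocates the horizontal fragments to made-up sites; none of the names come from the slides.

```python
EMPLOYEE = [
    {"id": 1, "name": "Ani", "city": "Malang", "salary": 900},
    {"id": 2, "name": "Budi", "city": "Jakarta", "salary": 1200},
    {"id": 3, "name": "Citra", "city": "Malang", "salary": 1000},
]

def horizontal_fragments(rows, column):
    """Fragmentation by rows: group tuples by the value of one column."""
    frags = {}
    for row in rows:
        frags.setdefault(row[column], []).append(row)
    return frags

def vertical_fragment(rows, columns, key="id"):
    """Fragmentation by columns: keep the key so fragments can be rejoined."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]

# Fragmentation: decide how the relation is split.
fragments = horizontal_fragments(EMPLOYEE, "city")
pay = vertical_fragment(EMPLOYEE, ["salary"])

# Distribution: decide where each fragment lives (hypothetical allocation).
ALLOCATION = {"Malang": "site1", "Jakarta": "site2"}
placement = {ALLOCATION[city]: frag for city, frag in fragments.items()}

# The union of the horizontal fragments reconstructs the original relation.
reconstructed = sorted(
    (row for frag in fragments.values() for row in frag),
    key=lambda r: r["id"],
)
```

The reconstruction step reflects the usual correctness requirement that a fragmentation must lose no data.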
Distributed Directory Management
There are three levels of directories
Conceptual
Logical
Physical
Directories are consulted for most database operations
There are many issues that concern whether to distribute
or centralize the directories
Distributed Query Processing
The problem is how to decide on a strategy for executing
each query over the network in the most cost-effective
way, however cost is defined
The factors to be considered:
the distribution of data
communication costs
lack of sufficient locally-available information
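As a rough illustration of how data distribution and communication cost drive the choice of strategy, the sketch below compares two ways of joining relations stored at different sites: ship the first to the second, or the reverse. The cost constants, relation sizes, and function names are assumptions chosen for the example, and only transfer cost is modeled.

```python
from dataclasses import dataclass

PER_MESSAGE = 1000   # assumed fixed cost of starting a transfer
PER_BYTE = 1         # assumed cost per byte shipped

@dataclass
class Relation:
    name: str
    site: str
    rows: int
    row_bytes: int

    @property
    def size(self):
        return self.rows * self.row_bytes

def transfer_cost(rel, dest):
    """Cost of making `rel` available at `dest` (zero if already there)."""
    if rel.site == dest:
        return 0
    return PER_MESSAGE + PER_BYTE * rel.size

def plan_join(r, s):
    """Return (site, cost) for the cheapest site at which to join r and s."""
    costs = {
        site: transfer_cost(r, site) + transfer_cost(s, site)
        for site in (r.site, s.site)
    }
    best = min(costs, key=costs.get)
    return best, costs[best]

employee = Relation("employee", "site1", rows=10_000, row_bytes=100)
department = Relation("department", "site2", rows=100, row_bytes=50)

best_site, best_cost = plan_join(employee, department)
print(best_site, best_cost)  # site1 6000
```

Shipping the small relation to the large one wins here. A real optimizer would also weigh local processing cost, the size of the result, and techniques such as semijoins, and it often has to work from incomplete statistics about remote sites.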
Distributed Concurrency Control
Synchronization of access to distributed databases
Maintain integrity of the system
Single distributed database
Multiple copies of the database
Approaches
Pessimistic Concurrency Control
Optimistic Concurrency Control
Example approaches
Locking
Timestamps
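To make the timestamp approach concrete, here is a minimal sketch of basic timestamp ordering on a single data item. It assumes each transaction carries a unique timestamp and that an operation arriving too late causes that transaction to abort; the `Item` and `Abort` names are hypothetical, and restart logic is omitted.

```python
class Abort(Exception):
    """Raised when an operation arrives too late and must be rejected."""

class Item:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0    # largest timestamp that has read the item
        self.write_ts = 0   # largest timestamp that has written the item

def read(item, ts):
    # A read is rejected if a younger transaction has already written.
    if ts < item.write_ts:
        raise Abort(f"read by T{ts} rejected")
    item.read_ts = max(item.read_ts, ts)
    return item.value

def write(item, ts, value):
    # A write is rejected if a younger transaction has already read or written.
    if ts < item.read_ts or ts < item.write_ts:
        raise Abort(f"write by T{ts} rejected")
    item.value = value
    item.write_ts = ts

x = Item(10)
first_read = read(x, ts=2)        # T2 reads the initial value

try:
    write(x, ts=1, value=99)      # T1 is older than the reader T2
    late_write_rejected = False
except Abort:
    late_write_rejected = True

write(x, ts=3, value=20)          # T3 is the youngest so far, accepted
```

Locking would instead make the late transaction wait, which is why locking can deadlock while timestamp ordering trades waiting for aborts.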
Distributed Deadlock Management
Similar to Operating Systems deadlock management
Well known solutions
Prevention
Avoidance
Detection/recovery
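Detection is commonly explained with a wait-for graph, and the distributed twist is that no single site may see the whole cycle. The sketch below merges per-site wait-for graphs and searches the result for a cycle. The graph contents and function names are invented for the example, and a centralized detector is assumed.

```python
def merge_graphs(*local_graphs):
    """Union the per-site wait-for graphs into one global graph."""
    merged = {}
    for graph in local_graphs:
        for txn, waits_for in graph.items():
            merged.setdefault(txn, set()).update(waits_for)
    return merged

def find_cycle(graph):
    """Return a list of transactions forming a cycle, or None."""
    path, done = [], set()   # path is the current depth-first search path

    def dfs(node):
        if node in path:
            return path[path.index(node):]
        if node in done:
            return None
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None

# Hypothetical state: T1 waits for T2 at site 1, T2 waits for T1 at site 2.
site1 = {"T1": {"T2"}}
site2 = {"T2": {"T1"}}

local_result = find_cycle(site1)
global_result = find_cycle(merge_graphs(site1, site2))
print(local_result, global_result)  # None ['T1', 'T2']
```

Neither site sees a deadlock on its own. Recovery would then pick a victim from the cycle and abort it.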
Reliability of Distributed DBMS
Failure recovery among multiple sites
Make sure the surviving sites remain consistent and available
We will explore the ARIES algorithm
We will explore the Two Phase Commit (2PC) algorithm
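As a preview, the core of Two Phase Commit can be sketched in a few lines. This is a failure-free toy version: it assumes reliable messaging and omits the logging, timeouts, and recovery that the real protocol needs. The class and state names are illustrative.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INITIAL"

    def prepare(self):
        """Phase 1: vote yes (and become READY) or vote no (and abort)."""
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"

def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes yes."""
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): broadcast the global outcome.
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMIT"
    for p in participants:
        p.abort()
    return "ABORT"

ok_sites = [Participant("A"), Participant("B")]
outcome_ok = two_phase_commit(ok_sites)

mixed_sites = [Participant("A"), Participant("B", can_commit=False)]
outcome_mixed = two_phase_commit(mixed_sites)
print(outcome_ok, outcome_mixed)  # COMMIT ABORT
```

A single no vote aborts the transaction everywhere, which is what gives the all-or-nothing guarantee across sites.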
Replication
If the distributed database is (partially or fully) replicated, it
is necessary to implement protocols that ensure the
consistency of the replicas, i.e., copies of the same data
item have the same value
Heterogeneous databases
Sometimes called multi-databases
The component databases are fully autonomous
Usually databases already exist and the distributed
database system integrates them
Requires translations among database systems with
canonical description of overall environment
Complementary to distributed database systems
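The translation requirement can be pictured as one mapping per existing database into a shared canonical schema. The sketch below assumes two hypothetical sources that store employees with different field names, key formats, and salary units; the schemas and function names are invented for illustration.

```python
# Hypothetical records from two pre-existing, independently designed databases.
hr_rows = [{"emp_no": 7, "full_name": "Ani", "salary_idr": 9_000_000}]
payroll_rows = [{"id": "E-8", "name": "Budi", "monthly_pay_million_idr": 12}]

# Canonical schema assumed here: employee_id (int), name (str), salary_idr (int).

def from_hr(row):
    """Translate an HR record: only the field names differ."""
    return {
        "employee_id": row["emp_no"],
        "name": row["full_name"],
        "salary_idr": row["salary_idr"],
    }

def from_payroll(row):
    """Translate a payroll record: key format and salary unit differ."""
    return {
        "employee_id": int(row["id"].split("-")[1]),
        "name": row["name"],
        "salary_idr": row["monthly_pay_million_idr"] * 1_000_000,
    }

# The integrated view is built from translated rows; the sources are untouched.
canonical_view = [from_hr(r) for r in hr_rows] + [
    from_payroll(r) for r in payroll_rows
]
```

Each source keeps its own schema and autonomy; only the translators know about the canonical description.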