Introduction-Distributed DBMS-1-26
• Introduction
➡ What is a distributed DBMS
➡ Distributed DBMS Architecture
• Background
• Distributed Database Design
• Database Integration
• Semantic Data Control
• Distributed Query Processing
• Multidatabase Query Processing
• Distributed Transaction Management
• Data Replication
• Parallel Database Systems
• Distributed Object DBMS
• Peer-to-Peer Data Management
• Web Data Management
• Current Issues
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.1/1
File Systems
[Figure: three programs, each with its own data description and its own file (program 1 with File 1, program 2 with File 2, program 3 with File 3)]
[Figure: application programs (with data semantics) reach a shared database through common data description, manipulation, and control]
[Figure: database technology (integration) combined with computer networks (distribution) yields distributed database systems (integration)]
integration ≠ centralization
Distributed Computing
• A number of autonomous processing elements (not necessarily
homogeneous) that are interconnected by a computer network and that
cooperate in performing their assigned tasks.
• What is being distributed?
➡ Processing logic
➡ Function
➡ Data
➡ Control
[Figure: Sites 1 to 5 connected through a communication network]
• Frequency
➡ Periodic
➡ Conditional
➡ Ad-hoc or irregular
• Communication Methods
➡ Unicast
➡ One-to-many
Improved performance
➡ Replication transparency
➡ Fragmentation transparency
✦ horizontal fragmentation: selection
✦ vertical fragmentation: projection
✦ hybrid
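The three fragmentation styles listed above can be illustrated with a minimal Python sketch. The relation `PROJ`, its attribute names, and its rows are hypothetical and exist only for illustration; the 200000 budget threshold is borrowed from the example figure. Horizontal fragments are produced by selection and rebuilt by union, vertical fragments are produced by projection (keeping the key) and rebuilt by a join on that key, and a hybrid fragment applies one after the other.

```python
# Hypothetical relation: rows and attribute names are illustrative only.
PROJ = [
    {"pno": "P1", "pname": "alpha", "budget": 150000, "loc": "Montreal"},
    {"pno": "P2", "pname": "beta", "budget": 135000, "loc": "New York"},
    {"pno": "P3", "pname": "gamma", "budget": 250000, "loc": "New York"},
    {"pno": "P4", "pname": "delta", "budget": 310000, "loc": "Paris"},
]


def horizontal_fragment(relation, predicate):
    """Selection: keep whole rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def vertical_fragment(relation, key, attributes):
    """Projection: keep the key plus the chosen attributes of every row."""
    return [{a: row[a] for a in (key, *attributes)} for row in relation]


def reconstruct_horizontal(*fragments):
    """Union of disjoint horizontal fragments."""
    return [row for fragment in fragments for row in fragment]


def reconstruct_vertical(left, right, key):
    """Join two vertical fragments on their shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Horizontal fragmentation by budget.
low = horizontal_fragment(PROJ, lambda r: r["budget"] <= 200000)
high = horizontal_fragment(PROJ, lambda r: r["budget"] > 200000)

# Vertical fragmentation: descriptive attributes vs. budget.
names = vertical_fragment(PROJ, "pno", ["pname", "loc"])
budgets = vertical_fragment(PROJ, "pno", ["budget"])

# Hybrid: a vertical fragment of a horizontal fragment.
high_names = vertical_fragment(high, "pno", ["pname"])
```

Keeping the key in every vertical fragment is what makes the join-based reconstruction possible; without it the original rows could not be reassembled.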
[Figure: example distributed database over the sites Boston, Montreal, New York, and Paris. Fragments shown at the sites include Boston employees, Boston projects, Boston assignments, Paris projects, Montreal employees, Montreal projects, Montreal assignments, New York employees, New York projects, New York assignments, and New York projects with budget > 200000]
Distributed Database
[Figure: users submit queries and run applications against DBMS software at each site; the DBMS software instances are linked by a communication subsystem]
• Replication transparency
• Fragmentation transparency
• Data replication
➡ Great for read-intensive workloads, problematic for updates
➡ Replication protocols
• Parallelism in execution
➡ Inter-query parallelism
➡ Intra-query parallelism
➡ Full replication
➡ Mutual consistency
➡ Freshness of copies
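Why replication suits read-intensive workloads but makes updates expensive can be seen in a small sketch. It assumes a read-one/write-all discipline, which is one common replication protocol chosen here for illustration and is not named in the slides. The class `ReplicatedItem` and its `messages` counter are hypothetical stand-ins for a real replica manager and its network cost.

```python
class ReplicatedItem:
    """One logical data item stored as n physical copies (hypothetical model)."""

    def __init__(self, n_copies, value=None):
        self.copies = [value] * n_copies
        self.messages = 0  # copy accesses, used as a stand-in for network cost

    def read(self, site=0):
        # Read-one: a single copy suffices while the copies are mutually consistent.
        self.messages += 1
        return self.copies[site]

    def write(self, value):
        # Write-all: every copy is updated so that mutual consistency is preserved.
        for i in range(len(self.copies)):
            self.copies[i] = value
            self.messages += 1

    def mutually_consistent(self):
        # All copies hold the same value.
        return all(copy == self.copies[0] for copy in self.copies)


item = ReplicatedItem(n_copies=5, value=0)
item.read(site=3)   # touches 1 copy
item.write(7)       # touches all 5 copies
```

In this model a read costs one access regardless of the number of copies, while a write costs one access per copy, so adding replicas helps readers and penalizes writers.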
• Query Processing
➡ Convert user transactions to data manipulation instructions
➡ Optimization problem
✦ min{cost = data transmission + local processing}
➡ General formulation is NP-hard
➡ Deadlock management
• Reliability
➡ How to make the system resilient to failures
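The optimization objective listed under Query Processing, min{cost = data transmission + local processing}, can be sketched as a search over candidate execution plans. Everything below is an assumption made for illustration: the `Plan` fields, the plan names, the byte and tuple counts, and the `per_byte` and `per_tuple` weights. The exhaustive `min` only works because the plan set is tiny and enumerated by hand; since the general formulation is NP-hard, practical optimizers rely on heuristics or restricted search spaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    bytes_shipped: int      # data moved between sites
    tuples_processed: int   # work done locally at the sites


def cost(plan, per_byte=1.0, per_tuple=0.1):
    """cost = data transmission + local processing (weights are assumptions)."""
    return plan.bytes_shipped * per_byte + plan.tuples_processed * per_tuple


def cheapest(plans, **weights):
    """Exhaustive search over a small, explicitly enumerated set of plans."""
    return min(plans, key=lambda p: cost(p, **weights))


# Hypothetical plans for joining two relations R and S stored at different sites.
plans = [
    Plan("ship R to the site of S", bytes_shipped=40_000, tuples_processed=1_400),
    Plan("ship S to the site of R", bytes_shipped=100_000, tuples_processed=1_400),
    Plan("ship both to a third site", bytes_shipped=140_000, tuples_processed=1_400),
]
best = cheapest(plans)
```

With these weights transmission dominates, so the plan that ships the smaller relation is selected; changing the weights shifts the balance between the two cost terms.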
[Figure: relationships among the issues: query processing, distribution design, reliability, concurrency control, and deadlock management]
Related Issues
• Operating System Support
➡ Operating system with proper support for database operations
➡ Dichotomy between general purpose processing requirements and database
processing requirements
• Open Systems and Interoperability
➡ Distributed Multidatabase Systems
➡ More probable scenario
➡ Parallel issues