Distributed Database Chapter 1 Modified
Chapter 1
Introduction to
Distributed Database Management Systems
Contents
•Distributed Data Processing
•Concepts of Distributed Data Base Systems
•Data Delivery Alternatives
•Distributed DBS Promises
•Distributed DBS Issues
•Review of Computer Networks
Distributed Data Processing
• Distributed data processing (or distributed computing) is carried out by a number of
autonomous processing elements that are interconnected by a computer network
and that cooperate in performing their assigned tasks.
➡ The “processing element” is a computing device that can execute a program
on its own.
➡ The term “distributed” can refer to:
✦ Processing logic - processing logic or processing elements are distributed
over the network
✦ Function - Various functions of a computer system could be delegated to
various pieces of hardware or software
✦ Data - Data used by a number of applications may be distributed to a
number of processing sites
✦ Control - The control of the execution of various tasks might be distributed
instead of being performed by one computer system.
Concepts of Distributed Data Base Systems
• A distributed database (DDB) is a collection of multiple, logically
interrelated databases distributed over a computer network.
• A distributed database management system (DDBMS) is the
software that manages the DDB and provides an access
mechanism that makes this distribution transparent to the users.
• Distributed database system (DDBS) = DDB + DDBMS
• DDBS makes distributed processing easier and more efficient
• DDBS technology is the integration of two opposed approaches to
data processing: database system and computer network
technologies.
Concepts of Distributed Data Base Systems(Cont.…)
• Database systems moved data processing away from a paradigm in which each application
defines and maintains its own data:
• They replaced it with one in which the data are administered centrally:
Concepts of Distributed Data Base Systems(Cont.…)
• Computer networks, on the other hand, promote a mode of work that goes against all
centralization efforts.
• The most important objective of the DDBS technology is integration, not centralization:
Concepts of Distributed Data Base Systems (Cont.…)
• A DDBS is not a system where, despite the existence of a network, the database
resides at only one node of the network:
➡ In this case, the database is centrally managed by one computer system (site
2 in the Figure) and all the requests are routed to that site.
Concepts of Distributed Data Base Systems(Cont.…)
• In a DDBS, data are distributed among a number of sites:
Concepts of Distributed Data Base Systems (Cont.…)
• Implicit assumptions:
➡ Data are stored at a number of sites, and each site logically consists of a single processor.
➡ Processors at different sites are interconnected by a computer network, not by a shared
multiprocessor architecture
✦ multiprocessor configurations are the domain of parallel database systems
➡ A distributed database is a database, not a collection of files: the data are logically related,
as exhibited in the users’ access patterns
✦ the relational data model is assumed
➡ A D-DBMS is a full-fledged DBMS
Data Delivery Alternatives
• In distributed databases, data are “delivered” from the sites where they are
stored to where the query is posed.
• Data delivery alternatives can be characterized along three dimensions:
➡ delivery modes
✦ Pull-only
✦ Push-only
✦ Hybrid
➡ frequency
✦ Periodic
✦ Conditional
✦ Ad-hoc or irregular
➡ communication methods
✦ Unicast
✦ One-to-many
Data Delivery Alternatives (Cont.…)
• Delivery modes
➡ Pull-only: In this mode of data delivery, the transfer of data from servers to
clients is initiated by a client pull. When a client request is received at a server,
the server responds by locating the requested information.
✦ Conventional DBMSs offer primarily pull-based data delivery.
➡ Push-only: In this mode, the transfer of data from servers to clients is initiated
by a server push in the absence of any specific request from clients.
✦ The difficulty lies in deciding which data would be of common interest, and when
to send them to clients
➡ Hybrid: combines the client-pull and server-push mechanisms.
✦ The continuous (or continual) query approach presents one way of hybrid
mode, namely, the transfer of information from servers to clients is first
initiated by a client pull (by posing the query), and the subsequent transfer of
updated information to clients is initiated by a server push.
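To make the three delivery modes concrete, here is a minimal in-process sketch. It assumes a single server holding key-value data and clients represented as callback functions; the class and method names (`Server`, `pull`, `push_to_all`, `register_continuous_query`, `update`) are hypothetical and do not correspond to any real DBMS API. The hybrid case follows the continuous-query idea above: the client's initial pull also registers the query, and later matching updates are pushed by the server.

```python
from typing import Callable, Dict, List, Tuple

Predicate = Callable[[str, int], bool]
Callback = Callable[[str, int], None]


class Server:
    """Toy data server (hypothetical) illustrating the three delivery modes."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.subscribers: List[Callback] = []                   # push-only recipients
        self.continuous: List[Tuple[Predicate, Callback]] = []  # hybrid queries

    def pull(self, predicate: Predicate) -> Dict[str, int]:
        """Pull-only: the client initiates the transfer by sending a request."""
        return {k: v for k, v in self.data.items() if predicate(k, v)}

    def push_to_all(self, key: str) -> None:
        """Push-only: the server sends data without any client request."""
        for deliver in self.subscribers:
            deliver(key, self.data[key])

    def register_continuous_query(self, predicate: Predicate,
                                  deliver: Callback) -> Dict[str, int]:
        """Hybrid: the first transfer is a client pull (posing the query) ..."""
        self.continuous.append((predicate, deliver))
        return self.pull(predicate)

    def update(self, key: str, value: int) -> None:
        """... and later updates that satisfy the query are pushed by the server."""
        self.data[key] = value
        for predicate, deliver in self.continuous:
            if predicate(key, value):
                deliver(key, value)


if __name__ == "__main__":
    server = Server()
    server.update("a", 90)
    received: List[Tuple[str, int]] = []
    first = server.register_continuous_query(
        lambda k, v: v > 80, lambda k, v: received.append((k, v)))
    server.update("b", 120)  # satisfies the query, so it is pushed
    server.update("c", 10)   # does not satisfy it, so nothing is sent
    print(first, received)   # {'a': 90} [('b', 120)]
```

A real system would deliver over a network and persist client profiles; the sketch only shows who initiates each transfer.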
Data Delivery Alternatives (Cont.…)
• Frequency
➡ Periodic: In periodic delivery, data are sent from the server to clients at regular
intervals. Both pull and push can be performed in periodic fashion.
✦ Periodic delivery is carried out on a regular and pre-specified repeating
schedule.
➡ Conditional: In conditional delivery, data are sent from servers whenever certain
conditions installed by clients in their profiles are satisfied.
✦ Conditional delivery is mostly used in the hybrid or push-only delivery systems.
➡ Ad-hoc: Ad-hoc delivery is irregular and is performed mostly in a pure pull-based
system.
✦ Data are pulled from servers to clients in an ad-hoc fashion whenever clients
request it. In contrast, periodic pull arises when a client uses polling to obtain
data from servers based on a regular period (schedule).
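The sketch below contrasts the three frequencies on a simulated timeline. It assumes a hypothetical, time-ordered list of updates arriving at the server and uses logical time steps instead of a real clock; all names are illustrative.

```python
from typing import Callable, List, Tuple

# Hypothetical (logical time, value) updates at the server, sorted by time.
UPDATES: List[Tuple[int, int]] = [(1, 5), (2, 12), (3, 7), (5, 15), (8, 3)]


def value_at(t: int) -> int:
    """Latest value stored at the server at logical time t (0 if none yet)."""
    current = 0
    for when, value in UPDATES:
        if when <= t:
            current = value
    return current


def periodic_pull(period: int, horizon: int) -> List[Tuple[int, int]]:
    """Client polls on a regular, pre-specified schedule."""
    return [(t, value_at(t)) for t in range(period, horizon + 1, period)]


def conditional_push(condition: Callable[[int], bool]) -> List[Tuple[int, int]]:
    """Server sends an update only when the client's profile condition holds."""
    return [(t, v) for t, v in UPDATES if condition(v)]


def adhoc_pull(request_times: List[int]) -> List[Tuple[int, int]]:
    """Client pulls irregularly, whenever it happens to need the data."""
    return [(t, value_at(t)) for t in request_times]


if __name__ == "__main__":
    print(periodic_pull(period=3, horizon=9))    # [(3, 7), (6, 15), (9, 3)]
    print(conditional_push(lambda v: v > 10))   # [(2, 12), (5, 15)]
    print(adhoc_pull([2, 7]))                   # [(2, 12), (7, 15)]
```

Note that periodic polling can miss intermediate values (the value 12 at time 2 is never seen by the poll at times 3, 6, 9), while conditional push reports exactly the updates the client asked for.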
Data Delivery Alternatives (Cont.…)
•Communication Methods: These methods determine the
ways in which servers and clients communicate for
delivering information to clients. The alternatives are:
➡unicast and
➡one-to-many.
•In unicast, the communication from a server to a client is
one-to-one: the server sends data to one client using a
particular delivery mode with some frequency.
•In one-to-many, as the name implies, the server sends
data to a number of clients.
Distributed DBS Promises
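Four fundamental promises/advantages of DDBS are:
➡ Transparent management of distributed and replicated data
➡ Reliability through distributed transactions
➡ Improved performance
➡ Easier system expansion
Example:
(ASG indicates which employees have been assigned to which projects, for what duration, and
with what responsibility)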
Distributed DBS Promises (Cont.…)
Example (contd…):
Distributed DBS Promises (Cont.…)
Example (contd…):
• If all of these data are stored in a centralized DBMS, and we want to find the names and
salaries of employees who worked on a project for more than 12 months, we would specify this
using the following SQL query:
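A sketch of such a query, run against an in-memory SQLite database through Python's standard `sqlite3` module. It assumes the relations EMP(ENO, ENAME, TITLE), ASG(ENO, PNO, RESP, DUR) with DUR in months, and PAY(TITLE, SAL); these schemas and the sample rows are assumptions for illustration, since the slides only describe ASG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schemas and made-up rows, for illustration only.
conn.executescript("""
CREATE TABLE EMP (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
CREATE TABLE ASG (ENO TEXT, PNO TEXT, RESP TEXT, DUR INTEGER);  -- DUR in months
CREATE TABLE PAY (TITLE TEXT PRIMARY KEY, SAL INTEGER);
INSERT INTO EMP VALUES ('E1', 'J. Doe', 'Elect. Eng.'), ('E2', 'M. Smith', 'Syst. Anal.');
INSERT INTO ASG VALUES ('E1', 'P1', 'Manager', 12), ('E2', 'P1', 'Analyst', 24);
INSERT INTO PAY VALUES ('Elect. Eng.', 40000), ('Syst. Anal.', 34000);
""")

# Names and salaries of employees who worked on a project for more than 12 months.
QUERY = """
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE ASG.DUR > 12
  AND EMP.ENO = ASG.ENO
  AND PAY.TITLE = EMP.TITLE
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('M. Smith', 34000)]
```

In a centralized DBMS all three relations sit in one place, so the query is written and executed exactly as above; the point of the following slides is that a DDBMS should let users pose the same query even when these relations are fragmented and replicated across sites.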
Distributed DBS Promises (Cont.…)
Example (contd…):
• Under a DDBS,
➡ We partition each of the relations and store each partition at a different site. This is
fragmentation.
➡ We can duplicate some of these data at other sites for performance and reliability
reasons. This is replication.
➡ The result is a distributed database that is fragmented and replicated.
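A minimal sketch of both ideas, assuming a simplified ASG layout (ENO, PNO, RESP, DUR) and a made-up placement of fragments at hypothetical sites `site1` to `site3`. Rows are fragmented horizontally by project number, one fragment is replicated at two sites, and the union of the fragments reconstructs the original relation.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str, int]  # (ENO, PNO, RESP, DUR): an assumed ASG layout

ASG: List[Row] = [
    ("E1", "P1", "Manager", 12),
    ("E2", "P1", "Analyst", 24),
    ("E3", "P2", "Consultant", 10),
    ("E4", "P3", "Engineer", 48),
]


def fragment(rows: List[Row]) -> Dict[str, List[Row]]:
    """Horizontal fragmentation: one disjoint fragment per project number."""
    fragments: Dict[str, List[Row]] = {}
    for row in rows:
        fragments.setdefault(row[1], []).append(row)
    return fragments


def allocate(fragments: Dict[str, List[Row]],
             placement: Dict[str, List[str]]) -> Dict[str, List[Row]]:
    """Store each fragment at every site listed for it (more than one site = replication)."""
    sites: Dict[str, List[Row]] = {}
    for name, rows in fragments.items():
        for site in placement[name]:
            sites.setdefault(site, []).extend(rows)
    return sites


if __name__ == "__main__":
    frags = fragment(ASG)
    # Hypothetical placement: the P1 fragment is replicated at two sites.
    placement = {"P1": ["site1", "site2"], "P2": ["site2"], "P3": ["site3"]}
    stored = allocate(frags, placement)
    # Fragmentation loses nothing: the union of the fragments is the relation.
    assert sorted(r for rows in frags.values() for r in rows) == sorted(ASG)
    print({site: len(rows) for site, rows in stored.items()})
    # {'site1': 2, 'site2': 3, 'site3': 1}
```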
Distributed DBS Promises (Cont.…)
Reliability through Distributed Transaction
• A transaction is a basic unit of consistent and reliable computing,
consisting of a sequence of database operations executed as an
atomic action.
• It transforms a consistent database state to another consistent
database state even when a number of such transactions are
executed concurrently (sometimes called concurrency transparency),
and even when failures occur (also called failure atomicity).
•A DBMS that provides full transaction support guarantees that
concurrent execution of user transactions will not violate database
consistency in the face of system failures as long as each transaction
is correct, i.e., obeys the integrity rules specified on the database.
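Atomicity and consistency can be seen on a single site with Python's standard `sqlite3` module. In this sketch, which assumes a hypothetical ACCOUNT table whose CHECK constraint plays the role of an integrity rule, a transfer is one transaction: if the debit would violate the rule, the whole transaction, including the credit already applied, is rolled back.

```python
import sqlite3
from typing import Dict


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Integrity rule specified on the database: balances may never go negative.
    conn.execute("CREATE TABLE ACCOUNT (ID TEXT PRIMARY KEY, "
                 "BAL INTEGER NOT NULL CHECK (BAL >= 0))")
    with conn:
        conn.executemany("INSERT INTO ACCOUNT VALUES (?, ?)",
                         [("A", 100), ("B", 50)])
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE ACCOUNT SET BAL = BAL + ? WHERE ID = ?",
                         (amount, dst))
            conn.execute("UPDATE ACCOUNT SET BAL = BAL - ? WHERE ID = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK rule; the credit above is undone as well.
        return False


def balances(conn: sqlite3.Connection) -> Dict[str, int]:
    return dict(conn.execute("SELECT ID, BAL FROM ACCOUNT ORDER BY ID"))


if __name__ == "__main__":
    db = make_db()
    print(transfer(db, "A", "B", 30), balances(db))   # True {'A': 70, 'B': 80}
    print(transfer(db, "B", "A", 500), balances(db))  # False {'A': 70, 'B': 80}
```

The second transfer leaves the database in the same consistent state it started from, which is the failure atomicity described above.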
Distributed DBS Promises (Cont.…)
Reliability through Distributed Transaction
•Distributed transactions execute at a number of sites at which
they access the local database.
•With full support for distributed transactions, user
applications can access a single logical image of the database
and rely on the distributed DBMS to ensure that their requests
will be executed correctly no matter what happens in the
system.
➡ User applications do not need to be concerned with coordinating
their accesses to individual local databases nor do they need to
worry about the possibility of site or communication link
failures during the execution of their transactions.
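The slides do not say how a distributed DBMS makes a transaction that spans several sites all-or-nothing. One widely used protocol for this is two-phase commit (2PC), sketched below in simplified form. Assumptions: the `Participant` class is a hypothetical stand-in for a local DBMS, everything runs in one process, and there is no logging, timeout, or failure recovery, all of which a real implementation needs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Participant:
    """Hypothetical stand-in for the local DBMS at one site."""
    name: str
    can_commit: bool = True  # whether this site's local work succeeded
    state: str = "active"
    data: Dict[str, int] = field(default_factory=dict)     # committed data
    pending: Dict[str, int] = field(default_factory=dict)  # this transaction's writes

    def prepare(self) -> bool:
        # Phase 1 vote. A real site would force a prepare record to its log first.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.data.update(self.pending)
        self.pending.clear()
        self.state = "committed"

    def abort(self) -> None:
        self.pending.clear()
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> str:
    """Coordinator: commit everywhere only if every site votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    decision = "commit" if all(votes) else "abort"
    for p in participants:                        # phase 2: send the decision
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision


if __name__ == "__main__":
    ok = [Participant("site1", pending={"x": 1}), Participant("site2", pending={"y": 2})]
    print(two_phase_commit(ok), [p.data for p in ok])    # commit [{'x': 1}, {'y': 2}]
    bad = [Participant("site1", pending={"x": 1}),
           Participant("site2", can_commit=False, pending={"y": 2})]
    print(two_phase_commit(bad), [p.data for p in bad])  # abort [{}, {}]
```

The application sees a single outcome for the whole distributed transaction and never coordinates the local databases itself.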
Distributed DBS Promises (Cont.…)
Improved Performance
• Performance improves for two reasons:
➡ a distributed DBMS fragments the conceptual database, enabling data
to be stored in close proximity to its points of use (also called data
localization).
➡ the inherent parallelism of distributed systems may be exploited for
✦ inter-query parallelism: results from the ability to execute multiple
queries at the same time
✦ intra-query parallelism: breaks up a single query into a number of
subqueries each of which is executed at a different site, accessing a
different part of the distributed database.
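A sketch of intra-query parallelism, assuming a relation horizontally fragmented across three hypothetical sites and a simple selection DUR > 12. Each site evaluates the selection on its own fragment concurrently, and the partial results are combined by union. Python threads from the standard `concurrent.futures` module only stand in for sites here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # (ENO, PNO, DUR): an assumed, simplified layout

# Hypothetical fragments of one relation, each stored at a different site.
SITES: Dict[str, List[Row]] = {
    "site1": [("E1", "P1", 12), ("E2", "P1", 24)],
    "site2": [("E3", "P2", 10), ("E4", "P2", 36)],
    "site3": [("E5", "P3", 48)],
}


def subquery(site: str, min_dur: int) -> List[Row]:
    """The selection DUR > min_dur, evaluated on one site's fragment only."""
    return [row for row in SITES[site] if row[2] > min_dur]


def parallel_select(min_dur: int) -> List[Row]:
    """Intra-query parallelism: one subquery per site, run concurrently, then union."""
    # Threads stand in for sites; real sites would be separate machines.
    with ThreadPoolExecutor(max_workers=len(SITES)) as pool:
        partials = list(pool.map(lambda site: subquery(site, min_dur), SITES))
    return sorted(row for part in partials for row in part)


if __name__ == "__main__":
    print(parallel_select(12))  # [('E2', 'P1', 24), ('E4', 'P2', 36), ('E5', 'P3', 48)]
```

Inter-query parallelism would instead run several independent queries at once, for example by submitting different `parallel_select` calls to the same pool.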
Distributed DBS Promises (Cont.…)
Easier System Expansion
•In a DDBS, it is much easier to accommodate increasing
database sizes.
•Expansion can be handled by adding processing and
storage power to the network.
➡One aspect of easier system expansion is economics.
✦ It costs much less to put together a system of
“smaller” computers with the equivalent power of a
single big machine.
Distributed DBS Issues
• The issues that arise in building a distributed DBMS are:
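➡ Distributed Database Design
➡ Distributed Directory Management
➡ Distributed Query Processing
➡ Distributed Concurrency Control
➡ Distributed Deadlock Management
➡ Reliability of Distributed DBMS
➡ Replication
➡ Additional Issues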
Distributed DBS Issues (Cont.…)
Issues in Distributed Database Design: How should the database and the
applications that run against it be placed across the sites?
• Two basic alternatives to placing data:
➡ partitioned (or non-replicated) scheme: database is divided
into a number of disjoint partitions each of which is placed at a
different site.
➡ Replicated Scheme: Replicated designs can be either:
✦ fully replicated (also called fully duplicated) where the
entire database is stored at each site, or
✦ partially replicated (or partially duplicated) where each
partition of the database is stored at more than one site, but
not at all the sites.
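The placement alternatives can be stated precisely as a check on a fragment-to-sites map. This sketch assumes placements are given as Python sets of hypothetical site names; `mixed` is a label introduced here for placements that fit none of the three definitions above (for example, some partitions replicated and others not).

```python
from typing import Dict, Set


def classify(placement: Dict[str, Set[str]], sites: Set[str]) -> str:
    """Classify a fragment -> {sites} placement using the definitions above."""
    copies = [len(where) for where in placement.values()]
    if all(n == 1 for n in copies):
        return "partitioned"            # disjoint partitions, one site each
    if all(where == sites for where in placement.values()):
        return "fully replicated"       # the entire database at every site
    if all(1 < n < len(sites) for n in copies):
        return "partially replicated"   # each partition at several, not all, sites
    return "mixed"                      # hypothetical label for other combinations


if __name__ == "__main__":
    S = {"s1", "s2", "s3"}
    print(classify({"F1": {"s1"}, "F2": {"s2"}}, S))              # partitioned
    print(classify({"F1": set(S), "F2": set(S)}, S))              # fully replicated
    print(classify({"F1": {"s1", "s2"}, "F2": {"s2", "s3"}}, S))  # partially replicated
```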
Distributed DBS Issues (Cont.…)
Issues in Distributed Database Design:
•The two fundamental design issues are:
➡ Fragmentation: the separation of the database into partitions
called fragments, and
➡ Distribution: the optimum distribution of fragments.
•Research on distributed database design involves mathematical programming to
minimize the combined cost of:
➡ storing the database,
➡ processing transactions against it, and
➡ message communication among sites.
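A toy version of this optimization, assuming a made-up cost model: each fragment is stored at exactly one site (no replication), storage is charged per unit at that site, each access is charged a processing cost at the storing site, and remote accesses pay an extra communication cost. Exhaustive search is used only because the instance is tiny; realistic formulations are typically integer programs with capacity constraints and replication, which need dedicated solution techniques.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# All numbers below are made-up inputs for illustration only.
SITES = ("s1", "s2")
FRAGMENTS = ("F1", "F2")
SIZE = {"F1": 10, "F2": 4}             # storage units per fragment
STORE_COST = {"s1": 1.0, "s2": 2.0}    # storage cost per unit at each site
PROC_COST = {"s1": 1.0, "s2": 1.0}     # cost to process one access at a site
COMM_COST = 5.0                        # extra message cost for a remote access
FREQ = {("s1", "F1"): 2, ("s2", "F1"): 8,   # accesses issued by a site to a fragment
        ("s1", "F2"): 6, ("s2", "F2"): 1}


def total_cost(alloc: Dict[str, str]) -> float:
    """Storage + processing + communication cost of a non-replicated allocation."""
    cost = 0.0
    for frag, home in alloc.items():
        cost += STORE_COST[home] * SIZE[frag]   # storing the fragment
        for site in SITES:
            n = FREQ[(site, frag)]
            cost += n * PROC_COST[home]         # processing its accesses
            if site != home:
                cost += n * COMM_COST           # messages for remote accesses
    return cost


def best_allocation() -> Tuple[Dict[str, str], float]:
    """Exhaustive search over every fragment-to-site assignment (tiny inputs only)."""
    best: Optional[Tuple[Dict[str, str], float]] = None
    for homes in product(SITES, repeat=len(FRAGMENTS)):
        alloc = dict(zip(FRAGMENTS, homes))
        cost = total_cost(alloc)
        if best is None or cost < best[1]:
            best = (alloc, cost)
    assert best is not None
    return best


if __name__ == "__main__":
    print(best_allocation())  # ({'F1': 's2', 'F2': 's1'}, 56.0)
```

Here F1 ends up at the more expensive storage site s2 because s2 accesses it most, which is the kind of trade-off between storage and communication cost the formulation captures.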
Distributed DBS Issues (Cont.…)
Issues in Distributed Directory Management: How should the directory
be placed across the sites?
➡ A directory contains information (such as descriptions and
locations) about data items in the database.
•A directory may be
➡ global to the entire DDBS or
➡ local to each site
•A directory can be
➡ centralized at one site or
➡ distributed over several sites
Distributed DBS Issues (Cont.…)
Issues in Distributed Query Processing: How to decide on a strategy for
executing each query over the network in the most cost-effective way?
➡ Query processing deals with designing algorithms that analyze queries
and convert them into a series of data manipulation operations.
• The factors to be considered are:
➡ the distribution of data,
➡ communication costs, and
➡ lack of sufficient locally-available information
• The objective is to optimize the performance of executing the transaction,
subject to the above-mentioned constraints.
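To illustrate how communication cost can drive the choice of strategy, this sketch compares three ways of joining a relation R at `site1` with S at `site2` when the answer is needed at `site3`. The sizes, locations, and the assumption that only shipped tuples count as cost are all hypothetical; local processing cost and the lack of precise statistics, which the slide also lists, are ignored.

```python
from typing import Dict, Tuple

# Hypothetical sizes (in tuples) and locations; only communication is costed.
SIZE = {"R": 10_000, "S": 500}
LOCATION = {"R": "site1", "S": "site2"}
RESULT_SIZE = 800          # assumed size of the join result
QUERY_SITE = "site3"       # where the query was posed and the answer is needed


def ship_result(join_site: str) -> int:
    """Cost of moving the join result to the query site (0 if already there)."""
    return 0 if join_site == QUERY_SITE else RESULT_SIZE


def strategy_costs() -> Dict[str, int]:
    """Tuples shipped over the network by three alternative join strategies."""
    return {
        # Move R to S's site, join there, send the result to the query site.
        "ship R to S": SIZE["R"] + ship_result(LOCATION["S"]),
        # Move S to R's site, join there, send the result to the query site.
        "ship S to R": SIZE["S"] + ship_result(LOCATION["R"]),
        # Move both operands to the query site and join there.
        "ship both": sum(SIZE[r] for r in SIZE if LOCATION[r] != QUERY_SITE),
    }


def cheapest() -> Tuple[str, int]:
    costs = strategy_costs()
    name = min(costs, key=lambda k: costs[k])
    return name, costs[name]


if __name__ == "__main__":
    print(strategy_costs())
    print(cheapest())  # ('ship S to R', 1300)
```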
Distributed DBS Issues (Cont.…)
Issues in Distributed Concurrency Control:
Issue I: How to synchronize concurrent accesses to the
distributed database, such that the integrity of the database is
maintained?
Issue II: How to maintain the consistency of multiple copies of
the database?
•Two fundamental primitives are:
➡ locking, which is based on the mutual exclusion of accesses
to data items, and
➡ timestamping, where the transaction executions are
ordered based on timestamps.
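Both primitives can be sketched on a single site. The timestamp part follows basic timestamp ordering: each data item remembers the largest read and write timestamps, and an operation that arrives too late is rejected so its transaction can restart. The lock part is a shared/exclusive lock table. Class names are hypothetical, and the distributed aspect (coordinating these structures across sites) is left out.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ItemStamps:
    read_ts: int = 0   # largest timestamp of a transaction that read the item
    write_ts: int = 0  # largest timestamp of a transaction that wrote the item


class TimestampOrdering:
    """Basic timestamp ordering on one site (hypothetical class)."""

    def __init__(self) -> None:
        self.items: Dict[str, ItemStamps] = {}

    def read(self, ts: int, x: str) -> bool:
        s = self.items.setdefault(x, ItemStamps())
        if ts < s.write_ts:
            return False  # a younger transaction already wrote x: reject, restart
        s.read_ts = max(s.read_ts, ts)
        return True

    def write(self, ts: int, x: str) -> bool:
        s = self.items.setdefault(x, ItemStamps())
        if ts < s.read_ts or ts < s.write_ts:
            return False  # a younger transaction already read or wrote x
        s.write_ts = ts
        return True


class LockTable:
    """Shared (S) / exclusive (X) locking on one site (hypothetical class)."""

    def __init__(self) -> None:
        self.holders: Dict[str, Dict[str, str]] = {}  # item -> {txn: mode}

    def acquire(self, txn: str, x: str, mode: str) -> bool:
        held = self.holders.setdefault(x, {})
        others = [m for t, m in held.items() if t != txn]
        if mode == "X" and others:
            return False            # X conflicts with any other holder
        if mode == "S" and "X" in others:
            return False            # S conflicts only with another holder's X
        if held.get(txn) != "X":    # never downgrade an X lock already held
            held[txn] = mode
        return True

    def release(self, txn: str, x: str) -> None:
        self.holders.get(x, {}).pop(txn, None)
```

Locking makes a conflicting transaction wait, while timestamp ordering never waits but may reject and restart it; the waiting in the locking approach is what gives rise to the deadlocks discussed next.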
Distributed DBS Issues (Cont.…)
Issues in Distributed Deadlock Management:
➡How to manage the deadlocks that arise due to the
competition among users for access to the data, if the
synchronization mechanism is based on locking?
•The well-known alternatives that apply to DDBS are:
➡prevention,
➡avoidance, and
➡detection/recovery
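For the detection approach, a common technique is to look for a cycle in a wait-for graph, where an edge Ti → Tj means Ti waits for a lock held by Tj. The sketch below assumes each site reports its local wait-for graph to one detector; it shows that local graphs can each be acyclic while their union contains a global deadlock. Recovery (choosing a victim to abort) is not shown, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

Graph = Dict[str, Set[str]]  # transaction -> transactions it waits for


def merge(*graphs: Graph) -> Graph:
    """Union of the local wait-for graphs reported by the sites."""
    merged: Graph = {}
    for g in graphs:
        for t, waits in g.items():
            merged.setdefault(t, set()).update(waits)
    return merged


def find_cycle(wait_for: Graph) -> Optional[List[str]]:
    """Depth-first search; returns one cycle (a deadlock) or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(t: str) -> Optional[List[str]]:
        color[t] = GREY
        stack.append(t)
        for u in wait_for.get(t, set()):
            c = color.get(u, WHITE)
            if c == GREY:
                return stack[stack.index(u):]  # back edge closes a cycle
            if c == WHITE:
                found = visit(u)
                if found:
                    return found
        stack.pop()
        color[t] = BLACK
        return None

    for t in list(wait_for):
        if color.get(t, WHITE) == WHITE:
            found = visit(t)
            if found:
                return found
    return None


if __name__ == "__main__":
    site1 = {"T1": {"T2"}}                   # at site 1, T1 waits for T2
    site2 = {"T2": {"T3"}, "T3": {"T1"}}     # at site 2, T2 waits for T3, T3 for T1
    print(find_cycle(site1), find_cycle(site2))  # None None
    print(find_cycle(merge(site1, site2)))       # ['T1', 'T2', 'T3']
```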
Distributed DBS Issues (Cont.…)
Issues in Reliability in DDBMS:
•How to keep the databases at the operational sites consistent and up to date when a failure
occurs and various sites become either inoperable or inaccessible?
•How to recover and bring the databases at the failed sites up to date when the computer
system or network recovers from the failure?
Issues in Replication:
•How to ensure the consistency of the replicas (i.e., copies of the
same data item have the same value), if the distributed database is
(partially or fully) replicated?
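One classic answer for the replicated case is the read-one/write-all (ROWA) rule: every write updates all copies, so any single copy can serve a read. The sketch below is a simplified, single-process illustration with hypothetical `Replica` objects; it shows the price of this rule, namely that writes are refused when any copy is unavailable. A real protocol would also make the multi-copy write atomic.

```python
from typing import Dict, List


class Replica:
    """Hypothetical copy of the data held at one site."""

    def __init__(self, site: str) -> None:
        self.site = site
        self.up = True
        self.store: Dict[str, int] = {}


def write_all(replicas: List[Replica], key: str, value: int) -> bool:
    """ROWA write: refuse unless every copy can be updated."""
    if not all(r.up for r in replicas):
        return False  # updating only some copies would make them diverge
    for r in replicas:
        r.store[key] = value
    return True


def read_one(replicas: List[Replica], key: str) -> int:
    """Any available copy will do, because all copies are kept identical."""
    for r in replicas:
        if r.up:
            return r.store[key]
    raise RuntimeError("no replica available")


def consistent(replicas: List[Replica], key: str) -> bool:
    return len({r.store.get(key) for r in replicas}) == 1
```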
Distributed DBS Issues (Cont.…)
Additional Issues in DDBS:
•Database design issues in multidatabase systems (i.e.,
database integration)
•Issues relating to peer-to-peer data management
•Issues that arise in web data management
•Data management issues in a parallel system
Review of Computer Networks
Computer Networks:
• An interconnected collection of autonomous
computers that are capable of exchanging
information among themselves.
• Components
➡ Hosts (nodes, end systems)
➡ Switches
➡ Communication link
Internet:
• Network of networks
Review of Computer Networks (Cont.…)
Types of Networks:
• According to scale (geographic distribution)
➡ Wide area network (WAN)
✦ Distance between any two nodes > 20 km and can go as high as thousands
of km
✦ Long delays due to distance traveled
✦ Heterogeneity of transmission media
✦ Speeds of 150 Mbps to 10 Gbps (OC192 on the backbone)
➡ Local area network (LAN)
✦ Limited in geographic scope (usually < 2 km)
✦ Speeds 10-1000 Mbps
✦ Short delays and low noise
➡ Metropolitan area network (MAN)
✦ In between LAN and WAN
Review of Computer Networks (Cont.…)
Types of Networks (cont’d):
• Topology
➡ Irregular
✦ No regularity in the interconnection – e.g., Internet
➡ Bus
✦ Typical in LANs – Ethernet
✦ Using Carrier Sense Multiple Access with Collision Detection (CSMA/CD)
✓ Listen before and while you transmit
➡ Star
➡ Ring
➡ Mesh
Review of Computer Networks (Cont.…)
Bus Network:
Review of Computer Networks (Cont.…)
Communication Schemes:
• Point-to-point (unicast)
➡ One or more (direct or indirect) links between each pair of nodes
➡ Communication always between two nodes
➡ Receiver and sender are identified by their addresses included in the message header
➡ Message may follow one of many links between the sender and receiver using
switching or routing
• Broadcast (multi-point)
➡ Messages are transmitted over a shared channel and received by all the nodes
➡ Each node checks the address and, if it is not the intended recipient, ignores the message
➡ Multicast: special case
✦ Message is sent to a subset of the nodes
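A sketch of a shared broadcast channel in which every node sees every message and filters by address. It assumes a hypothetical addressing scheme where `*` means broadcast and a multicast group is just a name that nodes join; routing on point-to-point networks is not modeled.

```python
from typing import Iterable, List

BROADCAST = "*"  # hypothetical broadcast address


class Node:
    def __init__(self, addr: str, groups: Iterable[str] = ()) -> None:
        self.addr = addr
        self.groups = set(groups)  # multicast groups this node has joined
        self.inbox: List[str] = []

    def on_frame(self, dst: str, payload: str) -> None:
        # Every node receives every frame on the shared channel and checks the address.
        if dst == self.addr or dst == BROADCAST or dst in self.groups:
            self.inbox.append(payload)
        # Otherwise the node is not the intended recipient and ignores the frame.


def transmit(channel: List[Node], dst: str, payload: str) -> None:
    """Place one frame on the shared channel; all attached nodes see it."""
    for node in channel:
        node.on_frame(dst, payload)


if __name__ == "__main__":
    a, b, c = Node("A"), Node("B", ["g1"]), Node("C", ["g1"])
    net = [a, b, c]
    transmit(net, "B", "to B only")   # addressed to a single node
    transmit(net, BROADCAST, "to all")
    transmit(net, "g1", "to group g1")  # multicast to a subset
    print(a.inbox, b.inbox, c.inbox)
```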
Review of Computer Networks (Cont.…)
Communication Alternatives:
➡ Twisted pair
➡ Coaxial
➡ Satellite
➡ Microwave
➡ Wireless
Review of Computer Networks (Cont.…)
Data Communication:
• Hosts are connected by links, each of which can carry one or more channels
• Link: physical entity; channel: logical entity
• Digital signal versus analog signal
• Capacity – bandwidth
➡ The amount of information that can be transmitted over the channel in a given time unit
• Alternative messaging schemes
➡ Packet switching
✦ Messages are divided into fixed-size packets, each of which is routed from the source
to the destination
➡ Circuit switching
✦ A dedicated channel is established between the sender and receiver for the duration of
the session
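A sketch of the packet-switching side: a message is split into fixed-size packets carrying sequence numbers, and the receiver reassembles them even if they arrive out of order. The packet layout (sequence number, packet count, payload) is a simplification assumed here, not a real protocol header, and the routing of individual packets is not modeled.

```python
from typing import List, Tuple

Packet = Tuple[int, int, bytes]  # (sequence number, total packets, payload)


def packetize(message: bytes, size: int) -> List[Packet]:
    """Split a message into fixed-size packets (only the last may be shorter)."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)] or [b""]
    return [(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets: List[Packet]) -> bytes:
    """Rebuild the message even if the packets arrived out of order."""
    ordered = sorted(packets, key=lambda p: p[0])
    total = ordered[0][1]
    if [p[0] for p in ordered] != list(range(total)):
        raise ValueError("missing or duplicate packets")
    return b"".join(p[2] for p in ordered)


if __name__ == "__main__":
    msg = b"distributed database systems"
    pkts = packetize(msg, 8)
    print(len(pkts), reassemble(list(reversed(pkts))) == msg)  # 4 True
```

In circuit switching, by contrast, no sequence numbers are needed for reordering, because all data follow the same dedicated channel for the whole session.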
Review of Computer Networks (Cont.…)
Packet Formats:
Review of Computer Networks (Cont.…)
Communication Protocols:
• Software that ensures error-free, reliable and efficient communication
between hosts
• Layered architecture – hence protocol stack or protocol suite
• TCP/IP is the best-known one
➡ Used in the Internet
Review of Computer Networks (Cont.…)
Message Transmission using TCP/IP:
Review of Computer Networks (Cont.…)
TCP/IP Protocol: