0% found this document useful (0 votes)
82 views41 pages

Coronel DatabaseSystems 13e Ch12

Uploaded by

Arif.hossen 30
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
82 views41 pages

Coronel DatabaseSystems 13e Ch12

Uploaded by

Arif.hossen 30
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 41

Chapter 12

Distributed Database Management


Systems
Learning Objectives

• After completing this chapter, you will be able to:


• Explain the purpose and function of distributed database management systems
(DDBMSs)
• Summarize the advantages and disadvantages of DDBMSs
• Describe the characteristics and components of DDBMSs
• Explain how database implementation is affected by different levels of data and
process distribution
• Understand how transactions are managed in a distributed database environment
• Describe how distributed database design balances performance, scalability, and
availability
• Explain the trade-offs of implementing a distributed data system

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 2
for classroom use.
The Evolution of Distributed Database Management Systems (1 of 5)

• A distributed database management system (DDBMS)


• Governs storage and processing of logically related data over interconnected
computer systems
• Distributes data and processing functions among several sites
• Centralized database management system
• Required corporate data be stored in a single central site
• Provided data access through dumb terminals
• Filled structured information needs of corporations; fell short when quickly moving
events required faster response times and equally quick access to information

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 3
for classroom use.
The Evolution of Distributed Database Management Systems (2 of 5)

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 4
for classroom use.
The Evolution of Distributed Database Management Systems (3 of 5)

• Changes that affected the nature of systems


• Globalization of business operation
• Increased market needs for an on-demand transaction style, based on web-based
services
• Rapid social and technological changes fueled by low-cost smart mobile devices
• Converging data realms in the digital world
• Advent of social media to reach new customers and markets

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 5
for classroom use.
The Evolution of Distributed Database Management Systems (4 of 5)

• Database requirements in a dynamic business environment


• Rapid ad hoc data access
- Crucial in the quick-response decision-making environment
• Distributed data access
- Needed to support geographically dispersed business units

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 6
for classroom use.
The Evolution of Distributed Database Management Systems (5 of 5)

• Factors that influenced DDBMS


• Acceptance of Internet as a platform for business
• Mobile wireless revolution
• Growth of use of “application as a service”
• Focus on mobile business intelligence
• Emphasis on Big Data analytics
• Potential centralized DBMS problems
• Performance degradation
• High costs
• Reliability problems
• Scalability problems
• Organizational rigidity

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 7
for classroom use.
DDBMS Advantages and Disadvantages
Table 12.1: Distributed DBMS Advantages and Disadvantages
Advantages Disadvantages
Data is located near the site of greatest demand. The data in a distributed Complexity of management and control. Applications must recognize data location
database system is dispersed to match business requirements. and be able to stitch together data from various sites. Database administrators
must have the ability to coordinate database activities to prevent database
degradation due to data anomalies.
Faster data access. End users often work with only the nearest stored subset of the Technological difficulty. Data integrity, transaction management, concurrency
data. control, security, backup, recovery, and query optimization must all be addressed
and resolved.
Faster data processing. A distributed database system spreads out the system’s Security. Probability of security lapses increases when data is located at multiple
workload by processing data at several sites. sites. Responsibility of data management will be shared by different people at
several sites.
Growth facilitation. New sites can be added to the network without affecting the Lack of standards. No standard communication protocols at the database level.
operations of other sites. Different database vendors employ different and often incompatible techniques to
manage the distribution of data and processing in a DDBMS environment.
Improved communications. Because local sites are smaller and located closer to Increased storage and infrastructure requirements. Multiple copies of data are
customers, local sites foster better communication among departments and required at different sites, thus requiring additional storage space.
between customers and company staff.
Reduced operating costs. It’s more cost-effective to add nodes to a network than Increased training cost. Training costs are generally higher in a distributed model
to update a mainframe system. Development work is done more cheaply and than they would be in a centralized model, sometimes even to the extent of
quickly on low-cost PCs and laptops than on mainframes. offsetting operational and hardware savings.
User-friendly interface. Client devices are usually equipped with an easy-to-use Higher costs. Distributed databases require duplicated infrastructure
graphical user interface (GUI). The G UI simplifies training and use for end users. to operate, such as physical location, environment, personnel, software, and
licensing.
Less danger of a single-point failure. When one of the computers fails, the
workload is picked up by other workstations. Data is also distributed at multiple
sites.
Processor independence. The end user can access any available copy of the data,
and an end user’s request is processed by any processor at the data location.

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 8
for classroom use.
Distributed Processing and Distributed Databases (1 of 3)

• Distributed processing: database’s logical processing is shared among two or


more physically independent sites via network
• Distributed database: stores logically related database over two or more physically
independent sites via a computer network
• Database fragments: database composed of many parts in distributed database
system

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 9
for classroom use.
Distributed Processing and Distributed Databases (2 of 3)

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 10
for classroom use.
Distributed Processing and Distributed Databases (3 of 3)

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 11
for classroom use.
Characteristics of Distributed Management Systems (1 of 3)

• A DBMS must have several functions to be classified as distributed


• Application interface
• Validation
• Transformation
• Query optimization
• Mapping
• I/O interface
• Formatting
• Security
• Backup and recovery
• DB administration
• Concurrency control
• Transaction management

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 12
for classroom use.
Characteristics of Distributed Management Systems (2 of 3)

• Functions of fully distributed DBMS


• Receive the request of an application or end user
• Validate, analyze, and decompose the request
• Map request’s logical-to-physical data components
• Decompose request into several I/O operations
• Search, locate, read and validate data
• Ensure database consistency, security, and integrity
• Validate data for conditions specified by request
• Present data in required format
• Handle all necessary functions transparently to user

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 13
for classroom use.
Characteristics of Distributed Management Systems (3 of 3)

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 14
for classroom use.
DDBMS Components

• The DDBMS must include at least the following components:


• Computer workstations or remote devices
• Network hardware and software components
• Communications media
• Transaction processor (TP)
• Data processor (DP) or data manager (DM)

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 15
for classroom use.
Levels of Data and Process Distribution

Table 12.2
Database Systems: Levels of Data
and Process Distribution
Advantages Single-Site Data Multiple-Site Data

Single-site process Host DBMS Not applicable


(Requires multiple processes)
Multiple-site process File server Fully distributed
Client/server DBMS (LAN DBMS) Client/server DDBMS

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 16
for classroom use.
Single-Site Processing, Single-Site Data (SPSD) (1 of 2)

• Characteristics
• Processing is done on a single host computer
• Data stored on host computer’s local disk
• Processing cannot be done on end user’s side
• DBMS is accessed by connected terminals

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 17
for classroom use.
Single-Site Processing, Single-Site Data (SPSD) (2 of 2)

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 18
for classroom use.
Multiple-Site Processing, Single-Site Data (MPSD) (1 of 2)

• Multiple processes run on different computers sharing a single data repository


• Typically requires network file server running conventional applications
- Accessed through LAN

• Client/server architecture
• Reduces network traffic
• Distributes processing
• Supports data at multiple sites

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 19
for classroom use.
Multiple-Site Processing, Single-Site Data (MPSD) (2 of 2)

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 20
for classroom use.
Multiple-Site Processing, Multiple-Site Data (MPMD) (1 of 2)

• Fully distributed database management system


• Support multiple data processors and transaction processors at multiple sites
• Classifications
• Homogeneous: integrate multiple instances of same DBMS over a network
• Heterogeneous: integrate different types of DBMSs over a network
• Fully heterogeneous: support different DBMSs, each supporting different data model
running under different computer systems

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 21
for classroom use.
Multiple-Site Processing, Multiple-Site Data (MPMD) (2 of 2)

• Restrictions of DDBMS
• Remote access is provided on a read-only basis
• Restrictions on the number of remote tables that may be accessed in a single
transaction
• Restrictions on the number of distinct databases that may be accessed
• Restrictions on the database model that may be accessed

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 22
for classroom use.
Distributed Database Transparency Features

• Minimum desirable DDBMS transparency features


• Distribution transparency
• Transaction transparency
• Failure transparency
• Performance transparency
• Heterogeneity transparency

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 23
for classroom use.
Distribution Transparency

• Allows management of physically dispersed database as if centralized


• Levels: fragmentation, location, and local mapping
- Unique fragment: each row is unique, regardless of the fragment in which it is located
• Supported by distributed data dictionary (DDD) or distributed data catalog (DDC)
- DDC contains the description of the entire database as seen by the database administrator
• Distributed global schema: common database schema to translate user requests into
subqueries

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 24
for classroom use.
Transaction Transparency

• Transaction transparency: DDBMS property that ensures database transactions


will maintain the distributed database’s integrity and consistency
• Ensures transaction completed only when all database sites involved complete their
part
• Distributed database systems require complex mechanisms to manage
transactions
• Ensure the database’s consistency and integrity

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 25
for classroom use.
Distributed Requests and Distributed Transactions

• Remote request
• Single SQL statement accesses data processed by a single remote database processor
• Remote transaction
• Accesses data at single remote site composed of several requests
• Distributed transaction
• Requests data from several different remote sites on network
• Distributed request
• Single SQL statement references data at several DP sites

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 26
for classroom use.
Distributed Concurrency Control (1 of 2)

• Concurrency control is especially important in distributed databases


environment
• Multi-site, multiple-process operations are more likely to create inconsistencies and
deadlocked transactions
• Solution to inconsistent database is a two-phase commit protocol

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 27
for classroom use.
Distributed Concurrency Control (2 of 2)

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 28
for classroom use.
Two-Phase Commit Protocol (2PC)

• Guarantees if a portion of a transaction operation cannot be committed, all


changes made at the other sites will be undone
• Maintains a consistent database state
• Requires each DP’s transaction log entry be written before database fragment is
updated
• DO-UNDO-REDO protocol: roll transactions back and forward with the help of the
system’s transaction log entries
• Write-ahead protocol: forces the log entry to be written to permanent storage
before actual operation takes place
• Two-phase commit protocol defines operations between coordinator and
subordinates
• Phases of implementation
- Preparation
- Final COMMIT

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 29
for classroom use.
Performance and Failure Transparency (1 of 2)

• Performance transparency allows a DDBMS to perform as if it were a


centralized database
• Failure transparency ensures the system will operate in case of network failure
• Objective of query optimization is to minimize total costs
• Access time (I/O) cost involved in accessing data from multiple remote sites
• Communication costs associated with data transmission
• CPU time cost associated with the processing overhead

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 30
for classroom use.
Performance and Failure Transparency (2 of 2)

• Considerations for resolving data requests in a distributed data environment


• Data distribution and data replication
- Replica transparency: DDBMS’s ability to hide multiple copies of data from the user
• Network and node availability
- Network latency: delay imposed by the amount of time required for a data packet to make a
round trip
- Network partitioning: delay imposed when nodes become suddenly unavailable due to a
network failure

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 31
for classroom use.
Distributed Database Design

• Data fragmentation
• How to partition database into fragments
• Data replication
• Which fragments to replicate
• Data allocation
• Where to locate those fragments and replicas

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 32
for classroom use.
Data Fragmentation

• Breaks a single object into two or more segments


• Information is stored in distributed data catalog (DDC)
• Strategies
• Horizontal fragmentation: division of a relation into subsets (fragments) of tuples
(rows)
• Vertical fragmentation: division of a relation into attribute (column) subsets
• Mixed fragmentation: combination of horizontal and vertical strategies

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 33
for classroom use.
Data Replication (1 of 2)

• Storage of data copies at multiple sites served by a computer network


• Mutual consistency rule requires all copies of data fragments be identical
• Styles of replication
• Push replication focuses on maintaining data consistency
• Pull replication focuses on maintaining data availability and allows for temporary data
inconsistencies

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 34
for classroom use.
Data Replication (2 of 2)

• Data replication scenarios


• Fully replicated database: stores multiple copies of each database fragment at
multiple sites
• Partially replicated database: stores multiple copies of some database fragments at
multiple sites
• Unreplicated database: stores each database fragment at a single site

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 35
for classroom use.
Data Allocation

• Data allocation strategies


• Centralized data allocation: entire database stored at one site
• Partitioned data allocation: database is divided into two or more disjoined fragments
and stored at two or more sites
• Replicated data allocation: copies of one or more database fragments are stored at
several sites

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 36
for classroom use.
The CAP Theorem (1 of 2)

• CAP stands for:


• Consistency
• Availability
• Partition tolerance
• Trade-off between consistency and availability generated in a new system in
which data is basically available, soft state, eventually consistent (BASE)
• Data changes are not immediate but propagate slowly through the system until all
replicas are consistent

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 37
for classroom use.
The CAP Theorem (2 of 2)

Table 12.8
Distributed
Database
Spectrum
DBMS Type Consistency Availability Partition Transaction Trade-Off
Tolerance Model
Centralized DBMS High High N/A ACID No distributed data processing

Relational DBMS High Relaxed High ACID (2PC) Sacrifices availability to ensure
consistency and isolation
NoSQL DDBMS Relaxed High High BASE Sacrifices consistency to ensure
availability
NewSQL DDBMS High High Relaxed ACID Sacrifices partition tolerance
to ensure transaction consistency
and availability

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 38
for classroom use.
C. J. Date’s 12 Commandments for Distributed Databases

Table 12.9 C. J. Date ’s 12 Commandments for


Distributed Databases

Rule Rule Name Rule Explanation


Number

1 Local-site independence Each local site can act as an independent, autonomous, centralized DBMS.
Each site is responsible for security, concurrency control, backup, and recovery.

2 Central-site independence Each local site can act as an independent, autonomous, centralized DBMS.
Each site is responsible for security, concurrency control, backup, and recovery.

3 Failure independence The system is not affected by node failures. The system is in continuous operation even in the case of a
node failure or an expansion of the network.

4 Failure Independence The user does not need to know the location of data to retrieve that data.

5 Fragmentation transparency Data fragmentation is transparent to the user, who sees only one logical database. The user does not
need to know the name of the database fragments to retrieve them.

6 Replication transparency The user sees only one logical database. The DDBMS transparently selects the database fragment to
access. To the user, the DDBMS manages all fragments transparently.

7 Distributed query processing A distributed query may be executed at several different DP sites. Query optimization is performed
transparently by the DDBMS.

8 Distributed transaction processing A transaction may update data at several different sites, and the transaction is executed transparently.

9 Hardware independence The system must run on any hardware platform.

10 Operating system independence The system must run on any operating system platform.

11 Network independence The system must run on any network platform.

12 Database independence The system must support any vendor’s database product.

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 39
for classroom use.
Summary (1 of 2)

• A distributed database stores logically related data in two or more physically


independent sites connected via a computer network
• Distributed processing is the division of logical database processing among two
or more network nodes
• The main components of a DDBMS are the transaction processor (TP) and the
data processor (DP)
• Current database systems can be classified by the extent to which they support
processing and data distribution
• A homogeneous distributed database system integrates only one particular
type of DBMS over a computer network
• DDBMS characteristics are best described as a set of transparencies:
distribution, transaction, performance, failure, and heterogeneity

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 40
for classroom use.
Summary (2 of 2)

• A transaction is formed by one or more database requests. An undistributed


transaction updates or requests data from a single site
• Distributed concurrency control is required in a network of distributed
databases
• A distributed DBMS evaluates every data request to find the optimum access
path in a distributed database
• The design of a distributed database must consider the fragmentation and
replication of data
• A database can be replicated over several different sites on a computer network
• The CAP theorem states that a highly distributed data system has some
desirable properties of consistency, availability, and partition tolerance

© 2019 Cengage. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website 41
for classroom use.

You might also like