CH 19
Database System Concepts 19.1 ©Silberschatz, Korth and Sudarshan
Distributed Database System
Homogeneous Distributed Databases
Data Replication
Data Replication (Cont.)
Advantages of Replication
Availability: failure of a site containing relation r does not
result in unavailability of r if replicas exist.
Parallelism: queries on r may be processed by several nodes
in parallel.
Reduced data transfer: relation r is available locally at each
site containing a replica of r.
Disadvantages of Replication
Increased cost of updates: each replica of relation r must be
updated.
Increased complexity of concurrency control: concurrent
updates to distinct replicas may lead to inconsistent data
unless special concurrency control mechanisms are
implemented.
One solution: choose one copy as the primary copy and apply
concurrency control operations on the primary copy.
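The trade-off listed above can be sketched in a few lines of Python. This is a toy model, not a real protocol: the class `ReplicatedRelation`, the site names, and the `messages` counter are all assumptions made for illustration. Every write goes through the primary copy and is then pushed to each replica (the update cost), while a read can be served by any surviving replica (availability).

```python
class ReplicatedRelation:
    """Toy model: one relation replicated at several sites, with a primary copy."""

    def __init__(self, sites, primary):
        if primary not in sites:
            raise ValueError("primary must be one of the replica sites")
        self.primary = primary
        # Each site holds its own copy of the relation (key -> value).
        self.replicas = {site: {} for site in sites}
        self.messages = 0  # counts replica updates, to expose the update cost

    def write(self, key, value):
        # All updates are applied at the primary copy first ...
        self.replicas[self.primary][key] = value
        self.messages += 1
        # ... and then propagated to every other replica.
        for site, copy in self.replicas.items():
            if site != self.primary:
                copy[key] = value
                self.messages += 1

    def read(self, key, site, failed=()):
        # Prefer the local replica; otherwise fall back to any live replica.
        for candidate in [site] + [s for s in self.replicas if s != site]:
            if candidate not in failed:
                return self.replicas[candidate].get(key)
        raise RuntimeError("no replica available")


r = ReplicatedRelation(sites=["S1", "S2", "S3"], primary="S1")
r.write("A-101", 500)
# Site S2 is down, yet the value is still readable from another replica.
balance = r.read("A-101", site="S2", failed={"S2"})
```

One write to a relation with three replicas costs three updates here, which is the "increased cost of updates" in miniature. A real system would also need to handle failure of the primary itself, which this sketch ignores.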
Data Fragmentation
$account_1 = \sigma_{branch\text{-}name = \text{"Hillside"}}(account)$
$account_2 = \sigma_{branch\text{-}name = \text{"Valleyview"}}(account)$
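As a minimal sketch of horizontal fragmentation, the two fragments above can be computed as selections on `account` and the original relation recovered as their union. The sample tuples and the helper `select` are assumptions for illustration only.

```python
# Hypothetical sample data for the account relation.
account = [
    {"branch_name": "Hillside", "account_number": "A-305", "balance": 500},
    {"branch_name": "Hillside", "account_number": "A-226", "balance": 336},
    {"branch_name": "Valleyview", "account_number": "A-177", "balance": 205},
    {"branch_name": "Valleyview", "account_number": "A-402", "balance": 10000},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# Each fragment holds the tuples of one branch.
account1 = select(account, lambda t: t["branch_name"] == "Hillside")
account2 = select(account, lambda t: t["branch_name"] == "Valleyview")

# The relation is reconstructed as the union of its fragments.
reconstructed = account1 + account2
```

Because every tuple satisfies exactly one of the two predicates in this sample, the fragments are disjoint and their union loses nothing.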
Vertical Fragmentation of employee-info Relation
Advantages of Fragmentation
Horizontal:
allows parallel processing on fragments of a relation
allows a relation to be split so that tuples are located
where they are most frequently accessed
Vertical:
allows tuples to be split so that each part of the tuple is
stored where it is most frequently accessed
tuple-id attribute allows efficient joining of vertical
fragments
allows parallel processing on a relation
Vertical and horizontal fragmentation can be mixed.
Fragments may be successively fragmented to an
arbitrary depth.
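The role of the tuple-id attribute in vertical fragmentation can be sketched as follows. The attribute names (`name`, `designation`, `salary`), the sample tuples, and the helpers `project` and `join_on` are assumptions for illustration; the point is only that each fragment carries `tuple_id`, so a join on it rebuilds the original tuples.

```python
# Hypothetical employee-info tuples; tuple_id is the added tuple-id attribute.
employee_info = [
    {"tuple_id": 1, "name": "Lowman", "designation": "clerk", "salary": 30000},
    {"tuple_id": 2, "name": "Camp", "designation": "manager", "salary": 52000},
    {"tuple_id": 3, "name": "Kahn", "designation": "clerk", "salary": 31000},
]


def project(relation, attrs):
    """Relational projection onto the given attributes."""
    return [{a: t[a] for a in attrs} for t in relation]


def join_on(left, right, key):
    """Equi-join on a single attribute, using a hash index on the right input."""
    index = {t[key]: t for t in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


# Each vertical fragment keeps tuple_id plus a subset of the attributes.
public_part = project(employee_info, ["tuple_id", "name", "designation"])
private_part = project(employee_info, ["tuple_id", "salary"])

# Joining the fragments on tuple_id reconstructs the relation.
rebuilt = join_on(public_part, private_part, "tuple_id")
```

Since `tuple_id` is unique per tuple, the join is a single hash lookup per tuple, which is why the slide calls the join efficient.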
Data Transparency
Data transparency: the degree to which a system user
may remain unaware of the details of how and where
data items are stored in a distributed system.
Consider transparency issues in relation to:
Fragmentation transparency
Replication transparency
Location transparency
Naming of Data Items - Criteria
1. Every data item must have a system-wide unique name.
2. It should be possible to find the location of data items efficiently.
3. It should be possible to change the location of data items transparently.
4. Each site should be able to create new data items autonomously.
Centralized Scheme - Name Server
Structure:
name server assigns all names
each site maintains a record of local data items
sites ask name server to locate non-local data items
Advantages:
satisfies naming criteria 1-3
Disadvantages:
does not satisfy naming criterion 4
name server is a potential performance bottleneck
name server is a single point of failure
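The structure described above can be sketched as a small Python model. The classes `NameServer` and `Site`, their method names, and the `lookups` counter are hypothetical, chosen only to mirror the three structure bullets: the server assigns names, each site records its local items, and non-local lookups go to the server.

```python
class NameServer:
    """Central registry mapping each data item name to the site that stores it."""

    def __init__(self):
        self._site_of = {}
        self.lookups = 0  # every remote lookup lands here (bottleneck)

    def assign(self, name, site_id):
        # The server is the only place names are handed out, so they stay unique.
        if name in self._site_of:
            raise ValueError(f"name {name!r} is already in use")
        self._site_of[name] = site_id

    def locate(self, name):
        self.lookups += 1
        return self._site_of[name]


class Site:
    def __init__(self, site_id, name_server):
        self.site_id = site_id
        self.name_server = name_server
        self.local_items = {}  # record of data items stored at this site

    def create(self, name, value):
        # A site cannot create an item without first contacting the name server.
        self.name_server.assign(name, self.site_id)
        self.local_items[name] = value

    def find(self, name):
        # Local items are resolved locally; everything else goes to the server.
        if name in self.local_items:
            return self.site_id
        return self.name_server.locate(name)


server = NameServer()
s1, s2 = Site("S1", server), Site("S2", server)
s1.create("account", [])
local_answer = s1.find("account")
remote_answer = s2.find("account")
```

The sketch makes the listed disadvantages visible: `create` depends on the server, so sites are not autonomous, and every non-local `find` passes through the one `NameServer` object.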
Use of Aliases
Distributed Transactions
Transaction System Architecture
System Failure Modes
Query Transformation
Example Query (Cont.)
Simple Join Processing
depositor at S2
branch at S3
Possible Query Processing Strategies
Formal Definition
The semijoin of r1 with r2 is denoted by:
$r_1 \ltimes r_2$
It is defined by:
$\Pi_{R_1}(r_1 \bowtie r_2)$
Thus, $r_1 \ltimes r_2$ selects those tuples of $r_1$ that contributed to
$r_1 \bowtie r_2$.
In step 3 above, $temp_2 = r_2 \ltimes r_1$.
For joins of several relations, the above strategy can be
extended to a series of semijoin steps.
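The definition can be checked with a short Python sketch. The relations, attribute names, and helper functions below are assumptions for illustration; relations are lists of dictionaries without duplicate tuples. The sketch computes a semijoin as "tuples of the first relation that have a join partner in the second" and confirms that joining r1 with the semijoin of r2 with r1 gives the same result as the full join, which is why only the reduced relation needs to be shipped between sites.

```python
def common_attrs(r, s):
    """Attributes shared by the schemas of r and s (empty if either is empty)."""
    if not r or not s:
        return []
    return sorted(set(r[0]) & set(s[0]))


def natural_join(r, s):
    """Natural join: combine tuples that agree on all common attributes."""
    attrs = common_attrs(r, s)
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in attrs)]


def semijoin(r, s):
    """Tuples of r that contribute to the natural join of r and s."""
    attrs = common_attrs(r, s)
    keys = {tuple(u[a] for a in attrs) for u in s}
    return [t for t in r if tuple(t[a] for a in attrs) in keys]


# Hypothetical data: r1 is stored at one site, r2 at another.
r1 = [
    {"account_number": "A-101", "branch_name": "Downtown"},
    {"account_number": "A-215", "branch_name": "Mianus"},
    {"account_number": "A-305", "branch_name": "Round Hill"},
]
r2 = [
    {"customer_name": "Johnson", "account_number": "A-101"},
    {"customer_name": "Smith", "account_number": "A-215"},
    {"customer_name": "Lindsay", "account_number": "A-222"},
]

temp2 = semijoin(r2, r1)              # only the r2 tuples that will join
via_semijoin = natural_join(r1, temp2)
full_join = natural_join(r1, r2)
```

Here `temp2` drops the Lindsay tuple, which has no partner in `r1`, so one fewer tuple crosses the network while the final join is unchanged.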
Join Strategies that Exploit Parallelism