Distributed DBMS: Announcements
CompSci 516
Database Systems
Lecture 18
Distributed DBMS

Announcements
• HW3 on NOSQL and MongoDB to be released soon
– Install the system first
– Due in two weeks after NOSQL in class
– Keep working on the project in the meantime!
Duke CS, Fall 2018 CompSci 516: Database Systems
11/6/18
Types of Distributed Databases
• Homogeneous:
– Every site runs the same type of DBMS
• Heterogeneous:
– Different sites run different DBMSs
– different RDBMSs or even non-relational DBMSs
– RDBMS = Relational DBMS

More on Heterogeneous Distributed Databases
• Database servers are accessed through well-accepted and standard Gateway protocols
– masks the differences of DBMSs (capability, data format etc.)
– e.g. ODBC, JDBC
• However, can be expensive and may not be able to hide all differences
– e.g. when a server is not capable of supporting distributed transaction management
Distributed DBMS Architecture
1. Client-Server
2. Collaborating Server
3. Middleware
Middleware Systems
• Allows a single query to span multiple servers
• Vertical:
– Identified by projection queries
– Typically unique TIDs added to each tuple
– TIDs replicated in each fragment
– Ensures that we have a Lossless Join
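
A minimal sketch of why replicating TIDs gives a lossless join. The tuple values, the fragment names `frag_a` and `frag_b`, and the helper `reconstruct` are all hypothetical and chosen only for illustration.

```python
# Hypothetical Sailors tuples: (sid, sname, rating, age). Values are made up.
sailors = [
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0),
]

# Attach a unique tuple id (TID) to every tuple before fragmenting.
with_tid = [(f"T{i}", *row) for i, row in enumerate(sailors, start=1)]

# Vertical fragments are projections; the TID is replicated in each one.
frag_a = [(tid, sid, rating) for tid, sid, _sname, rating, _age in with_tid]
frag_b = [(tid, sname, age) for tid, _sid, sname, _rating, age in with_tid]


def reconstruct(frag_a, frag_b):
    """Join the two fragments on TID to rebuild the original tuples."""
    by_tid = {tid: (sname, age) for tid, sname, age in frag_b}
    return [
        (sid, by_tid[tid][0], rating, by_tid[tid][1])
        for tid, sid, rating in frag_a
        if tid in by_tid
    ]


rebuilt = reconstruct(frag_a, frag_b)
print(rebuilt == sailors)
```

Without the shared TID column the two projections would have no common key, so there would be no way to tell which sname and age belong to which sid and rating.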
Distributed Query Processing
• No joins
• Join

Non-Join Distributed Queries

    SELECT AVG(S.age)
    FROM Sailors S
    WHERE S.rating > 3
    AND S.rating < 7

Sailors (tid, sid, sname, rating, age): T1 (rating 4) stored at Shanghai; T2 (rating 5) and T3 (rating 9) stored at Tokyo

• Horizontally Fragmented: Tuples with rating < 5 at Shanghai, >= 5 at Tokyo.
– Must compute SUM(age), COUNT(age) at both sites.
– If WHERE contained just S.rating > 6, just one site
• Vertically Fragmented: sid and rating at Shanghai, sname and age at Tokyo, tid at both.
– Must reconstruct relation by join on tid, then evaluate the query
– if no tid, decomposition would be lossy
• Replicated: Sailors copies at both sites.
– Choice of site based on local costs (e.g. index), shipping costs
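
For the horizontally fragmented case, the reason each site must return SUM(age) and COUNT(age) rather than a local average can be sketched as follows. The fragment contents and the function names `local_partial` and `global_avg` are assumptions made for this example.

```python
# Hypothetical horizontal fragments of Sailors as (sid, rating, age) tuples.
shanghai = [(1, 4, 30.0), (2, 2, 50.0)]               # rating < 5
tokyo = [(3, 5, 40.0), (4, 9, 60.0), (5, 6, 26.0)]    # rating >= 5


def local_partial(fragment, low=3, high=7):
    """Run at each site: SUM(age) and COUNT(age) over low < rating < high."""
    ages = [age for _sid, rating, age in fragment if low < rating < high]
    return sum(ages), len(ages)


def global_avg(partials):
    """Run at the query site: combine the per-site (sum, count) pairs."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


partials = [local_partial(shanghai), local_partial(tokyo)]
avg_age = global_avg(partials)
print(partials, avg_age)
```

With this data, averaging the two local averages (30.0 and 33.0) would give 31.5, while the correct answer over the three qualifying tuples is 32.0, because the sites hold different numbers of qualifying tuples.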
• Trade off the cost of computing and shipping the projection against the cost of shipping the full R relation
• Especially useful if there is a selection on Sailors, and the answer is desired at London

LONDON: Sailors (S), 500 pages; PARIS: Reserves (R), 1000 pages

• Similar idea to semi-join
• Suppose we want to ship R to London and then do the join with S at London (like semijoin)
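
A minimal sketch of a semijoin-style reduction for this London/Paris setting, assuming the join column is `sid` and a selection on Sailors by rating. The tuples, column layouts, and function names are hypothetical; each function stands for work done at one site, and the values passed between them stand for what is shipped over the network.

```python
# London holds Sailors (sid, sname, rating); Paris holds Reserves (sid, bid).
sailors_london = [(22, "dustin", 7), (31, "lubber", 8), (58, "rusty", 10)]
reserves_paris = [(22, 101), (22, 102), (31, 103), (64, 101), (74, 103)]


def london_projection(sailors, min_rating):
    """At London: apply the selection, then project onto the join column."""
    return {sid for sid, _sname, rating in sailors if rating > min_rating}


def paris_reduction(reserves, sids):
    """At Paris: keep only the Reserves tuples that can join with a shipped sid."""
    return [r for r in reserves if r[0] in sids]


def london_join(sailors, reduced, min_rating):
    """At London: join the reduced Reserves with the selected Sailors."""
    by_sid = {
        sid: (sname, rating)
        for sid, sname, rating in sailors
        if rating > min_rating
    }
    return [(sid, *by_sid[sid], bid) for sid, bid in reduced]


sids = london_projection(sailors_london, 7)       # shipped London -> Paris
reduced = paris_reduction(reserves_paris, sids)   # shipped Paris -> London
result = london_join(sailors_london, reduced, 7)
print(result)
```

Here only one of the five Reserves tuples travels back to London. The extra cost is computing and shipping the projection, which pays off when the selection on Sailors filters out most of Reserves.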
• Query site constructs global plan, with suggested local plans describing processing at each site
– If a site can improve suggested local plan, free to do so
Distributed Transactions
• Distributed CC (concurrency control)
– How can locks for objects stored across several sites be managed?
– How can deadlocks be detected in a distributed database?
• Distributed Recovery
– When a transaction commits, all its actions, across all the sites at which it executes, must persist
– When a transaction aborts, none of its actions must be allowed to persist
2. Primary Copy: All locking for an object done at the primary copy site for this object
– Reading requires access to locking site as well as site where the object copy is stored
3. Fully Distributed: Locking for a copy done at site where the copy is stored
– Locks at all sites while writing an object (unlike previous two)

• Each site maintains a local waits-for graph
• A global deadlock might exist even if the local graphs contain no cycles
• Further, phantom deadlocks may be created while communicating
– due to delay in propagating local information
– might lead to unnecessary aborts
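
The point that a global deadlock can exist while every local waits-for graph is acyclic can be checked with a small sketch. The graph representation (a dict mapping each transaction to the set of transactions it waits for) and the helpers `has_cycle` and `merge` are assumptions made for this example.

```python
def has_cycle(graph):
    """Depth-first search for a cycle in a directed waits-for graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    color = {n: WHITE for n in nodes}

    def visit(node):
        color[node] = GREY
        for nxt in graph.get(node, ()):
            if color[nxt] == GREY:          # back edge: a cycle exists
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


def merge(*local_graphs):
    """Union the edges of several local waits-for graphs."""
    merged = {}
    for g in local_graphs:
        for txn, waits_for in g.items():
            merged.setdefault(txn, set()).update(waits_for)
    return merged


site_a = {"T1": {"T2"}}   # at site A, T1 waits for T2
site_b = {"T2": {"T1"}}   # at site B, T2 waits for T1
global_graph = merge(site_a, site_b)

print(has_cycle(site_a), has_cycle(site_b), has_cycle(global_graph))
```

Neither site sees a cycle on its own; the cycle T1 -> T2 -> T1 only appears once the edges from both sites are combined.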
Three Distributed Deadlock Detection Approaches
[Figure: waits-for graphs over T1 and T2 at SITE A, at SITE B, and GLOBAL]
1. Centralized
• send all local graphs to one site periodically
• A global waits-for graph is generated
2. Hierarchical
• organize sites into a hierarchy and send local graphs to parent in the hierarchy
• e.g. sites (every 10 sec) -> sites in a state (every min) -> sites in a country (every 10 min) -> global waits-for graph
• intuition: more deadlocks are likely across closely related sites
3. Timeout
• abort transaction if it waits too long (low overhead)

Distributed Recovery
• Two new issues:
– New kinds of failure, e.g., links and remote sites
– If “sub-transactions” of a transaction execute at different sites, all or none must commit
– Need a commit protocol to achieve this
– Most widely used: Two Phase Commit (2PC)
• A log is maintained at each site
– as in a centralized DBMS
– commit protocol actions are additionally logged
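
A minimal, failure-free sketch of two-phase commit, assuming the usual message sequence (prepare, yes/no votes, commit/abort, ack) and the log record kinds named in these slides (prepare, commit, abort, end). The classes `Coordinator` and `Subordinate` are hypothetical; in-memory lists stand in for each site's log, method calls stand in for messages, and crashes and timeouts are not modelled.

```python
class Subordinate:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []            # this site's log, oldest record first

    def on_prepare(self, txn):
        """Phase 1: log the vote before sending it."""
        if self.can_commit:
            self.log.append(("prepare", txn))
            return "yes"
        self.log.append(("abort", txn))
        return "no"

    def on_decision(self, txn, decision):
        """Phase 2: log the coordinator's decision, then acknowledge."""
        self.log.append((decision, txn))
        self.log.append(("end", txn))
        return "ack"


class Coordinator:
    def __init__(self, subs):
        self.subs = subs
        self.log = []

    def commit(self, txn):
        # Phase 1: ask every subordinate to prepare and collect the votes.
        votes = {sub.name: sub.on_prepare(txn) for sub in self.subs}
        all_yes = all(v == "yes" for v in votes.values())
        decision = "commit" if all_yes else "abort"
        self.log.append((decision, txn))

        # Phase 2: inform the subordinates that voted yes; a subordinate
        # that voted no has already aborted locally.
        acks = [
            sub.on_decision(txn, decision)
            for sub in self.subs
            if votes[sub.name] == "yes"
        ]
        if all(a == "ack" for a in acks):
            self.log.append(("end", txn))
        return decision


subs = [Subordinate("A"), Subordinate("B", can_commit=False)]
print(Coordinator(subs).commit("T"))
```

A single "no" vote forces a global abort, which is how the protocol keeps the all-or-none guarantee across sites.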
Example on whiteboard
When a transaction wants to commit (1/5 to 4/5)
Restart After a Failure at a Site – 1/4
• Recovery process is invoked after a site comes back up after a crash
– reads the log and executes the commit protocol
– the coordinator or a subordinate may have crashed
– one site can be the coordinator for some transactions and a subordinate for others

Restart After a Failure at a Site – 2/4
• If we have a commit or abort log record for transaction T, but not an end record, must redo/undo T respectively
– If this site is the coordinator for T (from the log record), keep sending commit/abort messages to subs until acks received
– then write an end log record for T
Restart After a Failure at a Site – 3/4
• If we have a prepare log record for transaction T, but not commit/abort
– This site is a subordinate for T
– Repeatedly contact the coordinator to find status of T
– Then write commit/abort log record
– Redo/undo T
– and write end log record

Restart After a Failure at a Site – 4/4
• If we don’t have even a prepare log record for T
– This site had not voted to commit T before the crash
– unilaterally abort and undo T
– write an end record
• No way to determine if this site is the coordinator or subordinate
– If this site is the coordinator, it might have sent prepare messages
– then, subs may send yes/no messages; the site thereby learns it is the coordinator and asks the subordinates to abort
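
The restart cases above can be summarized as a decision on which log records exist for T. This is a sketch only: the log representation (a list of `(kind, txn)` pairs) and the returned action labels are hypothetical.

```python
def restart_action(log_records, txn):
    """Pick the restart action for txn from the record kinds found in the log."""
    kinds = {kind for kind, t in log_records if t == txn}

    if "end" in kinds:
        return "done"                # protocol already finished for txn
    if "commit" in kinds:
        return "redo"                # coordinator also resends commit until acks
    if "abort" in kinds:
        return "undo"                # coordinator also resends abort until acks
    if "prepare" in kinds:
        return "ask_coordinator"     # subordinate that voted yes; outcome unknown
    return "abort_unilaterally"      # no prepare record: never voted to commit


log = [
    ("prepare", "T1"), ("commit", "T1"),   # T1: decision logged, no end record
    ("prepare", "T2"),                     # T2: prepared only
    ("abort", "T3"), ("end", "T3"),        # T3: finished
]

for txn in ("T1", "T2", "T3", "T4"):
    print(txn, restart_action(log, txn))
```

The order of the checks matters: a later record (end, then commit or abort) overrides an earlier prepare record for the same transaction.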