Distributed DBMS: Announcements

This document contains lecture notes on distributed database management systems (DBMS). It discusses key topics like distributed data independence, distributed transaction atomicity, and types of distributed DBMSs. The lecture notes also provide an overview of parallel and distributed data processing, and announce that homework 3 on NoSQL and MongoDB will be released soon.


11/6/18

CompSci 516
Database Systems

Lecture 18
Distributed DBMS

Instructor: Sudeepa Roy

Announcements
• HW3 on NOSQL and MongoDB to be released soon
– Install the system first
– Due in two weeks after NOSQL in class
– Keep working on the project in the meantime!

Duke CS, Fall 2018 CompSci 516: Database Systems

Where are we now?

We learnt
✓ Relational Model and Query Languages
✓ SQL, RA, RC
✓ Postgres (DBMS)
§ HW1
✓ Map-reduce and spark
§ HW2
✓ DBMS Internals
✓ Storage
✓ Indexing
✓ Query Evaluation
✓ Operator Algorithms
✓ External sort
✓ Query Optimization
✓ Database Normalization
✓ Transactions
✓ Basic concepts
✓ Concurrency control
✓ Recovery
• (ARIES protocol of transactions to be covered later)

Next
• Distributed DBMS
• NOSQL

Reading Material

• [RG]
– Parallel DBMS: Chapter 22.1-22.5
– Distributed DBMS: Chapter 22.6 – 22.14
• [GUW]
– Parallel DBMS and map-reduce: Chapter 20.1-20.2
– Distributed DBMS: Chapter 20.3, 20.4.1-20.4.2, 20.5-20.6
• Other recommended readings:
– Chapter 2 (Sections 1,2,3) of Mining of Massive Datasets, by Rajaraman and Ullman: http://i.stanford.edu/~ullman/mmds.html
– Original Google MR paper by Jeff Dean and Sanjay Ghemawat, OSDI ’04: http://research.google.com/archive/mapreduce.html

Acknowledgement:
The following slides have been created adapting the instructor material of the [RG] book provided by the authors Dr. Ramakrishnan and Dr. Gehrke.

Parallel and Distributed Data Processing

• So far, query processing on a single machine
– Query Execution and Optimization
– Transaction CC and Recovery
• Now: data and operation distribution
• Parallelism
– performance
– Parallel databases (will be covered soon)
• Data distribution
– increased availability, e.g. when a site goes down
– distributed local access to data (e.g. an organization may have branches in several cities)
– analysis of distributed data
– Distributed DBMS (today)

Topics in Distributed DBMS

• Architecture
• Data Storage
• Query Execution
• Transactions – updates
• Recovery – Two Phase Commit (2PC)
• Warning! Many concepts and terminology

Introduction: Distributed Databases

• Data is stored at several sites, each managed by a DBMS that can run independently
• Desired properties
1. Distributed Data Independence
2. Distributed Transaction Atomicity

Distributed Data Independence

• Users should not have to know where data is located
– no need to know the locations of referenced relations, their copies or fragments (later)
– extends Physical and Logical Data Independence principles
• Queries spanning multiple sites should be optimized in a cost-based manner
– taking into account communication costs and differences in local computation costs

Distributed Transaction Atomicity

1. Users should be able to write transactions accessing multiple sites just like local transactions
2. The effects of a transaction across sites should be atomic
– all changes persist if transaction commits
– none persist if transaction aborts

Recent Trends on These Two Properties

• These two properties are in general desirable
• But not always efficiently achievable
– e.g. when sites are connected by a slow long-distance network
• Even sometimes not desirable for globally distributed sites
– too much administrative overhead of making location of data “transparent” (not visible to user)
• Therefore not always supported
– Users have to be aware of where data is located
– Not much consensus on the design objectives on distributed databases

Types of Distributed Databases

• Homogeneous:
– Every site runs same type of DBMS
• Heterogeneous:
– Different sites run different DBMSs
– different RDBMSs or even non-relational DBMSs
– RDBMS = Relational DBMS

More on Heterogeneous Distributed Databases

• Database servers are accessed through well-accepted and standard Gateway protocols
– masks the differences of DBMSs (capability, data format etc.)
– e.g. ODBC, JDBC
• However, can be expensive and may not be able to hide all differences
– e.g. when a server is not capable of supporting distributed transaction management

[Figure: a Gateway in front of DBMS1, DBMS2 and DBMS3]

Distributed DBMS Architecture

Distributed DBMS Architectures

• Three alternative approaches
1. Client-Server
2. Collaborating Server
3. Middleware

Client-Server Systems

• One or more clients (e.g. personal computer) and one or more server processes (e.g. a mainframe)
– A client process can ship a query to any server process
– Clients are responsible for user interfaces
– Server manages data and executes queries
• Advantages
– clean separation and centralized server
– expensive server machines are not underutilized by simple user interactions
– users can run GUI on clients that they are familiar with
• Challenges
– need to carefully handle communication costs
– e.g. fetching tuples one at a time might be bad – need to do caching on client side

[Figure: CLIENTs ship a QUERY to SERVERs]

Collaborating Server Systems

• Queries can span multiple sites
– not allowed in client-servers as the clients would have had to break queries and combine the results
• When a server receives a query that requires access to data at other servers
– it generates appropriate subqueries
– puts the result together
• Eliminates distinction between client and server

[Figure: a QUERY arrives at one SERVER, which works with the other SERVERs]

Middleware Systems

• Allows a single query to span multiple servers
• But does not require all db servers to be capable of handling multi-site execution strategies
– need just one db server capable of managing queries and transactions spanning multiple servers (called middleware)
– the remaining servers can handle only the local queries and transactions
• The middleware layer is capable of executing joins and other operations on data obtained from other servers, but typically does not maintain any data
• Useful when trying to integrate several “legacy systems”
– whose basic capabilities cannot be extended

Storing Data in Distributed DBMS

Storing Data in a Distributed DBMS

• Relations are stored across several sites
• Accessing data at a remote site incurs message-passing costs
• To reduce this overhead, a single relation may be partitioned or fragmented across several sites
– typically at sites where they are most often accessed
• The data can be replicated as well
– when the relation is in high demand

Fragmentation

• Break a relation into smaller relations or fragments
– store them in different sites as needed

[Figure: a relation whose tuples t1, t2, t3, t4 are identified by a TID column]

• Horizontal:
– Usually disjoint
– Can often be identified by a selection query (employees in a city – locality of reference)
– To retrieve the full relation, need a union
• Vertical:
– Identified by projection queries
– Typically unique TIDs added to each tuple
– TIDs replicated in each fragment
– Ensures that we have a Lossless Join
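The two fragmentation styles can be sketched in a few lines of Python. This is only an illustration under assumptions: the `employees` relation, its columns, and all helper names are hypothetical and not from the slides. It shows that a union rebuilds a horizontally fragmented relation, and that a join on the replicated TID rebuilds a vertically fragmented one.

```python
# Hypothetical relation; the tids t1..t4 mirror the TID column in the figure.
employees = [
    {"tid": "t1", "name": "Ann", "city": "Durham", "salary": 50},
    {"tid": "t2", "name": "Bob", "city": "Raleigh", "salary": 60},
    {"tid": "t3", "name": "Cy", "city": "Durham", "salary": 70},
    {"tid": "t4", "name": "Di", "city": "Cary", "salary": 80},
]

def horizontal_fragment(relation, predicate):
    """Selection: keep whole tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def vertical_fragment(relation, columns):
    """Projection: keep the chosen columns plus the replicated tid."""
    return [{"tid": row["tid"], **{c: row[c] for c in columns}} for row in relation]

def union(*fragments):
    """Rebuild a horizontally fragmented relation (fragments are disjoint)."""
    rows = [row for fragment in fragments for row in fragment]
    return sorted(rows, key=lambda row: row["tid"])

def join_on_tid(left, right):
    """Rebuild a vertically fragmented relation by joining on tid."""
    right_by_tid = {row["tid"]: row for row in right}
    return [{**row, **right_by_tid[row["tid"]]}
            for row in left if row["tid"] in right_by_tid]

durham = horizontal_fragment(employees, lambda r: r["city"] == "Durham")
elsewhere = horizontal_fragment(employees, lambda r: r["city"] != "Durham")
names = vertical_fragment(employees, ["name", "city"])
pay = vertical_fragment(employees, ["salary"])

rebuilt_horizontal = union(durham, elsewhere)
rebuilt_vertical = join_on_tid(names, pay)
```

Without the tid column in both vertical fragments there would be nothing to join on, which is why the decomposition would no longer be lossless.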

Replication

• When we store several copies of a relation or relation fragments
– can be replicated at one or more sites
– e.g. R is fragmented into R1, R2, R3; one copy of R2, R3; but two copies of R1 at two sites
• Advantages
– Gives increased availability – e.g. when a site or communication link goes down
– Faster query evaluation – e.g. using a local copy
• Synchronous and Asynchronous (later)
– Vary in how current different copies are when a relation is modified

[Figure: R1 is stored at both SITE A and SITE B; R2 and R3 are each stored at one site]

Distributed Catalog Management

• Must keep track of how data is fragmented and replicated across sites
– in addition to usual schema, authorization, and statistical information
• Must be able to uniquely identify each replica of each fragment
– Globally unique name may compromise autonomy of servers
– To preserve local autonomy: Global relation name = <local-name, birth-site>
– To identify a replica, add a replica-id field (now called global replica name)
• Site Catalog: Describes all objects (fragments, replicas) at a site + Keeps track of replicas of relations created at this site
– To find a relation, look up its birth-site catalog
– Birth-site never changes, even if relation is moved

Distributed Query Processing

No joins
Join

Non-Join Distributed Queries

SELECT AVG(S.age)
FROM Sailors S
WHERE S.rating > 3
AND S.rating < 7

tid   sid   sname   rating   age
T1                  4              (stored at Shanghai)
T2                  5              (stored at Tokyo)
T3                  9              (stored at Tokyo)

• Horizontally Fragmented: Tuples with rating < 5 at Shanghai, >= 5 at Tokyo.
– Must compute SUM(age), COUNT(age) at both sites.
– If WHERE contained just S.rating > 6, just one site
• Vertically Fragmented: sid and rating at Shanghai, sname and age at Tokyo, tid at both.
– Must reconstruct relation by join on tid, then evaluate the query
– if no tid, decomposition would be lossy
• Replicated: Sailors copies at both sites.
– Choice of site based on local costs (e.g. index), shipping costs
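For the horizontally fragmented case, a small Python sketch may help show why each site returns SUM(age) and COUNT(age) rather than its own average. The tuples and ages below are made up for illustration, and the function names are hypothetical.

```python
def local_partial(fragment, low=3, high=7):
    """Run at each site: SUM(age) and COUNT(age) over tuples with low < rating < high."""
    ages = [row["age"] for row in fragment if low < row["rating"] < high]
    return sum(ages), len(ages)

def global_avg(partials):
    """Run at the query site: combine the per-site (sum, count) pairs."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None

# Hypothetical fragments: rating < 5 at Shanghai, rating >= 5 at Tokyo.
shanghai = [{"rating": 4, "age": 30.0}, {"rating": 2, "age": 50.0}]
tokyo = [{"rating": 5, "age": 40.0}, {"rating": 9, "age": 20.0},
         {"rating": 6, "age": 50.0}]

partials = [local_partial(shanghai), local_partial(tokyo)]
avg_age = global_avg(partials)

# Averaging the two local averages instead gives a different, wrong answer
# because the sites contribute different numbers of qualifying tuples.
naive = sum(s / c for s, c in partials) / len(partials)
```

Here one tuple qualifies at Shanghai and two at Tokyo, so the correct average weights Tokyo twice as heavily as the naive average of averages does.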

Joins in a Distributed DBMS

• Can be very expensive if relations are stored at different sites

1. Fetch as needed
2. Ship to one site
3. Semi-join
4. Bloom join

[Figure: Sailors (S), 500 pages, at LONDON; Reserves (R), 1000 pages, at PARIS]

1. Fetch As Needed

• Page-oriented Nested Loop Join
– Sailors as outer – for each S page, fetch all R pages from Paris
– if cached at London, each R page fetched once
– Otherwise, Cost: 500 d + 500 * 1000 (d+s)
– d is cost to read/write page
– s is cost to ship page
– If query was not submitted at London, must add cost of shipping result to query site
– Can also do Index NL at London, fetching matching Reserves tuples to London as needed
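The fetch-as-needed cost can be evaluated directly. The sketch below is a hedged illustration: the values d = 1 and s = 10 are arbitrary, the function name is hypothetical, and the cached variant rests on an assumption of mine (cache hits at London cost nothing), since the slide gives no formula for that case.

```python
def fetch_as_needed_cost(s_pages, r_pages, d, s, cached=False):
    """Page-oriented nested loop join, Sailors (at London) as the outer relation.

    d: cost to read/write one page, s: cost to ship one page.
    """
    read_outer = s_pages * d
    if cached:
        # Assumption (not on the slide): each R page is read and shipped once,
        # and later accesses hit a free cache at London.
        return read_outer + r_pages * (d + s)
    # Slide formula: every S page triggers a read and a shipment of every R page.
    return read_outer + s_pages * r_pages * (d + s)

# Slide sizes: 500 pages of Sailors, 1000 pages of Reserves; d and s are arbitrary.
uncached = fetch_as_needed_cost(500, 1000, d=1, s=10)
cached = fetch_as_needed_cost(500, 1000, d=1, s=10, cached=True)
```

With these numbers the uncached plan costs 5,500,500 units against 11,500 for the cached one, which shows how strongly the 500 * 1000 term dominates.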

2. Ship To One Site

• Ship Sailors (S) to Paris
– Cost: 500 (2d + s) + 4500 d
– For relation S: reading in London, shipping to Paris, and saving it in Paris: 500 (2d + s)
– Assume Sort-Merge Join with cost 3(M+N), i.e. enough memory
– Then join cost = 3*(500+1000)d
– If result size is very large, may be better to ship both relations to result site and then join them
• Not all tuples in S join with a tuple in R
– unnecessary shipping
– solution: Semi-join

3. Semijoin – 1/2

• Suppose want to ship R to London and then do join with S at London. Instead,
1. At London, project S onto join columns and ship this to Paris
– Here foreign keys, but could be arbitrary join
2. At Paris, join S-projection with R
– Result is called reduction of Reserves w.r.t. Sailors (only these tuples are needed)
3. Ship reduction of R back to London
4. At London, join S with reduction of R

[Figure: Sailors (S), 500 pages, at LONDON; Reserves (R), 1000 pages, at PARIS]
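The four semijoin steps can be traced on toy data. This is a minimal in-memory sketch, not a distributed implementation: both "sites" are plain Python lists, the tuples are invented, and Sailors is assumed to have already been filtered by a selection at London so that some Reserves tuples have no partner.

```python
# Hypothetical data. Sailors lives at London, Reserves at Paris.
sailors = [{"sid": 1, "sname": "dustin"}, {"sid": 2, "sname": "lubber"},
           {"sid": 3, "sname": "rusty"}]
reserves = [{"sid": 1, "bid": 101}, {"sid": 1, "bid": 102},
            {"sid": 3, "bid": 103}, {"sid": 7, "bid": 104}]

def project_join_column(relation, column):
    """Step 1 (London): project onto the join column; this set is shipped."""
    return {row[column] for row in relation}

def reduce_relation(relation, column, keys):
    """Step 2 (Paris): keep only tuples whose join value appears in the projection."""
    return [row for row in relation if row[column] in keys]

def join(left, right, column):
    """Step 4 (London): ordinary equi-join on the shared column."""
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[column], []).append(row)
    return [{**l, **r} for l in left for r in right_by_key.get(l[column], [])]

sids = project_join_column(sailors, "sid")        # step 1
reduction = reduce_relation(reserves, "sid", sids)  # step 2
# Step 3 would ship `reduction` (3 tuples) instead of all 4 Reserves tuples.
result = join(sailors, reduction, "sid")          # step 4
```

The result equals the join against the full Reserves relation; only the amount of data shipped back to London changes.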

3. Semijoin – 2/2

• Tradeoff the cost of computing and shipping projection for cost of shipping full R relation
• Especially useful if there is a selection on Sailors, and answer desired at London

[Figure: Sailors (S), 500 pages, at LONDON; Reserves (R), 1000 pages, at PARIS]

4. Bloomjoin – 1/4

• Similar idea like semi-join
• Suppose want to ship R to London and then do join with S at London (like semijoin)

4. Bloomjoin – 2/4

1. At London, compute a bit-vector of some size k:
– Hash column values into range 0 to k-1
– If some tuple hashes to p, set bit p to 1 (p from 0 to k-1)
– Ship bit-vector to Paris
2. At Paris, hash each tuple of R similarly
– discard tuples that hash to a bit that is 0 in S’s bit-vector
– Result is called reduction of R w.r.t S

[Figure: Sailors (S), 500 pages, at LONDON; Reserves (R), 1000 pages, at PARIS]

4. Bloomjoin – 3/4

3. Ship “bit-vector-reduced” R to London
4. At London, join S with reduced R
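A compact Python sketch of the bit-vector steps follows. It is an illustration under assumptions: the vector size k = 8, the modulo hash, and the tuples are all made up, and a real system would use a proper hash function over the join column.

```python
K = 8  # hypothetical bit-vector size k

def bucket(value, k=K):
    """Stand-in hash for integer join keys: maps a value into 0..k-1."""
    return value % k

def build_bit_vector(relation, column, k=K):
    """Step 1 (London): set bit p whenever some tuple hashes to p."""
    bits = [0] * k
    for row in relation:
        bits[bucket(row[column], k)] = 1
    return bits

def bloom_reduce(relation, column, bits):
    """Step 2 (Paris): discard tuples whose bit is 0 in the shipped vector."""
    return [row for row in relation if bits[bucket(row[column], len(bits))] == 1]

sailors = [{"sid": 1, "sname": "dustin"}, {"sid": 3, "sname": "rusty"}]
reserves = [{"sid": 1, "bid": 101}, {"sid": 3, "bid": 103},
            {"sid": 7, "bid": 104}, {"sid": 9, "bid": 105}]

bits = build_bit_vector(sailors, "sid")        # shipped to Paris
reduced = bloom_reduce(reserves, "sid", bits)  # step 3 ships this to London

# Step 4 (London): the real join drops any tuple that only matched by collision.
sailor_ids = {row["sid"] for row in sailors}
final = [row for row in reduced if row["sid"] in sailor_ids]
```

In this run sid 9 hashes to the same bit as sid 1, so its tuple survives the reduction at Paris and is only discarded by the final join at London. Collisions of this kind are one reason a bit-vector reduction can be larger than a semijoin reduction.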

4. Bloomjoin – 4/4

• Bit-vector cheaper to ship, almost as effective
– the size of the reduction of R shipped back can be larger. Why?

[Figure: Sailors (S), 500 pages, at LONDON; Reserves (R), 1000 pages, at PARIS]

Distributed Query Optimization

• Cost-based approach
– consider all plans
– pick cheapest
• Similar to centralized optimization, but have differences
1. Communication costs must be considered
2. Local site autonomy must be respected
3. New distributed join methods
• Query site constructs global plan, with suggested local plans describing processing at each site
– If a site can improve suggested local plan, free to do so

Distributed transactions

Updating Distributed Data
Synchronous
Asynchronous

Updating distributed data

• Classical view says that it should be the same as a centralized DBMS from user’s viewpoint and addressed at implementation level
• so far, we had this w.r.t. “queries”
• w.r.t “updates”, this means transactions should be atomic regardless of data fragmentation and replication
• But there are other alternatives too

Updating Distributed Data

• Synchronous Replication: All copies of a modified relation (or fragment) must be updated before the modifying transaction commits
– Data distribution is made “transparent” (not visible!) to users
• Asynchronous Replication: Copies of a modified relation are only periodically updated; different copies may get out of sync in the meantime
– Users must be aware of data distribution
– More efficient – many current products follow this approach

Synchronous Replication

• Voting: transaction must write a majority of copies to modify an object; must read enough copies to be sure of seeing at least one most recent copy
– E.g., 10 copies; 7 written for update; 4 copies read (why 4?)
– Each copy has version number – copy with the highest version number is current
– Not attractive usually because reads are common
• Read-any Write-all: Read any copy, Write all copies
– Writes are slower and reads are faster, relative to Voting
– Most common approach to synchronous replication
– A special case of voting (why?)
• Choice of technique determines which locks to set
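The voting rule can be checked with a little arithmetic. The sketch below encodes the usual quorum conditions as I understand them (writes must reach a majority, and every read set must overlap every write set); the function names are hypothetical and the version-numbered copies are invented data.

```python
def is_valid_quorum(n_copies, write_quorum, read_quorum):
    """True if writes reach a majority and every read overlaps the latest write."""
    writes_are_majority = 2 * write_quorum > n_copies
    reads_see_latest = read_quorum + write_quorum > n_copies
    return writes_are_majority and reads_see_latest

def min_read_quorum(n_copies, write_quorum):
    """Smallest number of copies a reader must contact."""
    return n_copies - write_quorum + 1

def current_copy(copies_read):
    """Among the copies read, the one with the highest version number is current."""
    return max(copies_read, key=lambda copy: copy["version"])

# Slide example: 10 copies, 7 written, so at most 3 copies can be stale
# and reading 4 guarantees at least one up-to-date copy.
needed = min_read_quorum(10, 7)

# Read-any write-all is the corner case: write quorum = all copies, read quorum = 1.
rawa_ok = is_valid_quorum(10, 10, 1)

latest = current_copy([{"site": "A", "version": 3}, {"site": "B", "version": 5},
                       {"site": "C", "version": 5}, {"site": "D", "version": 2}])
```

The same check rejects reading only 3 copies when 7 were written, because those 3 could be exactly the copies that were not updated.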

Cost of Synchronous Replication

• Before an update transaction can commit, it must obtain locks on all modified copies
– Sends lock requests to remote sites, and while waiting for the response, holds on to other locks
– If sites or links fail, transaction cannot commit until they are back up
– Even if there is no failure, committing must follow an expensive commit protocol with many messages (later)
• So the alternative of asynchronous replication is becoming widely used

Asynchronous Replication

• Allows modifying transaction to commit before all copies have been changed
– readers nonetheless look at just one copy
– Users must be aware of which copy they are reading, and that copies may be out-of-sync for short periods of time
• Two approaches: Primary Site and Peer-to-Peer replication
– Difference lies in how many copies are “updatable” or “master copies”

Primary Site Replication

• Exactly one copy of a relation is designated the primary or master copy
– Replicas at other sites cannot be directly updated
– The primary copy is published
– Other sites subscribe to this relation (or its fragments)
– These are secondary copies
• How are changes to the primary copy propagated to the secondary copies?
– Done in two steps
– First, “capture” changes made by committed transactions
– Then, “apply” these changes
• more details in the [RG] book (optional reading)

Peer-to-Peer Replication

• More than one of the copies of an object can be a master
• Changes to a master copy must be propagated to other copies somehow
• If two master copies are changed in a conflicting manner, conflict resolution needed
– e.g., Site 1: Joe’s age changed to 35; Site 2: to 36
• Best used when conflicts do not arise:
– E.g., Each master site owns a disjoint fragment
– E.g., Updating rights held by one master at a time – then propagated to other sites

Distributed Transactions

Distributed CC
Distributed Recovery

Distributed Transactions

• Distributed CC
– How can locks for objects stored across several sites be managed?
– How can deadlocks be detected in a distributed database?
• Distributed Recovery
– When a transaction commits, all its actions, across all the sites at which it executes, must persist
– When a transaction aborts, none of its actions must be allowed to persist

Distributed Locking

• How do we manage locks for objects across many sites?

1. Centralized: One site does all locking
– Vulnerable to single site failure
2. Primary Copy: All locking for an object done at the primary copy site for this object
– Reading requires access to locking site as well as site where the object copy is stored
3. Fully Distributed: Locking for a copy done at site where the copy is stored
– Locks at all sites while writing an object (unlike previous two)

Distributed Deadlock Detection

[Figure: waits-for graphs over T1 and T2 at SITE A, at SITE B, and GLOBAL]

• Each site maintains a local waits-for graph
• A global deadlock might exist even if the local graphs contain no cycles
• Further, phantom deadlocks may be created while communicating
– due to delay in propagating local information
– might lead to unnecessary aborts
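The point that local graphs can be cycle-free while the global graph is not can be checked with a short sketch. The edge directions below are an assumption, since the figure did not survive extraction: T1 waits for T2 at Site A, and T2 waits for T1 at Site B. The graph representation and function names are hypothetical.

```python
def merge_graphs(*local_graphs):
    """Union the local waits-for graphs into a global one."""
    merged = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged

def has_cycle(graph):
    """Depth-first search; reaching a node that is still on the stack means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(node, WHITE) == WHITE and visit(node)
               for node in list(graph))

# Assumed edges: waiter -> set of transactions it waits for.
site_a = {"T1": {"T2"}}
site_b = {"T2": {"T1"}}
global_graph = merge_graphs(site_a, site_b)

local_deadlocks = [has_cycle(site_a), has_cycle(site_b)]
global_deadlock = has_cycle(global_graph)
```

Neither site sees a cycle on its own, but the merged graph contains T1 → T2 → T1. A phantom deadlock would arise if one of these edges had already disappeared locally by the time the merged graph was examined.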

Three Distributed Deadlock Detection Approaches

[Figure: waits-for graphs over T1 and T2 at SITE A, at SITE B, and GLOBAL]

1. Centralized
• send all local graphs to one site periodically
• A global waits-for graph is generated
2. Hierarchical
• organize sites into a hierarchy and send local graphs to parent in the hierarchy
• e.g. sites (every 10 sec) -> sites in a state (every min) -> sites in a country (every 10 min) -> global waits-for graph
• intuition: more deadlocks are likely across closely related sites
3. Timeout
• abort transaction if it waits too long (low overhead)

Distributed Recovery

• Two new issues:
– New kinds of failure, e.g., links and remote sites
– If “sub-transactions” of a transaction execute at different sites, all or none must commit
– Need a commit protocol to achieve this
– Most widely used: Two Phase Commit (2PC)
• A log is maintained at each site
– as in a centralized DBMS
– commit protocol actions are additionally logged

Two Phase Commit (2PC)

Two-Phase Commit (2PC)

• Site at which transaction originates is coordinator
• Other sites at which it executes are subordinates
– w.r.t. coordination of this transaction

Example on whiteboard

When a transaction wants to commit – 1/5

1. Coordinator sends prepare message to each subordinate

When a transaction wants to commit – 2/5

2. Subordinate receives the prepare message
a) decides whether to abort or commit its subtransaction
b) force-writes an abort or prepare log record
c) then sends a no or yes message to coordinator

When a transaction wants to commit – 3/5

3. If coordinator gets unanimous yes votes from all subordinates
a) it force-writes a commit log record
b) then sends commit message to all subs

Else (if receives a no message or no response from some subordinate),
a) it force-writes abort log record
b) then sends abort messages

When a transaction wants to commit – 4/5

4. Subordinates force-write abort/commit log record based on message they get
a) then send ack message to coordinator
b) If commit received, commit the subtransaction
c) write an end record

When a transaction wants to commit – 5/5

5. After the coordinator receives ack from all subordinates,
– writes end log record

Transaction is officially committed when the coordinator’s commit log record reaches the disk
– subsequent failures cannot affect the outcomes

Comments on 2PC

• Two rounds of communication
– first, voting
– then, termination
– Both initiated by coordinator
• Any site (coordinator or subordinate) can unilaterally decide to abort a transaction
– but unanimity/consensus needed to commit
• Every message reflects a decision by the sender
– to ensure that this decision survives failures, it is first recorded in the local log and is force-written to disk
• All commit protocol log records for a transaction contain tid and Coordinator-id
– The coordinator’s abort/commit record also includes ids of all subordinates.
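Steps 1 to 5 can be traced with a failure-free simulation. The sketch below is a simplification under stated assumptions: messages are plain method calls, logs are in-memory lists standing in for force-written records, and timeouts, crashes and lost messages are not modelled. The class and function names are hypothetical.

```python
class Subordinate:
    def __init__(self, name, will_commit):
        self.name = name
        self.will_commit = will_commit
        self.log = []  # stands in for force-written log records

    def on_prepare(self):
        """Step 2: decide, log prepare or abort, then vote yes or no."""
        if self.will_commit:
            self.log.append("prepare")
            return "yes"
        self.log.append("abort")
        return "no"

    def on_decision(self, decision):
        """Step 4: log the coordinator's decision, ack, then write end."""
        if decision not in self.log:  # a no-voter has already logged abort
            self.log.append(decision)
        self.log.append("end")
        return "ack"

def two_phase_commit(subordinates):
    """Coordinator side; returns the decision and the coordinator's log."""
    coordinator_log = []
    votes = [sub.on_prepare() for sub in subordinates]            # steps 1 and 2
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    coordinator_log.append(decision)                              # step 3: log first
    acks = [sub.on_decision(decision) for sub in subordinates]    # then send
    if all(a == "ack" for a in acks):
        coordinator_log.append("end")                             # step 5
    return decision, coordinator_log

happy = [Subordinate("paris", True), Subordinate("tokyo", True)]
happy_result = two_phase_commit(happy)

mixed = [Subordinate("paris", True), Subordinate("tokyo", False)]
mixed_result = two_phase_commit(mixed)
```

In the second run a single no vote forces an abort, and the subordinate that had voted yes ends up with prepare, abort, end in its log.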

Restart After a Failure at a Site – 1/4

• Recovery process is invoked after a site comes back up after a crash
– reads the log and executes the commit protocol
– the coordinator or a subordinate may have a crash
– one site can be the coordinator for some transactions and a subordinate for others

Restart After a Failure at a Site – 2/4

• If we have a commit or abort log record for transaction T, but not an end record, must redo/undo T respectively
– If this site is the coordinator for T (from the log record), keep sending commit/abort messages to subs until acks received
– then write an end log record for T

Restart After a Failure at a Site – 3/4

• If we have a prepare log record for transaction T, but not commit/abort
– This site is a subordinate for T
– Repeatedly contact the coordinator to find status of T
– Then write commit/abort log record
– Redo/undo T
– and write end log record

Restart After a Failure at a Site – 4/4

• If we don’t have even a prepare log record for T
– T was not voted to commit before crash
– unilaterally abort and undo T
– write an end record
• No way to determine if this site is the coordinator or subordinate
– If this site is the coordinator, it might have sent prepare messages
– then, subs may send yes/no message – coordinator is detected – ask subordinates to abort
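The restart cases can be summarized as a decision function over the log records found for T. This is a rough sketch: the action strings are informal labels rather than real log or message formats, the function name is hypothetical, and treating an existing end record as "nothing left to do" is my assumption, since the slides do not state that case.

```python
def restart_actions(log_records, is_coordinator=False):
    """Return the recovery steps for transaction T, given the log records found."""
    records = set(log_records)
    if "end" in records:
        return []  # assumption: T already finished the protocol at this site

    # Case 2/4: a commit or abort record exists, but no end record.
    for outcome, fix in (("commit", "redo"), ("abort", "undo")):
        if outcome in records:
            actions = [fix]
            if is_coordinator:
                actions.append(f"resend {outcome} until all acks arrive")
            actions.append("write end")
            return actions

    # Case 3/4: only a prepare record, so this site is a subordinate in doubt.
    if "prepare" in records:
        return ["ask coordinator for status", "write commit/abort",
                "redo/undo", "write end"]

    # Case 4/4: not even a prepare record, so abort unilaterally.
    return ["abort", "undo", "write end"]

after_commit = restart_actions(["prepare", "commit"])
as_coordinator = restart_actions(["abort"], is_coordinator=True)
in_doubt = restart_actions(["prepare"])
nothing_logged = restart_actions([])
```

The in-doubt case is the only one where the site cannot finish alone: it has to keep asking the coordinator, which is where blocking comes from.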

Blocking

• If coordinator for transaction T fails, subordinates who have voted yes cannot decide whether to commit or abort T until coordinator recovers.
– T is blocked
– Even if all subordinates know each other (extra overhead in prepare message) they are blocked unless one of them voted no
• Note: even if all subs vote yes, the coordinator then can give a no vote, and decide later to abort!

Link and Remote Site Failures

• If a remote site does not respond during the commit protocol for transaction T, either because the site failed or the link failed:
– If the current site is the coordinator for T, should abort T
– If the current site is a subordinate, and has not yet voted yes, it should abort T
– If the current site is a subordinate and has voted yes, it is blocked until the coordinator responds
– needs to periodically contact the coordinator until receives a reply

Observations on 2PC

• Ack messages used to let coordinator know when it can “forget” a transaction; until it receives all acks, it must keep T in the transaction Table
• If coordinator fails after sending prepare messages but before writing commit/abort log records, when it recovers, it aborts the transaction
• If a subtransaction does no updates, its commit or abort status is irrelevant

Other variants of 2PC

• 2PC with presumed abort
– When coordinator aborts T, it undoes T and removes it from the transaction Table immediately (presumes abort). Doesn’t wait for acks
• 3PC
– prepare -> precommit -> commit
• Not covered in class
– discussed in the book

Team:DBMS: by Navdeep Kaur Assistant Professor Computer Science Department
No ratings yet
Team:DBMS: by Navdeep Kaur Assistant Professor Computer Science Department
19 pages
Advanced Database Chapter 6 and 7
No ratings yet
Advanced Database Chapter 6 and 7
30 pages
Introduction To DDBMS Enhanced
No ratings yet
Introduction To DDBMS Enhanced
17 pages
.Ashwani - Mishra
No ratings yet
.Ashwani - Mishra
7 pages
NoSQL & Distributed Databases Overview
No ratings yet
NoSQL & Distributed Databases Overview
124 pages
Lecture 8 - Distributed Databases
No ratings yet
Lecture 8 - Distributed Databases
4 pages
Lefikir PowerPoint
No ratings yet
Lefikir PowerPoint
15 pages
Distributed Database System
No ratings yet
Distributed Database System
9 pages
Dbms Unit5
No ratings yet
Dbms Unit5
17 pages
Distributed Database Systems Guide
No ratings yet
Distributed Database Systems Guide
24 pages
Distributed Databases
No ratings yet
Distributed Databases
46 pages
Chapter-7 Distributed Database Systems
No ratings yet
Chapter-7 Distributed Database Systems
40 pages
Advanced Data Base Management Systems
No ratings yet
Advanced Data Base Management Systems
35 pages
Distributed Databases: Indu Saini (Research Scholar) IIT Roorkee Enrollment No.: 10926003
No ratings yet
Distributed Databases: Indu Saini (Research Scholar) IIT Roorkee Enrollment No.: 10926003
14 pages
Distributed Database System
No ratings yet
Distributed Database System
4 pages
Chapter 4 - Distributed Database System
No ratings yet
Chapter 4 - Distributed Database System
52 pages
CSE 453 Slide 1
No ratings yet
CSE 453 Slide 1
46 pages
Advanced Distributed Databases
100% (1)
Advanced Distributed Databases
20 pages
Distributed Database
100% (1)
Distributed Database
24 pages
ADBS Chapter Seven
No ratings yet
ADBS Chapter Seven
22 pages
Unit 4 DDBMS
No ratings yet
Unit 4 DDBMS
58 pages
Distributed DBMS for IT Professionals
No ratings yet
Distributed DBMS for IT Professionals
46 pages
Distributed Database Essentials
No ratings yet
Distributed Database Essentials
17 pages
Midterm Elective Database Notes
No ratings yet
Midterm Elective Database Notes
14 pages
Unit - 2 (1) DBMS
No ratings yet
Unit - 2 (1) DBMS
25 pages
Unit 4 Distributed DBMS by ANS
No ratings yet
Unit 4 Distributed DBMS by ANS
12 pages
Distributed DB
No ratings yet
Distributed DB
16 pages
Adb CH 4
No ratings yet
Adb CH 4
14 pages
DB 5
No ratings yet
DB 5
17 pages
DDBS Lecture1
No ratings yet
DDBS Lecture1
24 pages
Distributed Database Systems Guide
No ratings yet
Distributed Database Systems Guide
5 pages
Distributed Database Systems Overview
No ratings yet
Distributed Database Systems Overview
22 pages
Lec 11. Distributed Database Systems
No ratings yet
Lec 11. Distributed Database Systems
11 pages
Unit 1 DISTRIBUTED DATABASE
No ratings yet
Unit 1 DISTRIBUTED DATABASE
6 pages
Unit 2-DBP
No ratings yet
Unit 2-DBP
44 pages
Types of Distributed Data Base System - 49724
No ratings yet
Types of Distributed Data Base System - 49724
37 pages
Chapter 5 - Distributed Databases Roobera
No ratings yet
Chapter 5 - Distributed Databases Roobera
58 pages
Distributed Databases
No ratings yet
Distributed Databases
55 pages
Distributed
No ratings yet
Distributed
83 pages
ADBMS
No ratings yet
ADBMS
84 pages
26 Distributed Dbms Nosql
No ratings yet



Where are we now? Reading Material

Parallel and Distributed Data

Introduction: Distributed Databases Distributed Data Independence

• Desired properties • Queries spanning multiple sites should be

Distributed Transaction Atomicity Recent Trends on These Two Properties

• These two properties are in general desirable

[Figure: DBMS1, DBMS2, DBMS3]

Distributed DBMS Architectures

• Three alternative approaches

Client-Server Systems Collaborating Server Systems

CLIENT CLIENT SERVER

• But does not require all db servers to be capable of

• The middleware layer is capable of executing joins and

• Useful when trying to integrate several “legacy systems”

Storing Data in a Distributed DBMS Fragmentation

Replication Distributed Catalog Management

Joins in a Distributed DBMS 1. Fetch As Needed

[Figure: Sailors (S), 500 pages; Reserves (R), 1000 pages]

2. Ship To One Site 3. Semijoin – 1/2

3. Semijoin – 2/2 4. Bloomjoin – 1/4

4. Bloomjoin – 2/4 4. Bloomjoin – 3/4

1. At London, compute a bit-vector of some size k:

[Figure: Sailors (S), 500 pages, at LONDON; Reserves (R), 1000 pages, at PARIS]
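The Bloomjoin steps can be illustrated with a minimal sketch. It assumes Sailors is stored at London and Reserves at Paris, that both relations join on an integer `sid`, and that the hash function is a simple modulo. The function names, the sample tuples, and the choice of `k = 8` are hypothetical and are not taken from the slides.

```python
def h(value, k):
    # Hypothetical hash function: maps a join value into the range 0..k-1.
    return value % k

def build_bit_vector(sailors, k):
    # At LONDON: set bit i if some Sailors tuple hashes to i.
    bits = [0] * k
    for sid, _name in sailors:
        bits[h(sid, k)] = 1
    return bits

def reduce_reserves(reserves, bits):
    # At PARIS: discard Reserves tuples whose join value hashes to a 0 bit.
    k = len(bits)
    return [(sid, day) for sid, day in reserves if bits[h(sid, k)] == 1]

def bloomjoin(sailors, reserves, k):
    bits = build_bit_vector(sailors, k)        # shipped LONDON -> PARIS
    reduced = reduce_reserves(reserves, bits)  # shipped PARIS -> LONDON
    # At LONDON: join Sailors with the reduced Reserves.
    names = {}
    for sid, name in sailors:
        names.setdefault(sid, []).append(name)
    return [(sid, name, day)
            for sid, day in reduced
            for name in names.get(sid, [])]

# Made-up sample data for illustration only.
sailors = [(1, "Dustin"), (2, "Lubber")]
reserves = [(1, "mon"), (3, "tue"), (9, "wed"), (2, "fri")]
result = bloomjoin(sailors, reserves, 8)
```

In this sample, the Reserves tuple with `sid = 9` hashes to the same bit as `sid = 1`, so it survives the reduction at Paris and is only eliminated by the final join at London. Such false positives cost extra shipping but never change the join result.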

4. Bloomjoin – 4/4 Distributed Query Optimization

Updating distributed data

Synchronous • w.r.t “updates”, this means transactions should be atomic

• But there are other alternatives too

Updating Distributed Data Synchronous Replication

Cost of Synchronous Replication Asynchronous Replication

Primary Site Replication Peer-to-Peer Replication

Distributed Locking Distributed Deadlock Detection

1. Centralized: One site does all locking

[Figure: SITE A, SITE B, GLOBAL]

Two-Phase Commit (2PC)

2. Subordinate receives the prepare message

3. If coordinator gets unanimous yes votes from
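The message flow of two-phase commit can be sketched in a few lines. This is a simplified, in-memory illustration only: it assumes no site or link failures and no timeouts, and it models the forced log writes as appends to a list. The class and function names are hypothetical.

```python
class Subordinate:
    def __init__(self, name, votes_yes=True):
        self.name = name
        self.votes_yes = votes_yes
        self.log = []

    def on_prepare(self):
        # Phase 1: write a prepare or abort log record, then vote.
        if self.votes_yes:
            self.log.append("prepare")
            return "yes"
        self.log.append("abort")
        return "no"

    def on_decision(self, decision):
        # Phase 2: write the coordinator's decision, then acknowledge.
        self.log.append(decision)
        return "ack"

def two_phase_commit(subordinates):
    coordinator_log = []
    # Phase 1: the coordinator sends prepare and collects the votes.
    votes = {s.name: s.on_prepare() for s in subordinates}
    # Commit only on unanimous yes votes; otherwise abort.
    decision = "commit" if all(v == "yes" for v in votes.values()) else "abort"
    coordinator_log.append(decision)
    # Phase 2: notify the yes voters; a no voter has already aborted locally.
    yes_voters = [s for s in subordinates if votes[s.name] == "yes"]
    acks = [s.on_decision(decision) for s in yes_voters]
    # After all acks arrive, the coordinator writes an end record.
    if all(a == "ack" for a in acks):
        coordinator_log.append("end")
    return decision, coordinator_log
```

Calling `two_phase_commit([Subordinate("A"), Subordinate("B", votes_yes=False)])` returns an abort decision, because a single no vote is enough to abort the whole transaction.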

When a transaction wants to commit – 5/5 Comments on 2PC

Blocking Link and Remote Site Failures

Observations on 2PC Other variants of 2PC

• If a subtransaction does no updates, its commit or
