Distributed DBMS: Announcements

This document contains lecture notes on distributed database management systems (DBMS). It discusses key topics like distributed data independence, distributed transaction atomicity, and types of distributed DBMSs. The lecture notes also provide an overview of parallel and distributed data processing, and announce that homework 3 on NoSQL and MongoDB will be released soon.


11/6/18

CompSci 516: Database Systems

Announcements
• HW3 on NoSQL and MongoDB to be released soon
  – Install the system first
  – Due two weeks after NoSQL is covered in class
  – Keep working on the project in the meantime!

Lecture 18
Distributed DBMS

Instructor: Sudeepa Roy


Duke CS, Fall 2018, CompSci 516: Database Systems

Where are we now?

We learnt:
✓ Relational Model and Query Languages (SQL, RA, RC)
✓ Postgres (DBMS) – HW1
✓ Map-reduce and Spark – HW2
✓ DBMS Internals
  ✓ Storage
  ✓ Indexing
  ✓ Query Evaluation
  ✓ Operator Algorithms
  ✓ External sort
  ✓ Query Optimization
  ✓ Database Normalization
✓ Transactions
  ✓ Basic concepts
  ✓ Concurrency control
  ✓ Recovery

Next:
• Distributed DBMS
• NOSQL
• (ARIES protocol for transactions to be covered later)

Reading Material
• [RG]
  – Parallel DBMS: Chapter 22.1-22.5
  – Distributed DBMS: Chapter 22.6-22.14
• [GUW]
  – Parallel DBMS and map-reduce: Chapter 20.1-20.2
  – Distributed DBMS: Chapter 20.3, 20.4.1-20.4.2, 20.5-20.6
• Other recommended readings:
  – Chapter 2 (Sections 1, 2, 3) of Mining of Massive Datasets, by Rajaraman and Ullman: http://i.stanford.edu/~ullman/mmds.html
  – Original Google MR paper by Jeff Dean and Sanjay Ghemawat, OSDI '04: http://research.google.com/archive/mapreduce.html

Acknowledgement:
The following slides have been created adapting the instructor material of the [RG] book provided by the authors Dr. Ramakrishnan and Dr. Gehrke.

Parallel and Distributed Data Processing
• So far, query processing on a single machine
  – Query Execution and Optimization
  – Transaction CC and Recovery
• Now: data and operation distribution
• Parallelism
  – performance
  – Parallel databases (will be covered soon)
• Data distribution
  – increased availability, e.g. when a site goes down
  – local access to distributed data (e.g. an organization may have branches in several cities)
  – analysis of distributed data
  – Distributed DBMS (today)

Topics in Distributed DBMS
• Architecture
• Data Storage
• Query Execution
• Transactions – updates
• Recovery – Two Phase Commit (2PC)
• Warning! Many new concepts and terminology ahead

Introduction: Distributed Databases
• Data is stored at several sites, each managed by a DBMS that can run independently
• Desired properties:
  1. Distributed Data Independence
  2. Distributed Transaction Atomicity

Distributed Data Independence
• Users should not have to know where data is located
  – no need to know the locations of relations, their copies, or fragments (later)
  – extends the Physical and Logical Data Independence principles
• Queries spanning multiple sites should be optimized in a cost-based manner
  – taking into account communication costs and differences in local computation costs

Distributed Transaction Atomicity
1. Users should be able to write transactions accessing multiple sites just like local transactions
2. The effects of a transaction across sites should be atomic
  – all changes persist if the transaction commits
  – none persist if the transaction aborts

Recent Trends on These Two Properties
• These two properties are in general desirable
• But they are not always efficiently achievable
  – e.g. when sites are connected by a slow long-distance network
• Sometimes they are not even desirable for globally distributed sites
  – too much administrative overhead in making the location of data "transparent" (not visible to the user)
• Therefore they are not always supported
  – Users have to be aware of where data is located
  – There is not much consensus on the design objectives of distributed databases

Types of Distributed Databases
• Homogeneous:
  – Every site runs the same type of DBMS
• Heterogeneous:
  – Different sites run different DBMSs
  – different RDBMSs or even non-relational DBMSs
  – RDBMS = Relational DBMS

More on Heterogeneous Distributed Databases
• Database servers are accessed through well-accepted, standard gateway protocols
  – these mask the differences between DBMSs (capability, data format, etc.)
  – e.g. ODBC, JDBC
• However, gateways can be expensive and may not be able to hide all differences
  – e.g. when a server is not capable of supporting distributed transaction management

[diagram: a Gateway in front of DBMS1, DBMS2, DBMS3]

Distributed DBMS Architectures
• Three alternative approaches:
  1. Client-Server
  2. Collaborating Server
  3. Middleware


Client-Server Systems
• One or more client processes (e.g. on a personal computer) and one or more server processes (e.g. on a mainframe)
  – A client process can ship a query to any server process
  – Clients are responsible for user interfaces
  – The server manages data and executes queries
• Advantages
  – clean separation and a centralized server
  – expensive server machines are not underutilized by simple user interactions
  – users can run a GUI on clients that they are familiar with
• Challenges
  – need to carefully handle communication costs, e.g. fetching tuples one at a time might be bad
  – need to do caching on the client side

Collaborating Server Systems
• Queries can span multiple sites
  – not allowed in client-server systems, as the clients would have had to break up queries and combine the results
• When a server receives a query that requires access to data at other servers
  – it generates appropriate subqueries
  – and puts the result together
• Eliminates the distinction between client and server

[diagrams: clients shipping a QUERY to servers; collaborating servers exchanging queries]

Middleware Systems
• Allows a single query to span multiple servers
• But does not require all database servers to be capable of handling multi-site execution strategies
  – needs just one database server capable of managing queries and transactions spanning multiple servers (called the middleware)
  – the remaining servers can handle only the local queries and transactions
• The middleware layer is capable of executing joins and other operations on data obtained from the other servers, but typically does not itself maintain any data
• Useful when trying to integrate several "legacy systems"
  – whose basic capabilities cannot be extended

Storing Data in Distributed DBMS

Storing Data in a Distributed DBMS
• Relations are stored across several sites
• Accessing data at a remote site incurs message-passing costs
• To reduce this overhead, a single relation may be partitioned or fragmented across several sites
  – typically at the sites where the fragments are most often accessed
• The data can be replicated as well
  – when the relation is in high demand

Fragmentation
• Break a relation into smaller relations or fragments
  – store them at different sites as needed
• Horizontal fragmentation (subsets of the tuples t1, t2, t3, t4, ...):
  – Fragments are usually disjoint
  – Each fragment can often be identified by a selection query (e.g. employees in a city – locality of reference)
  – To retrieve the full relation, take the union of the fragments
• Vertical fragmentation:
  – Fragments are identified by projection queries
  – Typically a unique TID is added to each tuple
  – TIDs are replicated in each fragment
  – This ensures that we have a Lossless Join
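The reconstruction rules above (union for horizontal fragments, TID-join for vertical fragments) can be sketched in a few lines. This is a hypothetical illustration, not slide material: the Employees relation, its attributes, and the helper names are all invented for the example.

```python
# Hypothetical illustration of horizontal and vertical fragmentation,
# using plain dicts for tuples.

employees = [
    {"tid": 1, "name": "Ana",  "city": "Durham",  "salary": 90},
    {"tid": 2, "name": "Bob",  "city": "Raleigh", "salary": 80},
    {"tid": 3, "name": "Carl", "city": "Durham",  "salary": 70},
]

# Horizontal fragmentation: each fragment is a selection; fragments are
# disjoint, and the full relation is the union of the fragments.
frag_durham  = [t for t in employees if t["city"] == "Durham"]
frag_raleigh = [t for t in employees if t["city"] != "Durham"]
assert sorted(frag_durham + frag_raleigh, key=lambda t: t["tid"]) == employees

# Vertical fragmentation: each fragment is a projection; replicating the
# TID in every fragment is what makes the join lossless.
frag_a = [{"tid": t["tid"], "name": t["name"]} for t in employees]      # at site A
frag_b = [{"tid": t["tid"], "city": t["city"], "salary": t["salary"]}   # at site B
          for t in employees]

def join_on_tid(left, right):
    """Reconstruct the relation by joining the vertical fragments on tid."""
    by_tid = {t["tid"]: t for t in right}
    return [{**l, **by_tid[l["tid"]]} for l in left]

assert join_on_tid(frag_a, frag_b) == employees
```

Dropping `tid` from either vertical fragment would make the reconstruction lossy, which is exactly the point made on the slide.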

Replication
• We may store several copies of a relation or of relation fragments
  – a fragment can be replicated at one or more sites
  – e.g. R is fragmented into R1, R2, R3; one copy each of R2 and R3, but two copies of R1 at two sites (SITE A: R1, R2; SITE B: R1, R3)
• Advantages
  – Increased availability – e.g. when a site or a communication link goes down
  – Faster query evaluation – e.g. using a local copy
• Synchronous and Asynchronous replication (later)
  – These vary in how current the different copies are when a relation is modified

Distributed Catalog Management
• Must keep track of how data is fragmented and replicated across sites
  – in addition to the usual schema, authorization, and statistical information
• Must be able to uniquely identify each replica of each fragment
  – A globally unique name may compromise the autonomy of servers
  – To preserve local autonomy: global relation name = <local-name, birth-site>
  – To identify a replica, add a replica-id field (the result is called the global replica name)
• Site Catalog: describes all objects (fragments, replicas) at a site, and keeps track of replicas of relations created at this site
  – To find a relation, look up its birth-site catalog
  – The birth-site never changes, even if the relation is moved

Distributed Query Processing
• Non-join queries
• Joins

Non-Join Distributed Queries

SELECT AVG(S.age)
FROM Sailors S
WHERE S.rating > 3
AND S.rating < 7

Sailors(tid, sid, sname, rating, age); e.g. tuple T1 with rating 4 is stored at Shanghai, and tuples T2 (rating 5) and T3 (rating 9) are stored at Tokyo.

• Horizontally Fragmented: tuples with rating < 5 at Shanghai, rating >= 5 at Tokyo
  – Must compute SUM(age) and COUNT(age) at both sites
  – If the WHERE clause contained just S.rating > 6, only one site would be needed
• Vertically Fragmented: sid and rating at Shanghai, sname and age at Tokyo, tid at both
  – Must reconstruct the relation by a join on tid, then evaluate the query
  – if there were no tid, the decomposition would be lossy
• Replicated: copies of Sailors at both sites
  – Choice of site is based on local costs (e.g. availability of an index) and shipping costs
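The horizontally fragmented case above relies on AVG not being directly decomposable: each site instead returns a partial (SUM, COUNT) pair, which the query site combines. A minimal sketch, with tuple contents assumed for illustration:

```python
# Sketch of distributed AVG(age) over horizontal fragments
# (rating < 5 at Shanghai, rating >= 5 at Tokyo); the tuples are
# illustrative assumptions consistent with the example above.

shanghai = [{"sid": 4, "rating": 4, "age": 30.0}]                 # rating < 5
tokyo    = [{"sid": 5, "rating": 5, "age": 40.0},
            {"sid": 9, "rating": 9, "age": 50.0}]                 # rating >= 5

def partial(fragment):
    """Run locally at each site: SUM(age), COUNT(age) for 3 < rating < 7."""
    qualifying = [t["age"] for t in fragment if 3 < t["rating"] < 7]
    return sum(qualifying), len(qualifying)

s1, c1 = partial(shanghai)   # only the rating-4 tuple qualifies here
s2, c2 = partial(tokyo)      # only the rating-5 tuple qualifies here
avg_age = (s1 + s2) / (c1 + c2)
print(avg_age)  # (30 + 40) / 2 = 35.0
```

Only two numbers are shipped per site, regardless of fragment size, which is why the partial-aggregate decomposition is worthwhile.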


Joins in a Distributed DBMS
• Can be very expensive if relations are stored at different sites
• Four strategies:
  1. Fetch as needed
  2. Ship to one site
  3. Semi-join
  4. Bloom join

Running example: Sailors (S), 500 pages, stored at LONDON; Reserves (R), 1000 pages, stored at PARIS.

1. Fetch As Needed
• Page-oriented Nested Loop Join with Sailors as the outer relation: for each page of S, fetch all pages of R from Paris
  – if R is cached at London, each R page is fetched only once
  – otherwise, cost: 500 d + 500 * 1000 (d + s)
    – d is the cost to read/write a page
    – s is the cost to ship a page
  – If the query was not submitted at London, we must add the cost of shipping the result to the query site
  – Can also do an Index Nested Loop join at London, fetching matching Reserves tuples to London as needed
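The cost formula above is easy to check mechanically. A small sketch; the concrete values of d and s are illustrative assumptions, since the slide leaves them symbolic:

```python
# Fetch-as-needed cost for a page-oriented NL join, S outer, R not cached:
# read every S page once, and for each S page ship + read all of R.
# Parameters from the slide: S = 500 pages (London), R = 1000 pages (Paris).

def fetch_as_needed_cost(pages_s, pages_r, d, s):
    """d = per-page read/write cost, s = per-page shipping cost."""
    return pages_s * d + pages_s * pages_r * (d + s)

d, s = 1, 10  # illustrative unit costs (assumed, not from the slides)
print(fetch_as_needed_cost(500, 1000, d, s))  # 500*1 + 500000*11 = 5500500
```

The shipping term dominates whenever s is large relative to d, which motivates the ship-to-one-site and semijoin strategies that follow.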



2. Ship To One Site
• Ship Sailors (S) to Paris and join there
  – Cost: 500 (2d + s) + 4500 d
  – For relation S: reading at London, shipping to Paris, and saving at Paris: 500 (2d + s)
  – Assume a Sort-Merge Join with cost 3(M + N), i.e. enough memory; then the join cost = 3 * (500 + 1000) d = 4500 d
  – If the result size is very large, it may be better to ship both relations to the result site and then join them
• Not all tuples in S join with a tuple in R
  – unnecessary shipping
  – solution: Semi-join

3. Semijoin – 1/2
• Suppose we want to ship R to London and then join with S at London. Instead:
  1. At London, project S onto the join columns and ship this projection to Paris
     – here the join columns are foreign keys, but the join could be arbitrary
  2. At Paris, join the S-projection with R
     – the result is called the reduction of Reserves w.r.t. Sailors (only these tuples are needed)
  3. Ship the reduction of R back to London
  4. At London, join S with the reduction of R
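The four semijoin steps can be traced with toy relations. This is a hypothetical sketch: the relation contents are invented, and "shipping" is just passing a value between steps.

```python
# Semijoin sketch: Sailors at London, Reserves at Paris (toy contents).

sailors  = [{"sid": 1, "sname": "a"}, {"sid": 2, "sname": "b"}]         # London
reserves = [{"sid": 1, "bid": 101}, {"sid": 1, "bid": 102},
            {"sid": 7, "bid": 103}]                                     # Paris

# Step 1 (London): project S onto the join column; ship the projection.
s_projection = {t["sid"] for t in sailors}

# Step 2 (Paris): join the projection with R -- the "reduction" of R
# w.r.t. S; only these tuples can contribute to the final join.
reduction = [r for r in reserves if r["sid"] in s_projection]

# Steps 3-4 (London): ship the reduction back and join it with S.
result = [{**s, **r} for s in sailors for r in reduction
          if s["sid"] == r["sid"]]
print(len(reduction), len(result))  # 2 tuples shipped instead of 3; 2 results
```

The tuple with sid 7 never crosses the network, which is the saving the semijoin buys at the price of shipping the projection first.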



3. Semijoin – 2/2
• Tradeoff: the cost of computing and shipping the projection versus the cost of shipping the full R relation
• Especially useful if there is a selection on Sailors and the answer is desired at London

4. Bloomjoin – 1/4
• Similar idea to semi-join
• Suppose we want to ship R to London and then join with S at London (as in semijoin)


4. Bloomjoin – 2/4 and 3/4
1. At London, compute a bit-vector of some size k:
   – Hash the join-column values into the range 0 to k-1
   – If some tuple hashes to p, set bit p to 1 (p from 0 to k-1)
   – Ship the bit-vector to Paris
2. At Paris, hash each tuple of R similarly
   – discard tuples that hash to a 0 in S’s bit-vector
   – Result is called the reduction of R w.r.t. S
3. Ship the “bit-vector-reduced” R to London
4. At London, join S with the reduced R
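The four Bloomjoin steps above can be sketched with a k-bit vector. A hedged illustration: the bit-vector size, hash function, and relation contents are all assumptions made for the example.

```python
# Bloomjoin sketch: hash S's join values into a k-bit vector at London,
# ship the vector to Paris, filter R there, ship the survivors back.

K = 8  # bit-vector size (assumed for illustration)

def h(value):
    return hash(value) % K

sailors  = [{"sid": 1}, {"sid": 2}]                                  # London
reserves = [{"sid": 1, "bid": 101}, {"sid": 7, "bid": 103}]          # Paris

# Step 1 (London): set bit h(v) for every join value v, ship the bits.
bits = [0] * K
for t in sailors:
    bits[h(t["sid"])] = 1

# Step 2 (Paris): keep only R tuples whose bit is set -- the
# "bit-vector reduction" of R.  False positives are possible (two
# values may share a bit), so this reduction can be larger than the
# semijoin reduction; false negatives are not.
reduced_r = [r for r in reserves if bits[h(r["sid"])] == 1]

# Steps 3-4 (London): ship reduced R back and join with S.
result = [(s["sid"], r["bid"]) for s in sailors for r in reduced_r
          if s["sid"] == r["sid"]]
print(result)  # [(1, 101)] -- sid 7 is filtered out at Paris
```

Any false positives that slip through the filter are harmless: they are eliminated by the real join at London, at the cost of some extra shipping.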


4. Bloomjoin – 4/4
• The bit-vector is cheaper to ship than the projection, and almost as effective
  – but the reduction of R shipped back can be larger. Why? Because different join values can hash to the same bit, so R tuples that do not actually join with S may survive the filter
Distributed Query Optimization
• Cost-based approach
  – consider all plans
  – pick the cheapest
• Similar to centralized optimization, but with differences:
  1. Communication costs must be considered
  2. Local site autonomy must be respected
  3. New distributed join methods
• The query site constructs a global plan, with suggested local plans describing the processing at each site
  – If a site can improve its suggested local plan, it is free to do so

Distributed Transactions: Updating Distributed Data
(Synchronous and Asynchronous Replication)

Updating Distributed Data
• The classical view says that updating distributed data should look the same as in a centralized DBMS from the user's viewpoint, with distribution addressed at the implementation level
  – so far, we had this w.r.t. "queries"
  – w.r.t. "updates", this means transactions should be atomic regardless of data fragmentation and replication
• But there are other alternatives too

Updating Distributed Data
• Synchronous Replication: all copies of a modified relation (or fragment) must be updated before the modifying transaction commits
  – Data distribution is made "transparent" (not visible!) to users
• Asynchronous Replication: copies of a modified relation are only periodically updated; different copies may get out of sync in the meantime
  – Users must be aware of data distribution
  – More efficient – many current products follow this approach

Synchronous Replication
• Voting: a transaction must write a majority of copies to modify an object, and must read enough copies to be sure of seeing at least one most recent copy
  – E.g., 10 copies; 7 written for an update; 4 copies read (why 4? any 4 copies must overlap the 7 just written)
  – Each copy has a version number – the copy with the highest version number is current
  – Usually not attractive because reads are common
• Read-any Write-all: read any one copy, write all copies
  – Writes are slower and reads are faster, relative to Voting
  – Most common approach to synchronous replication
  – A special case of voting (why? the read quorum is 1 and the write quorum is all copies)
• The choice of technique determines which locks to set
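The voting example above is an instance of the standard quorum-overlap condition: with n copies, a write quorum w, and a read quorum r, reads are guaranteed to see a current copy exactly when r + w > n. A one-function check (the condition itself is standard; the framing as code is just for illustration):

```python
# Quorum overlap: any r copies must intersect the w most recently
# written copies, which holds iff r + w > n.

def quorums_overlap(n, w, r):
    """True iff every read quorum of size r sees at least one copy
    from the latest write quorum of size w, out of n copies."""
    return r + w > n

# Slide example: 10 copies, 7 written per update => reading 4 suffices,
# since 4 + 7 > 10; reading 3 does not, since all 3 might be stale.
print(quorums_overlap(10, 7, 4))   # True
print(quorums_overlap(10, 7, 3))   # False
# Read-any Write-all is the special case r = 1, w = n:
print(quorums_overlap(10, 10, 1))  # True
```

This also shows the trade-off: shrinking the write quorum to speed up updates forces a larger read quorum, and vice versa.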

Cost of Synchronous Replication
• Before an update transaction can commit, it must obtain locks on all modified copies
  – It sends lock requests to remote sites and, while waiting for the responses, holds on to its other locks
  – If sites or links fail, the transaction cannot commit until they are back up
  – Even if there is no failure, committing must follow an expensive commit protocol with many messages (later)
• So the alternative of asynchronous replication is becoming widely used

Asynchronous Replication
• Allows the modifying transaction to commit before all copies have been changed
  – readers nonetheless look at just one copy
  – Users must be aware of which copy they are reading, and that copies may be out of sync for short periods of time
• Two approaches: Primary Site and Peer-to-Peer replication
  – The difference lies in how many copies are "updatable" or "master copies"

Primary Site Replication
• Exactly one copy of a relation is designated the primary or master copy
  – Replicas at other sites cannot be directly updated
  – The primary copy is published
  – Other sites subscribe to this relation (or its fragments); these are secondary copies
• How are changes to the primary copy propagated to the secondary copies?
  – Done in two steps
  – First, "capture" the changes made by committed transactions
  – Then, "apply" these changes
  – more details in the [RG] book (optional reading)

Peer-to-Peer Replication
• More than one of the copies of an object can be a master
• Changes to a master copy must be propagated to the other copies somehow
• If two master copies are changed in a conflicting manner, conflict resolution is needed
  – e.g., Site 1: Joe's age changed to 35; Site 2: to 36
• Best used when conflicts do not arise:
  – E.g., each master site owns a disjoint fragment
  – E.g., updating rights are held by one master at a time, then propagated to the other sites

Distributed Transactions
• Distributed CC (concurrency control)
  – How can locks for objects stored across several sites be managed?
  – How can deadlocks be detected in a distributed database?
• Distributed Recovery
  – When a transaction commits, all its actions, across all the sites at which it executes, must persist
  – When a transaction aborts, none of its actions must be allowed to persist


Distributed Locking
• How do we manage locks for objects across many sites?
1. Centralized: one site does all locking
  – Vulnerable to single-site failure
2. Primary Copy: all locking for an object is done at the primary copy site for that object
  – Reading requires access to the locking site as well as to the site where the object copy is stored
3. Fully Distributed: locking for a copy is done at the site where the copy is stored
  – Locks are taken at all sites when writing an object (unlike the previous two)

Distributed Deadlock Detection
• Each site maintains a local waits-for graph
• A global deadlock might exist even if the local graphs contain no cycles
  – e.g. T1 waits for T2 at SITE A, and T2 waits for T1 at SITE B: no local cycle, but a cycle in the GLOBAL waits-for graph
• Further, phantom deadlocks may be created while communicating
  – due to delay in propagating local information
  – might lead to unnecessary aborts
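The SITE A / SITE B example above can be reproduced concretely: union the local waits-for edges at one site and look for a cycle. A sketch of the centralized approach, with a plain DFS cycle check (the graph representation is an illustrative assumption):

```python
# Centralized distributed deadlock detection sketch: each site ships its
# local waits-for edges to one site, which unions them and checks the
# global graph for a cycle.

def has_cycle(edges):
    """Detect a cycle in a directed waits-for graph via DFS coloring."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def dfs(u):
        color[u] = GRAY
        for v in graph.get(u, []):
            if color.get(v, WHITE) == GRAY:
                return True              # back edge => cycle
            if color.get(v, WHITE) == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False

    nodes = {u for e in edges for u in e}
    return any(color.get(u, WHITE) == WHITE and dfs(u) for u in nodes)

# The slide's example: no local cycle, but a global one.
site_a = [("T1", "T2")]          # at SITE A, T1 waits for T2
site_b = [("T2", "T1")]          # at SITE B, T2 waits for T1
print(has_cycle(site_a), has_cycle(site_b))   # False False
print(has_cycle(site_a + site_b))             # True
```

The phantom-deadlock caveat on the slide corresponds to running this check on stale edge sets: an edge may already have disappeared at its site by the time the central site processes it.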


Three Distributed Deadlock Detection Approaches
1. Centralized
  • periodically send all local graphs to one site
  • a global waits-for graph is generated there
2. Hierarchical
  • organize sites into a hierarchy and send local graphs to the parent in the hierarchy
  • e.g. sites (every 10 sec) -> sites in a state (every min) -> sites in a country (every 10 min) -> global waits-for graph
  • intuition: deadlocks are more likely across closely related sites
3. Timeout
  • abort a transaction if it waits too long (low overhead)

Distributed Recovery
• Two new issues:
  – New kinds of failures, e.g., of links and remote sites
  – If "sub-transactions" of a transaction execute at different sites, all or none must commit
    – Need a commit protocol to achieve this
    – Most widely used: Two Phase Commit (2PC)
• A log is maintained at each site
  – as in a centralized DBMS
  – commit protocol actions are additionally logged


Two-Phase Commit (2PC)
• The site at which the transaction originates is the coordinator
• The other sites at which it executes are subordinates
  – w.r.t. the coordination of this transaction

Example on whiteboard


When a transaction wants to commit – 1/5 and 2/5
1. The coordinator sends a prepare message to each subordinate
2. A subordinate, on receiving the prepare message:
  a) decides whether to abort or commit its subtransaction
  b) force-writes an abort or prepare log record
  c) then sends a no or yes message to the coordinator

When a transaction wants to commit – 3/5 and 4/5
3. If the coordinator gets unanimous yes votes from all subordinates:
  a) it force-writes a commit log record
  b) then sends a commit message to all subordinates
  Else (if it receives a no message, or no response from some subordinate):
  a) it force-writes an abort log record
  b) then sends abort messages
4. Subordinates force-write an abort/commit log record based on the message they get
  a) then send an ack message to the coordinator
  b) if commit was received, commit the subtransaction
  c) write an end record

When a transaction wants to commit – 5/5
5. After the coordinator receives acks from all subordinates, it writes an end log record

The transaction is officially committed when the coordinator's commit log record reaches the disk
– subsequent failures cannot affect the outcome

Comments on 2PC
• Two rounds of communication
  – first, voting; then, termination
  – both initiated by the coordinator
• Any site (coordinator or subordinate) can unilaterally decide to abort a transaction
  – but unanimity/consensus is needed to commit
• Every message reflects a decision by the sender
  – to ensure that this decision survives failures, it is first recorded in the local log and force-written to disk
• All commit protocol log records for a transaction contain the tid and the coordinator-id
  – The coordinator's abort/commit record also includes the ids of all subordinates
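The voting and termination rounds described above can be traced in a toy simulation. This is an assumption-level sketch, not the protocol implementation: in-memory lists stand in for force-written logs, and failures and acks are not modeled beyond the message order.

```python
# Minimal 2PC message-flow sketch: phase 1 collects votes, phase 2
# requires unanimity to commit; each step appends its log records.

def two_phase_commit(coordinator_log, sub_votes):
    """sub_votes: {site: 'yes'|'no'}, the vote each subordinate makes
    on receiving 'prepare'.  Returns the global outcome."""
    logs = {site: [] for site in sub_votes}
    # Phase 1 (voting): coordinator sends 'prepare'; each subordinate
    # force-writes a prepare or abort record and replies yes/no.
    for site, vote in sub_votes.items():
        logs[site].append("prepare" if vote == "yes" else "abort")
    # Phase 2 (termination): unanimous yes => commit, else abort;
    # the coordinator force-writes its decision before sending it.
    outcome = "commit" if all(v == "yes" for v in sub_votes.values()) else "abort"
    coordinator_log.append(outcome)
    for site, vote in sub_votes.items():
        if logs[site][-1] != "abort":      # no-voters already decided
            logs[site].append(outcome)     # force-write, then send ack
    coordinator_log.append("end")          # after all acks received
    return outcome

log = []
print(two_phase_commit(log, {"S1": "yes", "S2": "yes"}))  # commit
print(two_phase_commit([], {"S1": "yes", "S2": "no"}))    # abort
```

Note how a single no vote flips the global outcome to abort even though the other subordinate has already force-written prepare; that subordinate is exactly the one that can end up blocked if the coordinator fails, as discussed below.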

Restart After a Failure at a Site – 1/4 and 2/4
• The recovery process is invoked after a site comes back up from a crash
  – it reads the log and executes the commit protocol
  – the coordinator or a subordinate may have crashed
  – one site can be the coordinator for some transactions and a subordinate for others
• If we have a commit or abort log record for transaction T, but not an end record, we must redo/undo T respectively
  – If this site is the coordinator for T (from the log record), keep sending commit/abort messages to the subordinates until acks are received
  – then write an end log record for T

Restart After a Failure at a Site – 3/4 and 4/4
• If we have a prepare log record for transaction T, but no commit/abort record:
  – This site is a subordinate for T
  – Repeatedly contact the coordinator to find the status of T
  – Then write a commit/abort log record, redo/undo T, and write an end log record
• If we don't have even a prepare log record for T:
  – T had not voted to commit before the crash
  – unilaterally abort and undo T
  – write an end record
  – There is no way to determine whether this site is the coordinator or a subordinate for T
  – If this site is the coordinator, it might have sent prepare messages; then the subordinates may send yes/no messages, the coordinator is detected, and it asks the subordinates to abort

Blocking
• If the coordinator for transaction T fails, subordinates who have voted yes cannot decide whether to commit or abort T until the coordinator recovers
  – T is blocked
  – Even if all subordinates know each other (extra overhead in the prepare message), they are blocked unless one of them voted no
• Note: even if all subordinates vote yes, the coordinator can still give a no vote, and decide later to abort!

Link and Remote Site Failures
• If a remote site does not respond during the commit protocol for transaction T, either because the site failed or the link failed:
  – If the current site is the coordinator for T, it should abort T
  – If the current site is a subordinate, and has not yet voted yes, it should abort T
  – If the current site is a subordinate and has voted yes, it is blocked until the coordinator responds
    – it needs to periodically contact the coordinator until it receives a reply


Observations on 2PC
• Ack messages are used to let the coordinator know when it can "forget" a transaction; until it receives all acks, it must keep T in the transaction table
• If the coordinator fails after sending prepare messages but before writing commit/abort log records, it aborts the transaction when it recovers
• If a subtransaction does no updates, its commit or abort status is irrelevant

Other variants of 2PC
• 2PC with presumed abort
  – When the coordinator aborts T, it undoes T and removes it from the transaction table immediately (abort is presumed); it doesn't wait for acks
• 3PC
  – prepare -> precommit -> commit
• Not covered in class
  – discussed in the book