Unit 1
Distributed Databases: Architecture, Fragmentation, Query Processing, Transaction Processing, Concurrency Control, Recovery
1.1 DISTRIBUTED DATABASES VS CONVENTIONAL DATABASES
Advantages over a conventional (centralised) database:
o Mimics the organisational structure: data is accessed locally, with local autonomy, without excluding access from elsewhere
o Cheaper to create and easier to expand
o Improved availability, reliability and performance by removing reliance on a central site
o Reduced communication overhead: most data access is local, which is less expensive and performs better
o Improved processing power: many machines handle the database rather than a single server
Disadvantages:
o More complex to implement and more costly to maintain
o Security and integrity control are harder
o Standards and experience are lacking
o Design issues are more complex
1.2 DISTRIBUTED DATABASE ARCHITECTURE
The architecture defines the structure of the system:
o components identified
o functions of each component defined
o interrelationships and interactions between components defined
What is a distributed database? A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network. A distributed DBMS (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.
DDBS = DB + communication, non-centralised. The DDBMS:
o is motivated by the need to integrate operational data and to provide controlled access
o manages the distributed database
o makes the distribution transparent to the user
A centralized DBMS on a network, by contrast, keeps all the data at a single site.
[Figure: a centralized DBMS on a network (all data at one site, accessed from sites 1-4) contrasted with a distributed DBMS (data stored at the sites themselves)]
Implicit Assumptions
o Data is stored at a number of sites; each site logically consists of a single processor. Processors at different sites are interconnected by a computer network (no multiprocessors, i.e. not parallel database systems)
o A distributed database is a database, not a collection of files; data is logically related, as exhibited in the users' access patterns (relational data model)
o A D-DBMS is a full-fledged DBMS: not a remote file system, not a TP system
Dimensions of the Problem
Distribution
o Whether the components of the system are located on the same machine or not
Heterogeneity
o Various levels (hardware, communications, operating system)
o The DBMS level is the important one: data model, query language, transaction management algorithms
Autonomy
o Not well understood and most troublesome
o Various versions:
  - Design autonomy: ability of a component DBMS to decide on issues related to its own design
  - Communication autonomy: ability of a component DBMS to decide whether and how to communicate with other DBMSs
  - Execution autonomy: ability of a component DBMS to execute local operations in any manner it wants to
Issues of a DDBMS
Data allocation: where to locate data and whether to replicate it
Data fragmentation: partition the database
Distributed catalog management
Distributed transactions
Distributed queries
Making all of the above transparent to the user is the key to DDBMSs.
Replication
If a site (or network path) fails, the data held there is unavailable, so consider replication (duplication) of data to improve availability. Three schemes (sketched below):
o No replication: disjoint fragments
o Partial replication: depends on the site
o Full replication: every site has a copy of all data; slows down updates (every copy must be kept consistent) and is expensive
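A minimal sketch of the three schemes as fragment-to-sites allocation maps; the fragment and site names are invented for illustration.

# Replication schemes as allocation maps: fragment -> sites holding a copy.
FRAGMENTS = ["F1", "F2", "F3"]
SITES = ["S1", "S2", "S3"]

no_replication = {"F1": ["S1"], "F2": ["S2"], "F3": ["S3"]}           # disjoint fragments
partial_replication = {"F1": ["S1", "S2"], "F2": ["S2"], "F3": ["S1", "S3"]}
full_replication = {f: list(SITES) for f in FRAGMENTS}                # every site holds everything

def update_fanout(allocation, fragment):
    """Every copy must be updated for consistency, so cost grows with replica count."""
    return len(allocation[fragment])

for name, alloc in [("none", no_replication), ("partial", partial_replication),
                    ("full", full_replication)]:
    print(name, "update fan-out for F1:", update_fanout(alloc, "F1"))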
Advantages of distributed databases
o Capacity and incremental growth
o Increased reliability and availability
o Modularity
o Reduced communication overhead
o Protection of valuable data
o Efficiency and flexibility
Disadvantages of distributed databases
o DDB design is more complex (fragmentation and replication); extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent
o Economics
o Concurrency control
o Inexperience
o Security
o Difficulty of maintaining integrity
Applications
o Manufacturing, especially multi-plant manufacturing
o Military command and control
o Electronic funds transfer and electronic trading
o Corporate MIS
o Airline reservations
o Hotel chains
o Any organization which has a decentralized organizational structure
1.3 FRAGMENTATION
Distribution Design Issues
o Why fragment at all?
o How to fragment?
o How much to fragment?
o How to test correctness?
o How to allocate?
o What are the information requirements?
Can't we just distribute relations? What is a reasonable unit of distribution?
o If the unit is the relation:
  - application views are usually subsets of relations, so locality of access is defined on subsets
  - shipping whole relations causes extra communication
o If the unit is the fragment (sub-relation):
  - allows concurrent execution of a number of transactions that access different portions of a relation
  - but views that cannot be defined on a single fragment will require extra processing
  - and semantic data control (especially integrity enforcement) becomes more difficult
Types of Fragmentation (sketched below)
o Horizontal Fragmentation (HF): splitting the database by rows, e.g. names A-J at site 1, K-S at site 2 and T-Z at site 3
  - Primary Horizontal Fragmentation (PHF)
  - Derived Horizontal Fragmentation (DHF)
o Vertical Fragmentation (VF): splitting the database by columns/fields, e.g. columns 1-3 at site A and columns 4-6 at site B; the primary key is replicated to all sites
o Hybrid Fragmentation: horizontal and vertical fragmentation can be combined
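A minimal sketch of horizontal and vertical fragmentation on an in-memory relation; the EMP rows and attribute names are invented for illustration.

# Horizontal fragmentation: split by rows using a predicate (here: first letter of ename).
EMP = [
    {"eno": 1, "ename": "Adams",  "city": "Montreal"},
    {"eno": 2, "ename": "King",   "city": "Paris"},
    {"eno": 3, "ename": "Torres", "city": "New York"},
]

def horizontal(rel, lo, hi):
    return [t for t in rel if lo <= t["ename"][0] <= hi]

site1 = horizontal(EMP, "A", "J")
site2 = horizontal(EMP, "K", "S")
site3 = horizontal(EMP, "T", "Z")

# Vertical fragmentation: split by columns, replicating the primary key (eno)
# to every fragment so the relation can later be reconstructed by a join.
def vertical(rel, attrs, key="eno"):
    return [{key: t[key], **{a: t[a] for a in attrs}} for t in rel]

frag_a = vertical(EMP, ["ename"])
frag_b = vertical(EMP, ["city"])
print(site1, site2, site3)
print(frag_a, frag_b)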
PHF Example
Two applications: (1) find the name and budget of projects given their project number, issued at three sites; (2) access project information according to budget, one application accessing projects with BUDGET ≤ 200000 and the other with BUDGET > 200000.
o Simple predicates
  - For application (1): p1 : LOC = "Montreal", p2 : LOC = "New York", p3 : LOC = "Paris"
  - For application (2): p4 : BUDGET ≤ 200000, p5 : BUDGET > 200000
o Pr = Pr' = {p1, p2, p3, p4, p5}
o Minterm fragments left after elimination (see the sketch below):
  - m1 : (LOC = "Montreal") ∧ (BUDGET ≤ 200000)
  - m2 : (LOC = "Montreal") ∧ (BUDGET > 200000)
  - m3 : (LOC = "New York") ∧ (BUDGET ≤ 200000)
  - m4 : (LOC = "New York") ∧ (BUDGET > 200000)
  - m5 : (LOC = "Paris") ∧ (BUDGET ≤ 200000)
  - m6 : (LOC = "Paris") ∧ (BUDGET > 200000)
PHF Correctness
o Completeness: since Pr' is complete and minimal, the selection predicates are complete
o Reconstruction: if relation R is fragmented into FR = {R1, R2, ..., Rr}, then R = ∪ Ri, Ri ∈ FR
o Disjointness: the minterm predicates that form the basis of fragmentation should be mutually exclusive
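A minimal sketch of primary horizontal fragmentation by the minterm predicates above; the PROJ tuples are invented sample data.

PROJ = [
    {"pno": "P1", "pname": "Instrumentation", "budget": 150000, "loc": "Montreal"},
    {"pno": "P2", "pname": "DB Development",  "budget": 135000, "loc": "New York"},
    {"pno": "P3", "pname": "CAD/CAM",         "budget": 250000, "loc": "Paris"},
]

# Minterms m1..m6: one conjunction of simple predicates per fragment.
minterms = {
    "m1": lambda t: t["loc"] == "Montreal" and t["budget"] <= 200000,
    "m2": lambda t: t["loc"] == "Montreal" and t["budget"] >  200000,
    "m3": lambda t: t["loc"] == "New York" and t["budget"] <= 200000,
    "m4": lambda t: t["loc"] == "New York" and t["budget"] >  200000,
    "m5": lambda t: t["loc"] == "Paris"    and t["budget"] <= 200000,
    "m6": lambda t: t["loc"] == "Paris"    and t["budget"] >  200000,
}

fragments = {m: [t for t in PROJ if p(t)] for m, p in minterms.items()}

# Completeness + disjointness: every tuple lands in exactly one fragment.
assert sorted(t["pno"] for f in fragments.values() for t in f) == \
       sorted(t["pno"] for t in PROJ)
print({m: [t["pno"] for t in f] for m, f in fragments.items()})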
Derived Horizontal Fragmentation (DHF)
Given a link L where owner(L) = S and member(L) = R, the derived horizontal fragments of R are defined as Ri = R ⋉ Si, 1 ≤ i ≤ w, where w is the maximum number of fragments that will be defined on R, Si = σFi(S), and Fi is the formula according to which the primary horizontal fragment Si is defined.
DHF Correctness
o Completeness: referential integrity. Let R be the member of a link whose owner is relation S, which is fragmented as FS = {S1, S2, ..., Sn}, and let A be the join attribute between R and S. Then, for each tuple t of R, there should be a tuple t' of S such that t[A] = t'[A]
o Reconstruction: same as for primary horizontal fragmentation
o Disjointness: requires simple join graphs between the owner and the member fragments
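A minimal sketch of derived horizontal fragmentation: the member relation EMP is fragmented by a semijoin with each owner fragment of PAY. The relations and values are invented; "title" plays the role of the join attribute A.

PAY = [{"title": "Elect. Eng.", "sal": 40000}, {"title": "Programmer", "sal": 24000}]
EMP = [{"eno": 1, "ename": "Adams",  "title": "Elect. Eng."},
       {"eno": 2, "ename": "King",   "title": "Programmer"},
       {"eno": 3, "ename": "Torres", "title": "Elect. Eng."}]

# Primary horizontal fragments of the owner S: Si = sigma_Fi(PAY).
S1 = [t for t in PAY if t["sal"] <= 30000]
S2 = [t for t in PAY if t["sal"] >  30000]

def semijoin(member, owner_frag, attr):
    """R semijoin Si: keep member tuples whose join attribute matches some owner tuple."""
    keys = {t[attr] for t in owner_frag}
    return [t for t in member if t[attr] in keys]

R1 = semijoin(EMP, S1, "title")   # employees whose pay record falls in S1
R2 = semijoin(EMP, S2, "title")
print([t["ename"] for t in R1], [t["ename"] for t in R2])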
Vertical Fragmentation
Vertical fragmentation has been studied within the centralized context:
o design methodology
o physical clustering
It is more difficult than horizontal fragmentation, because more alternatives exist. Two approaches:
o grouping attributes into fragments: produces overlapping fragments
o splitting the relation into fragments: produces non-overlapping fragments
We do not consider the replicated key attributes to be overlapping. Advantage of non-overlapping fragments: easier to enforce functional dependencies (for integrity checking etc.)
VF Information Requirements
Application information:
o Attribute affinities: a measure that indicates how closely related the attributes are; obtained from more primitive usage data
o Attribute usage values: given a set of queries Q = {q1, q2, ..., qq} that will run on the relation R[A1, A2, ..., An], use(qi, Aj) = 1 if attribute Aj is referenced by query qi, and 0 otherwise (see the sketch below)
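A minimal sketch computing attribute usage values and a simple attribute affinity measure (the summed frequency of queries that reference both attributes); the queries, attributes and access frequencies are invented.

attrs = ["A1", "A2", "A3", "A4"]
# use[qi][Aj] = 1 if query qi references attribute Aj, else 0.
use = {
    "q1": {"A1": 1, "A2": 0, "A3": 1, "A4": 0},
    "q2": {"A1": 0, "A2": 1, "A3": 0, "A4": 1},
    "q3": {"A1": 1, "A2": 0, "A3": 1, "A4": 0},
}
freq = {"q1": 15, "q2": 20, "q3": 10}   # total access frequency of each query

def affinity(ai, aj):
    """aff(Ai, Aj): how often the two attributes are accessed together."""
    return sum(freq[q] for q in use if use[q][ai] and use[q][aj])

for ai in attrs:                          # print the attribute affinity (AA) matrix
    print([affinity(ai, aj) for aj in attrs])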
Two problems:
o Cluster forming in the middle of the CA matrix
  - shift a row up and a column left and apply the algorithm to find the best partitioning point
  - do this for all possible shifts
  - cost: O(m^2)
o More than two clusters
  - m-way partitioning
  - try 1, 2, ..., m-1 split points along the diagonal and try to find the best point for each of these
  - cost: O(2^m)
VF Correctness
A relation R, defined over attribute set A and key K, generates the vertical partitioning FR = {R1, R2, ..., Rr}.
o Completeness: the following should be true for A: A = ∪ ARi (the union of the attribute sets of the fragments)
o Reconstruction: reconstruction can be achieved by R = ⋈K Ri, ∀Ri ∈ FR (a join on the key K; see the sketch below)
o Disjointness: TIDs are not considered to be overlapping since they are maintained by the system; duplicated keys are not considered to be overlapping
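A minimal sketch of the VF reconstruction condition: joining the vertical fragments on the key K recovers the original relation. The sample relation and attribute names are invented.

R = [{"k": 1, "a": "x", "b": 10}, {"k": 2, "a": "y", "b": 20}]
R1 = [{"k": t["k"], "a": t["a"]} for t in R]   # fragment with attributes {K, a}
R2 = [{"k": t["k"], "b": t["b"]} for t in R]   # fragment with attributes {K, b}

def key_join(r1, r2, key="k"):
    """R = R1 join R2 on the replicated key."""
    index = {t[key]: t for t in r2}
    return [{**t, **index[t[key]]} for t in r1 if t[key] in index]

assert key_join(R1, R2) == R   # completeness and reconstruction hold for this split
print(key_join(R1, R2))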
1.4 QUERY PROCESSING
Query processing using a client-server architecture (sketched below):
o the user creates a query
o the client parses it and sends it to the server(s), e.g. as SQL
o the servers return the appropriate tables
o the client combines them into one table
The issue is the cost of data transfer over the network: optimise the query to transfer the least amount of data.
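A minimal sketch of this flow; send_to_site is a stand-in for a real network call, and the per-server data is invented.

SITE_DATA = {  # horizontal fragments held by each server (illustrative)
    "server1": [{"name": "Adams"}],
    "server2": [{"name": "King"}, {"name": "Torres"}],
}

def send_to_site(site, sql):
    # Stand-in: a real client would ship the SQL text over the network.
    return SITE_DATA[site]

def run_query(sql, sites):
    tables = [send_to_site(s, sql) for s in sites]   # servers return tables
    return [row for t in tables for row in t]        # client combines them into one

print(run_query("SELECT name FROM emp", ["server1", "server2"]))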
Query Processing Components
o Query language that is used: SQL ("intergalactic dataspeak")
o Query execution methodology: the steps that one goes through in executing high-level (declarative) user queries
o Query optimization: how do we determine the "best" execution plan?
Query Optimization Objectives
Minimize a cost function: I/O cost + CPU cost + communication cost. These components might have different weights in different distributed environments (a cost sketch follows this subsection).
o Wide area networks
  - communication cost will dominate: low bandwidth, low speed, high protocol overhead
  - most algorithms ignore all other cost components
o Local area networks
  - communication cost is not that dominant
  - the total cost function should be considered
One can also maximize throughput.
Query Optimization Issues
Types of Optimizers
o Exhaustive search
  - cost-based
  - optimal
  - combinatorial complexity in the number of relations
o Heuristics
  - not optimal
  - regroup common sub-expressions
  - perform selection and projection first
  - replace a join by a series of semijoins
  - reorder operations to reduce intermediate relation size
  - optimize individual operations
Optimization Granularity
o Single query at a time
  - cannot use common intermediate results
o Multiple queries at a time
  - efficient if there are many similar queries
  - the decision space is much larger
Optimization Timing
o Static
  - optimize at compilation, prior to execution
  - difficult to estimate the size of intermediate results: error propagation
  - can amortize the cost over many executions
  - used by R*
o Dynamic
  - run-time optimization
  - exact information on intermediate relation sizes
  - has to reoptimize for multiple executions
  - used by Distributed INGRES
o Hybrid
  - compile using a static algorithm
  - if the error in the estimated sizes exceeds a threshold, reoptimize at run time
  - used by MERMAID
Statistics
o Relation: cardinality, size of a tuple, fraction of tuples participating in a join with another relation
o Attribute: cardinality of the domain, actual number of distinct values
o Common assumptions: independence between different attribute values; uniform distribution of attribute values within their domain
Decision Sites
o Centralized
  - a single site determines the "best" schedule
  - simple
  - needs knowledge about the entire distributed database
o Distributed
  - cooperation among sites to determine the schedule
  - needs only local information
  - cost of cooperation
o Hybrid
  - one site determines the global schedule; each site optimizes its local subqueries
Network Topology
o Wide area networks (WAN), point-to-point
  - characteristics: low bandwidth, low speed, high protocol overhead
  - communication cost will dominate; ignore all other cost factors
  - global schedule to minimize communication cost
  - local schedules according to centralized query optimization
o Local area networks (LAN)
  - communication cost is not that dominant
  - the total cost function should be considered
  - broadcasting can be exploited (for joins)
  - special algorithms exist for star networks
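A minimal sketch of the weighted cost function above; the weights and plan figures are invented, chosen to show why a WAN optimizer can ignore local costs.

def total_cost(io, cpu, comm, w_io=1.0, w_cpu=1.0, w_comm=1.0):
    return w_io * io + w_cpu * cpu + w_comm * comm

plan = {"io": 500, "cpu": 200, "comm": 50}

# LAN-style weights: all components matter.
lan = total_cost(plan["io"], plan["cpu"], plan["comm"])
# WAN-style weights: communication dominates, other components are negligible.
wan = total_cost(plan["io"], plan["cpu"], plan["comm"], w_io=0.0, w_cpu=0.0, w_comm=100.0)
print(lan, wan)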
Step 1 Query Decomposition
Input: calculus query on global relations.
o Normalization: manipulate query quantifiers and qualification
o Analysis: detect and reject "incorrect" queries; possible for only a subset of relational calculus
o Simplification: eliminate redundant predicates
o Restructuring: convert the calculus query to an algebraic query; more than one translation is possible; use transformation rules
Step 2 Data Localization
Input: algebraic query on distributed relations.
o Determine which fragments are involved
o Localization program: substitute for each global relation its materialization program, then optimize
Step 3 Global Query Optimization
Input: fragment query.
o Find the best (not necessarily optimal) global schedule
  - minimize a cost function
  - distributed join processing: bushy vs. linear trees; which relation to ship where; ship-whole vs. ship-as-needed
  - decide on the use of semijoins: a semijoin saves on communication at the expense of more local processing (see the cost sketch after this list)
  - join methods: nested loop vs. ordered joins (merge join or hash join)
Centralized Query Optimization
o INGRES: dynamic, interpretive
o System R: static, exhaustive search
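A minimal sketch of the semijoin trade-off: compare the communication cost of shipping relation R whole against a semijoin program (ship the join-attribute values of S to R's site, reduce R, ship only the reduced R). All sizes and the selectivity are invented; the cost counts shipped bytes only.

size_R = 1_000_000          # bytes, R stored at site 1
size_join_attr_S = 20_000   # bytes, distinct join-attribute values of S at site 2
selectivity = 0.05          # fraction of R's tuples that join with S

ship_whole = size_R
semijoin = size_join_attr_S + selectivity * size_R   # S's values travel to R's site,
                                                     # then only matching R tuples ship back

print("ship R whole:", ship_whole)
print("semijoin program:", semijoin)   # cheaper here because few R tuples match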
12 Rules of DDBMS (Date, 1987)
1. Local autonomy
2. No reliance on a central site
3. Continuous operation
4. Location independence
5. Fragmentation independence
6. Replication independence
7. Distributed query processing
8. Distributed transaction processing
9. Hardware independence
10. Operating system independence
11. Network independence
12. Database independence
1.5 TRANSACTION PROCESSING
Transaction
A transaction is a collection of actions that make consistent transformations of system states while preserving system consistency.
o concurrency transparency
o failure transparency
[Figure: the database is in a consistent state at Begin Transaction, may be temporarily in an inconsistent state during the execution of the transaction, and is in a consistent state again at End Transaction]
1.6 CONCURRENCY CONTROL
The problem: synchronizing concurrent transactions such that the consistency of the database is maintained while, at the same time, the maximum degree of concurrency is achieved.
Anomalies:
o Lost updates: the effects of some transactions are not reflected in the database
o Inconsistent retrievals: a transaction, if it reads the same data item more than once, should always read the same value
Distributed concurrency control extends the centralised mechanisms to handle:
o multiple copies of data items: consistency must be maintained
o failures of individual sites or the network: continue operations, then update and rejoin
o distributed commit: 2-phase protocol (local and global)
o distributed deadlock
Global serialisation must occur, i.e. the local serialisations must themselves be serialised. Locks and timestamping both apply. If the database is not replicated and all transactions are either local or performable at one remote site, then:
o centralised concurrency mechanisms can be used
Otherwise the mechanisms need to be extended:
o to deal with replication or with transactions involving multiple sites
o to consider deadlocks at both the local and the global level
Distributed Locks
Just like centralised mechanisms, but the locks must also manage replication and sub-transactions. Four modes of management are possible:
o Centralised 2PL: read any copy, update all copies for updates; a single site holds all locks, which is a bottleneck and a single point of failure
o Primary Copy 2PL: distributes the locks; one copy is designated the primary, the others slaves; only the primary copy is locked for updates, and the slaves are updated later
o Distributed 2PL: each site manages the locks on its own data; all copies are locked for an update, at a high communication cost
o Majority Locking (described below)
Diagrammatic representation
[Figure: three sites, each holding copies of the data items D1-D3. D = data item; PC = primary copy, marked only for Primary Copy 2PL]
o Centralised: e.g. Site 1 is the only lock manager
o Primary Copy: e.g. Site 1 handles locks on D1/D3 and Site 3 handles locks on D2 (remember the managing site does NOT have to hold the primary copy)
o Distributed: all sites lock their own data (all copies are locked for writing)
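A minimal sketch of a per-site lock manager with primary-copy routing for updates, following the modes above. The site and item names are invented; deadlock handling and the deferred update of slave copies are omitted.

class LockManager:
    def __init__(self):
        self.locks = {}   # item -> ("S", {txns}) or ("X", txn)

    def lock(self, txn, item, mode):
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = ("S", {txn}) if mode == "S" else ("X", txn)
            return True
        kind, owner = held
        if mode == "S" and kind == "S":   # shared locks are compatible
            owner.add(txn)
            return True
        return False    # conflict: in real 2PL the transaction would wait

primary_site = {"D1": "site1", "D2": "site3", "D3": "site1"}   # Primary Copy 2PL
managers = {s: LockManager() for s in ("site1", "site2", "site3")}

def write_lock(txn, item):
    # Only the primary copy is locked for an update; slaves are refreshed later.
    return managers[primary_site[item]].lock(txn, item, "X")

print(write_lock("T1", "D1"))   # True: T1 gets the exclusive lock at site1
print(write_lock("T2", "D1"))   # False: conflict with T1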
Majority Locking
An extension of distributed 2PL that doesn't lock all copies before an update:
o a transaction needs locks on more than half of the copies of an item to proceed
o if it obtains the majority, it informs the other sites
o otherwise it cancels the request
o only one transaction at a time can hold a majority of exclusive locks on an item
o many transactions can hold a majority of shared locks
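A minimal sketch of the majority vote: an update may proceed only if it obtains locks at more than half of the sites holding a copy. The sites and per-site outcomes are invented.

copies = {"D1": ["site1", "site2", "site3"]}    # D1 is replicated at three sites
granted = {"site1": True, "site2": True, "site3": False}   # per-site lock outcome

def majority_locked(item):
    votes = sum(1 for s in copies[item] if granted[s])
    return votes > len(copies[item]) / 2   # strictly more than half

if majority_locked("D1"):
    print("proceed and inform the other sites")   # 2 of 3 locks obtained
else:
    print("cancel the lock request")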
Deadlock
Example: locally, each site sees only its own part of the wait-for graph (WFG), so no cycle is apparent. Maybe deadlock?
o Site 2's local WFG: T1 waits for T2
o Site 3's local WFG: T2 waits for T3, and T3 waits for T1
Site 2 sends its WFG to site 3; site 3 combines the WFGs into a global WFG containing the cycle T1 → T2 → T3 → T1. Definitely deadlock!
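A minimal sketch of combining local wait-for graphs and detecting the global cycle from the example above.

site2_wfg = {("T1", "T2")}                 # T1 waits for T2
site3_wfg = {("T2", "T3"), ("T3", "T1")}   # T2 waits for T3, T3 waits for T1

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    def dfs(node, path):
        if node in path:                   # revisited a node on the current path
            return True
        return any(dfs(nxt, path | {node}) for nxt in graph.get(node, []))
    return any(dfs(n, set()) for n in graph)

print(has_cycle(site2_wfg))                 # False: locally, maybe deadlock
print(has_cycle(site2_wfg | site3_wfg))     # True: global WFG has T1->T2->T3->T1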
Distributed Reliability Protocols
o Commit protocols
  - how to execute the commit command for distributed transactions (a 2PC sketch follows this list)
  - issue: how to ensure atomicity and durability
o Termination protocols
  - if a failure occurs, how can the remaining operational sites deal with it
  - non-blocking: the occurrence of failures should not force the sites to wait until the failure is repaired to terminate the transaction
o Recovery protocols
  - when a failure occurs, how do the sites where the failure occurred deal with it
  - independent: a failed site can determine the outcome of a transaction without having to obtain remote information
o Independent recovery implies non-blocking termination
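A minimal sketch of the two-phase commit (2PC) protocol the commit-protocol bullet refers to: phase 1 collects votes, phase 2 broadcasts the decision. The participants are plain objects; a real 2PC implementation also logs each step for recovery.

class Participant:
    def __init__(self, name, will_commit=True):
        self.name, self.will_commit = name, will_commit
        self.state = "INITIAL"

    def vote(self):                       # phase 1: reply to the vote-request
        self.state = "READY" if self.will_commit else "ABORT"
        return self.will_commit

    def decide(self, decision):           # phase 2: apply the global decision
        self.state = decision

def two_phase_commit(participants):
    if all(p.vote() for p in participants):   # unanimous yes -> commit
        decision = "COMMIT"
    else:
        decision = "ABORT"                    # any no (or timeout) -> abort
    for p in participants:
        p.decide(decision)
    return decision

sites = [Participant("S1"), Participant("S2"), Participant("S3", will_commit=False)]
print(two_phase_commit(sites))    # ABORT: S3 voted no, so atomicity is preserved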
1.7 RECOVERY
[Figure: sites S1-S5]
Recovery after failure
Distributed recovery maintains atomicity and durability. What happens after a failure?
o Abort the transactions affected by the failure, including all their subtransactions
o Flag the site as failed
o Check for recovery, or wait for a message to confirm it
o On restart, abort any partial transactions which were active at the time of the failure
o Perform local recovery
o Update the copy of the database to be consistent with the remainder of the system
Recovery Protocol
Protocols at the failed site to complete all transactions that were outstanding at the time of the failure.
Classes of failures:
1. Site failure
2. Lost messages
3. Network partitioning
4. Byzantine failures
Effects of failures:
1. Inconsistent database
2. Transaction processing is blocked
3. Failed component unavailable
Independent Recovery
A recovering site makes a transition directly to a final state without communicating with other sites.
Lemma: for a protocol, if a local state's concurrency set contains both an abort and a commit, the protocol is not resilient to an arbitrary failure of a single site.
o Si cannot fail over to commit, because other sites may be in abort
o Si cannot fail over to abort, because other sites may be in commit
Rule 1: for an intermediate state S (see the sketch below):
o if C(S) contains a commit, add a failure transition from S to commit
o otherwise, add a failure transition from S to abort
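A minimal sketch of Rule 1 and the lemma as predicates over a state's concurrency set C(S); the state names and the representation (sets of strings) are assumptions made for illustration.

def failure_transition(concurrency_set):
    """Rule 1: fail over to COMMIT if C(S) contains a commit, otherwise to ABORT."""
    return "COMMIT" if "COMMIT" in concurrency_set else "ABORT"

def resilient(concurrency_set):
    """Per the lemma: a state whose C(S) holds both COMMIT and ABORT cannot be
    handled by independent recovery under an arbitrary single-site failure."""
    return not ("COMMIT" in concurrency_set and "ABORT" in concurrency_set)

print(failure_transition({"READY", "COMMIT"}))   # COMMIT
print(failure_transition({"READY", "ABORT"}))    # ABORT
print(resilient({"COMMIT", "ABORT"}))            # False: not independently recoverable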