Query Decomposition and Data Localization

Chapter 6: Query Decomposition and Data Localization

• Query Decomposition
• Data Localization

Acknowledgements: I am indebted to Arturas Mazeika for providing me his slides of this course.

DDB 2008/09, J. Gamper

Query Decomposition

• Query decomposition: mapping of a calculus query (SQL) to algebra operations (select, project, join, rename).
• Both input and output queries refer to global relations, without knowledge of the distribution of data.
• The output query is semantically correct and good in the sense that redundant work is avoided.
• Query decomposition consists of 4 steps:
  1. Normalization: transform the query to a normalized form
  2. Analysis: detect and reject "incorrect" queries; possible only for a subset of relational calculus
  3. Elimination of redundancy: eliminate redundant predicates
  4. Rewriting: transform the query to RA and optimize the query

Query Decomposition - Normalization

• Normalization: transform the query to a normalized form to facilitate further processing. It consists mainly of two steps.
  1. Lexical and syntactic analysis
     - Check validity (similar to compilers)
     - Check for attributes and relations
     - Type checking on the qualification
  2. Put into normal form
     - With SQL, the query qualification (WHERE clause) is the most difficult part, as it might be an arbitrarily complex predicate preceded by quantifiers ($\exists$, $\forall$)
     - Conjunctive normal form: $(p_{11} \vee p_{12} \vee \cdots \vee p_{1n}) \wedge \cdots \wedge (p_{m1} \vee p_{m2} \vee \cdots \vee p_{mn})$
     - Disjunctive normal form: $(p_{11} \wedge p_{12} \wedge \cdots \wedge p_{1n}) \vee \cdots \vee (p_{m1} \wedge p_{m2} \wedge \cdots \wedge p_{mn})$
     - In the disjunctive normal form, the query can be processed as independent conjunctive subqueries linked by unions (corresponding to the disjunction)

• Example: Consider the following query: Find the names of employees who have been working on project P1 for 12 or 24 months.
• The query in SQL:

    SELECT ENAME
    FROM EMP, ASG
    WHERE EMP.ENO = ASG.ENO
      AND ASG.PNO = "P1"
      AND (DUR = 12 OR DUR = 24)

• The qualification in conjunctive normal form:
  $\text{EMP.ENO} = \text{ASG.ENO} \wedge \text{ASG.PNO} = \text{"P1"} \wedge (\text{DUR} = 12 \vee \text{DUR} = 24)$
• The qualification in disjunctive normal form:
  $(\text{EMP.ENO} = \text{ASG.ENO} \wedge \text{ASG.PNO} = \text{"P1"} \wedge \text{DUR} = 12) \vee (\text{EMP.ENO} = \text{ASG.ENO} \wedge \text{ASG.PNO} = \text{"P1"} \wedge \text{DUR} = 24)$

Query Decomposition - Analysis

• Analysis: identify and reject type incorrect or semantically incorrect queries.
• Type incorrect
  - Checks whether the attributes and relation names of a query are defined in the global schema
  - Checks whether the operations on attributes conflict with the types of the attributes, e.g., a comparison operation > applied to an attribute of type string
• Semantically incorrect
  - Checks whether the components contribute in any way to the generation of the result
  - Only a subset of relational calculus queries can be tested for correctness, i.e., those that do not contain disjunction and negation
  - Typical data structures used to detect semantically incorrect queries are:
    * Connection graph (query graph)
    * Join graph

• Example: Consider the query

    SELECT ENAME, RESP
    FROM EMP, ASG, PROJ
    WHERE EMP.ENO = ASG.ENO
      AND ASG.PNO = PROJ.PNO
      AND PNAME = "CAD/CAM"
      AND DUR > 36
      AND TITLE = "Programmer"

• Query/connection graph
  - Nodes represent operand or result relations
  - An edge represents a join if both connected nodes represent an operand relation; otherwise it is a projection
• Join graph
  - A subgraph of the query graph that considers only the joins

[Figure: query graph and join graph of the example query; the join graph contains the edges EMP.ENO = ASG.ENO and ASG.PNO = PROJ.PNO.]

• Since the query graph is connected, the query is semantically correct.
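The connectivity test on the query graph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node name RESULT, the edge list, and the helper is_connected are hypothetical names chosen here, and the edges are read off the example query (two joins plus the projections on ENAME and RESP).

```python
from collections import defaultdict


def is_connected(nodes, edges):
    """Return True if the undirected graph (nodes, edges) is connected."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    nodes = list(nodes)
    if not nodes:
        return True
    # Depth-first search starting from an arbitrary node.
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == set(nodes)


# Query graph of the example: operand relations plus a result node (assumed name).
nodes = ["EMP", "ASG", "PROJ", "RESULT"]
edges = [
    ("EMP", "ASG"),     # join EMP.ENO = ASG.ENO
    ("ASG", "PROJ"),    # join ASG.PNO = PROJ.PNO
    ("EMP", "RESULT"),  # projection on ENAME
    ("ASG", "RESULT"),  # projection on RESP
]
connected = is_connected(nodes, edges)

# Without the join ASG.PNO = PROJ.PNO the node PROJ is isolated.
edges_without_join = [e for e in edges if e != ("ASG", "PROJ")]
connected_without_join = is_connected(nodes, edges_without_join)
```

With the full edge list the graph is connected; once the join predicate between ASG and PROJ is removed, PROJ can no longer be reached and the check fails, which is the condition under which a query is flagged as semantically incorrect.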
• Example: Consider the following query and its query graph:

    SELECT ENAME, RESP
    FROM EMP, ASG, PROJ
    WHERE EMP.ENO = ASG.ENO
      AND PNAME = "CAD/CAM"
      AND DUR > 36
      AND TITLE = "Programmer"

• Since the graph is not connected, the query is semantically incorrect.
• 3 possible solutions:
  - Reject the query
  - Assume an implicit Cartesian product between ASG and PROJ
  - Infer from the schema the missing join predicate ASG.PNO = PROJ.PNO

Query Decomposition - Elimination of Redundancy

• Elimination of redundancy: simplify the query by eliminating redundancies, e.g., redundant predicates.
  - Redundancies are often due to semantic integrity constraints expressed in the query language
  - E.g., queries on views are expanded into queries on relations that satisfy certain integrity and security constraints
• Transformation rules are used, e.g.,
  - $p \wedge p \Leftrightarrow p$
  - $p \vee p \Leftrightarrow p$
  - $p \wedge true \Leftrightarrow p$
  - $p \vee false \Leftrightarrow p$
  - $p \wedge false \Leftrightarrow false$
  - $p \vee true \Leftrightarrow true$
  - $p \wedge \neg p \Leftrightarrow false$
  - $p \vee \neg p \Leftrightarrow true$
  - $p_1 \wedge (p_1 \vee p_2) \Leftrightarrow p_1$
  - $p_1 \vee (p_1 \wedge p_2) \Leftrightarrow p_1$

• Example: Consider the following query:

    SELECT TITLE
    FROM EMP
    WHERE EMP.ENAME = "J. Doe"
       OR (NOT(EMP.TITLE = "Programmer")
           AND (EMP.TITLE = "Elect. Eng." OR EMP.TITLE = "Programmer")
           AND NOT(EMP.TITLE = "Elect. Eng."))

• Let $p_1$ be ENAME = "J. Doe", $p_2$ be TITLE = "Programmer", and $p_3$ be TITLE = "Elect. Eng."
• Then the qualification can be written as $p_1 \vee (\neg p_2 \wedge (p_2 \vee p_3) \wedge \neg p_3)$. Distributing the conjunction gives $(\neg p_2 \wedge p_2 \wedge \neg p_3) \vee (\neg p_2 \wedge p_3 \wedge \neg p_3)$ for the second operand; both conjuncts are false, so the qualification reduces to $p_1 \vee false \Leftrightarrow p_1$.
• Simplified query:

    SELECT TITLE
    FROM EMP
    WHERE EMP.ENAME = "J. Doe"

Query Decomposition - Rewriting

• Rewriting: convert the relational calculus query to a relational algebra query and find an efficient expression.
• Example: Find the names of employees other than J. Doe who worked on the CAD/CAM project for either 1 or 2 years.

    SELECT ENAME
    FROM EMP, ASG, PROJ
    WHERE EMP.ENO = ASG.ENO
      AND ASG.PNO = PROJ.PNO
      AND ENAME <> "J. Doe"
      AND PNAME = "CAD/CAM"
      AND (DUR = 12 OR DUR = 24)

• A query tree represents the RA expression
  - Relations are the leaves (FROM clause)
  - Result attributes are the root (SELECT clause)
  - Intermediate nodes are the relational operations that lead from the leaves to the root

[Figure: query tree of the example. Root: projection on ENAME; below it the selections DUR = 12 ∨ DUR = 24, PNAME = "CAD/CAM", and ENAME ≠ "J. Doe"; then the joins on PNO and ENO; leaves PROJ, ASG, and EMP.]

• By applying transformation rules, many different trees/expressions may be found that are equivalent to the original tree/expression, but might be more efficient.
• In the following we assume relations $R(A_1, \ldots, A_n)$, $S(B_1, \ldots, B_n)$, and $T$, which is union-compatible to $R$.
• Commutativity of binary operations
  - $R \times S = S \times R$
  - $R \bowtie S = S \bowtie R$
  - $R \cup S = S \cup R$
• Associativity of binary operations
  - $(R \times S) \times T = R \times (S \times T)$
  - $(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$
• Idempotence of unary operations
  - $\Pi_A(\Pi_A(R)) = \Pi_A(R)$
  - $\sigma_{p_1(A_1)}(\sigma_{p_2(A_2)}(R)) = \sigma_{p_1(A_1) \wedge p_2(A_2)}(R)$
• Commuting selection with binary operations
  - $\sigma_{p(A)}(R \times S) \Leftrightarrow \sigma_{p(A)}(R) \times S$
  - $\sigma_{p(A_1)}(R \bowtie_{p(A_2,B_2)} S) \Leftrightarrow \sigma_{p(A_1)}(R) \bowtie_{p(A_2,B_2)} S$
  - $\sigma_{p(A)}(R \cup T) \Leftrightarrow \sigma_{p(A)}(R) \cup \sigma_{p(A)}(T)$ ($A$ belongs to $R$ and $T$)
• Commuting projection with binary operations (assume $C = A' \cup B'$, $A' \subseteq A$, $B' \subseteq B$)
  - $\Pi_C(R \times S) \Leftrightarrow \Pi_{A'}(R) \times \Pi_{B'}(S)$
  - $\Pi_C(R \bowtie_{p(A',B')} S) \Leftrightarrow \Pi_{A'}(R) \bowtie_{p(A',B')} \Pi_{B'}(S)$
  - $\Pi_C(R \cup S) \Leftrightarrow \Pi_C(R) \cup \Pi_C(S)$

• Example: Two equivalent query trees for the previous example
  - Recall the schemas:
    EMP(ENO, ENAME, TITLE)
    PROJ(PNO, PNAME, BUDGET)
    ASG(ENO, PNO, RESP, DUR)

[Figure: two equivalent query trees with a projection on ENAME at the root; one applies the selections DUR = 12 ∨ DUR = 24, PNAME = "CAD/CAM", and ENAME ≠ "J. Doe" one after another, the other applies the single combined selection PNAME = "CAD/CAM" ∧ (DUR = 12 ∨ DUR = 24) ∧ ENAME ≠ "J. Doe".]
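The rule for commuting a selection with a join can be checked on a toy instance. The following Python sketch is not part of the original slides: the sample tuples and the helpers select, join, and as_set are hypothetical, and relations are modelled simply as lists of dictionaries. It compares selecting after the join with selecting before it.

```python
def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def join(left, right, attribute):
    """Equi-join of two relations on one shared attribute."""
    return [{**l, **r} for l in left for r in right if l[attribute] == r[attribute]]


def as_set(relation):
    """Order-independent representation used to compare two results."""
    return {tuple(sorted(row.items())) for row in relation}


# Hypothetical sample tuples following the schemas EMP and ASG.
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng."},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Programmer"},
]
ASG = [
    {"ENO": "E1", "PNO": "P1", "RESP": "Manager", "DUR": 12},
    {"ENO": "E2", "PNO": "P1", "RESP": "Analyst", "DUR": 24},
    {"ENO": "E2", "PNO": "P2", "RESP": "Analyst", "DUR": 6},
]


def not_doe(row):
    return row["ENAME"] != "J. Doe"


# Selection applied after the join versus pushed below the join.
late = select(join(EMP, ASG, "ENO"), not_doe)
early = join(select(EMP, not_doe), ASG, "ENO")

same_result = as_set(late) == as_set(early)
pairs_compared_late = len(EMP) * len(ASG)
pairs_compared_early = len(select(EMP, not_doe)) * len(ASG)
```

Both plans return the same tuples, but the plan with the early selection feeds fewer EMP tuples into the join, which is the reason the rewriting step prefers trees with selections close to the leaves.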
• Example (contd.): Another equivalent query tree, which allows a more efficient query evaluation, since the most selective operations are applied first.

[Figure: equivalent query tree with the selections pushed down to the base relations, e.g., the selection PNAME = "CAD/CAM" applied directly to PROJ.]

[...]

Data Localization Issues

• Various more advanced reduction techniques are possible to generate simpler and optimized queries.
• Reduction of horizontal fragmentation (HF)
  - Reduction with selection
  - Reduction with join
• Reduction of vertical fragmentation (VF)
  - Find empty relations

Data Localization Issues - Reduction of HF

• Reduction with selection for HF
  - Consider relation $R$ with horizontal fragmentation $F = \{R_1, R_2, \ldots, R_k\}$, where $R_i = \sigma_{p_i}(R)$
  - Rule 1: Selections on fragments, $\sigma_{p_j}(R_i)$, that have a qualification contradicting the qualification of the fragmentation generate empty relations, i.e.,
    $\sigma_{p_j}(R_i) = \emptyset$ if $\forall x \in R: \neg(p_i(x) \wedge p_j(x))$
  - Can be applied if the fragmentation predicate is inconsistent with the query selection predicate.
• Example: Consider the query:

    SELECT * FROM EMP WHERE ENO = "E5"

[Figure: generic query, the selection applied to the union of EMP1, EMP2, and EMP3, and the reduced query.]

  After commuting the selection with the union operation, it is easy to detect that the selection predicate contradicts the predicates of EMP1 and EMP3.

• Reduction with join for HF
  - Joins on horizontally fragmented relations can be simplified when the joined relations are fragmented according to the join attributes.
  - Distribute join over union: $(R_1 \cup R_2) \bowtie S \Leftrightarrow (R_1 \bowtie S) \cup (R_2 \bowtie S)$
  - Rule 2: Useless joins of fragments, $R_i = \sigma_{p_i}(R)$ and $R_j = \sigma_{p_j}(R)$, can be determined when the qualifications of the joined fragments are contradicting, i.e.,
    $R_i \bowtie R_j = \emptyset$ if $\forall x \in R_i, \forall y \in R_j: \neg(p_i(x) \wedge p_j(y))$
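Rule 1 can be sketched as a simple pruning step over fragment predicates. The Python below is a minimal sketch under assumptions that are not taken from the text above: EMP is assumed to be range-fragmented on ENO into EMP1 (ENO ≤ "E3"), EMP2 ("E3" < ENO ≤ "E6"), and EMP3 (ENO > "E6"), and the names RangeFragment and prune_for_equality are hypothetical. Employee numbers are compared as strings, which only works here because they have the same length.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RangeFragment:
    """Horizontal fragment defined by a range predicate on one attribute."""
    name: str
    low: Optional[str]   # exclusive lower bound, None means unbounded
    high: Optional[str]  # inclusive upper bound, None means unbounded

    def may_contain(self, value: str) -> bool:
        above_low = self.low is None or value > self.low
        below_high = self.high is None or value <= self.high
        return above_low and below_high


def prune_for_equality(fragments, value):
    """Rule 1 for a selection 'attribute = value': drop contradicting fragments."""
    return [f.name for f in fragments if f.may_contain(value)]


# Assumed fragmentation of EMP on ENO (hypothetical bounds).
EMP_FRAGMENTS = [
    RangeFragment("EMP1", None, "E3"),
    RangeFragment("EMP2", "E3", "E6"),
    RangeFragment("EMP3", "E6", None),
]

remaining = prune_for_equality(EMP_FRAGMENTS, "E5")
```

For the selection ENO = "E5" only EMP2 survives, so the selection needs to be evaluated on a single fragment instead of on the union of all three.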
• Example: Consider the following query and fragmentation:
  - Query:

    SELECT * FROM EMP, ASG WHERE EMP.ENO = ASG.ENO

  - Horizontal fragmentation:
    * EMP1 = $\sigma_{ENO \leq \text{"E3"}}(\text{EMP})$
    * EMP2 = $\sigma_{\text{"E3"} < ENO \leq \text{"E6"}}(\text{EMP})$
    * EMP3 = $\sigma_{ENO > \text{"E6"}}(\text{EMP})$
    * ASG1 = $\sigma_{ENO \leq \text{"E3"}}(\text{ASG})$
    * ASG2 = $\sigma_{ENO > \text{"E3"}}(\text{ASG})$
  - Generic query

[Figure: generic query tree joining the union of the EMP fragments with the union of the ASG fragments, and the reduced query as a union of partial joins between fragment pairs.]

  - The query reduced by distributing joins over unions and applying Rule 2 can be implemented as a union of three partial joins that can be done in parallel.

• Reduction with join for derived HF
  - The horizontal fragmentation of one relation is derived from the horizontal fragmentation of another relation by using semijoins.
• If the fragmentation is not on the same predicate as the join (as in the previous example), derived horizontal fragmentation can be applied in order to make efficient join processing possible.
• Example: Assume the following query and fragmentation of the EMP relation:
  - Query:

    SELECT * FROM EMP, ASG WHERE EMP.ENO = ASG.ENO

  - Fragmentation (not on the join attribute):
    * EMP1 = $\sigma_{TITLE = \text{"Programmer"}}(\text{EMP})$
    * EMP2 = $\sigma_{TITLE \neq \text{"Programmer"}}(\text{EMP})$
  - To achieve efficient joins, ASG can be fragmented as follows:
    * ASG1 = $\text{ASG} \ltimes_{ENO} \text{EMP1}$
    * ASG2 = $\text{ASG} \ltimes_{ENO} \text{EMP2}$
  - The fragmentation of ASG is derived from the fragmentation of EMP
  - Queries on derived fragments can be reduced, e.g., $\text{ASG1} \bowtie \text{EMP2} = \emptyset$

Data Localization Issues - Reduction for VF

• Reduction for vertical fragmentation
  - Recall that VF distributes a relation based on projection, and the reconstruction operator is the join.
  - Similar to HF, it is possible to identify useless intermediate relations, i.e., fragments that do not contribute to the result.
  - Assume a relation $R(A)$ with $A = \{A_1, \ldots, A_n\}$, which is vertically fragmented as $R_i = \Pi_{A'_i}(R)$, where $A'_i \subseteq A$.
  - Rule 3: $\Pi_{D,K}(R_i)$ is useless if the set of projection attributes $D$ is not in $A'_i$, where $K$ is the key attribute.
  - Note that the result is not empty, but it is useless, as it contains only the key attribute.
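Rule 3 amounts to checking, for every vertical fragment, whether it holds any requested attribute other than the key. The following Python sketch illustrates this under assumptions: the function useful_fragments is a hypothetical helper, and the fragmentation of EMP into (ENO, ENAME) and (ENO, TITLE) with key ENO is used only as an illustration.

```python
def useful_fragments(fragments, key, projection):
    """Keep the vertical fragments that hold a requested non-key attribute.

    A fragment that shares only the key with the projection list is useless
    in the sense of Rule 3 and can be dropped from the reconstruction join.
    If the projection asks for the key alone, nothing is kept; that special
    case (any single fragment would do) is not handled in this sketch.
    """
    wanted = set(projection) - {key}
    return [name for name, attrs in fragments.items() if wanted & set(attrs)]


# Assumed vertical fragmentation of EMP(ENO, ENAME, TITLE) with key ENO.
EMP_VERTICAL = {
    "EMP1": ("ENO", "ENAME"),
    "EMP2": ("ENO", "TITLE"),
}

kept = useful_fragments(EMP_VERTICAL, key="ENO", projection=["ENAME"])
kept_both = useful_fragments(EMP_VERTICAL, key="ENO", projection=["ENAME", "TITLE"])
```

A projection on ENAME keeps only EMP1, so the join with EMP2 can be omitted; a projection on ENAME and TITLE needs both fragments and the join that reconstructs them.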
• Example: Consider the following query and vertical fragmentation:
  - Query:

    SELECT ENAME FROM EMP

  - Fragmentation:
    * EMP1 = $\Pi_{ENO,ENAME}(\text{EMP})$
    * EMP2 = $\Pi_{ENO,TITLE}(\text{EMP})$
• Generic query

[Figure: generic query, a projection on ENAME over the join of EMP1 and EMP2.]

• Reduced query
  - By commuting the projection with the join (i.e., projecting on ENO, ENAME), we can see that the projection on EMP2 is useless because ENAME is not in EMP2.

[Figure: reduced query, a projection on ENAME over EMP1 only.]

Conclusion

• Query decomposition and data localization map a calculus query into algebra operations and apply data distribution information to the algebra operations.
• Query decomposition consists of normalization, analysis, elimination of redundancy, and rewriting.
• Data localization reduces horizontal fragmentation with join and selection, and vertical fragmentation with joins, and aims to find empty relations.
