3 Query Processing and Optimization-1

This document discusses query processing and optimization in databases, detailing the steps involved such as parsing, optimization, and evaluation. It emphasizes the importance of energy efficiency and cost measurement in query execution, including factors like disk access and CPU time. Additionally, it covers various selection operations and sorting techniques, particularly focusing on external sort-merge algorithms for handling large datasets.

Uploaded by

neupanepratik1
QUERY PROCESSING AND OPTIMIZATION

After a comprehensive study of this chapter, you will be able to understand:
* Concept of Query Processing
* Query Trees and Heuristics for Query Optimization
* Choice of Query Execution Plans
* Cost-Based Optimization

OVERVIEW OF QUERY PROCESSING

Energy efficiency is an important feature in designing and executing databases. The aims of query processing are to transform a query written in a high-level language, typically SQL, into a correct and efficient execution strategy expressed in a low-level language (implementing relational algebra), and to execute the strategy to retrieve the required data. Thus, query processing is the set of activities involved in parsing, validating, optimizing, and executing a query.

The steps involved in processing a query are shown in Figure 3.1 and they are:
1. Parsing and translation
2. Optimization
3. Evaluation

[Figure 3.1: Steps in query processing. Labels recovered from the figure: Query in high-level language, Query Optimizer, Query Evaluation Engine, Query Output.]

Parsing and Translating the Query
The main work of a query processor is to convert a query string submitted by the user into query objects, i.e., into a form understood by the query processing engine. It converts the search string into definite instructions. The query parser must analyze the query language, i.e., recognize and interpret operators (AND, OR, NOT, +, -, etc.), place the operators into groups, and so on. The basic job of the parser is to extract the tokens (e.g., keywords, operators, operands, literal strings, etc.) and convert them into their corresponding internal data elements (i.e., relational algebra operations and operands) and structures (i.e., query tree, query graph). The parser also verifies the validity and syntax of the query string.

Optimizing the Query
In this stage, the query optimizer chooses an evaluation plan for the query, along with the implementation methods to be employed for each relational operator.
Example 3.1: Consider the following SQL query:

    SELECT Stu_name, Stu_address
    FROM Student
    WHERE age < 25;

This query can be translated into either of the following relational-algebra expressions:
* σ_{age<25}(Π_{Stu_name, Stu_address}(Student))
* Π_{Stu_name, Stu_address}(σ_{age<25}(Student))

Strictly, the first form is valid only if the projection also keeps age, since the selection needs that attribute. The two expressions can be represented as the following query trees:

[Figure 3.2: Query trees for the two expressions, each with Student as the leaf.]

After parsing and translation into a relational algebra expression, the query is transformed into a form, usually a query tree or query graph, that can be handled by the optimization engine. The optimization engine then performs various analyses on the query data, generating a number of valid evaluation plans. From there, it determines the most appropriate evaluation plan to execute.

After the evaluation plan has been selected, it is passed to the DBMS query-execution engine (also referred to as the runtime database processor), where the plan is executed and the results are returned.

MEASURING OF QUERY COST

The cost of a query is the time taken by the query to hit the database and return the result. It includes the query processing time, i.e., the time taken to parse and translate the query, optimize it, evaluate it, execute it and return the result to the user. Executing the optimized query involves hitting the primary and secondary memory based on the file organization method. Depending on the file organization and the indexes used, the time taken to retrieve the data may vary.

Query cost considers a number of different resources:
* The number of disk accesses / the number of disk block transfers / the size of the table
* Time taken by the CPU for executing the query

The time taken by the CPU is negligible in most systems when compared with the number of disk accesses. If we consider the number of block transfers as the main component in calculating the cost of a query, it would include more sub-components.
Those are:
* Rotational latency: time taken to bring and spin the required data under the read-write head of the disk.
* Seek time: time taken to position the read-write head over the required track or cylinder.
* Sequential I/O: reading data that are stored in contiguous blocks of the disk.
* Random I/O: reading data that are stored in different blocks that are not contiguous.

For simplicity we just use the number of block transfers from disk and the number of seeks as the cost measures of a query-evaluation plan. Suppose a query needs S seeks and b block transfers. The disk I/O cost is calculated as below:

Query Cost = b × tr + S × ts

Where,
* b - number of block transfers
* S - number of seeks
* tr - time to transfer one block
* ts - time for one seek

The values of tr and ts must be calibrated for the disk system used. For example, tr = 0.1 ms and ts = 4 ms if the block size is 4 KB and the transfer rate is 40 MB per second. With this, we can easily calculate the estimated cost of a given query evaluation plan.

Generally, for estimating the cost, we consider the worst case that could happen. We assume that initially the data is read from the disk only. But there is a chance that the information is already present in main memory. This effect is usually ignored, and due to this, the actual cost of execution comes out less than the estimated value.

The response time, i.e., the time required to execute the plan, could be used for estimating the cost of the query evaluation plan. But for the following reasons, it is difficult to calculate the response time without actually executing the query evaluation plan:
* The response time depends on the contents of the buffer when the query begins execution; this information is not available when the query is optimized, and is hard to account for even if it were available.
* In a system with multiple disks, the response time depends on how accesses are distributed among disks, which is hard to estimate without detailed knowledge of the data layout on disk.

SELECTION OPERATION (σ)

Queries are ultimately reduced to a number of file scan operations on the underlying physical file structures. For each relational operation there can exist several different access paths to the particular records needed. The query execution engine can have a multitude of specialized algorithms designed to process particular relational operation and access path combinations.

Selections Using File Scans
File scans are search algorithms that locate and retrieve records that fulfill a selection condition. The select operation must search through the data files for records meeting the selection criteria. The following are some simple (one attribute) selection algorithms:
* A1 (linear search): Retrieve every record in the file, and test whether its attribute values satisfy the selection condition.
  o Worst case cost = b_r × tr + ts, where b_r is the number of blocks containing records from relation r.
  o If the selection is on a key attribute, the scan can stop on finding the record: Average cost = (b_r/2) × tr + ts.
  Linear search is slow, but it is general because it can be applied regardless of the ordering of the file, the availability of indices, or the nature of the selection operation.
* A2 (binary search): If the selection condition involves an equality comparison on a key attribute on which the file is ordered, binary search (which is more efficient than linear search) can be used.
  o Worst case cost = ⌈log₂(b_r)⌉ × (tr + ts)

Selections Using Indices
A search algorithm that makes use of an index is called an index scan, and the index structure is called an access path.
* A3 (primary index, equality on key): For an equality comparison on a key attribute with a primary index, we can use the index to retrieve a single record that satisfies the corresponding equality condition.
  o Cost = (h_i + 1) × (tr + ts), where h_i is the height of the index.
* A4 (primary index, equality on non-key): For an equality comparison on a non-key attribute with a primary index, we can use the index to retrieve multiple records (spread over b successive blocks) that satisfy the corresponding equality condition.
  o Cost = h_i × (tr + ts) + ts + tr × b, where h_i is the height of the index.
* A5 (secondary index, equality): Selections specifying an equality condition can use a secondary index.
  o Retrieve a single record if the search key is a candidate key: Cost = (h_i + 1) × (tr + ts), where h_i is the height of the index.
  o Retrieve multiple records if the search key is not a candidate key; each of the n matching records may be on a different block: Cost = (h_i + n) × (tr + ts), where h_i is the height of the index.
  For a large number n of matching records, this can be very expensive and can cost even more than a linear scan.

Selections Involving Comparisons
We assume that the relation is sorted on attribute A. Consider a selection of the form σ_{A≤v}(r) or σ_{A≥v}(r). We can implement the selection either by using linear search, binary search, or by using indices in one of the following ways:
* A6 (primary index, comparison): A primary ordered index (for example, a primary B+-tree index) can be used when the selection condition is a comparison.
  o For σ_{A≥v}(r), use the index to find the first tuple ≥ v and scan the relation sequentially from there.
  o For σ_{A≤v}(r), just scan the relation sequentially till the first tuple > v, without using any index.
* A7 (secondary index, comparison): We can use a secondary ordered index to guide retrieval for comparison conditions involving <, ≤, ≥, or >.
  o For σ_{A≥v}(r), use the index to find the first index entry ≥ v and scan the index sequentially from there, to find pointers to records.
  o For σ_{A≤v}(r), just scan the leaf pages of the index, finding pointers to records, till the first entry > v.

The secondary index provides pointers to the records, but to get the actual records we have to fetch them by using the pointers. This step may require an I/O operation for each record fetched, since consecutive records may be on different disk blocks; as before, each I/O operation requires a disk seek and a block transfer. If the number of retrieved records is large, using the secondary index may be even more expensive than using linear search. Therefore, the secondary index should be used only if very few records are selected.

Implementation of Complex Selections
So far, we have considered only simple selection conditions of the form A op B, where op is an equality or comparison operation. We now consider more complex selection predicates.
* Conjunction: A conjunctive selection is a selection of the form σ_{θ1 ∧ θ2 ∧ ... ∧ θn}(r).
* Disjunction: A disjunctive selection is a selection of the form σ_{θ1 ∨ θ2 ∨ ... ∨ θn}(r). A disjunctive condition is satisfied by the union of all records satisfying the individual simple conditions θi.
* Negation: The result of a selection σ_{¬θ}(r) is the set of tuples of r for which the condition θ evaluates to false. In the absence of nulls, this set is simply the set of tuples in r that are not in σ_θ(r).

The following algorithms implement complex selections:
* A8 (conjunctive selection using one index): We first check whether an access path is available for an attribute in one of the simple conditions. To reduce the cost, we choose a condition θi and one of algorithms A1 through A7 for which the combination results in the least cost for σ_θi(r). The remaining conditions are then tested on the retrieved records. The cost of algorithm A8 is given by the cost of the chosen algorithm.
* A9 (conjunctive selection using composite index): An appropriate composite (multiple-key) index may be available for some conjunctive selections.
If such an index exists on the combined attribute fields, then the index can be searched directly.
* A10 (conjunctive selection by intersection of identifiers): This algorithm requires indices with record pointers on the fields involved in the individual conditions. The algorithm uses the corresponding index for each condition, and takes the intersection of all the obtained sets of record pointers. It then fetches the records from the file; if some conditions do not have appropriate indices, they are tested in memory on the fetched records.
* A11 (disjunctive selection by union of identifiers): Indices can only be used if there is an index for all conditions; otherwise, a linear scan of the relation has to be performed anyway. The algorithm uses the corresponding index for each condition, and takes the union of all the obtained sets of record pointers. It then fetches the records from the file.

SORTING

Sorting in a database system is important for two reasons:
1. A query may specify that the output should be sorted.
2. The processing of some relational query operations, e.g., the join operation, can be implemented more efficiently based on sorted relations.

For relations that fit in memory, techniques like quick-sort can be used; for relations that do not fit in memory, an external sort-merge algorithm can be used.

External Sort-Merge Algorithm
Sorting of relations that do not fit in memory is called external sorting. The most commonly used technique for external sorting is the external sort-merge algorithm. Let M denote the memory size (in pages).

1. Create sorted runs. Let i be 0 initially. Repeat the following till the end of the relation:
   a) Read M blocks of the relation into memory
   b) Sort the in-memory blocks
   c) Write the sorted data to run R_i
   d) Increment i
   Let the final value of i be N.
2. Merge the runs (N-way merge). We first assume that N < M, so that a single merge pass is enough. If N ≥ M, several merge passes are required. In each pass, contiguous groups of M - 1 runs are merged. A pass reduces the number of runs by a factor of M - 1, and creates runs longer by the same factor.
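The run-creation and merge stages described above can be sketched in Python. This is only a minimal in-memory simulation under stated assumptions: blocks are plain Python lists, the function names (create_runs, merge_pass, external_sort) are hypothetical, heapq.merge stands in for the buffered (M - 1)-way merge, and the sample records are illustrative data with one tuple per block and M = 3.

```python
import heapq


def create_runs(blocks, M):
    """Stage 1: read M blocks at a time, sort them in memory, emit one run each."""
    if M < 3:
        raise ValueError("need at least 3 memory blocks (2 for input, 1 for output)")
    runs = []
    for i in range(0, len(blocks), M):
        records = [rec for block in blocks[i:i + M] for rec in block]
        runs.append(sorted(records))
    return runs


def merge_pass(runs, M):
    """One merge pass: merge contiguous groups of M - 1 runs."""
    fan_in = M - 1
    return [list(heapq.merge(*runs[i:i + fan_in]))
            for i in range(0, len(runs), fan_in)]


def external_sort(blocks, M):
    """Return the sorted records and the number of merge passes used."""
    runs = create_runs(blocks, M)
    passes = 0
    while len(runs) > 1:
        runs = merge_pass(runs, M)
        passes += 1
    return (runs[0] if runs else []), passes


# Illustrative relation: 12 blocks, one (key, value) tuple per block.
blocks = [[("g", 24)], [("a", 19)], [("d", 31)], [("c", 33)],
          [("b", 14)], [("e", 16)], [("r", 16)], [("d", 21)],
          [("m", 3)], [("p", 2)], [("d", 7)], [("a", 14)]]

result, passes = external_sort(blocks, M=3)
# Block transfers in this simple model: b_r * (2 * merge passes + 1).
transfers = len(blocks) * (2 * passes + 1)
print(result)
print(passes, transfers)
```

With M = 3 the sketch builds four initial runs and needs two merge passes, because only two runs can be merged at a time while the third block serves as the output buffer.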
Repeated passes are performed till all runs have been merged into one.

[Figure 3.3: External sorting using sort-merge. Stages shown in the figure: initial relation, create runs, merge pass-1, merge pass-2, sorted output.]

Figure 3.3 illustrates the steps of the external sort-merge for an example relation. For illustration purposes, we assume that only one tuple fits in a block (f_r = 1), and that memory holds at most three blocks. During the merge stage, two blocks are used for input and one for output.

Cost Analysis of External Sort-Merge
Let b_r denote the number of blocks containing records of relation r.

The initial number of runs = ⌈b_r/M⌉.

Since the number of runs decreases by a factor of M - 1 in each merge pass, the total number of merge passes required = ⌈log_{M-1}(b_r/M)⌉.

The first stage reads every block of the relation and writes them out again, giving 2b_r block transfers. Each merge pass reads every block of the relation once and writes it out once, with two exceptions. First, the final pass can produce the sorted output without writing its result to disk. Second, there may be runs that are not read in or written out during a pass.

Total number of block transfers for external sorting of the relation = b_r × (2⌈log_{M-1}(b_r/M)⌉ + 1).

JOINING

Like selection, the join operation can be implemented in a variety of ways. Choosing among the join algorithms is critical in minimizing a query's execution time. The following are 5 well-known types of join algorithms:
* Nested-Loop Join
* Block Nested-Loop Join
* Indexed Nested-Loop Join
* Sort-Merge Join
* Hash Join

Nested-Loop Join
This algorithm consists of an inner for loop nested within an outer for loop. To illustrate this algorithm, we will use the following notations:

r, s    Relations r and s
t_r     Tuple (record) in relation r
t_s     Tuple (record) in relation s
n_r     Number of records in relation r
n_s     Number of records in relation s
b_r     Number of blocks with records in relation r
b_s     Number of blocks with records in relation s

Here is a sample pseudo-code listing for joining the two relations r and s utilizing the nested for loop:

    for each tuple t_r in r {
        for each tuple t_s in s {
            if join condition is true for (t_r, t_s)
                add tuple t_r · t_s to the result;
        }
    }

In the algorithm, t_r and t_s are the tuples of relations r and s, respectively. The notation t_r · t_s is a tuple constructed by concatenating the attribute values of tuples t_r and t_s.

With the help of the algorithm, we understand the following points:
* The nested-loop join does not need any index; like a linear file scan for selection, it simply reads all the data.
* The nested-loop join does not depend on the form of the join condition. It is suitable for any join condition.
* The nested-loop join algorithm is expensive in nature, because it examines every pair of tuples in the two relations.

Block Nested-Loop Join
If the buffer is too small to hold either relation entirely in memory, we can still obtain a saving in block accesses if we process the relations on a per-block basis, rather than on a per-tuple basis. Figure 3.5 shows the block nested-loop join, which is a variant of the nested-loop join where every block of the inner relation is paired with every block of the outer relation. Within each pair of blocks, every tuple in one block is paired with every tuple in the other block, to generate all pairs of tuples. As before, all pairs of tuples that satisfy the join condition are added to the result.

    for each block B_r of r {
        for each block B_s of s {
            for each tuple t_r in B_r {
                for each tuple t_s in B_s {
                    if join condition is true for (t_r, t_s)
                        add tuple t_r · t_s to the result;
                }
            }
        }
    }

Figure 3.5: Block nested-loop join

The primary difference in cost between the block nested-loop join and the basic nested-loop join is that, in the worst case, each block in the inner relation s is read only once for each block in the outer relation, instead of once for each tuple in the outer relation.
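The block-at-a-time idea can be sketched in Python to make the cost difference concrete. This is a minimal sketch under stated assumptions: relations are modelled as lists of blocks, each block is a list of tuples, the function name block_nested_loop_join and the sample data are hypothetical, and a "block read" is simply counted each time a block is visited.

```python
def block_nested_loop_join(r_blocks, s_blocks, condition):
    """Join two relations stored as lists of blocks (lists of tuples).

    Returns the joined tuples and the number of block reads performed.
    """
    result = []
    block_reads = 0
    for b_r in r_blocks:               # outer relation: each block read once
        block_reads += 1
        for b_s in s_blocks:           # inner relation: read once per outer block
            block_reads += 1
            for t_r in b_r:
                for t_s in b_s:
                    if condition(t_r, t_s):
                        result.append(t_r + t_s)   # concatenate the two tuples
    return result, block_reads


# Hypothetical data: r has 2 blocks, s has 3 blocks, two tuples per block.
r_blocks = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")]]
s_blocks = [[(1, "x"), (5, "y")], [(3, "z"), (6, "w")], [(4, "v"), (7, "u")]]

joined, reads = block_nested_loop_join(
    r_blocks, s_blocks, lambda t_r, t_s: t_r[0] == t_s[0]
)
print(joined)
print(reads)   # b_r * b_s + b_r block reads in this simple model
```

In this model the join performs b_r × b_s + b_r = 2 × 3 + 2 = 8 block reads, whereas a tuple-at-a-time nested loop over the same data would read the inner relation once per outer tuple (n_r × b_s + b_r = 4 × 3 + 2 = 14).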
Clearly, it is more efficient to use the smaller relation as the outer relation, in case neither of the relations fits in memory.

Indexed Nested-Loop Join
This algorithm is the same as the nested-loop join, except that an index file on the inner relation's (s) join attribute is used instead of a data-file scan on s. Each index lookup in the inner loop is essentially an equality selection on s utilizing one of the selection algorithms.

Sort-Merge Join
This algorithm can be used to perform natural joins and equi-joins and requires that each relation (r and s) be sorted by the common attributes between them (R ∩ S). The details of how this algorithm works will not be presented here. However, it is notable that each record in r and s is only scanned once, thus producing a worst- and best-case cost of b_r + b_s. Variations of the sort-merge join algorithm are used, for instance, when the data files are in unsorted order, but there exist secondary indices for the two relations.

Hash Join
Like the sort-merge join, the hash join algorithm can be used to perform natural joins and equi-joins. The concept behind the hash join algorithm is to partition the tuples of each given relation into sets. The partitioning is done on the basis of the hash value of the join attributes, which the hash function provides. The main goal of using the hash function in the algorithm is to reduce the number of comparisons and increase the efficiency of completing the join operation.

For example, suppose there are two tuples a and b that satisfy the join condition. This means they have the same value for the join attributes. Suppose that both a and b have hash value i. It follows that tuple a is in partition a_i, and tuple b is in partition b_i. Thus, we only compare the a tuples in a_i with the b tuples in b_i. There is no need to compare them with the tuples in any other partition. This is how the hash join operation works.
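The partitioning idea can be sketched in Python as follows. This is an illustrative sketch, not a definitive implementation: the function names (partition, hash_join), the number of partitions, and the sample relations are assumptions, and Python's built-in hash plays the role of the hash function on the join attribute.

```python
from collections import defaultdict


def partition(relation, key, n_partitions):
    """Split a relation into partitions by hashing the join attribute."""
    parts = defaultdict(list)
    for t in relation:
        parts[hash(t[key]) % n_partitions].append(t)
    return parts


def hash_join(r, s, r_key, s_key, n_partitions=4):
    """Equi-join r and s; tuples are only compared within the same partition."""
    r_parts = partition(r, r_key, n_partitions)
    s_parts = partition(s, s_key, n_partitions)
    result = []
    for i, s_part in s_parts.items():
        # Build an in-memory index on partition i of s ...
        index = defaultdict(list)
        for t_s in s_part:
            index[t_s[s_key]].append(t_s)
        # ... and probe it only with the tuples from partition i of r.
        for t_r in r_parts.get(i, []):
            for t_s in index.get(t_r[r_key], []):
                result.append(t_r + t_s)
    return result


# Hypothetical relations: (id, name) and (id, marks), joined on id.
students = [(1, "Asha"), (2, "Bina"), (3, "Chet")]
marks = [(1, 80), (3, 65), (3, 70), (4, 50)]

print(sorted(hash_join(students, marks, r_key=0, s_key=0)))
```

Because two tuples with equal join values always hash to the same partition number, no matching pair is missed even though tuples in different partitions are never compared.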
EVALUATION OF EXPRESSION

We have studied how individual relational operations are carried out. Now we consider how to evaluate an expression containing multiple operations. The obvious way to evaluate an expression is simply to evaluate one operation at a time, in an appropriate order. There are two approaches to evaluating a query execution tree:
* Materialization: Compute the result of an evaluation primitive and materialize (store) the new relation on the disk.
* Pipelining: Pass on tuples to parent operations even while an operation is still being executed.

Materialization
It is easiest to understand intuitively how to evaluate an expression by looking at a pictorial representation of the expression in an operator tree.

Example 3.2: Consider the expression:

Π_…(σ_…(Department) ⋈ Staff)

[Figure 3.6: Pictorial representation of an expression (query tree).]

In this method, the given expression is evaluated one relational operation at a time, in an appropriate sequence or order. After each operation is evaluated, its output is materialized in a temporary relation for subsequent use. The example of Figure 3.6 is computed as follows:
1. Compute σ_…(Department) and store the result as relation1.
2. Compute Staff ⋈ relation1 using the materialized relation1, and store the result as relation2.
3. Compute Π_… on the materialized relation2.

By repeating the process, we will eventually evaluate the operation at the root of the tree, giving the final result of the expression. In our example, we get the final result by executing the projection operation at the root of the tree, using as input the temporary relation created by the join.

The cost of this type of evaluation is always higher, which is its disadvantage: it needs to construct temporary relations for materializing the results of the evaluated operations.
These temporary relations are written to disk unless they are small in size.

Double buffering (using two buffers, with one continuing execution of the algorithm while the other is being written out) allows the algorithm to execute more quickly by performing CPU activity in parallel with I/O activity. The number of seeks can be reduced by allocating extra blocks to the output buffer, and writing out multiple blocks at once.

Pipelining
In this method, the DBMS does not store the records in temporary tables. Instead, it evaluates an operation and passes its result to the next operation to process, and so on. Each operation uses the result of the previous one as its input. Pipelining evaluates multiple operations simultaneously by passing the results of one operation to the next one without storing the tuples on the disk.

In the example of Figure 3.6, all three operations can be placed in a pipeline, which passes the results of the selection to the join as they are generated. In turn, it passes the results of the join to the projection as they are generated. The memory requirements are low, since results of an operation are not stored for long. However, as a result of pipelining, the inputs to the operations are not available all at once for processing.

Creating a pipeline of operations can provide two benefits:
* It eliminates the cost of reading and writing temporary relations, reducing the cost of query evaluation.
* It can start generating query results quickly, if the root operator of a query evaluation plan is combined in a pipeline with its inputs. This can be quite useful if the results are displayed to a user as they are generated, since otherwise there may be a long delay before the user sees any query results.

Implementation of Pipelining
Pipelines can be executed in either of two ways:
1. Demand-driven (or lazy evaluation) pipelining: In this method, the result of a lower-level operation is not passed to the higher level automatically. It is passed to the higher level only when the higher level requests it. The lower level retains the result value and its state, and transfers the result to the next level only when it is requested.
2. Producer-driven (or eager) pipelining: In this method, the lower-level operations eagerly pass their results to the higher-level operations, without waiting for the higher level to request them. The lower-level operation creates a buffer to store the results, and the higher-level operation takes the results from the buffer for its use. If the buffer is full, the lower-level operation waits for the higher-level operation to empty it.
Hence these two approaches are also called PULL and PUSH pipelining.

QUERY OPTIMIZATION

The function of the query optimization engine is to find an evaluation plan that reduces the overall execution cost of a query. We have seen in the previous sections that the costs for performing particular operations such as select and join can vary quite dramatically.

Example 3.3: Consider 2 relations r and s, with the following characteristics:

n_r = 10,000 = Number of tuples in r
n_s = 1,000 = Number of tuples in s
b_r = 1,000 = Number of blocks with tuples in r
b_s = 100 = Number of blocks with tuples in s

Selecting a single record from r on a non-key attribute can have
* a cost of ⌈log₂(b_r)⌉ = 10 (binary search) or
* a cost of b_r/2 = 500 (linear search).

Joining r and s can have
* a cost of n_r × b_s + b_r = 1,001,000 (nested-loop join) or
* a cost of 3(b_r + b_s) + 4n_h = 43,300 (hash join, where n_h = 10,000).

Notice that the costs of the 2 selects differ by a factor of 50, and those of the 2 joins by a factor of about 23. Clearly, selecting lower-cost methods can result in substantially better performance. This process of selecting a lower-cost mechanism is known as query optimization.

Query optimization strategies for lowering the cost of a query include cost-based optimization, heuristic-based optimization and semantic-based optimization.

Cost-based Optimization
This strategy is based on the cost of the query. The query can use different paths based on indexes, constraints, sorting methods, etc. This method mainly uses statistics like record size, number of records, number of records per block, number of blocks, table size, whether the whole table fits in a block, organization of tables, uniqueness of column values, size of columns, etc. Some of the features of cost-based optimization are as follows:
* It is based on the cost of the query that is to be optimized.
* The query can use a lot of paths based on the value of indexes, available sorting methods, constraints, etc.
* The aim of query optimization is to choose the most efficient path for implementing the query at the lowest possible cost.
* The cost of executing each algorithm needs to be provided to the query optimizer so that the most suitable algorithm can be selected for an operation.
* The cost of an algorithm also depends upon the cardinality of the input.

Heuristic-based Optimization
Heuristic optimization transforms the query tree by using a set of rules (heuristics) that typically (but not in all cases) improve execution performance. Some of the common heuristic rules are:
* Perform selection early (reduces the number of tuples)
* Perform projection early (reduces the number of attributes)
* Perform the most restrictive selection and join operations (i.e., those with the smallest result size) before other similar operations

Initially, a query tree is generated from the SQL statement. The query tree is then transformed into a more efficient query tree via a series of tree modifications, each of which hopefully reduces the execution time. Only a single query tree is produced in the end.
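The "perform selection early" rule can be illustrated with a small Python sketch that rewrites a query tree. Everything here is a hypothetical model for illustration only: the node classes (Relation, Product, Select), the function push_selection_down, and the two-relation schema are assumptions, and conditions are carried as plain strings together with the set of attributes they mention.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    name: str
    attrs: frozenset


@dataclass
class Product:
    left: object
    right: object


@dataclass
class Select:
    attrs: frozenset   # attributes the condition refers to
    cond: str
    child: object


def attributes(node):
    """Set of attributes produced by a subtree."""
    if isinstance(node, Relation):
        return node.attrs
    if isinstance(node, Product):
        return attributes(node.left) | attributes(node.right)
    return attributes(node.child)


def push_selection_down(node):
    """Move each selection below a product when one side has all its attributes."""
    if isinstance(node, Select):
        child = push_selection_down(node.child)
        if isinstance(child, Product):
            if node.attrs <= attributes(child.left):
                pushed = push_selection_down(Select(node.attrs, node.cond, child.left))
                return Product(pushed, child.right)
            if node.attrs <= attributes(child.right):
                pushed = push_selection_down(Select(node.attrs, node.cond, child.right))
                return Product(child.left, pushed)
        return Select(node.attrs, node.cond, child)
    if isinstance(node, Product):
        return Product(push_selection_down(node.left), push_selection_down(node.right))
    return node


# Hypothetical schema and initial tree: two selections above a Cartesian product.
student = Relation("Student", frozenset({"Stu_id", "Stu_name"}))
marks = Relation("Marks", frozenset({"Sub_id", "Marks_Obtained"}))
initial = Select(frozenset({"Stu_id"}), "Stu_id = 10",
                 Select(frozenset({"Sub_id"}), "Sub_id = 20",
                        Product(student, marks)))

optimized = push_selection_down(initial)
print(optimized)
```

After the rewrite each selection sits directly above the relation it filters, so the product only has to combine the tuples that survive the filters.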
Semantic-based Optimization
This strategy uses constraints specified on the database schema, such as unique attributes and other more complex constraints, in order to modify one query into another query that is more efficient to execute.

Example 3.4: Consider the following SQL query:

    SELECT e.lname, m.lname
    FROM EMPLOYEE AS e, EMPLOYEE AS m
    WHERE e.super_ssn = m.ssn AND e.salary > m.salary;

This query retrieves the names of employees who earn more than their supervisors. Suppose that we had a constraint on the database schema that stated that no employee can earn more than his or her direct supervisor. If the semantic query optimizer checks for the existence of this constraint, it does not need to execute the query at all because it knows that the result of the query will be empty. This may save considerable time if the constraint checking can be done efficiently. However, searching through many constraints to find those that are applicable to a given query and that may semantically optimize it can also be quite time-consuming. With the inclusion of active rules and additional metadata in database systems, semantic query optimization techniques are being gradually incorporated into the DBMS.

The optimizer can also transform the initial expression and tree into an equivalent but more efficient form.

Example 3.5: Consider the following SQL query:

    SELECT Stu_name, Marks_Obtained
    FROM Student, Marks
    WHERE Stu_id = 10 AND Sub_id = 20;

The corresponding relational algebra expression is:

Π_{Stu_name, Marks_Obtained}(σ_{Stu_id=10}(σ_{Sub_id=20}(Student × Marks)))

[Figure 3.7: Initial expression tree]

Suppose the Student and Marks relations both have 100 records each and the number of records with Stu_id=10 is 50. Note that the Cartesian product resulting in 10,000 records can be reduced by 50% if the σ_{Stu_id=10} operation is performed first. We can also combine the σ_{Sub_id=20} and Cartesian product operations into a more efficient join operation, as well as eliminating any unneeded columns before the expensive join is performed.
The diagram below shows this better, “optimized” version of the tres Tsay ames Matis bined Pasa iansruid an Ths ont Sat ie | de red tree of figure 3.7) MTree after transformation (opti yuery optimizer can use to it and theorems the a e several aro equivalent relations states that the set of the definit tbe the same—because they are sets, the order does not be the y > st | algebra theorems: Figure 3.1 Inrelational algebra, there ar "ransform the query. For instance, ‘Attributes (domain) of each relation mu Matter, Here is a partial list of relationa! 484 Advanced Database Cascade of o: A select with con} cascade of selects upon selec iaiaata anal) # ON1(FAa(~e(GAN(P)---)) 2, Commutativity of o: The select operation is commutative: oai(oaa(r)) = 5a2(oai(?)) 3. Cascade of II: A cascade of proj the caseade: Tatts (TTatisa(.(TTatia(*))---)) = Hatin () 4. Commutating o with TI: Given a 11's and o's attribute of Ar, Az, . operations can be commuted: Fanatic orn (6e(P)) = Oc(FTata....An(P)) Commutativity of b¢ (or x): The join and Cartesian product operations are commutative reasesoar andr 6. Commuting & with »4 (or x): Select can be commu! 1 junctive conditions on the attribute list is equiva i TL a ect operations is equivalent to the last project operat tion An, the Mand g xr ted with join ( or Cartesian product) as follows: Ifall of the attributes in the select’s condition are in relation r then or 64 8) = (6-(7)) 248 b. Given select the condition c composed of conditions cl and c2, and cl contains only attributes from r and 2 contains only attributes from s then - ‘o<(r #4 $) = (ca(?)) * (als) 7. Commutativity of set operations (U, 0, -) commutative; but the difference operation is not: TUSSSULFASESOr retest 8 Associativity of 4, x, U and 1: Alll four of these operations are individually associative. Le be any one of these operators, then: (r0s) Ot=r0(sOt) 9. 
Commuting σ with set operations (∪, ∩, −): Let θ be any one of the three set operations; then:
   σ_c(r θ s) ≡ (σ_c(r)) θ (σ_c(s))
10. Commuting Π with ∪: Project and union operations can be commuted:
   Π_{AList}(r ∪ s) ≡ (Π_{AList}(r)) ∪ (Π_{AList}(s))

Using these theorems, an algorithm can be defined to transform the original expression/tree created by the parser into a more optimized query. Some of the key concepts can be summarized as follows:

1. One primary objective is to reduce the size of the intermediate relations, both in bytes per record and in number of records, as early as possible, so that subsequent operations have less data to process and therefore execute more quickly.
2. Operations such as conjunctive selections should be broken down into their equivalent set of smaller units, so that the individual units can be moved into "better" positions in the query tree.
3. Combine Cartesian products with the corresponding selects to create joins. Optimized join algorithms such as the sort-merge join and hash join can be orders of magnitude more efficient.
4. Move selects and projects as far down the tree as possible, because these operations produce smaller intermediate relations that can be processed more quickly by the operations above them.

Choice of Evaluation Plans

The query optimization engine typically generates a set of candidate evaluation plans. Some will, according to heuristic theory, produce a faster, more efficient execution. Others may, judging by prior historical results, be more efficient than the theoretical models predict; this can very well be the case for queries that depend on the semantic nature of the data to be processed. Still others can be more efficient because of "outside agencies" such as network congestion or competing applications on the same CPU. The result is a set of candidate evaluation plans from which the query execution engine can choose the plan to execute at any given time.
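The transformation rules listed earlier in this section can be spot-checked on small relations. The sketch below is a minimal illustration, not a proof: it models relations as Python sets of tuples, and the relations r and s, the predicates c1 and c2, and the helper names select and project are all made up for the example.

```python
# Two small relations with the same schema (id, tag); the data is illustrative.
r = {(1, "x"), (2, "y"), (3, "z"), (4, "x")}
s = {(3, "z"), (4, "x"), (5, "y")}


def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return {t for t in rel if pred(t)}


def project(cols, rel):
    """Projection: keep the listed column positions and drop duplicates."""
    return {tuple(t[i] for i in cols) for t in rel}


def c1(t):
    return t[0] >= 2


def c2(t):
    return t[1] != "y"


# Rule 1: a conjunctive selection equals a cascade of selections.
cascade_ok = select(lambda t: c1(t) and c2(t), r) == select(c1, select(c2, r))

# Rule 2: selections commute.
commute_ok = select(c1, select(c2, r)) == select(c2, select(c1, r))

# Rule 7: difference is not commutative on this data.
difference_commutes = (r - s) == (s - r)

# Rule 9: selection distributes over union, intersection and difference.
set_ops_ok = all(
    select(c1, op(r, s)) == op(select(c1, r), select(c1, s))
    for op in (set.union, set.intersection, set.difference)
)

# Rule 10: projection distributes over union.
project_union_ok = project((1,), r | s) == (project((1,), r) | project((1,), s))

print(cascade_ok, commute_ok, difference_commutes, set_ops_ok, project_union_ok)
```

Because each rewritten expression returns the same set of tuples, an optimizer is free to pick whichever form yields the smaller intermediate relations; a check on one data set only illustrates the rules and does not establish them in general.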
EXERCISES

1. Explain query processing in detail with an example.
2. What are query optimization techniques? Explain.
3. How are query processing and query optimization related?
4. Write the relational-algebra expression of the following SQL query and draw the corresponding expression tree:
       SELECT Stu_name, Dept_id
       FROM Student
       WHERE Dept_id <= 2;
5. How do you measure the cost of a query? Explain.
6. Define access path. Write the formula to calculate the cost of the searching algorithm for selections using indices.
7. Explain the external sort-merge algorithm with a suitable example.
8. What are the main strategies for implementing the join operation?
9. What do you mean by evaluation of an expression?
10. Explain how materialization evaluation works, with an example.
11. Explain the pipelining approach to the evaluation of expressions in detail.
12. Contrast cost estimation and heuristic rules with regard to query optimization.
13. Discuss semi-join and anti-join as operations to which nested queries may be mapped; provide an example of each.
14. How are outer join and non-equijoin implemented?
15. How does a query tree represent a relational algebra expression? What is meant by an execution of a query tree? Discuss the rules for transformation of query trees, and identify when each rule should be applied during optimization.
16. What is meant by semantic query optimization? How does it differ from other query optimization techniques?
17. What is the difference between pipelining and materialization?
18. What are the problems associated with keeping views materialized?
19. What do you mean by query processing? What are the various steps involved in query processing? Explain with the help of a block diagram.
20. Discuss the cost components for a cost function that is used to estimate query execution cost. Which cost components are used most often as the basis for cost functions?
