
Database tuning

Database tuning describes a group of activities used to optimize and homogenize the


performance of a database. It usually overlaps with query tuning, but also refers to the design of
the database files and the selection of the database management system (DBMS), operating
system, and CPU that the DBMS runs on.

The goal is to maximize use of system resources to perform work as efficiently and
rapidly as possible. Most systems are designed to manage work efficiently, but it is
possible to greatly improve performance by customizing settings and the configuration
for the database and the DBMS being tuned.

DBMS tuning refers to tuning of the DBMS and the configuration of the memory and
processing resources of the computer running the DBMS. This is typically done through
configuring the DBMS, but the resources involved are shared with the host system.

Tuning the DBMS can involve setting the recovery interval (the time needed to restore the
state of the data to a particular point in time), assigning parallelism (the breaking up of work
from a single query into tasks assigned to different processing resources), and choosing the
network protocols used to communicate with database consumers.

Memory is allocated for data, execution plans, procedure cache, and work space. It is
much faster to access data in memory than data on storage, so maintaining a sizable
cache of data makes activities perform faster. The same consideration is given to work
space. Caching execution plans and procedures means that they are reused instead of
recompiled when needed. It is important to take as much memory as possible, while
leaving enough for other processes and the OS to use without excessive paging of
memory to storage.

Processing resources are sometimes assigned to specific activities to


improve concurrency. On a server with eight processors, six could be reserved for the
DBMS to maximize available processing resources for the database.
Basic Algorithms for Executing Relational Query Operations
 An RDBMS must include one or more alternative algorithms that implement each
relational algebra operation (SELECT, JOIN,…) and, in many cases, each
combination of these operations.
 Each algorithm may apply only to particular storage structures and access paths (such as an
index).
 Only execution strategies that can be implemented by the RDBMS algorithms and that
apply to the particular query and particular database design can be considered by the
query optimization module.

1. Algorithms for implementing SELECT operation


 These algorithms depend on the file having specific access paths and may apply only to
certain types of selection conditions.
 We will use the following examples of SELECT operations:
– (OP1): σSSN=‘123456789’ (EMPLOYEE)

– (OP2): σDNUMBER > 5 (DEPARTMENT)

– (OP3): σDNO=5 (EMPLOYEE)

– (OP4): σDNO=5 AND SALARY>30000 AND SEX = ‘F’ (EMPLOYEE)

– (OP5): σESSN=‘123456789’ AND PNO=10 (WORKS_ON)

 Many search methods can be used for simple selection: S1 through S6 (a sketch of S1 and S2 appears after this list).


• S1: Linear Search (brute force) — a full scan, in Oracle’s terminology

– Retrieves every record in the file and tests whether its attribute values satisfy the
selection condition: an expensive approach.
– Cost: b/2 on average if the selection attribute is a key, and b if it is not a key.
• S2: Binary Search

– If the selection condition involves an equality comparison on a key attribute on


which the file is ordered.
– SSN is the ordering attribute.
– Cost: log2(b) if the selection is on the ordering key.
• S3: Using a Primary Index (hash key)

– An equality comparison on a key attribute with a primary index (or hash key).
– This condition retrieves a single record (at most).
– Cost (primary index): bind/2 + 1, where bind is the number of index blocks (hash key: 1 bucket access if there is no collision).
• S4: Using a primary index to retrieve multiple records

– Comparison condition is >, >=, <, or <= on a key field with a primary index

– Use the index to find the record satisfying the corresponding equality condition
(DNUMBER=5), then retrieve all subsequent records in the (ordered) file.
– For the condition (DNUMBER <5), retrieve all the preceding records.
– This method is used for range queries too (i.e., queries that retrieve records in a certain
range).
– Cost: bind/2 + ?, where ‘?’ can be determined if the number of matching records is known.

• S5: Using a clustering index to retrieve multiple records:

– If the selection condition involves an equality comparison on a non-key attribute


with a clustering index.
– σDNO=5 (EMPLOYEE)
– Use the index to retrieve all the records satisfying the condition.
– Cost: log2(bind) + ?, where ‘?’ can be determined if the number of duplicates is known.
• S6: Using a secondary (B+-tree) index on an equality comparison:

– The method can be used to retrieve a single record if the indexing field is a key
or to retrieve multiple records if the indexing field is not a key.
– This can also be used for comparisons involving >, >=, <, or <=.
– This method is used for range queries too.
– Cost to retrieve: a key = height + 1; a non-key = height + 1 (extra level) + ?;
a comparison = (height − 1) + ? + ?
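The following is a minimal sketch (not from the text) of S1 and S2 over a file held in memory as a list of blocks; the block layout, the key name SSN, and the record contents are assumptions made only for illustration.

# Illustrative sketch of S1 (linear search) and S2 (binary search on the
# ordering key SSN); the file is modelled as a list of blocks, each block a
# list of records. Data and field names are assumed, not taken from the text.
blocks = [
    [{"SSN": "100"}, {"SSN": "200"}],   # block 0 (file ordered on SSN)
    [{"SSN": "300"}, {"SSN": "400"}],   # block 1
]

def s1_linear_search(blocks, ssn):
    # S1: brute force -- read every block until the record is found (cost ~ b).
    for block in blocks:
        for record in block:
            if record["SSN"] == ssn:
                return record
    return None

def s2_binary_search(blocks, ssn):
    # S2: binary search over the blocks of a file ordered on SSN (cost ~ log2(b)).
    lo, hi = 0, len(blocks) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        first, last = blocks[mid][0]["SSN"], blocks[mid][-1]["SSN"]
        if ssn < first:
            hi = mid - 1
        elif ssn > last:
            lo = mid + 1
        else:
            # The key, if present, must be in this block; scan only this block.
            return next((r for r in blocks[mid] if r["SSN"] == ssn), None)
    return None

print(s1_linear_search(blocks, "300"), s2_binary_search(blocks, "300"))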
• Many search methods can be used for a complex selection that involves a conjunctive condition:
S7 through S9.

– Conjunctive condition: several simple conditions connected with the AND


logical connective.
– (OP4): σDNO=5 AND SALARY>30000 AND SEX = ‘F’ (EMPLOYEE).
• S7:Conjunctive selection using an individual index.

– If an attribute involved in any single simple condition in the conjunctive condition has an
access path that permits the use of one of the Methods S2 to S6, use that condition to
retrieve the records.
– Then check whether each retrieved record satisfies the remaining simple conditions in the
conjunctive condition
• S8:Conjunctive selection using a composite index:

– If two or more attributes are involved in equality conditions in the conjunctive condition
and a composite index (or hash structure) exists on the combined fields.
– Example: If an index has been created on the composite key (ESSN, PNO) of the
WORKS_ON file, we can use the index directly.
– (OP5): σESSN=‘123456789’ AND PNO=10 (WORKS_ON).
• S9: Conjunctive selection by intersection of record pointers

– If the secondary indexes are available on more than one of the fields involved in simple
conditions in the conjunctive condition, and if the indexes include record pointers (rather
than block pointers), then each index can be used to retrieve the set of record pointers that
satisfy the individual condition.
– The intersection of these sets of record pointers gives the record pointers that satisfy the
conjunctive condition.
– If only some of the conditions have secondary indexes, each retrieved record is further
tested to determine whether it satisfies the remaining conditions (see the sketch after this list).
• Commercial systems: Informix uses S9. Sybase ASE does it using bitmap operations. Oracle 8
implements intersection of record pointers in several ways (“hash join of indexes” and “AND bitmap”).
Microsoft SQL Server implements intersection of record pointers by index join.
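As an illustration of S9, the sketch below intersects sets of record pointers obtained from two secondary indexes; the index contents and attribute values are invented for the example.

# Sketch of S9: conjunctive selection by intersection of record pointers.
# Each secondary index maps an attribute value to a set of record pointers
# (rids); the contents below are made up for illustration.
index_dno = {5: {1, 2, 4, 7}, 4: {3, 5}}          # secondary index on DNO
index_sex = {"F": {2, 4, 6}, "M": {1, 3, 5, 7}}   # secondary index on SEX

def conjunctive_select(dno, sex):
    # Intersect the pointer sets that satisfy each indexed simple condition.
    rids = index_dno.get(dno, set()) & index_sex.get(sex, set())
    # A condition without an index (e.g. SALARY > 30000) would be checked on
    # the records retrieved through these pointers afterwards.
    return sorted(rids)

print(conjunctive_select(5, "F"))   # pointers of records with DNO=5 AND SEX='F'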
2. Algorithms for implementing JOIN Operation
• Join is a time-consuming operation. We will consider only the natural join operation.

– Two-way join: join on two files.


– Multiway join: involving more than two files.

• The following examples of the two-way JOIN operation (R ⋈A=B S) will be used:

– OP6: EMPLOYEE ⋈DNO=DNUMBER DEPARTMENT

– OP7: DEPARTMENT ⋈MGRSSN=SSN EMPLOYEE
• J1: Nested-loop join (brute force)

– For each record t in R (outer loop), retrieve every record s from S (inner loop) and test
whether the two records satisfy the join condition t[A] = s[B].
• J2: Single-loop join (using an access structure to retrieve the matching records)

– If an index (or hash key) exists for one of the two join attributes (e.g., B of S), retrieve
each record t in R, one at a time (single loop), and then use the access structure to retrieve
directly all matching records s from S that satisfy s[B] = t[A].
• J3. Sort-merge join:

– If the records of R and S are physically sorted (ordered) by value of the join attributes A
and B, respectively, we can implement the join in the most efficient way.
– Both files are scanned concurrently in order of the join attributes, matching the records
that have the same values for A and B.
– If the files are not sorted, they may be sorted first by using external sorting.
– Pairs of file blocks are copied into memory buffers in order and records of each file are
scanned only once each for matching with the other file if A & B are key attributes.
– The method is slightly modified in case where A and B are not key attributes.
• J4: Hash-join

– The records of files R and S are both hashed to the same hash file using the same hashing
function on the join attributes A of R and B of S as hash keys.
• Partitioning Phase

– First, a single pass through the file with fewer records (say, R) hashes its records to the
hash file buckets.
– Assumption: The smaller file fits entirely into memory buckets after the first phase.
• If the above assumption is not satisfied, the method becomes more complex, and a number of
variations have been proposed to improve efficiency: partitioned hash join and hybrid hash join.

• Probing Phase

– A single pass through the other file (S) then hashes each of its records to probe the
appropriate bucket, and that record is combined with all matching records from R in that
bucket (a sketch of this join appears after this list).
• Commercial systems: Sybase ASE supports single-loop join and sort-merge join. Oracle 8
supports page-oriented nested-loop join, sort-merge join, and a variant of hybrid hash join. IBM
DB2 supports single-loop join, sort-merge join, and hybrid hash join. Microsoft SQL Server supports
single-loop join, sort-merge join, hash join, and a technique called hash teams. Informix supports
nested-loop, single-loop, and hybrid hash joins.
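A compact, in-memory sketch of the J4 hash join for OP6 is shown below; the relation contents are invented, and the smaller file (DEPARTMENT) is assumed to fit in memory, as in the description above.

# Illustrative hash join (J4) for OP6: EMPLOYEE joined with DEPARTMENT on
# DNO = DNUMBER. Relation contents are assumed for the example.
DEPARTMENT = [{"DNUMBER": 5, "DNAME": "Research"},
              {"DNUMBER": 4, "DNAME": "Administration"}]
EMPLOYEE = [{"SSN": "111", "DNO": 5},
            {"SSN": "222", "DNO": 4},
            {"SSN": "333", "DNO": 5}]

def hash_join(R, S, A, B):
    # Partitioning phase: hash the smaller file R into in-memory buckets on A.
    buckets = {}
    for r in R:
        buckets.setdefault(hash(r[A]), []).append(r)
    # Probing phase: one pass over S; each record probes the matching bucket
    # and is combined with every matching record of R found there.
    result = []
    for s in S:
        for r in buckets.get(hash(s[B]), []):
            if r[A] == s[B]:
                result.append({**r, **s})
    return result

print(hash_join(DEPARTMENT, EMPLOYEE, "DNUMBER", "DNO"))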

3. Algorithms for implementing PROJECTION Operation

 If the attribute list of the projection operation includes the key: the result will have the
same number of tuples but with only the values of the attribute list.
 In the other case:
– Remove unwanted attributes (not specified in the projection).

– Eliminate any duplicate tuples.

SELECT DISTINCT SSN, LNAME

FROM EMPLOYEE

(Duplicates are not removed if DISTINCT is not used.)

• Projection based on Sorting

– Scan EMPLOYEE and produce a set of tuples that contain only the desired attributes.
– Sort this set of tuples using the combination of all its attributes as the key for sorting.
– Scan the sorted result, comparing adjacent tuples, and discard duplicates.

• Projection Based on Hashing: Hashing is used to eliminate duplicates.

– As each record is hashed (hash function on the attribute list of the projection operation)
and inserted into a bucket of the hash file in memory, it is checked against those already
in the bucket.
– If it is a duplicate, it is not inserted.

• Projection Based on Indexing:


– An existing index is useful if the key includes all the attributes that we wish to retain in
the projection.
– We can simply retrieve the key values from the index (without ever accessing the actual
relation) and apply our projection techniques to this (much smaller) set of pages. This
technique is called an index-only scan.
– If we have an ordered index whose search key includes the wanted attributes as a prefix,
we can do even better: Just retrieve the data entries in order, discarding unwanted fields,
and compare adjacent entries to check for duplicates.
 Since external sorting is required for a variety of reasons, most database systems
have a sorting utility, which can be used to implement projection relatively easily.
 Sorting is the standard approach for projection.
 Commercial Systems: Informix uses hashing; IBM DB2, Oracle 8, and Sybase
ASE use sorting. Microsoft SQL Server and Sybase ASIQ implement both hash-based and
sort-based algorithms (a sketch of the hash-based approach follows).
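Below is a brief sketch of hash-based duplicate elimination for projection, following the description above; the relation contents and bucket count are assumptions made for the example.

# Hash-based duplicate elimination for a projection (sketch; data is made up).
EMPLOYEE = [
    {"SSN": "111", "LNAME": "Smith", "DNO": 5},
    {"SSN": "222", "LNAME": "Wong",  "DNO": 4},
    {"SSN": "111", "LNAME": "Smith", "DNO": 5},   # duplicate after projection
]

def project_distinct(relation, attrs, n_buckets=8):
    buckets = {}                                   # in-memory hash file
    for record in relation:
        t = tuple(record[a] for a in attrs)        # drop unwanted attributes
        bucket = buckets.setdefault(hash(t) % n_buckets, [])
        if t not in bucket:                        # check against tuples already there
            bucket.append(t)                       # duplicates are not inserted
    return [t for b in buckets.values() for t in b]

print(project_distinct(EMPLOYEE, ["SSN", "LNAME"]))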

Combining Operations

 An SQL query will be translated into a sequence of relational operations.


– OP8: πLNAME(σSEX=‘M’ (EMPLOYEE) ⋈MGRSSN=SSN DEPARTMENT)

• Materialization Alternative:

– Execute a single operation at a time, generating a temporary file that is used as
the input to the next operation.
– OP8: compute σSEX=‘M’ (EMPLOYEE) and store it in a temporary file. Then compute the join of
this temporary file with DEPARTMENT and store the result in a new temporary file.
Finally, compute the projection as the result file. So there are 2 input files, 2 temporary files, and a
result file.
– This is a time-consuming approach because it generates and stores many temporary files.

• Pipelining (stream-based) Alternative:

– Generate query execution code that corresponds to algorithms for combinations of
operations in a query.
– As the result tuples from one operation are produced, they are provided as input to the
parent operation, so there is no need to store temporary files on disk.
– OP8: don’t store the result of σSEX=‘M’ (EMPLOYEE); instead, pass its tuples directly to the
join. Similarly, don’t store the result of the join; pass its tuples directly to the projection. So there
are only two input files and one result file.
– Pipelines can be executed in two ways: demand driven and producer driven (a demand-driven sketch follows).
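The demand-driven pipeline for OP8 can be sketched with Python generators, where each operator yields tuples to its parent and no temporary file is written; the relation contents and attribute names below are assumptions for illustration only.

# Demand-driven pipelining sketch for OP8 (data and attribute names are assumed).
EMPLOYEE = [
    {"LNAME": "Smith",  "SEX": "M", "SSN": "111"},
    {"LNAME": "Wong",   "SEX": "M", "SSN": "222"},
    {"LNAME": "Zelaya", "SEX": "F", "SSN": "333"},
]
DEPARTMENT = [{"DNAME": "Research", "MGRSSN": "222"}]

def select_male(employees):                  # sigma SEX='M'
    for e in employees:
        if e["SEX"] == "M":
            yield e

def join_on_mgrssn(employees, departments):  # join on MGRSSN = SSN
    for e in employees:                      # consumes the selection's output lazily
        for d in departments:
            if d["MGRSSN"] == e["SSN"]:
                yield {**e, **d}

def project_lname(tuples):                   # pi LNAME
    for t in tuples:
        yield t["LNAME"]

# Tuples flow straight from selection to join to projection; no temporary files.
print(list(project_lname(join_on_mgrssn(select_male(EMPLOYEE), DEPARTMENT))))
# -> ['Wong']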

In this section, we first discuss how to transform a relational algebra expression into an
equivalent one; then we discuss the generation of the query execution plan.

2.1 Transformation of Relational Expression


As we mentioned, one aspect of optimization occurs at the relational algebra level.
This involves transforming an initial expression (tree) into an equivalent expression (tree)
that is more efficient to execute. Two relational algebra expressions are said to be
equivalent if the two expressions generate relations with the same set of attributes and
containing the same set of tuples, although their attributes may be ordered differently.

The query tree is a data structure that represents the relational algebra expression in the
query optimization process. The leaf nodes in the query tree correspond to the input
relations of the query, and the internal nodes represent the operators in the query. When
executing the query, the system executes an internal node's operation whenever its
operands are available; the internal node is then replaced by the relation obtained
from that execution.

2.1.1 Equivalence Rules for transforming relational expressions

There are many rules which can be used to transform relational algebra operations
to equivalent ones. We will state here some useful rules for query optimization.

In this section, we use the following notation:

 E1, E2, E3,… : denote relational algebra expressions


 X, Y, Z : denote sets of attributes
 F, F1, F2, F3 ,… : denote predicates (selection or join conditions)
 Commutativity of Join and Cartesian Product operations

E1 ⋈F E2 ≡ E2 ⋈F E1 and E1 × E2 ≡ E2 × E1

 Note that the Natural Join operator is a special case of Join, so Natural Joins are also
commutative.
 Associativity of Join and Cartesian Product operations

The Join operation is associative in the following manner, where F1 involves attributes from only E1
and E2 and F2 involves only attributes from E2 and E3:

(E1 ⋈F1 E2) ⋈F2 E3 ≡ E1 ⋈F1 (E2 ⋈F2 E3)

 Cascade of Projection
πX1(πX2(...(πXn(E))...))≡πX1(E)
 Cascade of Selection
σF1∧F2∧...∧Fn (E) ≡ σF1(σF2(...(σFn(E))...))

 Commutativity of Selection
σF1(σF2(E))≡σF2(σF1(E))
 Commuting Selection with Projection
πX(σF(E))≡σF(πX(E))
This rule holds if the selection condition F involves only the attributes in set X.
 Selection with Cartesian Product and Join
 If all the attributes in the selection condition F involve only the attributes of one
of the expressions, say E1, then the Selection and Join can be combined as follows:

σF(E1 ⋈ E2) ≡ (σF(E1)) ⋈ E2
 If the selection condition F = F1 AND F2, where F1 involves only attributes of
expression E1 and F2 involves only attributes of expression E2, then we have:

σF(E1 ⋈ E2) ≡ (σF1(E1)) ⋈ (σF2(E2))
 If the selection condition F = F1 AND F2, where F1 involves only attributes of
expression E1 and F2 involves attributes from both E1 and E2, then we have:

σF(E1 ⋈ E2) ≡ σF2((σF1(E1)) ⋈ E2)

The same rules apply if the Join operation is replaced by a Cartesian Product operation.
 Commuting Projection with Join and Cartesian Product
 Let X and Y be sets of attributes of E1 and E2 respectively. If the join condition
involves only attributes in XY (the union of the two sets), then:

πXY(E1 ⋈F E2) ≡ (πX(E1)) ⋈F (πY(E2))

The same rule applies when the Join is replaced by a Cartesian Product.

 If the join condition involves additional attributes, say Z of E1 and W of E2, and
Z, W are not in XY, then:

πXY(E1 ⋈F E2) ≡ πXY((πXZ(E1)) ⋈F (πYW(E2)))

 Commuting Selection with set operations

The Selection commutes with all three set operations (Union, Intersection, Set Difference):

σF(E1 ∪ E2) ≡ σF(E1) ∪ σF(E2)

The same rule applies when the Union is replaced by Intersection or Set Difference.
 Commuting Projection with Union

πX(E1 ∪ E2) ≡ πX(E1) ∪ πX(E2)
 Commutativity of set operations: The Union and Intersection are commutative but
Set Difference is not.

 Associativity of set operations: Union and Intersection are associative, but Set
Difference is not.

 Converting a Cartesian Product followed by a Selection into a Join

If the selection condition corresponds to a join condition, we can do the conversion as
follows:

σF(E1 × E2) ≡ E1 ⋈F E2

2.1.2 Example of Transformation

Consider the following query on the COMPANY database: “Find the names of employees born
after 1967 who work on a project named ‘Greenlife’.”

The SQL query is:

SELECT E.Name
FROM EMPLOYEE E, JOIN J, PROJECT P
WHERE E.EID = J.EID AND J.PCode = P.Code AND E.Bdate > ’31-12-1967’ AND P.Name
= ‘Greenlife’;

The initial query tree for this SQL query is

Figure 2: Initial query tree for query in example

We can transform the query in the following steps:

- Use transformation rule number 7 on the Cartesian Product and Selection
operations to move Selections down the tree. Selection operations help
reduce the size of the temporary relations involved in the Cartesian Product.
Figure 3: Move Selection down the tree

- Use rule number 13 to convert the sequence (Selection, Cartesian Product) into a
Join. In this situation, since the join condition is an equality comparison on the same attributes,
we can convert it to a Natural Join.

Figure 4: Combine Cartesian Product with
subsequent Selection into Join

- Use rule number 8 to move Projection operations down the tree.


Figure 5: Move projections down the query tree

2.2 Heuristic Algebraic Optimization algorithm

Here are the steps of an algorithm that utilizes the equivalence rules to transform the query
tree.

 Break up any Selection operation with conjunctive conditions into a cascade of


Selection operations. This step is based on equivalence rule number 4.
 Move Selection operations as far down the query tree as possible. This step uses
the commutativity and associativity of Selection, as given in equivalence rules
5, 6, 7 and 9.
 Rearrange the leaf nodes of the tree so that the most restrictive selections are done first.
The most restrictive selection is the one that produces the fewest tuples. In
addition, make sure that the ordering of leaf nodes does not lead to Cartesian
Product operations. This step relies on the rules of associativity of binary operations,
such as rules 2 and 12.
 Combine a Cartesian Product with a subsequent Selection operation into a Join
operation if the selection condition represents a join condition (rule 13)
 Break down and move lists of projection attributes down the tree as far as possible,
creating new Projection operations as needed (rules 3, 6, 8, 10).
 Identify subtrees that represent groups of operations that can be pipelined, and
execute them using pipelining (a small sketch of the selection push-down step appears below).

The previous example illustrates the transformation of a query tree using this algorithm.
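The following toy sketch illustrates one step of the algorithm: pushing a Selection below a Cartesian Product when its condition mentions attributes of only one operand (rule 7). The node structure and attribute names are invented for the example and are not the text's own code.

# Toy sketch of pushing a selection below a Cartesian product (rule 7).
class Node:
    def __init__(self, op, children=(), attrs=frozenset(), cond=None):
        self.op, self.children = op, list(children)
        self.attrs, self.cond = set(attrs), cond

def push_selection_down(select_node):
    # select_node is a SELECT over a CARTESIAN node; if its condition uses
    # attributes of only one child, move it onto that child.
    product = select_node.children[0]
    for i, child in enumerate(product.children):
        if select_node.cond["attr"] in child.attrs:
            product.children[i] = Node("SELECT", [child], child.attrs, select_node.cond)
            return product        # the product node replaces the selection node
    return select_node            # condition spans both children: leave it in place

emp  = Node("RELATION", attrs={"EID", "Name", "Bdate"})
proj = Node("RELATION", attrs={"Code", "PName"})
tree = Node("SELECT",
            [Node("CARTESIAN", [emp, proj], attrs=emp.attrs | proj.attrs)],
            cond={"attr": "Bdate", "op": ">", "value": "31-12-1967"})

tree = push_selection_down(tree)
print(tree.op, [c.op for c in tree.children])   # CARTESIAN ['SELECT', 'RELATION']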

2.3 Converting query tree to query evaluation plan

Query optimizers use the above equivalence rules to generate an enumeration of expressions
logically equivalent to the given query expression. However, generating expressions is
just one part of the optimization process. As mentioned above, the evaluation plan
includes the detailed algorithm for each operation in the expression and how the execution of
the operations is coordinated. Figure 6 shows an evaluation plan.

Figure 6: An evaluation plan

As we know, the output of the Parsing and Translating step in query processing is a
relational algebra expression. For a complex query, this expression consists of several
operations and involves various relations. Thus the evaluation of the expression can be
very costly in terms of both time and memory space. Now we consider how to evaluate
an expression containing multiple operations. The obvious way to evaluate the expression
is simply to evaluate one operation at a time in an appropriate order. The result of each
individual evaluation is stored in a temporary relation, which must be written to disk
and may be used as the input for a following evaluation. Another approach is to evaluate
several operations simultaneously in a pipeline, in which the result of one operation is passed
on to the next one and no temporary relation is created.

These two approaches for evaluating expression are materialization and pipelining.

Materialization
We will illustrate how to evaluate an expression using the materialization approach by
looking at an example expression.

Consider the expression

The corresponding query tree is

Figure 7: Sample query tree for a relational algebra expression

When we apply the materialization approach, we start from the lowest-level operations in
the expression. In our example, there is only one such operation: the SELECTION on
DEPARTMENT. We execute this operation using one of the algorithms for it, for example
retrieving multiple records using a secondary index on DName. The result is stored in a
temporary relation. We can use this temporary relation to execute the operation at the
next level up in the tree. So, in our example, the inputs of the join operation are the
EMPLOYEE relation and the temporary relation that was just created. Now, evaluate the
JOIN operation, generating another temporary relation. The execution terminates when
the root node is executed and produces the result relation for the query. The root node in
this example is the PROJECTION applied to the temporary relation produced by
executing the join.

Using materialized evaluation, every intermediate operation produces a temporary relation,
which is then used for the evaluation of higher-level operations. These temporary relations
vary in size and might have to be stored on disk. Thus, the cost of this evaluation is the sum
of the costs of all operations plus the cost of writing and reading the intermediate
results to and from disk (where applicable).

Pipelining
We can improve query evaluation efficiency by reducing the number of temporary
relations that are produced. To achieve this reduction, it is common to combine several
operations into a pipeline of operations. To illustrate this idea, consider our example:
rather than being implemented separately, the JOIN can be combined with the SELECTION
on DEPARTMENT, the EMPLOYEE relation, and the final PROJECTION operation.
When the Selection operation generates a tuple of its result, that tuple is passed immediately,
along with a tuple from the EMPLOYEE relation, to the join. The join receives the two tuples as
input and processes them; if a result tuple is generated by the join, that tuple is again passed
immediately to the projection operation, which produces a tuple of the final result relation.

We can implement the pipeline by generating query execution code.

Using pipelining in this situation reduces the number of temporary files and thus reduces
the cost of query evaluation. In general, when pipelining is applicable,
the cost of the two approaches can differ substantially. However, there are cases where only
materialization is feasible.

3. Cost Estimates in Query Optimization


Typically, a query optimizer does not depend only on heuristic rules; it also
estimates and compares the cost of executing different plans, and then chooses the query
execution plan with the lowest cost.

3.1 Measure of Query Cost

The cost of a query execution plan includes the following components:

 Access cost to secondary storage: This is the cost of searching for, reading,
writing data blocks of secondary storage such as disk.
 Computation cost: This is the cost of performing in-memory operations on the data
buffers during execution. It can be considered the CPU time needed to execute the query.
 Storage cost: This is the cost of storing intermediate files that are generated during
execution.
 Communication cost: This is the cost of transferring the query and its result from
site to site (in a distributed or parallel database system).
 Memory usage cost: Number of buffers needed during execution.

In a large database, access cost is usually the most important cost, since disk accesses are
slow compared to in-memory operations.
In a small database, where almost all data reside in memory, the emphasis is on
computation cost. In a distributed system, communication cost should be minimized.

It is difficult to include all the cost components in a cost function. Therefore, some cost
functions consider only disk access cost as a reasonable measure of the cost of a query-
evaluation plan.

3.2 Catalog Information for Cost Estimation

Query optimizers use the statistical information stored in the DBMS catalog to estimate the
cost of a plan. The relevant catalog information about a relation includes:

 Number of tuples in relation r: denoted by nr

 Number of blocks containing tuples of relation r: br
 Size of a tuple of relation r (assuming records in a file are all of the same type): sr
 Blocking factor of relation r, which is the number of tuples that fit into one block:
fr
 V(A,r) is the number of distinct values of an attribute A in relation r. This value
is the same as the size of πA(r). If A is a key attribute, then V(A,r) = nr.
 SC(A,r) is the selection cardinality of attribute A of relation r. This is the average
number of records that satisfy an equality condition on attribute A.

In addition to relation information, some information about indices is also used:

 Number of levels in index i.


 Number of lowest-level index blocks in index i (the number of blocks at the leaf level
of the index).

The statistical information listed here is simplified. The optimizers of real database
management systems maintain further information to improve the accuracy of their cost
estimates.

With the statistical information maintained in the DBMS catalog and a measure of query
cost based on the number of disk accesses, we can estimate the cost of different relational
algebra operations. Here, we give a simple example of using the cost model to
estimate the cost of a selection operation. However, we do not intend to go into the details of
this issue in this course; please refer to the textbook and reference books if you want to
go deeper into this topic.

Example of cost functions for SELECTION


Consider a selection operation on a relation whose tuples are all stored in one file. The
simplest algorithms to implement selection are linear search and binary search.

 Linear search: Scan all file blocks; all records in a block are checked to see
whether they satisfy the search condition. In general, the cost of this method is C = br.
For a selection on a key attribute, half of the blocks are scanned on average, so C =
br/2.
 Binary search: If the file is ordered on an attribute A and the selection condition is an
equality comparison on A, we can use binary search. The estimated number of
blocks to be scanned is

C = ⌈log2(br)⌉ + ⌈SC(A,r)/fr⌉ − 1

The first term is the cost of locating the first satisfying tuple by a binary search; the second
term is the number of blocks containing records that satisfy the selection condition, one of
which has already been retrieved, which is why we subtract one (the third term).

Now, consider a selection on the EMPLOYEE file:

σDeptId=1(EMPLOYEE)
The file EMPLOYEE has the following statistical information:

 f = 20 (20 tuples fit in one block)

 V(DeptID, EMPLOYEE) = 10 (there are 10 different departments)
 n = 1000 (there are 1000 tuples in the file)

The cost of a linear search is b = 1000/20 = 50 block accesses.

The cost of a binary search on the ordering attribute DeptID (checked in the short sketch below):

 The average number of records that satisfy the condition is 1000/10 = 100 records.
 The number of blocks containing these tuples is 100/20 = 5.
 A binary search for the first tuple takes ⌈log2(50)⌉ = 6 block accesses.
 Thus the total cost is 5 + 6 − 1 = 10 block accesses.
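The arithmetic above can be checked with a few lines of Python; the statistics are the ones assumed for the EMPLOYEE file in this example.

# Quick check of the selection-cost estimates for the EMPLOYEE example.
import math

n_r, f_r, V = 1000, 20, 10        # tuples, blocking factor, distinct DeptID values
b_r = n_r // f_r                  # 50 blocks in the file

linear_cost = b_r                                   # full scan: 50 block accesses
sc = n_r // V                                       # selection cardinality: 100 records
blocks_with_matches = math.ceil(sc / f_r)           # 5 blocks hold the matching tuples
binary_cost = math.ceil(math.log2(b_r)) + blocks_with_matches - 1   # 6 + 5 - 1 = 10

print(linear_cost, binary_cost)   # 50 10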

HASHING TECHNIQUES:

 Hashing provides very fast access to records on certain search conditions. This
organization is usually called a hash file.
 The search condition must be an equality condition on a single field, called the hash field
of the file. The hash field is also called the hash key.
 The idea behind hashing is to provide a function h, called a hash function (or
randomizing function), that is applied to the hash field value of a record and yields the
address of the disk block in which the record is stored.
 Hashing is also used as an internal search structure within a program whenever a group of
records is accessed exclusively by using the value of one field.

Static Hashing

 A bucket is a unit of storage containing one or more records (a bucket is typically a disk
block).
 The file blocks are divided into M equal-sized buckets, numbered bucket0, bucket1, ...,
bucketM−1. Typically, a bucket corresponds to one (or a fixed number of) disk block(s).
 In a hash file organization we obtain the bucket of a record directly from its search-key
value using a hash function h(K).
 The record with hash key value K is stored in bucket i, where i = h(K).
 Hash function is used to locate records for access, insertion as well as deletion.
 Records with different search-key values may be mapped to the same bucket; thus the
entire bucket has to be searched sequentially to locate a record.
 Primary pages are fixed, allocated sequentially, and never de-allocated; overflow pages are
used if needed.

h(K) mod M = the bucket to which the data entry with key K belongs (M = number of buckets).

Static External Hashing

 One of the file fields is designated to be the hash key, K, of the file.
 Collisions occur when a new record hashes to a bucket that is already full.
 An overflow file is kept for storing such records. Overflow records that hash to each
bucket can be linked together.
 To reduce overflow records, a hash file is typically kept 70-80% full.
 The hash function h should distribute the records uniformly among the buckets;
otherwise, search time will be increased because many overflow records will exist.
 The hash function works on the search-key field of record r and must distribute values over the
range 0 ... M−1.

h(K) = (a * K + b) usually works well, where
a and b are constants;
a lot is known about how to tune h.

 Typical hash functions perform computation on the internal binary representation of the
search-key.

For example, for a string search-key, the binary representations of all the
characters in the string could be added and the sum modulo the number of
buckets could be returned.

 The ideal hash function is random, so each bucket will have the same number of records
assigned to it irrespective of the actual distribution of search-key values in the file
(a sketch of a static hash file with overflow chaining follows).
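A minimal sketch of a static hash file with an overflow area is given below; the bucket count, block capacity, and hash-function constants are assumed values chosen only for illustration.

# Static hashing with overflow chaining (sketch; parameters are assumed).
M = 7                                 # number of primary buckets
BLOCK_CAPACITY = 3                    # records per primary bucket ("disk block")
a, b = 31, 17                         # constants of the hash function

buckets  = [[] for _ in range(M)]     # primary area
overflow = [[] for _ in range(M)]     # overflow records chained per bucket

def h(K):
    return (a * K + b) % M            # h(K) mod M gives the bucket number

def insert(K, record):
    slot = buckets[h(K)]
    if len(slot) < BLOCK_CAPACITY:
        slot.append((K, record))              # primary bucket has room
    else:
        overflow[h(K)].append((K, record))    # collision: bucket full, use overflow

def search(K):
    # Equality search on the hash key: one primary bucket, then its overflow chain.
    for key, record in buckets[h(K)] + overflow[h(K)]:
        if key == K:
            return record
    return None

insert(1234, "record A"); insert(5678, "record B")
print(search(1234), search(9999))     # record A None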

Dynamic and Extendible Hashing Techniques

 Hashing techniques are adapted to allow the dynamic growth and shrinking of the
number of file records.

These techniques include the following:

o Dynamic hashing
o Extendible hashing
o Linear hashing.

 These hashing techniques use the binary representation of the hash value h(K).
 In dynamic hashing the directory is a binary tree.
 In extendible hashing the directory is an array of size 2^d, where d is called the global
depth.
 The directories can be stored on disk, and they expand or shrink dynamically. Directory
entries point to the disk blocks that contain the stored records.
 An insertion in a disk block that is full causes the block to split into two blocks and the
records are redistributed among the two blocks.
 The directory is updated appropriately.
 Dynamic and extendible hashing do not require an overflow area.
 Linear hashing does require an overflow area but does not use a directory. Blocks are
split in linear order as the file expands.

Dynamic Hashing

 Good for databases that grow and shrink in size


 Allows the hash function to be modified dynamically

Extendable hashing – one form of dynamic hashing

 Hash function generates values over a large range — typically b-bit integers, with b = 32.
 At any time, use only a prefix of the hash value to index into a table of bucket
addresses.
 Let the length of the prefix be i bits, 0 ≤ i ≤ 32.
 Bucket address table size = 2^i. Initially i = 0.
 The value of i grows and shrinks as the size of the database grows and shrinks.
 Multiple entries in the bucket address table may point to the same bucket.
 Thus, the actual number of buckets is ≤ 2^i.
 The number of buckets also changes dynamically due to coalescing and splitting of
buckets (a small sketch appears below).

General Extendable Hash Structure
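A compact sketch of an extendible hash structure is shown below; it uses the low-order bits of the hash value as the directory index, and the bucket capacity and class names are assumptions for illustration, not part of the text.

# Sketch of extendible hashing: a directory of 2**global_depth entries, each
# pointing to a bucket that records its own local depth (illustrative only).
BUCKET_CAPACITY = 2                    # assumed small so splits are easy to see

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = {}                # key -> record

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 0
        self.directory = [Bucket(0)]   # 2**0 = 1 entry initially

    def _index(self, key):
        # Use the last global_depth bits of h(key) to index the directory.
        return hash(key) & ((1 << self.global_depth) - 1)

    def insert(self, key, record):
        bucket = self.directory[self._index(key)]
        if key in bucket.items or len(bucket.items) < BUCKET_CAPACITY:
            bucket.items[key] = record
            return
        # Bucket is full: double the directory only if its local depth equals
        # the global depth, then split the bucket and redistribute its records.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and (i & high_bit):
                self.directory[i] = sibling    # redirect half of the entries
        for k, r in list(bucket.items.items()):
            if self.directory[self._index(k)] is sibling:
                sibling.items[k] = r
                del bucket.items[k]
        self.insert(key, record)               # retry; may trigger another split

    def search(self, key):
        return self.directory[self._index(key)].items.get(key)

eh = ExtendibleHash()
for k in range(8):
    eh.insert(k, "rec%d" % k)
print(eh.global_depth, eh.search(5))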

Linear Hashing

 This is another dynamic hashing scheme, an alternative to Extendible Hashing.


 LH handles the problem of long overflow chains without using a directory, and handles
duplicates.
 Idea: Use a family of hash functions h0, h1, h2,...

hi(key) = h(key) mod (2^i · N); N = initial number of buckets

h is some hash function (its range is not 0 to N−1).
If N = 2^d0, for some d0, hi consists of applying h and looking at the last di bits,
where di = d0 + i.
hi+1 doubles the range of hi (similar to directory doubling); a short sketch of this
hash-function family follows.
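The family of hash functions used by linear hashing can be sketched as below; N and the keys are assumed values for illustration.

# Sketch of the hash-function family h_i used by linear hashing.
N = 4                                  # assumed initial number of buckets (N = 2**d0)

def h(key):
    return hash(key)                   # base hash function; range is not 0..N-1

def h_i(key, i):
    # h_i(key) = h(key) mod (2**i * N); h_{i+1} doubles the range of h_i.
    return h(key) % (2 ** i * N)

# h_0 maps into 0..N-1, h_1 into 0..2N-1, h_2 into 0..4N-1, ...
for key in (13, 27, 41):
    print(key, h_i(key, 0), h_i(key, 1), h_i(key, 2))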
